Tag: xml

Saxon Performance

In this paper I have presented 10 characteristics of the Saxon XQuery implementation that contribute to its performance, and for most of these, I have attempted to quantify the size of that contribution for some selected queries. Few of these mechanisms are unique to Saxon; what makes Saxon distinctive is the deployment of a balanced portfolio of techniques to deliver efficient query execution over a variety of user workloads, coupled with a determination to place other qualities of the product (standards conformance, reliability, usability) ahead of raw performance. In a crowded marketplace with over 50 XQuery implementations competing for user attention, I believe it is this balanced approach that has led many users to make Saxon their preferred choice.

Google hates XML

I just came across an article that announced Google open sourced their ‘Protocol buffers’ but decided NOT to use XML. They claim they could not use XML because ‘it isn’t going to be efficient enough for this scale’. WTF??? If this statement came from someone else, I would understand, but these guys are supposed to KNOW markup. Their solution is supposedly “20-100x faster” – which I refer to as “Lies, damned lies, and statistics”. I bet I could make XML run circles around their system just by simplifying their schema.

what happened to O’Reilly standards? recently, they seem to have a lot of clueless “contributors”.

VTD-XML

The world’s most memory-efficient (1.3x~1.5x the size of an XML document) random-access XML parser. The world’s fastest XML parser: VTD-XML outperforms DOM parsers by 5x~12x, delivering 150~250 MB/sec per core sustained throughput. The world’s fastest XPath 1.0 implementation.

claims to be way faster than SAX. need to check it out

Use RELAX NG

I have recently recommended to a large publishing client that they adopt RELAX NG as the basis of the formal definitions of their content, in preference to W3C XML Schema Definition Language (WXS). There are lots of individual bits of information on why RELAX NG should be preferred all over the web. Here is an attempt to condense some of the key information into 10 points:

A better spec means better interoperability

Availability of a compact syntax

The specification is a stable ISO standard

No PSVI

No content defaulting

Better datatyping support

More sophisticated modelling

More sophisticated grammatical validation

Instances have no dependency

Growing consensus

now hopefully kml will get cured of the XSD disease

Microsoft Markup

I just hope that these are typos, otherwise it would certainly look like really weird XML. If they are typos, I wonder how many there are in the spec? Surely the guys at Ecma had enough time to review this? Why am I now supposed to act as a junior sub editor? I would understand if it was a missing quote or full-stop here or there. But 3 quite major glitches within 3 consecutive lines? That’s quite scary.

VML the undead, and the depths of ms office xml

XSD harmful

It is obvious that DTDs are a non-starter when judged against these criteria. I think it’s also obvious that Schematron does very well. I would claim that RELAX NG also does well here, and is better in this respect than other grammar-based schema languages, in particular XSD. First, it carefully avoids anything that ties a document to a single schema:

there’s nothing like xsi:schemaLocation or DOCTYPE declarations
there’s nothing that ties a particular namespace name to a particular schema; from RELAX NG’s perspective, a namespace name is just a label
there’s nothing in RELAX NG that changes a document’s infoset

Second, it has powerful features for expressing loose/open schemas:

it supports full regular tree grammars, with no ambiguity restrictions
it provides namespace-based wildcards for names in element and attribute patterns
it provides name classes with a name class difference operator
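
To make the wildcard and difference points concrete, here is a rough Python sketch of how such name classes behave. It is a toy model, not RELAX NG itself and not how any validator is implemented; the helper names (any_name, ns_name, name, except_) and the example.com namespace are made up for illustration.

```python
# Toy model of RELAX NG-style name classes.
# A qualified name is a (namespace_uri, local_name) pair, and a name class
# is a predicate over such pairs. All helper names are hypothetical.

def any_name():
    """Wildcard: matches every qualified name."""
    return lambda ns, local: True

def ns_name(uri):
    """Namespace-based wildcard: matches any local name in one namespace."""
    return lambda ns, local: ns == uri

def name(uri, local_name):
    """Matches exactly one qualified name."""
    return lambda ns, local: ns == uri and local == local_name

def except_(base, excluded):
    """Difference operator: whatever `base` matches, minus what `excluded` matches."""
    return lambda ns, local: base(ns, local) and not excluded(ns, local)

XHTML = "http://www.w3.org/1999/xhtml"
EXAMPLE = "http://example.com/ns"  # assumed namespace, for illustration only

# "Any element that is not in the XHTML namespace"
foreign = except_(any_name(), ns_name(XHTML))

print(foreign(EXAMPLE, "note"))  # a foreign element is allowed
print(foreign(XHTML, "p"))       # an XHTML element is excluded
```

In the RELAX NG compact syntax the same name class would be written roughly as `* - xhtml:*`, which is what lets an open schema say "allow extension elements from anyone else's namespace here".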

another entry in the ever-growing litany of why XSD sucks

But how do you interop with a world that uses XSD as the wire format for contracts? The minimum is to create a tool that can take a TEDI schema with XML annotations and generate an XSD. There’ll be limits because of the limited power of XSD (and these will need to be taken into consideration in designing the TEDI XML binding): some of the constraints of the TEDI schema might not be captured by the XSD. But that’s a normal situation: there are often complex constraints on an XML document being interchanged that cannot be expressed in XSD.

James Clark says we need a new kind of schema language, and I concur.

Mindmap standard?

Eric Blue has A Call To Action: The Need For A Common Mind Map File Format. He provides some very good reasons for such a format, in particular the ability to share mindmaps on the web. Before going any further, I should say I take sharing things on the web as meaning with deep, granular access to the data/content, not just placing an opaque blob file on a server. Before getting to the format, there’s an underlying problem that may be harder. There isn’t as yet a sharable model of what a mindmap is.

maybe one day. mindmaps have been a long time coming but are still nowhere

XSLT Coordinate Extraction

cute

XSD schema for KML 2.1

finally! i haven’t looked at this in detail yet, but this has been necessary for a long long time. the KML out there is pretty much “everything goes”, sadly.

Validation considered harmful

Different consumers treat different constraints as showstoppers, and it is inefficient, frustrating and wasteful for your input to barf on constraints that don’t affect you in particular.
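
A small Python sketch of the idea, using a made-up document and two made-up consumers: each consumer checks only the constraints it actually depends on, instead of the input being rejected against one global schema. The element names and the functions indexer_ok and renderer_ok are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: it has a title, but the <date> is not in ISO form.
DOC = "<article><title>Saxon Performance</title><date>last Tuesday</date></article>"

def indexer_ok(root):
    """A search indexer only needs a non-empty <title>."""
    title = root.findtext("title")
    return bool(title and title.strip())

def renderer_ok(root):
    """A date-sorted renderer needs <date> shaped like YYYY-MM-DD."""
    date = root.findtext("date") or ""
    parts = date.split("-")
    return (
        len(parts) == 3
        and all(p.isdigit() for p in parts)
        and [len(p) for p in parts] == [4, 2, 2]
    )

root = ET.fromstring(DOC)
print(indexer_ok(root))   # the indexer can use this document
print(renderer_ok(root))  # the renderer cannot, and only it needs to complain
```

The same document is acceptable to one consumer and not to the other, which is the case a single up-front validation step handles badly.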