xml applications are infoset pipelines.
henry thompson, w3c
one of the most powerful concepts in computer science is arguably abstraction. building on abstractions is what makes complexity manageable in the first place. productivity advances and new approaches are more often than not the result of abstracting existing ideas and layering new ones on top of them.
xml is such a story as well. like everyone and their brother, i long thought xml was basically a nice way to do markup, sort of html done right. which raised the question of what all the hype was about. back then i attributed the hype to the groundbreaking insight that simplicity matters; other than that, i failed to see what xml could be useful for.
as it turns out, tags and markup are irrelevant. what matters are infosets, the information contained in xml documents. with the advent of xml schema, it has become possible to add another layer of abstraction on top of the markup: you no longer have to think of your data in terms of tags and markup, but rather in terms of its types. what does that mean?
it means you can concern yourself with the (simple or complex) types you encounter in your problem space, like the notion of an address. you do not care how an address is encoded, you just care about its type. xml schema lets you pull that type information out of your data, and once you have such rich, structured data, you can do a lot with it.
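for illustration, here is a minimal sketch of what such a type could look like in xml schema (the element and type names are made up for the example):

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- an address as a type: what it is made of, not how it happens to be tagged -->
    <xs:complexType name="address">
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city"   type="xs:string"/>
        <xs:element name="zip"    type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
    <!-- two differently named elements, one underlying type -->
    <xs:element name="billing-address"  type="address"/>
    <xs:element name="shipping-address" type="address"/>
  </xs:schema>

whether an application sees a billing-address or a shipping-address, what it really gets is the same typed content, and that type is all it needs to care about.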
take, for instance, the concept of pipes, another powerful abstraction and one that is fundamental to the unix way. conceived by doug mcilroy and implemented by ken thompson as little programs that sequentially work on each other's output, it has inspired 20 years of operating system design.
now, one of the basic assumptions of the pipes idea was that the data flowing through a pipe is essentially character data. enhance this concept with xml (rich, strongly typed data) and you have the foundation for a lot of new and very powerful ideas. this is what i currently understand xml to be about, and what is being built upon with xml protocols and ultimately the xml processing model.
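to give that a concrete flavor, here is a rough sketch of such an infoset pipeline written in xproc, the w3c pipeline language that the xml processing model work eventually produced (the schema and stylesheet file names are invented for the example):

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
    <!-- step 1: validate the incoming document, turning raw markup into a typed infoset -->
    <p:validate-with-xml-schema>
      <p:input port="schema">
        <p:document href="address.xsd"/>
      </p:input>
    </p:validate-with-xml-schema>
    <!-- step 2: transform the validated infoset; its output becomes the pipeline result -->
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="addresses-to-report.xsl"/>
      </p:input>
    </p:xslt>
  </p:pipeline>

each step consumes and produces infosets rather than bare character streams, which is exactly the unix pipe idea lifted one abstraction level up.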
(heavily inspired by this keynote talk by henry thompson)