How To Achieve Flexible XML Messaging
In a previous piece I talked about the trouble with XML Schemas in supporting evolution. Nice thoughts, but of course people demand solutions.
There are two ways that I can think of to support schema evolution in XML. The first and most traditional way is to never have to evolve the schema. That seems completely counterintuitive, but I didn't invent the strategy! The primary culprits are the RDF folks.
The strategy of course isn't as simple as dogmatically sticking to a fixed schema. The strategy requires representing a generic data model. Some use name-value pairs, others use subject, predicate and object triples, and others use datasets. You can even go as far as embedding type information in these structures. The schemas are fixed; however, the object model that's embedded is extremely flexible.
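To make the name-value idea concrete, here is a minimal sketch in Python. The `record` and `property` element names, the `type` attribute and the converter table are all made up for illustration; they are not taken from RDF or any other standard. The point is only that the element vocabulary never changes while the data it carries can.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed vocabulary: only <record> and <property> ever appear.
# New fields are new name-value pairs, not new element names.
MESSAGE = """
<record>
  <property name="customer" type="string">Acme</property>
  <property name="quantity" type="int">12</property>
  <property name="rush" type="bool">true</property>
</record>
"""

# Embedded type information is mapped to Python converters (an assumption
# of this sketch; unknown types fall back to plain strings).
CONVERTERS = {
    "string": str,
    "int": int,
    "bool": lambda text: text == "true",
}

def read_record(xml_text):
    """Turn the fixed-schema message into an ordinary dictionary."""
    root = ET.fromstring(xml_text)
    record = {}
    for prop in root.findall("property"):
        convert = CONVERTERS.get(prop.get("type", "string"), str)
        record[prop.get("name")] = convert(prop.text or "")
    return record

record = read_record(MESSAGE)
```

A sender can add a `discount` property tomorrow and this reader keeps working, which is the flexibility being bought. The cost shows up just as clearly: a schema validator can only confirm that the message is a list of `property` elements.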
The conflict with using this approach is that it is in general incompatible with XML dogma. If you have only a fixed set of element names, then why even bother representing it in XML? The whole idea of XML is to provide elements that users can define and redefine. On the pragmatic side, all those XML tools like schema validators, transforms and editors become practically ineffective with this approach.
A second approach is to accept the fact that grammar extension and reuse is an extremely difficult proposition. Ask anyone who has attempted to build a parser. It's all too easy to define an ambiguous production. Grammar extension is a whole other ball game.
Grammars by their very nature are extremely rigid; they require all productions to stitch together in a seamless way. However, what if we took a looser approach? What if we took an approach where we just use a bag of productions? That is, a set of loosely coupled productions, something like Schematron.
The beauty of Schematron is that a document is validated only for the parts that the receiving application is interested in. Everything else is effectively ignored. If you really think about this, it's actually the right way. Just ask Postel, who said "Be liberal in what you accept".
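The "bag of productions" idea can be sketched without Schematron itself. The Python below is not Schematron and does not use its syntax; it only imitates the spirit with a list of independent rules, each a path plus a test, and the message layout and rule names are invented for the example. It relies on the limited XPath subset supported by the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the receiver cares about <order> only. The
# <routing> and <audit> parts are in transit for some other node.
MESSAGE = """
<envelope>
  <routing hops="3"><via>node-7</via></routing>
  <order id="A-17">
    <item sku="X1" qty="2"/>
    <item sku="X2"/>
  </order>
  <audit>anything at all</audit>
</envelope>
"""

# A bag of loosely coupled rules: (context path, test, failure message).
# No rule knows about any other, and none describes the whole document.
RULES = [
    (".//order", lambda e: e.get("id") is not None, "order needs an id"),
    (".//order/item", lambda e: e.get("qty") is not None, "item needs a qty"),
]

def check(xml_text, rules):
    """Apply every rule to the elements it matches; ignore everything else."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, test, message in rules:
        for element in root.findall(path):
            if not test(element):
                failures.append(message)
    return failures

failures = check(MESSAGE, RULES)
```

Here the second `item` fails the `qty` rule, while `routing` and `audit` are never examined, so a sender can add or change them without breaking this receiver.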
In a world of asynchronous communication, certain pieces of information arrive at your node that just happen to be there because they are actually in transit. Assuming that every node has global knowledge of, and agreement on, what goes in every message that it comes across is simply asking to boil the ocean.
The plus with Schematron is that it works with XML. The minus is that semantics are arbitrary. However, what's to prevent one from using it as a poor man's RDF?
In fact, an extremely compelling approach that's making the rounds is the idea of semantic markup in HTML. That is, don't bother developing new XML tags, just use existing ones to express new semantics. It's the original fixed schema approach again, but this time in a more pragmatic setting. The reason it works is that XHTML has a backdoor, and attributes like "class" are being used to embed name-value pairs. In addition you have CSS, which is able to make queries against these tags. Interestingly enough, the use of CSS is just like the use of patterns in Schematron.
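As a rough illustration of the "class" backdoor, the sketch below pulls name-value pairs out of a well-formed XHTML fragment by class name, much as a CSS class selector would match them. The fragment, the class names and the helper functions are hypothetical, and a real page would also need namespace handling, which is skipped here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: ordinary tags, with semantics carried by
# the class attribute rather than by new element names.
PAGE = """
<div class="contact card">
  <span class="name">Ada Lovelace</span>
  <em>ignored decoration</em>
  <span class="email">ada@example.org</span>
</div>
"""

def select_by_class(root, class_name):
    """Return elements whose space-separated class list contains class_name."""
    return [
        element
        for element in root.iter()
        if class_name in (element.get("class") or "").split()
    ]

def extract(xml_text, wanted):
    """Collect the text of the first match for each class we care about."""
    root = ET.fromstring(xml_text)
    result = {}
    for class_name in wanted:
        matches = select_by_class(root, class_name)
        if matches:
            result[class_name] = "".join(matches[0].itertext()).strip()
    return result

fields = extract(PAGE, ["name", "email", "phone"])
```

The reader asks for three fields and gets the two that are present; the missing `phone` and the stray `em` element cause no error, which is the same "extract only what you're interested in" behaviour as the pattern approach.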
There you have it folks, the means towards flexible messaging. Step one, fix or never change the schema. Step two, only extract what you're interested in. Finally, for your own comfort, Tim Bray (co-creator of XML) writes:
Here’s the real dirty secret; every time you cook up your own tag-set, you lose interoperability. The deep semantics that XML tags are labels for can’t be captured in any one of a schema or a write-up or lunchroom chats or running code; they need all of these things. (The notion, inherent in the phrase “custom schemas”, that a schema captures the essence of a language, is just totally wrong). The lesson is, to the extent that you can use a language that someone else already wrote, you win.
Last modified 2004-06-19 07:54 AM

