The Trouble with Evolving XML Schemas
XML Schema is truly a monstrous mess; just ask Jim Clark. My impression was that XML Schema was designed by committee and had become unnecessarily complex while at the same time being riddled with conceptual holes.
But I had thought that those conceptual holes were minor and could only be discovered in the minutiae. Unfortunately, there's a glaring hole in XML Schema, and that's its support for schema evolution.
A number of people have discussed this issue before. David Orchard, in "Providing Compatible Schema Evolution", reviews a few of the existing solutions:
- Type Extension - Requires both parties to update their schemas, and allows extension only after the last element.
- Change namespace name or element name - Essentially a change of schema, which breaks compatibility everywhere.
- Use wildcard with ##other - A namespace author cannot extend their schema with extensions and correctly validate them because a wildcard cannot be constrained to exclude some extensions.
- Extension elements - The only solution that allows backwards and forwards compatibility, and correct validation using the original and extended schema. However, it has a cumbersome syntax.
Dare Obasanjo discusses "On Versioning XML Vocabularies". He takes a crack at David Orchard's analysis:
David Orchard goes on to suggest a number of potential additions to future versions of W3C XML Schema which would make it easier to use it in defining extensible XML vocabularies. However given that my personal opinion is that adding features to W3C XML Schema is not only trying to put lipstick on a pig but also trying to build a castle on a foundation of sand, I won't go over each of his suggestions. My recent suggestion to some schema authors at Microsoft about solving this problem is that they should have two validation phases in their architecture. The first phase does validation according to W3C XML Schema rules while the other performs validation of “business rules“ specific to their scenarios. Most non-trivial vocabularies end up having such an architecture anyway since there are a number of document validation capabilities missing from W3C XML Schema so schema authors shouldn't be too focused on trying to force fit their vocabulary into the various quirks of W3C XML Schema.
In essence he's punting (i.e. giving up) on the issue.
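To make the two-phase idea concrete, here is a minimal sketch in Python. It is not Dare's code: the order message, the element names, and the function names are all invented for illustration. Phase one is only a stand-in, since the Python standard library ships no W3C XML Schema validator; a real pipeline would call one at that point. Phase two checks a cross-field rule of the kind XML Schema 1.0 has no way to state.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical message; the vocabulary is invented for this sketch.
ORDER = """<order>
  <item price="10.00" qty="2"/>
  <item price="5.50" qty="1"/>
  <total>25.50</total>
</order>"""


def structural_phase(text):
    """Phase 1 stand-in: well-formedness plus required structure.

    A real system would run a W3C XML Schema validator here instead.
    """
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    errors = []
    if root.tag != "order":
        errors.append("root must be <order>")
    if root.find("total") is None:
        errors.append("missing <total>")
    if not root.findall("item"):
        errors.append("need at least one <item>")
    return root, errors


def business_phase(root):
    """Phase 2: a business rule that spans several fields."""
    expected = sum(
        (Decimal(i.get("price")) * int(i.get("qty")) for i in root.findall("item")),
        Decimal("0"),
    )
    declared = Decimal(root.findtext("total").strip())
    if expected != declared:
        return [f"total {declared} does not match line items {expected}"]
    return []


def validate(text):
    """Run phase 2 only when phase 1 found nothing wrong."""
    root, errors = structural_phase(text)
    return errors if errors else business_phase(root)


print(validate(ORDER))
```

The point of the split is that the second phase is ordinary code, so it can evolve without touching the schema at all.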
Sean McGrath has his own proposal. He makes about 12 recommendations; specifically, he writes:
- Messages must be designed for extensibility by implementing the notational aspects of the “mustIgnore/mustUnderstand” version processing model (see Appendix 1).
- Messages should also provide a property/value pair extension mechanism.
That is, he takes the extension element approach mentioned by David Orchard and augments it with additional semantics. In addition, he adds a back door by introducing the traditional schema-opaque construct of name-value pairs (à la RDF).
Roger Costello discusses "eXtreme eXtensibility". He relaxes constraints and recommends this:
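A rough sketch of how a consumer might apply the "mustIgnore/mustUnderstand" model together with property/value pairs is below. Sean's Appendix 1 is not reproduced here, so the details are assumptions: the mustUnderstand attribute spelling, the property element with name and value attributes, and the payment vocabulary are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements this (hypothetical) version of the consumer was written for.
KNOWN = {"id", "amount"}


class NotUnderstood(Exception):
    """Raised when an unknown element insists on being understood."""


def read_message(text):
    """Return (fields, properties, ignored element names)."""
    root = ET.fromstring(text)
    fields, props, ignored = {}, {}, []
    for child in root:
        if child.tag in KNOWN:
            fields[child.tag] = (child.text or "").strip()
        elif child.tag == "property":
            # Name/value back door: carried along, opaque to the schema.
            props[child.get("name")] = child.get("value")
        elif child.get("mustUnderstand") == "true":
            # Unknown and mandatory: refuse the whole message.
            raise NotUnderstood(child.tag)
        else:
            # Unknown and optional: skip it and keep going.
            ignored.append(child.tag)
    return fields, props, ignored


# A newer sender added <currency>; the older consumer still copes.
NEWER = """<payment>
  <id>42</id>
  <amount>9.99</amount>
  <currency>EUR</currency>
  <property name="channel" value="web"/>
</payment>"""

print(read_message(NEWER))
```

Note that the evolution logic lives in the consumer, not in the schema: nothing here tells a validator which extensions are legitimate.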
- Unlimited Vocabulary - There can be no restriction on the contents of the River element. This will allow for unforeseen data. Any vocabulary that the schema provides should be considered as merely a starting point.
- Unordered - There can be no restriction on the order of the data.
Of course, as Roger states, it's very much like RDF, with one caveat. The caveat is the aggregation of disparate schema declarations into a single schema when there are conflicting definitions of the same element, an inherent problem with XML Schemas.
There you have it, folks: a whole bunch of recommendations that you can use in your daily routine. But, just in case you can't see the writing on the wall, the consensus seems to be that if you want evolvable schemas, don't even think of creating new elements!
It's that simple. If you can't handle that simplicity, then take comfort in the fact that the Web scales because it's just plain uniform.
Last modified 2004-06-19 05:30 AM