Thursday, April 23, 2009

MVDs in binary or XML?

A pattern is emerging in the effect that the MVD concept is having on people. They take on board its power at representing variation, but they don't like the idea of representing the data in binary form. Instead they think it is possible to represent variation in some form of XML. So far I've heard proposals to use TEI-XML, RDF or GraphML. It's tempting, of course, to carry on using XML when this is the tool we are all most familiar with. However, my point in developing the MVD format was precisely to get around the limitations of all forms of markup. You can't represent a variant graph in XML satisfactorily if the text whose variation you are recording is itself XML – and it usually is. The reason is that you can't represent cases where the markup itself varies, for example the deletion of a paragraph break:

<del></p><p></del>???
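The problem is easy to check mechanically. Here is a minimal sketch using Python's standard xml.etree.ElementTree parser; the <text> wrapper and the sample sentences are invented purely for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the break between two paragraphs was deleted.
# Wrapping the deleted break in <del> makes the tags overlap.
overlapping = (
    "<text><p>First paragraph.<del></p><p></del>Second paragraph.</p></text>"
)

# The same text with the break simply gone: well-formed, but the
# fact that a break was ever there is lost.
merged = "<text><p>First paragraph. Second paragraph.</p></text>"


def is_well_formed(xml_string):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(overlapping))
print(is_well_formed(merged))
```

The parser rejects the first string because </p> arrives while <del> is still open, so the only versions that parse are ones that no longer record the deletion directly.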

Of course there are hacks to get around this particular case, but they have negative consequences. What you end up doing is modifying the markup to accommodate weaknesses in the representational power of markup itself. I think that is a fundamentally flawed strategy. It is just another form of putting presentational information into markup that is supposed to be generic. If you try to represent variation in a set of texts, or in one text, using markup, you very quickly run up against the problem of overlap. And markup is very poor at representing that, as we all know. The only way to get around the overlap problem completely is to represent variation using a non-markup-based technology. That is the whole point of MVDs, and it doesn't seem to have been widely acknowledged yet.
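To make the contrast concrete, here is a rough sketch of what a non-markup representation might look like. It is not the MVD file format: the list-of-pairs layout, the names and the sample text are all assumptions made for illustration. The idea is only that each fragment of text is tagged with the set of versions it belongs to, and any markup inside a fragment is treated as plain characters:

```python
# Each pair is (set of version ids, fragment of text). Fragments are opaque:
# markup inside them is just characters, so nothing ever has to nest.
# (Hypothetical layout for illustration, not the actual MVD format.)
pairs = [
    ({1, 2}, "<p>First paragraph."),
    ({1}, "</p><p>"),  # the paragraph break exists only in version 1
    ({1, 2}, "Second paragraph.</p>"),
]


def read_version(pairs, version):
    """Concatenate, in order, the fragments that belong to the given version."""
    return "".join(text for versions, text in pairs if version in versions)


print(read_version(pairs, 1))
print(read_version(pairs, 2))
```

Because the variation is recorded outside the text, the deleted paragraph break is just another fragment, and each version read back out is ordinary well-formed XML.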
