Tuesday, March 10, 2009

MVD is Not a Replacement for Markup

Some people still think of MVD as a replacement for markup. It isn't. It complements markup systems, or any other technology that can represent content. As I said on the main page, 'What's a Multi-Version Document?', an MVD represents the overlapping structure of a set of versions or markup perspectives. It doesn't need to represent any of the detail of the content, which remains the responsibility of the markup.

I realise that it's easy, and natural, to dismiss radical ideas simply because they are radical. The difference in this case is that MVD is a technology that definitely works. And it's not all that radical anyway. Consider the direction in which multiple-sequence alignment is going in biology. Researchers there have also realised that the best way to represent multi-version genomes or protein sequences is via a directed graph (e.g. Raphael et al., 2004. A novel method for multiple alignment of sequences with repeated and shuffled elements, Genome Research, 14, 2336-2346). I prefer to think of their idea as parallel to mine. Their 'A-Bruijn' graph is rather different from my MVD, but it represents the same kind of data in much the same way. Acceptance that this basic idea can also be applied to texts in the humanities and linguistics is just a matter of time.
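To make the directed-graph idea a little more concrete, here is a toy variant graph in Python. It is only a minimal sketch under simple assumptions, not the MVD format itself and not the A-Bruijn graph: each edge carries a fragment of text plus the set of versions that share it, so text common to several versions is stored once. The names `VariantGraph`, `Edge`, `add` and `read`, and the version labels "A" and "B", are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    text: str              # fragment of text on this path
    versions: frozenset    # versions that share this fragment
    target: int            # node the edge leads to

@dataclass
class VariantGraph:
    # Hypothetical toy model: node 0 is the start, `end` is the final node.
    end: int
    edges: dict = field(default_factory=dict)  # node id -> list of outgoing Edges

    def add(self, source, target, text, versions):
        """Add an edge carrying `text` for the given set of versions."""
        self.edges.setdefault(source, []).append(
            Edge(text, frozenset(versions), target))

    def read(self, version):
        """Rebuild one version by following the edges labelled with it."""
        node, parts = 0, []
        while node != self.end:
            edge = next(e for e in self.edges[node] if version in e.versions)
            parts.append(edge.text)
            node = edge.target
        return "".join(parts)

# Two versions: "A" lacks the word "black", "B" has it.
g = VariantGraph(end=3)
g.add(0, 1, "The ", {"A", "B"})      # shared text, stored once
g.add(1, 2, "black ", {"B"})         # variant present only in B
g.add(1, 2, "", {"A"})               # empty edge: A skips the variant
g.add(2, 3, "cat sat.", {"A", "B"})

print(g.read("A"))  # The cat sat.
print(g.read("B"))  # The black cat sat.
```

Reading a version just means walking from the start node to the end node along the edges that carry its label. This toy deliberately ignores transpositions, which need extra machinery beyond a simple branch-and-merge graph.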

The Inadequacy of Markup

If markup is adequate for linguistic texts, why does someone think up a new way every year to manipulate markup systems so that they can represent overlap? If it were adequate there would be no need for new systems, yet we continue to see one to three new papers on the subject every year. It's seen as a game. Look at the Balisage website: 'There's nothing so practical as a good theory'. Because it is perceived as an unsolvable problem, overlap is the perfect topic for a paper or a thesis.

In the humanities, overlap in markup systems is more than an annoyance; it wrecks the whole process of digitisation. In simple texts you can just about get by, but it's a question of degree. Try to use markup to record the following structures:

  1. Deletion of a paragraph break
  2. Deletion of underlining
  3. Changes to document structure
  4. Transposition
  5. Overlapping variants
These can all be done somehow in markup, I admit, but only very poorly. And they are features that occur all the time in original texts. The fundamental problem is that you can't adequately fit a non-hierarchical structure into a hierarchical template. To choose markup alone as the medium for preserving our textual cultural heritage is to resign yourself to mangling that information.
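The clash between hierarchy and overlap can be stated as a simple test on character ranges. Two XML elements over the same text must either be disjoint or one must contain the other; a partial overlap cannot be expressed directly and has to be split or encoded indirectly. The sketch below assumes spans are hypothetical half-open (start, end) character offsets, and the function names `nests` and `overlapping_pairs` are illustrative only. The example models the first case in the list above: a deletion that removes the break between two paragraphs.

```python
def nests(a, b):
    """Return True if spans a and b could both be XML elements over the
    same text, i.e. they are disjoint or one contains the other."""
    (a0, a1), (b0, b1) = a, b
    disjoint = a1 <= b0 or b1 <= a0
    a_in_b = b0 <= a0 and a1 <= b1
    b_in_a = a0 <= b0 and b1 <= a1
    return disjoint or a_in_b or b_in_a

def overlapping_pairs(spans):
    """List every pair of named spans that cannot be nested."""
    names = list(spans)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if not nests(spans[x], spans[y])]

# Hypothetical offsets: the deletion crosses the break between p1 and p2.
spans = {
    "p1": (0, 40),
    "p2": (40, 80),
    "underline": (10, 20),
    "deletion": (30, 55),
}
print(overlapping_pairs(spans))  # [('p1', 'deletion'), ('p2', 'deletion')]
```

The underlining sits neatly inside the first paragraph, but the deletion overlaps both paragraphs, so no single nesting of these four elements exists.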

Why do we have to use markup to record complex structures it was never designed to represent? Hand that complexity over to the computer and let it work it out. That's what MVD lets you do. If you are getting a headache shuffling around angle brackets and xml:ids, then think again. Is this really a proper way for people of the 21st century to interact with the texts of their forebears?
