Multi-Version Documents: July 2009

Sunday, July 26, 2009

Alpha Prototype Ready

I am renaming the multi-version wiki 'Alpha', simply because it's easier to say than 'Phaidros'. It's a bit of a joke, really: 'Alpha' was just my description of the product I developed for DH2008, which was its 'alpha' release.

The old Alpha didn't do transpositions, and I have been labouring hard for the past year to remedy that deficiency. I had revised nmerge to support transpositions but hadn't yet integrated it into the multi-version wiki. When I finally saw the output of the new nmerge in the web browser, it was suddenly clear that there were still some bugs in the transposition algorithm. Finding out exactly what was going wrong took me about a week of solid debugging, but it is done now and I am finally satisfied. Now I have something to take to Montréal to show the audience, and I can say: 'Hey folks, you said this conference was all about theory, but here's something that actually works.' I think that is a pretty good argument.

In this screen dump of part of the TwinView of Galiano's 'El mapa de las aguas' you can see the transposition of 'otras de un hachazo' from after 'de un bocado rabioso' (in version B, left) to before it (in version C, right). Detecting cases like this consistently by hand would be nearly impossible.
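To give a rough idea of what detecting a transposition involves, here is a minimal Python sketch. It is not the nmerge algorithm: it simply runs the standard library's difflib at word level and reports any run of words that is deleted in one place and inserted verbatim in another. The function name find_moves is made up for the illustration, and the two input strings are just the two quoted phrases in either order, not the full text of the versions.

```python
import difflib

def find_moves(a, b):
    """Return word runs that are deleted from `a` and inserted verbatim in `b`."""
    aw, bw = a.split(), b.split()
    sm = difflib.SequenceMatcher(None, aw, bw, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append(" ".join(aw[i1:i2]))
        if tag in ("insert", "replace"):
            inserted.append(" ".join(bw[j1:j2]))
    # A run that was both deleted and inserted has moved, not changed.
    return [run for run in deleted if run in inserted]

version_b = "de un bocado rabioso otras de un hachazo"
version_c = "otras de un hachazo de un bocado rabioso"
print(find_moves(version_b, version_c))
```

A toy like this only catches exact moves that the diff happens to split cleanly. Real texts also need moved passages that have changed slightly along the way, which is where the hard work lies.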

Red text is deleted in the left-hand version with respect to the version on the right. Blue text is inserted, and transpositions are shown in grey. Black text is merged. Clicking on merged or transposed text aligns the corresponding text on each side. These few simple HTML features add up to a surprisingly effective UI.
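The colour scheme described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the segment kinds, the render function and the inline styles are invented here, and the markup the wiki really generates may look quite different. The click-to-align behaviour is left out.

```python
from html import escape

# Assumed mapping, following the colours described in the post.
COLOURS = {
    "deleted": "red",
    "inserted": "blue",
    "transposed": "grey",
    "merged": "black",
}

def render(segments):
    """Turn (kind, text) pairs into a string of coloured <span> elements."""
    return "".join(
        f'<span style="color:{COLOURS[kind]}">{escape(text)}</span>'
        for kind, text in segments
    )

# Left-hand side of a comparison: shared text followed by a deletion.
print(render([("merged", "el molino"), ("deleted", " chico")]))
```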

Character-Level vs Word-Level Alignment

Character-level alignment by default is new in this version. For example, the expression 'el molino chico' became 'el molino' through the deletion of the character sequence 'o chic'. This goes to show that what humans would expect (the deletion of ' chico') and what the computer detects don't always correspond. I don't think that is a bad thing. The alternative would be to fail to pinpoint changes of spelling such as 'desaparecido' for 'desparecido' or the capitalisation of 'Ojos' for 'ojos'. Word-level granularity would flag the whole word and leave the reader puzzling over what the difference actually was. It is clearer to see small changes like these highlighted, so I agree with the MEDITE people that character-level alignment is more powerful. After all, you can always reduce character-level granularity to word-level, but if you only have word-level alignment you are stuck with it.
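The last point, that character-level results can always be coarsened to word level, can be shown with a small Python sketch. Again this uses the standard library's difflib rather than nmerge, and the function names are invented for the example. difflib may also choose a different, equally valid alignment from the one nmerge finds, as in the 'el molino chico' case.

```python
import difflib

def char_changes(a, b):
    """Character-level changes as (tag, a_start, a_end, b_start, b_end)."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]

def widen(text, start, end):
    """Grow [start, end) outwards until it sits on whitespace boundaries."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end

def word_changes(a, b):
    """Coarsen character-level changes to the whole words they touch."""
    out = []
    for _tag, i1, i2, j1, j2 in char_changes(a, b):
        i1, i2 = widen(a, i1, i2)
        j1, j2 = widen(b, j1, j2)
        pair = (a[i1:i2], b[j1:j2])
        if pair not in out:  # several edits inside one word collapse into one
            out.append(pair)
    return out

print(char_changes("desparecido", "desaparecido"))  # one inserted character
print(word_changes("desparecido", "desaparecido"))  # the whole word pair
```

Going the other way is not possible: once only the word pair is recorded, the position of the single inserted 'a' has to be computed again.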

'Collation' programs based on XML use word-level granularity because a finer resolution would make the markup impossibly complex (you'd have to mark up each letter separately). That doesn't have to be a restriction once we abandon the print-oriented concept of the 'apparatus'. The digital medium, at least, needs a new way of presenting variation. Let it evolve.

Thursday, July 2, 2009

Interface 09 and Multi-Version Wiki

We will be presenting a poster at Interface09 at the University of Southampton. There will also be a demo of the multi-version wiki, which I hope will be an iteration further on from the one presented at Oulu for Digital Humanities 2008. The new multi-version wiki is simply the old wiki with the new nmerge library added, but that brings support for transpositions, which is kind of important. It is a Jetty 6 based web application that you use through your browser, and it allows you to view and edit MVDs in a variety of intuitive ways.

Digital Variants Portal

Eventually the wiki will be broken up and integrated into the Digital Variants Website I am building. In this form the wiki will be a series of portlets inside a portal. Each portlet conforms to JSR 286 and is implemented in Jetspeed 2. A portal allows the user to configure his or her own interface on the web using the portlet components. It also promotes reuse of the portlets by other parties. We are going for broke with this design: I for one don't believe that deficient or obsolescent technology has any place in designs for the future. If we can build it, we will.