Multi-Version Documents: Improvements to Apparatus

Saturday, July 17, 2010

Improvements to Apparatus

One advantage of Multi-Version Documents is that generating an apparatus is so easy. There is a single command in nmerge: you specify the range (offset and length) and the desired base text, and it computes the traditional 'apparatus' display of all the variants, aligned on word boundaries. Maybe it's old-fashioned and print-related, but it does show you many versions of the text in a very compact way, so I think it's still useful.
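To make "aligned on word boundaries" concrete, here is a minimal Python sketch of widening a raw (offset, length) range so that it starts and ends between words. It is not nmerge's code; the function names are hypothetical, and it assumes words are separated by whitespace only.

```python
def inside_word(text, i):
    """True if position i falls between two non-whitespace characters."""
    return 0 < i < len(text) and not text[i - 1].isspace() and not text[i].isspace()


def expand_to_word_boundaries(text, offset, length):
    """Widen the range [offset, offset + length) to whole words.

    Hypothetical helper for illustration; returns a new (offset, length) pair.
    """
    # Clamp the requested range to the text.
    start = max(0, min(offset, len(text)))
    end = max(start, min(offset + length, len(text)))
    # Move the start left until it no longer splits a word.
    while inside_word(text, start):
        start -= 1
    # Move the end right until it no longer splits a word.
    while inside_word(text, end):
        end += 1
    return start, end - start


line = "Blow, winds, and crack your cheeks!"
new_offset, new_length = expand_to_word_boundaries(line, 7, 3)
print(line[new_offset:new_offset + new_length])
```

A selection that cuts through the middle of "winds," is widened to cover the whole word, including its attached punctuation, while a range that already sits on boundaries is returned unchanged.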

The problem I had been struggling with for the past couple of weeks was how to ensure that this range in the MVD could be specified precisely via a selection in the GUI. Of course, what the user sees is not the contents of an MVD. It is extracted and transformed via XSLT (at the moment), and the user's selection in the HTML bears no clear relation to the corresponding selection in the underlying data. The problem boils down to aligning the XML and HTML versions of the text fast enough for the user not to notice. There are plenty of techniques for doing this, but they all take waaaay too long. I wanted it in fractions of a second.

On perhaps the sixth try, my new method finds the correct answer in around 28 milliseconds for the King Lear example, in slow old PHP. What is perhaps most annoying is that the method I used was incredibly simple: just 58 lines of code. Strange that you can never see the simple things that are right under your nose. :-) And when you finally have the answer, you can't explain why it didn't occur to you earlier.
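For a feel of what mapping a selection back to the source involves, here is a toy Python sketch. It is emphatically not the method used in MVD_GUI, which isn't described here. It assumes the visible text is exactly the character data of the XML, in the same order, with no entities, comments, or whitespace changes; real XSLT output will not be that tidy. The function names are made up for the example.

```python
def text_to_source_offsets(xml):
    """For each character of visible text, record its index in the XML source.

    Toy assumption: everything between '<' and '>' is markup, the rest is text.
    """
    offsets = []
    in_tag = False
    for i, ch in enumerate(xml):
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            offsets.append(i)
    return offsets


def map_selection(xml, text_offset, text_length):
    """Translate a selection in the visible text to (offset, length) in the source."""
    offsets = text_to_source_offsets(xml)
    if text_offset < 0 or text_length <= 0 or text_offset + text_length > len(offsets):
        raise ValueError("selection lies outside the visible text")
    start = offsets[text_offset]
    # The end is one past the source position of the last selected character.
    end = offsets[text_offset + text_length - 1] + 1
    return start, end - start


source = "<l>Blow, <hi>winds</hi>, and crack</l>"
offset, length = map_selection(source, 6, 6)  # the user selected "winds,"
print(source[offset:offset + length])
```

Note that the mapped range picks up any markup lying inside the selection, which is one reason the HTML selection bears no simple relation to the underlying data.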

If anyone really wants to know how I did it, they can download the MVD_GUI code to find out, though you might have to wait until I update the Google Code site. I'm not going to bore you all with technical details here.
