Thursday, February 24, 2011

Multi-lingual MVDs

There are plenty of cases where the concept of a 'work' spans more than one basic version in one language. Just think of the multi-lingual laws of the EU; the Romulo of Virgilio Malvezzi, translated into several languages, each translation having its own textual history; or the Chronicles of Eusebius, in Latin, Greek and Armenian. The question is, how can you align versions of the same text written in different languages? Can one align Latin and Greek, or French and German? In my opinion, no, or at least not automatically. Quite apart from the dissimilarity of the languages, translations often differ considerably in structure, which makes alignment particularly difficult. But a tiny change to the definition of an MVD makes it possible to align such texts manually and to use the MVD format as a storage facility.

Tweaking the groups

MVDs have always had a simple grouping mechanism. You can group versions by type. For example, versions of a particular recension, or internal versions (corrections or revisions of a single manuscript), can be grouped together to keep them separate from versions in other, physically different documents. Now if we give each of these groups a simple attribute called 'merge', set to 'true' or 'false', then we can control how an MVD is built up. For example, imagine we have French, German and Italian translations of some work, each in several versions. We could group all the Italian versions together, and similarly the German and French ones, and set each group's 'merge' attribute to 'true'. But each such group would belong to a higher group, whose 'merge' attribute would be 'false'. So the merging program would know, on being given version 23 (French) to add to the MVD, not to merge it with version 16 (German), because their shared parent group is not merged. Here's how it would look schematically inside the resulting MVD:

This might also be a good strategy whenever the same 'work' is substantially rewritten, like the Morte d'Arthur and other medieval tales. Versions of each rewrite would get their own group and we wouldn't attempt to align them automatically because it just gets too messy.
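
The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual MVD code: the Group class, the should_merge function and the version-to-group table are all made up for the example, and I am assuming that the decision depends on the nearest group the two versions share.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Group:
    """Hypothetical version group carrying the proposed 'merge' attribute."""
    name: str
    merge: bool
    parent: Optional["Group"] = None

    def ancestors(self) -> Iterator["Group"]:
        """Yield this group, then each enclosing group, innermost first."""
        group: Optional[Group] = self
        while group is not None:
            yield group
            group = group.parent


def should_merge(a: Group, b: Group) -> bool:
    """Merge two versions only if their nearest shared group has merge=True."""
    seen = {id(group) for group in a.ancestors()}
    for group in b.ancestors():
        if id(group) in seen:
            return group.merge
    return False  # no shared group at all: never merge


# The example from the text: language groups under an unmerged parent.
work = Group("work", merge=False)
french = Group("French", merge=True, parent=work)
german = Group("German", merge=True, parent=work)

# Assumed table mapping version numbers to their groups.
group_of = {16: german, 23: french, 24: french}
```

With this table, should_merge(group_of[23], group_of[24]) is true because both versions sit in the French group, while should_merge(group_of[23], group_of[16]) is false because the only group they share is the unmerged parent.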

Linking the translations

Now we can extend the standoff markup mechanism described in the previous post to link the texts of the different languages manually. We add a view that displays two versions of an MVD side by side:

Selecting some text on the left or right highlights it independently of the other side (you can do this in JavaScript). Now select something in the opposite version and press the 'link' button. This creates an annotated property that specifies a link between the two selected ranges and records it via standoff markup. The view could then give the user graphical feedback by formatting the two selected blocks rigidly side by side:
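
To make the link record a little more concrete, here is a minimal sketch of what pressing the 'link' button might store. The standoff format itself is not defined in this post, so everything here is an assumption: the Selection class, the make_link function, the use of character offsets and the field names are all hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Selection:
    """Hypothetical selected range in one version of the MVD."""
    version: int  # version number within the MVD
    offset: int   # character offset where the selection starts
    length: int   # number of characters selected


def make_link(left: Selection, right: Selection) -> dict:
    """Build a standoff-style property linking two ranges (assumed layout)."""
    if left.version == right.version:
        raise ValueError("a link must join two different versions")
    if left.length <= 0 or right.length <= 0:
        raise ValueError("both selections must be non-empty")
    # The property lives outside both texts and only points into them.
    return {"name": "link", "left": asdict(left), "right": asdict(right)}


# Link 40 characters of version 23 (French) to 55 of version 16 (German).
link = make_link(Selection(23, 100, 40), Selection(16, 120, 55))
```

Because the record holds only version numbers and ranges, neither text is modified, which is the point of keeping the links in standoff form.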

They could also scroll together in sync, as they currently do in compare view. If blocks are transposed between languages (as often happens) the text might jump around a bit as you scroll, but so long as we align on the most central block it should work OK. Also, the alignment would hold for all the aligned versions on either side, not merely for the ones currently selected. If you had 12 German versions and 16 French ones, they would all be aligned at the same point of their shared text. You could even display an apparatus at the bottom of each side so the user could see the variants of the versions in each language.
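
Aligning on the most central block could work roughly as follows. This is a sketch under my own assumptions, not how compare view currently does it: sync_scroll is a hypothetical function, positions are in pixels, and each linked block is given as a tuple (left_top, left_height, right_top, right_height).

```python
def sync_scroll(left_scroll_top, viewport_height, blocks):
    """Return a scroll position for the right pane that keeps the most
    central linked block level with its counterpart on the left."""
    if not blocks:
        return float(left_scroll_top)  # nothing linked: scroll in lockstep

    centre = left_scroll_top + viewport_height / 2

    def distance(block):
        # Distance from the left block's midpoint to the viewport centre.
        left_top, left_height, _, _ = block
        return abs(left_top + left_height / 2 - centre)

    left_top, left_height, right_top, right_height = min(blocks, key=distance)

    # How far down the left viewport the chosen block's midpoint sits.
    offset_in_view = left_top + left_height / 2 - left_scroll_top
    # Put the right block's midpoint at the same height, never above the top.
    return max(0.0, right_top + right_height / 2 - offset_in_view)


# Three linked blocks; the second is transposed far down the right-hand text.
blocks = [(0, 100, 0, 150), (100, 100, 400, 100), (200, 100, 150, 120)]
right_scroll_top = sync_scroll(80, 200, blocks)
```

In this example the second block is nearest the centre of the left pane, so the right pane jumps to bring its transposed counterpart level with it. That is the 'jumping around' mentioned above.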

How much work is that?

Although a special view would have to be designed, there is not much else needed to make it work. It might even be a good idea to add such a view to the MVD-GUI suite and see what people can do with it – but only once the standoff mechanism is up and running, because this solution depends on it.
