Sunday, November 16, 2008

Transpositions Conquered

Today the test program correctly merged three versions of a single sentence of the Sibylline Gospel, detecting four transpositions and encoding them correctly. The sentences were:

A: Et sumpno suscepto tribus diebus morte morietur et deinde ab inferis regressus ad lucem veniet.

B: Et mortem sortis finiet post tridui somnum et morte morietur tribus diebus somno suscepto et tunc ab inferis regressus ad lucem veniet.

C: Et sortem mortis tribus diebus sompno suscepto et tunc ab inferis regressus ad lucem veniet.

I must thank Nicoletta for supplying this splendid example, which in a small space contains so many transpositions. Here is the variant graph built automatically from the three versions. By 'automatically' I mean that I drew the graph by hand from the program's textual output. The program was set to make no variants shorter than five characters, although it does split arcs down to a single character. There are two transpositions, each present twice. I have drawn the transposed forms in grey and their parent arcs in black, and connected each pair with dotted lines. The triple repetition of 'Et' at the start of the graph could be removed by reducing the minimum variant size. At the moment I am happy to see such high-quality output without resorting to fine tuning.
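To give a feel for what a transposition looks like in versions A and B, here is a rough sketch in Python. It is not the test program's algorithm. It works on whole words rather than characters, compares only two versions, and simply reports pairs of shared runs whose order is reversed between the two. The function names and the way the five-character minimum is applied are my own assumptions for illustration.

```python
A = ("Et sumpno suscepto tribus diebus morte morietur et deinde "
     "ab inferis regressus ad lucem veniet.")
B = ("Et mortem sortis finiet post tridui somnum et morte morietur "
     "tribus diebus somno suscepto et tunc ab inferis regressus ad lucem veniet.")


def common_runs(a, b, min_chars=5):
    """Return maximal runs of words shared by a and b as (start_a, start_b, length).

    Runs whose text is shorter than min_chars are ignored (an assumed,
    simplified stand-in for a minimum variant size).
    """
    runs = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip positions that only continue a run that started earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            n = 0
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if len(" ".join(a[i:i + n])) >= min_chars:
                runs.append((i, j, n))
    return runs


def crossing_pairs(runs):
    """Return pairs of runs that appear in one order in a and the reverse order in b."""
    return [(r, s) for r in runs for s in runs if r[0] < s[0] and r[1] > s[1]]


a_words, b_words = A.split(), B.split()
runs = common_runs(a_words, b_words)

# Turn each crossing pair of runs back into text for display.
candidates = [
    (" ".join(a_words[r[0]:r[0] + r[2]]), " ".join(a_words[s[0]:s[0] + s[2]]))
    for r, s in crossing_pairs(runs)
]

for first, second in candidates:
    print(f"'{first}' and '{second}' swap order between A and B")
```

The long shared ending 'ab inferis regressus ad lucem veniet.' never shows up as a candidate, because it sits in the same relative position in both versions. Only the runs that cross each other are reported.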

The best thing about the program is the degree to which repetitions between versions have been systematically removed. This is the whole objective of the variant graph model.
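The saving can be illustrated with a small hand-built example. The sketch below stores versions A and C as a list of (versions, text) pairs, so that text shared by both is held only once. The pairs were written by hand for this illustration and are not the program's output. The sketch also leaves out transpositions altogether, and the name read_version is hypothetical.

```python
A = ("Et sumpno suscepto tribus diebus morte morietur et deinde "
     "ab inferis regressus ad lucem veniet.")
C = ("Et sortem mortis tribus diebus sompno suscepto et tunc "
     "ab inferis regressus ad lucem veniet.")

# Each pair holds the set of versions that share a fragment, and the fragment itself.
pairs = [
    ({"A", "C"}, "Et "),
    ({"A"}, "sumpno suscepto "),
    ({"C"}, "sortem mortis "),
    ({"A", "C"}, "tribus diebus "),
    ({"A"}, "morte morietur et deinde "),
    ({"C"}, "sompno suscepto et tunc "),
    ({"A", "C"}, "ab inferis regressus ad lucem veniet."),
]


def read_version(pairs, version):
    """Rebuild one version by joining every fragment that belongs to it."""
    return "".join(text for versions, text in pairs if version in versions)


stored = sum(len(text) for _, text in pairs)  # characters held in the merged form
separate = len(A) + len(C)                    # characters if each version is kept whole

print(read_version(pairs, "A") == A, read_version(pairs, "C") == C)
print(stored, "characters stored instead of", separate)
```

Reading a version is just a matter of following the fragments that carry its label, and the shared fragments are the repetitions that no longer need to be stored twice.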

This is, of course, only a test program. The algorithm will eventually be added to NMerge and all this will happen behind the scenes in the multi-version wiki whenever you save.
