<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4555078640999654611</id><updated>2012-01-19T11:31:26.827-08:00</updated><category term='Align'/><title type='text'>Multi-Version Documents</title><subtitle type='html'>This project is about creating a Wiki to handle documents consisting of multiple simultaneous versions (MVDs) or which contain overlapping markup.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>67</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-1146297305712334111</id><published>2011-12-23T00:26:00.001-08:00</published><updated>2011-12-23T10:44:48.712-08:00</updated><title type='text'>HritServer</title><content type='html'>&lt;p&gt;My colleagues at Loyola have inspired me with their questions to turn nmerge into a service. 
The idea of HritServer (Humanities Resources, Infrastructure and Tools) is to build a self-contained application that runs as a service from the command line and provides the infrastructure to build &lt;em&gt;any&lt;/em&gt; digital humanities website. It brings together the tools written over the past few years, including nmerge, the XML import/export tools, formatter and the GUIs I developed for Digital Variants. The structure will be a little like Tomcat: &lt;/p&gt;
&lt;ol&gt;&lt;li&gt;A back-end administrative interface allowing the admin user to add or import new texts and edit existing ones.&lt;/li&gt;
&lt;li&gt;Example GUIs in Java and PHP that exercise each facility provided by HritServer: compare, view variants, indexed search and tree view.&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;The service will be strictly RESTful: everything will be done via HTTP. The back-end database will be a modern key-value store rather than an old-fashioned relational database. Each resource will be accessible via a simple URL, with no complex access mechanism needed. For example, to get the formatted HTML of act 1, scene 1 of the First Folio text of Shakespeare's &lt;em&gt;King Lear&lt;/em&gt;, one would only need to fetch the URL:&lt;/p&gt;
&lt;p&gt;http://dhtestbed.ctsdh.luc.edu/html/english/shakespeare/kinglear/act1/scene1/F1/&lt;/p&gt;
&lt;p&gt;Anything can be stored at a similar URL in a simple hierarchical structure. The HTML is generated on the server from plain text and overlapping markup sets, and is never stored. Passing parameters to the same URL produces different formatted versions of the same text, and different encodings of the same text can likewise be obtained by specifying a different collection of markup sets. The idea is to take the complexity out of building such websites and to maximise automation by providing a powerful base infrastructure that will work for any set of texts. It should be achievable within a reasonable time because almost everything already exists (although some tools are still incomplete). All I have to do is stitch it together and test it.&lt;/p&gt;
&lt;p&gt;I'll have to write a formal software specification, but I've already made a good start on coding it.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-1146297305712334111?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/1146297305712334111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=1146297305712334111' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1146297305712334111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1146297305712334111'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/12/hritserver.html' title='HritServer'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7919213413260437469</id><published>2011-11-10T02:52:00.000-08:00</published><updated>2011-11-13T19:21:30.658-08:00</updated><title type='text'>XML-Free Digital Editions</title><content type='html'>&lt;p&gt;Playing around with Apache's CouchDB today, I realised that it uses JSON, not XML, to handle exchanges between client and server. This opens the intriguing possibility of making XML-free digital editions. 
If the &lt;a href="http://dhtestbed.ctsdh.luc.edu/hritinfrastructure/index.php/standoff-properties"&gt;standoff properties&lt;/a&gt; for digital texts were also stored using JSON or YAML rather than XML (a simple enough change), then the entire edition could be XML-free. The only thing that comes close to XML is the final conversion into HTML for the browser, but this can be in good old HTML (an SGML dialect) rather than XHTML (an XML dialect). I doubt that anyone has achieved that before, although it depends on how you define a 'digital edition'. By that term I mean an online digital archive of &lt;em&gt;marked-up&lt;/em&gt; texts accessible over the web. I think this is a liberating idea, and actually an inevitable one. If we believe the XML aficionados, disaster will ensue as soon as we abandon 'standards'. What will actually happen is that digital editions will blossom with the possibilities offered by the new form of digital text. It's time to show people what is possible &lt;em&gt;without&lt;/em&gt; XML.&lt;/p&gt;
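&lt;p&gt;To show how small that change is, here is a minimal Python sketch of standoff properties kept as JSON alongside a plain text. The field names (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;offset&lt;/code&gt;, &lt;code&gt;len&lt;/code&gt;) and the sample text are invented for illustration and are not the actual HRIT standoff format; the point is only that nothing in the round trip requires XML.&lt;/p&gt;

```python
import json

# Hypothetical sample text and standoff properties. The field names
# ("name", "offset", "len") are illustrative assumptions, not the HRIT format.
TEXT = "Enter Kent and Gloucester."
PROPERTIES = [
    {"name": "stage", "offset": 0, "len": 26},
    {"name": "persName", "offset": 6, "len": 4},
    {"name": "persName", "offset": 15, "len": 10},
]


def to_json(props):
    """Serialise standoff properties to a JSON string (no XML involved)."""
    return json.dumps({"ranges": props}, indent=2)


def from_json(doc):
    """Read standoff properties back from a JSON string."""
    return json.loads(doc)["ranges"]


def spans(text, props):
    """Pair each property name with the slice of text it covers."""
    return [(p["name"], text[p["offset"]:p["offset"] + p["len"]]) for p in props]


print(spans(TEXT, from_json(to_json(PROPERTIES))))
# [('stage', 'Enter Kent and Gloucester.'), ('persName', 'Kent'), ('persName', 'Gloucester')]
```

&lt;p&gt;YAML would work the same way; only the serialiser changes, while the text and the offsets stay exactly as they are.&lt;/p&gt;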
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-vjbDlVDehHg/TsCJKp3orCI/AAAAAAAAAU4/50zgReqm_rA/s1600/xmlfree.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 81px;" src="http://1.bp.blogspot.com/-vjbDlVDehHg/TsCJKp3orCI/AAAAAAAAAU4/50zgReqm_rA/s400/xmlfree.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5674686346617728034" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7919213413260437469?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7919213413260437469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7919213413260437469' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7919213413260437469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7919213413260437469'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/11/xml-free-digital-editions.html' title='XML-Free Digital Editions'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-vjbDlVDehHg/TsCJKp3orCI/AAAAAAAAAU4/50zgReqm_rA/s72-c/xmlfree.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-8016754057727435614</id><published>2011-09-19T15:14:00.001-07:00</published><updated>2011-09-19T15:27:52.574-07:00</updated><title type='text'>Formatter tool with full overlap</title><content type='html'>&lt;p&gt;More progress, I'm afraid. I've incorporated the test program announced in the last post into the formatter tool. This is intended as a practical replacement for XSLT. So now I can convert real texts plus overlapping standoff properties into valid HTML. If the properties are derived from XML documents there won't be any overlap initially. What formatter does is loosen up that particular restriction. So in the GUI it will be possible to change properties or add new ones that overlap. And it will still format correctly. I'll be putting some test cases onto the &lt;a href="http://dhtestbed.ctsdh.luc.edu/"&gt;testbed at Loyola&lt;/a&gt; soon.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-8016754057727435614?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/8016754057727435614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=8016754057727435614' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8016754057727435614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8016754057727435614'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/09/formatter-tool-with-full-overlap.html' title='Formatter tool with full 
overlap'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2495639169760585080</id><published>2011-09-08T02:00:00.000-07:00</published><updated>2011-09-08T03:25:45.504-07:00</updated><title type='text'>Web pages from overlapping properties</title><content type='html'>&lt;p&gt;I've made some progress in turning random overlapping properties into HTML. I've written a &lt;a href="http://dhtestbed.ctsdh.luc.edu/test.php"&gt;&lt;em&gt;test&lt;/em&gt; program both to demonstrate the principle and to serve as a debugging tool for me&lt;/a&gt;. In the latter role it hasn't reported a single error for two days, so I'm starting to think it is correct. Although it doesn't do anything useful, it shows that neither embedded markup nor tree structures are necessary to mark up a text.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2495639169760585080?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2495639169760585080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2495639169760585080' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2495639169760585080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2495639169760585080'/><link rel='alternate' type='text/html' 
href='http://multiversiondocs.blogspot.com/2011/09/web-pages-from-overlapping-properties.html' title='Web pages from overlapping properties'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2799446391375124402</id><published>2011-07-16T04:18:00.000-07:00</published><updated>2011-07-16T05:09:44.899-07:00</updated><title type='text'>OCR of unevenly lit documents</title><content type='html'>&lt;p&gt;Someone gave me some scans in colour that needed converting via OCR into plain text. I thought I would run them through Tesseract, the main open source OCR tool. The results were dreadful, even when I converted them to greyscale as recommended. My images had three faults: &lt;/p&gt;
&lt;ol&gt;&lt;li&gt;They had a large border showing the book's binding and the surrounding environment.&lt;/li&gt;
&lt;li&gt;They were unevenly lit.&lt;/li&gt;
&lt;li&gt;The text was curved, the result of trying to photograph a bound volume of typewritten pages that could not be fully opened without damage.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It seemed to me that these problems must be similar to those encountered in practically any digitisation project. But there didn't seem to be any good open-source solutions.&lt;/p&gt;
&lt;p&gt;I wanted to fix at least fault 2 to see how Tesseract would fare when the image was, as recommended, in plain black and white. However, after wasting a whole afternoon Googling the problem and trying every conceivable filter in Photoshop and GIMP, I still couldn't reduce the image to clean black and white. The problem was the difference in illumination:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-KGQ1bFI2PiY/TiF216C27_I/AAAAAAAAAS8/7oG4RmZTIhI/s1600/dark-light.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 110px;" src="http://1.bp.blogspot.com/-KGQ1bFI2PiY/TiF216C27_I/AAAAAAAAAS8/7oG4RmZTIhI/s400/dark-light.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5629911677676220402" /&gt;&lt;/a&gt;
&lt;p&gt;On the right is a section from the upper right-hand portion of a page; on the left, one from the bottom left-hand portion. When both are converted to black and white using a single global threshold, one comes out hopelessly too dark and the other too light:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-fySB39nAHCU/TiF4CLOuAPI/AAAAAAAAATU/bzFrEVWVglg/s1600/blackandwhite.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 110px;" src="http://2.bp.blogspot.com/-fySB39nAHCU/TiF4CLOuAPI/AAAAAAAAATU/bzFrEVWVglg/s400/blackandwhite.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5629912987959427314" /&gt;&lt;/a&gt;
&lt;h4&gt;An idea&lt;/h4&gt;
&lt;p&gt;So I downloaded &lt;a href="http://freeimage.sourceforge.net/download.html"&gt;the FreeImage library&lt;/a&gt; and used it to write a simple filter. I first reduced the image to greyscale and cropped it manually, to simulate having already solved problem 1 above. Then I passed a small square of 64x64 pixels over the image and computed the average greyscale value for each square. Every pixel at least 8 below that average (lower values are darker) was turned black, and all the others white. This very simple approach eliminated the lighting differences and produced evenly illuminated, plain black-and-white text:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-d0qjaWXpHCE/TiF4eX1sD-I/AAAAAAAAATc/UjVuSfah__0/s1600/section.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 123px;" src="http://1.bp.blogspot.com/-d0qjaWXpHCE/TiF4eX1sD-I/AAAAAAAAATc/UjVuSfah__0/s400/section.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5629913472380440546" /&gt;&lt;/a&gt;
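&lt;p&gt;The filter just described can be sketched in a few lines of Python. This is a plain-Python illustration rather than the FreeImage code used here, and it assumes the square moves in non-overlapping 64x64 tiles; a window centred on each pixel would be a smoother variant of the same local-threshold idea.&lt;/p&gt;

```python
def tile_threshold(grey, tile=64, margin=8):
    """Binarise a greyscale image tile by tile.

    grey: list of rows of ints, 0 (black) to 255 (white).
    Each tile x tile block is compared with its own mean, so a dark corner
    and a bright corner are judged against different local thresholds.
    Returns rows of 0 (black) and 255 (white).
    """
    height, width = len(grey), len(grey[0])
    out = [[255] * width for _ in range(height)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            mean = sum(grey[y][x] for y in rows for x in cols) / (len(rows) * len(cols))
            for y in rows:
                for x in cols:
                    # At least `margin` darker than the local mean: treat as ink.
                    if mean - grey[y][x] >= margin:
                        out[y][x] = 0
    return out


# A tiny 2x4 "page": the left half is dimly lit, the right half brightly lit.
page = [
    [60, 20, 200, 150],
    [60, 60, 200, 200],
]
print(tile_threshold(page, tile=2))  # [[255, 0, 255, 0], [255, 255, 255, 255]]
```

&lt;p&gt;With a single global threshold at the overall mean (about 119 here), the dimly lit background pixels (60) would turn black along with the ink; judged against their own tile, they stay white.&lt;/p&gt;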
&lt;h4&gt;Curvature&lt;/h4&gt;
&lt;p&gt;Unfortunately, Tesseract still doesn't like the strong curvature. It seems to split lines along strict horizontals: it mixed up text from adjacent lines that curved into each other's path. The next stage will be to 'uncurve' the text automatically.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2799446391375124402?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2799446391375124402/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2799446391375124402' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2799446391375124402'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2799446391375124402'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/07/ocr-of-unevenly-lit-documents.html' title='OCR of unevenly lit documents'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-KGQ1bFI2PiY/TiF216C27_I/AAAAAAAAAS8/7oG4RmZTIhI/s72-c/dark-light.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2150716057602314341</id><published>2011-06-11T03:44:00.000-07:00</published><updated>2011-06-21T12:44:22.200-07:00</updated><title type='text'>From arbitrary overlap to HTML</title><content 
type='html'>&lt;p&gt;If we try to represent original documents not authored in the digital medium, we soon discover that the pen or printed type used to create them was not constrained, as modern embedded markup languages are, to represent only tree structures. It would thus be very liberating to encode such documents for digital presentation on the Web using arbitrary overlapping external properties instead of an embedded hierarchy of tags. This would provide a number of distinct advantages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Properties could represent the source texts more accurately.&lt;/li&gt;
&lt;li&gt;Different sets of properties could be combined in the same document. &lt;/li&gt;
&lt;li&gt;With appropriate software, it will be easier to edit separate text and markup files than complex embedded markup.&lt;/li&gt;
&lt;li&gt;Texts and markup, as separate building blocks, could be exchanged and reused for other applications.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These are winning arguments, at least for digital humanists, and perhaps for anyone else who uses embedded markup.&lt;/p&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;p&gt;To see how this can be done, let's specify some fictitious properties that apply to random ranges of a short text:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0,12,'banana'
3,7,'pear'
12,9,'refrigerator'
13,4,'orange'
18,12,'pineapple'
22,34,'guava'
35,12,'grape'
48,9,'penguin'
52,17,'dog'&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This means that the text, which must be at least 69 bytes long (the last property, 'dog', starts at offset 52 and is 17 bytes long), is 'marked up' by a series of arbitrary properties. The first number in each line gives the offset in the text where the property starts, the second its length, and the quoted string its name. Of course, in a real-world application the names would more likely be 'p' or 'span' or 'table' etc.&lt;/p&gt;
&lt;h4&gt;Reduction to intervals&lt;/h4&gt;
&lt;p&gt;But how can we turn this apparent chaos into syntactically correct HTML? The approach taken here is to break up the properties into a series of 'intervals', within each of which the same set of properties is active throughout. For example, from offset 52 to offset 55 inclusive, the properties 'dog', 'guava' and 'penguin' are all active.&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-LXItR2lDZVo/TfQT-j9JxVI/AAAAAAAAASk/i3UpjCCBGV8/s1600/intervals.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 99px;" src="http://2.bp.blogspot.com/-LXItR2lDZVo/TfQT-j9JxVI/AAAAAAAAASk/i3UpjCCBGV8/s400/intervals.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5617136600762402130" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;&lt;i&gt;Intervals defined by overlapping properties&lt;/i&gt;&lt;/p&gt;
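&lt;p&gt;The reduction to intervals can be sketched in Python using the fictitious properties listed above. This is a deliberately naive version written for clarity, not the code of the test program: it cuts the text at every start and end offset and records which properties cover each piece. For this data it yields 16 intervals, one of which runs from offset 52 up to (but not including) 56 with 'dog', 'guava' and 'penguin' active.&lt;/p&gt;

```python
# The fictitious properties from the example: (offset, length, name).
PROPS = [
    (0, 12, "banana"), (3, 7, "pear"), (12, 9, "refrigerator"),
    (13, 4, "orange"), (18, 12, "pineapple"), (22, 34, "guava"),
    (35, 12, "grape"), (48, 9, "penguin"), (52, 17, "dog"),
]


def intervals(props):
    """Split overlapping ranges into intervals with a constant property set.

    Returns (start, end, names) triples in text order; end is exclusive.
    """
    # Every start and every end offset is a potential interval boundary.
    cuts = sorted({pos for off, length, _ in props for pos in (off, off + length)})
    result = []
    for start, end in zip(cuts, cuts[1:]):
        # A property is active if it covers the whole of [start, end).
        active = frozenset(name for off, length, name in props
                           if start >= off and off + length >= end)
        if active:  # skip stretches of text with no properties at all
            result.append((start, end, active))
    return result


for start, end, names in intervals(PROPS):
    print(start, end, sorted(names))
```

&lt;p&gt;Checking every property against every boundary is quadratic; a real implementation would more likely sweep through sorted start and end events, but the resulting intervals are the same.&lt;/p&gt;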
&lt;p&gt;Although dividing the properties into intervals removes the overlap, it also creates too many short sequences for efficient HTML. So the next step is to work out where we might be able to join them up. To do that we need to know which tags may appear inside which other tags: in other words, a kind of basic schema.&lt;/p&gt;
&lt;h4&gt;On-the-fly deduced schemas&lt;/h4&gt;
&lt;p&gt;Fortunately we already have the HTML schema. Since we will be using CSS (cascading style sheets) to format the text, we can use CSS rules to tell us which HTML elements will represent our properties, and then work backwards to figure out how they will nest.&lt;/p&gt;
&lt;p&gt;The size of the problem can be reduced by reflecting that not all of our overlapping properties will be rendered in HTML. Some serve other purposes: they may provide information used by software, may not be needed in the current view, or may be intended for future use. So it is safe to ignore any properties that aren't mentioned in the CSS file. However, because HTML syntax is fairly loose, this only gives us part of the answer.&lt;/p&gt;
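&lt;p&gt;The filtering step might look like this in Python. How properties are actually tied to CSS rules is not specified here, so the sketch assumes that a property is rendered when its name appears as a class selector in the style sheet; the regular expressions are a rough approximation, not a full CSS parser.&lt;/p&gt;

```python
import re


def css_names(css):
    """Collect the class names used in the selectors of a style sheet.

    Assumption for illustration: a property called 'speech' is rendered by a
    rule whose selector mentions the class 'speech'.
    """
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # drop comments
    selectors = re.sub(r"\{[^}]*\}", " ", css)         # drop declaration blocks
    return set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors))


def renderable(props, css):
    """Keep only the (offset, length, name) properties the CSS mentions."""
    names = css_names(css)
    return [p for p in props if p[2] in names]


STYLE = """
/* styles for a play */
.speech { margin-left: 2em; }
div.line { font-size: 1.2em; }
"""
props = [(0, 100, "speech"), (0, 40, "line"), (0, 10, "note")]
print(renderable(props, STYLE))  # [(0, 100, 'speech'), (0, 40, 'line')]
```

&lt;p&gt;The 'note' property is simply left out of the HTML view; it remains in the stored markup for other uses.&lt;/p&gt;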
&lt;p&gt;The missing information can be deduced from the properties themselves. If we are recording a play, for example, properties like 'line' almost always nest inside 'speech'. In those few cases where they don't, we can split the 'line' property so that it always nests within the dominant tag 'speech'.&lt;/p&gt;
&lt;p&gt;Merging this statistical information about property nesting with that derived from the HTML schema allows us to reconstruct almost all of the structure needed to render the document correctly. Since this relies on statistics it is not guaranteed to work in every case, but even when it fails we will still have valid HTML. The worst that can happen is that the formatting won't look right.&lt;/p&gt;
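&lt;p&gt;The statistical part could be approached along these lines. This Python sketch only illustrates the idea and is not the method used in the formatter: for each pair of property names it counts how often one lies wholly inside the other when the two intersect, and treats a high enough ratio as evidence of nesting. The 0.8 threshold and the sample data are assumptions.&lt;/p&gt;

```python
from collections import Counter
from itertools import product


def nesting_stats(props, threshold=0.8):
    """Guess from the data which property names nest inside which.

    props: (offset, length, name) triples. For each pair of names (a, b),
    count how often an 'a' range that intersects a 'b' range lies wholly
    inside it. If that happens in at least `threshold` of the cases, record
    (a, b): a may nest inside b. The 0.8 threshold is an arbitrary assumption.
    """
    inside, touching = Counter(), Counter()
    for (ao, al, a), (bo, bl, b) in product(props, repeat=2):
        if a == b:
            continue
        a_end, b_end = ao + al, bo + bl
        if a_end > bo and b_end > ao:            # the two ranges intersect
            touching[a, b] += 1
            if ao >= bo and b_end >= a_end:      # a lies wholly inside b
                inside[a, b] += 1
    return {pair for pair, n in touching.items() if inside[pair] / n >= threshold}


# Two speeches; five lines nest properly, one straddles a speech boundary.
PLAY = [
    (0, 60, "speech"), (60, 60, "speech"),
    (0, 20, "line"), (20, 20, "line"), (40, 20, "line"),
    (60, 20, "line"), (80, 20, "line"), (100, 30, "line"),
]
print(nesting_stats(PLAY))  # {('line', 'speech')}
```

&lt;p&gt;Five of the six line/speech intersections are proper containment, so 'line' is taken to nest inside 'speech'. The one straddling line (offsets 100 to 129) would then be split at offset 120, where its speech ends, as described above.&lt;/p&gt;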
&lt;h4&gt;Using the deduced hierarchy information&lt;/h4&gt;
&lt;p&gt;One simple way to express a hierarchy is to record which elements may appear inside which other elements. If you have 10 elements that need rendering, you must compute a 10x10 matrix. Ten is probably a realistic number in practice, but even with all 107 HTML5 tags a matrix of 107x107 = 11,449 ints (about 45K at 4 bytes each) would suffice.&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-O67roV_TGkc/TfQZeRf5WXI/AAAAAAAAASs/pK1zxaojaRU/s1600/matrix.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 194px; height: 200px;" src="http://1.bp.blogspot.com/-O67roV_TGkc/TfQZeRf5WXI/AAAAAAAAASs/pK1zxaojaRU/s400/matrix.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5617142643121805682" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;&lt;i&gt;Properties (left) that may appear within other properties (top)&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;In my test program I just specified the nesting matrix manually. Following the natural analogy, a 'guava' may appear inside a 'penguin', 'dog' or 'refrigerator', but an 'orange' cannot be inside a 'pineapple'. In the finished program, of course, such a matrix would be computed as described above.&lt;/p&gt;
&lt;p&gt;From this hierarchy information we can easily work out when to start and stop tags. For each interval, visited in order, we separate the ranges into three sets:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;&lt;b&gt;closing:&lt;/b&gt; properties that are present in the previous interval but missing in the current one&lt;/li&gt;
&lt;li&gt;&lt;b&gt;opening:&lt;/b&gt; properties that are present in the current interval but were absent in the previous one&lt;/li&gt;
&lt;li&gt;&lt;b&gt;continuing:&lt;/b&gt; properties that are present in both the preceding and current intervals&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;After this initial classification we use the nesting matrix to correct any anomalies. Any property in the 'continuing' set that is contained by one from the 'closing' or 'opening' sets must be moved to the 'closing' set and also added to the 'opening' set. This is because, in order to preserve well-formedness, the closure or opening of the parent will force the closing and re-opening of a continuing child.&lt;/p&gt;
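&lt;p&gt;The classification and the correction step can be sketched in Python. The &lt;code&gt;CONTAINS&lt;/code&gt; pairs are a hypothetical fragment of a nesting matrix: the 'guava' pairs come from the text, and the 'dog'/'penguin' pair is inferred from the test output. Repeating the correction until nothing changes, so that children of a moved property are moved too, is an assumption of this sketch. At offset 52 it closes 'guava' and 'penguin' and reopens them together with 'dog', which agrees with the test output shown further on.&lt;/p&gt;

```python
# Hypothetical fragment of the nesting matrix as (parent, child) pairs:
# child may appear inside parent.
CONTAINS = {
    ("penguin", "guava"), ("dog", "guava"), ("refrigerator", "guava"),
    ("dog", "penguin"),
}


def classify(prev, curr, contains):
    """Split the properties of two adjacent intervals into three sets.

    Returns (closing, opening, continuing). A continuing property that may
    sit inside a closing or opening one is closed and reopened as well,
    repeating until no more changes occur so that its own children follow.
    """
    closing = set(prev - curr)
    opening = set(curr - prev)
    continuing = set(prev.intersection(curr))
    changed = True
    while changed:
        changed = False
        for child in sorted(continuing):
            parents = closing.union(opening)
            if any((parent, child) in contains for parent in parents):
                continuing.discard(child)
                closing.add(child)
                opening.add(child)
                changed = True
    return closing, opening, continuing


# Offset 52: 'dog' opens while 'guava' and 'penguin' continue.
print(classify({"guava", "penguin"}, {"guava", "penguin", "dog"}, CONTAINS))
```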
&lt;h4&gt;Remaining problems&lt;/h4&gt;
&lt;p&gt;Even after these measures, two problems remain: &lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Since we allowed arbitrary overlap there is nothing to prevent two incompatible properties such as 'guava' and 'grape' being defined for the same interval. Neither may contain the other, so if they occur together one must be dropped.&lt;/li&gt;
&lt;li&gt;Tags must be written out in the correct order: highest level containers first and the lowest level tags last. This can be achieved by sorting the ranges within each interval by their descending position in the hierarchy. We also use a stack to ensure that closing tags match and come out in the correct order.&lt;/li&gt;&lt;/ol&gt;
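&lt;p&gt;Both fixes can be sketched in Python. Which of two incompatible properties gets dropped is not stated, so this sketch assumes the one that started later goes; that rule is consistent with the test output below ('pear' loses to 'banana', 'grape' and part of 'guava' to earlier properties). Ordering uses a simple depth estimate: the fewer kept properties that may contain a name, the higher it sits. The nesting pairs are hypothetical.&lt;/p&gt;

```python
# Hypothetical (parent, child) nesting pairs consistent with the text.
CONTAINS = {
    ("penguin", "guava"), ("dog", "guava"), ("refrigerator", "guava"),
    ("dog", "penguin"),
}


def resolve(active, started, contains):
    """Drop incompatible properties in one interval, then order the rest.

    active: names present in the interval; started: name -> start offset.
    Assumed rule: when neither of two properties may contain the other,
    keep the one that started earlier. The survivors are returned
    outermost first, i.e. in the order their start tags must be written.
    """
    kept = []
    for name in sorted(active, key=lambda n: (started[n], n)):
        if all((name, k) in contains or (k, name) in contains for k in kept):
            kept.append(name)
    # A name that fewer kept names may contain sits higher in the hierarchy.
    return sorted(kept, key=lambda n: sum((k, n) in contains for k in kept))


STARTS = {"pineapple": 18, "guava": 22, "penguin": 48, "dog": 52}
print(resolve({"pineapple", "guava"}, STARTS, CONTAINS))       # ['pineapple']
print(resolve({"guava", "penguin", "dog"}, STARTS, CONTAINS))  # ['dog', 'penguin', 'guava']
```

&lt;p&gt;Closing tags would then be written by popping a stack of open tags, as described above; that part is omitted here.&lt;/p&gt;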
&lt;h4&gt;The result&lt;/h4&gt;
&lt;p&gt;Once these adjustments have been made, the intervals can be printed out one at a time, the closing tags followed by the opening ones. Here is the output of the test program, with dots representing the text for clarity:&lt;/p&gt;
&lt;p&gt;&amp;lt;banana&amp;gt;............&amp;lt;/banana&amp;gt;&amp;lt;refrigerator&amp;gt;.&amp;lt;orange&amp;gt;....&amp;lt;/orange&amp;gt;.&amp;lt;pineapple&amp;gt;...&amp;lt;/pineapple&amp;gt;&amp;lt;/refrigerator&amp;gt;&amp;lt;pineapple&amp;gt;.........&amp;lt;/pineapple&amp;gt;&amp;lt;guava&amp;gt;..................&amp;lt;/guava&amp;gt;&amp;lt;penguin&amp;gt;&amp;lt;guava&amp;gt;....&amp;lt;/guava&amp;gt;&amp;lt;/penguin&amp;gt;&amp;lt;dog&amp;gt;&amp;lt;penguin&amp;gt;&amp;lt;guava&amp;gt;....&amp;lt;/guava&amp;gt;.&amp;lt;/penguin&amp;gt;............&amp;lt;/dog&amp;gt;&lt;/p&gt;
&lt;p&gt;In this crazy random example the conflicting properties 'pear' and 'grape' had to be dropped, as did the part of 'guava' that overlaps 'pineapple': there was no way to render them given the containment rules. But the result is still well-formed XML, and it would be HTML if we had used a CSS file to transform the properties.&lt;/p&gt;
&lt;h4&gt;Where to go from here&lt;/h4&gt;
&lt;p&gt;This test solution needs to be incorporated into the formatter tool and the whole thing converted into a PHP extension, so that it can be used as a direct replacement for XSLT.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2150716057602314341?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2150716057602314341/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2150716057602314341' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2150716057602314341'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2150716057602314341'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/06/from-freely-overlapping-properties-to.html' title='From arbitrary overlap to HTML'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-LXItR2lDZVo/TfQT-j9JxVI/AAAAAAAAASk/i3UpjCCBGV8/s72-c/intervals.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5805966041291591830</id><published>2011-04-16T14:35:00.000-07:00</published><updated>2011-05-21T12:50:30.791-07:00</updated><title type='text'>From TEI to HRIT and back again</title><content type='html'>&lt;p&gt;Since we are designing a software suite to more or less replace 
embedded markup, there has to be some way to import legacy texts. At first I thought the problem was insurmountable. Even if the original encoders had stuck to recommended guidelines such as the &lt;a href="http://tei-c.org"&gt;TEI (Text Encoding Initiative)&lt;/a&gt;, they would have been forced to &lt;a href="http://www.tei-c.org/Guidelines/Customization/use_roma.xml"&gt;customise their encoding in two ways&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;By adding custom tags and attributes, and&lt;/li&gt;&lt;li&gt;By making a selection of tags from the large number of available ones&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;In the second case it is clear that any general solution that embraced an arbitrary subset of TEI would have to support &lt;em&gt;all&lt;/em&gt; of it. Since there are currently 519 tags in the scheme, and (probably) thousands of attributes, that is a daunting prospect for any programmer. And we are talking about &lt;em&gt;meaningful&lt;/em&gt; conversion into an entirely different software system, not a simple one-to-one mapping. As for point 1, any customised tags would either have to be left out or have their function specified by the user.&lt;/p&gt;
&lt;h4&gt;Solving the problem&lt;/h4&gt;
&lt;p&gt;When forced to perform the task, however, I soon realised that any customised tags must already have been specified by a user who understood XML. So that same user could supply a customised conversion table in XML saying what should be done with them. If they didn't follow the Guidelines they will have to do a little extra work, but they're not shut out.&lt;/p&gt;
&lt;p&gt;As for the second case, only a small subset of TEI is regularly used by digital humanists. For the purposes of defining versions, for example, only a small number of tags come into play, and even customised ones would have to follow one of a couple of basic patterns, which could be programmed in as general functions. The customisations could be handled by a 'recipe', or set of instructions on how to convert the files. A default recipe would be provided for standard files, which the user could extend or change at will.&lt;/p&gt;
&lt;h4&gt;Why do this at all?&lt;/h4&gt;
&lt;p&gt;Because HRIT format is much more powerful than TEI:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;It allows arbitrary overlap of properties.&lt;/li&gt;
&lt;li&gt;It does not mandate any standard tag names&lt;/li&gt;
&lt;li&gt;It supports versions natively including transpositions&lt;/li&gt;
&lt;li&gt;It allows &lt;em&gt;mixing&lt;/em&gt; and matching of markup sets in the one text&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;That's more than enough reason to move from TEI to HRIT. Another way of looking at it is that, rather than replacing TEI, HRIT seeks to enhance it and to use it as an interchange format between HRIT and non-HRIT users. It depends on what kind of 'spin' you prefer.&lt;/p&gt;
&lt;h4&gt;Two-way conversion&lt;/h4&gt;
&lt;p&gt;Any conversion applied to legacy files (or, if you prefer, current files) would have to be reversible. Those who had imported their files into HRIT and later changed their minds would feel 'locked in' if they couldn't back out, and those who hadn't yet made the switch would be frightened off by that very prospect. So the overall process looks like this (red arrows indicate paths that are not yet available, green arrows those that are):&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-BIS6UmkHsIU/TaplcffRliI/AAAAAAAAARA/FF2mbNKekJM/s1600/tei-corcode.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 88px;" src="http://3.bp.blogspot.com/-BIS6UmkHsIU/TaplcffRliI/AAAAAAAAARA/FF2mbNKekJM/s400/tei-corcode.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5596397027124680226" /&gt;&lt;/a&gt;
&lt;p&gt;'TEI' refers to &lt;em&gt;any&lt;/em&gt; TEI-encoded file. The two-way process works like this:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;b&gt;Splitter&lt;/b&gt; splits the TEI file into N versions. By default it splits &amp;lt;app&amp;gt;&amp;lt;rdg&amp;gt;...&amp;lt;/rdg&amp;gt;&amp;lt;/app&amp;gt; structures as well as nested &amp;lt;del&amp;gt; and &amp;lt;add&amp;gt; and &amp;lt;choice&amp;gt; structures into versions. Unsplitter, not yet written, will take the versions (possibly modified) and try to put them back into one file, although this may be difficult. The &lt;em&gt;recipe&lt;/em&gt; file is used by splitter to direct the splitting. It can be customised by the user to control which elements are split and how.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Stripper&lt;/b&gt; removes the remaining markup from each separate version in TEI format. A different &lt;em&gt;recipe&lt;/em&gt; file specifies simplifications of elements intended to be rendered as formats in the final HTML. One simplification might be the reduction of &amp;lt;hi rend="italic"&amp;gt; to the property 'italics'. The output of stripper is the &lt;a href="http://digitalvariants.blogspot.com/2011/04/hrit-standoff-format.html"&gt;HRIT standoff XML format&lt;/a&gt; (although stripper is written so that other output formats can be added if required). It expresses every TEI element as a potentially overlapping property with optional 'annotations' derived from the element's attributes. These annotations are ignored by the formatter but are not lost. Elements like the TEI header, which contain metadata about the text, are entirely hidden but also not lost, so that the stripping process can later be reversed. Each version produces a pair of markup and plain-text files, which are merged separately into a single CorTex and a single CorCode file. It is &lt;em&gt;these&lt;/em&gt; files that are edited and read by the HRIT system.&lt;/li&gt;&lt;li&gt;&lt;a name="formatter"&gt;&lt;b&gt;Formatter&lt;/b&gt;&lt;/a&gt; takes the properties of the CorCode and combines them with the information from the CSS file into HTML. The CSS is used not only to change the appearance of the text on a web page but also to transform the markup. For example, the CSS rule &lt;code&gt;span.italics&lt;/code&gt; can be used to change the appearance of italics, but also to convert properties called 'italics' into spans of class 'italics'. In this way we can avoid using XSLT. But what about the 'annotations' that were originally attributes in the TEI-XML? They are simply ignored (although not lost). 
If you want to convert an element plus some attribute(s) into an HTML element using formatter, you must first specify a rule in stripper's recipe file to simplify them to a plain property.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5805966041291591830?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5805966041291591830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5805966041291591830' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5805966041291591830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5805966041291591830'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/04/from-tei-to-hrit-and-back-again.html' title='From TEI to HRIT and back again'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-BIS6UmkHsIU/TaplcffRliI/AAAAAAAAARA/FF2mbNKekJM/s72-c/tei-corcode.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2217235088008911219</id><published>2011-02-24T02:10:00.000-08:00</published><updated>2011-02-26T14:29:08.596-08:00</updated><title type='text'>Multi-lingual MVDs</title><content type='html'>&lt;p&gt;There are plenty of cases where the concept of 'work' spans 
more than one basic version in one language. Just think of the multi-lingual laws of the EU, the Romulo of Virgilio Malvezzi translated into several languages, each having its own textual history, or the Chronicle of Eusebius, in Latin, Greek and Armenian. The question is, how can you align the same text written in a different language? Can one align Latin and Greek, or French and German? In my opinion, no, or at least not automatically. Quite apart from the language dissimilarity, translations often have quite different structures, making alignment particularly difficult. But a tiny change to the definition of an MVD makes it possible to align such texts manually and to use the MVD format as a storage facility.&lt;/p&gt;
&lt;h3&gt;Tweaking the groups&lt;/h3&gt;
&lt;p&gt;MVDs have always had a simple grouping mechanism. You can group versions by type. For example, versions of a particular recension, or internal versions (corrections or revisions of a single manuscript), can be grouped together to keep them separate from versions in other, physically different documents. Now if we give each group a simple attribute called 'merge', set to 'true' or 'false', we can control how an MVD is built up. For example, imagine we have French, German and Italian translations of some work, each in several versions. We could group all the Italian versions together, and likewise the German and French ones, setting each group's 'merge' attribute to 'true'. But all three groups would belong to a higher group whose 'merge' attribute was 'false'. So the merging program, on being given version 23 (French) to add to the MVD, would know &lt;em&gt;not&lt;/em&gt; to merge it with version 16 (German), because their shared parent group is not merged. Here's how it would look schematically inside the resulting MVD:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-9K434vRGQlQ/TWbclL1o7DI/AAAAAAAAAPA/HtWeEJq2bng/s1600/multilingualmvd.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 116px;" src="http://1.bp.blogspot.com/-9K434vRGQlQ/TWbclL1o7DI/AAAAAAAAAPA/HtWeEJq2bng/s400/multilingualmvd.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5577387719935978546" /&gt;&lt;/a&gt;
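&lt;p&gt;The merge decision can be sketched in Python as a walk up the group tree: find the closest group the two versions share and consult its 'merge' flag. The &lt;code&gt;Group&lt;/code&gt; class and the group names are illustrative assumptions, not the real MVD data model; the version numbers are the ones used above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;class Group:
    """A hypothetical version group with a 'merge' flag and an optional parent."""

    def __init__(self, name, merge, parent=None):
        self.name = name
        self.merge = merge
        self.parent = parent

    def ancestors(self):
        """Return this group followed by every enclosing group, innermost first."""
        chain, group = [], self
        while group is not None:
            chain.append(group)
            group = group.parent
        return chain


def should_merge(group_a, group_b):
    """Merge two versions only if their closest shared group allows merging."""
    shared = group_b.ancestors()
    for group in group_a.ancestors():
        if group in shared:
            return group.merge
    return False  # no shared group at all: keep the versions apart


translations = Group('translations', merge=False)
french = Group('French', merge=True, parent=translations)
german = Group('German', merge=True, parent=translations)

# Hypothetical mapping from version numbers to groups.
version_group = {23: french, 16: german, 24: french}
print(should_merge(version_group[23], version_group[16]))  # False
print(should_merge(version_group[23], version_group[24]))  # True
&lt;/code&gt;&lt;/pre&gt;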
&lt;p&gt;This might also be a good strategy whenever the same 'work' is substantially rewritten, like the Morte d'Arthur and other medieval tales. Versions of each rewrite would get their own group and we wouldn't attempt to align them automatically because it just gets too messy.&lt;/p&gt;
&lt;h3&gt;Linking the translations&lt;/h3&gt;
&lt;p&gt;Now we can extend the standoff markup mechanism described in the previous post to link the texts of the &lt;em&gt;different&lt;/em&gt; languages manually. We add a view that displays two versions of an MVD side by side:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-xqrv3DNyJow/TWl9THNqHgI/AAAAAAAAAPI/E6kvS9GNe5w/s1600/twi1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 267px;" src="http://2.bp.blogspot.com/-xqrv3DNyJow/TWl9THNqHgI/AAAAAAAAAPI/E6kvS9GNe5w/s400/twi1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5578127380782390786" /&gt;&lt;/a&gt;
&lt;p&gt;Selecting some text on the right or left highlights it independently (this can be done in JavaScript). Now select something in the opposite version and press the 'link' button. This creates an annotated property that specifies a link between the two selected ranges and records it via standoff markup. The view could then give the user graphical feedback by formatting the two selected blocks rigidly side by side:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-p4j46k7_0is/TWl-qOHv0mI/AAAAAAAAAPQ/NsNU9QMqE8I/s1600/twi2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 267px;" src="http://3.bp.blogspot.com/-p4j46k7_0is/TWl-qOHv0mI/AAAAAAAAAPQ/NsNU9QMqE8I/s400/twi2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5578128877285266018" /&gt;&lt;/a&gt;
&lt;p&gt;They could also scroll together in sync, as they currently do in compare view. If blocks are transposed between languages (as often happens) the text might jump around a bit as you scroll, but so long as we align on the most central block it should work OK. Also, the alignment would hold for all the aligned versions on either side, not merely for the ones currently selected. If you had 12 German versions and 16 French ones, they would all be aligned at the same point of their shared text. You could even display an apparatus at the bottom of each side so the user could see the variants of the versions in each language.&lt;/p&gt;
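&lt;p&gt;One way to sketch this in Python: store each link as a pair of ranges, one on each side, and to synchronise scrolling find the link whose left-hand block contains the centre of the visible text, then map that point proportionally into the linked right-hand block. The &lt;code&gt;Link&lt;/code&gt; class, the (offset, length) ranges and the proportional mapping are assumptions for illustration; the sketch ignores how links would actually be stored as standoff properties.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Link:
    """A hypothetical manual alignment between a block on each side."""
    left_offset: int
    left_length: int
    right_offset: int
    right_length: int


def sync_offset(links, left_top, left_bottom):
    """Return where the right-hand view should scroll, or None if nothing is linked.

    Picks the link whose left block contains the centre of the visible range
    and maps that centre proportionally into the linked right-hand block.
    """
    centre = (left_top + left_bottom) // 2
    for link in links:
        inside = centre - link.left_offset
        if inside >= 0 and link.left_length > inside:
            fraction = inside / link.left_length
            return link.right_offset + round(fraction * link.right_length)
    return None


# The second block is transposed: it sits much further on in the right-hand text.
links = [Link(0, 100, 0, 120), Link(100, 50, 300, 40)]
print(sync_offset(links, 40, 80))    # 72
print(sync_offset(links, 110, 140))  # 320
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each link carries its own right-hand offset, transposed blocks are handled naturally; the view simply jumps when the centre passes from one link to the next.&lt;/p&gt;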
&lt;h3&gt;How much work is that?&lt;/h3&gt;
&lt;p&gt;Although a special view would have to be designed, there is not much else needed to make it work. It might even be a good idea to add such a view to the MVD-GUI suite and see what people can do with it &amp;ndash; but only once the standoff mechanism is up and running, because this solution depends on it.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2217235088008911219?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2217235088008911219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2217235088008911219' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2217235088008911219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2217235088008911219'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/02/multi-lingual-mvds.html' title='Multi-lingual MVDs'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-9K434vRGQlQ/TWbclL1o7DI/AAAAAAAAAPA/HtWeEJq2bng/s72-c/multilingualmvd.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2007787430317604326</id><published>2011-02-13T11:54:00.001-08:00</published><updated>2011-05-16T13:24:40.217-07:00</updated><title type='text'>Standoff Properties 
explained</title><content type='html'>&lt;p&gt;I've been asked for a more detailed explanation of how CorCode works as a set of standoff properties. I'll try but it won't be all that brief.&lt;/p&gt;
&lt;h3&gt;Embedded markup&lt;/h3&gt;
&lt;p&gt;Since at least the 1980s humanities texts have been described using embedded markup codes, but this leads to several problems:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;The structure imposed on the text is a tree and maybe the structure we want to describe is not.&lt;/li&gt;
&lt;li&gt;The embedded codes need to be standardised because otherwise we can't share texts or create shared software. But there are so many codes we need to define that the standard soon becomes unwieldy.&lt;/li&gt;
&lt;li&gt;Embedded markup lacks flexibility. We can't easily exchange one set of markup for another, or merge two sets.&lt;/li&gt;
&lt;li&gt;Users who edit the texts have to read them through the smokescreen of the tags and their attributes. And they have to learn a complex system that is becoming ever more complex.&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Standoff markup is a partial solution to these problems. Removing tags from the text clarifies it for the reader, and allows the exchange of one set of tags for another. But with standoff markup we still can't combine two tag sets or define non-tree structures. And because the standoff codes depend on the inviolability of the text, we can't edit it.&lt;/p&gt;
&lt;p&gt;What I was trying to explain in the previous post is that we can in fact overcome all of these problems by defining markup as a simple set of overlapping named properties. I'm not the first to suggest this by any means: in fact it resembles to varying degrees George's valency idea, LMNL, Thaller's extended string model, eComma, LORE and other annotation systems, and even TexMECS to some extent. But I'd like to describe my implementation because I think it offers some advantages over previous attempts.&lt;/p&gt;
&lt;h3&gt;Properties&lt;/h3&gt;
&lt;p&gt;Properties have a name, an offset and a length, which describe a range in the text. That's one string and two numbers. An example of a property is 'italics' at offset 23 with length 5. Let's just consider the offsets first. &lt;/p&gt;
&lt;h3&gt;Absolute versus relative offsets&lt;/h3&gt;
&lt;p&gt;With absolute offsets (as used in &lt;a href="http://hass.unsw.adfa.edu.au/ASEC/JITM/publications.html"&gt;JITM&lt;/a&gt; and every other standoff system I know) the offsets increase for properties as we move through the text. So if we had properties at offsets 2, 10, 23, 45, 106, 230, 1022, 1100, 1495, 1567 and we added 121 characters at the start of the text, the first offset would have to change to 123, AND we would have to add 121 to all subsequent ones: 123, 131, 144, 166, 227, 351, 1143, 1221, 1616, 1688.&lt;/p&gt;
&lt;p&gt;With relative offsets we only record the differences between an offset and the previous one. So the &lt;em&gt;same&lt;/em&gt; sequence would read: 2, 8, 13, 22, 61, 124, 792, 78, 395, 72. (That's obtained by subtracting 2 from 10, then 10 from 23, then 23 from 45 etc.) Now when we add 121 characters at the start, the sequence changes to 123, 8, 13, 22, 61, 124, 792, 78, 395, 72. Only the first one needs to change because the &lt;em&gt;relative&lt;/em&gt; distances between the other properties haven't altered.&lt;/p&gt;
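&lt;p&gt;The arithmetic is easy to check with a few lines of Python. The function names are my own; the numbers are the ones used above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from itertools import accumulate


def to_relative(absolute):
    """Store each offset as its distance from the previous one (illustrative name)."""
    return [offset - previous for previous, offset in zip([0] + absolute, absolute)]


def to_absolute(relative):
    """Recover absolute offsets by summing the relative distances."""
    return list(accumulate(relative))


absolute = [2, 10, 23, 45, 106, 230, 1022, 1100, 1495, 1567]
relative = to_relative(absolute)
print(relative)  # [2, 8, 13, 22, 61, 124, 792, 78, 395, 72]

# Insert 121 characters at the start of the text:
shifted_absolute = [offset + 121 for offset in absolute]  # every value changes
shifted_relative = [relative[0] + 121] + relative[1:]     # only the first changes
print(shifted_relative)  # [123, 8, 13, 22, 61, 124, 792, 78, 395, 72]
print(to_absolute(shifted_relative) == shifted_absolute)  # True
&lt;/code&gt;&lt;/pre&gt;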
&lt;h3&gt;Property lengths&lt;/h3&gt;
&lt;p&gt;If, instead of just inserting text outside a property, we altered the length of the property itself, say by extending a paragraph labelled with a 'p' (if we use TEI), then with relative offsets only the length of that property and the offset of the following one would change. Let's say that the text covered by property 3 was 5 characters long and we extended it to 12. Then we'd change the &lt;em&gt;length&lt;/em&gt; of property 3 from 5 to 12 and the &lt;em&gt;offset&lt;/em&gt; of property 4 from 22 to 29 by adding 7 (i.e. 12-5). The properties preceding property 3 &lt;em&gt;AND&lt;/em&gt; following property 4 would not change.&lt;/p&gt;
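&lt;p&gt;Here is the same rule as a small Python function, assuming each property is a (name, relative offset, length) triple. The property names and lengths in the example are invented. As in the worked example, the function assumes that the following property starts after the extended one ends and that nothing else overlaps the added text.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def extend_property(props, index, extra):
    """Hypothetical helper: lengthen props[index] by text added at its end.

    props is a list of (name, relative_offset, length) triples. Only the
    extended property and the one after it change; the input is not modified.
    """
    props = list(props)
    name, offset, length = props[index]
    props[index] = (name, offset, length + extra)
    if len(props) > index + 1:
        next_name, next_offset, next_length = props[index + 1]
        props[index + 1] = (next_name, next_offset + extra, next_length)
    return props


# Relative offsets from the example; names and other lengths are made up.
props = [('p', 2, 3), ('italics', 8, 4), ('stage', 13, 5), ('speaker', 22, 6), ('line', 61, 9)]
print(extend_property(props, 2, 7))
# [('p', 2, 3), ('italics', 8, 4), ('stage', 13, 12), ('speaker', 29, 6), ('line', 61, 9)]
&lt;/code&gt;&lt;/pre&gt;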
&lt;h3&gt;Property names&lt;/h3&gt;
&lt;p&gt;Now let's consider the names of the properties. We can make them multi-lingual and go beyond what TEI can do. Many Europeans see TEI as based on English texts (e.g. see Domenico's objections in &lt;em&gt;Scrittura e filologia nell'era digitale&lt;/em&gt;, p. 170). Why should we not call 'italics' 'kursiv' if we are German? The entire standard encoding scheme is based on English words and concepts. Why do we have to standardise them when we can just let users choose what they want to call them? Or they can provide translations for their property names so others can read their markup. So instead of explicit names I propose that we have a table of properties at the start of the list:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1 italics
2 paragraph
3 stage
etc.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then when we want to use the &lt;code&gt;italics&lt;/code&gt; property we just say #1 23 5 &amp;ndash; which means 'property 1 (italics) at relative offset 23 of length 5'. Of course the computer handles all this. We &lt;em&gt;never&lt;/em&gt; see these values directly, only through their representation on the screen via formatted text, not even when we edit them via the GUI.&lt;/p&gt;
&lt;p&gt;Having got the properties into this form we can write a table that provides translations of all the properties in the file into any other languages we choose. And texts marked with '#1' will show up as 'kursiv' for Germans and 'corsivo' for Italians (or المائل for Arabic speakers). TEI can't do this because the English names are burned into the standard.&lt;/p&gt;
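&lt;p&gt;A sketch of how the numbered table and its translations might fit together in Python. The dictionary layout, the language codes and the fallback to the base name are assumptions; only the property names come from the text above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical property table: number -> base name, as in the list above.
PROPERTIES = {1: 'italics', 2: 'paragraph', 3: 'stage'}

# Hypothetical translation table: language code -> {number: localised name}.
TRANSLATIONS = {
    'de': {1: 'kursiv'},
    'it': {1: 'corsivo'},
    'ar': {1: 'المائل'},
}


def property_label(number, language):
    """Return the name to show for a property, falling back to the base name."""
    return TRANSLATIONS.get(language, {}).get(number, PROPERTIES[number])


def describe(entry, language='en'):
    """Render a stored entry such as (1, 23, 5) in the reader's language."""
    number, offset, length = entry
    return f'{property_label(number, language)} at relative offset {offset}, length {length}'


entry = (1, 23, 5)  # '#1 23 5' from the text
print(describe(entry, 'de'))    # kursiv at relative offset 23, length 5
print(describe(entry, 'it'))    # corsivo at relative offset 23, length 5
print(property_label(2, 'de'))  # paragraph (no German name supplied)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The stored markup only ever contains the number, so adding a language means adding a table, not touching the texts.&lt;/p&gt;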

&lt;h3&gt;Editing the text in this form&lt;/h3&gt;
&lt;p&gt;Each time we edit the underlying text we have to adjust the standoff properties so that they still correspond. But thanks to the use of relative offsets this is easy. After editing the base text the user commits it to the server. The server computes the differences between the old version and the new one. From this we obtain a set of insertions and deletions.&lt;/p&gt;
&lt;h4&gt;Insertions&lt;/h4&gt;
&lt;p&gt;If we insert text outside a property or within one, we just follow the rules described above to adjust the relative offsets and property lengths in any property lists that describe that text.&lt;/p&gt;
&lt;h4&gt;Deletions&lt;/h4&gt;
&lt;p&gt;If we delete a bit of text that completely contains a property we delete that property and adjust the relative offset of the next one in the list. If the property's range is only partly deleted (at the start or at the end) we simply adjust its length and also the offset of the next property in the list.&lt;/p&gt;
&lt;p&gt;So, in both cases we can edit the text and its underlying properties quite cleanly.&lt;/p&gt;
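&lt;p&gt;Here is a minimal Python sketch of the whole cycle, assuming each property is stored as a (name, relative offset, length) triple. It uses &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; from the Python standard library as a stand-in for whatever diff the server actually runs. For clarity it recomputes absolute positions while walking the list, although only the properties touching an edit and the one after them end up with different relative values. The function names, and the convention that text typed where a property ends extends it, are my own assumptions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from difflib import SequenceMatcher

# Hypothetical helpers; difflib stands in for the server's real diff.


def to_spans(props):
    """(name, relative offset, length) triples -> (name, start, end) triples."""
    spans, position = [], 0
    for name, offset, length in props:
        position += offset
        spans.append((name, position, position + length))
    return spans


def to_props(spans):
    """(name, start, end) triples -> (name, relative offset, length) triples."""
    props, previous = [], 0
    for name, start, end in spans:
        props.append((name, start - previous, end - start))
        previous = start
    return props


def insert_text(props, pos, count):
    """Insert count characters at pos: points at or after pos move along.

    So text typed where a property starts goes before it, while text typed
    where a property ends extends it (an assumed convention).
    """
    def move(p):
        return p + count if p >= pos else p
    return to_props([(n, move(s), move(e)) for n, s, e in to_spans(props)])


def delete_text(props, pos, count):
    """Delete count characters at pos, dropping properties wholly inside the cut."""
    cut_end = pos + count

    def move(p):
        # before the cut: unchanged; after it: moved back by count;
        # inside it: collapsed onto pos (this shortens partly deleted properties)
        return p - max(0, min(p, cut_end) - pos)

    kept = [(n, move(s), move(e)) for n, s, e in to_spans(props)
            if not (s >= pos and cut_end >= e)]
    return to_props(kept)


def update_properties(props, old_text, new_text):
    """Replay the differences between two texts against one property list."""
    ops = SequenceMatcher(None, old_text, new_text, autojunk=False).get_opcodes()
    # Work from the end backwards so positions in old_text stay valid.
    for tag, i1, i2, j1, j2 in reversed(ops):
        if tag in ('delete', 'replace'):
            props = delete_text(props, i1, i2 - i1)
        if tag in ('insert', 'replace'):
            props = insert_text(props, i1, j2 - j1)
    return props


props = [('italics', 4, 3)]  # 'cat' in 'the cat sat'
print(update_properties(props, 'the cat sat', 'the black cat sat'))  # [('italics', 10, 3)]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the edits are replayed from the end of the text backwards, each change can be applied using the positions reported for the old text.&lt;/p&gt;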
&lt;h4&gt;Publishing digital editions&lt;/h4&gt;
&lt;p&gt;If we publish edition 1 of a text and someone writes a property list for it, and then we change the text and issue edition 2, their properties can easily be adjusted using these procedures. So, on requesting a copy of King Lear, the user is told by the server that his or her edition of King Lear is out of date and asked whether to update it. The updates are performed automatically and the old properties now refer to edition 2.&lt;/p&gt;
&lt;h3&gt;Merging property lists into CorCode&lt;/h3&gt;
&lt;p&gt;Yet another advantage of relative offsets is the ability to merge lists of properties belonging to different versions. Let's say we have 5 versions of Shakespeare's King Lear. We could define properties like stage, speaker, speech, paragraph, line, italics etc. for ranges within each version, but like the text these properties would mostly be the same, and defining them separately for each version would be tedious. If we had used absolute offsets the lists of properties would all be different, because they would contain &lt;em&gt;different&lt;/em&gt; offsets throughout. Just one extra character would change all the absolute offsets from then on, and the lists would fail to merge. But with relative offsets most of the properties, like the text they describe, will be exactly the same. So we can merge all the property lists into one CorCode to correspond to the one CorTex. And when we apply a new property to one version it will automatically be adopted by all other versions (should we so desire) without having to redefine it for each version separately.&lt;/p&gt;
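&lt;p&gt;A small Python experiment shows the effect. Two versions carry the same properties, but the second has one extra character in the speaker's name. The names, offsets and lengths are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def to_relative(spans):
    """(name, absolute start, length) -> (name, relative start, length)."""
    result, previous = [], 0
    for name, start, length in spans:
        result.append((name, start - previous, length))
        previous = start
    return result


# Invented example data: version 2 has one extra character at offset 3.
version1 = [('speaker', 0, 4), ('speech', 5, 40), ('line', 5, 20), ('line', 25, 20), ('stage', 46, 10)]
version2 = [('speaker', 0, 5), ('speech', 6, 40), ('line', 6, 20), ('line', 26, 20), ('stage', 47, 10)]

same_absolute = sum(a == b for a, b in zip(version1, version2))
same_relative = sum(a == b for a, b in zip(to_relative(version1), to_relative(version2)))
print(same_absolute, same_relative)  # 0 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;None of the absolute entries match, but in the relative lists only the speaker's length and the offset of the speech that follows differ, so the remaining entries could be shared in a merged CorCode.&lt;/p&gt;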
&lt;h3&gt;Turning overlapping properties into HTML&lt;/h3&gt;
&lt;p&gt;To make all these advantages practical we will have to convert a text marked up in this way into HTML for the browser. But how to do it? There is no hierarchical structure left, it's not XML, we can't use XSLT and the target language &lt;em&gt;IS&lt;/em&gt; a hierarchy. In fact all this has already been done in eComma. How eComma works I don't know, so I'll explain how &lt;em&gt;I&lt;/em&gt; would do it.&lt;/p&gt;
&lt;p&gt;We can scan the text and its property lists and &lt;em&gt;deduce&lt;/em&gt; the hierarchical structure. If a property of type &lt;code&gt;line&lt;/code&gt; is always inside a property called &lt;code&gt;speech&lt;/code&gt; we can deduce that we might render that in HTML as lines inside speeches, say as &amp;lt;span&amp;gt; inside &amp;lt;p&amp;gt;, i.e. as &amp;lt;p&amp;gt;&amp;lt;span&amp;gt;...&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;. But often in Shakespeare a line is divided between two speakers. Then we can simply break the line up into two lines, &lt;em&gt;because it is most often contained by the speech property&lt;/em&gt;. (If it had been the other way around we would have had to split the &amp;lt;p&amp;gt;...&amp;lt;/p&amp;gt; instead.) So we can resolve all cases of overlap and discover hierarchical structure from a simple analysis of the properties.&lt;/p&gt;
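&lt;p&gt;A minimal Python sketch of this deduction, assuming each property type in one version is just a list of (start, end) ranges: count how often each type nests inside the other, treat the more frequent container as the parent, and split the child wherever a parent boundary falls inside it. This is my own illustration of the idea, not eComma's method or the formatter's actual code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative sketch only: ranges are (start, end) pairs in one version.


def contains(outer, inner):
    """True if the range inner lies entirely within outer."""
    return inner[0] >= outer[0] and outer[1] >= inner[1]


def nested_count(outers, inners):
    """How many inner ranges sit wholly inside some outer range."""
    return sum(any(contains(o, i) for o in outers) for i in inners)


def split_at_boundaries(inner, outers):
    """Cut an inner range wherever an outer range starts or ends inside it."""
    start, end = inner
    cuts = sorted({b for o in outers for b in o if b > start and end > b})
    pieces = []
    for cut in cuts:
        pieces.append((start, cut))
        start = cut
    pieces.append((start, end))
    return pieces


def nest(a, b):
    """Decide which property is usually the container, then split the other."""
    if nested_count(a, b) >= nested_count(b, a):
        outer, inner = a, b
    else:
        outer, inner = b, a
    pieces = [p for r in inner for p in split_at_boundaries(r, outer)]
    return outer, pieces


speeches = [(0, 30), (30, 60)]
lines = [(0, 10), (10, 20), (20, 40), (40, 50), (50, 60)]  # (20, 40) straddles two speakers
outer, split_lines = nest(speeches, lines)
print(outer is speeches)  # True
print(split_lines)  # [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
&lt;/code&gt;&lt;/pre&gt;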
&lt;p&gt;Translating the properties into HTML tags is also easy. Most HTML files these days come with a CSS file that tells the browser how to format elements. So if we want to format &lt;code&gt;speech&lt;/code&gt; properties specially we provide a CSS rule:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;p.speech { text-indent: 5px }&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This tells the browser to indent paragraphs of class 'speech' by five pixels. The neat thing about CorCode is that we can reuse this definition to convert speech properties into paragraphs. We just follow the recipe contained in the CSS rule: speeches are p's of class 'speech'. So we don't need XSLT.&lt;/p&gt;
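&lt;p&gt;Reading the stylesheet as a conversion table might look something like this Python sketch. It only understands plain &lt;code&gt;tag.class&lt;/code&gt; selectors, it falls back to a span for properties with no rule (my assumption), and it ignores nesting, which a real formatter would have to resolve first.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re
import xml.etree.ElementTree as ET

CSS = """
p.speech { text-indent: 5px }
span.italics { font-style: italic }
"""


def css_recipe(css):
    """Map each class in a simple 'tag.class' selector to its HTML tag.

    Hypothetical: real stylesheets need a proper CSS parser.
    """
    return {cls: tag for tag, cls in re.findall(r'^\s*(\w+)\.([\w-]+)\s*\{', css, re.M)}


def render(name, text, recipe):
    """Turn one property and its text into an HTML element of that class."""
    element = ET.Element(recipe.get(name, 'span'), {'class': name})
    element.text = text
    return ET.tostring(element, encoding='unicode')


recipe = css_recipe(CSS)
print(recipe)  # {'speech': 'p', 'italics': 'span'}
print(render('speech', 'Nothing will come of nothing.', recipe))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same rule that indents speeches also tells the converter that speeches are paragraphs of class 'speech'.&lt;/p&gt;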
&lt;h3&gt;Simplification of XML to properties&lt;/h3&gt;
&lt;p&gt;When we convert legacy XML files to CorCode format we have to simplify XML elements with their attributes into property names. So &amp;lt;hi rend="italics"&amp;gt;word&amp;lt;/hi&amp;gt; becomes &lt;code&gt;#1 23 5&lt;/code&gt; (remember we defined '#1' to be italics). This means the CSS rules can all be simple and don't have to take account of complex XML attributes. Not all XML elements are as simple as the italics example, but if we provide a list of recipes for converting them I think it can be done. For example the TEI coding for a page number looks like this: &lt;code&gt;&amp;lt;pb n="42" ed="1"/&amp;gt;&lt;/code&gt;. The "42" is really text and should be represented in the main text with the property 'page-number'. The 'ed' attribute is really a version specification and should be expressed in the CorTex and its versions. So there's nothing left. TEI markup is complex because it is a mixture of every kind of information we want to include that involves text. But we really need to separate out things like anchors to external images into a separate CorCode file that is handled specially in a GUI. 
When we focus on actual properties that the text really has, as opposed to programming information, there is not much left to represent.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2007787430317604326?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2007787430317604326/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2007787430317604326' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2007787430317604326'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2007787430317604326'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/02/standoff-markup-explained.html' title='Standoff Properties explained'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-8390424207867108393</id><published>2011-02-11T15:26:00.000-08:00</published><updated>2011-02-25T21:45:48.113-08:00</updated><title type='text'>The death of the angle-bracket</title><content type='html'>&lt;p&gt;I was pleasantly surprised to learn that the &lt;a href="http://ecomma.cwrl.utexas.edu/e392k/"&gt;eComma project&lt;/a&gt; uses overlapping properties rather than embedded markup to encode humanities texts. This emboldens me to take a similar approach with my rewrite of the MVD-GUI. 
For a relatively small effort I can transform ugly bits of XML such as &lt;code&gt;&amp;lt;hi rend="italic"&amp;gt;word&amp;lt;/hi&amp;gt;&lt;/code&gt; into the &lt;em&gt;standoff&lt;/em&gt; property called &lt;code&gt;italics&lt;/code&gt; that applies to a specific range in the text. So to kill off angle brackets for good all I have to do is the following:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Take a TEI text and use &lt;a href="http://multiversiondocs.blogspot.com/2010/10/electronic-editions-without-embedded.html"&gt;my splitter program&lt;/a&gt; to split the markup from the text for all the versions of a work. This yields as many versions of plain text as versions of markup.&lt;/li&gt;
&lt;li&gt;Simplify the markup: remove attributes by merging them with element names and swapping them for something shorter. And we can have &lt;em&gt;multi-lingual&lt;/em&gt; property-names &amp;ndash; no need to always use English.&lt;/li&gt;
&lt;li&gt;Merge the text of all the versions into a CorTex (MVD) and all the markup into a CorCode. The CorCode is just a list of properties and their ranges in the text, one for each version.&lt;/li&gt;
&lt;li&gt;Design 3 Joomla components: 
&lt;ol type="a"&gt;&lt;li&gt;A formatted view of any chosen version, with expanding/collapsing apparatus.&lt;/li&gt;
&lt;li&gt;Edit the CorCode. A formatted view of the CorTex+CorCode for the currently chosen version: oft-used markup tags on the right as buttons, the rest as a dropdown list. Either just pressing a button or selecting an item from the dropdown and pressing 'apply' would apply that format to the current selection.&lt;/li&gt;
&lt;li&gt;Edit the CorTex. This view is just a text-editing box, possibly with an expanding/collapsing apparatus.&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;That's not too much work, and once it is done users won't have to struggle with complex syntax ever again. In its place will be a set of simple overlapping properties that automatically format themselves into HTML in the browser. And all steps will be reversible, so we can go back to the XML representation at any stage with no loss of information (hopefully).&lt;/p&gt;
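&lt;p&gt;Step 2 of the plan could be driven by a small table of simplification rules. In this Python sketch the rule format, the property names and the decision to keep unmatched attributes as annotations are assumptions made for illustration; only the TEI element and attribute names are real.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical simplification rules: (element, attribute, value) -> property name.
# None as the attribute means the rule applies to the bare element.
RULES = {
    ('hi', 'rend', 'italic'): 'italics',
    ('hi', 'rend', 'bold'): 'bold',
    ('l', None, None): 'line',
    ('sp', None, None): 'speech',
}


def simplify(element, attributes):
    """Return a property name plus any attributes left over as annotations."""
    for (tag, attr, value), prop in RULES.items():
        if tag == element and attr is not None and attributes.get(attr) == value:
            leftover = {k: v for k, v in attributes.items() if k != attr}
            return prop, leftover
    if (element, None, None) in RULES:
        return RULES[(element, None, None)], dict(attributes)
    return element, dict(attributes)  # no rule: keep the element name


print(simplify('hi', {'rend': 'italic'}))       # ('italics', {})
print(simplify('l', {'n': '12'}))               # ('line', {'n': '12'})
print(simplify('foreign', {'xml:lang': 'la'}))  # ('foreign', {'xml:lang': 'la'})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the leftover attributes as annotations, rather than discarding them, is what would allow the conversion to be reversed later.&lt;/p&gt;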
&lt;p&gt;Here are some mock-ups of how the user interface would look:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-fGojt351L2s/TVYtRSv7GxI/AAAAAAAAAOQ/epqap_tRH-s/s1600/combined.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 326px; height: 400px;" src="http://1.bp.blogspot.com/-fGojt351L2s/TVYtRSv7GxI/AAAAAAAAAOQ/epqap_tRH-s/s400/combined.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5572691364031437586" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;&lt;em&gt;The Combined view&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is partly implemented in &lt;a href="http://www.digitalvariants.net/mvdgui/index.php?option=com_mvdview"&gt;the new version&lt;/a&gt; (all browsers) and more fully implemented in the &lt;a href="http://www.digitalvariants.net/mvdgui/index.php?option=com_mvd&amp;view=mvdsingle&amp;Itemid=77"&gt;old version&lt;/a&gt; (markup still embedded, Firefox only).&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-GCVJXv6t0Fw/TVYtU9sgFpI/AAAAAAAAAOY/krw0G0-aLVM/s1600/corecode.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 274px;" src="http://4.bp.blogspot.com/-GCVJXv6t0Fw/TVYtU9sgFpI/AAAAAAAAAOY/krw0G0-aLVM/s400/corecode.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5572691427099416210" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;&lt;em&gt;The CorCode view&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We only have to show the properties present in &lt;em&gt;this&lt;/em&gt; text. Note the language dropdown menu &amp;ndash; this will translate the property names into whatever we provided in the property list. 'clear all' clears all properties from the current selection.&lt;/p&gt;
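&lt;p&gt;Behind the 'apply' button, adding a property to a list with relative offsets means finding where the new range starts, storing its distance from the previous property and reducing the offset of the property that follows. Here is a minimal Python sketch, assuming properties are (name, relative offset, length) triples kept in order of their start positions; &lt;code&gt;apply_property&lt;/code&gt; is a hypothetical name.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def apply_property(props, name, start, length):
    """Hypothetical helper: add a property covering start..start+length.

    Only the new entry and the one immediately after it get new relative
    offsets; every other entry is copied unchanged.
    """
    result, position, inserted = [], 0, False
    for prop_name, offset, prop_length in props:
        next_position = position + offset
        if not inserted and next_position > start:
            # The new property goes here; the next one is now measured from it.
            result.append((name, start - position, length))
            result.append((prop_name, next_position - start, prop_length))
            inserted = True
        else:
            result.append((prop_name, offset, prop_length))
        position = next_position
    if not inserted:
        result.append((name, start - position, length))
    return result


props = [('italics', 2, 3), ('line', 8, 10), ('stage', 13, 4)]  # starts at 2, 10, 23
print(apply_property(props, 'bold', 15, 3))
# [('italics', 2, 3), ('line', 8, 10), ('bold', 5, 3), ('stage', 8, 4)]
&lt;/code&gt;&lt;/pre&gt;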
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-QHBj6F5qNy0/TVY7gO4TEmI/AAAAAAAAAO4/jHr-z6cbgR0/s1600/cortex2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 307px; height: 400px;" src="http://4.bp.blogspot.com/-QHBj6F5qNy0/TVY7gO4TEmI/AAAAAAAAAO4/jHr-z6cbgR0/s400/cortex2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5572707013853647458" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;&lt;em&gt;The CorTex view&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is just a plain text-editing box, although I have enhanced it with a collapsible apparatus showing textual (not formatting) variants. The user simply edits the text and then clicks 'save'. Carriage returns are not passed on to the display, so they can be added freely to lay the text out more readably. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-8390424207867108393?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/8390424207867108393/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=8390424207867108393' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8390424207867108393'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8390424207867108393'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/02/death-of-angle-bracket.html' title='The death of the angle-bracket'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-fGojt351L2s/TVYtRSv7GxI/AAAAAAAAAOQ/epqap_tRH-s/s72-c/combined.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5786745266684598187</id><published>2011-02-04T12:35:00.001-08:00</published><updated>2011-02-07T12:40:06.948-08:00</updated><title type='text'>Intelligenza 
artificiale</title><content type='html'>&lt;p&gt;We have had another publication, this time in &lt;a href="http://eprints.qut.edu.au/39895/1/39895.pdf"&gt;Intelligenza artificiale&lt;/a&gt;, a journal of the IA*AI (University of Bologna). This is a published version of a conference paper presented by my colleague in Italy several years ago, when we were just starting up this multi-version-document thing. So it's kind of interesting historically. I'm content that what we said then is more or less what we say now. In other words, the idea appears to be stable.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5786745266684598187?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5786745266684598187/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5786745266684598187' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5786745266684598187'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5786745266684598187'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/02/intelligenza-artificiale.html' title='Intelligenza artificiale'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7162486919367570644</id><published>2011-01-04T15:04:00.000-08:00</published><updated>2011-01-05T00:45:55.717-08:00</updated><title type='text'>Drawing stemmas in php</title><content type='html'>&lt;p&gt;Tree View is part of the MVD-GUI. It shows a phylogenetic tree generated from the MVD variant data. The problem is that it uses the &lt;a href="http://evolution.genetics.washington.edu/phylip.html"&gt;Phylip package&lt;/a&gt; and so depends on two command-line C programs that have to be compiled for the architecture of the server. This makes building an installer for MVD-GUI very difficult. Either I have to ensure that the user has a C compiler installed, and that it works on their platform with that code, or I have to add a lot of prebuilt binaries to the download, fattening it out nicely. So I gave up on that route.&lt;/p&gt;
&lt;h2&gt;Building a Newick Tree&lt;/h2&gt;
&lt;p&gt;An alternative is to draw the tree directly in php. No one seems to have done this before. The first stage is to produce a text version of the tree, in a format called Newick. It assumes a hierarchy like a true stemma, even for an unrooted tree. Here's an example:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;((A2a:0.025555,(A2b:0.006838,A2c:0.014863)I5:0.023295)I3:0.078086,(Ba:0.00787,(Bb:0.041552,(D:0.002623,(Ea:0.032508,Eb:0.063992)I15:0.011577)I13:0.025273)I11:0.006917)I9:0.05633,A1:0.027864)I7;&lt;/code&gt;&lt;/p&gt;
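&lt;p&gt;To make the format concrete, here is a minimal Python sketch (not the MVD-GUI code, which is php) that parses a Newick string like the one above into nested nodes, keeping each label and branch length. It assumes the simple subset used in the example: names, internal labels such as I5 and numeric lengths, with no quoted labels or comments.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import re

# A token is one structural character or a run of anything else (name/number).
TOKEN = re.compile(r"[(),:;]|[^(),:;]+")


class Node:
    def __init__(self):
        self.name = ""        # leaf name or internal label such as "I5"
        self.length = 0.0     # branch length to the parent
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string (no quotes or comments) into Nodes."""
    tokens = [t.strip() for t in TOKEN.findall(text) if t.strip()]
    pos = 0

    def subtree():
        nonlocal pos
        node = Node()
        if tokens[pos] == "(":
            pos += 1                                  # consume "("
            node.children.append(subtree())
            while tokens[pos] == ",":
                pos += 1                              # consume ","
                node.children.append(subtree())
            if tokens[pos] != ")":
                raise ValueError("expected ')' at token %d" % pos)
            pos += 1                                  # consume ")"
        if tokens[pos] not in "(),:;":                # optional label
            node.name = tokens[pos]
            pos += 1
        if tokens[pos] == ":":                        # optional ":length"
            node.length = float(tokens[pos + 1])
            pos += 2
        return node

    root = subtree()
    if tokens[pos] != ";":
        raise ValueError("expected ';' at end of tree")
    return root


def leaves(node):
    """Return the leaf nodes from left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


EXAMPLE = ("((A2a:0.025555,(A2b:0.006838,A2c:0.014863)I5:0.023295)I3:0.078086,"
           "(Ba:0.00787,(Bb:0.041552,(D:0.002623,(Ea:0.032508,Eb:0.063992)"
           "I15:0.011577)I13:0.025273)I11:0.006917)I9:0.05633,A1:0.027864)I7;")

tree = parse_newick(EXAMPLE)
print(tree.name, [(leaf.name, leaf.length) for leaf in leaves(tree)])
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The root I7 has three children, which is how Newick writes an unrooted tree with an arbitrary "hierarchy" imposed on it.&lt;/p&gt;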
&lt;p&gt;The Phylip package I was using before builds the text version of the tree with Fitch (the Fitch-Margoliash method), and Fitch is about the slowest algorithm out there for this kind of work. Neighbour Joining is a much faster technique, but occasionally it produces edges of negative length, and there is no nice way around that. Methods like maximum parsimony are supposed to produce the best trees, but they are expensive to run and are character-based (not distance-based like Neighbour Joining). They are also designed for genetic sequences, not plain text data. In the end I settled on a technique called &lt;a href="http://www.ncbi.nlm.nih.gov/CBBresearch/Desper/FastME.html"&gt;FastME&lt;/a&gt;, which doesn't produce negative branch-lengths. I translated their program into Java and am about to incorporate it into nmerge.&lt;/p&gt;
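&lt;p&gt;For comparison, here is a Python sketch of plain Neighbour Joining (Saitou and Nei's method). It is not FastME and not the Java translation mentioned above. It shows where the branch lengths come from and why they can go negative: each new edge length is computed from the distances, and nothing in the formula forces it to be positive. The distance matrix is made up and deliberately awkward.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import itertools


def neighbour_join(names, dist):
    """Plain neighbour joining; returns (parent, child, branch_length) edges.

    dist is a symmetric dict of dicts: dist[a][b] is the distance a-b.
    """
    d = {a: dict(row) for a, row in dist.items()}     # work on a copy
    active = list(names)
    edges = []
    labels = itertools.count(1)
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair that minimises the Q criterion.
        a, b = min(itertools.combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = "U%d" % next(labels)
        # Nothing in this formula keeps the branch length positive.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        edges += [(u, a, la), (u, b, lb)]
        d[u] = {}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        active = [c for c in active if c not in (a, b)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


def symmetric(pairs):
    """Build a dict-of-dicts distance table from {(a, b): distance}."""
    d = {}
    for (a, b), v in pairs.items():
        d.setdefault(a, {})[b] = v
        d.setdefault(b, {})[a] = v
    return d


# Made-up distances that even break the triangle inequality.
dist = symmetric({("A", "B"): 1, ("A", "C"): 1, ("A", "D"): 1,
                  ("B", "C"): 5, ("B", "D"): 5, ("C", "D"): 2})
for parent, child, length in neighbour_join(["A", "B", "C", "D"], dist):
    print(parent, child, length)   # the first line printed is: U1 A -1.5
```
&lt;/code&gt;&lt;/pre&gt;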
&lt;h2&gt;Drawing a tree in php&lt;/h2&gt;
&lt;p&gt;Drawing can be done directly in php using GD, which provides primitive functions for drawing into a bitmap image. Since the branch-lengths hold much of the information about the stemma, I elected not to use standard force-directed algorithms, which obliterate the lengths. But I did adapt force-directed layout to improve the roughly drawn tree. Results are still preliminary, but it is starting to look reasonable:&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GGwOcLYrsVk/TSOsedVRbNI/AAAAAAAAAN0/HTFSjVy9pGU/s1600/tree.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 396px; height: 254px;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/TSOsedVRbNI/AAAAAAAAAN0/HTFSjVy9pGU/s400/tree.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5558476004375227602" /&gt;&lt;/a&gt;
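&lt;p&gt;The php/GD drawing code is not shown here. As a rough illustration of a starting layout that keeps every branch at its true length, here is a Python sketch of the equal-angle method, which gives each subtree an angular wedge proportional to its number of leaves. The (name, length, children) tuple format and the small tree are assumptions for the example; any later force-directed adjustment of the angles is not shown.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import math


def count_leaves(node):
    _, _, children = node
    return 1 if not children else sum(count_leaves(c) for c in children)


def equal_angle(node, x=0.0, y=0.0, start=0.0, wedge=2 * math.pi, out=None):
    """Equal-angle layout of a (name, branch_length, children) tree.

    Each child sits exactly its branch length away from its parent, in the
    middle of a wedge proportional to its number of leaves.
    Returns {name: (x, y)}.
    """
    if out is None:
        out = {}
    name, _, children = node
    out[name] = (x, y)
    total = count_leaves(node)
    angle = start
    for child in children:
        share = wedge * count_leaves(child) / total
        mid = angle + share / 2
        length = child[1]
        equal_angle(child, x + length * math.cos(mid), y + length * math.sin(mid),
                    angle, share, out)
        angle += share
    return out


# A small made-up unrooted tree: the root has three neighbours.
tree = ("root", 0.0, [("A", 1.0, []),
                      ("I1", 0.5, [("B", 0.3, []), ("C", 0.4, [])]),
                      ("D", 0.7, [])])
for name, (px, py) in equal_angle(tree).items():
    print("%-4s %6.3f %6.3f" % (name, px, py))
```
&lt;/code&gt;&lt;/pre&gt;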
&lt;p&gt;The idea is to incorporate this view into a revised version of MVD-GUI that I will soon release, which will be:
&lt;ol&gt;
&lt;li&gt;Fully debugged and working on several popular browsers&lt;/li&gt;
&lt;li&gt;Installable in Joomla as a single component&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Once that is finished and all the other views (compare, view, edit, list and variants) have been added, my ambition is to port it to the iPad.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7162486919367570644?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7162486919367570644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7162486919367570644' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7162486919367570644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7162486919367570644'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2011/01/drawing-stemmas-in-php.html' title='Drawing stemmas in php'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/TSOsedVRbNI/AAAAAAAAAN0/HTFSjVy9pGU/s72-c/tree.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5588303719647714463</id><published>2010-11-16T14:46:00.000-08:00</published><updated>2010-11-17T01:52:14.954-08:00</updated><title type='text'>ESTS Pisa 2010</title><content type='html'>&lt;p&gt;So Chicago went well. To my surprise, they particularly liked my idea of rolling versions of standoff markup into a separate MVD. 
Unfortunately I didn't finish the formatter commandline tool or the web demo (see below), and even now they are incomplete. I will continue to develop these tools as soon as I have a free moment, but there are more pressing needs.&lt;/p&gt;
&lt;p&gt;On the 25th we are presenting at the ESTS conference in Pisa, Italy. A lot of people will be there, so I am keen to put on a good show. First we will properly show what we did for Rome in July, minus the technical glitches (hopefully there will be a better projector setup). I also want to offer a single, simple installer for the MVD-GUI so that people can try it out. That means:
&lt;ol&gt;&lt;li&gt;Fixing remaining bugs in the finished modules&lt;/li&gt;
&lt;li&gt;Getting an all-in-one installer for my Joomla component and its modules.&lt;/li&gt;
&lt;/ol&gt;
There doesn't seem to be a kosher way to roll up installation of components and modules into &lt;em&gt;one&lt;/em&gt; install package in Joomla, but the authors of &lt;a href="http://www.joomdle.com/wiki/Main_Page"&gt;Joomdle&lt;/a&gt; seem to have worked out a way. I am following their lead, and building something similar for my installer. Disguised as a component, the MVD-GUI will, when finished, actually install one component, three modules and a template. Anyone who can operate Joomla! should be able to do it. For the moment, though, I'll leave out Tree View because installation requires compilation of programs on the server. But we will certainly demonstrate it in Pisa. Of course I'm not going myself (I'm exhausted), but one of my colleagues is going instead. If you want to know who, take a look at the &lt;a href="http://www.textualscholarship.eu/conference-2010.html"&gt;programme.&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5588303719647714463?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5588303719647714463/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5588303719647714463' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5588303719647714463'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5588303719647714463'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/11/ests-pisa-2010.html' title='ESTS Pisa 2010'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3379846286740432917</id><published>2010-10-16T05:07:00.000-07:00</published><updated>2010-11-16T18:23:50.052-08:00</updated><title type='text'>Electronic Editions without Embedded Markup</title><content type='html'>&lt;p&gt;I am going to do a demo for Chicago, which I visit on the 28th for a talk at Loyola University. I want to demonstrate a working method for converting legacy XML files with multiple embedded versions into separate HTML files, a bit like &lt;a href="http://v-machine.org/"&gt;the Versioning Machine&lt;/a&gt;. However, between the XML and the HTML, the interim files will consist solely of &lt;em&gt;plain text&lt;/em&gt; and potentially overlapping markup extracted from the XML. Each of these two kinds of file could be combined into its own Multi-Version-Document, so that electronic editions of multi-version texts can be subject to markup which &lt;em&gt;can be freely combined or removed at any time without affecting the text&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This isn't just standoff markup. I'm using a custom technique called HRIT format, which is based on &lt;a href="http://www.piez.org/wendell/papers/dh2010/index.html"&gt;xLMNL&lt;/a&gt;. This just records a set of &lt;em&gt;ranges&lt;/em&gt; with attributes (garnered from the XML) so potentially any property can overlap with any other property. And these ranges, tied by fixed offsets to the original text, have &lt;em&gt;no&lt;/em&gt; hierarchical structure.&lt;/p&gt;
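&lt;p&gt;The post does not spell out HRIT's syntax, so here is only a Python sketch of the data model it describes, with assumed field names: a flat list of named ranges, each a start offset and a length into the plain text, plus optional attributes. There is no tree, so any two ranges may overlap, as the hypothetical "damage" range does here.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """One standoff property: a name, a span of the plain text, attributes."""
    name: str
    start: int                  # character offset of the first character
    length: int
    attrs: dict = field(default_factory=dict)

    @property
    def end(self):
        return self.start + self.length

    def overlaps(self, other):
        # Two spans overlap when each one starts before the other ends.
        return self.end > other.start and other.end > self.start

    def text(self, plain):
        return plain[self.start:self.end]


plain = "EDMUND Thou, Nature, art my goddess"
ranges = [
    Range("speaker", 0, 6),
    Range("line", 7, 28, {"n": "1"}),   # "n" is a made-up attribute
    Range("damage", 4, 6),              # hypothetical: crosses speaker and line
]
for r in ranges:
    others = [o.name for o in ranges if o is not r and r.overlaps(o)]
    print(r.name, repr(r.text(plain)), others)
```
&lt;/code&gt;&lt;/pre&gt;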
&lt;p&gt;Cool. But how does this get turned into &lt;em&gt;hierarchical&lt;/em&gt; HTML? That's the tricky part. I will define a simple CSS file, which is interpreted as a recipe for constructing the HTML file. For example, we might have a style definition "p.stage". This would mean that we should generate a paragraph (&amp;lt;p class="stage"&amp;gt;...&amp;lt;/p&amp;gt;) for all ranges called "stage", and apply the formats of the p.stage style definition. The beauty of this is that the same CSS file can be used both for formatting the HTML and for generating it. Now &lt;em&gt;that's&lt;/em&gt; cool. Here's an outline of the demo (I'll tick them off as I do them):&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Encode all versions of Act 1, scene 2 of King Lear as ONE TEI-XML file, using parallel segmentation. ✓ &lt;font color="green"&gt;Done&lt;/font&gt;, at least for the first three folio versions. But I should really include the quartos too.&lt;/li&gt;
&lt;li&gt;Write a simple C program called &lt;code&gt;splitter&lt;/code&gt; to separate out the versions. This produces copies of each version as separate TEI-XML files. ✓ &lt;font color="green"&gt;Done.&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;Strip out the markup from these files with &lt;code&gt;stripper&lt;/code&gt;, another C program. This produces 2 files for each XML file:&lt;ol type="a"&gt;&lt;li&gt;The original text, stripped of all markup.&lt;/li&gt;&lt;li&gt;The markup expressed in HRIT standoff format, with coordinates giving its position in the &lt;em&gt;plain text&lt;/em&gt;. ✓ &lt;font color="green"&gt;Done&lt;/font&gt;.&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
&lt;li&gt;Write a simple CSS stylesheet and another C program, &lt;code&gt;formatter&lt;/code&gt;, that takes the standoff markup from step 3b and recombines it with 3a into HTML using the stylesheet definitions. This is the most complex program: it needs to parse CSS only superficially and use definitions of the type &lt;code&gt;element.class&lt;/code&gt; to construct the HTML. The &lt;code&gt;class&lt;/code&gt; will be the name of an XML element and the &lt;code&gt;element&lt;/code&gt; will be the HTML element name. Then the program need only dumbly create elements for the given ranges. Since the markup was originally nested, the result will also be nested. (This was in fact a requirement for XSLT to do the same work.) Eventually the program will need more flexibility, so that it can handle nesting more intelligently. ✓ &lt;font color="green"&gt;Done, but needs further extension.&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;Display the result of one version in the browser. ✓ &lt;font color="green"&gt;Done&lt;/font&gt;&lt;/li&gt;
&lt;li&gt;Write a simple interactive web program consisting of a web page and some Javascript. Divide the page into two parts. On the right, a few of the most common properties appear as buttons: paragraph, speaker, speech, etc. Less common properties can be selected from a dropdown menu, with a button to apply the selected property. On the left is the raw text of King Lear. Now select a bit of the text and press a button. This sends the selection to the server, which adds a format to that range, calls the &lt;code&gt;formatter&lt;/code&gt; program to change the HTML, then refreshes the page, so the text formats &lt;em&gt;interactively&lt;/em&gt;. For the server just use apache+PHP, and call the commandline tools via exec. &lt;font color="orange"&gt;In progress&lt;/font&gt;&lt;/li&gt;&lt;/ol&gt;
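&lt;p&gt;Steps 3 and 4 can be sketched end to end in Python. This is not the C code of &lt;code&gt;stripper&lt;/code&gt; or &lt;code&gt;formatter&lt;/code&gt;, and the (name, start, end) tuples are a simplification of HRIT format with no attributes. The sketch reads element.class selectors from a stylesheet into a recipe (class = XML element name, element = HTML element), strips an XML fragment into plain text plus ranges, and rebuilds nested HTML by opening one element for every range that has a rule. Like step 4, it assumes the ranges still nest, as they do when they come from XML; ranges without a rule are dropped, and the output is wrapped in one extra div.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import re
import xml.etree.ElementTree as ET

# "element.class", e.g. "p.stage": the class names a range, the element is HTML.
SELECTOR = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\.([A-Za-z_][\w-]*)\s*$")


def recipe_from_css(css):
    """Map range names to HTML element names using element.class rules."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # drop comments
    recipe = {}
    for rule in re.finditer(r"([^{}]+)\{[^}]*\}", css):
        for selector in rule.group(1).split(","):
            m = SELECTOR.match(selector)
            if m:
                recipe[m.group(2)] = m.group(1)
    return recipe


def strip_markup(xml_text):
    """Split XML into plain text plus flat (name, start, end) ranges."""
    parts, ranges = [], []

    def walk(elem, offset):
        index = len(ranges)
        ranges.append([elem.tag, offset, offset])     # end is filled in below
        if elem.text:
            parts.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            offset = walk(child, offset)
            if child.tail:
                parts.append(child.tail)
                offset += len(child.tail)
        ranges[index][2] = offset
        return offset

    walk(ET.fromstring(xml_text), 0)
    return "".join(parts), [tuple(r) for r in ranges]


def append_text(elem, s):
    """Add text at the current end of elem's content."""
    if not s:
        return
    if len(elem):
        elem[-1].tail = (elem[-1].tail or "") + s
    else:
        elem.text = (elem.text or "") + s


def format_html(plain, ranges, recipe):
    """Rebuild nested HTML from plain text and properly nested ranges."""
    wanted = sorted((r for r in ranges if r[0] in recipe),
                    key=lambda r: (r[1], -r[2]))      # outer ranges first
    root = ET.Element("div")
    stack = [(root, len(plain))]                      # open elements, end offsets
    cursor = 0                                        # text copied so far

    def close_until(pos):
        nonlocal cursor
        # Close every open element that ends at or before pos.
        while len(stack) > 1 and pos >= stack[-1][1]:
            elem, end = stack.pop()
            append_text(elem, plain[cursor:end])
            cursor = end

    for name, start, end in wanted:
        close_until(start)
        append_text(stack[-1][0], plain[cursor:start])
        cursor = start
        child = ET.SubElement(stack[-1][0], recipe[name], {"class": name})
        stack.append((child, end))
    close_until(len(plain))
    append_text(root, plain[cursor:])
    return ET.tostring(root, encoding="unicode")


STYLESHEET = """
/* made-up rules: one per TEI element we want to keep */
div.sp { margin-bottom: 1em; }
span.speaker, em.emph { font-weight: bold; }
p.l { margin: 0; }
"""
print(recipe_from_css(STYLESHEET))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fed a small speech made of sp, speaker, l and emph elements, this recipe yields a div.sp holding a span.speaker and a p.l, with the emph range inside the p.l as an em element. The same stylesheet can then be served to the browser to style those classes.&lt;/p&gt;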
&lt;p&gt;Once this is incorporated into the MVD-GUI (in place of the XSLT step that currently transforms the XML of the versions into HTML), we will have an electronic edition that is truly free of embedded markup!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3379846286740432917?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3379846286740432917/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3379846286740432917' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3379846286740432917'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3379846286740432917'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/10/electronic-editions-without-embedded.html' title='Electronic Editions without Embedded Markup'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9051086393031633111</id><published>2010-07-27T16:33:00.000-07:00</published><updated>2010-07-27T16:46:47.851-07:00</updated><title type='text'>Greek MVDs</title><content type='html'>&lt;p&gt;Coming from a background in classics, I got quite a shock when I received a recent query about ancient Greek texts. And of course, behind the problem was a bug. 
What nmerge actually does is merge a set of versions on &lt;em&gt;byte&lt;/em&gt; boundaries, not character boundaries, and one consequence of this is that in some encodings characters can get split. When you read an entire version this isn't a problem, but what if you want to compute the variants of a text? Then the bits that vary might be only half a character. This can play havoc with multi-byte encodings like UTF-8 when they are used for anything more complex than plain English. So Greek was an acid test, and although nmerge initially failed it, I fixed the problem by migrating the half-characters, after alignment, to the correct 'side' of the arc. So no more splits.&lt;/p&gt;
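&lt;p&gt;The following Python sketch is not nmerge's code (nmerge moves the fragments after alignment, within its arcs); it only shows the underlying test. In UTF-8 every continuation byte has the bit pattern 10xxxxxx, so a byte offset that lands on one falls inside a character and can be moved back to where that character starts.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
def is_continuation(byte):
    # UTF-8 continuation bytes are 0x80-0xBF: their top two bits are 10.
    return byte // 64 == 2


def snap_to_char_start(data, offset):
    """Move a byte offset back until it no longer splits a UTF-8 character."""
    while offset and offset != len(data) and is_continuation(data[offset]):
        offset -= 1
    return offset


def safe_split(data, offset):
    """Split UTF-8 bytes near offset without cutting a character in half."""
    cut = snap_to_char_start(data, offset)
    return data[:cut].decode("utf-8"), data[cut:].decode("utf-8")


word = "αβγ".encode("utf-8")     # each of these Greek letters is two bytes
print(safe_split(word, 3))       # -> ('α', 'βγ'); a naive cut would split β
```
&lt;/code&gt;&lt;/pre&gt;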
&lt;p&gt;Here's the output of issuing an &lt;code&gt;nmerge -c variants&lt;/code&gt; command on two versions of Athenaeus' &lt;i&gt;Deipnosophists,&lt;/i&gt; with version 'A' as the base:&lt;pre&gt;&lt;code&gt;[B:συντετάσθαι]
[B:ὁ]
[B:ἔφη, ὥρα]
[B:κἂν]
[B:διαστησῶμεθ’]
[B:τραγῳδίαν·]
[B:αἰνίγμασιν· ἱκανῶς]&lt;/code&gt;&lt;/pre&gt;
I've begun to realise, though, that what this needs is a reference system so the user can relate the apparatus to the text. That is more work, of course.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9051086393031633111?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9051086393031633111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9051086393031633111' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9051086393031633111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9051086393031633111'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/07/greek-mvds.html' title='Greek MVDs'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3781522572175679076</id><published>2010-07-17T03:43:00.000-07:00</published><updated>2010-07-18T15:20:44.361-07:00</updated><title type='text'>Improvements to Apparatus</title><content type='html'>&lt;p&gt;One advantage of Multi-Version-Documents is that generating an apparatus is so easy. There is just a simple command in nmerge: you specify the range (offset and length) and the desired base text, and it then computes the traditional 'apparatus' type display of all the variants aligned on word-boundaries. 
Maybe it's old-fashioned and print-related, but it does show you many versions of the text in a very compact way, so I think it's still useful.&lt;/p&gt;
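&lt;p&gt;The post does not say how nmerge snaps an apparatus range to word boundaries, so this is only a hypothetical Python sketch of one simple way to do it: widen an (offset, length) selection in the base text outwards until both ends fall on whitespace. It uses character offsets on a decoded string, not nmerge's byte offsets.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
def widen_to_words(text, offset, length):
    """Grow text[offset:offset + length] outwards to whole words.

    A 'word' here is simply a run of non-whitespace characters.
    Returns the new (offset, length).
    """
    start, end = offset, offset + length
    while start and not text[start - 1].isspace():
        start -= 1
    while end != len(text) and not text[end].isspace():
        end += 1
    return start, end - start


base = "the quick brown fox"
start, length = widen_to_words(base, 6, 6)          # selection "ick br"
print(start, length, repr(base[start:start + length]))   # 4 11 'quick brown'
```
&lt;/code&gt;&lt;/pre&gt;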
&lt;p&gt;The problem I had been struggling with for the past couple of weeks was how to ensure that this range &lt;em&gt;in the MVD&lt;/em&gt; could be specified precisely via a selection &lt;em&gt;in the GUI.&lt;/em&gt; Of course, what the user sees is not the contents of an MVD. It is extracted and transformed via XSLT (at the moment), and the user selection in HTML bears no clear relation to the corresponding selection in the underlying data. The problem boils down to aligning the XML and HTML versions of the text fast enough for the user not to notice. There are plenty of techniques for doing this, but they all take &lt;em&gt;waaaay&lt;/em&gt; too long. I wanted it in fractions of a second. After perhaps the sixth try, my new method finds the correct answer in around &lt;em&gt;28 milliseconds&lt;/em&gt; for the King Lear example in slow old PHP. What is perhaps most annoying is that the method I used was incredibly simple. It's just 58 lines of code. Strange that you can never see the simple things that are right under your nose. :-) And when you finally have the answer, you can't explain why it didn't occur to you earlier.&lt;/p&gt;
&lt;p&gt;If anyone really wants to know how I did it, they can download the MVD-GUI code to find out. I'm not going to bore you all with technical details here. You might have to wait until I update the Google code site.&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GGwOcLYrsVk/TEGM3Nww9BI/AAAAAAAAAMI/0j43Wp-dRdk/s1600/Screenshot-1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 155px;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/TEGM3Nww9BI/AAAAAAAAAMI/0j43Wp-dRdk/s400/Screenshot-1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5494827900583605266" /&gt;&lt;/a&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_GGwOcLYrsVk/TEGMzg8TrfI/AAAAAAAAAMA/VyLkHl3O87s/s1600/Screenshot.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 133px;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/TEGMzg8TrfI/AAAAAAAAAMA/VyLkHl3O87s/s400/Screenshot.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5494827837012815346" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3781522572175679076?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3781522572175679076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3781522572175679076' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3781522572175679076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3781522572175679076'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/07/improvements-to-apparatus-display.html' title='Improvements to Apparatus'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/TEGM3Nww9BI/AAAAAAAAAMI/0j43Wp-dRdk/s72-c/Screenshot-1.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2168376913409904202</id><published>2010-07-09T11:25:00.000-07:00</published><updated>2010-07-09T11:38:31.580-07:00</updated><title type='text'>Tree View</title><content type='html'>&lt;p&gt;Tree View is finally working. What this does is compute the genealogical tree of a set of versions. Although this is normally of use mostly for manuscript traditions, I believe that it is also useful for printed works. It can show at a glance the relationships between texts that make up a work. Previous attempts to do this (by others) were based on collation output and didn't take account of invariants, only variants. I think this casts doubt on the accuracy of the result. Also, rather than being offline and manual this method is online and automatic. There's a basic zoom facility which is useful for the larger trees. Changing any of the options recomputes the tree. Check it out at &lt;a href="http://www.digitalvariants.net/harpur"&gt;Harpur.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here's a small sample from the DV website: the relationships between 9 texts of Vincenzo Cerami's &lt;i&gt;The Serpent Woman&lt;/i&gt;. This was published in a newspaper (so it's kind of a print tradition), and the author made the pre-texts available in the form of edited drafts. The length of the branches is significant (it indicates the distance between versions), but in case this gets confusing you can make all the lengths the same.&lt;/p&gt;
&lt;p align="center"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_GGwOcLYrsVk/TDdrkDEUHCI/AAAAAAAAAL4/R5enEm0cLGY/s1600/jpg9I6i65.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 327px;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/TDdrkDEUHCI/AAAAAAAAAL4/R5enEm0cLGY/s400/jpg9I6i65.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5491976537644473378" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are wondering how I produce this online the process is basically:
&lt;ol&gt;&lt;li&gt;Query the MVD to produce a difference matrix (edit distance of each version from each other version)&lt;/li&gt;
&lt;li&gt;Pipe the result into the Fitch-Margoliash tree-building program from Phylip.&lt;/li&gt;
&lt;li&gt;Pipe the result into drawtree from Phylip. This outputs a PostScript version of the diagram.&lt;/li&gt;
&lt;li&gt;Pipe the result of that into Ghostscript to produce a temporary JPG file, which you can view.&lt;/li&gt;&lt;/ol&gt;
All this is done by executing a succession of binaries using exec() in PHP. I had to adapt fitch and drawtree extensively to get them to work with pipes. Fitch chokes a bit on the biggest tree (Sibylline Gospel), but that's to be expected. It does work, though.
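&lt;p&gt;Step 1 can be sketched in Python under two stated assumptions. First, the real matrix comes from querying the MVD, whereas this sketch computes a plain character-level Levenshtein distance between made-up version texts. Second, the output follows Phylip's square distance-matrix layout: the number of versions on the first line, then one row per version beginning with its name padded to ten characters. The quadratic edit distance is only practical for short texts.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def phylip_matrix(versions):
    """Format {name: text} as a Phylip-style square distance matrix."""
    names = list(versions)
    lines = [str(len(names))]
    for a in names:
        row = " ".join("%d" % edit_distance(versions[a], versions[b]) for b in names)
        lines.append("%-10.10s %s" % (a, row))       # name in a 10-character field
    return "\n".join(lines)


versions = {"A": "the cat sat", "B": "the cat sat down", "C": "a cat sat"}
print(phylip_matrix(versions))
```
&lt;/code&gt;&lt;/pre&gt;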
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2168376913409904202?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2168376913409904202/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2168376913409904202' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2168376913409904202'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2168376913409904202'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/07/tree-view.html' title='Tree View'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GGwOcLYrsVk/TDdrkDEUHCI/AAAAAAAAAL4/R5enEm0cLGY/s72-c/jpg9I6i65.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2820754628141729705</id><published>2010-05-21T15:08:00.000-07:00</published><updated>2010-05-21T15:13:42.759-07:00</updated><title type='text'>Cross-browser compatibility</title><content type='html'>&lt;p&gt;I've fixed the incompatibility with lesser browsers. It's still not perfect but Chrome and Safari now also work, although none is quite as good as Firefox yet. 
I've also uploaded a very early, incomplete alpha version of &lt;a href="http://code.google.com/p/digitalvariants/downloads/list"&gt;the mvd component and modules for Joomla!&lt;/a&gt; There's no friendly installer, but a README.txt explains the basics of getting it going. This is something to build on, but it does work as far as it goes.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2820754628141729705?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2820754628141729705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2820754628141729705' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2820754628141729705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2820754628141729705'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/05/cross-browser-compatibility.html' title='Cross-browser compatibility'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7324551595500993663</id><published>2010-05-20T23:52:00.001-07:00</published><updated>2010-05-30T13:19:57.409-07:00</updated><title type='text'>Syncro-scroll, I love you</title><content type='html'>&lt;p&gt;I don't know if anyone else has done this before. I suppose they have. 
What my Compare view now does is synchronise the left and right scrolling divs, even if they differ in length and content. With the Alpha web application I had something similar, but the user had to click on the text of one side, and the corresponding text on the other side scrolled down. This was done with hyperlinks, but it was a bit inconvenient because it required the user to actively click to get the two columns in alignment. Now I have a more discreet method with less 'excise' (as Cooper and Reimann would say). All the user has to do is scroll down or up in &lt;em&gt;either&lt;/em&gt; div, and the text on the other side automatically maintains absolute alignment with the scrolling side, even if the two texts are radically different in length. That seems to be all the user needs to pick up the corresponding text on the other side. The eye runs across, naturally in the middle of the frame, to the other side, and there is the text in its different &lt;em&gt;con&lt;/em&gt;text. I'm so delighted with it, and it took just 100 lines or so of simple Javascript.&lt;/p&gt;
&lt;p&gt;CPU usage in Firefox is also good. At rest the browser shows the usual 0-4% activity. When you scroll continuously, usage can momentarily go as high as 13%, because the script has to traverse the entire DOM tree every half second. As soon as you stop, the Javascript detects no change and efficiently skips over most of the code.&lt;/p&gt;
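&lt;p&gt;The post does not show the Javascript, so this is only a Python model of one way such syncing could work, assuming a hypothetical list of anchor offsets: the vertical positions of matching segments in the left and right panes. A scroll position on one side is mapped to the other by interpolating linearly between the surrounding anchors, and a poll that sees no change since the last one returns at once, like the cheap idle check described above.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
```python
import bisect


def map_position(pos, src_anchors, dst_anchors):
    """Map a scroll offset in one pane to the matching offset in the other.

    src_anchors and dst_anchors are equal-length, increasing lists (at least
    two entries) giving the offsets of corresponding segments in each pane.
    """
    pos = max(src_anchors[0], min(pos, src_anchors[-1]))   # stay in range
    i = bisect.bisect_right(src_anchors, pos) - 1
    i = max(0, min(i, len(src_anchors) - 2))
    s0, s1 = src_anchors[i], src_anchors[i + 1]
    d0, d1 = dst_anchors[i], dst_anchors[i + 1]
    if s1 == s0:
        return d0
    return d0 + (pos - s0) * (d1 - d0) / (s1 - s0)


class ScrollSync:
    """Poll-driven synchroniser that does no work while the source is at rest."""

    def __init__(self, src_anchors, dst_anchors):
        self.src, self.dst = src_anchors, dst_anchors
        self.last = None

    def poll(self, src_pos):
        if src_pos == self.last:
            return None                  # nothing moved: skip the mapping
        self.last = src_pos
        return map_position(src_pos, self.src, self.dst)


left = [0, 100, 300]      # made-up offsets of three matching segments
right = [0, 50, 400]
sync = ScrollSync(left, right)
print(sync.poll(50), sync.poll(50), sync.poll(200))   # 25.0 None 225.0
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Swapping the two anchor lists gives the mapping in the other direction, so either pane can drive the scroll.&lt;/p&gt;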
&lt;p&gt;I've added it to the &lt;a href="http://www.digitalvariants.net/harpur/"&gt;Harpur site&lt;/a&gt;, where you can see it in action (no point in adding a lifeless screen dump here). So far I've tested it on Firefox and IE; IE doesn't quite get the alignment right.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7324551595500993663?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7324551595500993663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7324551595500993663' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7324551595500993663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7324551595500993663'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/05/syncro-scroll-i-love-you.html' title='Syncro-scroll, I love you'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3133548272774312886</id><published>2010-05-10T16:22:00.000-07:00</published><updated>2010-05-11T16:30:28.836-07:00</updated><title type='text'>MVD Joomla Component</title><content type='html'>&lt;p&gt;I've decided to release a very preliminary version of the Joomla Component I am developing. It provides a GUI front end to nmerge. 
So far it has a single view, an import page and a list of available MVDs. It hasn't been extensively debugged, but I want to get something up there so that people can see it is being developed and, if they really want to, try it out, warts and all, incomplete as it is. As soon as I can, I'll post a new project on Google Code and provide a link to it from here. You can already see it in action on &lt;a href="http://www.digitalvariants.net/harpur/index.php?option=com_mvd&amp;view=MVDSingle&amp;name=kinglear&amp;version1=1"&gt;the Harpur site&lt;/a&gt;. For the record, here are some screen dumps of single view with the Windowbox facility.&lt;/p&gt;
&lt;p&gt;Windowbox, when expanded, updates automatically as you scroll. You can also find the variants of any passage by selecting it. To do that, the component embeds invisible markers in the text, which tell nmerge where the selection begins and ends. These markers have a resolution of 256 bytes, so the variants returned correspond only approximately to the selection. Further improvement is possible, but it is hard to derive the exact selection in the original text from its HTML rendering, and to do that consistently across platforms.&lt;/p&gt;
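&lt;p&gt;Here is a hypothetical sketch of what a 256-byte marker resolution means in practice. The post doesn't say exactly how the markers are placed, so this assumes one every 256 bytes of the base text, starting at offset 0. A selection can only be reported to nmerge as the span between the last marker before it and the first marker after it. The range nmerge searches may therefore be up to 255 bytes wider at each end. The result is returned as an offset and a length, the same pair of values the variants command takes through &lt;code&gt;-o&lt;/code&gt; and &lt;code&gt;-k&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def marker_range(sel_start, sel_end, text_length, spacing=256):
    """Widen a selection [sel_start, sel_end) to the enclosing invisible markers.

    Assumes (hypothetically) a marker every `spacing` bytes from offset 0.
    Returns (offset, length) of the range nmerge would actually be asked about.
    """
    start = (sel_start // spacing) * spacing     # last marker at or before the start
    end = -(-sel_end // spacing) * spacing       # first marker at or after the end (ceiling)
    end = min(end, text_length)                  # no marker beyond the end of the text
    return start, end - start


print(marker_range(300, 310, 10000))  # (256, 256): a 10-byte selection becomes 256 bytes
```
&lt;/code&gt;&lt;/pre&gt;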
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/S-iWdV3bUGI/AAAAAAAAALA/CabltvUwG1Q/s1600/single-view-collapsed.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 329px; height: 400px;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/S-iWdV3bUGI/AAAAAAAAALA/CabltvUwG1Q/s400/single-view-collapsed.png" border="0" alt="Single View with collapsed Windowbox" id="BLOGGER_PHOTO_ID_5469787178271461474" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;Single View with collapsed Windowbox&lt;/p&gt;
&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/S-iZMj0HA6I/AAAAAAAAALQ/k1SY8YCjUQQ/s1600/single-view-expanded.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 332px; height: 400px;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/S-iZMj0HA6I/AAAAAAAAALQ/k1SY8YCjUQQ/s400/single-view-expanded.png" border="0" alt="Single View with expanded Windowbox" id="BLOGGER_PHOTO_ID_5469790188492751778" /&gt;&lt;/a&gt;
&lt;p align="center"&gt;Single View with expanded Windowbox&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3133548272774312886?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3133548272774312886/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3133548272774312886' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3133548272774312886'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3133548272774312886'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/05/mvd-joomla-component.html' title='MVD Joomla Component'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_GGwOcLYrsVk/S-iWdV3bUGI/AAAAAAAAALA/CabltvUwG1Q/s72-c/single-view-collapsed.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3453462262537856214</id><published>2010-04-28T16:06:00.000-07:00</published><updated>2010-04-28T16:51:09.629-07:00</updated><title type='text'>Revised Variants Command</title><content type='html'>&lt;p&gt;The revised variants command for nmerge now works like this: you specify a range in a particular version, and it computes all the variants that leave and rejoin that version's path. Mathematically it is very simple. 
Unfortunately, variants must be aligned on word boundaries. It doesn't make sense to compute them on character boundaries (as they necessarily are in the MVD). If you did that, you would end up with variants like 'Q1:a' and have no idea what the context of this 'a' in version 'Q1' is. The problem is that in extending a variant to its natural word boundaries you can, of course, encounter more variation, so you can end up duplicating variants. To get around this, several fixes were required:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Equal variants in different versions can be merged. So 'Q1:Map' and 'Q2:Map' become 'Q1,Q2:Map'. Cool.&lt;/li&gt;
&lt;li&gt;A variant can also be part of another variant. Since both belong to the same versions, you just drop the smaller variant.&lt;/li&gt;
&lt;li&gt;Because of imperfections in the nmerge program, a 'variant' can have the same text as the base version. So each computed variant is compared with the corresponding text of the base version and dropped if it is the same.&lt;/li&gt;&lt;/ol&gt;
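&lt;p&gt;The three fixes can be sketched as a post-processing pass over the computed variants. This is a hypothetical Python illustration, not nmerge's actual code. Each variant is assumed to record its versions, the start and end offsets of the base-text span it replaces, and its reading. Fix 3 is applied first here, so that readings identical to the base text never take part in merging; that ordering is an assumption of the sketch.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    versions: frozenset  # sigla sharing this reading, e.g. frozenset({"Q1", "Q2"})
    start: int           # offset in the base text where the replaced span begins
    end: int             # offset in the base text where it ends
    text: str            # the variant reading itself

    def label(self):
        return "[" + ",".join(sorted(self.versions)) + ":" + self.text + "]"


def clean_variants(variants, base_text):
    # Fix 3: drop "variants" whose reading equals the base text they replace
    kept = [v for v in variants if v.text != base_text[v.start:v.end]]

    # Fix 1: merge equal readings of the same span into one multi-version variant
    merged = {}
    for v in kept:
        key = (v.start, v.end, v.text)
        if key in merged:
            old = merged[key]
            merged[key] = Variant(old.versions | v.versions, v.start, v.end, v.text)
        else:
            merged[key] = v
    candidates = list(merged.values())

    # Fix 2: drop a variant that lies inside a larger one with the same versions
    def inside(a, b):
        return (a is not b and a.versions == b.versions
                and a.start >= b.start and b.end >= a.end
                and a.text in b.text)

    result = [a for a in candidates if not any(inside(a, b) for b in candidates)]
    return sorted(result, key=lambda v: (v.start, v.end))


if __name__ == "__main__":
    base = "the horson must"
    found = [
        Variant(frozenset({"F2"}), 4, 10, "whorson"),
        Variant(frozenset({"F3"}), 4, 10, "whorson"),
        Variant(frozenset({"F4"}), 4, 10, "horson"),   # same as base: dropped
        Variant(frozenset({"Q1"}), 4, 10, "whoreson"),
        Variant(frozenset({"Q1"}), 4, 5, "w"),         # inside Q1's larger variant: dropped
    ]
    for v in clean_variants(found, base):
        print(v.label())  # [F2,F3:whorson] then [Q1:whoreson]
```
&lt;/code&gt;&lt;/pre&gt;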
&lt;p&gt;Getting all that working has taken a month. Here's the output of &lt;code&gt;nmerge -c variants -m kinglear.mvd -o 2000 -k 100 -v 1&lt;/code&gt; (variants in King Lear, base version 1, at offset 2000, length of range = 100):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[Q1:for,]
[Q1,Q2:mother]
[F3,F4:fair,]
[F2,Q1,Q2:faire,]
[Q1,Q2:&amp;amp;]
[F2,F3,F4:whorson]
[Q1,Q2:whoreson]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are the original six versions that contain these variants. Note that the initial 'r:' gets extended back to the nearest word boundary and is in fact 'for:':&lt;/p&gt;
&lt;p&gt;F1: r: yet was his Mother fayre, there was good sport at his making, and the horson must be acknowledged&lt;br&gt;
F2: r: yet was his Mother faire, there was good sport at his making, and the whorson must be acknowledged&lt;br&gt;
F3: r: yet was his Mother fair, there was good sport at his making, and the whorson must be acknowledged&lt;br&gt;
F4: r: yet was his Mother fair, there was good sport at his making, and the whorson must be acknowledged&lt;br&gt;
Q1: r, yet was his mother faire, there was good sport at his making, &amp;amp; the whoreson must be acknowledged&lt;br&gt;
Q2: r yet was his mother faire, there was good sport at his making, &amp;amp; the whoreson must be acknowledged&lt;/p&gt;
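&lt;p&gt;The widening step can be sketched as follows. This is a hypothetical illustration that assumes a "word" is simply a run of non-whitespace characters, so attached punctuation stays with its word, as in 'for,' and 'fair,' above. A change that starts or ends inside a word is pushed outwards until both ends fall on whitespace or on the ends of the text. The sample strings are made up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def extend_to_word_boundaries(text, start, end):
    """Widen the span text[start:end] outwards to whole words.

    A "word" is assumed to be any run of non-whitespace characters.
    """
    # Move the start left while the preceding character belongs to a word
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move the end right while the next character belongs to a word
    while len(text) > end and not text[end].isspace():
        end += 1
    return start, end


if __name__ == "__main__":
    line = "is for: yet was"
    s, e = extend_to_word_boundaries(line, 5, 7)   # the span "r:"
    print(line[s:e])                               # for:
    line = "and the horson must"
    s, e = extend_to_word_boundaries(line, 8, 8)   # empty span where "w" is inserted
    print(line[s:e])                               # horson
```
&lt;/code&gt;&lt;/pre&gt;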
&lt;p&gt;Now you might say that a collation program could do as much. I don't think so. With a collation program you have to collate the entire text of all the versions against the chosen base text to get that output, then sift through it to find the right location. Nmerge computes variants over ranges in the base text (actually it &lt;em&gt;reads&lt;/em&gt; them from the MVD), and the base version can be changed at will. This makes it possible to display variants dynamically in a GUI.&lt;/p&gt;
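&lt;p&gt;The sketch below shows roughly how a server-side script might call the command and turn its output into data a GUI can use. It is only an illustration. It assumes an &lt;code&gt;nmerge&lt;/code&gt; executable on the path that accepts the flags shown above (&lt;code&gt;-c&lt;/code&gt;, &lt;code&gt;-m&lt;/code&gt;, &lt;code&gt;-o&lt;/code&gt;, &lt;code&gt;-k&lt;/code&gt;, &lt;code&gt;-v&lt;/code&gt;) and prints one bracketed variant per line. The function names are hypothetical. Changing the base version is just a matter of passing a different &lt;code&gt;base_version&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import subprocess


def variants_command(mvd, offset, length, base_version):
    # -c command, -m MVD file, -o offset, -k length of range, -v base version
    return ["nmerge", "-c", "variants", "-m", mvd,
            "-o", str(offset), "-k", str(length), "-v", str(base_version)]


def parse_variants(output):
    """Turn lines like '[Q1,Q2:mother]' into (versions, text) pairs."""
    result = []
    for line in output.splitlines():
        line = line.strip()
        if not (line.startswith("[") and line.endswith("]")):
            continue  # ignore anything that isn't a variant line
        sigla, text = line[1:-1].split(":", 1)
        result.append((sigla.split(","), text))
    return result


def fetch_variants(mvd, offset, length, base_version):
    """Run nmerge (assumed to be on the PATH) and parse what it prints."""
    completed = subprocess.run(variants_command(mvd, offset, length, base_version),
                               capture_output=True, text=True, check=True)
    return parse_variants(completed.stdout)
```
&lt;/code&gt;&lt;/pre&gt;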
&lt;p&gt;Now all I have to do is call this via Ajax from the Joomla GUI. I'll need to filter it so that residual tags and entities get turned into something useful. Time, though, is beginning to run out.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3453462262537856214?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3453462262537856214/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3453462262537856214' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3453462262537856214'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3453462262537856214'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/04/revised-variants-command.html' title='Revised Variants Command'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6844673692680757722</id><published>2010-04-19T02:30:00.000-07:00</published><updated>2010-04-28T16:40:22.647-07:00</updated><title type='text'>Inadequacy of Embedded Markup</title><content type='html'>My paper on &lt;a href="http://llc.oxfordjournals.org/cgi/content/abstract/fqq007?ijkey=ilzrEgphmlEtphb&amp;keytype=ref"&gt;'The Inadequacy of Embedded Markup for Cultural Heritage Texts'&lt;/a&gt; has just been published online by Literary and Linguistic Computing. It should be interesting to see what people make of it. Criticism is never pleasant, but sometimes if you don't criticise, the opposition will just keep saying that what we already have is good enough. And I'm tired of that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6844673692680757722?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6844673692680757722/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6844673692680757722' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6844673692680757722'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6844673692680757722'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/04/inadequacy-of-embedded-markup.html' title='Inadequacy of Embedded Markup'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9053169297537446577</id><published>2010-03-24T20:27:00.000-07:00</published><updated>2010-03-25T14:44:58.242-07:00</updated><title type='text'>Viewing Variants</title><content type='html'>&lt;p&gt;One of the things that came out of the Book Logic seminar was the 
suggestion that the single view of the old Alpha application could be enhanced by adding a 'Windowbox' at the bottom of the window. This would display the variants for all or part of the text visible in the window. But how would it work?&lt;/p&gt;
&lt;p&gt;The problem is that &lt;em&gt;nmerge&lt;/em&gt; currently computes the innermost variants of a range of text in a specific version. Since there may be no variants for that stretch of text, it expands the selection outwards until it finds two points where at least one variant joins the selection in the specified version. Since it only expands outwards, this strategy can miss variants that occur &lt;em&gt;within&lt;/em&gt; the specified range:&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/S6rZPG6VKpI/AAAAAAAAAKo/7LRW3azRQUk/s1600/fig4-19.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 246px;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/S6rZPG6VKpI/AAAAAAAAAKo/7LRW3azRQUk/s400/fig4-19.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5452409152461941394" /&gt;&lt;/a&gt;
&lt;p&gt;This is not how the &lt;em&gt;apparatus criticus&lt;/em&gt; operates. Instead, we should specify a much broader range and first compute the points at which other versions join or split off from the specified version. At each pair of such points sharing a set of versions we would simply print out the variants. The &lt;a href="http://etjanst.hb.se/bhs/ith/4-00/md.htm"&gt;'Drowning By Versions'&lt;/a&gt; problem can be reduced by limiting the variants to those that split off from &lt;em&gt;and rejoin&lt;/em&gt; the specified version within the specified range.&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/S6vUMopypCI/AAAAAAAAAK4/uTxzH2SMQyY/s1600/variants.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 394px; height: 55px;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/S6vUMopypCI/AAAAAAAAAK4/uTxzH2SMQyY/s400/variants.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5452685087398339618" /&gt;&lt;/a&gt;
&lt;p&gt;In the sketch above the variants of the selected pink region are B: 'white' for A: 'brown', and B: 'rabbit leaps' for A: 'fox jumps'. The variant 'horse walks around' in version C is disregarded because it does not start &lt;em&gt;and&lt;/em&gt; end in the selected region of A.&lt;/p&gt;
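&lt;p&gt;The rule can be tried out on a toy variant graph. This is a hypothetical Python illustration, not nmerge's data structure. Each arc is a tuple (from node, to node, versions, text), and every version follows a single path from node 0. Offsets along the base version's path serve as coordinates. A run of arcs that leaves that path is reported only if it both departs and rejoins within the selected range. The sentences are made up to match the readings mentioned above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import defaultdict


def follow(arcs, version):
    """Yield the arcs on one version's path, starting from node 0."""
    out = defaultdict(list)
    for arc in arcs:
        out[arc[0]].append(arc)
    node = 0
    while True:
        step = [a for a in out[node] if version in a[2]]
        if not step:
            return
        yield step[0]
        node = step[0][1]


def base_offsets(arcs, base):
    """Map each node on the base path to its character offset in the base text."""
    offsets, pos = {0: 0}, 0
    for _, to, _, text in follow(arcs, base):
        pos += len(text)
        offsets[to] = pos
    return offsets


def variants(arcs, base, start, end):
    """Variants that leave and rejoin the base path within [start, end]."""
    offsets = base_offsets(arcs, base)
    found = defaultdict(set)  # (depart, rejoin, text) -> versions
    others = {v for arc in arcs for v in arc[2]} - {base}
    for version in others:
        depart, text = None, ""
        for frm, to, versions, label in follow(arcs, version):
            if base in versions:
                continue  # shared with the base version
            if depart is None:
                depart, text = offsets[frm], ""
            text += label
            if to in offsets:  # back on the base path
                if depart >= start and end >= offsets[to]:
                    found[(depart, offsets[to], text)].add(version)
                depart = None
    return sorted((d, r, ",".join(sorted(vs)), t) for (d, r, t), vs in found.items())


ARCS = [
    (0, 1, {"A", "B", "C"}, "the "),
    (1, 2, {"A", "B"}, "quick "),
    (1, 6, {"C"}, "horse walks around"),
    (2, 3, {"A"}, "brown"),
    (2, 3, {"B"}, "white"),
    (3, 4, {"A", "B"}, " "),
    (4, 5, {"A"}, "fox jumps"),
    (4, 5, {"B"}, "rabbit leaps"),
    (5, 6, {"A", "B"}, " over"),
    (6, 7, {"A", "B", "C"}, " the lazy dog"),
]

if __name__ == "__main__":
    # Selected range in A: "brown fox jumps" (offsets 10 to 25); C's variant is skipped
    for v in variants(ARCS, "A", 10, 25):
        print(v)
```
&lt;/code&gt;&lt;/pre&gt;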
&lt;p&gt;In addition, the default selected range could be reduced to a narrow strip in the centre of the current window. In highly variable texts Windowbox might look for the variants of just one line, but never less. The size of the strip would be configurable globally as a parameter of the Joomla component, say the central 50% of the window by default. And if the user selected some specific text, Windowbox would automatically update with the variants &lt;em&gt;of the selection.&lt;/em&gt; Windowbox should also be a collapsible element on the page, so the user can get a clean view of the reading text at any time.&lt;/p&gt;
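&lt;p&gt;Working out which part of the window Windowbox should cover is simple arithmetic. The sketch below is hypothetical. It assumes the configurable fraction (0.5 for the central 50%) and a line height in pixels are known. It returns the top and bottom of the strip in document coordinates, never less than one line high and never taller than the window.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def central_strip(scroll_top, view_height, fraction=0.5, line_height=20):
    """Return (top, bottom) of the strip whose variants Windowbox should show.

    scroll_top and view_height describe the visible window in pixels;
    fraction is the configurable share of the window (0.5 = central 50%).
    line_height (hypothetical default) sets the minimum strip height.
    """
    height = max(view_height * fraction, line_height)  # never less than one line
    height = min(height, view_height)                  # never more than the window
    top = scroll_top + (view_height - height) / 2      # centre the strip
    return top, top + height


if __name__ == "__main__":
    print(central_strip(1000, 600))               # (1150.0, 1450.0)
    print(central_strip(0, 600, fraction=0.01))   # (290.0, 310.0)
```
&lt;/code&gt;&lt;/pre&gt;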
&lt;p&gt;I am really getting close to a first release of the Joomla component. It will only do import, list texts and view single texts, but it will give people a flavour of what it can do and hopefully generate some feedback.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9053169297537446577?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9053169297537446577/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9053169297537446577' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9053169297537446577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9053169297537446577'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/03/viewing-variants.html' title='Viewing Variants'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GGwOcLYrsVk/S6rZPG6VKpI/AAAAAAAAAKo/7LRW3azRQUk/s72-c/fig4-19.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3070973664341502875</id><published>2010-03-14T16:59:00.000-07:00</published><updated>2010-03-25T14:50:34.794-07:00</updated><title type='text'>Synchronised scrolling of parallel texts</title><content type='html'>&lt;p&gt;The best feature in the old Alpha web 
application was twin-view: it showed two parallel texts that aligned automatically when you clicked on the black (i.e. the &lt;em&gt;same&lt;/em&gt;) text in either version. It did this by secretly writing each piece of black text as a hyperlink that calls a Javascript function instead of going to another page. For example, the picture below shows the alignment &lt;em&gt;after&lt;/em&gt; the user has clicked on the blue-highlighted text:&lt;/p&gt;
&lt;a href="http://1.bp.blogspot.com/_GGwOcLYrsVk/SFZBg-Q5JdI/AAAAAAAAACw/NbeteI3j3I4/s1600-h/synchronise2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/SFZBg-Q5JdI/AAAAAAAAACw/NbeteI3j3I4/s400/synchronise2.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5212425653453399506" /&gt;&lt;/a&gt;
&lt;p&gt;That was fairly cool, but what I was &lt;em&gt;really&lt;/em&gt; trying to do was to scroll one column automatically as the user scrolls the other. I didn't think it was possible, but today I found &lt;a href="http://codepunk.hardwar.org.uk/ajs02.htm"&gt;a link on good old Google&lt;/a&gt; to a timed Javascript routine that is called every 1/4 of a second, checks how far the user has scrolled, then sets the colour of the background accordingly. Instead of setting the background colour, the same kind of routine could realign the two texts every 1/4 of a second. With this new method &lt;em&gt;the user won't have to do anything except scroll&lt;/em&gt; and shouldn't even notice the slight jerkiness from the 1/4-second updates. Showing &lt;em&gt;which&lt;/em&gt; lines are currently in alignment could be achieved as in the old method, by highlighting the two pieces of black text on either side. In fact it's really irritating that I didn't think of this before.&lt;/p&gt;
&lt;h3&gt;Single View&lt;/h3&gt;
&lt;p&gt;Single view is coming along just fine. I'm working on porting the search box code, but everything else already works in Joomla:&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/S51_gPhP1PI/AAAAAAAAAKg/E0C7JbcQWbs/s1600-h/pic.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 331px;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/S51_gPhP1PI/AAAAAAAAAKg/E0C7JbcQWbs/s400/pic.png" border="0" alt="Single view in Joomla" id="BLOGGER_PHOTO_ID_5448651316086691058" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3070973664341502875?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3070973664341502875/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3070973664341502875' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3070973664341502875'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3070973664341502875'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/03/synchronised-scrolling-of-parallel.html' title='Synchronised scrolling of parallel texts'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_GGwOcLYrsVk/SFZBg-Q5JdI/AAAAAAAAACw/NbeteI3j3I4/s72-c/synchronise2.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7461391516351174409</id><published>2010-03-10T13:54:00.000-08:00</published><updated>2010-03-10T17:37:08.295-08:00</updated><title type='text'>Progress on Joomla GUI</title><content type='html'>&lt;p&gt;The Joomla GUI, which is a replacement for the Alpha web application, is progressing nicely. Although technically I only have incomplete views for listing and choosing the texts, importing into an MVD, and viewing a single version, quite a bit of what is to follow is just an adaptation of the old Alpha application's XSL stylesheets. As a result I expect good progress over the next couple of weeks. It doesn't look like I will have all I wanted for the Book Logic workshop in Sydney after all.&lt;/p&gt;
&lt;p&gt;The audience is expected to number around 50, mostly bibliographers, including some famous ones, so it should be a good test. But the majority will be non-technical editorial types, so the real challenge will be getting my ideas across to them. Having explained this before, I know that it takes quite a while before people understand the key concepts. Everyone who &lt;em&gt;has&lt;/em&gt; listened so far, even starting from a position of extreme scepticism, has in the end conceded that this is at least an intriguing idea. Given that experience, conveying anything to 50 people in the space of 10 minutes seems a bit daunting.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7461391516351174409?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7461391516351174409/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7461391516351174409' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7461391516351174409'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7461391516351174409'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/03/prorgess-on-joomla-gui.html' title='Progress on Joomla GUI'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2284093790833599633</id><published>2010-02-11T15:29:00.000-08:00</published><updated>2010-02-12T23:50:57.120-08:00</updated><title type='text'>A Native Version of nmerge</title><content type='html'>&lt;p&gt;Nmerge allows you to merge a number of versions into one file, a multi-version document. But it is written in Java, so it is difficult to call it from most hosted web servers, which don't support Java. Although it is only a possibility at this stage, gcj (the GNU Compiler for Java) can generate native code from Java. If this works I won't need to port my nmerge code to C++, and I can keep developing in Java with all the benefits of a higher-level and more modern language. Gcj is a bit hard to use and always seems out of date, particularly the Swing GUI stuff, so I won't know if it really works until I try to compile the whole thing. But as there is no GUI associated with nmerge, it should work. I'll be trying this in the next few days, because I need it for the Book Logic demo, and I'll post another note once I know whether it works.&lt;/p&gt;
&lt;h3&gt;But does it work?&lt;/h3&gt;
&lt;p&gt;Actually, no. At least not with gcj. First off, it only supports Java 1.5. Secondly, even that doesn't work: after careful testing I discovered that gcj can't read files properly, which is kind of fundamental. And I can't afford the Russian product that is supposed to work very well. So it's back to rewriting nmerge in C++. Oh well.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2284093790833599633?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2284093790833599633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2284093790833599633' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2284093790833599633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2284093790833599633'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/02/native-version-of-nmerge.html' title='A Native Version of nmerge'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9049347916808475440</id><published>2010-02-07T13:31:00.001-08:00</published><updated>2010-02-07T14:15:04.833-08:00</updated><title type='text'>Masterclass Book Logic</title><content type='html'>&lt;p&gt;I thought I'd describe my preparations for the BOOK LOGIC Master Classes and Symposium
in Sydney, 19–20 March 2010. I took a risk by saying that I'd have something ready to demonstrate by then, but I find that deadlines have a way of inspiring one to get something done. Here are the views I want completed for Sydney. The current test version is on the Harpur site. (So far only the list and import views are done, and only partly.)&lt;/p&gt;
&lt;h3&gt;List View&lt;/h3&gt;
&lt;p&gt;The list view just lets the user select or delete an MVD, or create a new empty one. It also lets you categorise MVDs by putting them into folders (and moving them from folder to folder), but you have to log in first to use that feature. This is mostly done, although the new file button doesn't work yet, probably because for some reason the external call to the nmerge tool from PHP isn't working. Almost every digital text archive needs such a facility, but usually all they have is a long list of HTML links that the user has to read through to find what they want. In the yet-to-be-done Find View the user will be able to locate a particular text by name, description or content and have it selected in this view.&lt;/p&gt;
&lt;h3&gt;Import View&lt;/h3&gt;
&lt;p&gt;The goal of Import view is to allow almost any kind of text to be loaded after being automatically cleaned up, in the vein of HTML Tidy. Ordinary TEI texts should be usable, as well as plain text. At the moment, though, for the Book Logic demo, I'm only going to implement the basic import facility of nmerge: that is, the texts have to be in the lightly marked-up TEI already used for the current texts in the Harpur and Digital Variants archives. I guess I should publish that format online sometime soon.&lt;/p&gt;
&lt;h3&gt;Tree View&lt;/h3&gt;
&lt;p&gt;Tree View will, I hope, win many converts to MVD by showing them the potential benefits of a format in which all versions of a work are in one file. It simply uses the data in an MVD to build a phylogenetic tree or stemma with bioinformatics software. It shows the relationships between the various versions of a work, and it will expose the options of the tree-generating program to the user, so you can configure the view to suit your own tastes or research interests. As far as I know, no one else offers quite this facility online yet, although tree views of literary texts have of course been done many times before. Here's an example generated by Phylip from the 36 versions of the Sibylline Gospel. Tree views can be useful even for modern texts.&lt;/p&gt;
&lt;p align="center"&gt;&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/S284mfWINWI/AAAAAAAAAKY/sAfdbDyUnvM/s1600-h/dendro.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 340px; height: 400px;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/S284mfWINWI/AAAAAAAAAKY/sAfdbDyUnvM/s400/dendro.png" border="0" alt="Tree of the 36 versions of the Sibylline Gospel, generated by Phylip" id="BLOGGER_PHOTO_ID_5435625509159974242" /&gt;&lt;/a&gt;&lt;/p&gt;
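&lt;p&gt;Phylip builds trees from inputs such as a distance matrix. Here is a rough sketch of how an MVD could supply one, under simplifying assumptions. The MVD is modelled as a list of (versions, text) pairs, and the distance between two versions is one minus the share of their text they have in common. The tree is built by UPGMA clustering, a simple method chosen here only for brevity and not necessarily what Tree View will use. All names and the sample data are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from itertools import combinations


def distances(mvd, versions):
    """Distance 0 means identical texts, 1 means nothing shared (Dice-style)."""
    length = {v: sum(len(t) for vs, t in mvd if v in vs) for v in versions}
    dist = {}
    for a, b in combinations(versions, 2):
        shared = sum(len(t) for vs, t in mvd if a in vs and b in vs)
        dist[frozenset((a, b))] = 1 - 2 * shared / (length[a] + length[b])
    return dist


def upgma(versions, dist):
    """Cluster versions by UPGMA and return the tree topology as a Newick-style string."""
    sizes = {v: 1 for v in versions}          # cluster label -> number of versions
    d = dict(dist)
    while len(sizes) > 1:
        x, y = sorted(min(d, key=d.get))      # closest pair of clusters
        new = "(" + x + "," + y + ")"
        sx, sy = sizes.pop(x), sizes.pop(y)
        for z in sizes:                       # size-weighted average distances
            d[frozenset((new, z))] = (d[frozenset((x, z))] * sx
                                      + d[frozenset((y, z))] * sy) / (sx + sy)
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        sizes[new] = sx + sy
    return next(iter(sizes))


if __name__ == "__main__":
    mvd = [
        ({"A", "B", "C", "D"}, "the quick "),
        ({"A", "B"}, "brown"),
        ({"C", "D"}, "white"),
        ({"A", "B", "C", "D"}, " fox "),
        ({"A"}, "jumps"),
        ({"B"}, "leaps"),
        ({"C", "D"}, "runs"),
    ]
    names = ["A", "B", "C", "D"]
    print(upgma(names, distances(mvd, names)))  # ((A,B),(C,D))
```
&lt;/code&gt;&lt;/pre&gt;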
&lt;h3&gt;Single View&lt;/h3&gt;
&lt;p&gt;Single view just shows the current text of interest and lets the user read a version of choice, which he/she can change by selecting it from a dropdown list.&lt;/p&gt;
&lt;h3&gt;Compare View&lt;/h3&gt;
&lt;p&gt;Compare view shows two texts side by side and highlights the differences. Clicking on a piece of text on one side scrolls the other side to bring the two into sync. This already works in the Alpha web application, but it has been much appreciated, so I thought I would include it in the demonstration in Sydney. Not all version comparison can be done this way (sometimes you need to compare more than two texts), but it's still an undeniably useful view.&lt;/p&gt;
&lt;p&gt;Well, that's the plan. Obviously there is still a lot to be done after that to make com_mvd a usable tool, but that's all I'm going to show on the day. And maybe I won't get it all done. Let's see. The workshop is only five weeks away!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9049347916808475440?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9049347916808475440/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9049347916808475440' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9049347916808475440'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9049347916808475440'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2010/02/current-state-of-commvd.html' title='Masterclass Book Logic'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_GGwOcLYrsVk/S284mfWINWI/AAAAAAAAAKY/sAfdbDyUnvM/s72-c/dendro.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2803886089485224217</id><published>2009-12-10T11:01:00.001-08:00</published><updated>2009-12-10T14:45:29.491-08:00</updated><title type='text'>LLC paper accepted in record time</title><content type='html'>&lt;p&gt;My LLC paper, the 9,000 
word one about the inadequacy of markup for cultural heritage texts, has been accepted, less than a month after it was submitted. I was expecting a wait of two years or so; this approval makes it the fastest paper I have ever had accepted. Who says the humanities are sleepy? Obviously they thought this was important enough to approve straight away. And I don't think the reviewers were &lt;em&gt;bored.&lt;/em&gt; Unlike many papers in the field, it concentrates not on a small specialised area, such as the digitisation of one author's works, but on the digitisation of all of them. A pizza and beer tonight.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2803886089485224217?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2803886089485224217/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2803886089485224217' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2803886089485224217'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2803886089485224217'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/12/llc-paper-accepted-in-record-time.html' title='LLC paper accepted in record time'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5616106406635402364</id><published>2009-12-06T21:51:00.000-08:00</published><updated>2010-02-07T13:30:47.779-08:00</updated><title type='text'>Progress Table for MVD Joomla Components</title><content type='html'>&lt;p&gt;As I did for nmerge I have drawn up a table for the various components of the Joomla! solution. I'll update this as each component is completed. Red means not done, yellow means partly written, orange means completely written and working but not tested, green means tested. At the current rate of progress this application will take until June to be fully finished and tested, and perhaps even that is optimistic.&lt;/p&gt;
&lt;p&gt;There will be one component, mvd, which will have a number of views, and one plugin:&lt;/p&gt;
&lt;table border="1"&gt;
&lt;tr&gt;&lt;td&gt;view mvd_list&lt;/td&gt;&lt;td&gt;View to display a list of available MVDs. Allow user to create new MVDs and delete old ones. Open MVDs for viewing in various ways.&lt;/td&gt;&lt;td width="20%" style="background-color: orange"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;view editversions&lt;/td&gt;&lt;td&gt;View to allow user to edit version information for a given MVD.&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;view twin&lt;/td&gt;&lt;td&gt;View two versions of an MVD side by side.&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;view single&lt;/td&gt;&lt;td&gt;View a single version of an MVD.&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;view msedit&lt;/td&gt;&lt;td&gt;Edit the source text for a version next to the relevant facsimile page. (NEW)&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;view singledit&lt;/td&gt;&lt;td&gt;Edit the transcription without the accompanying facsimile.&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;view tree&lt;/td&gt;&lt;td&gt;View the genealogy of the versions of an MVD as a phylogenetic tree or stemma. (NEW)&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;plugin_search&lt;/td&gt;&lt;td&gt;Indexed search plugin for all pages that require it. (NEW)&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;search site&lt;/td&gt;&lt;td&gt;Advanced search for files, descriptions and contents. (NEW)&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;import&lt;/td&gt;&lt;td&gt;Various import options: plain text to MVD, TEI XML to MVD etc. (NEW)&lt;/td&gt;&lt;td width="20%" style="background-color: yellow"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;export&lt;/td&gt;&lt;td&gt;Export an MVD to source format (e.g. XML) (NEW)&lt;/td&gt;&lt;td width="20%" style="background-color: red"&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5616106406635402364?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5616106406635402364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5616106406635402364' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5616106406635402364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5616106406635402364'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/12/progress-table-for-mvd-joomla.html' title='Progress Table for MVD Joomla Components'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9167635694659068446</id><published>2009-12-04T21:32:00.000-08:00</published><updated>2009-12-04T21:48:50.243-08:00</updated><title type='text'>Launch of Harpur Website</title><content type='html'>&lt;p&gt;Although there’s not much there yet, I launched &lt;a href="http://www.digitalvariants.net/harpur/"&gt;the Harpur Archive test website&lt;/a&gt; last weekend. It is a Joomla installation, and in it I intend to build all the technology from the Alpha wiki prototype and a new version of nmerge. In short, I will try to build &lt;em&gt;reusable&lt;/em&gt;, easy-to-use Joomla components and experiment with them there. 
So if you are interested, watch this space.&lt;/p&gt;
&lt;h3&gt;Markup Inadequacy Paper&lt;/h3&gt;
&lt;p&gt;While I’m on the subject of news, on the 13th of November I submitted a long paper (9,000 words) to Literary and Linguistic Computing entitled ‘The Inadequacy of Embedded Markup for Cultural Heritage Texts’. It’s provocative, and it’s meant to be. I am basically calling the establishment’s bluff and daring them to try to stop this. I think we’ve gone on quite long enough with an inadequate means of recording our historical texts in digital form, so this is my attempt to make it stop. Here’s the abstract:&lt;/p&gt;
&lt;blockquote&gt;Embedded generalized markup, as applied by digital humanists to the recording and studying of our textual cultural heritage, suffers from a number of serious technical drawbacks. As a result of its evolution from early printer control languages, generalized markup can only express a document’s ‘logical’ structure via a repertoire of permissible printed format structures. In addition to the well-researched overlap problem, the embedding of markup codes into texts that never had them when written leads to a number of further difficulties: the inclusion of potentially obsolescent technical and subjective information into texts that are supposed to be archivable for the long term, the manual encoding of information that could be better computed automatically, and the obscuring of the text by highly complex technical data. Many of these problems can be alleviated by asserting a separation between the versions of which many cultural heritage texts are composed, and their content. 
In this way the complex interconnections between versions can be handled automatically, leaving only simple markup for individual versions to be handled by the user.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9167635694659068446?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9167635694659068446/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9167635694659068446' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9167635694659068446'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9167635694659068446'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/12/launch-of-harpur-website.html' title='Launch of Harpur Website'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-4811370042194335434</id><published>2009-11-27T12:40:00.000-08:00</published><updated>2009-11-27T13:08:32.908-08:00</updated><title type='text'>Interedition Handout</title><content type='html'>&lt;p&gt;I've had some positive feedback from the recent meeting of the Interedition initiative in Brussels. One of my colleagues distributed a handout that was favourably received, and in response to which I have already received one offer of collaboration. 
Since it expresses the essence of MVD in non-technical form, and has a stunning graphic comparing the 1845 and 1888 editions of Charles Harpur's &lt;em&gt;The Creek of the Four Graves&lt;/em&gt;, which are only around 40% similar, I thought I'd share it with you:&lt;/p&gt;
&lt;h3 align="center"&gt;Multi-Version Documents and the Harpur Archive&lt;/h3&gt;
&lt;p&gt;The Multi-Version Document or MVD system is designed to automate as far as possible the work of editing our textual cultural heritage. Existing markup-based approaches pose serious problems for the modern digital scholarly editor, including:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Failure to adequately and accurately represent ordinary textual phenomena &lt;/li&gt;
&lt;li&gt;Obscuring the text and confusing the editor with excessive density of technical markup &lt;/li&gt;
&lt;li&gt;Requiring manual tasks that a computer could perform better, and automatically &lt;/li&gt;
&lt;li&gt;Embedding subjective and potentially obsolescent technical information into texts that are supposed to be archived for the long term&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;These problems can mostly be overcome by separating the versions from their content. In this way editing a text becomes relatively simple, because all the complexities of versions (insertions, deletions, variants and transpositions) are handled automatically. Instead the editor works on a simplified text marked up only with the textual structure of each version.&lt;/p&gt;

&lt;p&gt;An MVD represents 'the work' as an interrelated set of versions that can be searched, compared, edited and archived as a single, compact digital entity. An MVD also has a zero footprint: you can always get the texts back out in exactly the same form as you put them in.&lt;/p&gt;

&lt;h4&gt;What we have now:&lt;/h4&gt;
&lt;p&gt;The following tools are available for download from the &lt;a href="http://code.google.com/p/multiversiondocs/downloads/list"&gt;Googlecode site&lt;/a&gt;: &lt;/p&gt;
&lt;ol&gt;&lt;li&gt;The nmerge commandline tool. This can be used to create, edit and manipulate MVDs.&lt;/li&gt;
&lt;li&gt;The Alpha wiki prototype. This can be used to visualise and edit MVDs. For copyright reasons it only has one example text: all major versions of Act 1 Scene 1 from Shakespeare’s King Lear.&lt;/li&gt;&lt;/ol&gt; 

&lt;h4&gt;Future Developments&lt;/h4&gt;
&lt;p&gt;We are currently developing a plugin for Joomla! that will incorporate all the current technology, with further enhancements, so that a humanities web archive can easily be built and deployed on ordinary web hosts by people with only a modest level of technical expertise. This will be used as the basis of the new Digital Variants website and also the Harpur Text Archive. Progress reports will be posted on the MVD blog.&lt;/p&gt;

&lt;h4&gt;References&lt;/h4&gt;
&lt;p&gt;Schmidt, D. (2009a). Merging Multi-Version Texts: a Generic Solution to the Overlap Problem. In: Usdin, B.T. (ed) Proceedings of Balisage: The Markup Conference 2009. doi:10.4242/BalisageVol3.Schmidt01. &lt;/p&gt;
&lt;p&gt;Schmidt, D. and Colomb, R. (2009). A data structure for representing multi-version texts online. International Journal of Human-Computer Studies, 67.6: 497-514. &lt;/p&gt;
&lt;p&gt;Schmidt, D., Brocca, N. and Fiormonte, D. (2008). A Multi-Version Wiki. In: L.L. Opas-Hänninen, M. Jokelainen, I. Juuso, T. Seppänen (eds), Proceedings of Digital Humanities 2008, Oulu, Finland, June, 2008, pp. 187-188. &lt;/p&gt;
&lt;p&gt;Multi-Version Documents. http://multiversiondocs.blogspot.com. &lt;/p&gt;
&lt;p&gt;Merge and edit N versions in one document. http://code.google.com/p/multiversiondocs/.&lt;/p&gt;
&lt;a href="http://1.bp.blogspot.com/_GGwOcLYrsVk/SxA8VFFCDMI/AAAAAAAAAJg/CZBmY-qOQ8w/s1600/handout2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 309px; height: 400px;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/SxA8VFFCDMI/AAAAAAAAAJg/CZBmY-qOQ8w/s400/handout2.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5408889485310168258" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-4811370042194335434?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/4811370042194335434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=4811370042194335434' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4811370042194335434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4811370042194335434'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/11/interedition-handout.html' title='Interedition Handout'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_GGwOcLYrsVk/SxA8VFFCDMI/AAAAAAAAAJg/CZBmY-qOQ8w/s72-c/handout2.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9139438130921421808</id><published>2009-11-24T13:53:00.000-08:00</published><updated>2009-11-24T13:55:48.585-08:00</updated><title type='text'>Minor updates to nmerge, Alpha</title><content type='html'>&lt;p&gt;I have added a README to Alpha to help with installing it and getting it working; it didn't have one, which was an oversight. I also noticed that the nmerge installer seemed not to work properly, due to my inexperience with automake. In fact it installed correctly; it just complained about the Java source code directory, which wasn't listed properly in the makefile. I'll try to be more careful in future.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9139438130921421808?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9139438130921421808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9139438130921421808' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9139438130921421808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9139438130921421808'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/11/minor-updates-to-nmerge-alpha.html' title='Minor updates to nmerge, Alpha'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3342341601161739986</id><published>2009-11-01T13:51:00.000-08:00</published><updated>2009-11-01T14:13:02.461-08:00</updated><title type='text'>C++ Version of nmerge</title><content type='html'>&lt;p&gt;One problem with the current design of nmerge is that it is written in Java. The commandline tool is a thin C wrapper around the Java code, and if you want to process larger files you can't pass in arguments to increase the available memory, so it simply fails on large files. Also, if you want to run it on servers that don't have, or won't allow, Java (true of many commercial hosting sites), you're out of luck. Since the Digital Variants people, and probably a large number of other humanities projects, will also have these problems, I have decided to convert it into pure C++. This should be relatively easy, and the benefits are:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Memory usage will be limited only by what is available on the machine, not by the amount allocated to the Java Virtual Machine (JVM).&lt;/li&gt;&lt;li&gt;nmerge-c++ will be callable from PHP or another scripting language without requiring installation of a JVM.&lt;/li&gt;&lt;li&gt;nmerge will optionally be able to write to a database instead of directly to disk. This is usually the only way you can save changes on a commercial hosting site.&lt;/li&gt;&lt;li&gt;The C++ version will use far less memory than the Java version and should be a bit faster.&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Overall, these changes will facilitate the building of a practical web application or plugin, which can be added to existing sites. Initially, my intention is to produce a Joomla! plugin that other people can use.&lt;/p&gt;
&lt;p&gt;Some changes that will be possible in this revision include:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Grouped transpositions. By assessing individual transposition candidates as a group it will be possible to detect larger transpositions that contain small corrections.&lt;/li&gt;&lt;li&gt;Proper multi-tasking of the merging process in C++ will hopefully speed up the algorithm considerably.&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;That's the plan. I thought I'd let you know where I'm taking this: the aim is to turn nmerge into a generally usable tool.&lt;/p&gt;
&lt;p&gt;There is at least one drawback, of course. C++ is cumbersome to write code in, compared to the relative heaven of Java. It's like painting a room with a brush instead of a roller.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3342341601161739986?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3342341601161739986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3342341601161739986' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3342341601161739986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3342341601161739986'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/11/c-version-of-nmerge.html' title='C++ Version of nmerge'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-4428730191377233423</id><published>2009-10-23T15:16:00.000-07:00</published><updated>2009-10-23T18:05:20.931-07:00</updated><title type='text'>Whoops!</title><content type='html'>&lt;p&gt;One of my favourite quotations is Edsger Dijkstra's 'testing shows the presence, not the absence of bugs'. This is very true. In Australian homes, too, you can squash a few cockroaches and think you've got them all, but how do you know there isn't a whole colony hiding in the skirting boards? 
I'm guilty of putting in a '!' (a logical 'not') when I shouldn't have. My only excuse is that I was jetlagged in Montreal and fed up with preparing my presentation. For some reason I put in that 'not', which prevented nmerge from finding any left-side transpositions at all. All I can say is: 'Whoops!'&lt;/p&gt;
&lt;p&gt;I'll fix it in the next hour or so, upload the new version as 1.0.2, and update Alpha too. The transposition algorithm is not perfect (I never claimed it was, as you will see if you read the Balisage paper, particularly the end), but it is workable. One thing you should keep in mind is that this is a unique program in its field. Several people have written merging programs for humanistic texts, and a couple have even included transpositions (MEDITE, JNDiff), but only between &lt;em&gt;two&lt;/em&gt; texts at a time. I merge &lt;em&gt;N&lt;/em&gt; texts into one digital representation.&lt;/p&gt;
&lt;p&gt;One thing I'd like to do soon is make it find transpositions in groups (its failure to do so is a flaw that Peter Robinson rightly pointed out). It could also be even faster, if I can work out how to parallelise the algorithm. That's why I 'built' this fancy i7 computer.&lt;/p&gt;
&lt;p&gt;The good thing about computing variants automatically rather than manually is that the result is never final: any improvement in the algorithm is immediately visible, whereas making systematic changes to a manually coded set of texts with complex variants is &lt;em&gt;not&lt;/em&gt; trivial.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-4428730191377233423?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/4428730191377233423/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=4428730191377233423' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4428730191377233423'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4428730191377233423'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/10/whoops.html' title='Whoops!'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7159088014283086467</id><published>2009-09-01T04:50:00.001-07:00</published><updated>2009-09-01T04:55:01.257-07:00</updated><title type='text'>New Versions of nmerge and Alpha posted</title><content type='html'>&lt;p&gt;The difference here is that nmerge now includes the full source code, released under the GPL v3, and also contains a single example text that I can give away under the same license. 
It is the first scene of Shakespeare's King Lear. I have tried to make it as true to the source texts as I can, but it's a lot of work getting markup to look like a manuscript. I never realised before how much the tags interfere with that. It's very annoying. Anyway, let me know if there are any mistakes, or if you have any ideas on how Alpha can be improved. I'm sure there are lots.&lt;/p&gt;
&lt;p&gt;Of course it is full of markup hacks, mainly lines split over speeches, but I couldn't fix that without introducing another layer for each MS. I'd prefer to use some technology other than markup for the content, but there isn't one yet. Oh well!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://code.google.com/p/multiversiondocs/downloads/list"&gt;Here's the link.&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7159088014283086467?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7159088014283086467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7159088014283086467' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7159088014283086467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7159088014283086467'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/09/new-versions-of-nmerge-and-alpha-posted.html' title='New Versions of nmerge and Alpha posted'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-4656383745883872749</id><published>2009-08-13T19:44:00.001-07:00</published><updated>2009-08-18T15:42:46.164-07:00</updated><title type='text'>Balisage Presentation 13 August 2009</title><content type='html'>&lt;p&gt;My talk at Balisage in Montreal went very much as planned. The slides took as long as I had paced them to last: 28 minutes. Then I did two software demonstrations, one of the nmerge commandline tool and another of the Alpha multi-version wiki. 
The former is more or less finished (though I keep tweaking it), and the Alpha wiki is about half done but usable. There was time afterwards for a few questions. The best of these came from Fabio Vitali, who works with Angelo Di Iorio on a diff algorithm for edited XML texts. After the talk he convinced me that, for XML texts, their method of computing diffs has some advantages over my simplistic greedy approach. But I think my method is still a good fallback in the general case. The best thing, I think, is to incorporate the basic idea of their JNDiff algorithm, which is to make the merging algorithm &lt;em&gt;optionally&lt;/em&gt; XML-aware, rather than try to use their code, which is not really open-source yet.&lt;/p&gt;
&lt;p&gt;I think the paper went down well because of the demos. No one else whose talk I saw presented any finished software; it was mostly work in progress, the usual conference fare. Reactions to my talk were not very critical. I think the audience had little to say because it was not about an application of XSLT or XQuery, their favourite tools. But at least the talk has exposed the MVD idea to a wider audience. There are no more excuses for &lt;em&gt;not&lt;/em&gt; mentioning it when discussing solutions to overlapping hierarchies.&lt;/p&gt;
&lt;p&gt;I received favourable comments, which seemed genuine, from the upper reaches of the Balisage hierarchy, and I am encouraged by that.&lt;/p&gt;
&lt;p&gt;I have updated nmerge with &lt;a href="http://code.google.com/p/multiversiondocs/downloads/list"&gt;the version I demonstrated in Montreal.&lt;/a&gt; There is also a copy of the wiki in its current state, minus any MVDs. I can't use any of the usual examples because of copyright restrictions, so I'll have to create some of my own pretty soon.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-4656383745883872749?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/4656383745883872749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=4656383745883872749' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4656383745883872749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4656383745883872749'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/08/balisage-presentation-13-august-2009.html' title='Balisage Presentation 13 August 2009'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-516643747070155549</id><published>2009-08-12T17:40:00.000-07:00</published><updated>2009-08-12T18:02:26.338-07:00</updated><title type='text'>The Biggest Advantage of Using MVDs</title><content type='html'>&lt;p&gt;Now that I am here in Montreal preparing to defend my 
ideas against 100 experts from around the world, I have suddenly realised that all this time I have failed to notice the biggest advantage of MVDs. It is this: the only alternative to computing the interrelations between multi-version texts is to encode them &lt;em&gt;manually.&lt;/em&gt; When people speak of the supposed advantages of standard XML tools, what is often forgotten is the enormous human cost of training people to use markup, getting them to encode texts with it, and checking the result against the originals. I know from experience that this is very expensive. We literally spent thousands of man-hours encoding variants in Wittgenstein. If we had had a tool for doing that automatically, much of that time and money would have been saved.&lt;/p&gt;
&lt;p&gt;Another advantage of computing interrelations automatically is that it is easy to get back exactly what you put in, unmolested. Hand-encoded XML hard-wires the interconnections between versions, so recovering the original texts can be difficult if you later decide to switch to another technology. With nmerge I just press the "archive" button and it is done.&lt;/p&gt;
&lt;p&gt;If computers are good for anything, they are good for saving human effort.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-516643747070155549?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/516643747070155549/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=516643747070155549' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/516643747070155549'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/516643747070155549'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/08/biggest-advantage-of-using-mvds.html' title='The Biggest Advantage of Using MVDs'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7845460932957899906</id><published>2009-07-26T17:52:00.000-07:00</published><updated>2009-07-28T03:16:15.857-07:00</updated><title type='text'>Alpha Prototype Ready</title><content type='html'>&lt;p&gt;I am renaming the multi-version wiki 'Alpha', simply because it's easier to say than 'Phaidros'. It's a bit of a joke, really, because 'Alpha' was just the description of the product I developed for DH2008: it was the 'alpha' release.&lt;/p&gt;
&lt;p&gt;The old Alpha didn't handle transpositions, and I have spent the past year labouring hard to remedy this deficiency. I had revised nmerge to support transpositions, but had not yet integrated it into the multi-version wiki. When I finally saw the output of the new nmerge in the web browser, it was suddenly clear that there were still some bugs in the transposition algorithm. Finding out exactly &lt;em&gt;what&lt;/em&gt; was going wrong took me about a week of solid debugging. But it is done now and I am finally satisfied. Now I have something to show the audience in Montr&amp;eacute;al, and I can say: 'Hey folks, you said this conference was all about theory, but here's something that &lt;em&gt;actually works.&lt;/em&gt;' I think that is a pretty good argument.&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GGwOcLYrsVk/Sm4xmHk37FI/AAAAAAAAAIM/V2BxTmDGPG8/s1600-h/Screenshot.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 550px; height: 90px;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/Sm4xmHk37FI/AAAAAAAAAIM/V2BxTmDGPG8/s1600/Screenshot.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5363278737183337554" /&gt;&lt;/a&gt;
&lt;p&gt;This screenshot of part of the TwinView of Galiano's 'El mapa de las aguas' shows the transposition of 'otras de un hachazo' from after 'de un bocado rabioso' (in version B, left) to before it (in version C, right). Detecting cases like this consistently by hand would be nearly impossible.&lt;/p&gt;
&lt;p&gt;Red text in the left-hand version is text deleted with respect to the version on the right; blue text is inserted, and transpositions are shown in grey. Black text is merged, and clicking on it, as on a transposition, aligns the text on both sides. These simple HTML features add up to a surprisingly effective UI.&lt;/p&gt;
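&lt;p&gt;Clicking on black (merged) text to align the two sides needs a way to map a position in one version to the matching position in the other. The following is a minimal sketch of that mapping in Python, assuming both versions are plain strings and using the standard library's &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; as a stand-in aligner. The function name &lt;code&gt;aligned_offset&lt;/code&gt; is hypothetical, this is not how nmerge computes its alignment, and unlike nmerge it does not detect transpositions.&lt;/p&gt;

```python
from difflib import SequenceMatcher


def aligned_offset(left, right, offset):
    """Map a character offset in `left` to the matching offset in `right`.

    Returns None if the offset lies in text that the two versions do not
    share (e.g. text deleted from `left`), since there is nothing to align to.
    """
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    for block in matcher.get_matching_blocks():
        # block.a and block.b are the start offsets of a shared run of
        # block.size characters in `left` and `right` respectively.
        if block.a + block.size > offset >= block.a:
            return block.b + (offset - block.a)
    return None


print(aligned_offset("desparecido", "desaparecido", 5))  # 6
print(aligned_offset("el molino chico", "el molino", 12))  # None
```

&lt;p&gt;Offset 5 ('r') in 'desparecido' maps to offset 6 in 'desaparecido' because of the inserted 'a', while a click inside deleted (red) text has no counterpart and yields &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;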
&lt;h4&gt;Character-Level vs Word-Level Alignment&lt;/h4&gt;
&lt;p&gt;Character-level alignment by default is new in this version. For example, the expression 'el molino chico' became 'el molino' through the deletion of the character sequence 'o chic'. This shows that what humans would expect &amp;ndash; the deletion of '&amp;nbsp;chico' &amp;ndash; and what the computer detects don't always correspond. I don't think that is a bad thing. The alternative, word-level alignment, would hide small spelling changes such as 'desaparecido' for 'desparecido', or the capitalisation of 'Ojos' for 'ojos': the whole word would be marked as changed, and the reader would have to puzzle out the difference. It is clearer to see small changes like these highlighted, so I agree with the MEDITE team that character-level alignment is more powerful. After all, you can always reduce character-level granularity to word level, but if you only have word-level alignment you are stuck with it.&lt;/p&gt;
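&lt;p&gt;The difference between the two granularities is easy to demonstrate with any generic sequence aligner. This sketch uses Python's &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt;, not nmerge's algorithm, together with a hypothetical helper &lt;code&gt;changes&lt;/code&gt;. Aligned on characters, the spelling and capitalisation changes mentioned above come out as one inserted 'a' and one replaced 'o'; aligned on words, the reader only learns that the whole phrase changed.&lt;/p&gt;

```python
from difflib import SequenceMatcher


def changes(old, new):
    """Return (operation, old_text, new_text) for each non-matching region.

    `old` and `new` may be strings (character-level alignment) or lists of
    words (word-level alignment): SequenceMatcher accepts any sequences.
    """
    joiner = "" if isinstance(old, str) else " "
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (op, joiner.join(old[i1:i2]), joiner.join(new[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


old, new = "desparecido ojos", "desaparecido Ojos"
print(changes(old, new))
# [('insert', '', 'a'), ('replace', 'o', 'O')]
print(changes(old.split(), new.split()))
# [('replace', 'desparecido ojos', 'desaparecido Ojos')]
```

&lt;p&gt;Incidentally, on 'el molino chico' versus 'el molino' this particular matcher reports the deletion of ' chico' rather than 'o chic'. Both are equally short edits, and which one an aligner reports depends on its algorithm.&lt;/p&gt;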
&lt;p&gt;'Collation' programs based on XML use word-level granularity because a finer resolution would make the markup impossibly complex (you'd have to mark up each letter separately). That restriction falls away once we abandon the print-oriented concept of the 'apparatus'. For the digital medium, at least, a new way of presenting variation is needed. Let it evolve.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7845460932957899906?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7845460932957899906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7845460932957899906' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7845460932957899906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7845460932957899906'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/07/alpha-prototype-ready.html' title='Alpha Prototype Ready'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/Sm4xmHk37FI/AAAAAAAAAIM/V2BxTmDGPG8/s72-c/Screenshot.png' height='72'
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-1989968327116034439</id><published>2009-07-02T15:27:00.000-07:00</published><updated>2009-07-02T15:40:59.681-07:00</updated><title type='text'>Interface 09 and Multi-Version Wiki</title><content type='html'>&lt;p&gt;We will be presenting a poster at &lt;a href="http://www.interface09.org.uk"&gt;Interface09&lt;/a&gt; at the University of Southampton. There will also be a demo of the multi-version wiki, which I hope will be one iteration further on from the version presented at Oulu for Digital Humanities 2008. The new multi-version wiki is simply the old wiki with the new nmerge library added, but that library includes support for transpositions, which is kind of important. It is a Jetty 6-based web application that you use through your web browser, and it allows you to view and edit MVDs in a variety of intuitive ways.&lt;/p&gt;
&lt;h4&gt;Digital Variants Portal&lt;/h4&gt;
&lt;p&gt;Eventually the wiki will be broken up and integrated into the Digital Variants Website I am building. In this form the wiki will be a series of portlets inside a portal. Each portlet will conform to JSR 286 and run inside &lt;a href="http://portals.apache.org/jetspeed-2/"&gt;Jetspeed 2.&lt;/a&gt; A portal lets users build their own web interface from the portlet components, and it also encourages other parties to reuse the portlets. We are going for broke with this design: I for one don't believe that deficient or obsolescent technology has any place in designs for the future. If we can build it, we will.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-1989968327116034439?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/1989968327116034439/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=1989968327116034439' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1989968327116034439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1989968327116034439'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/07/interface-09-and-multi-version-wiki.html' title='Interface 09 and Multi-Version Wiki'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16'
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5025115656802956367</id><published>2009-06-05T01:57:00.000-07:00</published><updated>2009-06-05T13:44:37.285-07:00</updated><title type='text'>nmerge 1.0 posted</title><content type='html'>&lt;p&gt;OK, I've posted the first &lt;a href="http://code.google.com/p/multiversiondocs/downloads/list"&gt;BETA version of nmerge&lt;/a&gt; for UNIX/Linux/OS X only. I'll add a Windows installer as soon as I can get around to it. Of course I expect it to go wrong immediately, even though I have tested it thoroughly, but I can only gather more information by trying it on other files. It currently comes with &lt;em&gt;no&lt;/em&gt; example files.&lt;/p&gt;
&lt;p&gt;Some basic installation instructions for the non-GNU aficionados:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download the nmerge-1.0.tar.gz file using the link above&lt;/li&gt;
&lt;li&gt;Open a terminal window, navigate to the folder containing the downloaded file and unpack it
   using &lt;code&gt;tar xzf nmerge-1.0.tar.gz&lt;/code&gt;, or just double-click on it if you have a Mac&lt;/li&gt;
&lt;li&gt;In the terminal window type &lt;code&gt;cd nmerge-1.0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;./configure&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;make&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sudo make install&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You should now have a command "&lt;code&gt;nmerge&lt;/code&gt;". If it complains about Java, make sure you have a valid JRE installed. It must be at least version 1.5.0 (1.4.2 is no good); to find out which version you have, type &lt;code&gt;java -version&lt;/code&gt; in the terminal window. Download a more recent JRE from &lt;a href="http://java.sun.com/javase/downloads/index.jsp"&gt;Sun.&lt;/a&gt; (You only need the JRE, not the JDK, unless you also want to develop Java software.) If it still doesn't work, please report the problem as an issue on &lt;a href="http://code.google.com/p/multiversiondocs/issues/list"&gt;Google Code&lt;/a&gt;.&lt;/p&gt;
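&lt;p&gt;If you want to automate the Java check, for instance in an install script, here is a minimal Python sketch. It assumes the usual &lt;code&gt;java -version&lt;/code&gt; output format (a line containing something like &lt;code&gt;version "1.6.0_13"&lt;/code&gt;, printed on standard error); the function names are hypothetical and not part of nmerge.&lt;/p&gt;

```python
import re
import subprocess

MINIMUM = (1, 5, 0)  # nmerge needs Java 1.5.0 or later; 1.4.2 is too old


def parse_java_version(output):
    """Extract (major, minor, micro) from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?', output)
    if match is None:
        return None
    # Missing components (e.g. a bare "9") count as zero.
    return tuple(int(part or 0) for part in match.groups())


def java_is_recent_enough(output):
    version = parse_java_version(output)
    return version is not None and version >= MINIMUM


def check_installed_java():
    """Run `java -version` and report whether the installed JRE will do."""
    try:
        # `java -version` writes its report to standard error.
        result = subprocess.run(["java", "-version"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return False  # no `java` command on the PATH at all
    return java_is_recent_enough(result.stderr)


print(java_is_recent_enough('java version "1.6.0_13"'))  # True
print(java_is_recent_enough('java version "1.4.2_19"'))  # False
```

&lt;p&gt;Comparing tuples rather than strings means later releases that report versions such as "11.0.2" also pass the check.&lt;/p&gt;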
&lt;p&gt;The first update will contain the source code and documentation. I left them out because of my inexperience with GNU automake.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5025115656802956367?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5025115656802956367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5025115656802956367' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5025115656802956367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5025115656802956367'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/06/nmerge-10-posted.html' title='nmerge 1.0 posted'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5190496177537744994</id><published>2009-06-02T15:45:00.000-07:00</published><updated>2009-06-02T15:51:27.631-07:00</updated><title type='text'>Balisage Paper Accepted</title><content type='html'>&lt;p&gt;My Balisage paper about how to create and edit MVD files has been accepted. I have already bought the flight tickets and registered, so I will be going to Montreal on August 11-14. That's the other side of the world for me, and I think I must be mad.
But this is the only way to properly air the MVD concept and get reactions from the people most likely to raise valid objections. If the idea survives their scrutiny, I think that will vindicate it as far as is possible at this stage. The draft paper is &lt;a href="http://www.itee.uq.edu.au/~schmidt/_articles/balisagepaper.zip"&gt;here&lt;/a&gt;, although it is rather technical. I will post my simplified slide show when I have it.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5190496177537744994?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5190496177537744994/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5190496177537744994' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5190496177537744994'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5190496177537744994'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/06/balisage-paper-accepted.html' title='Balisage Paper Accepted'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-1582700898342146904</id><published>2009-06-02T03:13:00.000-07:00</published><updated>2009-06-02T03:20:33.878-07:00</updated><title type='text'>Out of the Tunnel</title><content type='html'>&lt;p&gt;Well, it all works.
Now I just have to build an installable package for it. To be honest, I don't think many people, if any, will want to use nmerge. It's too user-unfriendly because it has no real user interface. People want a GUI these days, and nmerge is designed to be the Swiss army knife for whatever GUI you might want to put on top of it. Nevertheless, I will post it as soon as possible with a GNU-style installer, and maybe a Windows one too if that is not too hard (perhaps using Nullsoft). The main point is that a milestone has been reached: the MVD file format is born. (Hooray!)&lt;/p&gt;
&lt;p&gt;After that it will be time to add my own GUI, which is just an update of the Phaidros wiki, untouched for nearly a year now. It is time to give it some killer features, e.g. Tree View, which will show the genealogy of a set of versions as a graphical tree that you can configure and regenerate according to taste. I have some other ideas too, which can be blended in gradually.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-1582700898342146904?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/1582700898342146904/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=1582700898342146904' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1582700898342146904'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1582700898342146904'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/06/out-of-tunnel.html' title='Out of the Tunnel'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-1365632653568191720</id><published>2009-05-28T04:50:00.000-07:00</published><updated>2009-06-02T03:16:57.740-07:00</updated><title type='text'>The Light at the End of the Tunnel</title><content type='html'>&lt;p&gt;Well, I &lt;em&gt;finally&lt;/em&gt; got 'compare' to
work properly. The delay was caused by having to redesign the 'chunking' mechanism, which delivers the text back to the browser as a series of blocks, each with uniform characteristics. That way all the deleted text can be made red, the inserted text blue and the merged text black, and the user can click on the black text to be taken to the corresponding part of the compared text. Very important, but also very tricky to get absolutely right. In this version I also had to allow for transpositions, which are even more complicated. But now at last it works. I will post the project on Google Code in the morning, because I am too tired now.&lt;/p&gt;
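&lt;p&gt;The grouping step at the heart of such a chunking mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not nmerge's code: it assumes the comparison has already labelled every character with a state, and the state names, the &lt;code&gt;COLOURS&lt;/code&gt; table and the &lt;code&gt;chunk&lt;/code&gt; function are hypothetical. Runs of characters that share a state are merged into one block, which can then be coloured as a unit.&lt;/p&gt;

```python
from itertools import groupby

# Hypothetical mapping from a character's state to its display colour,
# following the colour scheme described in this post.
COLOURS = {"deleted": "red", "inserted": "blue", "merged": "black",
           "transposed": "grey"}


def chunk(labelled):
    """Merge consecutive (character, state) pairs into (state, text) blocks."""
    return [
        (state, "".join(ch for ch, _ in run))
        for state, run in groupby(labelled, key=lambda pair: pair[1])
    ]


# Labels written out by hand: 'el molino chico' versus 'el molino',
# with the sequence 'o chic' deleted.
labelled = ([(c, "merged") for c in "el molin"]
            + [(c, "deleted") for c in "o chic"]
            + [("o", "merged")])
for state, text in chunk(labelled):
    print(COLOURS[state], repr(text))
# prints: black 'el molin' / red 'o chic' / black 'o' (one per line)
```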
&lt;table&gt;&lt;tr&gt;&lt;td bgcolor="green"&gt;usage&lt;/td&gt;&lt;td bgcolor="green"&gt;create&lt;/td&gt;&lt;td bgcolor="green"&gt;help&lt;/td&gt;&lt;td bgcolor="green"&gt;add&lt;/td&gt;&lt;td bgcolor="green"&gt;del&lt;/td&gt;&lt;td bgcolor="green"&gt;desc&lt;/td&gt;&lt;td bgcolor="green"&gt;arch&lt;/td&gt;&lt;td bgcolor="green"&gt;unarch&lt;/td&gt;&lt;td bgcolor="green"&gt;export&lt;/td&gt;&lt;td bgcolor="green"&gt;import&lt;/td&gt;&lt;td bgcolor="green"&gt;update&lt;/td&gt;&lt;td bgcolor="green"&gt;read&lt;/td&gt;&lt;td bgcolor="green"&gt;list&lt;/td&gt;&lt;td bgcolor="green"&gt;comp&lt;/td&gt;&lt;td bgcolor="green"&gt;find&lt;/td&gt;&lt;td bgcolor="green"&gt;vars&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;h4&gt;Several Days Later ...&lt;/h4&gt;
&lt;p&gt;Almost done testing the code. There are just a few minor problems left with find (again) and variants. The variants command could be quite a useful feature in the GUI: selecting a piece of text could, for example, show its variants dynamically in a sub-window at the bottom. I favour an in-line solution using popup text, but that will have to wait. This feature should demonstrate that we no longer need to 'collate' separate physical versions to get this information.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-1365632653568191720?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/1365632653568191720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=1365632653568191720' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1365632653568191720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1365632653568191720'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/05/light-at-end-of-tunnel.html' title='The Light at the End of the Tunnel'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-1228803817033945809</id><published>2009-05-14T16:22:00.000-07:00</published><updated>2009-06-02T15:03:55.529-07:00</updated><title type='text'>HyperNietzsche vs MVD</title><content
type='html'>&lt;p&gt;I decided after all to make some general remarks about the recently proposed &lt;a href="http://wiki.tei-c.org/index.php/Genetic_Editions"&gt;'Encoding Model for Genetic Editions'&lt;/a&gt; being promoted by the HyperNietzsche people and the TEI. Since this is being put forward as a rival solution for a small subset of the multi-version texts covered by my solution, I thought readers of this blog might like to know the main reasons why I think MVD technology is much the better of the two.&lt;/p&gt;
&lt;h4&gt;One Work = One Text&lt;/h4&gt;
&lt;p&gt;Because it is difficult to record many versions in one file using markup, the proposal recommends a document-centric approach. In this method each physical document is encoded separately, even when the documents are just drafts of the same text. As a result there is a great deal of redundancy in the representation. Variants in different documents are interconnected by links weighted with a probability, and the proposal's authors see this as its main advantage over MVD. But this rests on a misunderstanding of the MVD model. The weights can of course be encoded in the version information of the MVD as user-constructed paths: an MVD can hold an 80% probable version and a 20% probable version just as well as it holds physical versions.&lt;/p&gt;
&lt;p&gt;Actually I think it is wrong to encode &lt;em&gt;one transcriber's opinion&lt;/em&gt; about the probability that a certain combination of variants is 'correct'. A transcription should just record the text, and any interpretations should be kept separate; how else can it be shared? The display of alternative paths is a task for the software, mediated by the user's preferences.&lt;/p&gt;
&lt;p&gt;The main disadvantage of having multiple copies of the same text is that every subsequent operation on the text has to re-establish or maintain the connections between passages that are supposed to be the same. You thus have much more work to do than with an MVD. I believe that &lt;em&gt;text that is the same across versions should literally be the same text.&lt;/em&gt; This simplifies the whole approach to multi-version texts. I also don't believe that humanists want to maintain complex markup that essentially records interconnections between versions, when the same information can be recorded automatically as simple identity.&lt;/p&gt;
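&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of storing shared text once. It assumes a deliberately simplified data model (an ordered list of fragments, each tagged with the set of versions that contain it); the &lt;code&gt;Fragment&lt;/code&gt; class and &lt;code&gt;read_version&lt;/code&gt; function are hypothetical, and this is not the actual MVD file format. Reading a version is just a matter of concatenating the fragments that belong to it, so there are no links between copies to maintain.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    versions: frozenset  # names of the versions that contain this text
    text: str


def read_version(mvd, version):
    """Rebuild one version by concatenating the fragments it contains."""
    return "".join(f.text for f in mvd if version in f.versions)


# Text common to versions A and B is stored once, not copied per version.
mvd = [
    Fragment(frozenset({"A", "B"}), "el molino"),
    Fragment(frozenset({"A"}), " chico"),
]
print(read_version(mvd, "A"))  # el molino chico
print(read_version(mvd, "B"))  # el molino
```

&lt;p&gt;In this model a user-constructed path, such as a probable reading, would simply be one more version name in the fragments' sets.&lt;/p&gt;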
&lt;h4&gt;OHCO Thesis Redux&lt;/h4&gt;
&lt;p&gt;The section on 'grouping changes' implies that manuscript texts have a structure that can be broken down into a hierarchy of changes that can be conveniently grouped and nested arbitrarily. Similarly, section 4.1 imposes a strict hierarchy of document-&gt;writing surface-&gt;zone-&gt;line. Since Barnard's 1988 paper, which pointed out the inherent failure of markup to represent a simple case of nested speeches and lines in Shakespeare (sometimes a line was spread over two speeches), the problem of overlap has become the dominant issue in the digital encoding of historical texts. This representation seeks to reassert the OHCO thesis, which has since been withdrawn by its own authors, and it will fail to represent these genetic texts adequately until it is recognised that they are fundamentally non-hierarchical. The last 20 years of research cannot simply be ignored. It is no longer possible to propose something for the future that does not address the overlap problem. And MVD neatly disposes of that.&lt;/p&gt;
&lt;h4&gt;Collation of XML Texts&lt;/h4&gt;
&lt;p&gt;I am also curious as to how they propose to 'collate' XML documents arranged in this structure, especially when the variants are distributed via two mechanisms: as markup in individual files and as links between documentary versions. Collation programs essentially compare plain text files containing only light markup, such as COCOA references or empty XML elements (as in the case of Juxta). The virtual absence of collation programs able to process arbitrary XML makes this proposal very difficult, at best, to realise. It would be better if a purely digital representation of the text were the objective, since then an apparatus would not be needed.&lt;/p&gt;
&lt;h4&gt;Transpositions&lt;/h4&gt;
&lt;p&gt;The transposition mechanism as described also sounds infeasible. It is unclear what is meant by the proposed standoff mechanism. However, if it allows chunks of transposed text to be moved around, it will fail whenever the chunks contain non-well-formed markup or the schema does not permit that markup at the destination. And if transpositions between physical versions are allowed (these actually make up the majority of cases), how is such a mechanism to work, especially when transposed chunks may well overlap?&lt;/p&gt;
&lt;h4&gt;Simplicity = Limited Scope&lt;/h4&gt;
&lt;p&gt;Much is made in the supporting documentation of the HyperNietzsche Markup Language (HNML) and 'GML' (Genetic Markup Language) of the greater simplicity of the proposed encoding schemes. Clearly, the more general an encoding scheme, the less succinct it is going to be. Since the proposal is to incorporate the encoding model for genetic editions into the TEI, this advantage will surely be lost. In any case, there seems to be very little in the proposal that cannot already be encoded as well (or as poorly, depending on your point of view) with the TEI Guidelines as they now stand.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-1228803817033945809?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/1228803817033945809/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=1228803817033945809' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1228803817033945809'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1228803817033945809'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/05/genetic-editions.html' title='HyperNietzsche vs MVD'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16'
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6864692459326419027</id><published>2009-05-08T03:48:00.000-07:00</published><updated>2009-05-28T04:56:13.182-07:00</updated><title type='text'>A Slight Delay in a Good Cause</title><content type='html'>&lt;p&gt;OK, I haven't finished when I said I would, but software is like that. Sorry. I decided that to test the program properly I should have a complete test suite that I can run after making any change, to make sure that everything in the release is OK. Of course, when I say 'make sure', a test can only show that a bug is present, not that there are none. But that's a lot better than letting the user find them. If I release something that is incomplete or not fully tested, I know the sceptics will attack the flaws. They will say 'See, it doesn't work, I told you so!' I can't afford that, so I have to be careful. So far I have tests for 14 of the 16 commands.&lt;/p&gt;
&lt;p&gt;I also added an unarchive command to go with the archive command. With 'archive', users can save an MVD as a set of versions in a folder, &lt;em&gt;plus&lt;/em&gt; a small XML file instructing nmerge how to reassemble them into an MVD. This file contains all the version and group information. So if you don't believe the MVD format will last, &lt;em&gt;it doesn't matter.&lt;/em&gt; You always have the archive, and the versions in it are in whatever format the original files were in. A user could even construct such an archive manually. The 'unarchive' command takes this archive and builds an MVD from it in one step.&lt;/p&gt;
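&lt;p&gt;The round trip that makes the archive a safe fallback can be sketched in Python as follows. It only illustrates the idea and is not nmerge's implementation: the manifest name, its &lt;code&gt;archive&lt;/code&gt; and &lt;code&gt;version&lt;/code&gt; elements and the one-file-per-version layout are all hypothetical, group information is left out, and &lt;code&gt;unarchive&lt;/code&gt; here only recovers the version texts, which would then still have to be merged into an MVD.&lt;/p&gt;

```python
import os
import xml.etree.ElementTree as ET

MANIFEST = "manifest.xml"  # hypothetical name for the reassembly instructions


def archive(versions, folder):
    """Write each version to its own file, plus an XML manifest listing them."""
    os.makedirs(folder, exist_ok=True)
    root = ET.Element("archive")
    for name, text in versions.items():
        filename = name + ".txt"
        # newline="" stores the text exactly as given, line endings included.
        with open(os.path.join(folder, filename), "w",
                  encoding="utf-8", newline="") as f:
            f.write(text)
        ET.SubElement(root, "version", name=name, file=filename)
    ET.ElementTree(root).write(os.path.join(folder, MANIFEST),
                               encoding="utf-8")


def unarchive(folder):
    """Read the manifest and load every version it lists, in order."""
    root = ET.parse(os.path.join(folder, MANIFEST)).getroot()
    versions = {}
    for element in root.iter("version"):
        path = os.path.join(folder, element.get("file"))
        with open(path, encoding="utf-8", newline="") as f:
            versions[element.get("name")] = f.read()
    return versions
```

&lt;p&gt;Calling &lt;code&gt;unarchive&lt;/code&gt; on a folder written by &lt;code&gt;archive&lt;/code&gt; returns exactly the texts that went in, which is the property that matters here.&lt;/p&gt;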
&lt;p&gt;Here's a progress bar for the tests. Green means there is a test routine and it passes. Yellow means there is a test routine but it doesn't pass yet. Red means there is no test routine and I don't know for sure if it works, but it might. There was an intermittent problem with update, but this is now fixed.&lt;/p&gt;
&lt;table&gt;&lt;tr&gt;&lt;td bgcolor="green"&gt;usage&lt;/td&gt;&lt;td bgcolor="green"&gt;create&lt;/td&gt;&lt;td bgcolor="green"&gt;help&lt;/td&gt;&lt;td bgcolor="green"&gt;add&lt;/td&gt;&lt;td bgcolor="green"&gt;del&lt;/td&gt;&lt;td bgcolor="green"&gt;desc&lt;/td&gt;&lt;td bgcolor="green"&gt;arch&lt;/td&gt;&lt;td bgcolor="green"&gt;unarch&lt;/td&gt;&lt;td bgcolor="green"&gt;export&lt;/td&gt;&lt;td bgcolor="green"&gt;import&lt;/td&gt;&lt;td bgcolor="green"&gt;update&lt;/td&gt;&lt;td bgcolor="green"&gt;read&lt;/td&gt;&lt;td bgcolor="green"&gt;list&lt;/td&gt;&lt;td bgcolor="green"&gt;comp&lt;/td&gt;&lt;td bgcolor="red"&gt;find&lt;/td&gt;&lt;td bgcolor="red"&gt;vars&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;p&gt;I'm going for a beta version with this release. I think it's good enough.&lt;/p&gt;
&lt;p&gt;OK, now there's a project on &lt;a href="http://code.google.com/p/multiversiondocs/"&gt;Google Code&lt;/a&gt;. I must say it was much easier than creating a SourceForge project. SourceForge wanted me to write an epic about it, and even then I had to wait 1-3 days for their royal approval. On Google Code it was instant. Cool.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6864692459326419027?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6864692459326419027/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6864692459326419027' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6864692459326419027'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6864692459326419027'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/05/slight-delay.html' title='A Slight Delay in a Good Cause'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2817768114281872489</id><published>2009-04-30T16:32:00.001-07:00</published><updated>2009-05-20T16:49:31.627-07:00</updated><title type='text'>Nmerge tool code-complete</title><content type='html'>&lt;p&gt;The nmerge commandline tool is now code-complete. I guess it's a 'pre-alpha' version.
Since this is a revision of a previous working version, though, testing should not take too long. I estimate that I should have an alpha version after the Labor Day weekend (Monday 4th May). But with software you never know. This version supports the new merging algorithm from the submitted &lt;a href="http://www.itee.uq.edu.au/~schmidt/_articles/balisagepaper.zip"&gt;Balisage 2009 paper,&lt;/a&gt; which works pretty well.&lt;/p&gt;
&lt;p&gt;Nmerge is also a Java library that can be used from within a Java application, like the Phaidros wiki, to provide support for Multi-Version Documents. Once it has stabilised I will rewrite it as a C++ commandline tool, but for now we have to put up with a slightly more cumbersome syntax. Here is the "usage" statement produced by the program, so you can get some idea of what it does. Once it is reasonably well tested I will put the source code on SourceForge under the GPL v3.&lt;/p&gt;
&lt;p&gt;The command syntax is a bit complicated, but so is what it is trying to do. I envisage that this tool could be used in a shell or commandline script to automate, say, the construction of an MVD from a set of files. At least that's what &lt;em&gt;I&lt;/em&gt; use it for. In any case, the -h option prints out an example or two of how to use each command. The -c option specifies the command you want to perform on the MVD, and the other arguments are the parameters that the command uses, provided they make sense. If they don't, you'll get an error message.&lt;/p&gt;
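&lt;p&gt;As a rough illustration of that kind of scripting, here is a minimal Java sketch that builds the argument lists a script might pass to nmerge in order to create an MVD and then add a batch of text files to it. It uses only options listed in the usage statement below, but the class and method names are hypothetical, and which options each command actually requires (for instance whether -s, -l or -g are mandatory for add) is an assumption.&lt;/p&gt;
&lt;pre&gt;
```java
import java.util.Arrays;

// Hypothetical helper: builds the argument arrays a script might pass to
// "java -jar nmerge.jar" to create an MVD and then add text files to it.
// Only options from the usage statement are used; which ones each command
// really requires is an assumption.
class NmergeScript {
    static final String[] JAVA_JAR = {"java", "-jar", "nmerge.jar"};

    // Concatenates a fixed prefix with further arguments.
    static String[] join(String[] prefix, String... rest) {
        String[] all = Arrays.copyOf(prefix, prefix.length + rest.length);
        System.arraycopy(rest, 0, all, prefix.length, rest.length);
        return all;
    }

    // One "create" command, then one "add" command per text file.
    static String[][] buildMvd(String mvd, String description, String[] files) {
        String[][] commands = new String[files.length + 1][];
        commands[0] = join(JAVA_JAR, "-c", "create", "-m", mvd, "-d", description);
        for (int i = 0; i != files.length; i++) {
            // Assumption: the file name minus its extension serves as the siglum.
            String siglum = files[i].replaceFirst("[.][^.]*$", "");
            commands[i + 1] = join(JAVA_JAR, "-c", "add", "-m", mvd,
                    "-t", files[i], "-s", siglum, "-l", siglum);
        }
        return commands;
    }
}
```
&lt;/pre&gt;
&lt;p&gt;For an MVD called test.mvd with the description demo and the files A.txt and B.txt, buildMvd returns three commands: one create followed by two adds, the last being java -jar nmerge.jar -c add -m test.mvd -t B.txt -s B -l B. Each array could then be run with java.lang.ProcessBuilder. Building argument arrays rather than shell strings also avoids having to quote the long name given with -l.&lt;/p&gt;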
&lt;p&gt;With the nmerge tool, MVD becomes a real format. There's no GUI, because if I added one you couldn't take it away and put in your own. If you need one, wait for Phaidros.&lt;/p&gt;
&lt;pre&gt;
usage: java -jar nmerge.jar [-c command] [-a archive] [-b backup] 
     [-d description] [-e encoding] [-f string] [-g group] [-h command] 
     [-k length] [-l longname] [-m MVD] [-n mask] [-o offset] [-p]
     [-s shortname] [-t textfile] [-v version] [-w with] [-x XMLfile]
     [-?] 

-a archive - folder to use with archive and unarchive commands
-b backup - the version number of a backup (for partial versions)
-c command - operation to perform. One of:
     add - add the specified version to the MVD
     archive - save MVD in a folder as a set of separate versions
     compare - compare specified version 'with' another version
     create - create a new empty MVD
     description - print or change the MVD's description string
     delete - delete specified version from the MVD
     export - export the MVD as XML
     find - find specified text in all versions or in specified version
     import - convert XML file to MVD
     list - list versions and groups
     read - print specified version to standard out
     update - replace specified version with contents of textfile
     unarchive - convert an MVD archive into an MVD
     variants - find variants of specified version, offset and length
-d description - specified when setting/changing the MVD description
-e encoding - the encoding of the version's text e.g. UTF-8
-f string - to be found (used with command find)
-g group - name of group for new version
-h command - print example for command
-k length - find variants of this length in the base version's text
-l longname - the long name/description of the new version (quoted)
-m MVD - the MVD file to create/update
-n mask - mask out which kind of data in new mvd: none, xml or text
-o offset - in given version to look for variants
-p - specified version is partial
-s shortname - short name or siglum of specified version
-t textfile - the text file to add to/update in the MVD
-v version - number of version for command (starting from 1)
-w with - another version to compare with version
-x XML - the XML file to export or import
-? - print this message
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2817768114281872489?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2817768114281872489/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2817768114281872489' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2817768114281872489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2817768114281872489'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/04/nmerge-tool-code-complete.html' title='Nmerge tool code-complete'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-4405738502512958482</id><published>2009-04-23T15:13:00.000-07:00</published><updated>2009-04-24T12:13:45.985-07:00</updated><title type='text'>MVDs in binary or XML?</title><content type='html'>&lt;p&gt;A pattern is emerging in the effect that the MVD concept is having on people. They take on board its power at representing variation but they don't like the idea of representing the data in binary form. Instead they think it is possible to represent variation in some form of XML. So far I've heard proposals to use TEI-XML, RDF or GraphML. It's tempting, of course, to carry on using XML when this is the tool we are all most familiar with. 
However, my point of developing the MVD format was precisely to get around the limitations of all forms of markup. You can't represent a variant graph in XML satisfactorily if the text you are recording the variation of is itself XML &amp;ndash; and it usually is. The reason is that you can't represent cases where the markup itself varies: for example the deletion of a paragraph break:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;del&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;/del&amp;gt;???
&lt;/pre&gt;
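&lt;p&gt;The failure is easy to demonstrate mechanically. The Java sketch below (class and method names are illustrative only) writes markup with square brackets standing in for angle brackets, converts them, and hands the result to the JDK's standard DOM parser from javax.xml.parsers. A fragment that deletes a paragraph break is rejected, because the del element would have to close inside a different paragraph from the one it opened in.&lt;/p&gt;
&lt;pre&gt;
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Illustrative check of whether a markup fragment is well-formed XML.
class OverlapCheck {
    // Converts bracketed markup to real markup: [ and ] become the
    // less-than (char 60) and greater-than (char 62) signs.
    static String markup(String bracketed) {
        return bracketed.replace('[', (char) 60).replace(']', (char) 62);
    }

    // True if the JDK's default DOM parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // A SAXParseException signals malformed markup; configuration or
            // I/O failures are also reported as "not well-formed" in this sketch.
            return false;
        }
    }
}
```
&lt;/pre&gt;
&lt;p&gt;isWellFormed(markup("[div][p]one[del][/p][p][/del]two[/p][/div]")) returns false, while the same text without the deletion, markup("[div][p]one[/p][p]two[/p][/div]"), returns true. The parser typically also prints its fatal-error message to standard error.&lt;/p&gt;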
&lt;p&gt;Of course there are hacks to get around this particular case, but they have negative consequences. What you end up doing is &lt;em&gt;modifying the markup to accommodate weaknesses in the representational power of markup itself.&lt;/em&gt; I think that is a fundamentally flawed strategy. It is just another form of putting presentational information into markup that is supposed to be generic. If you try to represent variation in a set of texts, or in one text, using markup, you very quickly run up against the problem of overlap, and, as we all know, markup is very poor at representing that. The only way to get around the overlap problem completely is to represent variation using a technology that is not based on markup. That is the whole point of MVDs, and it doesn't yet seem to have been widely acknowledged.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-4405738502512958482?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/4405738502512958482/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=4405738502512958482' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4405738502512958482'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/4405738502512958482'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/04/common-misunderstanding-of-mvd.html' title='MVDs in binary or XML?'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3176054187289386672</id><published>2009-04-05T00:32:00.000-07:00</published><updated>2009-04-05T14:24:10.770-07:00</updated><title type='text'>MergeTester released</title><content type='html'>&lt;p&gt;For the thesis I wrote &lt;a href="http://www.itee.uq.edu.au/~schmidt/downloads.html"&gt;MergeTester&lt;/a&gt;, a simple utility that implements the merging algorithm from Chapter 5. Although it is not a practical program, it does demonstrate how the algorithm works, and it allows the user to test the algorithm on folders of versions in any format. It builds a variant graph of the versions and prints it out one arc at a time. From the printout the user can manually reconstruct the graph, or part of it.&lt;/p&gt;
&lt;p&gt;The advantage of the program is that its workings are not obscured by any other code, and it does not depend on third-party libraries. Any comments and bug reports will be gratefully received!&lt;/p&gt;
&lt;p&gt;At the moment I am incorporating it into nmerge, which will also be released shortly. Nmerge can convert a variant graph into an MVD, so the merging algorithm will then become practical.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3176054187289386672?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3176054187289386672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3176054187289386672' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3176054187289386672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3176054187289386672'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/04/mergetester-released.html' title='MergeTester released'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-843746255674046858</id><published>2009-03-18T19:16:00.000-07:00</published><updated>2009-04-04T20:03:56.175-07:00</updated><title type='text'>Final Version of Multi-Version Documents Paper Published by Elsevier</title><content type='html'>&lt;p&gt;&lt;a href="http://dx.doi.org/10.1016/j.ijhcs.2009.02.001"&gt;The final version of my MVD paper&lt;/a&gt; has now appeared online. This hyperlink is permanent and can be used in citations. The paper reference is Schmidt, D. 
and Colomb, R., 2009. A data structure for representing multi-version texts online, &lt;i&gt;International Journal of Human-Computer Studies,&lt;/i&gt; 67.6, 497-514.&lt;/p&gt;
&lt;h3&gt;Thesis Submission&lt;/h3&gt;
&lt;p&gt;I have also now submitted my thesis. The final title was 'Multiple Versions and Overlap in Digital Text'. Here's the abstract:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This thesis is unusual in that it tries to solve a problem that exists between two widely separated disciplines: the humanities (and to some extent also linguistics) on the one hand and information science on the other.  
&lt;/p&gt;&lt;p&gt;
Chapter 1 explains why it is essential to strike a balance between study of the solution and problem domains.
&lt;/p&gt;&lt;p&gt;
Chapter 2 surveys the various models of cultural heritage text, starting in the remote past, through the coming of the digital era to the present. It establishes why current models are outdated and need to be revised, and also what significance such a revision would have.
&lt;/p&gt;&lt;p&gt;
Chapter 3 examines the history of markup in an attempt to trace how inadequacies of representation arose. It then examines two major problems in cultural heritage and linguistics digital texts: overlapping hierarchies and textual variation. It assesses previously proposed solutions to both problems and explains why they are all inadequate. It argues that overlapping hierarchies is a subset of the textual variation problem, and also why markup cannot be the solution to either problem.
&lt;/p&gt;&lt;p&gt;
Chapter 4 develops a new data model for representing cultural heritage and linguistics texts, called a 'variant graph', which separates the natural overlapping structures from the content. It develops a simplified list-form of the graph that scales well as the number of versions increases. It  also describes the main operations that need to be performed on the graph and explores their algorithmic complexities.
&lt;/p&gt;&lt;p&gt;
Chapter 5 draws on research in bioinformatics and text processing to develop a greedy algorithm that aligns &lt;i&gt;n&lt;/i&gt; versions with non-overlapping block transpositions in &lt;i&gt;O(MN)&lt;/i&gt; time in the worst case, where &lt;i&gt;M&lt;/i&gt; is the size of the graph and &lt;i&gt;N&lt;/i&gt; is the length of the new version being added or updated. It shows how this algorithm can be applied to texts in corpus linguistics and the humanities, and tests an implementation of the algorithm on a variety of real-world texts.&lt;/p&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-843746255674046858?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/843746255674046858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=843746255674046858' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/843746255674046858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/843746255674046858'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/03/final-version-of-multi-version.html' title='Final Version of Multi-Version Documents Paper Published by Elsevier'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7791140286533517335</id><published>2009-03-10T13:47:00.000-07:00</published><updated>2009-03-11T00:29:41.542-07:00</updated><title type='text'>MVD is Not a Replacement for Markup</title><content type='html'>&lt;p&gt;Some people still think of MVD as a &lt;em&gt;replacement&lt;/em&gt; for markup. It isn't. It &lt;em&gt;complements&lt;/em&gt; markup systems, or any other technology that can represent content. As I said on the main page, &lt;a href="http://multiversiondocs.blogspot.com/2008/03/whats-multi-version-document.html"&gt;What's a Multi-Version Document?&lt;/a&gt;, an MVD represents the overlapping structure of a set of versions or markup perspectives. It doesn't need to represent any of the detail of the content, which is the responsibility of the markup.&lt;/p&gt;
&lt;p&gt;I realise that it's easy, and natural, to dismiss radical ideas simply because they are radical. The difference in this case is that MVD is a technology that definitely works. It's not all that radical anyway. Consider the direction in which multiple-sequence alignment is going in biology. Researchers there have also realised that the best way to represent multi-version genomes or protein sequences is via a directed graph (e.g. Raphael et al., 2004. A novel method for multiple alignment of sequences with repeated and shuffled elements, Genome Research, 14, 2336-2346). I prefer to think of that idea as parallel to mine, and their 'A-Bruijn' graph is rather different from my MVD, &lt;em&gt;but it represents the same kind of data in much the same way&lt;/em&gt;. Acceptance that this basic idea can also be applied to texts in the humanities and linguistics is just a matter of time.&lt;/p&gt;
&lt;h3&gt;The Inadequacy of Markup&lt;/h3&gt;
&lt;p&gt;If markup is adequate for linguistics texts, why is it that every year someone thinks up a new way to manipulate markup systems to try to represent overlap? If it were adequate there would be no need for new systems, but we continue to see 1-3 new papers on the subject every year. It's seen as a game. Look at the &lt;a href="http://www.balisage.net/"&gt;Balisage website&lt;/a&gt;: 'There's nothing so practical as a good theory'. Perceived as an unsolvable problem, overlap is the perfect topic for a paper or a thesis.&lt;/p&gt;
&lt;p&gt;In the humanities, overlap in markup systems is more than an annoyance; it wrecks the whole process of digitisation. In simple texts you can just about get by, but it's a question of degree. Try to use markup to record the following structures:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Deletion of a paragraph break&lt;/li&gt;
&lt;li&gt;Deletion of underlining&lt;/li&gt;
&lt;li&gt;Changes to document &lt;em&gt;structure&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Transposition&lt;/li&gt;
&lt;li&gt;Overlapping variants&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These can all be done somehow in markup, I admit, but only very poorly. And they are features that occur all the time in original texts. The fundamental problem is that you can't adequately fit a non-hierarchical structure into a hierarchical template. To choose markup alone as the medium for preserving our textual cultural heritage is to resign yourself to &lt;em&gt;mangling&lt;/em&gt; that information.&lt;/p&gt;
&lt;p&gt;Why do we have to use markup to record complex structures it was never designed to represent? Hand that complexity over to the computer and let &lt;em&gt;it&lt;/em&gt; work it out. That's what MVD lets you do. If you are getting a headache shuffling around angle brackets and xml:ids, then think again. Is this any proper way for humans of the 21st century to interact with the texts of their forebears?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7791140286533517335?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7791140286533517335/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7791140286533517335' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7791140286533517335'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7791140286533517335'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/03/mvd-is-not-replacement-for-markup.html' title='MVD is Not a Replacement for Markup'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7633838304486093510</id><published>2009-02-18T13:59:00.001-08:00</published><updated>2009-02-18T15:08:38.728-08:00</updated><title type='text'>MVD Paper available online</title><content type='html'>&lt;p&gt;Elsevier have published 
&lt;a href="http://dx.doi.org/10.1016/j.ijhcs.2009.02.001"&gt;the paper I wrote with Bob Colomb about Multi-Version Documents online&lt;/a&gt;. The Greek text has dropped out of Figure 16, but the rest is good. I hope this has an impact, and it is certainly something I will be referring to in future. It represents everything I knew about the MVD idea and its implications as of December 2008.&lt;/p&gt;
&lt;h3&gt;Thesis Complete&lt;/h3&gt;
&lt;p&gt;This morning I submitted a near-final draft of my thesis 'Multiple Versions and Overlap in Digital Text' to my two supervisors. The last chapter describes some new work on aligning multi-version texts automatically. Here's a table from the thesis that summarises the performance of this alignment on a variety of multi-version texts.&lt;/p&gt;

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_GGwOcLYrsVk/SZyIJypD_cI/AAAAAAAAAG8/FG3WzM-sP9o/s1600-h/table.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 281px;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/SZyIJypD_cI/AAAAAAAAAG8/FG3WzM-sP9o/s400/table.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5304264162929802690" /&gt;&lt;/a&gt;

&lt;p&gt;In the table, SZ is the average version size in kilobytes, NV is the number of versions, TT is the total time taken to merge all the versions, and AT is the average time taken to merge each version after the first; both times are in seconds. The test machine had a 1.66 GHz Core Duo processor, of which only one core was used. The Romulo text doesn't merge properly at the moment because there is almost nothing in common between its versions, so its merge times don't mean much.&lt;/p&gt;
&lt;p&gt;The key figure is in the AT column, which shows how long it takes to 'save' an edited version back into the document. As you can see, it's pretty fast, considering that this is a hard problem. As far as quality goes, I can't see any bad alignments or false transpositions, except in the Malvezzi case. Once I can coerce that input into a sensible format, it should also work.&lt;/p&gt;
&lt;h3&gt;Balisage&lt;/h3&gt;
&lt;p&gt;It looks as if I will be going to Balisage this year. I will be presenting a boiled-down version of Chapter 5 of the thesis, which is all new work. I'll be very interested to hear their reactions, especially as I can now demonstrate the theory. (Their motto is 'There is nothing so practical as a good theory'.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7633838304486093510?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7633838304486093510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7633838304486093510' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7633838304486093510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7633838304486093510'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2009/02/mvd-paper-available-online.html' title='MVD Paper available online'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_GGwOcLYrsVk/SZyIJypD_cI/AAAAAAAAAG8/FG3WzM-sP9o/s72-c/table.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6618258187052007115</id><published>2008-12-04T13:19:00.000-08:00</published><updated>2008-12-04T13:31:08.293-08:00</updated><title type='text'>The MVD File 
Format</title><content type='html'>&lt;p&gt;Several people have asked me what is inside an MVD, so I thought I would put it on the record.&lt;/p&gt;
&lt;p&gt;The idea behind the Multi-Version Document or MVD format is to use the list form of the variant graph as the basis for an encoding of a single work, in all its versions or markup perspectives, as a single digital entity. The advantages of this form of digital document should be obvious. It enables a work to be viewed and searched, and its versions to be compared and edited, as one file. For example, all versions of Homer's Iliad, or the seven markup perspectives of the American National Corpus (Ide and Suderman, 2006), could be encapsulated in a single compact and editable representation. The relationships between the various parts of each version, the what-is-a-variant-of-what information, are also recorded. Storing a multi-version work as a set of separate files has the great disadvantage of requiring this kind of data to be recalculated each time it is needed. In an MVD it is calculated once and is thereafter built in.&lt;/p&gt;

&lt;p&gt;If the content of each version is itself XML, then XML is a poor format for an MVD. An MVD may nevertheless be written in either binary or XML format. In the latter case, the XML content of each version, &lt;em&gt;inside&lt;/em&gt; the XML encoding of the MVD structure, must be escaped. That is, all instances of '&amp;lt;', '&amp;gt;' and '&amp;amp;' have to be replaced by their equivalent entities '&amp;amp;lt;', '&amp;amp;gt;' and '&amp;amp;amp;'. The purpose of the XML form of an MVD is merely to allow the researcher to look inside it and see what is there. Editing it by hand is virtually impossible, because the delicate list format produced by Algorithm 1 can so easily be broken.&lt;/p&gt;
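&lt;p&gt;The order of these replacements matters: the ampersand has to be escaped first, or the entities produced for the other two characters would be escaped a second time. A minimal Java sketch of the escaping and its inverse follows; the class name is hypothetical rather than taken from nmerge, and the three special characters are written as character codes.&lt;/p&gt;
&lt;pre&gt;
```java
// Hypothetical helper for escaping version text inside the XML form of an MVD.
class MvdEscape {
    // Character codes: 38 is the ampersand, 60 less-than, 62 greater-than.
    static final String AMP = String.valueOf((char) 38);
    static final String LT = String.valueOf((char) 60);
    static final String GT = String.valueOf((char) 62);

    static String escape(String text) {
        // Ampersands first: the entities added for the other two characters
        // start with an ampersand that must not be escaped again.
        return text.replace(AMP, AMP + "amp;")
                .replace(LT, AMP + "lt;")
                .replace(GT, AMP + "gt;");
    }

    static String unescape(String text) {
        // The ampersand entity is decoded last; decoding it first would let
        // escaped entity text be decoded a second time.
        return text.replace(AMP + "lt;", LT)
                .replace(AMP + "gt;", GT)
                .replace(AMP + "amp;", AMP);
    }
}
```
&lt;/pre&gt;
&lt;p&gt;With this ordering, unescape(escape(s)) returns s for any string s, including version text that already contains something that looks like an entity.&lt;/p&gt;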

&lt;p&gt;A tinker-proof binary format is therefore preferred. Because it uses open-source software to encode its content, the binary format is itself archivable; if desired for archival purposes, an MVD can also be written out as a set of separate XML files. The structure of an MVD is shown below:&lt;/p&gt;

&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/SThKAlC9Y2I/AAAAAAAAAGg/ScqBj-WDdqo/s1600-h/mvd.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: undefinedpx; height: undefinedpx;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/SThKAlC9Y2I/AAAAAAAAAGg/ScqBj-WDdqo/s400/mvd.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5276048337269515106" /&gt;&lt;/a&gt;

&lt;p&gt;The outer wrapper is a Base64 encoding, expressing binary data as plain text. &lt;/p&gt;

&lt;p&gt;The inner wrapper is the ZIP encoding performed by the open-source Zlib library (Gailly and Adler, 1995). This serves the double purpose of scrambling the data to deter tinkering, and compressing it so that one MVD typically occupies little more space than a single original version. Even the alteration of a single byte of the outer Base64 wrapper will very likely break the inner ZIP encoding, and the document will then fail to load, as it should. Inside the ZIP container are the four parts that comprise the real content:&lt;/p&gt;
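&lt;p&gt;A minimal sketch of the two wrappers, assuming the inner layer is a standard zlib stream: the JDK's java.util.zip deflater classes are built on the zlib library, and java.util.Base64 provides the outer text encoding. The class and method names are hypothetical, and the bytes nmerge itself writes may differ in detail.&lt;/p&gt;
&lt;pre&gt;
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch of the two wrappers: zlib compression inside,
// Base64 text encoding outside.
class MvdWrapper {
    // Compresses the inner MVD bytes, then encodes them as plain Base64 text.
    static String wrap(byte[] inner) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream zlib = new DeflaterOutputStream(buffer)) {
            zlib.write(inner);
        } // close() finishes the zlib stream, including its Adler-32 checksum
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverses wrap(). Invalid Base64 raises IllegalArgumentException;
    // corrupt compressed data raises an IOException such as a ZipException.
    static byte[] unwrap(String text) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(text);
        try (InflaterInputStream zlib =
                new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return zlib.readAllBytes();
        }
    }
}
```
&lt;/pre&gt;
&lt;p&gt;unwrap(wrap(bytes)) returns the original bytes. Because a zlib stream ends with an Adler-32 checksum, a damaged wrapper usually makes unwrap throw an exception rather than return garbage, which matches the fail-to-load behaviour described above.&lt;/p&gt;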

&lt;table&gt;&lt;tr&gt;&lt;td valign="top"&gt;&lt;em&gt;Magic&lt;/em&gt;&lt;/td&gt;&lt;td&gt;the presence of this hexadecimal string guarantees that this is an MVD&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td valign="top"&gt;&lt;em&gt;Groups&lt;/em&gt;&lt;/td&gt;&lt;td&gt;these are labels for a hierarchy of arbitrary depth used to group versions or other groups&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td valign="top"&gt;&lt;em&gt;Versions&lt;/em&gt;&lt;/td&gt;&lt;td&gt;these provide a simple description sufficient to identify the ID, short name and long name of each version, whether or not it is a partial version, and its group&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td valign="top"&gt;&lt;em&gt;Pairs&lt;/em&gt;&lt;/td&gt;&lt;td&gt;the pairs list that defines the variant graph itself&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;

&lt;p&gt;No further detail is needed; indeed, more would damage the general applicability of the format. Groups can be used to express any desired classification system for versions. The short name of a version would typically be a siglum or other short name for convenient reference, and the long name would typically be the full version name. All other details of a version's text are the responsibility of the content format.&lt;/p&gt;
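&lt;p&gt;To make the four parts more concrete, here is a hypothetical in-memory model in Java (16 or later, for records). The names follow the table above, but the layout of a pair, a set of version numbers plus a fragment of text, is an assumption, and transpositions are ignored. Under that assumption a version is read by concatenating, in list order, the text of every pair whose version set contains it, and a group's place in the hierarchy is found by following parent links.&lt;/p&gt;
&lt;pre&gt;
```java
import java.util.BitSet;

// Hypothetical model of the four parts of an MVD (magic, groups, versions, pairs).
// The pair layout is an assumption; transpositions are not modelled.
record Mvd(String magic, Group[] groups, Version[] versions, Pair[] pairs) {

    // A group label; parent is the index of the enclosing group, or -1 at the top.
    record Group(String name, int parent) {}

    // ID, short name (siglum), long name, partial flag and owning group of a version.
    record Version(int id, String shortName, String longName, boolean partial, int group) {}

    // Assumed layout: the versions that share a fragment of text, plus the text itself.
    record Pair(BitSet versions, String data) {}

    // Convenience factory for a set of version IDs.
    static BitSet of(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    // Reads one version: concatenate, in list order, every pair that contains it.
    String read(int version) {
        StringBuilder text = new StringBuilder();
        for (Pair pair : pairs) {
            if (pair.versions().get(version)) {
                text.append(pair.data());
            }
        }
        return text.toString();
    }

    // Full path of a group in the hierarchy, built by following parent links.
    String groupPath(int group) {
        String path = "";
        for (int g = group; g != -1; g = groups[g].parent()) {
            path = "/" + groups[g].name() + path;
        }
        return path;
    }
}
```
&lt;/pre&gt;
&lt;p&gt;With pairs for 'Et ' (versions 1 and 2), 'sumpno ' (version 1 only), 'sompno ' (version 2 only) and 'suscepto' (both), read(1) gives "Et sumpno suscepto" and read(2) gives "Et sompno suscepto", while the shared fragments are stored only once.&lt;/p&gt;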

&lt;h4&gt;References&lt;/h4&gt;
&lt;p&gt;J.-L. Gailly and M. Adler (1995) &lt;a href="http://www.zlib.net/"&gt;Zlib&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;N. Ide and K. Suderman (2006) &lt;a href="http://www.cs.vassar.edu/~ide/papers/ANC-LREC06.pdf"&gt;Integrating Linguistic Resources: The American National Corpus Model.&lt;/a&gt;
In &lt;em&gt;Proceedings of the Fifth Language Resources and Evaluation Conference.&lt;/em&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6618258187052007115?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6618258187052007115/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6618258187052007115' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6618258187052007115'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6618258187052007115'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/12/mvd-file-format.html' title='The MVD File Format'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GGwOcLYrsVk/SThKAlC9Y2I/AAAAAAAAAGg/ScqBj-WDdqo/s72-c/mvd.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-8209189649917661893</id><published>2008-11-30T12:49:00.000-08:00</published><updated>2008-11-30T12:54:29.945-08:00</updated><title type='text'>From Toy Time to Big Time</title><content type='html'>&lt;p&gt;I don't know if this warrants another entry, but the test program is now robust enough to handle large real-world XML files. 
I tried it on three 16K texts, and it took 13.5 seconds overall to merge them, detecting hundreds of transpositions. That is probably too many, but the program does break up longer transpositions if it finds an alignment or an insertion/deletion in the middle. The next step is to incorporate the test program into the NMerge library and so allow the results to be displayed in the multi-version wiki.&lt;/p&gt;
&lt;p&gt;The transposition program works in the real world and it is fast.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-8209189649917661893?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/8209189649917661893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=8209189649917661893' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8209189649917661893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8209189649917661893'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/11/from-toy-time-to-big-time.html' title='From Toy Time to Big Time'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6915047723380378986</id><published>2008-11-16T14:23:00.000-08:00</published><updated>2008-11-18T17:45:30.955-08:00</updated><title type='text'>Transpositions Conquered</title><content type='html'>&lt;p&gt;Today the test program correctly merged three versions of a single sentence of the Sibylline Gospel, detecting four transpositions and encoding them correctly. The sentences were:&lt;/p&gt;
&lt;p&gt;A: Et sumpno suscepto tribus diebus morte morietur et deinde ab inferis regressus ad lucem veniet.&lt;/p&gt;
&lt;p&gt;B: Et mortem sortis finiet post tridui somnum et morte morietur tribus diebus somno suscepto et tunc ab inferis regressus ad lucem veniet.&lt;/p&gt;
&lt;p&gt;C: Et sortem mortis tribus diebus sompno suscepto et tunc ab inferis regressus ad lucem veniet.&lt;/p&gt;
&lt;p&gt;I must thank Nicoletta for supplying this splendid example, which contains so many transpositions in a small space. Here is the variant graph built &lt;em&gt;automatically&lt;/em&gt; from the three versions. By 'automatically' I mean that the program computed the graph and I drew it by hand from the program's textual output. The program was set to create no variants shorter than five characters, although it does split arcs down to a single character. There are two transpositions, each occurring twice. I have indicated these by drawing the transposed forms in grey; the parent arcs are in black, and each transposed form is connected to its parent by a dotted line. The triple repetition of 'Et' at the start of the graph could be removed by reducing the minimum variant size. For now I am happy to see such high-quality output without any fine-tuning.&lt;/p&gt;&lt;p&gt;The best thing about the program is the degree to which repetitions between versions have been systematically removed. This is the whole objective of the variant graph model.&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SSKTmIxCZ0I/AAAAAAAAAGY/e_7wiH6u_0c/s1600-h/transpose.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SSKTmIxCZ0I/AAAAAAAAAGY/e_7wiH6u_0c/s400/transpose.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5269936797374375746" /&gt;&lt;/a&gt;
&lt;p&gt;This is, of course, only a test program. The algorithm will eventually be added to NMerge and all this will happen behind the scenes in the multi-version wiki whenever you save.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6915047723380378986?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6915047723380378986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6915047723380378986' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6915047723380378986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6915047723380378986'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/11/mvd-overview.html' title='Transpositions Conquered'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_GGwOcLYrsVk/SSKTmIxCZ0I/AAAAAAAAAGY/e_7wiH6u_0c/s72-c/transpose.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5220808325213373473</id><published>2008-10-26T17:39:00.000-07:00</published><updated>2008-11-10T18:03:13.178-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Align'/><title type='text'>Transpositions</title><content type='html'>&lt;p&gt;Transpositions are undeniably part of real 
world texts, and must be included in any practical solution to the technical problems of how to represent overlapping structures in digital texts. The MVD or variant graph model described on this website includes transpositions, but until now there has been no way to calculate them automatically.&lt;/p&gt;
&lt;p&gt;Transpositions can't be supported in any markup scheme. In principle, any section of the document can be transposed, not only small bits of text. This means that a feature like the &amp;lsquo;copyof&amp;rsquo; attribute (Smith, 1999), if used to record transpositions, would have to allow &lt;em&gt;any&lt;/em&gt; element to be contained in any other element &amp;ndash; thus destroying the whole idea of a document schema or DTD. Also, the transposed sections might not even contain well-formed markup, i.e. they might contain unmatched start or end-tags. So this approach doesn&amp;rsquo;t work.&lt;/p&gt;
&lt;p&gt;The method I have been using until now is the one suggested by Lopresti and Tomkins (1997): do all the alignments, variants, deletions and insertions &lt;em&gt;first&lt;/em&gt;, and only &lt;em&gt;afterwards&lt;/em&gt; consider pairs of insertions and deletions as candidates for transposition. While this works well for two versions, with more versions it suffers from &amp;lsquo;contamination&amp;rsquo; between them: you end up with the same word being recorded as a variant of itself in another version. What is left over after alignment is mostly noise, which is not a suitable basis for calculating transpositions.&lt;/p&gt;

&lt;h4&gt;The French Connection&lt;/h4&gt;
&lt;p&gt;By contrast, the approach adopted by Bourdaillet (2007) in his thesis is the exact opposite: do the transpositions (and alignments) &lt;em&gt;first&lt;/em&gt;, then treat whatever is left over as candidate variants, insertions and deletions. His method is still limited to two versions, but some of its ideas can be applied to aligning N versions too.&lt;/p&gt;
&lt;p&gt;I suspect he avoided trying to merge N versions into one document partly because he had no data structure to record the result, and partly because, until now, the problem has been considered too hard. However, I believe that it is solvable by combining his general approach with the MVD document structure. I will try to describe how it will work by following a simple example step by step.&lt;/p&gt;
&lt;h4&gt;The Case for Automatic Detection of Transpositions&lt;/h4&gt;
&lt;p&gt;You might think that automatically detecting transpositions would be a bad idea. If the author transposed some text in a holograph, this should be clear from the manuscript and can simply be encoded as such by the editor. But what about the case where there are several versions of the same work &amp;ndash; perhaps redrafts of a single text or independent versions created by copying? Spotting such transpositions visually is very hard work. In these cases calculating transpositions is a good idea, and saves the editor a lot of trouble. Even in the holograph case, calculating transpositions can still be useful. So long as the computer gets it right we don't have to encode it manually (which saves work); only when it gets it wrong do we have to do anything.&lt;/p&gt;
&lt;h4&gt;Outline of the Proposed Method&lt;/h4&gt;
&lt;p&gt;Imagine you have already aligned N versions into an MVD, and now you want to add version N+1. (This is the standard inductive formulation: if adding one version to an MVD of any size works, then starting from a single version we can build an MVD of any number of versions.) The longest common substring, or LCS, is the longest run of consecutive characters that occurs both in the MVD and in the new version. (It should not be confused with the longest common &lt;em&gt;subsequence&lt;/em&gt;, whose characters need not be consecutive.) The basic algorithm merges the new version into the graph by repeatedly finding the LCS. In essence it is simply:
&lt;ol&gt;&lt;li&gt;Merge the variant graph and the new version where the LCS occurs.&lt;/li&gt;
&lt;li&gt;Call the algorithm recursively on the two unaligned sections before and after the LCS.&lt;/li&gt;&lt;/ol&gt;&lt;/p&gt;
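&lt;p&gt;The two steps above can be sketched in Python for the simplest possible case: the &amp;lsquo;graph&amp;rsquo; is a single string and transpositions are ignored, so this only illustrates the shape of the recursion, not NMerge's implementation. The function name &lt;code&gt;align&lt;/code&gt; is hypothetical, and the standard library's &lt;code&gt;difflib.SequenceMatcher.find_longest_match&lt;/code&gt; stands in for the LCS calculation.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations

from difflib import SequenceMatcher


def align(a: str, b: str) -> list[tuple[str, str, str]]:
    """Align two strings by merging at their longest common substring and
    recursing on the unaligned text before and after it.

    Returns (kind, a_part, b_part) triples: kind is "match" for shared text
    and "variant" for differing text (an empty side is an insertion or
    deletion). Simplified sketch: plain strings, no graph, no transpositions.
    """
    if not a and not b:
        return []
    if not a or not b:
        return [("variant", a, b)]
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b))
    if m.size == 0:
        # Nothing in common: both sides as a whole form one variant.
        return [("variant", a, b)]
    before = align(a[:m.a], b[:m.b])                      # left of the LCS
    shared = ("match", a[m.a:m.a + m.size], b[m.b:m.b + m.size])
    after = align(a[m.a + m.size:], b[m.b + m.size:])     # right of the LCS
    return before + [shared] + after


print(align("The quick brown fox", "The quick lazy fox"))
# [('match', 'The quick ', 'The quick '), ('variant', 'brown', 'lazy'), ('match', ' fox', ' fox')]
```
&lt;/pre&gt;
&lt;p&gt;Like quicksort, each call splits the problem around one chosen point (here the LCS) and recurses on the two sides. A real implementation would also need a minimum match length, since without one this recursion happily aligns single shared letters.&lt;/p&gt;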
&lt;h4&gt;The Challenge&lt;/h4&gt;
&lt;p&gt;As an example consider the three versions:&lt;/p&gt;
&lt;pre&gt;
1. The quick brown fox jumps over the lazy dog.
2. The quick white rabbit jumps over the lazy dog.
3. The quick brown ferret leaps over the lazy dog.
&lt;/pre&gt;
&lt;p&gt;Imagine we have already built an MVD out of these three versions, which the diagrams below call A, B and C. Now we want to merge it with a fourth version, D:&lt;/p&gt;
&lt;pre&gt;
4. The white quick rabbit jumps over the dog.
&lt;/pre&gt;
&lt;p&gt;There is a small transposition here: version 4 has &amp;lsquo;white quick&amp;rsquo; instead of &amp;lsquo;quick white&amp;rsquo; in version 2. Let&amp;rsquo;s see if the algorithm can detect it.&lt;/p&gt;
&lt;h4&gt;Following the Algorithm as it Works&lt;/h4&gt;
&lt;p&gt;We can add the fourth version to the graph very easily, by creating one big arc with the text of the version and attaching it to the start and end of the graph, like this:&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS2zAkkenI/AAAAAAAAAEo/Zm0rtXWt-CY/s1600-h/trans1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS2zAkkenI/AAAAAAAAAEo/Zm0rtXWt-CY/s400/trans1.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266034851745921650" /&gt;&lt;/a&gt;
&lt;p&gt;This is already a valid variant graph, but it is full of redundancy. Every word of the new version can also be found in the &amp;lsquo;old&amp;rsquo; graph. Apart from wasting storage, this redundancy fails to inform us of the relationship between the various parts of the new version and the rest of the document. The algorithm will correct this problem by gradually removing all of the copies.&lt;/p&gt;
&lt;p&gt;The LCS between the first three versions and the fourth is &amp;lsquo;rabbit jumps over the&amp;rsquo; from version 2, even though parts of that string are shared by other versions. We merge version 4 into the graph at the LCS, which leaves one unaligned fragment at each end: &amp;lsquo;The white quick&amp;rsquo; and &amp;lsquo;dog.&amp;rsquo;:&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS4EmdSB6I/AAAAAAAAAE4/I-7nZ7viegY/s1600-h/trans2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS4EmdSB6I/AAAAAAAAAE4/I-7nZ7viegY/s400/trans2.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266036253485303714" /&gt;&lt;/a&gt;
&lt;p&gt;Note that &amp;lsquo;rabbit jumps over the&amp;rsquo; has now acquired version D. We then call the same routine recursively on the two leftover fragments, aligning each of them with both the part of the MVD that precedes the LCS in version 2 &lt;em&gt;and&lt;/em&gt; the part that follows it, since either fragment may have been transposed.&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS9fqJGF_I/AAAAAAAAAFA/PQfNgHI1AJQ/s1600-h/trans3.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS9fqJGF_I/AAAAAAAAAFA/PQfNgHI1AJQ/s400/trans3.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266042215888984050" /&gt;&lt;/a&gt;
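&lt;p&gt;The LCS step can be checked with a na&amp;iuml;ve sketch. It assumes the existing MVD can be represented by its three version texts, which ignores the sharing of text in a real variant graph; the function names are hypothetical. The dynamic-programming table takes time proportional to the product of the two text lengths.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations


def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of a longest common substring.

    Classic O(len(a) * len(b)) dynamic programming: cur[j] is the length of
    the common run ending at a[i - 1] and b[j - 1]."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def lcs_against_versions(versions: dict[str, str], new_text: str) -> tuple[str, str]:
    """Longest substring new_text shares with any one existing version.
    Returns (version_id, shared_text), or ("", "") if nothing matches."""
    best_id, best = "", (0, 0, 0)
    for vid, text in versions.items():
        match = longest_common_substring(text, new_text)
        if match[2] > best[2]:
            best_id, best = vid, match
    if not best_id:
        return "", ""
    start, _, length = best
    return best_id, versions[best_id][start:start + length]


versions = {
    "1": "The quick brown fox jumps over the lazy dog.",
    "2": "The quick white rabbit jumps over the lazy dog.",
    "3": "The quick brown ferret leaps over the lazy dog.",
}
print(lcs_against_versions(versions, "The white quick rabbit jumps over the dog."))
# ('2', ' rabbit jumps over the ')
```
&lt;/pre&gt;
&lt;p&gt;It reports the shared text with its surrounding spaces, &amp;lsquo; rabbit jumps over the &amp;rsquo;, taken from version 2.&lt;/p&gt;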
&lt;p&gt;We now have two subgraphs containing arcs that definitely precede or follow the LCS. All the other arcs that are effectively parallel to the LCS are left in place but are not considered further. Now we have to try to align Arc 1 &amp;lsquo;The white quick&amp;rsquo; with Graph 1 &lt;em&gt;and&lt;/em&gt; Graph 2, and likewise align Arc 2 &amp;lsquo;dog.&amp;rsquo; with graphs 1 &amp;amp; 2. Because we now have more than one subgraph we will have to consider transpositions. However, when we compare the LCS between Arc 2 and Graph 1 (nothing) with that calculated between Arc 2 and Graph 2 (&amp;lsquo;dog.&amp;rsquo;) it is clear that no transpositions are possible. Instead, the best alignment is the direct one of &amp;lsquo;dog.&amp;rsquo; between versions ABC and D:&lt;/p&gt;
&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SRTB2K5tHsI/AAAAAAAAAFQ/8tSPQDwG0VU/s1600-h/trans4.jpg"&gt;&lt;img width="150" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SRTB2K5tHsI/AAAAAAAAAFQ/8tSPQDwG0VU/s400/trans4.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266047000686436034" /&gt;&lt;/a&gt;
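&lt;p&gt;The decision just made can be written as a simple rule, sketched here in Python under my own assumptions: each subgraph is represented by the list of texts along its paths, and a transposition is chosen only when the leftover fragment matches strictly more text on the far side of the LCS than on its own side. The exact criterion is not specified above, and the function names are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations

from difflib import SequenceMatcher


def best_match(fragment: str, paths: list[str]) -> str:
    """Longest substring of fragment that also occurs in any of the paths."""
    best = ""
    for path in paths:
        m = SequenceMatcher(None, fragment, path, autojunk=False).find_longest_match(
            0, len(fragment), 0, len(path))
        if m.size > len(best):
            best = fragment[m.a:m.a + m.size]
    return best


def classify_leftover(fragment: str, own_side: list[str],
                      other_side: list[str]) -> tuple[str, str]:
    """Decide how a fragment left over beside the LCS should be aligned.

    own_side: path texts of the subgraph on the same side of the LCS.
    other_side: path texts of the subgraph on the opposite side.
    Assumption: ties go to direct alignment."""
    direct = best_match(fragment, own_side)
    moved = best_match(fragment, other_side)
    if len(moved) > len(direct):
        return "transposition", moved
    return "direct", direct


# Arc 2 ('dog.') against Graph 2 (same side) and Graph 1 (opposite side).
print(classify_leftover("dog.", [" lazy dog."], ["The quick brown ", "The quick white "]))
# ('direct', 'dog.')
```
&lt;/pre&gt;
&lt;p&gt;For the leftover &amp;lsquo;The white &amp;rsquo;, with &amp;lsquo;The &amp;rsquo; on its own side and &amp;lsquo; white &amp;rsquo; among the paths on the far side, the same rule prefers the transposition, which is what happens to &amp;lsquo;white&amp;rsquo; in the following steps.&lt;/p&gt;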
&lt;p&gt;We have no more D-text to align here so the process stops. The empty D-arc becomes a deletion in version D. On the other hand, there is still work to do on the left, in Graph 1. Here comparison between Graph 1 and Arc 1 suggests either &amp;lsquo;white&amp;rsquo; or &amp;lsquo;quick&amp;rsquo; as the LCS. We will choose &amp;lsquo;quick&amp;rsquo; because it is more central:&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRTJRQA_ScI/AAAAAAAAAFY/4eB-vEqBIrM/s1600-h/trans5.jpg"&gt;&lt;img width="250" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SRTJRQA_ScI/AAAAAAAAAFY/4eB-vEqBIrM/s400/trans5.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266055162497026498" /&gt;&lt;/a&gt;
&lt;p&gt;This is still not quite right. The instance of &amp;lsquo;white&amp;rsquo; on the left is clearly a transposition of the &amp;lsquo;white&amp;rsquo; on the right. Again the merging of the LCS leads to two arcs and two graphs:&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/SRTMYVtOXQI/AAAAAAAAAFo/oLnH0Z9Y1eI/s1600-h/trans6.jpg"&gt;&lt;img width="200" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/SRTMYVtOXQI/AAAAAAAAAFo/oLnH0Z9Y1eI/s400/trans6.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266058582818708738" /&gt;&lt;/a&gt;
&lt;p&gt;Calculation of the LCS between Arc 1 and Graph 2 yields &amp;lsquo;white&amp;rsquo;, shown here in bold. So we merge the LCS into the graph, except that this is a transposition, and so we must leave the text where it is and point to the target of the transposition. This leaves only one copy of &amp;lsquo;white&amp;rsquo; in the graph, and another copy, shown in grey, that points to it.&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/SRTOzZLatsI/AAAAAAAAAFw/bDs7h0ArtBE/s1600-h/trans7.jpg"&gt;&lt;img width="250" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/SRTOzZLatsI/AAAAAAAAAFw/bDs7h0ArtBE/s400/trans7.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266061246630377154" /&gt;&lt;/a&gt;
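&lt;p&gt;One way to model the grey copy that points to its parent is sketched below. The class &lt;code&gt;Arc&lt;/code&gt;, its fields and &lt;code&gt;make_transposition_of&lt;/code&gt; are hypothetical illustrations, not NMerge's data structures; the only idea taken from the description above is that the transposed arc keeps its own versions but reads its text from the parent, so the text is stored once.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: arcs point at each other
class Arc:
    """Hypothetical variant-graph arc (illustration only)."""
    versions: set[str]
    own_text: str = ""
    # Set only on a transposed copy; repr=False avoids printing the cycle.
    parent: Arc | None = field(default=None, repr=False)
    children: list[Arc] = field(default_factory=list, repr=False)

    @property
    def text(self) -> str:
        # A transposed arc stores no text of its own: it reads its parent's.
        return self.parent.text if self.parent is not None else self.own_text

    def make_transposition_of(self, parent: Arc) -> None:
        """Turn this arc into a pointer to parent, dropping the duplicate text."""
        if parent.text != self.text:
            raise ValueError("a transposed arc must have the same text as its parent")
        self.own_text = ""
        self.parent = parent
        parent.children.append(self)


white_b = Arc({"B"}, " white ")   # 'white' after 'quick', in version B
white_d = Arc({"D"}, " white ")   # the duplicate 'white' before 'quick', in D
white_d.make_transposition_of(white_b)
print(repr(white_d.text), white_d.own_text == "", white_b.children[0] is white_d)
# ' white ' True True
```
&lt;/pre&gt;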
&lt;p&gt;We are still not finished. &amp;lsquo;The&amp;rsquo; appears twice. So we need one further LCS calculation between the &amp;lsquo;The&amp;rsquo; D-arc and the &amp;lsquo;The&amp;rsquo; ABC-arc:&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/SRUBSR1lqoI/AAAAAAAAAF4/67k5q-aMD4g/s1600-h/trans8.jpg"&gt;&lt;img width="150" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/SRUBSR1lqoI/AAAAAAAAAF4/67k5q-aMD4g/s400/trans8.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266116752817105538" /&gt;&lt;/a&gt;
&lt;p&gt;Now that the two &amp;lsquo;The&amp;rsquo;s are merged, all that remains is to introduce an empty ABC-arc to indicate that &amp;lsquo;white&amp;rsquo; appears in that position only in version D.&lt;/p&gt;
&lt;h4&gt;Taking Stock&lt;/h4&gt;
&lt;p&gt;We have been recursing into smaller and smaller portions of the graph. That does not mean that these portions or subgraphs are in any way detached from the rest of the graph. The other parts were simply omitted for clarity. Overall the graph now looks like this:&lt;/p&gt;
&lt;a href="http://1.bp.blogspot.com/_GGwOcLYrsVk/SRUKXlnfU2I/AAAAAAAAAGI/mnBcPknuHgo/s1600-h/trans9.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/SRUKXlnfU2I/AAAAAAAAAGI/mnBcPknuHgo/s400/trans9.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5266126739630674786" /&gt;&lt;/a&gt;
&lt;p&gt;The text of version D has been fully assimilated. It has been aligned with ALL the versions of the graph, not just with the one most similar to it. (Aligning only with the most similar sequence is essentially what biologists do in their &amp;lsquo;progressive&amp;rsquo; alignment technique, which does not attempt transpositions at all.) The result is a much better alignment with far less redundancy. To add further versions to the MVD we simply repeat the above steps, with the new variant graph as our starting point.&lt;/p&gt;
&lt;h4&gt;Time Complexity&lt;/h4&gt;
&lt;p&gt;I believe this routine may eventually run in O(N log N) time, as fast as the average case of Hoare&amp;rsquo;s famous &amp;lsquo;quicksort&amp;rsquo; routine (1961), which it resembles: both split the problem around a chosen point and recurse on the two sides. At the moment it is somewhat slower, because my current calculation of the LCS takes O(N&lt;sup&gt;2&lt;/sup&gt;) time in the worst case. According to Gusfield, the LCS of two &lt;em&gt;strings&lt;/em&gt; can be calculated in O(N) time, but that requires a suffix tree built over both strings, for example with Ukkonen&amp;rsquo;s 1995 algorithm. I have built a suffix tree for the text of the new version, but I can&amp;rsquo;t build one for the variant graph because it is too difficult, and may not be possible in O(N) time. Instead, to calculate the LCS I traverse the graph, looking for runs of characters that match the new version&amp;rsquo;s suffix tree. Overall I think this is O(N&lt;sup&gt;2&lt;/sup&gt;). Even so, the algorithm is very fast in practice, because its typical running time is much better than the worst case.&lt;/p&gt;
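&lt;p&gt;The idea of streaming one text through an index of the new version, tracking the current run of matching characters, can be illustrated on plain strings. The Python sketch below substitutes a suffix automaton for Ukkonen&amp;rsquo;s suffix tree simply because it is much shorter to write: it is a different index structure from the one described above, and all names are hypothetical. Building the automaton for the new version takes linear time, and walking another string through it finds their LCS in time linear in that string (for a fixed alphabet). In an MVD the walk would presumably have to be repeated along many paths of the graph, which is where the extra cost would come from.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations


def build_suffix_automaton(s: str):
    """Suffix automaton of s: per state, its transitions, its suffix link and
    the length of the longest substring it represents."""
    trans: list[dict[str, int]] = [{}]
    link = [-1]
    length = [0]
    last = 0
    for ch in s:
        cur = len(trans)
        trans.append({})
        link.append(0)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p != -1:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone represents its shorter substrings.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return trans, link, length


def lcs_via_automaton(text: str, automaton) -> str:
    """Walk text through the automaton, tracking the current run of matching
    characters; the longest run seen is the longest common substring."""
    trans, link, length = automaton
    state, run = 0, 0
    best_len, best_end = 0, 0
    for i, ch in enumerate(text):
        # Shorten the current match via suffix links until ch can extend it.
        while state != 0 and ch not in trans[state]:
            state = link[state]
            run = length[state]
        if ch in trans[state]:
            state = trans[state][ch]
            run += 1
        else:
            state, run = 0, 0
        if run > best_len:
            best_len, best_end = run, i + 1
    return text[best_end - best_len:best_end]


index = build_suffix_automaton("The white quick rabbit jumps over the dog.")
print(repr(lcs_via_automaton("The quick white rabbit jumps over the lazy dog.", index)))
# ' rabbit jumps over the '
```
&lt;/pre&gt;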
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;D. Lopresti and A. Tomkins, &amp;lsquo;Block edit models for approximate string matching&amp;rsquo;, &lt;i&gt;Theoretical Computer Science&lt;/i&gt; 181 (1997), 159&amp;ndash;179.&lt;br/&gt;
J. Bourdaillet, &lt;i&gt;Alignement textuel monolingue avec recherche de d&amp;eacute;placements: algorithmique pour la critique g&amp;eacute;n&amp;eacute;tique&lt;/i&gt;, PhD thesis, Universit&amp;eacute; Paris 6 Pierre et Marie Curie, 2007.&lt;br/&gt;
C. A. R. Hoare, &amp;lsquo;Algorithm 63: Partition&amp;rsquo; and &amp;lsquo;Algorithm 64: Quicksort&amp;rsquo;, &lt;i&gt;Communications of the ACM&lt;/i&gt; 4(7) (1961), 321&amp;ndash;322.&lt;br/&gt;
D. Smith, &amp;lsquo;Textual Variation and Version Control in the TEI&amp;rsquo;, &lt;i&gt;Computers and the Humanities&lt;/i&gt; 33(1) (1999), 103&amp;ndash;112.&lt;br/&gt;
E. Ukkonen, &amp;lsquo;On-line Construction of Suffix Trees&amp;rsquo;, &lt;i&gt;Algorithmica&lt;/i&gt; 14 (1995), 249&amp;ndash;260.&lt;br/&gt;
D. Gusfield &lt;i&gt;Algorithms on Strings, Trees and Sequences&lt;/i&gt;, Cambridge: Cambridge University Press, 1997.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5220808325213373473?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5220808325213373473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5220808325213373473' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5220808325213373473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5220808325213373473'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/10/transpositions.html' title='Transpositions'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_GGwOcLYrsVk/SRS2zAkkenI/AAAAAAAAAEo/Zm0rtXWt-CY/s72-c/trans1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6216410517893927606</id><published>2008-09-11T00:23:00.001-07:00</published><updated>2008-09-11T17:17:10.960-07:00</updated><title type='text'>MVD paper accepted</title><content type='html'>In case anyone was wondering if my theories have been independently verified, the International Journal of Human-Computer Studies has just accepted my paper that explains the core idea behind 
the MVD technology. This is a respected technical journal and I spared no detail in the paper, which is 16 pages long. There is a link at the bottom of the &lt;a href="http://multiversiondocs.blogspot.com/2008/03/whats-multi-version-document.html"&gt;"What is an MVD?"&lt;/a&gt; page to the PDF I submitted.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6216410517893927606?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6216410517893927606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6216410517893927606' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6216410517893927606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6216410517893927606'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/09/mvd-paper-accepted.html' title='MVD paper accepted'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-7250671412279799697</id><published>2008-07-29T14:27:00.000-07:00</published><updated>2008-08-06T04:50:08.792-07:00</updated><title type='text'>Improvements to Alignment</title><content type='html'>&lt;p&gt;In preparation for publicly releasing the NMerge code I have been revising the alignment algorithm, as there were significant weaknesses in the naïve approach I had 
previously adopted. In cases like the Sibylline Gospel, there is a great deal of variation in a small space, and simply requiring the user to choose a base text for alignment doesn&amp;rsquo;t work very well. There seem to be a few reasons for this:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;The user doesn't actually know to which existing version a new version should be aligned. It should be the task of the software to find this out.&lt;/li&gt;
&lt;li&gt;There can't be a distinction, as I originally supposed, between updating an existing version and adding a new one. The newly revised version might resemble another version more than the one it came from. For example, we might change &amp;lsquo;the quick brown dog&amp;rsquo; &lt;em&gt;back to&lt;/em&gt; &amp;lsquo;the quick brown fox&amp;rsquo;. In the case of manuscript traditions like the Sibylline Gospel this happens all the time, because the versions aren&amp;rsquo;t a succession of edits, but a set of parallel alternatives. To avoid this problem I now automatically calculate the most similar version already in the MVD and then align with that.&lt;/li&gt;
&lt;li&gt;When you add a new arc to the graph you need to look for identical &lt;em&gt;paths&lt;/em&gt; not merely identical &lt;em&gt;arcs&lt;/em&gt; that are already in that position. Now when the program finds an existing path with the same text, spanning the same two end-points, the new arc can be discarded and its version simply added to the path instead:
&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SJmP3KT853I/AAAAAAAAADc/-6XU_S1Atn8/s1600-h/optimise.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SJmP3KT853I/AAAAAAAAADc/-6XU_S1Atn8/s400/optimise.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5231370619991156594" /&gt;&lt;/a&gt;
In this simplified real-world example, the new D-version was aligned with version C, overall its most similar version. However, in this location the D-variant &amp;lsquo;milia hominum&amp;rsquo; already exists in version A. When the program tried to add the D-arc it realised that there was already an A-path with the same text, and instead merely added the D-version to that path.&lt;/li&gt;&lt;/ol&gt;
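&lt;p&gt;Points 2 and 3 above can be sketched together in Python. Both the similarity measure and the data structures are assumptions for illustration: the standard library&amp;rsquo;s &lt;code&gt;difflib.SequenceMatcher.ratio()&lt;/code&gt; stands in for whatever measure NMerge uses, and &lt;code&gt;Arc&lt;/code&gt;, &lt;code&gt;find_path&lt;/code&gt; and &lt;code&gt;add_text&lt;/code&gt; are hypothetical names. The C reading in the toy graph is a placeholder, not the real variant.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


def most_similar_version(versions: dict[str, str], new_text: str) -> str:
    """Id of the existing version whose text is most similar to new_text."""
    return max(versions, key=lambda vid: SequenceMatcher(
        None, versions[vid], new_text, autojunk=False).ratio())


@dataclass
class Arc:
    start: int                   # node ids; the graph is assumed acyclic
    end: int
    text: str
    versions: set[str] = field(default_factory=set)


def find_path(arcs: list[Arc], node: int, end: int,
              remaining: str) -> list[Arc] | None:
    """Depth-first search for a chain of arcs from node to end whose texts
    spell exactly remaining; returns the arcs, or None if there is none."""
    if node == end:
        return [] if remaining == "" else None
    for arc in arcs:
        if arc.start == node and remaining.startswith(arc.text):
            rest = find_path(arcs, arc.end, end, remaining[len(arc.text):])
            if rest is not None:
                return [arc] + rest
    return None


def add_text(arcs: list[Arc], start: int, end: int, text: str, version: str) -> None:
    """Add a version's text between two nodes, reusing an identical path if any."""
    path = find_path(arcs, start, end, text)
    if path is None:
        arcs.append(Arc(start, end, text, {version}))
    else:
        for arc in path:
            arc.versions.add(version)


fox = {
    "1": "The quick brown fox jumps over the lazy dog.",
    "2": "The quick white rabbit jumps over the lazy dog.",
    "3": "The quick brown ferret leaps over the lazy dog.",
}
print(most_similar_version(fox, "The white quick rabbit jumps over the dog."))  # 2

arcs = [
    Arc(0, 1, "milia ", {"A"}),
    Arc(1, 2, "hominum", {"A"}),
    Arc(0, 2, "[C reading]", {"C"}),   # placeholder text
]
add_text(arcs, 0, 2, "milia hominum", "D")
print(len(arcs), sorted(arcs[0].versions), sorted(arcs[1].versions))
# 3 ['A', 'D'] ['A', 'D']
```
&lt;/pre&gt;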
&lt;p&gt;This is only the first stage of a series of improvements. Still to come: &lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Calling the alignment algorithm recursively to handle contamination, i.e. aligning to more than one base text. After aligning to the most similar version, any significant portions of the new version that didn't align can be realigned to the most similar version &lt;em&gt;between the two endpoints&lt;/em&gt; of the unaligned section.&lt;/li&gt;
&lt;li&gt;Calculating what is a variant of what, which could be used, for example, to generate a kind of &lt;em&gt;apparatus criticus&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Calculating transpositions. These can be done after the other alignments are complete: any leftover insertion/deletion pairs meeting certain criteria can be tested for equality, and the transposition carried out.&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;
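&lt;p&gt;The last item, pairing up leftover insertions and deletions, might look something like the following sketch. The minimum length of five characters and the first-come pairing order are my own assumptions standing in for the unspecified criteria, and &lt;code&gt;pair_transpositions&lt;/code&gt; is a hypothetical name.&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations


def pair_transpositions(deletions: list[tuple[int, str]],
                        insertions: list[tuple[int, str]],
                        min_len: int = 5) -> list[tuple[int, int, str]]:
    """Pair each deleted text with an equal inserted text elsewhere.

    deletions, insertions: (offset, text) pairs left over after alignment.
    Returns (deletion_offset, insertion_offset, text) for each candidate
    transposition; each insertion is used at most once."""
    unused = list(insertions)
    pairs = []
    for d_pos, d_text in deletions:
        if min_len > len(d_text):
            continue                  # too short to be a credible transposition
        for k, (i_pos, i_text) in enumerate(unused):
            if i_text == d_text:
                pairs.append((d_pos, i_pos, d_text))
                del unused[k]         # safe: we stop iterating straight away
                break
    return pairs


print(pair_transpositions([(4, "white "), (30, "lazy ")], [(10, "white "), (0, "x")]))
# [(4, 10, 'white ')]
```
&lt;/pre&gt;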
&lt;h3&gt;XML Awareness&lt;/h3&gt;
&lt;p&gt;While NMerge remains ignorant of XML, as it should, the public interface class now breaks up &amp;lsquo;words&amp;rsquo; based on angle-brackets as well as white space. In addition I am contemplating moving this code and the class that XML-izes the differences between versions into a separate package. The latter functionality is needed by the wiki since a difference might occur in the middle of a tag, and the marker for this needs to be moved to the start of the next piece of real text, so it can be displayed.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-7250671412279799697?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/7250671412279799697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=7250671412279799697' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7250671412279799697'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/7250671412279799697'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/07/improvements-to-alignment.html' title='Improvements to Alignment'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/SJmP3KT853I/AAAAAAAAADc/-6XU_S1Atn8/s72-c/optimise.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-704379530132135021</id><published>2008-06-25T13:50:00.000-07:00</published><updated>2008-06-25T14:35:56.020-07:00</updated><title type='text'>Extreme Makeover</title><content type='html'>&lt;p&gt;The look and feel of the entire application can be changed simply by altering the CSS and XSL stylesheets for each page. I have therefore given it an extreme makeover for the demonstration in Oulu. It now conforms to what users expect of a simple, clear interface, even though the functionality is exactly the same as before. This is more than just window-dressing, however. It is important, for example, that the search box positions itself automatically at the bottom of the window however large the window is, and that the width of Twin View does not exceed that of a normal laptop screen (1024x768 pixels). Fixed column widths are also now the norm in a world where wide screens are commonplace. This was never an issue when people had screens of 800x600 pixels. Then you wanted your web page to stretch the full width, so that resizing the window caused the text to reflow. As typographers will tell you, however, text is most easily read in narrow columns. I'm unsure what width to use, but I chose sizes that seem to be widely used: 660 pixels for Single View and 480 pixels for each of the columns in Twin View. It looks particularly good on OS X, nearly as good in Firefox on the PC, and not very good at all in Internet Explorer. In fact, getting it to look nearly the same in IE and Firefox is quite hard.&lt;/p&gt;
&lt;p&gt;I didn't get as much finished for the conference as I had hoped, but it is still quite a lot. I would estimate that the application is about 70&amp;ndash;80% finished, and that it could certainly be completed by the end of the year.&lt;/p&gt;
&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SGK6Lm8dD6I/AAAAAAAAADA/Bl8BhkhEnJQ/s1600-h/makeover.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SGK6Lm8dD6I/AAAAAAAAADA/Bl8BhkhEnJQ/s400/makeover.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5215936027043893154" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-704379530132135021?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/704379530132135021/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=704379530132135021' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/704379530132135021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/704379530132135021'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/06/extreme-makeover.html' title='Extreme Makeover'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/SGK6Lm8dD6I/AAAAAAAAADA/Bl8BhkhEnJQ/s72-c/makeover.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-1890465180712600064</id><published>2008-06-16T03:18:00.000-07:00</published><updated>2008-06-16T04:35:34.060-07:00</updated><title type='text'>Lost in Hyperspace</title><content type='html'>&lt;p&gt;The drawback with TwinView is that, in spite of its power to display the &lt;em&gt;differences&lt;/em&gt; between two versions, it makes it hard to see what is the &lt;em&gt;same&lt;/em&gt;. After a few insertions and deletions on each side the two columns quickly get out of alignment. When the user scrolls down to study a bit of text, how does he or she find the corresponding text in the other version? Without synchronisation you are forced to scroll up and down manually to match up the texts, when the computer should really be doing all that for you in the blink of an eye.&lt;/p&gt;

&lt;p&gt;The possibilities of providing visual cues in a web browser to indicate alignment are quite limited. There is no way, for example, to connect text blocks using lines that cross from one window to another, as in &lt;a href="http://www.xanadu.com.au/ted/TN/PARALUNE/paraviz.html"&gt;Nelson's Parallel Textface&lt;/a&gt;. And it is very difficult or impossible (at least &lt;em&gt;I&lt;/em&gt; can't see how) to synchronise the scrolling of two &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;s in parallel. On the other hand, something simpler might actually be more effective. The technique I eventually settled on uses &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; and the &lt;code&gt;onclick&lt;/code&gt; attribute. When the user clicks on a piece of black text common to both sides, the corresponding text in the other frame scrolls to exactly the same position as the clicked-on text and is then highlighted. Using &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; instead of hyperlinks has the advantage that the text is still selectable &amp;ndash; very handy for choosing and copying text to search for in other versions. It is also intuitive: a na&amp;iuml;ve user will eventually click on the text and discover the alignment facility. (Text copyright Vincenzo Cerami)&lt;/p&gt;
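&lt;p&gt;The following is a minimal Python sketch of the lookup behind such a click. It assumes both versions are available as lists of words. The standard &lt;code&gt;difflib&lt;/code&gt; module stands in for the alignment already stored in the MVD, which would not need to re-diff at click time. The function name &lt;code&gt;corresponding_word&lt;/code&gt; is hypothetical. Given the index of the clicked word on the left, it returns the index of the aligned word on the right. It returns &lt;code&gt;None&lt;/code&gt; if the word is highlighted as a difference and so has no counterpart. The browser side would then scroll the other frame to the &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; holding that word.&lt;/p&gt;

```python
import difflib
from typing import Optional


def corresponding_word(left_words: list, right_words: list, index: int) -> Optional[int]:
    """Map a word index in the left version to the aligned word index on the right.

    Hypothetical helper: difflib stands in for the alignment an MVD already holds.
    Returns None when the clicked word lies in a difference (no counterpart).
    """
    matcher = difflib.SequenceMatcher(None, left_words, right_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each matching block is a run of shared ("black") words starting at
        # block.a on the left and block.b on the right.
        if index in range(block.a, block.a + block.size):
            return block.b + (index - block.a)
    return None


left = "the end of the world".split()
right = "the end of all the world".split()
print(corresponding_word(left, right, 3))  # prints 4: the second "the" on the right
```

&lt;p&gt;Only shared text has a counterpart. That is why, in this design, only the black (unhighlighted) text needs to respond to a click.&lt;/p&gt;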
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SFZBgWFydVI/AAAAAAAAACo/cqyr4Vwbh9s/s1600-h/synchronise1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SFZBgWFydVI/AAAAAAAAACo/cqyr4Vwbh9s/s400/synchronise1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5212425642669405522" /&gt;&lt;/a&gt;
&lt;p&gt;&lt;i&gt;Before the click on the left &amp;ndash; hopelessly out of alignment&lt;/i&gt;&lt;/p&gt;
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_GGwOcLYrsVk/SFZBg-Q5JdI/AAAAAAAAACw/NbeteI3j3I4/s1600-h/synchronise2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/SFZBg-Q5JdI/AAAAAAAAACw/NbeteI3j3I4/s400/synchronise2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5212425653453399506" /&gt;&lt;/a&gt;
&lt;p&gt;&lt;i&gt;... and after &amp;ndash; beautifully aligned and everything clear.&lt;/i&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-1890465180712600064?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/1890465180712600064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=1890465180712600064' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1890465180712600064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/1890465180712600064'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/06/lost-in-hyperspace.html' title='Lost in Hyperspace'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/SFZBgWFydVI/AAAAAAAAACo/cqyr4Vwbh9s/s72-c/synchronise1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-2662656109692465933</id><published>2008-06-08T16:19:00.000-07:00</published><updated>2008-06-16T04:27:45.461-07:00</updated><title type='text'>Search</title><content type='html'>&lt;p&gt;One of the major advantages of the MVD format is the ability to search all versions simultaneously. 
The search implementation in the underlying platform (as in MVDDeskViewer) has been around for some time, but making it available in the wiki, where the search results have to be sent to the browser as HTML, and the text highlighted and merged with other highlighted text, was not easy. However, it all seems to work fine now.&lt;/p&gt;
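&lt;p&gt;HTML elements cannot overlap. A search hit that partly overlaps a highlighted difference therefore has to be split, so that each piece of text sits in exactly one &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt;. The Python sketch below shows one way to do this; it is not the wiki's actual code. It assumes highlights are given as hypothetical &lt;code&gt;(start, end, css_class)&lt;/code&gt; triples with half-open character offsets. It returns non-overlapping segments, each listing every class that covers it.&lt;/p&gt;

```python
def split_highlights(length, highlights):
    """Split [0, length) into non-overlapping segments.

    highlights: list of (start, end, css_class) with half-open offsets
    inside the text. Each returned (start, end, classes) segment can be
    emitted as a single span, so overlapping highlights never nest badly.
    """
    # Every highlight boundary is a potential segment boundary.
    cuts = {0, length}
    for start, end, _ in highlights:
        cuts.update((start, end))
    points = sorted(cuts)

    segments = []
    for seg_start, seg_end in zip(points, points[1:]):
        # A highlight covers the segment if it starts at or before it
        # and ends at or after it.
        classes = tuple(sorted({cls for start, end, cls in highlights
                                if seg_start >= start and end >= seg_end}))
        segments.append((seg_start, seg_end, classes))
    return segments


# A search hit (2-6) overlapping a deleted passage (4-8) in a 10-character text.
for segment in split_highlights(10, [(2, 6, "hit"), (4, 8, "deleted")]):
    print(segment)
```

&lt;p&gt;Each segment can then be written out as one &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; whose class attribute joins its classes. Unhighlighted segments are emitted as plain text.&lt;/p&gt;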
&lt;p&gt;The user can search only the currently selected version (that's the "&amp;gt;" button in the image below) or all versions (the "&amp;gt;&amp;gt;" button). When searching all versions the display loads new versions as needed; it also updates the other window to reflect the insertions and deletions relative to the new version. You may search in either the left-hand or right-hand window in TwinView. Repeated clicks on a search button find successive matches starting from the current version; the search also wraps around at the end, eventually taking you back to your original version.&lt;/p&gt;
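&lt;p&gt;The "find next" behaviour, with wrap-around, can be sketched in Python as below. This is a simplification. It searches each version's plain text in turn with &lt;code&gt;str.find&lt;/code&gt;, whereas the MVD search scans all versions at once. The function &lt;code&gt;find_next&lt;/code&gt; and its parameters are hypothetical. It continues from the last hit in the current version, then tries each following version in order. Finally it wraps back to the start of the original version.&lt;/p&gt;

```python
from typing import List, Optional, Tuple


def find_next(versions: List[str], current: int, position: int,
              pattern: str) -> Optional[Tuple[int, int]]:
    """Return (version_index, offset) of the next match, or None if none exists.

    versions: plain text of each version, in display order.
    current:  index of the version now on screen.
    position: offset just after the previous hit in that version.
    """
    # First look further on in the version the user is reading.
    hit = versions[current].find(pattern, position)
    if hit != -1:
        return current, hit
    # Then move through the other versions, wrapping around; the last step
    # (step == len(versions)) rescans the original version from the top.
    for step in range(1, len(versions) + 1):
        index = (current + step) % len(versions)
        hit = versions[index].find(pattern)
        if hit != -1:
            return index, hit
    return None


versions = ["abc", "xyz", "abcabc"]
print(find_next(versions, 2, 4, "abc"))  # prints (0, 0): wrapped around to version 0
```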
&lt;h3&gt;Bug Fixes&lt;/h3&gt;
&lt;p&gt;This iteration also fixes some outstanding bugs that appeared only on Windows, particularly in Internet Explorer. Previously, if you chose to edit, add or delete a version or group in EditVersions in a Windows browser, it usually failed. Thankfully, it doesn't any more. I have tested it on Firefox and IE6 on Windows XP, and also on Safari and Firefox on Mac OS X.&lt;/p&gt;
&lt;h3&gt;Wish-list&lt;/h3&gt;
&lt;p&gt;There remain, however, some deficiencies:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Because each set of search results is sent to the browser, the &lt;em&gt;entire&lt;/em&gt; text of each version, or pair of versions, has to be sent each time. This makes the response seem slow, when in fact the search itself is very fast. I need to use AJAX here to send only the differences and update the HTML. However, this is not a real problem at the moment because the client and server are on the same machine.&lt;/li&gt;
&lt;li&gt;Because I am developing on a very wide screen I tend to forget how little screen space there is on the average laptop. The buttons on the left of the display are a serious problem in TwinView, and will have to be moved above the text columns as a matter of some urgency.&lt;/li&gt;
&lt;li&gt;Also urgent is some means of synchronising the two columns in TwinView. In this iteration I tried to make it align the text on one side whenever you click on the text on the other, but getting this to work cross-browser in JavaScript has so far eluded me.&lt;/li&gt;&lt;/ul&gt;
&lt;p&gt;Here is another screen dump showing how search hits are displayed using a background colour and &amp;lt;span&amp;gt; elements in HTML. This is a bit of a kludge, but using real selections, as you do with a mouse, doesn't work on all browsers. (Text &amp;copy; N. Brocca 2008)
&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SExuneloGDI/AAAAAAAAACg/HoX-vllcqbw/s1600-h/searchall.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SExuneloGDI/AAAAAAAAACg/HoX-vllcqbw/s400/searchall.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5209660493465655346" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-2662656109692465933?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/2662656109692465933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=2662656109692465933' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2662656109692465933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/2662656109692465933'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/06/search.html' title='Search'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/SExuneloGDI/AAAAAAAAACg/HoX-vllcqbw/s72-c/searchall.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9218502948173872847</id><published>2008-05-19T04:44:00.000-07:00</published><updated>2008-05-19T14:00:54.002-07:00</updated><title type='text'>Editable Versions and Groups</title><content type='html'>&lt;p&gt;At last I can edit groups and versions, delete them, move them between groups or rename them. The Version Edit screen looks a bit crowded (see below), but I reduced the clutter somewhat by using links instead of buttons. Some people think this is poor style, but it would be far worse to disguise buttons as links to achieve a similar effect. It is true that the wiki won't function without JavaScript, but it is simply impossible to build this level of functionality without it.&lt;/p&gt;
&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/SDFqzWy9_2I/AAAAAAAAACQ/RcwVPeQoIW8/s1600-h/editversions.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/SDFqzWy9_2I/AAAAAAAAACQ/RcwVPeQoIW8/s400/editversions.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5202056475114864482" /&gt;&lt;/a&gt;
&lt;p&gt;I have also added a Revert button, which restores the text or the versions to the state they were in after the last click on the Save button. The "Edit..." links take you to separate pages which allow you to edit the characteristics of the group or version:&lt;/p&gt;
&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SDFwP2y9_3I/AAAAAAAAACY/sFDxWIVXKZ4/s1600-h/editversion.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SDFwP2y9_3I/AAAAAAAAACY/sFDxWIVXKZ4/s400/editversion.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5202062462299275122" /&gt;&lt;/a&gt;
&lt;p&gt;What remains now are a few important features that I hope to set right before we go to Oulu:
&lt;ol&gt;&lt;li&gt;Find &amp;ndash; I need to reimplement the same search facility we had in the desktop version of the tool. It should be easy because the highlighting of search hits is similar to the highlighting of differences between two versions, which we already have. No other application can search multiple versions &lt;em&gt;simultaneously and in linear time&lt;/em&gt; &amp;ndash; this is one of the great strengths of the MVD format.&lt;/li&gt;
&lt;li&gt;Beautify the buttons &amp;ndash; these could be reduced to standard icons, and arranged over the top of each text column, instead of at the side. Rather than ugly text labels, the descriptions could vanish into tooltips that appear only when you mouse over them.&lt;/li&gt;
&lt;li&gt;Allow the user to author an MVD from scratch. This is meant to be the centrepiece of the demo at Oulu. It is fairly easy, but I will need to automatically mark up plain text using TEI encoding and also check the syntax before committing it. That may have to be done in JavaScript again, otherwise we won't be able to alert the user to syntax errors.&lt;/li&gt;
&lt;li&gt;Image view and edit image view &amp;ndash; this is really cosmetic but could be quite impressive. The only way I can think of getting the correct image to display in HTML as you scroll down is to use hyperlinks in the right hand window that call JavaScript to set the image of the left hand side. At the moment I don't know how to achieve the same effect in edit view.&lt;/li&gt;
&lt;li&gt;Synchronise the left and right hand text in Twin View. This can also be done by hyperlinks, as it was in the multilingual example mentioned earlier. The idea is that the text on the right should always line up with the corresponding text immediately to its left. The user should never have to scroll and read to align the texts manually. This is really important.&lt;/li&gt;
&lt;li&gt;Transpositions &amp;ndash; I think these can be done as an optimising step after alignment has been completed. Then any significant pairs of deletions/insertions can be tested to see if they qualify as transpositions. I think if we recalculate them after every update there won't be any problems as to which bits are transposed and which bits are insertions/deletions etc. It all boils down to the status of individual fragments: a bit of text is either inserted, deleted, transposed, a match or a variant. When those characteristics change we start a new fragment.&lt;/li&gt;
&lt;li&gt;Edit Twin View &amp;ndash; I want to use AJAX to automatically update the left hand HTML version of the XML the user is editing on the right.&lt;/li&gt;&lt;/ol&gt;&lt;/p&gt;
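&lt;p&gt;The transposition item above can be sketched as a post-processing pass over the alignment. The Python below is only an illustration, and it relies on assumptions that are not part of the plan above. Deleted and inserted fragments are given as hypothetical &lt;code&gt;(offset, text)&lt;/code&gt; pairs. A pair qualifies as a transposition only if the texts are identical. "Significant" is approximated by a minimum length, &lt;code&gt;min_length&lt;/code&gt;, chosen arbitrarily here. Any fragment left unpaired remains an ordinary insertion or deletion.&lt;/p&gt;

```python
from typing import List, Tuple

Fragment = Tuple[int, str]  # (offset in its version, fragment text)


def find_transpositions(deleted: List[Fragment], inserted: List[Fragment],
                        min_length: int = 10) -> List[Tuple[int, int, str]]:
    """Pair deletions with identical insertions elsewhere as transpositions.

    Returns (deleted_offset, inserted_offset, text) triples. Each inserted
    fragment is used at most once; short fragments are ignored because
    common words would otherwise match spuriously.
    """
    unused = list(inserted)
    transpositions = []
    for del_offset, del_text in deleted:
        if min_length > len(del_text.strip()):
            continue  # too short to count as a significant move
        for k, (ins_offset, ins_text) in enumerate(unused):
            if ins_text == del_text:
                transpositions.append((del_offset, ins_offset, del_text))
                del unused[k]  # safe: we stop iterating immediately
                break
    return transpositions


deleted = [(5, "the quick brown fox"), (40, "a")]
inserted = [(30, "the quick brown fox"), (50, "b")]
print(find_transpositions(deleted, inserted))
```

&lt;p&gt;Because the pass is rerun after every update, the classification of each fragment can simply be recomputed rather than tracked incrementally.&lt;/p&gt;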
&lt;p&gt;That's quite a lot of work before I can call Phaidros version 1 complete, but it shouldn't take more than a couple of months at most. Then we will have a completely new tool to play with. It will be exciting to see where we can take multi-version text then.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9218502948173872847?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9218502948173872847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9218502948173872847' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9218502948173872847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9218502948173872847'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/05/fully-editable-versions-groups.html' title='Editable Versions and Groups'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GGwOcLYrsVk/SDFqzWy9_2I/AAAAAAAAACQ/RcwVPeQoIW8/s72-c/editversions.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-9045137417121658856</id><published>2008-05-04T23:49:00.001-07:00</published><updated>2008-05-05T01:20:33.015-07:00</updated><title type='text'>Twin View</title><content type='html'>&lt;p&gt;The idea 
of viewing two versions of a document side by side and comparing them goes back at least as far as &lt;a href="http://www.xanadu.com.au/ted/TN/PARALUNE/paraviz.html"&gt;Nelson's 'Parallel Textface'&lt;/a&gt; (1971), in which equivalent text fragments, perhaps transposed, were connected by lines. Programmers are also used to comparing versions in this way, but it is not clear to me how the user is supposed to make sense of the interconnecting lines when the unit of comparison is a &lt;em&gt;word&lt;/em&gt; rather than a &lt;em&gt;line.&lt;/em&gt; Highlighting differences on its own is clearly insufficient, but if combined with synchronisation of the two windows &amp;ndash; keeping the text on the left in line with the text on the right &amp;ndash; this should be enough to prevent the user losing his or her way. In any case interconnecting lines are impossible in HTML windows, and one of the goals of Phaidros was to display multi-version texts in an ordinary web browser.&lt;/p&gt;
&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SB63hwZXn3I/AAAAAAAAACI/hwQ7DkThVQg/s1600-h/compare.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SB63hwZXn3I/AAAAAAAAACI/hwQ7DkThVQg/s400/compare.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5196792810587004786" /&gt;&lt;/a&gt;
&lt;p&gt;&lt;em&gt;Comparing two versions of the Sibylline Gospel (&amp;copy; N. Brocca, 2008)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;With this goal in view I have now added Twin View to the wiki, which highlights the differences between two texts. In the Sibylline Gospel example there are 36 versions. The total number of pairs that could be compared is thus (36*35)/2 = 630. Some of the rival programs, such as &lt;a href="http://www.patacriticism.org/juxta"&gt;Juxta&lt;/a&gt; and &lt;a href="http://www-poleia.lip6.fr/~ganascia/Medite_Project"&gt;Medite&lt;/a&gt;, compare texts in real time, caching the results in case the same pairs of texts are compared again. But this doesn't work so well when there are more than a handful of versions. An MVD file has the advantage that any combination of versions can be displayed in equal time and the comparison results don't need to be cached. So even if there are 5,600 versions, as in the case of the Greek New Testament, you will always get a quick response.&lt;/p&gt;
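&lt;p&gt;For comparison, here is a minimal Python sketch of the pairwise, on-demand approach that an MVD avoids. The number of possible pairs grows quadratically (36 versions give 630 pairs), and each pair would need its own diff. Python's standard &lt;code&gt;difflib&lt;/code&gt; is used purely as a generic stand-in. It is not meant to represent the algorithms used by Juxta or Medite, nor how Phaidros computes Twin View. The hypothetical &lt;code&gt;compare_words&lt;/code&gt; function labels each run of words as shared, left-only (red in Twin View) or right-only (blue).&lt;/p&gt;

```python
import difflib
from math import comb


def compare_words(left: str, right: str) -> list:
    """Label runs of words as 'same', 'left-only' (red) or 'right-only' (blue)."""
    a, b = left.split(), right.split()
    runs = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            runs.append(("same", " ".join(a[i1:i2])))
        # A 'replace' is a variant: show both the left and the right reading.
        if tag in ("delete", "replace"):
            runs.append(("left-only", " ".join(a[i1:i2])))
        if tag in ("insert", "replace"):
            runs.append(("right-only", " ".join(b[j1:j2])))
    return runs


print(comb(36, 2))  # number of distinct pairs of 36 versions: 630
print(compare_words("the end of the world", "the end of all the world"))
```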
&lt;p&gt;The version of each window can be changed by choosing the desired version from its popup menu. Text on the left that does not appear in the version on the right is highlighted red, and extra text on the right, not found in the left, is highlighted blue. As yet, transpositions are not detected &amp;ndash; they are shown as insertion/deletion pairs instead. Also, synchronisation has not yet been added, but as in the &lt;a href="http://www.itee.uq.edu.au/~schmidt/multilingual/"&gt;multilingual example&lt;/a&gt;, referenced at the start of this blog, synchronisation in HTML is possible by using hyperlinks to invisibly connect the left-hand text with the text on the right. This should be added in a forthcoming update.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-9045137417121658856?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/9045137417121658856/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=9045137417121658856' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9045137417121658856'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/9045137417121658856'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/05/adding-twin-view.html' title='Twin View'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_GGwOcLYrsVk/SB63hwZXn3I/AAAAAAAAACI/hwQ7DkThVQg/s72-c/compare.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6256018815797248149</id><published>2008-04-23T20:06:00.001-07:00</published><updated>2008-07-13T01:45:38.481-07:00</updated><title type='text'>Viewing Version Definitions</title><content type='html'>&lt;p&gt;At last some improvements: you can now view the set of versions in an MVD. These can be grouped hierarchically to any depth, and each step in the hierarchy can be named:&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GGwOcLYrsVk/SA_5lwZXnzI/AAAAAAAAABo/pb7HBHms8FA/s1600-h/sibyllinegospel.bmp"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/SA_5lwZXnzI/AAAAAAAAABo/pb7HBHms8FA/s400/sibyllinegospel.bmp" border="0" alt=""id="BLOGGER_PHOTO_ID_5192643322423254834" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;The Version List&lt;/em&gt;&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SA_9egZXn0I/AAAAAAAAABw/5KgQq_eUoOY/s1600-h/recensions.bmp"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SA_9egZXn0I/AAAAAAAAABw/5KgQq_eUoOY/s400/recensions.bmp" border="0" alt=""id="BLOGGER_PHOTO_ID_5192647595915714370" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Viewing one version of the Sibylline Gospel&lt;/em&gt;&lt;/p&gt;&lt;p&gt;The text illustrated here is the moderately complex Sibylline Gospel &amp;ndash; an apocalyptic prediction about what will happen at the end of the world. Nicoletta has recorded 36 versions, some of which are shown in the expandable list above. 
Getting this text right necessitated the fixing of a number of bugs in the NMerge package, but all seems to be well now. To build the document I used the MvdTool, which is a commandline version of NMerge. It does many of the same things as the wiki, but without the user-friendly interface.&lt;/p&gt;&lt;p&gt;The next step is to add twin-view, which will allow the user to instantly see highlighted differences between any two versions in an MVD. The edit version of this view will probably display an HTML rendering on the left and an editable window on the right, updated every few seconds, or at the press of an 'update' button.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6256018815797248149?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6256018815797248149/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6256018815797248149' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6256018815797248149'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6256018815797248149'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/04/next-iteration.html' title='Viewing Version Definitions'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://4.bp.blogspot.com/_GGwOcLYrsVk/SA_5lwZXnzI/AAAAAAAAABo/pb7HBHms8FA/s72-c/sibyllinegospel.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-5807625366537813401</id><published>2008-04-03T18:39:00.000-07:00</published><updated>2008-07-13T01:48:23.901-07:00</updated><title type='text'>Proof of Concept</title><content type='html'>&lt;p&gt;Yesterday I completed the Proof of Concept stage. I now have a very simple web application that allows you to choose an MVD from a list, then to view, edit and &amp;ndash; most importantly &amp;ndash; &lt;em&gt;save&lt;/em&gt; individual versions. No searching, comparing or creation of new versions is yet possible. It doesn't check your XML, so if you make a mistake it falls over. Also, it is as ugly as sin. But I don't care. What matters is that it works. The underlying Nmerge platform is pretty solid and has a lot of functionality I haven't yet exposed in the user interface. This should come fairly quickly now. 
For the moment here are a couple of screen shots, firstly to show how ugly it is, secondly to show what it can do so far:&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_GGwOcLYrsVk/R_WLm2GRCpI/AAAAAAAAABg/6BPxE1rdwvU/s1600-h/phaidros1.bmp"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/R_WLm2GRCpI/AAAAAAAAABg/6BPxE1rdwvU/s400/phaidros1.bmp" border="0" alt=""id="BLOGGER_PHOTO_ID_5185204045460081298" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;The MVD file list&lt;/em&gt;&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_GGwOcLYrsVk/R_WKqGGRCoI/AAAAAAAAABY/xhL5uxz0FsY/s1600-h/phaidros2.bmp"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/R_WKqGGRCoI/AAAAAAAAABY/xhL5uxz0FsY/s400/phaidros2.bmp" border="0" alt=""id="BLOGGER_PHOTO_ID_5185203001783028354" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Viewing one version of an MVD (transformed into HTML)&lt;/em&gt;&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_GGwOcLYrsVk/R_WJ5mGRCnI/AAAAAAAAABQ/TmpJydNhgQc/s1600-h/phaidros3.bmp"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/R_WJ5mGRCnI/AAAAAAAAABQ/TmpJydNhgQc/s400/phaidros3.bmp" border="0" alt=""id="BLOGGER_PHOTO_ID_5185202168559372914" /&gt;&lt;/a&gt;&lt;p&gt;&lt;em&gt;Editing one version of an MVD&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Unfortunately, I can't publish the software on here because of copyright restrictions on the example texts. 
Send me an email if you are interested and if you are part of the research group I can probably send you a copy for testing.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-5807625366537813401?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/5807625366537813401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=5807625366537813401' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5807625366537813401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/5807625366537813401'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/04/proof-of-concept.html' title='Proof of Concept'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/R_WLm2GRCpI/AAAAAAAAABg/6BPxE1rdwvU/s72-c/phaidros1.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-3546891957177017395</id><published>2008-03-16T04:13:00.000-07:00</published><updated>2008-07-13T01:16:03.008-07:00</updated><title type='text'>Merging N versions into an MVD</title><content type='html'>&lt;p&gt;Tonight I finally managed to automatically merge 3 versions of a short story in Spanish into a valid MVD, after literally several years of trying. 
Previously I had to create MVDs manually for all my demos. Now I have an automatic tool for doing it accurately and fairly quickly. The steps are quite simple, and will be the same steps you would use in the multi-version wiki to build up a multi-version document.&lt;ol&gt;&lt;li&gt;Start with an empty document containing no text at all, and update it with the full text of version 1. This produces an MVD with only one version.&lt;/li&gt;&lt;li&gt;Now update the MVD with the text of the second version. The program does three things:&lt;/li&gt;&lt;ol type="a"&gt;&lt;li&gt;It calculates the differences between version 1 and version 2. These identify parts of version 2 that are the same as, inserted into, deleted from, or alternatives to the corresponding parts of version 1. The differences are normally aligned by word or, optionally, by character.&lt;/li&gt; &lt;li&gt;It 'stitches in' the differences using version 1 as a base text to produce a correct MVD of two versions:&lt;a href="http://2.bp.blogspot.com/_GGwOcLYrsVk/R92NMHKuAUI/AAAAAAAAAAw/ax16uwgqhoY/s1600-h/fig19.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" width="240" src="http://2.bp.blogspot.com/_GGwOcLYrsVk/R92NMHKuAUI/AAAAAAAAAAw/ax16uwgqhoY/s400/fig19.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5178450385767891266" /&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;It optimises the MVD so that any two arcs going from and to the same pair of nodes and containing the same text (in different versions, of course) get merged into one arc. This happens all the time: for example, you have a text containing the letter 'A' and change it to 'B', then change it back to 'A' again. This produces three separate arcs for the three versions when you actually want only two: one for versions '1,3' containing the text 'A' and another for version '2' containing the text 'B'. 
&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Finally, update the MVD using version 2 as the base of version 3. If you have more versions, just repeat this step as often as needed. You would normally choose a different base text each time, preferably one quite similar to the new one you are committing.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;The result is an optimal alignment between the new version and the base version, rather than one optimised over all possible pairs of versions. For genetic texts at least, a writer conceptually changes an existing base version each time he/she makes an alteration to the text. Insertions, deletions and variants happen in pairwise fashion, not globally. Admittedly this is not true of texts that have evolved over time in many different, physically separate versions, as in the case of the complex manuscript tradition of an ancient text, but I still wonder how useful an alignment optimised over N versions would really be even in that case. Traditionally, at least, variants have always been aligned to a particular base text, whether a lost shared original, the editor's own version or a copy text. The same thing happens in bioinformatics: a multiple sequence alignment program figures out which pairs of sequences are most similar, and then aligns them pairwise. All I am doing is taking away the automatic selection of a base version and replacing it with the user's choice.&lt;/p&gt;&lt;p&gt;The process described above works for &lt;em&gt;updates&lt;/em&gt; of an existing version as well as for &lt;em&gt;adding&lt;/em&gt; a new version. If you only change a few words, as you typically would during an editing session, the response time is only a few milliseconds. In my test, adding &lt;em&gt;all of&lt;/em&gt; version 2 to version 1 took only 0.7 seconds. Adding version 3 to version 2, however, took over 12 seconds. 
The time taken depends mainly on the number of differences.&lt;/p&gt;&lt;h4&gt;Unresolved issues&lt;/h4&gt;&lt;p&gt;I haven't yet worked out how to include transpositions in this process, although it ought to be possible. The MVD format supports them fully, but computing them automatically would take far too long. I think the user should specify them, since they are always visible in the original texts, rather than letting a machine 'calculate' them somehow, probably badly.&lt;/p&gt;&lt;p&gt;There is also the question of 'transclusion', to misuse Nelson's term. What &lt;em&gt;I&lt;/em&gt; mean by transclusion is altering the text of one version and having that change applied automatically to any other versions that share the same text. This ought to be optional, but it would make the wiki really useful. Imagine a set of aircraft manuals built from descriptions of systems shared across different aircraft: an update to one system's description ought to propagate automatically to all the other manuals. Again, this is possible, but I haven't yet worked out the details in this version of the multi-version document platform, which I call Nmerge.&lt;/p&gt;&lt;h4&gt;Structure of the multi-version Wiki&lt;/h4&gt;&lt;p&gt;Here's a drawing of the overall structure of the wiki. The wiki module itself is called Phaidros, after the Plato dialogue in which &lt;a href="http://www.units.muohio.edu/technologyandhumanities/plato.htm"&gt;Socrates criticises the medium of writing&lt;/a&gt;:&lt;/p&gt;&lt;a href="http://1.bp.blogspot.com/_GGwOcLYrsVk/R918r3KuARI/AAAAAAAAAAY/ocDMD3KRqzI/s1600-h/fig1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" width="200" src="http://1.bp.blogspot.com/_GGwOcLYrsVk/R918r3KuARI/AAAAAAAAAAY/ocDMD3KRqzI/s400/fig1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5178432239531065618" /&gt;&lt;/a&gt;&lt;p&gt;The next stage is to build the Phaidros web application. I have an old version that only listed a single version of an MVD.
To turn it into a wiki I need to add, at a minimum, the ability to edit the XML source and commit the changes back to the MVD. That will give us a proof of concept, but the hard work is already done.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-3546891957177017395?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/3546891957177017395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=3546891957177017395' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3546891957177017395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/3546891957177017395'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/03/merging-n-versions-into-mvd.html' title='Merging N versions into an MVD'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_GGwOcLYrsVk/R92NMHKuAUI/AAAAAAAAAAw/ax16uwgqhoY/s72-c/fig19.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-8893306726040284633</id><published>2008-03-05T00:56:00.000-08:00</published><updated>2011-02-18T14:59:04.422-08:00</updated><title type='text'>What's a Multi-Version Document?</title><content type='html'>&lt;h4&gt;What's it for?&lt;/h4&gt;&lt;p&gt;A multi-version document
is for recording texts that exist in multiple versions. This mustn't be confused with recording multiple drafts of a single text you might be working on. Instead, a multi-version document records the non-linear structure of a text such as the Greek New Testament, which exists in thousands of versions, or equally the text of a modern literary or philosophical work, which may have been published in several versions or may survive only as manuscripts heavily revised by their author. Another major use is recording multiple marked-up versions of digital documents: for example, linguistic texts that encode several perspectives on the same text, or that contain overlapping markup.&lt;/p&gt;&lt;h4&gt;Why we need them&lt;/h4&gt;&lt;p&gt;Multi-version documents are the ideal form for recording our textual cultural heritage in an increasingly digital age. Year on year we are reading fewer paper books. As people play more games, watch more DVDs and browse more information on the Internet, our rich cultural heritage comes under threat. Ultimately, the written texts that give our language depth and history and our culture an identity, works recorded on physical media over thousands of years, will have to be transferred to the digital medium if they are to survive. The problem is that existing forms of digital text can't accurately record these documents. Subtle structural differences between the two media mean that important information will be lost, or that it will simply prove too difficult to transfer old knowledge into the new form. If we fail to develop new means of representing our textual cultural heritage now, we may soon lose it forever.&lt;/p&gt;&lt;h4&gt;How does it work?&lt;/h4&gt;&lt;p&gt;A multi-version document represents a text as a set of versions merged into a single digital entity, which can be edited efficiently, and whose versions can be listed, compared and searched.
Versions can overlap freely, which overcomes a fundamental limitation of markup languages: all markup systems in use today are based on the formal generative grammars invented by linguists in the 1950s, and so can only efficiently represent hierarchical structures. A multi-version document is represented as a list of text fragments t&lt;sub&gt;i&lt;/sub&gt;, each of which is assigned a set of versions v&lt;sub&gt;i&lt;/sub&gt;:&lt;/p&gt;&lt;pre&gt;{v&lt;sub&gt;1&lt;/sub&gt;, t&lt;sub&gt;1&lt;/sub&gt;}, {v&lt;sub&gt;2&lt;/sub&gt;, t&lt;sub&gt;2&lt;/sub&gt;}, ... {v&lt;sub&gt;n&lt;/sub&gt;, t&lt;sub&gt;n&lt;/sub&gt;}&lt;/pre&gt;&lt;p&gt;This extremely simple form is all you need to record texts that contain thousands of versions. It is a form of digital text that trades complexity for mere size, and size is something modern computers are very good at handling. The structure of the document is &lt;em&gt;implied&lt;/em&gt; by the intersection of the versions of each fragment. For example, to read a single version all you need to do is read through the list, picking out the fragments that belong to it. Other common operations, such as comparing two texts to find the differences or printing the variants of a particular range within a given version, are just as easy.&lt;/p&gt;&lt;a href="http://4.bp.blogspot.com/_GGwOcLYrsVk/R91-BnKuATI/AAAAAAAAAAo/sQ877xERWHk/s1600-h/fig16.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_GGwOcLYrsVk/R91-BnKuATI/AAAAAAAAAAo/sQ877xERWHk/s400/fig16.jpg" border="0" width="300" alt=""id="BLOGGER_PHOTO_ID_5178433712704848178" /&gt;&lt;/a&gt;&lt;h4&gt;The mathematical basis for this model&lt;/h4&gt;&lt;p&gt;This form is equivalent to a 'graph': a set of intermingling paths that start at one point, branch, rejoin and split again, until they all join back together at the end.
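&lt;/p&gt;&lt;p&gt;The list of pairs is simple enough to sketch in a few lines of code. The Python below is a hypothetical illustration, not code taken from nmerge, and its representation is an assumption: each pair is a frozenset of version numbers plus a text fragment, kept in document order, and the sample text and the names mvd, read_version and variants are invented for the example. Reading a version is a single pass that picks out the fragments belonging to it; comparing two versions keeps only the fragments that belong to exactly one of them.&lt;/p&gt;&lt;pre&gt;
```python
# Hypothetical sketch of the list-of-pairs model: each entry pairs a set of
# versions with a text fragment, in document order. This is an illustration,
# not the nmerge implementation or its file format.

# Versions 1 and 3 share "brown"; version 2 replaces it with "red".
mvd = [
    (frozenset({1, 2, 3}), "The quick "),
    (frozenset({1, 3}), "brown"),
    (frozenset({2}), "red"),
    (frozenset({1, 2, 3}), " fox"),
]


def read_version(pairs, v):
    """Return the text of version v: one pass, keeping the fragments it owns."""
    return "".join(text for versions, text in pairs if v in versions)


def variants(pairs, v, w):
    """Return (version, fragment) for every fragment in exactly one of v, w."""
    result = []
    for versions, text in pairs:
        in_v = v in versions
        in_w = w in versions
        if in_v != in_w:  # shared or absent fragments are not variants
            result.append((v if in_v else w, text))
    return result


if __name__ == "__main__":
    for version in (1, 2, 3):
        print(version, read_version(mvd, version))
    print(variants(mvd, 1, 2))
```
&lt;/pre&gt;&lt;p&gt;Run as a script, it prints the text of each version, giving "The quick brown fox" for versions 1 and 3 and "The quick red fox" for version 2, followed by [(1, 'brown'), (2, 'red')] as the variants between versions 1 and 2. The shared fragments 'The quick ' and ' fox' are stored only once; where the versions diverge is implied entirely by the version sets attached to each fragment.&lt;/p&gt;&lt;p&gt;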
It has been proven, in a paper recently accepted by the International Journal of Human-Computer Studies, that these two forms of multi-version text, namely:&lt;/p&gt;&lt;ol type="a"&gt;&lt;li&gt;the intuitive graph representation, and&lt;/li&gt;&lt;li&gt;the list of pairs described above,&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;are equivalent: we can transform one into the other and back again with no loss of information. This solid mathematical basis contrasts with previous attempts to represent versions and overlapping structures in digital text, all of which were based on markup, which can only efficiently represent hierarchical structures. As many humanists and linguists have discovered, the texts they study are far more often overlapping in structure than hierarchical.&lt;/p&gt;&lt;a href="http://3.bp.blogspot.com/_GGwOcLYrsVk/R919oXKuASI/AAAAAAAAAAg/HzTZMuEgVro/s1600-h/fig5.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/R919oXKuASI/AAAAAAAAAAg/HzTZMuEgVro/s400/fig5.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5178433278913151266" /&gt;&lt;/a&gt;&lt;h4&gt;The Variant Graph is not a Replacement for Markup: it Complements it&lt;/h4&gt;&lt;p&gt;A Multi-Version Document cleanly separates content from variation. The content of a document is expressed by the textual fragments in the list, or by the textual labels on the arcs in the graph.
The structure of the document (its variation), on the other hand, is expressed by the order of the pairs and by their sets of versions.&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_GGwOcLYrsVk/SBqORwZXn2I/AAAAAAAAACA/VaNxeqpQk6o/s1600-h/varcont2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://3.bp.blogspot.com/_GGwOcLYrsVk/SBqORwZXn2I/AAAAAAAAACA/VaNxeqpQk6o/s400/varcont2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5195621555825516386" /&gt;&lt;/a&gt;&lt;p&gt;This means that any technology can be used to represent the content, even ordinary markup. Since all the overlapping structures have been removed and placed in the Variant Graph, the markup can be kept simple enough to handle in a wiki. Of course, we are not tied to markup. If markup becomes obsolete in future, we can still use Multi-Version Documents to record the content with some other technology, in binary form for example.&lt;/p&gt;&lt;p&gt;A Multi-Version Document doesn't, and need not, represent any of the complexities of text that relate to content. It represents versions, and its complexity ends right there.&lt;/p&gt;&lt;h4&gt;What we have so far&lt;/h4&gt;&lt;p&gt;The first publication of the idea of 'network text' (submitted 2004) was in:&lt;/p&gt;&lt;p&gt;Schmidt, D., 2006. 'A Graphical Editor for Manuscripts'. &lt;i&gt;Literary and Linguistic Computing&lt;/i&gt; 21: 341-351.&lt;/p&gt;&lt;p&gt;The first publication of the variant graph idea was in:&lt;/p&gt;&lt;p&gt;Schmidt, D., and Wyeld, T., 2005. 'A novel user interface for online literary documents'. &lt;i&gt;ACM International Conference Proceeding Series&lt;/i&gt; 122, 1-4.&lt;/p&gt;&lt;p&gt;Subsequent conference papers appeared in:&lt;/p&gt;&lt;p&gt;Schmidt, D., and Fiormonte, D., 2006.
'A Fresh Computational Approach to Textual Variation', in: &lt;i&gt;The First International Conference of the Alliance of Digital Humanities Organisations (ADHO), 5-9 July, Paris-Sorbonne, Conference Abstracts,&lt;/i&gt; 193-196.&lt;/p&gt;&lt;p&gt;Schmidt, D., and Fiormonte, D., 2007. &lt;a name="AIIA" href="http://www.itee.uq.edu.au/~schmidt/_articles/chworkshop.pdf"&gt;'Documenti Multiversione: una soluzione per gli artefatti testuali del patrimonio culturale / Multi-Version Documents: a Digitisation Solution for Textual Cultural Heritage Artefacts'&lt;/a&gt;. In: Bordoni, L. (ed.), &lt;i&gt;Proceedings of the AI*IA Workshop for Cultural Heritage. 10th Congress of the Italian Association for Artificial Intelligence, Università di Roma Tor Vergata, Villa Mondragone, 10 September 2007,&lt;/i&gt; 9-16.&lt;/p&gt;&lt;p&gt;This was subsequently accepted for &lt;i&gt;Intelligenza Artificiale&lt;/i&gt; (see below).&lt;/p&gt;&lt;p&gt;Schmidt, D., Brocca, N., and Fiormonte, D., 2008. &lt;a name="DH2008" href="http://www.itee.uq.edu.au/~schmidt/_articles/dh2008b.pdf"&gt;'A Multi-Version Wiki'&lt;/a&gt;, in: &lt;i&gt;Proceedings of Digital Humanities 2008&lt;/i&gt;, Oulu, Finland, June 2008.&lt;/p&gt;&lt;p&gt;Schmidt, D., and Colomb, R., 2009. &lt;a name="IJHCS" href="http://dx.doi.org/10.1016/j.ijhcs.2009.02.001"&gt;'A Data Structure for Representing Multi-version Texts Online'&lt;/a&gt;, &lt;i&gt;International Journal of Human-Computer Studies&lt;/i&gt; 67.6, 497-514.&lt;/p&gt;&lt;p&gt;Schmidt, D., 2009. 'Merging Multi-Version Texts: a General Solution to the Overlap Problem', in: &lt;a href="http://www.balisage.net/Proceedings/vol3/html/Schmidt01/BalisageVol3-Schmidt01.html"&gt;The Markup Conference 2009 Proceedings&lt;/a&gt;, Montreal, August.&lt;/p&gt;
&lt;p&gt;Schmidt, D., 2010. &lt;a href="http://llc.oxfordjournals.org/cgi/content/full/fqq007?ijkey=ilzrEgphmlEtphb&amp;keytype=ref"&gt;'The Inadequacy of Embedded Markup for Cultural Heritage Texts'&lt;/a&gt;. &lt;i&gt;Literary and Linguistic Computing&lt;/i&gt; 25.3, 337-356.&lt;/p&gt;
&lt;p&gt;Schmidt, D., and Fiormonte, D., 2010. 'Documenti multiversione: una soluzione per gli artefatti testuali del patrimonio culturale / Multi-version documents: a digitisation solution for textual cultural heritage artefacts'. &lt;i&gt;Intelligenza Artificiale&lt;/i&gt; IV.1 (Dec), 56-61.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-8893306726040284633?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/8893306726040284633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=8893306726040284633' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8893306726040284633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/8893306726040284633'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/03/whats-multi-version-document.html' title='What&apos;s a Multi-Version Document?'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_GGwOcLYrsVk/R91-BnKuATI/AAAAAAAAAAo/sQ877xERWHk/s72-c/fig16.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4555078640999654611.post-6370856164533101965</id><published>2008-02-27T22:14:00.000-08:00</published><updated>2008-02-27T22:15:15.952-08:00</updated><title type='text'>Status of Multi-Version Wiki project</title><content 
type='html'>I created this blog to record progress on the multi-version wiki I am developing, so that others who are interested in the project can see where it stands.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4555078640999654611-6370856164533101965?l=multiversiondocs.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://multiversiondocs.blogspot.com/feeds/6370856164533101965/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4555078640999654611&amp;postID=6370856164533101965' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6370856164533101965'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4555078640999654611/posts/default/6370856164533101965'/><link rel='alternate' type='text/html' href='http://multiversiondocs.blogspot.com/2008/02/status-of-multi-version-wiki-project.html' title='Status of Multi-Version Wiki project'/><author><name>desmond</name><uri>http://www.blogger.com/profile/01722159590093138289</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
