Friday, December 4, 2009

Launch of Harpur Website

Although there’s not much there yet, I launched the Harpur Archive test website last weekend. It is a Joomla installation, and in it I intend to build all the technology from the Alpha wiki prototype, along with a new version of nmerge. In short, I will try to build reusable, easy-to-use Joomla components and experiment with them there. So if you are interested, watch this space.

Markup Inadequacy Paper

While I’m on the subject of news: on the 13th of November I submitted a long paper (9,000 words) to Literary and Linguistic Computing entitled ‘The Inadequacy of Embedded Markup for Cultural Heritage Texts’. It’s provocative, and it’s meant to be. I am basically calling the establishment’s bluff and daring them to try to stop this. I think we’ve gone on quite long enough with an inadequate means of recording our historical texts in digital form, and this is my attempt to make it stop. Here’s the abstract:

Embedded generalized markup, as applied by digital humanists to the recording and studying of our textual cultural heritage, suffers from a number of serious technical drawbacks. As a result of its evolution from early printer control languages, generalized markup can only express a document’s ‘logical’ structure via a repertoire of permissible printed format structures. In addition to the well-researched overlap problem, the embedding of markup codes into texts that never had them when written leads to a number of further difficulties: the inclusion of potentially obsolescent technical and subjective information into texts that are supposed to be archivable for the long term, the manual encoding of information that could be better computed automatically, and the obscuring of the text by highly complex technical data. Many of these problems can be alleviated by asserting a separation between the versions of which many cultural heritage texts are composed, and their content. In this way the complex interconnections between versions can be handled automatically, leaving only simple markup for individual versions to be handled by the user.

Friday, November 27, 2009

Interedition Handout

I've had some positive feedback from the recent meeting of the Interedition initiative in Brussels. One of my colleagues distributed a handout that was favourably received, and in response to it I have already received one offer of collaboration. It expresses the essence of MVD in a non-technical form and has a stunning graphic comparing the 1845 and 1888 editions of Charles Harpur's The Creek of the Four Graves, which have only around 40% similarity, so I thought I'd share it with you:

Multi-Version Documents and the Harpur Archive

The Multi-Version Document or MVD system is designed to automate as far as possible the work of editing our textual cultural heritage. Existing markup-based approaches pose serious problems for the modern digital scholarly editor, including:

  1. Failure to adequately and accurately represent ordinary textual phenomena
  2. Obscuring the text and confusing the editor with excessive density of technical markup
  3. Requiring manual tasks that could be performed much better and automatically by computer
  4. Embedding subjective and potentially obsolescent technical information into texts that are supposed to be archived for the long term

These problems can mostly be overcome by separating the versions from their content. In this way editing a text becomes relatively simple, because all the complexities of versions (insertions, deletions, variants and transpositions) are handled automatically. Instead the editor works on a simplified text marked up only with the textual structure of each version.

An MVD represents 'the work' as an interrelated set of versions that can be searched, compared, edited and archived as a single, compact digital entity. An MVD also has a zero footprint. You can always get out the texts in exactly the same form as you put them in.
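As a rough illustration of the idea in the two paragraphs above, here is a minimal sketch in Python. It is not nmerge's algorithm or file format. It assumes a toy representation in which a work is a list of (set of versions, text fragment) pairs, and it borrows the standard library's difflib to align just two versions. The names merge_two and extract and the sample sentences are invented for the example, and transpositions are not handled.

```python
from difflib import SequenceMatcher

def merge_two(a: str, b: str):
    """Merge versions 1 (a) and 2 (b) into a list of (versions, text) pairs.

    Hypothetical toy structure: text shared by both versions is stored once,
    text unique to one version is tagged with that version only.
    """
    pairs = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pairs.append((frozenset({1, 2}), a[i1:i2]))
        else:
            # "replace", "delete" and "insert" all reduce to:
            # keep whatever version 1 has here, then whatever version 2 has.
            if i1 < i2:
                pairs.append((frozenset({1}), a[i1:i2]))
            if j1 < j2:
                pairs.append((frozenset({2}), b[j1:j2]))
    return pairs

def extract(pairs, version: int) -> str:
    """Read one version back out by keeping only the fragments tagged with it."""
    return "".join(text for versions, text in pairs if version in versions)

if __name__ == "__main__":
    v1 = "The settlers camped beside the creek."
    v2 = "The settlers rested beside the lonely creek."
    mvd = merge_two(v1, v2)
    # Each version comes out exactly as it went in.
    assert extract(mvd, 1) == v1
    assert extract(mvd, 2) == v2
    for versions, text in mvd:
        print(sorted(versions), repr(text))
```

In this sketch the editor only ever supplies plain version texts. The insertions, deletions and variants fall out of the alignment, and reading a version back is a simple filter over the pairs.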

What we have now:

The following tools are available for download from the Google Code site:

  1. The nmerge command-line tool. This can be used to create, edit and manipulate MVDs.
  2. The Alpha wiki prototype. This can be used to visualise and edit MVDs. For copyright reasons it only has one example text: all major versions of Act 1 Scene 1 from Shakespeare’s King Lear.

Future Developments

We are currently developing a plugin for Joomla! that will incorporate all the current technology, with further enhancements, so that a humanities web archive can be easily built and deployed on ordinary web hosts with only a low level of technical expertise. This will be used as the basis of the new Digital Variants website and also the Harpur Text Archive. Progress reports will be posted on the MVD blog.

References

Schmidt, D. (2009a). Merging Multi-Version Texts: a Generic Solution to the Overlap Problem. In: Usdin, B.T. (ed) Proceedings of Balisage: The Markup Conference 2009. doi:10.4242/BalisageVol3.Schmidt01.

Schmidt, D. and Colomb, R. (2009). A data structure for representing multi-version texts online. International Journal of Human-Computer Studies, 67.6: 497-514.

Schmidt, D., Brocca, N. and Fiormonte, D. (2008). A Multi-Version Wiki. In: L.L. Opas-Hänninen, M. Jokelainen, I. Juuso, T. Seppänen (eds), Proceedings of Digital Humanities 2008, Oulu, Finland, June, 2008, pp. 187-188.

Multi-Version Documents. http://multiversiondocs.blogspot.com.

Merge and edit N versions in one document. http://code.google.com/p/multiversiondocs/.

Tuesday, November 24, 2009

Minor updates to nmerge, Alpha

I have added a README to Alpha to help with installing it and getting it working. It didn't have one, which was an oversight. I also noticed that the nmerge installer didn't work properly, which is due to my inexperience with automake. In fact it installed correctly; it just complained about the Java source code directory, which wasn't listed properly in the makefile. I'll try to be more careful in future.