Friday, December 23, 2011

HritServer

Questions from my colleagues at Loyola have inspired me to turn nmerge into a service. The idea of HritServer (Humanities Resources, Infrastructure and Tools) is to build a self-contained application that runs as a service from the command line, and provides the infrastructure to build any digital humanities website. It is a collection of the tools written over the past few years, including nmerge, the XML import/export tools, the formatter and the GUIs I developed for Digital Variants. The structure will be a little like Tomcat:

  1. A back-end administrative interface allowing the admin user to add or import new texts and edit existing ones.
  2. Example GUIs in Java and PHP that exercise each facility provided by HritServer: compare, view variants, indexed search, tree-view.

The service will be strictly RESTful; everything will be done via HTTP. The database at the back end will be a modern key-value store rather than an old-fashioned relational design. Each resource will be accessible via a simple URL, with no complex access needed. For example, to get the formatted HTML of act 1, scene 1 of the First Folio of Shakespeare's King Lear, one would only need to fetch the URL:

http://dhtestbed.ctsdh.luc.edu/html/english/shakespeare/kinglear/act1/scene1/F1/

Anything can be stored at a similar URL in a simple hierarchical structure. The HTML is generated on the server from plain text and overlapping markup sets, and never stored. By passing parameters to the same URL, different formatted versions of the same text can be achieved. Different encodings of the same text can likewise be realised by specifying a different collection of markup sets. The idea is to take the complexity out of building such websites, and to maximise automation by providing a powerful base infrastructure that will work for any set of texts. It should be achievable within a reasonable time, because almost everything already exists (although some tools are still incomplete). All I have to do is stitch it together and test it.
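
To make the addressing scheme concrete, here is a minimal sketch of how a client might build such URLs. The base URL and path come from the example above; the query parameter names (markup, style) and their values are purely hypothetical, since the actual parameters are not yet specified.

```python
from urllib.parse import urlencode, urljoin

# Host taken from the example URL above.
BASE = "http://dhtestbed.ctsdh.luc.edu/"


def resource_url(kind, path_parts, **params):
    """Build a hierarchical resource URL, optionally with query parameters.

    `kind` is the top-level resource type (e.g. "html"); `params` are
    hypothetical formatting options passed in the query string.
    """
    path = "/".join([kind, *path_parts]) + "/"
    url = urljoin(BASE, path)
    if params:
        # Sort the parameters so the same request always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


lear = ["english", "shakespeare", "kinglear", "act1", "scene1", "F1"]

# Default formatting of the text.
plain = resource_url("html", lear)

# Same text, different markup sets and style (parameter names are assumed).
styled = resource_url("html", lear, markup="speeches,lineation", style="diplomatic")
```

The point of the design is that the path alone identifies the text, while the parameters only select how it is rendered.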

I'll have to write a formal software specification, but I've already made a good start on coding it.

Thursday, November 10, 2011

XML-Free Digital Editions

Playing around with Apache's CouchDB today, I realised that it uses JSON, not XML, to handle exchanges between client and server. This opens the intriguing possibility of making XML-free digital editions. If the standoff properties for digital texts were also stored using JSON or YAML rather than XML - a simple enough change - then the entire edition could be XML-free. The only step that comes close to XML is the final conversion into HTML for the browser, but this can be good old HTML (an SGML dialect) rather than XHTML (an XML dialect). I doubt that anyone has achieved that before, though it depends on how you define a 'Digital Edition'. By that term I mean an online digital archive of marked-up texts accessible over the web. I think this is rather a liberating idea, and one that is actually inevitable. If we believe the XML aficionados, disaster will ensue as soon as we abandon 'standards'. Actually, what will happen is that digital editions will blossom with the possibilities offered by the new form of digital text. It's time to show people what is possible without XML.
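
As an illustration of what JSON standoff properties might look like, here is a small sketch. The field names (style, ranges, name, offset, len) are assumptions made for this example, not a description of any existing format; the only idea taken from the post is that the markup lives apart from the plain text and points into it.

```python
import json

# The plain text carries no markup at all.
text = "To be, or not to be"

# Hypothetical standoff layout: each property names a range of the text
# by character offset and length.
markup = {
    "style": "example",
    "ranges": [
        {"name": "line", "offset": 0, "len": 19},
        {"name": "emph", "offset": 7, "len": 6},
    ],
}

# Round-trip through JSON, as a client and server exchanging it would.
encoded = json.dumps(markup, indent=2)
decoded = json.loads(encoded)

# Recover the stretch of text each property applies to.
covered = [text[r["offset"]:r["offset"] + r["len"]] for r in decoded["ranges"]]
```

Here covered holds the whole line for the "line" property and "or not" for the "emph" property, with no XML involved at any stage.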

Monday, September 19, 2011

Formatter tool with full overlap

More progress, I'm afraid. I've incorporated the test program announced in the last post into the formatter tool. This is intended as a practical replacement for XSLT. So now I can convert real texts plus overlapping standoff properties into valid HTML. If the properties are derived from XML documents there won't be any overlap initially. What formatter does is loosen up that particular restriction. So in the GUI it will be possible to change properties or add new ones that overlap. And it will still format correctly. I'll be putting some test cases onto the testbed at Loyola soon.
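
To show why overlap need not prevent valid HTML output, here is a minimal sketch of one common technique: cut the text at every property boundary and wrap each resulting segment in the properties that cover it. This is not the formatter's actual algorithm, which is not described here; the function name, the (name, start, end) tuple layout and the use of span elements with class attributes are all assumptions for illustration.

```python
from html import escape


def format_overlapping(text, ranges):
    """Render text plus possibly overlapping ranges as well-formed HTML.

    `ranges` is a list of (name, start, end) tuples, end exclusive,
    assumed to lie within the text.
    """
    # Every range boundary becomes a cut point, so that no segment
    # is only partly covered by a property.
    cuts = sorted({0, len(text), *(p for _, s, e in ranges for p in (s, e))})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Properties that cover this whole segment, in their listed order.
        active = [name for name, s, e in ranges if s <= lo and hi <= e]
        piece = escape(text[lo:hi])
        # Wrap from the innermost outwards so the first property is outermost.
        for name in reversed(active):
            piece = f'<span class="{escape(name)}">{piece}</span>'
        out.append(piece)
    return "".join(out)


# "bold" covers characters 0-3 and "italic" covers 2-5, so they overlap on "cd".
html_out = format_overlapping("abcdef", [("bold", 0, 4), ("italic", 2, 6)])
```

Because each segment is wrapped independently, the tags always nest properly, at the cost of splitting a property into several elements wherever it overlaps another.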