Friday, February 11, 2011

The death of the angle-bracket

I was pleasantly surprised to learn that the eComma project uses overlapping properties rather than embedded markup to encode humanities texts. This emboldens me to take a similar approach with my rewrite of the MVD-GUI. For a relatively small effort I can transform ugly bits of XML such as <hi rend="italic">word</hi> into the standoff property called italics that applies to a specific range in the text. So to kill off angle brackets for good all I have to do is the following:

  1. Take a TEI text and use my splitter program to split the markup from the text for all the versions of a work. This yields as many versions of plain text as versions of markup.
  2. Simplify the markup: remove attributes by merging them with element names and swapping them for something shorter. And we can have multi-lingual property-names – no need to always use English.
  3. Merge the text of all the versions into a CorTex (MVD) and all the markup into a CorCode. The CorCode is just a list of properties and their ranges in the text, one for each version.
  4. Design 3 Joomla components:
    1. A formatted view of any chosen version, with expanding/collapsing apparatus.
    2. Edit the CorCode. A formatted view of the CorTex+CorCode for the currently chosen version: oft-used markup tags on the right as buttons, the rest as a dropdown list. Either just pressing a button or selecting an item from the dropdown and pressing 'apply' would apply that format to the current selection.
    3. Edit the CorTex. This view is just a text editing box, with possibly an expanding/collapsing apparatus.
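
Steps 1 and 2 might be sketched roughly as follows. This is only an illustration, not the actual splitter program: the function name `split_markup`, the `SHORT_NAMES` table, and the (name, offset, length) tuple layout are all assumptions made for the example, and it handles a single well-formed XML fragment rather than a full TEI file.

```python
import xml.etree.ElementTree as ET

# Hypothetical table for step 2: merged element+attribute names
# are swapped for something shorter.
SHORT_NAMES = {"hi-italic": "italics"}

def split_markup(xml_string):
    """Split embedded markup from text.

    Returns (plain_text, properties), where each property is a
    (name, offset, length) tuple pointing into plain_text.
    """
    root = ET.fromstring(xml_string)
    text_parts = []
    properties = []
    pos = 0

    def walk(elem):
        nonlocal pos
        start = pos
        if elem.text:
            text_parts.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            # Text after a child's end tag still belongs to the parent.
            if child.tail:
                text_parts.append(child.tail)
                pos += len(child.tail)
        # Merge attribute values into the element name, then shorten it.
        merged = "-".join([elem.tag] + [v for _, v in sorted(elem.attrib.items())])
        name = SHORT_NAMES.get(merged, merged)
        properties.append((name, start, pos - start))

    walk(root)
    # Order by start offset, longest range first.
    properties.sort(key=lambda p: (p[1], -p[2]))
    return "".join(text_parts), properties

text, props = split_markup('<p>A <hi rend="italic">word</hi> here</p>')
```

Here `text` holds the plain text with no angle brackets, and `props` plays the role of one version's entry in the CorCode: a list of named ranges over that text.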

That's not too much work, and when it is done users won't have to struggle with complex syntax ever again. In its place will be a set of simple overlapping properties that automatically format themselves into HTML in the browser. And all steps will be reversible, so we can go back to the XML representation at any stage, with no loss of information (hopefully).
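
To give an idea of how overlapping properties could turn into HTML, here is a minimal sketch. It assumes properties are (name, offset, length) tuples whose ranges lie inside the text, and that property names are safe to use as CSS class names; `render_html` is a made-up name, not part of the MVD-GUI. The trick is to cut the text at every property boundary, so that each piece has a fixed set of properties and the overlap never has to be expressed as nested tags.

```python
from html import escape

def render_html(text, properties):
    """Render text plus possibly overlapping properties as flat HTML spans."""
    # Every property start and end is a cut point.
    cuts = {0, len(text)}
    for _, offset, length in properties:
        cuts.add(offset)
        cuts.add(offset + length)
    points = sorted(cuts)

    out = []
    for start, end in zip(points, points[1:]):
        # Properties whose range fully covers this piece of text.
        active = sorted(
            name for name, offset, length in properties
            if offset <= start and end <= offset + length
        )
        chunk = escape(text[start:end])
        if active:
            out.append('<span class="%s">%s</span>' % (" ".join(active), chunk))
        else:
            out.append(chunk)
    return "".join(out)

html = render_html("one two three", [("italics", 0, 7), ("bold", 4, 9)])
```

In this example "italics" covers "one two" and "bold" covers "two three", an overlap that embedded markup cannot express without splitting an element by hand. Here the shared word simply ends up in a span carrying both classes.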

Here are some mock-ups of how the user interface would look:

The Combined view

This is partly implemented in the new version (all browsers), and more fully implemented in the old version (markup still embedded, Firefox only).

The CorCode view

We only have to show the properties present in this text. Note the language dropdown menu – this will translate the property names into whatever we provided in the property list. 'clear all' clears all properties from the current selection.
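
The language dropdown and the 'clear all' button could work along these lines. This is a guess at the behaviour rather than a description of existing code: the `LABELS` table, the function names, and the decision to trim properties that only partly overlap the selection (instead of deleting them whole) are all assumptions.

```python
# Hypothetical translations supplied with the property list.
LABELS = {
    "de": {"italics": "kursiv", "bold": "fett"},
}

def label(name, lang):
    """Translate a property name, falling back to the name itself."""
    return LABELS.get(lang, {}).get(name, name)

def clear_all(properties, sel_start, sel_end):
    """Remove every property from the range [sel_start, sel_end).

    Properties are (name, offset, length) tuples. A property that
    sticks out of the selection keeps the parts outside it.
    """
    result = []
    for name, offset, length in properties:
        end = offset + length
        if end <= sel_start or offset >= sel_end:
            # No overlap with the selection: keep unchanged.
            result.append((name, offset, length))
            continue
        if offset < sel_start:
            # Keep the part before the selection.
            result.append((name, offset, sel_start - offset))
        if end > sel_end:
            # Keep the part after the selection.
            result.append((name, sel_end, end - sel_end))
    return result

remaining = clear_all([("italics", 0, 10), ("bold", 12, 3)], 3, 6)
```

Because the properties live outside the text, neither operation touches the text itself: translating a name only changes what the button says, and clearing a selection only edits the list of ranges.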

The CorTex view

This is just a plain text editing box, although I have enhanced it with a collapsible apparatus showing textual (not formatting) variants. The user simply edits, then clicks 'save'. Carriage returns are not passed on to the display, so they can be added as desired to lay out the text more readably.
