Thursday, April 30, 2009

Nmerge tool code-complete

The nmerge commandline tool is now code-complete. I guess it's a 'pre-alpha' version. Since this is a revision of a previous working version, though, testing should not take too long. I would estimate that after the Labor Day weekend (Monday 4th May) I should have an alpha version. But with software you never know. This version supports the new merging algorithm from the submitted Balisage 2009 paper, which works pretty well.

Nmerge is also a Java library that can be used from within a Java application, like the Phaidros wiki, to provide support for Multi-Version-Documents. Once it has stabilised I will rewrite it as a C++ commandline tool. But for now we have to put up with a slightly more cumbersome syntax. The "usage" statement produced by the program is reproduced at the end of this post so you can get some idea of what it does. Once it is reasonably well tested I will put the source code on SourceForge under the GPL v3.

The command syntax is a bit complicated, but so is what it is trying to do. I envisage that this tool could be used in a shell or commandline script to automate, say, the construction of an MVD from a set of files. At least that's what I use it for. In any case the -h option prints out an example or two of how to use each command. The -c option specifies the command you want to perform on the MVD, and the other arguments are the parameters that the command uses, provided they make sense. If they don't, you'll get an error message.
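To give a rough idea of the scripting use mentioned above, here is a minimal sketch of a shell function that builds an MVD from a set of text files. It only uses flags that appear in the usage statement, but the particular combinations passed to create and add are my assumptions, as are the names build_mvd, run and NMERGE, the "Base" group name and the description string. The -h option will show the real examples for each command. As written it is a dry run that just prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the nmerge commands that would build an MVD
# from a set of text files, without executing them.
# ASSUMPTION: the flag combinations below are guesses based on the usage
# statement; check "java -jar nmerge.jar -h add" for the real examples.

NMERGE="java -jar nmerge.jar"   # assumed location of the jar (hypothetical)

# Print the command instead of running it. Replace the body with "$@"
# to execute for real. Note that echo drops the quoting around arguments
# that contain spaces, so the printed lines are for inspection only.
run() {
    echo "$@"
}

# build_mvd MVDFILE TEXTFILE...  (hypothetical helper)
build_mvd() {
    mvd=$1
    shift
    # Start with a new empty MVD and give it a description.
    run $NMERGE -c create -m "$mvd" -d "built by script"
    for f in "$@"; do
        # Use the file name, minus directory and extension, as the siglum.
        short=$(basename "$f")
        short=${short%.*}
        # Add the file as a new version in an assumed group called "Base".
        run $NMERGE -c add -m "$mvd" -t "$f" -s "$short" \
            -l "Version $short" -g "Base" -e UTF-8
    done
}
```

Calling build_mvd work.mvd texts/a.txt texts/b.txt prints one create command followed by one add command per file. Keeping the command behind run makes it easy to check what a script will do to an MVD before letting it loose.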

With the nmerge tool MVD becomes a real format. There's no GUI because if I added one, you couldn't take it away and put in your own. If you need one, wait for Phaidros.

usage: java -jar nmerge.jar [-c command] [-a archive] [-b backup] 
     [-d description] [-e encoding] [-f string] [-g group] [-h command] 
     [-k length] [-l longname] [-m MVD] [-n mask] [-o offset] [-p]
     [-s shortname] [-t textfile] [-v version] [-w with] [-x XMLfile]
     [-?] 

-a archive - folder to use with archive and unarchive commands
-b backup - the version number of a backup (for partial versions)
-c command - operation to perform. One of:
     add - add the specified version to the MVD
     archive - save MVD in a folder as a set of separate versions
     compare - compare specified version 'with' another version
     create - create a new empty MVD
     description - print or change the MVD's description string
     delete - delete specified version from the MVD
     export - export the MVD as XML
     find - find specified text in all versions or in specified version
     import - convert XML file to MVD
     list - list versions and groups
     read - print specified version to standard out
     update - replace specified version with contents of textfile
     unarchive - convert an MVD archive into an MVD
     variants - find variants of specified version, offset and length
-d description - specified when setting/changing the MVD description
-e encoding - the encoding of the version's text e.g. UTF-8
-f string - to be found (used with command find)
-g group - name of group for new version
-h command - print example for command
-k length - find variants of this length in the base version's text
-l longname - the long name/description of the new version (quoted)
-m MVD - the MVD file to create/update
-n mask - mask out which kind of data in new mvd: none, xml or text
-o offset - in given version to look for variants
-p - specified version is partial
-s shortname - short name or siglum of specified version
-t textfile - the text file to add to/update in the MVD
-v version - number of version for command (starting from 1)
-w with - another version to compare with version
-x XML - the XML file to export or import
-? - print this message

2 comments:

desmond said...

I guess I need to add an "unarchive" command. Then a separate utility could be written to generate "archive" format from ordinary marked up texts in say TEI encoding, for importation into MVD.

desmond said...

Yes, I know, talking to myself again. The thing is, I know quite a few people read this blog but they don't post comments. It has something of a curiosity value.