Tuesday, May 15, 2012

Anthologies

One of the curious problems we face is that of anthologies: collections of works, often revised, whose order is frequently rearranged from one collection to the next.

For example, Charles Harpur produced several collections of poetry, and often the same works appear in multiple anthologies in altered form, with further corrections. The complexity of this natural arrangement of data is not appreciated or even recognised by most technicians who have undertaken to put such material online. Usually they just treat each anthology as a separate document, transcribe only the final version, catalogue it with metadata, and leave the user to find his or her way around the vast collection. The key problem of interrelating such information is left as a conundrum to be solved manually by the user.

The technique I used to overcome this is to split up each anthology into separate poems and then write out an anthology file that contains links to the individual works. In this way merging differences between works is easy, and the user can still get a feel for how each anthology looked. A case in point is the poem 'Eden Lost', which appears in MS A88 and also MS C376. Apart from inevitable differences in wording and punctuation, there is also an extra stanza in the C376 version. The reader needs to know what those similarities and differences are, not by clumsily comparing anthology with anthology on screen, that is, separate document with separate document, but by visualising differences between versions of the same poetic work interactively. Since I am receiving transcriptions of entire anthologies, I have written a splitting program that stores them in such a way that subsequent anthologies will place versions of the same poem into the same folder. Although this is an edition-specific program, it is worth considering as a general approach for handling any collection of anthologies or similar material.

The 'versions' file lists the successive versions, that is, the name and description of each anthology. The poem folders, which are very numerous, each contain the sources of that poem's versions, to facilitate importing. The 'anthologies' folder contains files with links to the poems. These are documents which have the same status as the poems themselves, and also appear in the catalogue, although they only have one version. The next step is to write a script that uses the import facility developed in the previous-but-one post to automate the import. This considerably speeds up the process of building a website of archival material.
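To make the layout concrete, here is a minimal sketch in Python of how such a splitter might work. It is not the edition-specific program itself, and several details are assumptions made purely for illustration: that each poem in a transcription begins with a title line of the form `== Title ==`, that the 'versions' file is stored as JSON, and that the links in an anthology file are plain relative paths. The names `slug`, `split_poems` and `store_anthology` are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed title marker; a real transcription format will differ.
TITLE = re.compile(r"^== (.+) ==$")


def slug(title):
    """Turn a poem title into a folder name, e.g. 'Eden Lost' -> 'eden-lost'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def split_poems(text):
    """Return (title, body) pairs in the order they appear in the anthology."""
    poems, title, lines = [], None, []
    for line in text.splitlines():
        match = TITLE.match(line.strip())
        if match:
            if title is not None:
                poems.append((title, "\n".join(lines).strip()))
            title, lines = match.group(1), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        poems.append((title, "\n".join(lines).strip()))
    return poems


def store_anthology(root, anth_id, description, text):
    """Split one anthology; file each poem in a folder shared by all its versions."""
    root = Path(root)
    links = []
    for title, body in split_poems(text):
        folder = root / slug(title)
        folder.mkdir(parents=True, exist_ok=True)
        # One source file per anthology, so versions accumulate in the folder.
        (folder / f"{anth_id}.txt").write_text(body + "\n", encoding="utf-8")
        links.append(f"{slug(title)}/{anth_id}.txt")

    # The anthology file keeps only links, preserving the original order.
    anth_dir = root / "anthologies"
    anth_dir.mkdir(parents=True, exist_ok=True)
    (anth_dir / f"{anth_id}.txt").write_text("\n".join(links) + "\n", encoding="utf-8")

    # The 'versions' file records the name and description of each anthology.
    versions_file = root / "versions"
    versions = {}
    if versions_file.exists():
        versions = json.loads(versions_file.read_text(encoding="utf-8"))
    versions[anth_id] = description
    versions_file.write_text(json.dumps(versions, indent=2), encoding="utf-8")
    return links
```

Calling `store_anthology("harpur", "A88", "MS A88", text_a)` and then `store_anthology("harpur", "C376", "MS C376", text_c)` would leave both versions of 'Eden Lost' side by side in `harpur/eden-lost/`, ready for import as one multi-version work, while `harpur/anthologies/A88.txt` and `harpur/anthologies/C376.txt` each record the order in which the poems appeared. A real splitter would also have to cope with poems whose titles change between anthologies, which this sketch does not attempt.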

In fact, it is beginning to look as if I will need to create a separate program on the desktop to manage online archives: to back up, upload, import, export and test whether they are still up. That's not such a trivial task, and is only clumsily served by my rapidly growing collection of scripts.
