Sunday, May 6, 2012

Useless elements in TEI

Writing a program to separate out versions in TEI (Text Encoding Initiative) encoded documents reveals some surprising facts about the latest tags added to that now vast scheme. Real-world texts such as manuscripts written by their authors (holographs) contain lots of corrections and may exist in the form of separate physical drafts. In recording variation in such texts, what functions do <choice>, <app>, <subst> and <mod> actually perform? By design they are supposed to group various kinds of alternatives, but functionally speaking they serve no purpose. You could leave them out and the encoded text would record exactly the same information.
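To make the claim concrete, here is a minimal sketch, not the actual program mentioned above. It assumes a made-up line of text, ignores the TEI namespace, and uses a deliberately crude two-layer model in which the first layer keeps <del> and the revised layer keeps <add>. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: the same correction with and without the <subst> wrapper.
WRAPPED = "<l>a <subst><del>dark</del><add>bright</add></subst> day</l>"
BARE = "<l>a <del>dark</del><add>bright</add> day</l>"


def layer_text(elem, skip):
    """Return the text under elem, dropping every element whose tag is `skip`.

    Any other element, including a wrapper like <subst>, is simply
    walked through as if it were transparent.
    """
    if elem.tag == skip:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(layer_text(child, skip))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


def layers(xml):
    """Split one encoded line into its first and revised layers."""
    root = ET.fromstring(xml)
    return {
        "first": layer_text(root, "add"),    # before correction: ignore additions
        "revised": layer_text(root, "del"),  # after correction: ignore deletions
    }


if __name__ == "__main__":
    print(layers(WRAPPED))
    print(layers(BARE))
```

In this simple case both encodings yield the same two layers, because the extraction never consults the wrapper at all.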

Admittedly there is a human factor here. Humans want things spelled out clearly, and tags like <subst> make it clearer, or do they? Since <choice>, <subst> and <mod> are new tags in version 5, not present in version 4, one wonders how confused people were back in the good old days. Now perhaps they are confused even more by the addition of extra tags that obscure the text and serve no functional purpose whatsoever. You might think that <app> (apparatus entry) groups together a set of readings in parallel, but since each successive <rdg> (reading) carries a "wit" attribute that spells out which versions contain it, <app> is left with nothing to do.
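The point about "wit" can be illustrated with a short sketch. It assumes an invented line with three witnesses labelled #A, #B and #C, leaves out the TEI namespace for brevity, and uses hypothetical function names; it is not a general TEI processor.

```python
import xml.etree.ElementTree as ET

# Invented sample: the same variation with and without the <app> wrapper.
WITH_APP = (
    '<l>The <app><rdg wit="#A">quick</rdg>'
    '<rdg wit="#B #C">swift</rdg></app> fox</l>'
)
WITHOUT_APP = (
    '<l>The <rdg wit="#A">quick</rdg>'
    '<rdg wit="#B #C">swift</rdg> fox</l>'
)

ALL_WITNESSES = ["#A", "#B", "#C"]  # assumed witness list for this sample


def version_text(elem, witness):
    """Collect the text of one witness; only <rdg wit="..."> is consulted."""
    if elem.tag == "rdg" and witness not in elem.get("wit", "").split():
        return ""  # this reading belongs to other witnesses
    parts = [elem.text or ""]
    for child in elem:
        parts.append(version_text(child, witness))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


def all_versions(xml):
    """Map each witness to its reconstructed text."""
    root = ET.fromstring(xml)
    return {w: version_text(root, w) for w in ALL_WITNESSES}


if __name__ == "__main__":
    print(all_versions(WITH_APP))
    print(all_versions(WITHOUT_APP))
```

Both calls produce the same mapping of witness to text. The separation is driven entirely by the "wit" attributes, and <app> is walked through like any other container.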

One might argue that these tags are comments on variation, and that their information should be preserved somehow. On the other hand, <app> is clearly end-result related – it refers to the creation of a printed apparatus. Even <mod>, which might be used to record "open" variants, where the first version is not cancelled, only makes explicit what should already be implied by the contents. The fact that there is no element to describe an uncancelled first variant (<undel>?) is a problem with TEI, not a justification for <mod>.
