ODD chaining for Beginners

Lou Burnard

As published on lb42.github.io

Authored from scratch.

Revision history:
- Minor changes for publication on lb42.github.io
- Expanded and announced on TEI-L
- Uploaded for Council review
- Drafted first part on train from Paris to La Souterraine; then lost half of it by shutting lid in a hurry without saving first: doh.

What is this for?

This little guide is intended to explain the mechanism of ODD chaining. An ODD file specifies a particular view of the TEI by selecting particular elements, attributes, etc. from the whole of the TEI. But you can also refine such a specification further, making your ODD derive from another one. In principle you can chain ODDs together in this way as often as you like. You can use this feature in several different ways:

- you can add further restrictions to an existing ODD, for example by changing the value list of an attribute;
- you can further reduce the subset of elements provided by an existing ODD;
- you can add new elements or modules to an existing ODD.

How does it work?

An ODD can of course contain nothing but free-standing declarations, using elements such as elementSpec and classSpec alone. But most TEI ODDs are made by reference to the huge existing collection of such declarations provided by the TEI Guidelines. An ODD such as TEI Lite or TEI Bare is composed of references to the objects it uses, expressed by means of elements such as moduleRef, elementRef, or classRef. These references (and any additional free-standing declarations) are collected together within a schemaSpec element, which specifies the schema the ODD is intended to generate. This element has a useful but little-known attribute, source, whose purpose is to state where exactly the declarations for the objects referenced by the schema specification are to be found. By default, when an ODD specifies no source, they are collected from the most recent release of the TEI Guidelines. You can modify this behaviour by supplying a different URI. For example, a schemaSpec whose source attribute is set to tei:2.4.0 will look for declarations in release 2.4.0 of the Guidelines; one with the value mySuperODD.subset.xml will look for them in a file of that name in the current source tree; and one with the value http://example.com/superODDs/anotherSubset.xml will look for them at the URL indicated.
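For illustration, here is a minimal sketch of the three cases just described. The ident values are hypothetical, each schemaSpec is shown on its own rather than inside a complete ODD document, and the choice of modules is arbitrary:

```xml
<!-- Look for declarations in release 2.4.0 of the Guidelines -->
<schemaSpec ident="mySchemaA" start="TEI" source="tei:2.4.0">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>

<!-- Look for declarations in a file in the current source tree -->
<schemaSpec ident="mySchemaB" start="TEI" source="mySuperODD.subset.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>

<!-- Look for declarations at a remote URL -->
<schemaSpec ident="mySchemaC" start="TEI"
            source="http://example.com/superODDs/anotherSubset.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```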

It's important to understand that the resource indicated by the source attribute must contain complete and explicit specification elements: elementSpec rather than elementRef, classSpec rather than classRef, and so on. It may of course contain other TEI elements, but these will be ignored entirely in the construction of a schema. A file called p5subset.xml, provided as part of every TEI release, is an example of such a resource: it contains specifications for every single TEI element, class, macro, and datatype, but not much else. If no value is specified for the source attribute, the most recent available version of this file is used when the ODD is processed.

Processing an ODD

Let's look more closely at the way the TEI defines a very lightweight schema called TEI Bare. Its schema specification element begins like this:
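The real tei_bare.odd is the place to check the details; what follows is only a sketch of the general shape of such a specification. The module contents and the three suppressed attributes (xml:base and xml:space from att.global, part from att.fragmentable) are assumptions chosen for illustration, not a copy of the actual file:

```xml
<schemaSpec ident="tei_bare" start="TEI">
  <!-- references to modules, some restricted with include -->
  <moduleRef key="core" include="p list item label head author title"/>
  <moduleRef key="tei"/>
  <moduleRef key="header"
             include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <moduleRef key="textstructure" include="TEI text body front back div"/>
  <!-- specifications (not references) for two attribute classes -->
  <classSpec ident="att.global" type="atts" mode="change">
    <attList>
      <attDef ident="xml:base" mode="delete"/>
      <attDef ident="xml:space" mode="delete"/>
    </attList>
  </classSpec>
  <classSpec ident="att.fragmentable" type="atts" mode="change">
    <attList>
      <attDef ident="part" mode="delete"/>
    </attList>
  </classSpec>
</schemaSpec>
```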

No source is specified, so declarations for the elements requested here will be taken from the current p5subset.xml.

Note that this ODD contains both references and specifications: there are references to modules (which may be thought of as shorthand for references to elements or classes, since a module is simply a collection of element and class specifications) and specifications for two classes (classSpec), rather than references to them (classRef). The reference to the module tei brings with it specifications for most TEI classes, including these two. An ODD processor will therefore have to deal with duplicate class specifications for the classes att.global and att.fragmentable. The resolution method required is indicated by the value of the mode attribute: if this is delete, then both declarations are to be ignored, and the class is therefore suppressed; if it is change, then the two declarations are to be merged, with anything present in the second specification overriding the corresponding part of the first. In this case, the effect will be to suppress the three attributes mentioned.
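As a hedged illustration of the two modes, here are two declarations that might appear inside a schemaSpec. The classes are the two named above, but which class gets which mode, and the attribute deleted, are only examples:

```xml
<!-- mode="delete": the class is suppressed altogether -->
<classSpec ident="att.fragmentable" type="atts" mode="delete"/>

<!-- mode="change": merged with the existing declaration;
     only what is stated here overrides it -->
<classSpec ident="att.global" type="atts" mode="change">
  <attList>
    <attDef ident="xml:space" mode="delete"/>
  </attList>
</classSpec>
```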

If you'd like to check that this ODD does what you expect, and you have oXygen installed with a recent version of the TEI Frameworks, just download the file tei_bare.odd (you can get it from the TEI GitHub repository) and tell oXygen to apply the predefined transformation "TEI ODD to HTML" to it. This will produce a mini-manual for the TEI Bare customization in HTML format, near the beginning of which you should see a list of the elements the schema contains.

You may like to check that the modifications to the attribute class att.global indicated above have indeed been performed, by looking at its documentation in this mini-manual.

Rolling your own subset

In the preceding step, we processed the ODD with reference to the default p5subset, i.e. with respect to the whole of the TEI. Suppose, however, that we would like to use TEI Bare as the starting point for another customization. We could simply edit the source of TEI Bare and add our further modifications there, but that would soon become unmanageable if we were starting from a larger customization. Instead, we will use TEI Bare itself as our source. To do this, as noted above, we need to generate a compiled version of TEI Bare, containing only specification elements, in which all the references have been resolved. This is easily done using the stylesheet odd2odd.xsl, which is supplied as part of the TEI Stylesheets package but is not currently included in the TEI oXygen framework. There is, however, a command-line utility, teitoodd, which does the job, and it is also easy to set up your own oXygen transformation to do it. We leave this as an exercise for the reader.
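A compiled ODD is ordinary TEI XML. The following is a rough, schematic sketch of the relevant part of a compiled TEI Bare; the real output of odd2odd.xsl is far longer and its exact markup may differ, so the comments stand in for the full content:

```xml
<schemaSpec ident="tei_bare" start="TEI">
  <!-- each element reference has become a complete elementSpec -->
  <elementSpec ident="p" module="core">
    <!-- description, content model and attribute list of p -->
  </elementSpec>
  <elementSpec ident="head" module="core">
    <!-- description, content model and attribute list of head -->
  </elementSpec>
  <!-- classes are expanded in the same way, with the
       modifications from the source ODD already applied -->
  <classSpec ident="att.global" type="atts" module="tei">
    <!-- attribute list of att.global, minus the suppressed attributes -->
  </classSpec>
  <!-- and so on for every selected element, class, macro and datatype -->
</schemaSpec>
```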

Chaining: subsetting

Suppose we now have a compiled version of TEI_bare in the file TEI_bare.compiled.xml. Processing the following schema specification should produce exactly the same results as we received from the uncompiled version.
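A sketch of such a specification might look like this. The ident value is hypothetical, and the sketch assumes that TEI Bare draws on the four modules tei, core, header and textstructure:

```xml
<schemaSpec ident="tei_bare_again" start="TEI" source="TEI_bare.compiled.xml">
  <!-- each moduleRef now selects only those members of the module
       that survive in the compiled TEI Bare -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```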

This works because each of the moduleRef elements here refers to the module (i.e. set of elements etc.) available in the compiled ODD rather than to the module as defined in the whole TEI. Note also that simply supplying the compiled ODD as source for a schema is not enough: we must also specify which of the declarations it contains we want to use. Nihil ex nihilo fit: nothing comes from nothing!

However, the reason we started down this path was not to find yet another way of doing the same thing. Let's now make a reduced version of TEI Bare in which the head element is missing.
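One way to do this, sketched under the assumption that TEI Bare uses the modules tei, core, header and textstructure (the ident value is hypothetical), is to add an elementSpec in delete mode:

```xml
<schemaSpec ident="tei_bare_nohead" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- remove head from what the compiled ODD provides -->
  <elementSpec ident="head" mode="delete"/>
</schemaSpec>
```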

And, just for completeness, here is another way of achieving the same effect:
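An alternative sketch, under the same assumption about the modules TEI Bare uses (the ident value is hypothetical), relies on the except attribute of moduleRef to leave head out when the core module is selected from the compiled ODD:

```xml
<schemaSpec ident="tei_bare_nohead" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <!-- everything core provides in the compiled ODD, except head -->
  <moduleRef key="core" except="head"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```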

Note that we cannot suppress or modify anything which is not already present in the compiled ODD specified by the source attribute. This approach to ODD chaining works best in situations where we first define an ODD which combines all the elements (etc.) that we ever plan to use, and then derive individual subset schemas from it: for example, one for manuscripts, one for early modern print, one for modern print, etc. (This approach has been adopted by, for example, the Deutsches Textarchiv.)

Chaining: supersetting

But ODD chaining is not restricted to subsetting. Suppose we want to take the pre-existing TEI Bare schema and add declarations from some other module. We could of course laboriously copy all the declarations we want into our schemaSpec, but it would be much nicer not to have to do that. Suppose for example that we want to add everything provided by the gaiji module. That module was not included when we defined our compiled version of TEI Bare, though it is of course available in the full TEI. Here's one way of doing it:
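A sketch of this, assuming the compiled file is called TEI_bare.compiled.xml, that TEI Bare uses the modules tei, core, header and textstructure, and that the current p5subset.xml is published at https://www.tei-c.org/release/xml/tei/odd/p5subset.xml (check this URL before relying on it); the ident value is hypothetical:

```xml
<schemaSpec ident="tei_bare_plus_gaiji" start="TEI" source="TEI_bare.compiled.xml">
  <!-- these four come from the compiled TEI Bare -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- gaiji is not in the compiled TEI Bare, so fetch it from the full TEI -->
  <moduleRef key="gaiji"
             source="https://www.tei-c.org/release/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```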

The moduleRef which pulls in the gaiji module uses its own source attribute to specify where to find the declarations for that module. No sense looking for them in tei_bare.compiled.odd: they're not there. Instead, we will collect them from the online copy of p5subset.xml, the compiled ODD which provides the whole of the TEI Guidelines, as noted above. Of course, we can also do the usual kind of subsetting: for example
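A hedged sketch of such subsetting selects just three elements of the gaiji module with the include attribute. It assumes a compiled file called TEI_bare.compiled.xml, the modules tei, core, header and textstructure for TEI Bare, and https://www.tei-c.org/release/xml/tei/odd/p5subset.xml as the location of the full TEI (an assumption to verify); the ident value is hypothetical:

```xml
<schemaSpec ident="tei_bare_plus_some_gaiji" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- only three of the gaiji elements, taken from the full TEI -->
  <moduleRef key="gaiji" include="g char glyph"
             source="https://www.tei-c.org/release/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```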

This ODD will give us everything in tei_bare along with just g, char, and glyph from the default gaiji module. We could achieve the same effect by explicitly naming the elements we want, and again specifying where they are to be found:
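Sketched under the assumptions of a compiled file called TEI_bare.compiled.xml, the modules tei, core, header and textstructure for TEI Bare, and https://www.tei-c.org/release/xml/tei/odd/p5subset.xml as the location of the full TEI (to be verified), the explicit version uses one elementRef per element, each carrying its own source; the ident value is hypothetical:

```xml
<schemaSpec ident="tei_bare_plus_some_gaiji" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- name each wanted element and say where its declaration lives -->
  <elementRef key="g"
              source="https://www.tei-c.org/release/xml/tei/odd/p5subset.xml"/>
  <elementRef key="char"
              source="https://www.tei-c.org/release/xml/tei/odd/p5subset.xml"/>
  <elementRef key="glyph"
              source="https://www.tei-c.org/release/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```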

We can use this technique to put back an element which was deleted from the compiled schema. For example, the q element is not available in TEI Bare, but we can easily get it back:
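For instance, the following sketch assumes a compiled file called TEI_bare.compiled.xml, the modules tei, core, header and textstructure for TEI Bare, and https://www.tei-c.org/release/xml/tei/odd/p5subset.xml as the location of the full TEI (an assumption to verify); the ident value is hypothetical:

```xml
<schemaSpec ident="tei_bare_with_q" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- q is not in the compiled TEI Bare, so take it from the full TEI -->
  <elementRef key="q"
              source="https://www.tei-c.org/release/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```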

A footnote on versioning

We mentioned above that, unless otherwise stated, the default source for TEI definitions to be included in an ODD is the current release of the TEI Guidelines. If you don't supply a source attribute at all, you are requesting the definitions provided by the file p5subset.xml forming part of the current release. As we've seen, you can vary this by specifying your own source library as the value of the source attribute. But how do you go about requesting a different version of the TEI Guidelines as source?

(See How to Update your ODD for some discussion of circumstances in which you might need to make your ODD use something other than the current version of the TEI Guidelines, if only temporarily.)

Previous versions of all parts of the P5 Guidelines are kept on the TEI web site in a folder with the address http://www.tei-c.org/Vault/P5/x.y.z, where x.y.z identifies the version concerned. So if, for example, we wanted to process our ODD not against the current p5subset.xml but against the version of that file released as part of 3.0.0, we would supply http://www.tei-c.org/Vault/P5/3.0.0/xml/tei/odd/p5subset.xml as the value for the source attribute on the schemaSpec defining our ODD. We could do the same thing (though I don't recommend it) even at the level of individual elements, by specifying a different version as source for an elementSpec.
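Applied to a whole ODD, that might look like the following sketch; the ident value and the module list are illustrative, but the URL is the Vault address for release 3.0.0 given in the text:

```xml
<!-- every declaration is taken from release 3.0.0, not the current release -->
<schemaSpec ident="myODD" start="TEI"
            source="http://www.tei-c.org/Vault/P5/3.0.0/xml/tei/odd/p5subset.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```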

And just to make life a little simpler, there is an officially recognized shortcut built into the current ODD processing stylesheets: instead of the lengthy URL above, we could simply say tei:3.0.0. For example, supposing that for some strange reason we don't want to add the current definition of q in the preceding example, but instead the version defined in release 3.0.0, we might use the following ODD:
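A sketch of such an ODD, assuming a compiled file called TEI_bare.compiled.xml and the modules tei, core, header and textstructure for TEI Bare (the ident value is hypothetical). Note that tei:3.0.0 is resolved by the processing stylesheets, not by XML itself:

```xml
<schemaSpec ident="tei_bare_with_old_q" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- q as defined in release 3.0.0, via the tei:x.y.z shortcut -->
  <elementRef key="q" source="tei:3.0.0"/>
</schemaSpec>
```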

There is a useful table listing the dates and version numbers of all TEI P5 releases at .

A use case

Let's suppose you're setting up a crowd-sourcing application for the transcription of archival documents of some kind. Once the documents have been captured and minimally tagged, you plan to enrich the archive with detailed metadata describing the people and places referred to in them. So you anticipate needing two schemas: one (very simple and constrained) to validate the transcription files, and another (also very constrained, but differently) to validate the metadata. But of course you will also need to validate completed documents, combining both kinds of data. And there are some features (paragraphs, titles, etc.) common to both, which suggests you will also need a third schema... ODD chaining is the answer!

(Before reading further, I suggest you download our little set of example files and fire up your favourite XML processor. Please note that these example files are just meant to demonstrate the effect of chaining: in a real-life application, we would of course customize our schemas much more precisely, for example by removing unneeded attributes, simplifying content models, adding different examples, etc.)

We will need to define the third schema, which contains everything likely to be useful to either of the other two, plus everything common to both. Let's call this one our motherODD. Open the file motherODD.xml and you will see a typical ODD, with root element TEI, defined with reference to the full TEI Guidelines. In addition to the infrastructural module tei, it contains such elements as pb, p, hi and name from the core module, and div from the textstructure module, plus the metadata set we plan to use for each valid TEI document, which takes components from the modules header, namesdates and corpus.
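A rough sketch of a schemaSpec along these lines follows. The particular elements selected from each module are assumptions for illustration; the motherODD.xml in the example files is the authoritative version:

```xml
<schemaSpec ident="motherODD" start="TEI">
  <moduleRef key="tei"/>
  <!-- shared text features: pages, paragraphs, highlighting, names, titles -->
  <moduleRef key="core" include="pb p hi name title"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
  <!-- metadata: header plus people and places -->
  <moduleRef key="header"
             include="teiHeader fileDesc titleStmt publicationStmt sourceDesc profileDesc"/>
  <moduleRef key="namesdates"
             include="listPerson person persName listPlace place placeName"/>
  <moduleRef key="corpus" include="particDesc settingDesc"/>
</schemaSpec>
```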

We will be defining our two more specialised schemas with reference to this one. We therefore need to compile the motherODD, effectively transforming it to a collection or library of complete TEI specifications. We do this by running the odd2odd transformation referred to above: the results for our example file are in the file motherODD.compiled.

Now take a look at the two different subset ODDs in our example: one (justTranscripts.xml) for the transcriptions, and one (justMetadata.xml) for the metadata. Note that each of these ODD files references motherODD.compiled by means of its source attribute. Note also that each of them specifies a different root element: this is so that we can use the resulting schemas to validate a transcription without a header, or a header without a transcription.
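The two subset ODDs might be sketched as follows. The choice of text and teiHeader as root elements, and the module lists, are assumptions, not necessarily what the example files contain:

```xml
<!-- justTranscripts: no header, so a transcription can be validated alone -->
<schemaSpec ident="justTranscripts" start="text" source="motherODD.compiled">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>

<!-- justMetadata: a header on its own -->
<schemaSpec ident="justMetadata" start="teiHeader" source="motherODD.compiled">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="namesdates"/>
  <moduleRef key="corpus"/>
</schemaSpec>
```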

Try processing each of these ODD files to generate some documentation and a schema in the usual way. Then compare the outputs. We've included a couple of example data files for you to check that the validation works as it should: the file transcript.xml should be valid against the schema generated from the justTranscripts ODD, and the file metadata.xml should be valid against the schema generated from the justMetadata ODD. Our example assumes a particular workflow in which, for example, the ref attribute is used to link name elements with a person or place element in the header; your mileage may of course vary.
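As a hypothetical illustration of that linking convention (the identifier and names are invented), a person described in the header and a name pointing at it might look like this:

```xml
<!-- in the metadata: a person described in the header -->
<profileDesc>
  <particDesc>
    <listPerson>
      <person xml:id="pers001">
        <persName>Jane Doe</persName>
      </person>
    </listPerson>
  </particDesc>
</profileDesc>

<!-- in the transcription: a name pointing at that person -->
<p>Letter received from <name ref="#pers001">Mrs Doe</name>.</p>
```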

Finally, take a look at the file driver.tei: this uses XInclude to combine the two sample data files into a complete TEI document, which should be valid against a schema generated from the motherODD. Again, feel free to modify it as necessary to suit your own working practices!
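A driver file of this kind might be sketched as follows, assuming that metadata.xml has teiHeader as its root element and transcript.xml has text as its root:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- pulls in the teiHeader from the metadata file -->
  <xi:include href="metadata.xml"/>
  <!-- pulls in the text from the transcription file -->
  <xi:include href="transcript.xml"/>
</TEI>
```

Validation against the motherODD schema then applies to the document as it stands after the inclusions have been resolved.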