What is this for?
This little guide is intended to explain the mechanism of ODD chaining. An
ODD file specifies a particular view of the TEI, by selecting particular elements,
attributes, etc. from the whole of the TEI. But you can also refine such a specification
further, making your ODD derive from another one. In principle you can chain together ODDs
in this way as much as you like. You can use this feature in several different ways:
- you can add additional restrictions to an existing ODD, for example by changing
the value list of an attribute
- you can further reduce the subset of elements provided by an existing ODD
- you can add new elements or modules to an existing ODD
How does it work?
An ODD can of course consist of nothing but free-standing declarations, using only elements such
as elementSpec and classSpec. But most TEI ODDs are made by reference
to the huge existing collection of such declarations provided by the TEI Guidelines. An
ODD such as TEI Lite or TEI Bare is composed of
references to the objects it uses, expressed by means of elements such as
moduleRef, elementRef, or classRef. These references (and
also any additional free-standing declarations) are collected together within a
schemaSpec element which specifies the schema the ODD is intended to generate.
This element has a useful but little-known attribute, source, whose purpose
is to state exactly where the objects referenced by the schema specification (the
free-standing declarations) are to be found. By default, when an ODD specifies no source,
it is assumed that they are to be collected from the most recent release of the TEI
Guidelines. You can modify this behaviour by supplying a different URI. For example, a
schemaSpec with its attribute source set to tei:2.4.0
would search for declarations in release 2.4.0 of the Guidelines. One with the value
mySuperODD.subset.xml will go looking for declarations in a file of that name
in the current source tree. And one with the value
http://example.com/superODDs/anotherSubset.xml will go looking for it at the
URL indicated.
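As a minimal sketch, the first of these cases might be written as follows. The ident and start values and the single moduleRef are hypothetical, chosen only to make the fragment complete; the other two source values from the text are listed in the comment.

```xml
<!-- Sketch only: ident, start and the module selection are illustrative assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="myODD" start="TEI" source="tei:2.4.0">
  <!-- Other possible values for source, as described above:
         source="mySuperODD.subset.xml"
         source="http://example.com/superODDs/anotherSubset.xml" -->
  <moduleRef key="tei"/>
</schemaSpec>
```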
It's important to understand that the resource indicated by the source
attribute must contain complete and explicit specification elements: elementSpec
rather than elementRef, classSpec rather than classRef and so
on. It may of course contain other TEI elements, but these will be ignored entirely in the
construction of a schema. A file called p5subset.xml, provided as part of
every TEI release, is an example of such a resource: it contains specifications for every
single TEI element, class, macro, and datatype, but nothing else much. If a value for the
source attribute is not specified, the most recently available version of this
file is what will be used during the processing of an ODD.
Processing an ODD
Let's look more closely at the way the TEI defines a very lightweight schema called
TEI Bare. Its schema specification element begins like this:
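In outline, and as a sketch rather than a verbatim copy of tei_bare.odd, such a specification might look like this. The include lists and the three attribute names are assumptions made for illustration; the file itself is authoritative.

```xml
<!-- Sketch of the general shape only; include lists and attribute names are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="tei_bare" start="TEI teiCorpus">
  <!-- References to modules, each optionally narrowed to a list of elements -->
  <moduleRef key="core" include="p list item label head author title"/>
  <moduleRef key="tei"/>
  <moduleRef key="header" include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
  <!-- Specification merged with the att.global already supplied by the tei module -->
  <classSpec ident="att.global" type="atts" mode="change">
    <attList>
      <attDef ident="xml:space" mode="delete"/>
      <attDef ident="rend" mode="delete"/>
      <attDef ident="xml:base" mode="delete"/>
    </attList>
  </classSpec>
  <!-- Specification that suppresses the class altogether -->
  <classSpec ident="att.fragmentable" type="atts" mode="delete"/>
</schemaSpec>
```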
No source is specified, so declarations for the elements requested here will
be taken from the current p5subset.xml.
Note that this ODD contains both references and specifications: there are references to
modules (which may be thought of as shorthand for references to elements or classes,
since a module is simply a collection of element and class specifications) and
specifications for two classes (classSpec), rather than references
(classRef). The reference to the module tei brings with it
specifications for most TEI classes, including these two. An ODD processor will therefore
have to deal with duplicate class specifications for the classes att.global
and att.fragmentable. The resolution method required is indicated by the value of
the mode attribute: if this is delete then both declarations are to
be ignored, and the class is therefore suppressed; if it is change then the two
declarations are to be merged, with anything present in the second specification
overriding the corresponding part of the first. In this case, the effect will be to suppress the three
attributes mentioned.
If you'd like to check that this ODD does what you expect, and you have oXygen installed
with a recent version of the TEI Frameworks, just download the file
tei_bare.odd (you can get it from [the TEI GitHub repository](https://github.com/TEIC/TEI/blob/dev/P5/Exemplars/tei_bare.odd)), and tell oXygen to apply the predefined transformation
TEI ODD to HTML to it. This will produce a mini-manual for the TEI Bare
customization in HTML format, near the beginning of which you should see a list of the
elements the schema contains.
You may like to check that the modifications to the attribute class
att.global indicated above have indeed been performed, by looking at its
documentation in this mini-manual.
Rolling your own subset
In the preceding step, we processed the ODD with reference to the default p5subset, i.e.
with respect to the whole of the TEI. Suppose, however, that we would like to use TEI Bare
as the starting point for another customization. We could simply edit the source of TEI
Bare, and add our further modifications there, but that would soon become unmanageable if
we were dealing with a larger customization as starting point. Instead, we will use TEI
Bare itself as our source. To do this, as noted above, we need to generate a
compiled version of TEI Bare containing only specification elements
in which all the references have been resolved. This is easily done using the stylesheet
odd2odd.xsl which is supplied as part of the TEI Stylesheet package, but
is not currently included in the TEI oXygen framework. There is however a command line
utility teitoodd which does the job, and it is also easy to set up your own
oXygen transformation to do it. We leave this as an exercise for the reader.
Chaining: subsetting
Suppose we now have a compiled version of TEI Bare in the file
TEI_bare.compiled.xml. Processing the following schema specification
should produce exactly the same results as we obtained from the uncompiled version.
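A minimal sketch of such a specification, assuming the compiled file sits alongside the ODD and that TEI Bare draws on the four modules listed; the ident value is a hypothetical name.

```xml
<!-- Sketch only: ident and the list of modules are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_again" start="TEI" source="TEI_bare.compiled.xml">
  <!-- Each key now selects the module as it exists in the compiled ODD -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```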
This works because each of the moduleRef elements here refers to the module
(i.e. set of elements etc.) available in the compiled ODD rather than to the
module as defined in the whole TEI. Note also that simply supplying the compiled ODD as
source for a schema is not enough: we must also specify which of the declarations it
contains we want to use: nihil ex nihilo fit...!
However, the reason we started down this path was not to find yet another way of doing
the same thing. Let's now make a reduced version of TEI Bare in which the head
element is missing.
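One way to sketch this uses the except attribute on the reference to the core module, which is where head is defined. The ident value and the module list are assumptions, as is the location of the compiled file.

```xml
<!-- Sketch only: ident and the list of modules are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_minus" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <!-- Take everything the compiled ODD offers for core, except head -->
  <moduleRef key="core" except="head"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```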
And, just for completeness, here is another way of achieving the same effect:
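In this sketch the whole of the core module is referenced and head is then removed by a specification in delete mode. The ident value and the module list are again assumptions.

```xml
<!-- Sketch only: ident and the list of modules are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_minus" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Suppress the head element supplied by the compiled ODD -->
  <elementSpec ident="head" mode="delete"/>
</schemaSpec>
```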
Note that we cannot suppress or modify anything which is not already present in the
compiled ODD specified by the source attribute. This approach to ODD chaining
works best in a situation where we first define an ODD which combines all the elements
(etc.) that we ever plan to use, and then derive individual subset schemas from it, for
example, one for manuscripts, one for early modern print, one for modern print etc. (This
approach has been adopted by, for example, the Deutsches Textarchiv.)
Chaining: supersetting
But ODD chaining is not restricted to subsetting. Suppose we want to take the
pre-existing TEI Bare schema and add declarations from some other module. We
could of course laboriously copy all the declarations we want into our
schemaSpec, but it would be much nicer not to have to do that. Suppose for
example that we want to add everything provided by the gaiji module. That
module was not included when we defined our compiled version of TEI Bare, though it is of
course available in the full TEI. Here's one way of doing it:
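A sketch of such an ODD follows. The ident value and the module list are hypothetical, and the URL for the full p5subset.xml assumes that the TEI Vault exposes the current release under a path segment named current.

```xml
<!-- Sketch only: ident, module list and the "current" URL are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_plus" start="TEI" source="TEI_bare.compiled.xml">
  <!-- These are resolved against the compiled TEI Bare -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- This one carries its own source, so it is resolved against the full TEI -->
  <moduleRef key="gaiji"
             source="http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```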
The moduleRef which pulls in the gaiji module uses its own source
attribute to specify where to find the declarations for that module. No sense looking for
them in TEI_bare.compiled.xml: they're not there. Instead, we will collect
them from the online copy of the compiled ODD which provides the whole TEI Guidelines, as
noted above. Of course we can also do the usual kind of subsetting: for example
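A sketch of that kind of ODD, in which the include attribute narrows the gaiji reference to three elements. The ident value, the module list and the URL (with its current path segment) are assumptions.

```xml
<!-- Sketch only: ident, module list and the "current" URL are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_plus_some" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Only g, char and glyph are taken from the full gaiji module -->
  <moduleRef key="gaiji" include="g char glyph"
             source="http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```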
This ODD will give us everything in tei_bare along with just g, char, and glyph from the
default gaiji module. We could achieve the same effect by explicitly naming the elements
we want, and again specifying where they are to be found:
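Sketched with elementRef, each reference carries its own source. The ident value, the module list and the URL (with its current path segment) are assumptions.

```xml
<!-- Sketch only: ident, module list and the "current" URL are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_plus_some" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Each element is fetched individually from the full TEI -->
  <elementRef key="g"
              source="http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml"/>
  <elementRef key="char"
              source="http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml"/>
  <elementRef key="glyph"
              source="http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```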
We can use this technique to put back an element which was deleted from the compiled
schema. For example, the q element is not available in TEI Bare, but we can
easily get it back:
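A sketch of how q might be restored, assuming the same compiled file and module list as in the earlier discussion; the ident value is hypothetical and the URL assumes a current path segment in the TEI Vault.

```xml
<!-- Sketch only: ident, module list and the "current" URL are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_with_q" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- q is absent from the compiled ODD, so it is fetched from the full TEI -->
  <elementRef key="q"
              source="http://www.tei-c.org/Vault/P5/current/xml/tei/odd/p5subset.xml"/>
</schemaSpec>
```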
A footnote on versioning
We mentioned above that, unless otherwise stated, the default source for TEI
definitions to be included in an ODD is the current release of the TEI Guidelines. If
you don't supply a source attribute at all, you are requesting the
definitions provided by the file p5subset.xml forming part of the current
release. As we've seen, you can vary this by specifying your own source library as the
value of the source attribute. But how do you go about requesting a different
version of the TEI Guidelines as source?
(See [How to Update your ODD](https://teic.github.io/TCW/purifyDoc.html) for some discussion of
circumstances in which you might need to make your ODD
use something other than the current version of the TEI
Guidelines, if only temporarily.)
Previous versions of all parts of the P5 Guidelines are all kept on the TEI web site in a
folder with the address http://www.tei-c.org/Vault/P5/x.y.z, where x.y.z identifies the
version concerned. So, if for example we wanted to process our ODD not against the current
p5subset.xml but against the version of that file released as part of 3.0.0, we would
supply http://www.tei-c.org/Vault/P5/3.0.0/xml/tei/odd/p5subset.xml as
the value for the source attribute on the schemaSpec defining our
ODD. We could do the same thing (though I don't recommend it) even at the level of
individual elements, by specifying a different version as source for an
elementSpec.
And just to make life a little simpler, there is an
officially recognized short cut built into the current ODD processing stylesheets:
instead of the lengthy URL above, we could simply say tei:3.0.0. For
example, supposing that for some strange reason we don't want to add the current
definition for q in the preceding example, but instead add the version
defined in release 3.0.0, we might use the following ODD:
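A sketch of such an ODD, using the short form for the version. The ident value, the name of the compiled file and the module list are assumptions.

```xml
<!-- Sketch only: ident, compiled file name and module list are assumptions -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="bare_with_old_q" start="TEI" source="TEI_bare.compiled.xml">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Take the definition of q as it stood in release 3.0.0 -->
  <elementRef key="q" source="tei:3.0.0"/>
</schemaSpec>
```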
There is a useful table listing the dates and version numbers of all TEI P5 releases on the
TEI web site.
A use case
Let's suppose you're setting up a crowd-sourcing application for the transcription of
archival documents of some kind. Once the documents have been captured and minimally
tagged, you plan to enrich the archive with detailed metadata describing the people and
places referred to in them. So you anticipate needing two schemas: one (very simple and
constrained) to validate the transcription files, and another (also very constrained, but
differently) to validate the metadata. But of course you will also need to validate
completed documents, combining both kinds of data. And there are some features
(paragraphs, titles, etc.) common to both, which suggests you will also need a third
schema... ODD chaining is the answer!
(Before reading further, I suggest you download our [little
set of example files](chainingTuto.zip), and fire up your favourite XML processor. Please note that
these example files are just meant to demonstrate the effect of chaining: in a real-life
application, we would of course customize our schemas much more precisely, for
example by removing unneeded attributes, simplifying content models, adding different
examples etc.)
We will need to define the third schema, which contains everything likely to be useful to
either of the other two, plus everything common to both. Let's call this one our
motherODD. Open the file motherODD.xml and you will see a
typical ODD, with root element TEI, defined with reference to the full TEI Guidelines. In
addition to the infrastructural module tei, it contains such elements as
pb, p, hi,
div, and name from the core module, plus the metadata set
we plan to use for each valid TEI document, which takes components from modules
header, namesdates and corpus.
We will be defining our two more specialised schemas with reference to this one. We
therefore need to compile the motherODD, effectively transforming it
to a collection or library of complete TEI specifications. We do this by running the
odd2odd transformation referred to above: the results for our example
file are in the file motherODD.compiled.
Now take a look at the two different subset ODDs in our example: one
(justTranscripts.xml) for the transcriptions, and one
(justMetadata.xml) for the metadata. Note that each of these ODD files
references motherODD.compiled by means of its source
attribute. Note also that each of them specifies a different root
element: this is so that we can use the resulting schemas to validate a transcription
without a header, or a header without a transcription.
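The general shape of such a subset ODD can be sketched as follows. This is not the content of justTranscripts.xml: the start value, the module keys and the excluded element are all hypothetical, and only the use of source to point at motherODD.compiled comes from the description above.

```xml
<!-- Sketch only: start, module keys and except list are assumptions, not the example file -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="justTranscripts" start="text" source="motherODD.compiled">
  <!-- Everything referenced here is resolved against the compiled motherODD -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <!-- No header is wanted here, so teiHeader-related content is simply not requested -->
  <moduleRef key="textstructure" except="TEI"/>
</schemaSpec>
```

A metadata-only ODD would follow the same pattern with a different start value and a different selection of modules.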
Try processing each of these ODD files to generate some documentation and a schema, in
the usual way. Then compare the outputs. We've included a couple of example data files for
you to check that the validation works in the way it should: the file
transcript.xml should be valid against the schema generated from the
justTranscripts ODD and the file metadata.xml should be
valid against the schema generated from the justMetadata ODD. Our
example assumes a particular workflow in which, for example, the ref attribute
is used to link name elements with a person or place element in
the header; your mileage may of course vary.
Finally, take a look at the file driver.tei: this uses XInclude to
combine the two sample data files into a complete TEI document, which should be valid
against a schema generated from the motherODD. Again, feel free to modify as necessary to
suit your own working practices!
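A driver file of this kind might be sketched as below. This is not the content of driver.tei: it assumes that metadata.xml has teiHeader as its root element and transcript.xml has text as its root, and that your XML processor has XInclude processing switched on.

```xml
<!-- Sketch only: the roots of the two included files are assumptions -->
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Pulls in the header validated by the metadata schema -->
  <xi:include href="metadata.xml"/>
  <!-- Pulls in the text validated by the transcription schema -->
  <xi:include href="transcript.xml"/>
</TEI>
```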