What is this for?
This little guide is intended to simplify and clarify the task of maintaining or updating an
existing TEI customization (an ODD), in particular an ODD which was written before release 3.0
of TEI P5. That release introduced several new features intended to remove the system's
reliance on other schema languages. These changes are colloquially known as
purification because their motivation was to ensure that all aspects of
a TEI specification should be encoded using TEI and nothing else (see further
[Resolving the Durand
Conundrum](https://jtei.revues.org/842)); they affect only the way that the content model of an
element or the datatype of an attribute is defined, but for completeness we have included
reminders about some other customization features that you may wish to review.
It ain't broke: why fix it?
Since release 2.0 it has been possible to tie any ODD file to a particular release of the
Guidelines. If you know approximately the date of the release of TEI P5 against which your ODD
was first compiled, you can use the source attribute on the schemaSpec
element it contains to ensure that the schema generated from it continues to use the TEI
definitions contained in that particular source. This has the obvious advantage that you don't
have to do any work to maintain it, but some equally obvious disadvantages:
- You can't use any of the new features, corrections, or other improvements introduced
in TEI P5 after the date your ODD was created;
- Your TEI usage risks becoming increasingly divergent from the mainstream, so that
interchange with others will become increasingly difficult.
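For reference, pinning an ODD to a release in the way just described might look something like the sketch below. The identifier myProject and the choice of modules are hypothetical, and the tei:x.y.z shorthand for the source value is an assumption about what your ODD processor accepts: check its documentation before relying on it.

```xml
<!-- Sketch only: a schemaSpec tied to one specific release of TEI P5. -->
<!-- "myProject" and the module list are placeholders for your own.   -->
<schemaSpec ident="myProject" start="TEI" source="tei:2.9.1">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```

Removing the source attribute (or pointing it at a later release) is what exposes your ODD to the changes discussed in the rest of this guide.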
Nothing in digital form is ever really finished. It's more than likely that the requirements of
your project will have changed a little since your ODD was first designed. It's almost
inevitable that as your project evolves, you'll have come across things you'd do differently if
you could start all over again. Reviewing your ODD is a very good way of thinking about doing
that.
Good things to review
Here's a little checklist of things you might want to review in the light of experience:
- Which elements are you actually using? Beginners often think that it's better to allow
almost any kind of content in their schema: an extreme case of this misapprehension leads
people to use TEI All for everything. It may well be that your project started out a bit
uncertain about the kind of data it would have to be able to handle. But now you've
encoded umpteen gazillion pages of Whatever, you surely know a bit better the kind of
things that crop up. And every element you allow for in your schema is another element
you need to explain to your encoders, document, find examples for, and check that it is
being used consistently (if it is used at all). It's also another element that the poor
benighted software developer you finally got funding for has to be prepared to handle in
that swish new interface you've been promising yourself forever.
- Reducing the number of elements permitted by your schema makes it easier for you to
concentrate on the quality of your documentation, for example by introducing examples
more appropriate to your project than those provided by vanilla TEI (which somehow manage
to be both general and very specialised).
- The same considerations apply to attributes, and in particular their range of values.
At the outset you may not have been sure what values to permit for the foo
attribute on your bar elements, so you allowed anything. Now you have discovered
that some of your encoders gave this attribute the value centre, others used
center, and yet others used middle, all meaning (probably) the
same thing. Now that you know what values you want, add a valList to your ODD to
insist on them, and do some data cleanup.
- Has the circus moved on? You may have spent a lot of time and effort defining new
elements or attributes not available in the TEI at the time, only to find that the TEI
(independently or as a result of your good work) has subsequently decided to implement
exactly what you did, or something like it. Maybe you can remove your old modifications
and be pure TEI again. Even if the TEI decided to do things differently from you, it
might be worth looking at the TEI version of your smart idea to see if it can be adapted
to your needs!
And yes, it's quite likely that as a result of this review, you may well need to change all
your data as well as your schema. But that's a simple matter of XSLT programming ...
Content Model Revision
In days of yore (i.e. releases P1 to P3 inclusive), TEI content models were expressed in the
SGML language. The first XML release, TEI P4, and all releases of P5 prior to 3.0, used the ISO
standard RELAX NG language for this purpose. But from the start, a design goal of the TEI was
to define an encoding scheme independent of any implementation metalanguage. The TEI has always
used its own ODD language to define components which are then processed to
produce documentation in a variety of formats (HTML, PDF, ePub etc.) and schemas in a number of
different languages (RELAX NG, W3C Schema, and XML DTD). At release 3.0 this principle was
extended to the definition of content models, which previously had been expressed using the
RELAX NG language, but which are now expressed using some special-purpose TEI elements. (The
authoritative source of information on these is of course chapter 22 of the Guidelines,
[Documentation Elements](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TD.html). In any
disagreement between what is stated there and what is suggested here, the former is correct.)
If your customization changed the content model of any element, it will have done so using
expressions in RELAX NG, typically using a rng:ref element to provide the name of a
pattern. For example, suppose your customization defined a new element with a content model of
macro.phraseSeq. In the elementSpec defining your new element, there will be a content element
containing something like <rng:ref name="macro.phraseSeq"/>. To keep in step with the current
TEI P5 release, you need to change this RELAX NG content to its equivalent in the ODD language,
which is <macroRef key="macro.phraseSeq"/>.
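Seen in context, the change just described might look like the following sketch. The element name myPhrase, its namespace, and its class membership are all hypothetical; the fragments assume the TEI namespace as default and the rng prefix bound to the RELAX NG namespace.

```xml
<!-- Before: content model expressed in RELAX NG (pre-3.0 style) -->
<elementSpec ident="myPhrase" ns="http://www.example.org/ns/mine" mode="add">
  <desc>a hypothetical phrase-level element</desc>
  <classes>
    <memberOf key="model.phrase"/>
  </classes>
  <content>
    <rng:ref name="macro.phraseSeq"/>
  </content>
</elementSpec>

<!-- After: the same content model expressed in pure ODD -->
<elementSpec ident="myPhrase" ns="http://www.example.org/ns/mine" mode="add">
  <desc>a hypothetical phrase-level element</desc>
  <classes>
    <memberOf key="model.phrase"/>
  </classes>
  <content>
    <macroRef key="macro.phraseSeq"/>
  </content>
</elementSpec>
```

Only the child of content changes; everything else in the specification stays as it was.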
In the ODD language, references to different kinds of object (elements, classes, macros, etc.)
in a content model are represented by different element types, specifically
elementRef, classRef, and macroRef. Hence, a RELAX NG content model in
the form <rng:ref name="x"/> will become <elementRef key="x"/> if x names an element,
<classRef key="x"/> if it names a class, and <macroRef key="x"/> if it names a macro.
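To make the three cases concrete, here is a small sketch using names that exist in TEI P5 (the element p, the class model.pLike, and the macro macro.paraContent). The fragments are illustrative and are not meant to be used together as one content model.

```xml
<!-- RELAX NG: one element type for every kind of reference -->
<rng:ref name="p"/>
<rng:ref name="model.pLike"/>
<rng:ref name="macro.paraContent"/>

<!-- ODD: the element type says what kind of thing is referenced -->
<elementRef key="p"/>
<classRef key="model.pLike"/>
<macroRef key="macro.paraContent"/>
```

In practice this means you need to know what each name in your old content model refers to before you can convert it; the prefixes model. and macro. are usually enough to tell.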
Content models can of course be much more complicated. If your RELAX NG model uses an
alternation or a sequence of components, you will need to use one of the ODD elements
alternate and sequence to wrap them rather than the RELAX NG
(semi)equivalents rng:choice and rng:group respectively. For example, a RELAX NG model saying
that a foo can contain either a bar or a baz wraps two rng:ref elements in an rng:choice; its
equivalent in pure ODD wraps two elementRef elements in an alternate.
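The foo model just described might be written as follows in each notation. This is a sketch which assumes that bar and baz are the names of elements (rather than classes or macros) declared elsewhere in your schema.

```xml
<!-- RELAX NG: a foo contains either a bar or a baz -->
<content>
  <rng:choice>
    <rng:ref name="bar"/>
    <rng:ref name="baz"/>
  </rng:choice>
</content>

<!-- Pure ODD: the same alternation -->
<content>
  <alternate>
    <elementRef key="bar"/>
    <elementRef key="baz"/>
  </alternate>
</content>
```

A sequence works the same way, with rng:group on the RELAX NG side and sequence on the ODD side.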
If some components of your content model are optional or repeatable, you will have used the
RELAX NG elements rng:oneOrMore or rng:zeroOrMore. In ODD optionality and
repeatability are expressed using attributes minOccurs and maxOccurs,
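which can be supplied for any of the elements discussed so far. For example, a RELAX NG content
model that wraps its components in rng:oneOrMore has an equivalent in pure ODD in which the
corresponding element carries minOccurs="1" and maxOccurs="unbounded".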
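As a sketch of this kind of conversion, assuming once more that bar and baz are hypothetical element names, a repeatable alternation might be rewritten like this:

```xml
<!-- RELAX NG: one or more occurrences of bar or baz, in any order -->
<content>
  <rng:oneOrMore>
    <rng:choice>
      <rng:ref name="bar"/>
      <rng:ref name="baz"/>
    </rng:choice>
  </rng:oneOrMore>
</content>

<!-- Pure ODD: the repetition moves onto the alternate element.       -->
<!-- For rng:zeroOrMore you would write minOccurs="0" instead of "1". -->
<content>
  <alternate minOccurs="1" maxOccurs="unbounded">
    <elementRef key="bar"/>
    <elementRef key="baz"/>
  </alternate>
</content>
```

Note that the wrapper element disappears: what was a separate rng:oneOrMore becomes a pair of attributes on the thing being repeated.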
An empty element is indicated in the RELAX NG language by a special pattern called
rng:empty. In the ODD language, however, we indicate that an element has no content
by supplying an empty content element in its specification.
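Following the description above, the conversion for an empty element might be sketched like this:

```xml
<!-- RELAX NG: the element has no content -->
<content>
  <rng:empty/>
</content>

<!-- Pure ODD, as described in this guide: an empty content element -->
<content/>
```

If in doubt about the current recommendation, chapter 22 of the Guidelines remains the authority.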
The ODD language also provides a few more special-purpose component elements for content
models: textNode, valList, and anyElement.
The textNode element is provided as a replacement for the built-in RELAX NG pattern
rng:text. There are no restrictions on where it may be placed in an ODD content
model, although existing schema languages mostly permit it to appear only in mixed content
models, in which text may be freely interleaved with child elements.
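A minimal sketch of a mixed content model, assuming a hypothetical child element bar, might look like this in the two notations:

```xml
<!-- RELAX NG: any mixture of text and bar elements -->
<content>
  <rng:zeroOrMore>
    <rng:choice>
      <rng:text/>
      <rng:ref name="bar"/>
    </rng:choice>
  </rng:zeroOrMore>
</content>

<!-- Pure ODD: textNode takes the place of rng:text -->
<content>
  <alternate minOccurs="0" maxOccurs="unbounded">
    <textNode/>
    <elementRef key="bar"/>
  </alternate>
</content>
```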
The RELAX NG language allows a number of other components within a content model, some of which
are difficult to convert, but few of which are likely to appear in a pre-existing TEI ODD. If
your ODD used the RELAX NG element rng:value to specify content explicitly, this must
be expressed in an ODD as a valItem within a valList. If your ODD used the
RELAX NG element rng:element to specify that any element is permitted at this point in
a content model, you can do something similar with the ODD anyElement element. In
general, however, the presence of any of the following in your old content model will require
manual intervention:
- rng:anyName
- rng:attribute
- rng:data
- rng:element
- rng:except
- rng:name
- rng:nsName
- rng:param
- rng:value
Here are some example content elements taken from the content models of existing TEI
elements:
RELAX NG Specification | ODD equivalent |
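One pair in this style, modelled on the content model of teiHeader, is sketched below. It is written from the general conversion rules given above rather than copied from the TEI source, so treat the details as an assumption and check them against the Guidelines.

```xml
<!-- RELAX NG specification -->
<content>
  <rng:group>
    <rng:ref name="fileDesc"/>
    <rng:zeroOrMore>
      <rng:ref name="model.teiHeaderPart"/>
    </rng:zeroOrMore>
    <rng:optional>
      <rng:ref name="revisionDesc"/>
    </rng:optional>
  </rng:group>
</content>

<!-- ODD equivalent -->
<content>
  <sequence>
    <elementRef key="fileDesc"/>
    <classRef key="model.teiHeaderPart" minOccurs="0" maxOccurs="unbounded"/>
    <elementRef key="revisionDesc" minOccurs="0"/>
  </sequence>
</content>
```

The example shows three conversions at once: rng:group becomes sequence, rng:zeroOrMore becomes a minOccurs/maxOccurs pair, and rng:optional becomes minOccurs="0".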
Attribute values and datatypes
As noted above, you may wish to change your existing ODD to be more precise in the way that
attribute values are specified. If you decide to introduce a valList (semi-closed or
closed) to constrain the possible values of an attribute you will also need to change its
datatype to reference teidata.enumerated.
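Taking the foo attribute on bar elements from the checklist earlier, a closed list of values might be sketched as follows. The names foo and bar are of course hypothetical, as are the descriptions; the mode values shown assume that the attribute already exists and had no valList before, so adjust them to suit your own ODD.

```xml
<elementSpec ident="bar" mode="change">
  <attList>
    <attDef ident="foo" mode="change">
      <!-- a valList needs the enumerated datatype -->
      <datatype>
        <dataRef key="teidata.enumerated"/>
      </datatype>
      <!-- type="closed": only the listed values are valid;          -->
      <!-- type="semi" would merely recommend them.                  -->
      <!-- mode="add" assumes there was no valList here previously.  -->
      <valList type="closed" mode="add">
        <valItem ident="centre">
          <desc>the content is centred</desc>
        </valItem>
        <valItem ident="left">
          <desc>the content is aligned to the left</desc>
        </valItem>
        <valItem ident="right">
          <desc>the content is aligned to the right</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

With a closed list in place, documents still using center or middle will fail validation, which is a convenient way of finding what needs cleaning up.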
If the datatype element appears in your ODD already, you will need to change it to use
a dataRef element pointing to one of the predefined teidata datatype
specifications. A set of predefined data specifications using RELAX NG has been retained for
compatibility reasons, but these will probably be withdrawn by the end of 2017.
Proceed as follows:
- if your existing datatype contains something like <rng:ref name="data.xxxx"/>, change it to
<dataRef key="teidata.xxxx"/>
- if your existing datatype contains something like <rng:data type="xxxx"/>, change it to
<dataRef name="xxxx"/>
- if a component of your existing datatype contains something like
<rng:param name="pattern">...</rng:param>, you will need to either
  - add a restriction attribute to the dataRef, giving as value the content of the rng:param
element
  - or (if the parameter cannot be expressed as a regular expression, and if the dataRef uses
the name attribute) re-express the parameter as a TEI dataFacet
- if the datatype being defined permits multiple values, alternation, or anything else
other than a single value, you must define it in a dataSpec element, using the
same components as an element content model (alternate, maxOccurs,
etc.). Note that although the datatype element permits only a single
dataRef child, it is itself repeatable, so that you can say (for example)
"this attribute takes at least three pointer values".
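The steps above can be sketched as a series of before-and-after pairs. The particular datatypes and the regular expression are chosen for illustration only; the fragments assume the TEI namespace as default and the rng prefix bound to the RELAX NG namespace.

```xml
<!-- 1. Reference to a TEI datatype: data.xxxx becomes teidata.xxxx -->
<datatype><rng:ref name="data.word"/></datatype>
<datatype><dataRef key="teidata.word"/></datatype>

<!-- 2. Reference to a W3C Schema datatype: type becomes name -->
<datatype><rng:data type="decimal"/></datatype>
<datatype><dataRef name="decimal"/></datatype>

<!-- 3. A pattern parameter becomes a restriction attribute -->
<datatype>
  <rng:data type="token">
    <rng:param name="pattern">[a-z]+[0-9]*</rng:param>
  </rng:data>
</datatype>
<datatype>
  <dataRef name="token" restriction="[a-z]+[0-9]*"/>
</datatype>

<!-- 4. A parameter that is not a regular expression becomes a dataFacet -->
<datatype>
  <rng:data type="nonNegativeInteger">
    <rng:param name="maxInclusive">100</rng:param>
  </rng:data>
</datatype>
<datatype>
  <dataRef name="nonNegativeInteger">
    <dataFacet name="maxInclusive" value="100"/>
  </dataRef>
</datatype>

<!-- 5. "This attribute takes at least three pointer values" -->
<datatype minOccurs="3" maxOccurs="unbounded">
  <dataRef key="teidata.pointer"/>
</datatype>
```

In each pair the first fragment is the old form and the second is its replacement; the fifth has no old form shown because only the dataRef child changes.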
Here are some example content elements taken from the data specifications used by some
existing TEI datatypes:
RELAX NG Specification | ODD equivalent |
100 | |
0.0, 1.0, regular, irregular, unknown | |
high, medium, low, unknown | |
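As an illustration of the last of these, a datatype whose permitted values are high, medium, low, and unknown might be converted along the following lines. This is a sketch which assumes that the old specification was a simple choice of rng:value elements; it follows the rule given earlier that rng:value becomes a valItem within a valList, and is not copied from the TEI source.

```xml
<!-- RELAX NG specification -->
<content>
  <rng:choice>
    <rng:value>high</rng:value>
    <rng:value>medium</rng:value>
    <rng:value>low</rng:value>
    <rng:value>unknown</rng:value>
  </rng:choice>
</content>

<!-- ODD equivalent -->
<content>
  <valList type="closed">
    <valItem ident="high"/>
    <valItem ident="medium"/>
    <valItem ident="low"/>
    <valItem ident="unknown"/>
  </valList>
</content>
```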
My head hurts: where's the script?
Yes, there is a script or two you can experiment with if you find this all too daunting. They
were used to transform the whole of the pre-3.0 TEI Guidelines specifications
semi-automatically. They deal with 99% of the situations you are likely to encounter, but
they're not guaranteed! You can find them on GitHub: use them in good health, but at
your own risk.
Don't hesitate to share your experience, good or bad, on the TEI discussion list, by raising a
ticket on GitHub, or by contributing to the TEI wiki. The TEI is a community effort!