Unpublished draft tutorial
Born digital
We keep telling you to make an ODD for your project. Surely there must be a way of kickstarting the process automatically? Of course there is. This Guide shows you how to generate an ODD automatically from a corpus of existing TEI P5 documents, and makes some suggestions about how you might want to improve it.
The TEI provides a utility called oddbyexample.xsl.
It is a little different from the typical XSLT stylesheet
in that it is designed to process a large number of separate documents rather than a
single one. You control its input and output by means of parameters, which must be
passed to the XSLT processor in slightly different ways depending on the processing
environment. For example, you could run the stylesheet at the command line, or from
an oXygen transformation scenario.
If you are used to working at the command line, this may be the quickest and simplest
option. To process all the TEI files in a directory called /home/me/myCorpus and write
the resulting ODD to myGenerated.odd, you could use a command such as:
saxon -it:main -o:myGenerated.odd /usr/share/xml/tei/stylesheet/tools/oddbyexample.xsl corpus=/home/me/myCorpus
(In case you are wondering, the -it option tells saxon which template in
the stylesheet should be processed first.)
To define an appropriate oXygen transformation scenario for the same files, you would
proceed as follows:
Open one of the files from your corpus in oXygen.
Create a new XSLT transformation scenario, giving the stylesheet location as
${frameworks}/tei/xml/tei/stylesheet/tools/oddbyexample.xsl so that oXygen can find
it (yes, there are two teis in that path).
Set the initial template to main.
Set the corpus parameter to contain the full name of the folder which
you want to analyse. Assuming you opened one of its files in the first step
above, just set the parameter to ${cfd} and click OK.
Once you have defined this transformation scenario, you can use it as often as you
like with any collection of files. You don't need to go through the whole of the
above rigmarole every time! Next time round, just open one of the files in the new
collection and associate it with the scenario you have already defined.
Once this association has been made, every time you open that TEI XML file in oXygen, you can rerun the transformation just by clicking the big red triangle on the toolbar (or typing CTRL-SHIFT-T, or selecting Document - Transformation - Apply Transformation Scenario).
You can also edit the scenario, for example by changing the value of the parameters passed on to the stylesheet, or by changing the output options. See below for a list of the parameters you can modify.
The ODD generated by
However, my ODD also contains many lines which are less useful. For example, if one of
the global attributes (such as @corresp) has been used on just one of the elements in
your corpus, the generated ODD has to delete it explicitly from every element on
which it is not used.
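As a rough sketch of what such a deletion looks like in the ODD language, assume (purely for illustration) that @corresp is used somewhere in your corpus but never on p. The generated ODD would then need a declaration along these lines for p, and a similar one for every other element on which the attribute does not appear:

```xml
<!-- Hypothetical fragment: p is an assumed example element -->
<elementSpec ident="p" mode="change">
  <attList>
    <!-- remove the unused global attribute from this element only -->
    <attDef ident="corresp" mode="delete"/>
  </attList>
</elementSpec>
```

Repeated across many elements, declarations like this can account for much of the bulk of a generated ODD.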
More usefully, the generated ODD can tell you about the values which are actually
used for attributes (such as
Here's an example of the sort of thing I mean:
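A generated specification of this kind might look roughly like the following sketch. The element name div, the attribute type, and the three values are assumptions chosen for illustration, not taken from any real corpus, and the exact attributes your version of the stylesheet writes on valList may differ.

```xml
<!-- Hypothetical fragment: element, attribute and values are invented -->
<elementSpec ident="div" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- a closed list: only the values found in the corpus are allowed -->
      <valList type="closed" mode="add">
        <valItem ident="chapter"/>
        <valItem ident="preface"/>
        <valItem ident="prefase"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

A list like this makes inconsistencies easy to spot: in the sketch, prefase is presumably a typing error for preface rather than a distinct value.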
What can we do with this ODD file, other than study it as a witness to the follies of
our encoders? We can of course process it to generate a schema and a mini-manual
documenting our practice, just like any other ODD. Here's a reminder of how to set up
an oXygen transformation scenario to do just that:
You should of course check that the schema generated from this ODD does in fact validate all your corpus files correctly, though it would be somewhat alarming if it did not. When you have some new files to add to your corpus, however, this process becomes very useful, whether you decide to maintain the encoding practices already established, or to expand them to cater for new usages in your new files. Maybe you'll need to add new values to the permitted range for one of your attributes? Maybe an attribute or element you thought you would never use needs to be restored to the ODD?
Suppose for example that although your ODD says that the values of
(If there is a discrepancy between the text and the schema, in our world we trust the
text.) Let's say you adopt the latter course. You need to locate the
While you're there, you might like to document what you mean by each of these values,
for the benefit of non-English speakers, or of yourself when you have forgotten what you meant.
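A minimal sketch of such documentation, assuming a hypothetical value list for the type attribute of div: each valItem may contain one or more desc elements explaining the value. The values, descriptions, and the French translation below are invented for illustration.

```xml
<!-- Hypothetical fragment: values and descriptions are invented -->
<valList type="closed" mode="add">
  <valItem ident="chapter">
    <desc>a numbered chapter of the main text</desc>
    <!-- xml:lang lets descriptions in other languages sit alongside -->
    <desc xml:lang="fr">un chapitre numéroté du texte principal</desc>
  </valItem>
  <valItem ident="letter">
    <desc>a single item of correspondence</desc>
  </valItem>
</valList>
```

Descriptions supplied in this way should then appear in the documentation generated from the ODD.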
Once you've made these changes, you might like to regenerate a schema from your ODD,
and then check that oXygen will use the new features you added. When adding a
Now take a look at the HTML
By default, the examples, descriptions, and cross references supplied for each element will be just the same as those in the Guidelines. The content model and list of attributes, however, should reflect any changes proposed by your ODD.
You might like to test this by replacing one or more of the usage examples provided for each element with examples taken from your own corpus.
Open your generated ODD in oXygen and locate the my example </egXML>
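As a hedged sketch of where such an example goes, the fragment below adds an exemplum to the specification for p. The choice of p is an assumption made for illustration, and how a given ODD processor combines your example with those inherited from the Guidelines may vary, so check the generated documentation.

```xml
<!-- Hypothetical fragment: p is an assumed example element -->
<elementSpec ident="p" mode="change">
  <exemplum>
    <!-- examples live in their own namespace so they are not validated
         as part of the ODD document itself -->
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <p>my example</p>
    </egXML>
  </exemplum>
</elementSpec>
```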
Before doing so, however, we suggest you also add some discussion explaining what
your ODD is for, documenting its components and intended usage. This informal
documentation can be as simple or complex as you want, and you can (of course) use
all the range of elements provided by the TEI to express it. Add at least a
For example, you might expand your ODD to look like this:
This schema proposes a minimal subset of TEI elements, adequate to basic
transcription of archival sources. Each document is transcribed as a separate The following TEI elements are used to represent these components:
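Put together as a complete document, an expanded ODD of this kind might be sketched as follows. The title, the ident given to schemaSpec, the modules referenced, and the two elements listed in the specList are all assumptions made for illustration; in practice the schemaSpec would hold whatever the stylesheet generated for your corpus.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- hypothetical title -->
        <title>A schema for archival transcriptions</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished draft</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <head>A schema for archival transcriptions</head>
      <p>This schema proposes a minimal subset of TEI elements, adequate
        to basic transcription of archival sources.</p>
      <p>The following TEI elements are used:
        <specList>
          <specDesc key="div"/>
          <specDesc key="p"/>
        </specList>
      </p>
      <!-- the formal part of the ODD; ident is a hypothetical name -->
      <schemaSpec ident="myGenerated" start="TEI">
        <moduleRef key="tei"/>
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="textstructure"/>
        <!-- the generated elementSpec and classSpec declarations
             would follow here -->
      </schemaSpec>
    </body>
  </text>
</TEI>
```

Keeping the prose and the schemaSpec in one file means the explanation and the formal specification are maintained together.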
When an ODD like this is processed, the HTML (or other document formats) generated
from it by an ODD processor will contain all the text you see here. The element
There are many other things you might do once you start editing your ODD. This tutorial just suggests a few that are likely to be particularly useful in the process of defining a useful schema for an existing set of TEI documents.
Not all of them work... see