Introducing Pure ODD

Lou Burnard

Unpublished draft

No other source: this is a born digital document

This document is a tutorial guide to some new features of the ODD language introduced at release 3.0 of TEI P5. It assumes the reader already knows something about how ODD is designed and used, and presents only those aspects which have changed with the introduction of Pure ODD. For discussion and background information about the motivation for these changes see Resolving the Durand Conundrum, published in the TEI Journal (issue 6, 2013).

Two major changes are described. Firstly, the content model of an element specification (that is, the content of the content element in an elementSpec) is now expressed using some new TEI elements, rather than (as previously) using expressions in RELAX NG syntax. Secondly, TEI-defined data specifications are now expressed using a dataSpec element rather than by a macro specification (a macroSpec of type="dt"), and a new dataRef element is used to select one as the content of the datatype element, which defines the datatype of an attribute.

Defining a content model

In Pure ODD, we use the content element to describe the intended content of an element.

If the element concerned is empty, that is, it has no content at all but only attributes, then the content element itself is empty. This describes an element which may not contain anything.
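A minimal sketch of such a specification might look like this. The identifier zero is a hypothetical name chosen purely for illustration.

```xml
<elementSpec ident="zero">
  <desc>an element which may not contain anything</desc>
  <!-- the content element is itself empty -->
  <content/>
</elementSpec>
```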

If the element concerned contains only text, we use the special element textNode. This describes an element which may contain only textual data. A text node may be of any length, including zero.
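As an illustration, a text-only element could be specified roughly as follows. The identifier textOnly is a hypothetical name, not one defined by the TEI.

```xml
<elementSpec ident="textOnly">
  <desc>an element which may contain only textual data</desc>
  <content>
    <!-- a text node of any length, including zero -->
    <textNode/>
  </content>
</elementSpec>
```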

More usually, an element has what is known as element content. In this case, the content element will contain references to one or more other elements, each represented by an elementRef element. If there is only one such child element, it can be given directly: a single elementRef referring to an element named one, for example, describes an element which may contain only a one element.
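A sketch of the single-child case, assuming that the referenced element is identified by the key attribute of elementRef and that an element named one is specified elsewhere in the ODD:

```xml
<content>
  <!-- exactly one child element, named "one" -->
  <elementRef key="one"/>
</content>
```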

The attributes minOccurs and maxOccurs are used to indicate repetition. We can, for example, define an element which may contain any number of occurrences of the one element, provided that there are at least two.
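The "two or more" case might be written as below. This is a sketch which assumes that the value unbounded on maxOccurs means that no upper limit applies.

```xml
<content>
  <!-- at least two "one" elements, with no upper limit -->
  <elementRef key="one" minOccurs="2" maxOccurs="unbounded"/>
</content>
```

When neither attribute is given, the reference is understood to occur exactly once.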

In some unusual circumstances (for example, defining the content of an element such as the TEI's egXML element) it may be necessary to say that a content model should permit any element at all, or any element from one or more specific namespaces. The anyElement element is provided for this purpose. It can be used, for example, to define an element which may contain one or more elements not taken from the TEI namespace.
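The text above does not name the attribute used to exclude a namespace. The following sketch assumes an attribute called except whose value is the namespace to be excluded, and should be checked against the specification for anyElement before use.

```xml
<content>
  <!-- one or more elements from any namespace other than the TEI namespace -->
  <anyElement except="http://www.tei-c.org/ns/1.0"
              minOccurs="1" maxOccurs="unbounded"/>
</content>
```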

Grouping elements

An element may contain references to more than one different element. These elements may be grouped in one of three ways: as a sequence, as an alternation, or interleaved.

In a sequence, all of the child elements must follow each other in the same order as they appear within the content element; for example, a one element followed by a two element.
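A sequence of this kind can be sketched with the sequence element, assuming that elements named one and two are specified elsewhere:

```xml
<content>
  <sequence>
    <!-- "one" must come first, then "two" -->
    <elementRef key="one"/>
    <elementRef key="two"/>
  </sequence>
</content>
```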

In an alternation, any one of the child elements may appear; for example, either a one element or a two element.
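The corresponding alternation, under the same assumptions about the elements one and two, might be written with the alternate element:

```xml
<content>
  <alternate>
    <!-- either a "one" or a "two", but not both -->
    <elementRef key="one"/>
    <elementRef key="two"/>
  </alternate>
</content>
```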

In an interleaved model, the child elements may appear in any order; for example, either a one element followed by a two element, or a two element followed by a one element. Not all target schema languages support the concept of interleaving. An ODD processor may map specifications using the interleave element to a less precise construct in the target language, or to a combination of constructs in different constraint languages.
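An interleaved model could be sketched as follows. This assumes that the interleave element named in this document takes the same children as sequence and alternate, which should be verified against the schema actually in use.

```xml
<content>
  <interleave>
    <!-- both children are required, in either order -->
    <elementRef key="one"/>
    <elementRef key="two"/>
  </interleave>
</content>
```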

The attributes minOccurs and maxOccurs are also used on these grouping elements to indicate repetition of the group. For example, a repeated sequence can define an element which contains up to three repetitions of a pair of elements, each pair consisting of a one followed by a two.
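Repetition of a whole group might look like this, on the assumption that an unspecified minOccurs defaults to 1:

```xml
<content>
  <!-- the pair (one, two) may occur one, two, or three times -->
  <sequence maxOccurs="3">
    <elementRef key="one"/>
    <elementRef key="two"/>
  </sequence>
</content>
```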

When alternations are repeated, any one of the child elements may appear at each repetition, so that a repeated alternation of one and two defines an element which contains one or more one or two elements, in any order.
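A repeated alternation can be sketched as below; the explicit minOccurs is given only for clarity.

```xml
<content>
  <!-- each repetition chooses freely between "one" and "two" -->
  <alternate minOccurs="1" maxOccurs="unbounded">
    <elementRef key="one"/>
    <elementRef key="two"/>
  </alternate>
</content>
```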

Occurrence indicators may be given at both levels; for example, to define an element which contains up to three repetitions of a sequence of two or more one elements followed by a two element.
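Combining occurrence indicators on the group and on one of its members might give something like the following sketch:

```xml
<content>
  <!-- the whole group may occur up to three times -->
  <sequence maxOccurs="3">
    <!-- within each group: two or more "one" elements ... -->
    <elementRef key="one" minOccurs="2" maxOccurs="unbounded"/>
    <!-- ... followed by a single "two" element -->
    <elementRef key="two"/>
  </sequence>
</content>
```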

Sequences, alternations, and interleaved sequences may all be nested and combined as necessary, permitting quite complex structures to be expressed.

Mixed content

An element may have what is known as mixed content, meaning that it may contain a mixture of text fragments and some specified elements. This can be represented in Pure ODD using the alternate element, by defining a repeatable alternation of a text node and the permitted elements: for example, an element which contains any combination of text nodes, one elements, and two elements.
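One plausible way to write such a mixed content model is a repeatable alternation, made optional here so that the element may also be empty. The occurrence values are an assumption rather than something stated in the text.

```xml
<content>
  <!-- any number of text nodes, "one" elements, and "two" elements, in any order -->
  <alternate minOccurs="0" maxOccurs="unbounded">
    <textNode/>
    <elementRef key="one"/>
    <elementRef key="two"/>
  </alternate>
</content>
```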

It may also, more economically, be represented using interleave, again giving an element which contains any combination of text nodes, one elements, and two elements.
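The exact form of the interleaved version is not given in the text. The following is only a guess at it, assuming that interleave accepts a textNode child and that occurrence attributes on its elementRef children behave as they do elsewhere.

```xml
<content>
  <interleave>
    <!-- text may appear between any of the interleaved elements -->
    <textNode/>
    <elementRef key="one" minOccurs="0" maxOccurs="unbounded"/>
    <elementRef key="two" minOccurs="0" maxOccurs="unbounded"/>
  </interleave>
</content>
```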

A text node may appear anywhere within an alternation or a sequence; for example, in an element which may contain either two or more one elements, or a text node. However, not all current schema languages support such content models.
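A text node used as one branch of an alternation might be sketched like this:

```xml
<content>
  <alternate>
    <!-- either two or more "one" elements ... -->
    <elementRef key="one" minOccurs="2" maxOccurs="unbounded"/>
    <!-- ... or text alone -->
    <textNode/>
  </alternate>
</content>
```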

Class references

A classRef element can be used within a content model in the same way as an elementRef. It is a shorthand way of saying that any member, or all members, of a named model class of elements is permitted. For example, a classRef can define an element which contains a sequence of up to three elements which are members of the model.digital class.
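A class reference with an occurrence limit could be sketched as follows, assuming that classRef identifies the class by its key attribute, as elementRef does for elements:

```xml
<content>
  <!-- up to three elements, each of them any member of model.digital -->
  <classRef key="model.digital" maxOccurs="3"/>
</content>
```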

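The classRef element here is understood to mean any one member of the class. Its attribute expand can be used to specify other meanings. For example, supposing that the class model.digital has members dig1, dig2, and dig3, a reference to the class can have the following meanings:

Value of expand             Expansion of classRef
alternate [default]         any one of dig1, dig2, or dig3
sequence                    dig1, then dig2, then dig3, each exactly once
sequenceOptional            dig1, then dig2, then dig3, each optional
sequenceOptionalRepeatable  dig1, then dig2, then dig3, each occurring zero or more times
sequenceRepeatable          dig1, then dig2, then dig3, each occurring one or more times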
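To make one of these expansions concrete, the two fragments below are intended to be equivalent, taking the class membership dig1, dig2, and dig3 supposed above. The second fragment is an illustrative hand expansion, not the literal output of any ODD processor.

```xml
<!-- a class reference expanded as an optional sequence -->
<classRef key="model.digital" expand="sequenceOptional"/>

<!-- the same model written out explicitly -->
<sequence>
  <elementRef key="dig1" minOccurs="0"/>
  <elementRef key="dig2" minOccurs="0"/>
  <elementRef key="dig3" minOccurs="0"/>
</sequence>
```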

A model class contains a predefined set of element types which we wish to manipulate or reference together, usually because they can all appear in the same context, or share other properties. Pure ODD allows us to define such classes by means of a classSpec element. Once defined, members of that class can be referenced by means of a classRef element, as we have just seen. An element specification (not discussed here) includes a specification of the classes of which it is a member in its classes element.

Datatypes

A further part of the specification for an element is the list of attributes it may bear, provided by an attList element. The specification for an attribute (provided by an attDef element) also includes information about the kinds of value it is permitted to take, for example, whether it is a date or an integer. We call this its datatype.

In Pure ODD, a datatype is specified using the dataRef element. For example, we might define an attribute called count on the empty element one, indicating how often the one element is used, and specify that its value must be a non-negative integer by supplying a dataRef whose name attribute has the value nonNegativeInteger.
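Such an attribute definition might be sketched as follows. The ident attribute on elementSpec and attDef, and the nesting of dataRef inside datatype, reflect the structure described in this document.

```xml
<elementSpec ident="one">
  <desc>an empty element with an attribute</desc>
  <content/>
  <attList>
    <attDef ident="count">
      <desc>indicates how often the one element is used</desc>
      <datatype>
        <!-- name selects a W3C schema datatype -->
        <dataRef name="nonNegativeInteger"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
```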

The name nonNegativeInteger identifies one of the datatypes defined by the W3C as part of its schema language, and is not further defined by our ODD.

It is however possible to provide a more detailed specification for a datatype using the dataSpec element to document its values, intended uses, etc. The TEI defines many such datatypes in the Guidelines, and these can also be re-used directly within your ODD. The attribute key is used (rather than name) to indicate that a TEI-defined or locally-defined datatype is intended, as when the count attribute, which indicates how often the one element is used, is given a dataRef with the key teidata.count.
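The difference between name and key can be seen in a sketch of the same attribute rewritten to use a TEI datatype:

```xml
<attDef ident="count">
  <desc>indicates how often the one element is used</desc>
  <datatype>
    <!-- key selects a dataSpec, here one supplied by the TEI -->
    <dataRef key="teidata.count"/>
  </datatype>
</attDef>
```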

In this example, the name teidata.count identifies a TEI datatype specification. That specification is provided by a dataSpec element with the identifier teidata.count, which is provided as part of the TEI system. A TEI dataSpec is similar to an elementSpec. Its content element may contain a dataRef element or a valList, or a number of such elements combined using the elements alternate or sequence in the same way as elementRefs are combined.

For example, if we wish to say that the value of the attribute count can be either a non-negative integer or the string unknown, we would first define a dataSpec element with appropriate content, that is, one which permits non-negative integers or the string unknown or equivalent. As noted above, the key attribute on dataRef can then be used to refer to this data specification, so that the count attribute may indicate how often the one element is used, if we know.
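A possible shape for this pair of declarations is sketched below. The identifier data.countOrUnknown is hypothetical, as is the choice of a valList with a single valItem to represent the string unknown.

```xml
<!-- a locally defined datatype -->
<dataSpec ident="data.countOrUnknown">
  <desc>permits non-negative integers or the string unknown or equivalent</desc>
  <content>
    <alternate>
      <dataRef name="nonNegativeInteger"/>
      <valList>
        <valItem ident="unknown"/>
      </valList>
    </alternate>
  </content>
</dataSpec>

<!-- an attribute which uses it -->
<attDef ident="count">
  <desc>may indicate how often the one element is used if we know</desc>
  <datatype>
    <dataRef key="data.countOrUnknown"/>
  </datatype>
</attDef>
```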

A dataRef element can also be used to define the content of an element. For example, there is a TEI data specification called teidata.xpath which can be used to indicate that the value of an attribute must be a conformant XPath specification, as for an attribute which indicates an XPath to the nodes required.
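For the attribute case, a sketch might read as follows. The attribute name select is a hypothetical choice made for illustration.

```xml
<attDef ident="select">
  <desc>indicates an XPath to the nodes required</desc>
  <datatype>
    <dataRef key="teidata.xpath"/>
  </datatype>
</attDef>
```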

The same dataRef might also be used to indicate that the content of an element must be a conformant XPath specification; for example, an element which indicates an XPath to the nodes required.
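Used as element content, the same reference simply moves from the datatype element into the content element. The element name xpathExpr in this sketch is hypothetical.

```xml
<elementSpec ident="xpathExpr">
  <desc>indicates an XPath to the nodes required</desc>
  <content>
    <!-- the text of the element must be a valid teidata.xpath value -->
    <dataRef key="teidata.xpath"/>
  </content>
</elementSpec>
```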

Note however that not all schema languages support the ability to constrain element content in this way.

Macro specifications

A small number of comparatively complex content models are frequently used by other TEI specifications. Rather than define them afresh each time, it is convenient to reference a macro, the value of which contains their definition. For example, a macro called macro.xText might define a content model alternating any gaiji-like element and plain text.
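A macro of this kind might be declared roughly as follows. The sketch assumes that the gaiji-like elements are the members of the TEI class model.gLike, and that the alternation is optional and repeatable; neither detail is stated in the text.

```xml
<macroSpec ident="macro.xText">
  <desc>a content model alternating any gaiji-like element and plain text</desc>
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <classRef key="model.gLike"/>
    </alternate>
  </content>
</macroSpec>
```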

To use this declaration, a content model can simply reference it by means of a macroRef, defining for example an element containing any gaiji-like element and plain text.
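A reference to the macro might then be sketched like this, assuming that macroRef uses a key attribute in the same way as the other reference elements. The element name gaijiText is hypothetical.

```xml
<elementSpec ident="gaijiText">
  <desc>an element containing any gaiji-like element and plain text</desc>
  <content>
    <!-- stands in for the content model defined by macro.xText -->
    <macroRef key="macro.xText"/>
  </content>
</elementSpec>
```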