MDB2 XML Schema documentation

Manuel Lemos (mlemos@acm.org)

Igor Feghali (ifeghali@php.net)

Version control: @(#) $Id$

Contents

Introduction to XML

The schema description format is based on XML (eXtensible Markup Language). For those that are not familiar with it, XML is standard that specifies rules to define abstract data formats based on tagged text.

Like HTML (HyperText Markup Language), XML is also based on SGML (Standard Generalized Markup Language). SGML defines rules to structure data using special text tags.

SGML tags may be used to delimit data sections. Section begin tags are of the form <name attributes... > and end tags are of the form </name>. name is the name of the tag and attributes is a set of zero or more pairs of attribute names and the values associated with the respective tag.

XML is a little stricter in the way tags may be used. While with many SGML formats some end tags are optional, in XML end tags are always required. Also, when tag attributes are used, attribute values must be specified always between quotes. These XML requirements are usually known as well-formedness.

Another important detail about XML strictness is that tag names and attributes are case sensitive. This means that tags in upper case are distinct from tags in lower case.

Schema description format and XML

Unlike a common (but mistaken) belief, XML is not meant just for describing data in documents that are meant to be displayed or printed. XML is a standard that defines rules for describing abstract data may be used to for any purpose.

Even though it may be used to schemas that may be displayed or printed, Metabase schema description format is meant to provide a way for developers to design their database schemas using a DBMS independent file format. Using this format, developers may describe relations and properties of tables, field, indexes, sequences, etc..

This format uses just a subset of the XML known as SML (Simplified Markup Language). SML formats complies with the same rules as any XML format but it does not use all its possibilities to make it simpler for users to write and understand the data. For instance, files written on SML do not use any tag attributes.

Metabase schema description format is also simpler to edit by hand because tags and constant data values should always be in lower case to save the user from holding frequently the keyboard shift key.

Schema description data structure

The Metabase schema description format lets the developers describe a set of database schema objects following the database objects hierarchy. This means that a database may contain tables and sequence objects, tables may contain fields and index objects and all these objects have their own attributes.

The definition of each database schema object contained within the begin and end tags of the respective container object. Therefore, the definition of each table and sequence has to be specified between the main database begin and end tags. Likewise, the definition of fields and indexes has to be specified between the respective table begin and end tags.

The properties of each schema object are also defined between the respective begin and end tags. The values of each property are also defined between the respective property being and end tags.

The values of the properties are subject of validation according to the type of each property and the context within which they are being defined.

Some properties define names of database schema objects. There are names that are accepted as valid for some DBMS that are not accepted by other DBMS. Metabase schema parser may optionally fail if such names are used to reduce the potential problems when using the same Metabase based application with different DBMS.

The schema object description tags are defined as follows:

database

The database tag should be always at the top of the schema object hierarchy. Currently it may contain the definition of two types of objects: table and sequence.

The database schema object may have the following properties:

table

The table is one of the main database schema objects. It may be used in a schema description multiple times, once per each table that is contained the database being described.

The table object definition may contain the declaration and initialization sections besides the properties. The table schema object may have the following properties:

Example

declaration

declaration is one of the sections that is part of the table definition. This section is required because it must contain the definition of the table field and index objects.

field

field is one of the types of table definition object. It should be specified within the table declaration section for each field that the table should contain.

The field schema object may have the following properties:

Example

index

index is another type of table definition object. It should also be specified within the table declaration section for each field that the table should contain.

The index schema object may have the following properties:

index field

field is a section that is part of the table index declaration. It should be used once per each field on which the index should be created.

The index field declaration may have the following properties:

Example

foreign

foreign was added in Schema version 0.7.3. [description here]

The foreign schema object may have the following properties:

foreign field

foreign field should be used once per each field on which the foreign key should be created. Allow a string value indicating a field name.

foreign references

references is a section that is part of the table foreign key declaration.

This section is optional. If it is absent, it is assumed that the referenced key is the primary key of referenced table.

The foreign references declaration may have the following properties:

Example

initialization

initialization is another section that is part of the table definition. This section is optional and may contain the definition of actions that should be executed when installing the database for the first time. update and delete was added in Schema version 0.7.0.

The available actions, in the execution order, are:

insert

insert is a table initialization command that specifies the values of the fields of rows that should be inserted in the respective table after the database is created for the first time.

The insert initialization object do not have any property.

insert field

field is a section that is part of the table insert initialization. It should be used once per each field which should be filled with data.

It may have only the property name.

One, and only one, of the following objects should be used to specify the initial data:

When doing an insert missing fields are implicitly initialized by the DBMS with the respective default values.

Example
insert select

select was added in Schema version 0.7.3 to acomplish the INSERT...SELECT syntax. It is optional and, when used, should be the only child of insert.

It may have only the property table that designates the table name in the select statement.

insert select field

The insert select field object is defined exactly the same way as insert field. Here name goes to the fields portion of the insert statement and the next property goes to the fields portion of the select statement

insert select where

The insert select where object, that is not required, must contain only one expression object.

In that case the expression object usually links operants by logical operators like AND, OR and others.

update

update is a table initialization command that replaces the values of the fields of rows, matched by a condition, in the respective table after the database is created for the first time.

The update initialization object do not have any property.

update field

The update field object is defined exactly the same way as insert field

update where

The update where object, that is not required, must contain only one expression object.

In that case the expression object usually links operants by logical operators like AND, OR and others.

Example

delete

delete is a table initialization command that removes rows, matched by a condition, in the respective table after the database is created for the first time.

The delete command definition only contains one more where section used to select the rows to be deleted.

The delete initialization object do not have any property.

delete where

The delete where object must contain only one expression object.

In that case the expression object usually links operants by logical operators like AND, OR and others.

Example

Complex initialization objects

function

The function object will bind to a DBMS function, that can be called with any number of arguments.

It may have only the property name.

Any of the following objects should be used, how many times it is necessary, to provide the arguments:

expression

The expression object is a recursive structure used to relate pairs of value, column, function and expression objects.

The expression object do not have any property and has the exact form operant operator operant.

expression operant

The expression operant isn't a object itself but may be represented by one of the following objects:

expression operator

Operators aren't implemented in a portable way yet. Currently the following descriptive operators are always binded to the same operators:
OperatorBinding
PLUS+
MINUS-
TIMES*
DIVIDED/
EQUAL=
NOT EQUAL!=
LESS THAN<
GREATER THAN>
LESS THAN OR EQUAL<=
GREATER THAN OR EQUAL>=

Any other value given to operator will be parsed with no translation.

Future versions of Schema may provide a common set of descriptive operators that are translated to the respective operator for your DBMS

sequence

The sequence is another main database schema object. It may be used in a schema description multiple times, once per each sequence that is contained the database being described.

A sequence is an object that only contains an integer value. A sequence value is automatically incremented after every time it is fetch the next sequence value. The fetch and increment actions constitute an atomic operation. This means that if two concurrent accesses to the database fetch the next sequence value, they will always get different values.

Sequences are useful to generate integer numbers that may be used to store in fields that must contain unique numbers.

The sequence object definition may contain one on section besides the properties. The sequence schema object may have the following properties:

sequence on

The sequence on table field section specifies a table field on which the sequence value should be synchronized.

This information is meant to be used only by the database manager class when it is added a sequence later after the database has been installed for the first time. If the sequence on is specified, the database manager class will override the sequence start value with a value that is higher than the highest value in the specified field table. Therefore, the specified field table type must be integer.

Example

variable

The variable tag is not meant to define any database schema object but rather a means to replace property values with variables defined at parsing time.

For instance, if you have several tables that store user names in text fields with the same length limit, instead of hard coding the length value in all field definitions, you may use a variable reference. The parser will replace variable references found in property definitions by the text values of the respective values passed to the parser.

The parser will issue an error if it find a reference for a variable that is not defined.

Schema overview data structure

Main structure

Common Elements

Legend