This document will be considered ready for transition to Proposed Recommendation at the same time that the XQuery 3.1 specification is ready for transition to Proposed Recommendation.
This document specifies XSLT and XQuery Functions and Operators (F&O) version 4.0, a fully compatible extension of
This document is a working draft developed and maintained by a W3C Community Group,
the
The community group welcomes comments on the specification. Comments are best submitted
as issues on the group's
The community group maintains two extensive test suites,
one oriented to XQuery and XPath, the other to XSLT.
These can be found at
The publications of this community group
This document defines constructor functions, operators, and functions on the datatypes defined in
A summary of changes since version 3.1 is provided at
The purpose of this document is to define functions and operators for inclusion in
XPath 4.0, XQuery 4.0, and XSLT 4.0.
The exact syntax used to call these
functions and operators is specified in
This document defines three classes of functions:
General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions,
whose arguments and results are values defined by the
Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.
Functions that specify the semantics of operators defined in
xs:dateTimeStamp, and it
incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration
which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many
aspects of the definitions of the existing datatypes: for example, it extends the value space of
xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF;
it modifies the value space of xs:Name
to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime
values; and it allows any character string to appear as the value of an xs:anyURI item.
Implementations of this specification
In some cases, this specification references XSD for the semantics of operations such as the effect of matching using regular expressions, or conversion of atomic items to strings. In most such cases there is no intended technical difference between the XSD 1.0 and XSD 1.1 specifications, but the 1.1 version often provides clearer explanations and sometimes also corrects technical errors. In such cases this specification often chooses to reference the XSD 1.1 specification. This should not be taken as implying that it is necessary to invoke an XSD 1.1 processor.
References to specific sections of some of the above documents are indicated by
cross-document links in this document. Each such link consists of a pointer to a
specific section followed by a superscript specifying the linked document. The
superscripts have the following meanings: XQ
Despite its title, this document does not attempt to define the semantics of all the operators available
in the
The remaining operators that are described in this publication are the arithmetic operators,
where the semantics of the operator
depends on the types of the arguments. For these operators, the language specification describes rules for selecting
an internal function defined in this specification to underpin the operator. For example, when the operator x+y
is applied to two operands of type xs:double, the function op:numeric-add is selected.
Previous versions of this specification also defined the value comparison operators such as
eq, lt, and gt in terms of functions such as
op:date-greater-than. These have been dropped; for most data types the semantics
of the value comparison operators are defined by reference to the
This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:
For all combinations of valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.
For all invalid inputs to the function, the implementation raises (in some way appropriate to the calling environment) a dynamic error.
For a sequence of calls within the same
Other recommendations (“host languages”) that reference this document may dictate:
Subsets or supersets of this set of functions to be available in particular environments;
Mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling errors;
A concrete realization of concepts such as
Which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used.
Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.
Adding such constraints in a host language, however, is discouraged because it makes it difficult to reuse implementations of the function library across host languages.
This specification allows flexibility in the choice of versions of specifications on which it depends:
It is
It is
It is
The XML Schema 1.1 recommendation
introduces one new concrete datatype: xs:dateTimeStamp; it also incorporates
the types xs:dayTimeDuration, xs:yearMonthDuration,
and xs:anyAtomicType which were previously defined in earlier versions of xs:NCName
based on the rules in XML 1.1 rather than 1.0.
The
In this document, text labeled as an example or as a note is provided for explanatory purposes and is not normative.
The functions and operators defined in this document are contained in one of
several namespaces (see xs:QName.
This document uses conventional prefixes to refer to these namespaces. User-written
applications can choose a different prefix to refer to the namespace, so long as it is
bound to the correct URI. The host language may also define a default namespace for
function calls, in which case function names in that namespace need not be prefixed
at all. In many cases the default namespace will be
http://www.w3.org/2005/xpath-functions, allowing a call to be written as name() rather than fn:name();
in this document, however, all example function calls are explicitly prefixed.
The URIs of the namespaces and the conventional prefixes associated with them are:
http://www.w3.org/2001/XMLSchema for constructors —
associated with xs.
The section http://www.w3.org/2001/XMLSchema,
and are named in this document using the xs prefix.
http://www.w3.org/2005/xpath-functions
for functions — associated with fn.
The namespace
prefix used in this document for most functions that are available to users is
fn.
http://www.w3.org/2005/xpath-functions/math
for functions — associated with math.
This namespace is used for some mathematical functions. The namespace
prefix used in this document for these functions is math.
These functions are available to users in exactly the same way as those in the
fn namespace.
http://www.w3.org/2005/xpath-functions/map
for functions — associated with map.
This namespace is used for some functions that manipulate maps (see
map.
These functions are available to users in exactly the same way as those in the
fn namespace.
http://www.w3.org/2005/xpath-functions/array
for functions — associated with array.
This namespace is used for some functions that manipulate arrays (see
array.
These functions are available to users in exactly the same way as those in the
fn namespace.
http://www.w3.org/2005/xqt-errors — associated with
err.
There are no functions in this namespace; it is used for error codes.
This document uses the prefix err to represent the namespace URI
http://www.w3.org/2005/xqt-errors, which is the namespace for all XPath
and XQuery error codes and messages. This namespace prefix is not predeclared and
its use in this document is not normative.
http://www.w3.org/2010/xslt-xquery-serialization — associated with
output.
There are no functions in this namespace: it is
used for serialization parameters, as described in
Functions defined with the op prefix are described here to
underpin the definitions of the operators in op prefix. For example, multiplication is generally
associated with the * operator, but it is described as a function
in this document:
Sometimes there is a need to use an operator as a function.
To meet this requirement, the function fn:for-each-pair($seq1, $seq2, op("+"))
performs a pairwise addition of the values in two input sequences.
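As a rough illustration of using an operator as a function, the Python sketch below maps operator symbols to two-argument functions and applies one of them pairwise across two sequences. The names `op` and `for_each_pair` are hypothetical stand-ins modeled on the expression above. The sketch covers only four arithmetic symbols and is not a definition of the functions in this specification.

```python
import operator
from typing import Any, Callable, List, Sequence

# Hypothetical lookup table: operator symbol -> two-argument function.
_BINARY_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "div": operator.truediv,
}


def op(symbol: str) -> Callable[[Any, Any], Any]:
    """Return a two-argument function for the given operator symbol."""
    try:
        return _BINARY_OPERATORS[symbol]
    except KeyError:
        raise ValueError(f"unsupported operator: {symbol!r}") from None


def for_each_pair(seq1: Sequence, seq2: Sequence, action: Callable) -> List:
    """Apply `action` to items at matching positions; extra items are ignored."""
    return [action(a, b) for a, b in zip(seq1, seq2)]


sums = for_each_pair([1, 2, 3], [10, 20, 30], op("+"))
```

Here `sums` holds the pairwise totals of the two input lists.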
The above namespace URIs are not expected to change from one version of this document to another. The contents of these namespaces may be extended to allow additional functions (and errors, and serialization parameters) to be defined.
A function is uniquely defined by its name and arity (number of arguments); it is therefore
not possible to have two different functions that have the same name and arity, but different
types in their signature. That is, function overloading in this sense of the term is not permitted.
Consequently, functions such as item()?
which accepts any single item; supplying an inappropriate item (such as a function item) causes
a dynamic error.
Some functions on numeric types include the type xs:numeric in their signature
as an argument or result type. In this version of the specification, xs:numeric
has been redefined as a built-in union type representing the union of
xs:decimal, xs:float, xs:double (and thus automatically
accepting types derived from these, including xs:integer).
Operators such as + may be overloaded: they map to different underlying functions depending
on the dynamic types of the supplied operands.
It is possible for two functions to have the same name provided they have different arity (number of arguments). For the functions defined in this specification, where two functions have the same name and different arity, they also have closely related behavior, so they are defined in the same section of this document.
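The name-plus-arity rule can be pictured as a lookup table keyed by the pair (name, arity). The following Python sketch is an informal model only; the class `FunctionRegistry` and the function name `example:join` are hypothetical and do not appear in this specification.

```python
from typing import Callable, Dict, Tuple


class FunctionRegistry:
    """Toy registry in which a function is identified by (name, arity)."""

    def __init__(self) -> None:
        self._functions: Dict[Tuple[str, int], Callable] = {}

    def register(self, name: str, arity: int, impl: Callable) -> None:
        key = (name, arity)
        if key in self._functions:
            # Same name and arity: a second definition is not allowed,
            # whatever the parameter types might be.
            raise ValueError(f"{name}#{arity} is already defined")
        self._functions[key] = impl

    def lookup(self, name: str, arity: int) -> Callable:
        return self._functions[(name, arity)]


registry = FunctionRegistry()
# Same name, different arity: both definitions can coexist.
registry.register("example:join", 1, lambda items: "".join(items))
registry.register("example:join", 2, lambda items, sep: sep.join(items))
```

Registering a second `example:join` with arity 2 would fail in this model, which mirrors the absence of type-based overloading.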
Each function (or group of functions having the same name) is defined in this specification using a standard proforma. This has the following sections:
The function name is a QName as defined in math:sin and
math:cos for sine and cosine). If a
function name contains a
The first section in the proforma is a short summary of what the function does. This is intended to be informative rather than normative.
Each function is then defined by specifying its signature(s), which define the types of the parameters and of the result value.
Where functions take a variable number of arguments, two conventions are used:
Wherever possible, a single function signature is used giving default values for those parameters that can be omitted.
If this is not possible, because the effect of omitting a parameter cannot be specified by giving a default value, multiple signatures are given for the function.
Each function signature is presented in a form like this:
In this notation, http://www.w3.org/2005/xpath-functions:
this is one of the conventional prefixes listed in (); otherwise, the name is followed by a parenthesized list of
parameter declarations. Each parameter declaration includes:
The name of the parameter (which in 4.0 is significant because it can be used as a keyword in a function call)
The static type of the parameter (in italics)
If the parameter is optional, then an expression giving the default value
(preceded by the symbol :=).
The default value expression is evaluated using the static and
dynamic context of the function caller (or of a named function reference). For example,
if the default value is given as ., then it evaluates to the context value
from the dynamic context of the function caller; if it is given as default-collation,
then its value is the default collation from the static context of the function caller;
if it is given as deep-equal#2, then the third argument supplied to deep-equal
is the default collation from the static context of the caller.
If there are two or more parameter declarations, they are separated by a comma.
The return-type
The next section in the proforma defines the semantics of the function as a set of rules.
The order in which the rules appear is significant; they are to be applied in the order in which
they are written. Error conditions, however, are generally listed in a separate section that follows
the main rules, and take precedence over non-error rules except where otherwise stated. The principles outlined
in
Some functions supplement the prose rules with a more formal specification that describes the effect of the function in terms of an equivalent XPath or XQuery implementation. This is intended to take precedence over the prose rules in the event of any conflict; however, both sections are intended to be complete and not to rely on each other.
In writing the formal equivalents, a number of guidelines have been followed:
Where the equivalent code calls other functions, these should either be primitives
defined in the data model specification (see
There should be minimal reliance on XPath or XQuery language features. Although no attempt has been made to precisely define a core set of language constructs, the specifications try to avoid relying on features other than function calls and a few basic operators including the comma operator, equality testing, and simple integer arithmetic.
There is no suggestion that the formal equivalent is a practical implementation; in many cases it might have very poor performance.
In some cases the formal equivalent does not attempt to replicate correct behavior in error cases; if so, this is always clearly stated.
The formal equivalent will always produce a conformant result for the function, but in some cases this will not be the only possible conformant result.
There is no attempt to write formal equivalents for functions that have complex logic
(such as
Where the proforma includes a section headed
Where the proforma includes a section headed
Many of the examples are given in structured form, showing example expressions and their expected results.
These published examples are derived from executable test cases, so they follow a standard format. In general,
the actual result of the expression is expected to be deep-equal to the presented result, under the
rules of the
For more complex functions, examples may be given using informal narrative prose.
Rules for evaluating the operands of operators are described in the relevant sections
of xs:untypedAtomic and the empty sequence are specified in this section.
For function calls, the required type of an argument is defined in the function
signature of each function, and the way in which a supplied value is converted to the
required type (or rejected if it cannot be converted) is defined by the
Some functions accept a single value or the empty sequence as an argument and
some may return a single value or the empty sequence. This is indicated in the
function signature by following the parameter or return type name with a
question mark: ?, indicating that either a single value or the
empty sequence must appear. See below.
Note that this function signature is different from a signature in which the
parameter is omitted. See, for example, the two signatures
for ..
In the second signature, the argument must be present but may be the empty
sequence, written as ().
Some functions accept a sequence of zero or more values as an argument. This is
indicated by following the name of the type of the items in the sequence with
*. The sequence may contain zero or more items of the named type.
For example, the function below accepts a sequence of xs:double and
returns an xs:double or the empty sequence.
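For readers more used to general-purpose languages, the two occurrence indicators correspond loosely to optional and sequence types. The Python sketch below is an analogy under that assumption: `Sequence[float]` plays the role of `xs:double*`, `Optional[float]` plays the role of `xs:double?`, and `None` stands for the empty sequence. The function `mean_or_empty` is hypothetical.

```python
from typing import Optional, Sequence


def mean_or_empty(values: Sequence[float]) -> Optional[float]:
    """Accept zero or more doubles; return one double or None (empty sequence)."""
    if not values:
        # An empty input yields the empty sequence rather than an error.
        return None
    return sum(values) / len(values)
```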
In XPath 4.0, the arguments in a function call can be supplied by
keyword as an alternative to supplying them positionally. For example the call
resolve-uri(@href, static-base-uri()) can now be written
resolve-uri(base: static-base-uri(), relative: @href). The order in which
arguments are supplied can therefore differ from the order in which they are declared.
The specification, however, continues to use phrases such as “the second argument” as a
convenient shorthand for "the value of the argument that is bound to the second parameter
declaration".
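The binding of positional and keyword arguments to declared parameters can be sketched as follows. This Python model is an assumption-laden simplification: the helper `bind_arguments` is hypothetical, the parameter names `relative` and `base` are taken from the example call above, and default values for omitted parameters are not modeled.

```python
from typing import Any, List, Mapping, Sequence


def bind_arguments(
    param_names: Sequence[str],
    positional: Sequence[Any],
    keywords: Mapping[str, Any],
) -> List[Any]:
    """Return the argument values in parameter-declaration order."""
    if len(positional) > len(param_names):
        raise TypeError("too many positional arguments")
    # Positional arguments bind to the leading parameters, in order.
    bound = dict(zip(param_names, positional))
    for name, value in keywords.items():
        if name not in param_names:
            raise TypeError(f"unknown keyword: {name}")
        if name in bound:
            raise TypeError(f"parameter supplied twice: {name}")
        bound[name] = value
    missing = [name for name in param_names if name not in bound]
    if missing:
        raise TypeError(f"missing arguments: {missing}")
    return [bound[name] for name in param_names]


params = ["relative", "base"]
by_position = bind_arguments(params, ["chapter1.xml", "http://example.com/docs/"], {})
by_keyword = bind_arguments(
    params, [], {"base": "http://example.com/docs/", "relative": "chapter1.xml"}
)
```

Both calls produce the same ordered list, so "the second argument" refers to the value bound to `base` in either case.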
As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.
For example, the function
Where a function adopts the
The value of the relevant argument must be a map. The entries in the map are
referred to as options: the key of the entry is called the option name, and the
associated value is the option value. Option names defined in this specification
are always strings (single xs:string values). Option values may
be of any type.
The type of the options parameter in the function signature is always
given as map(*).
Although option names are described above as strings, the actual key may be
any value that is the xs:untypedAtomic
or xs:anyURI are equally acceptable.
This means that the implementation of the function can check for the
presence and value of particular options using the functions
Implementations xs:QName as the option
names, using an appropriate namespace.
If an option is present whose key is not described in the specification,
then a type error xs:QName with a non-absent namespace.
All entries in the options map are optional, and supplying the empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.
The ordering of the options map is immaterial.
For each named option, the function
specification defines a required type for the option value. The value that is actually
supplied in the map is converted to this required type using the
It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.
In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
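Some of these conventions can be sketched in Python. The option names `limit` and `separator`, the table `OPTION_SPECS`, and the function `resolve_options` are all hypothetical, and plain Python conversions stand in for the conversion rules referred to above. The treatment of unrecognized keys is deliberately not modeled.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# Hypothetical option declarations: name -> (conversion to required type, default).
OPTION_SPECS: Dict[str, Tuple[Callable[[Any], Any], Any]] = {
    "limit": (int, 10),
    "separator": (str, ","),
}


def resolve_options(supplied: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Apply defaults and convert each supplied value to its required type."""
    # Omitting the argument has the same effect as supplying the empty map.
    supplied = {} if supplied is None else supplied
    resolved: Dict[str, Any] = {}
    for name, (convert, default) in OPTION_SPECS.items():
        if name in supplied:
            # The function itself performs the conversion; it is not automatic.
            resolved[name] = convert(supplied[name])
        else:
            resolved[name] = default
    return resolved
```

Because the result is built by looking up each declared option by name, the order of entries in the supplied map has no effect.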
The diagrams in this section show how nodes, functions, primitive simple types, and user-defined types fit together into a type system. This type system comprises two distinct subsystems that both include the primitive atomic types. In the diagrams, connecting lines represent relationships between derived types and the types from which they are derived; the former are always below and to the right of the latter.
The types xs:IDREFS, xs:NMTOKENS,
xs:ENTITIES, and xs:numeric, together with
user-defined list types and
user-defined union types,
are special in that they are lists or unions
rather than types derived by extension or restriction.
The first diagram illustrates the relationship of various item types.
Item types are used to characterize the various types of item that can appear in a sequence (nodes, atomic items, and functions), and they are therefore used in declaring the types of variables or the argument types and result types of functions.
In XDM, item types include node types,
function types, and built-in atomic types.
Item types form a directed graph, rather than a
hierarchy or lattice: in the relationship defined by the
derived-from(A, B) function, some types are derived from
more than one other type. Examples include functions
(function(xs:string) as xs:int is substitutable for
function(xs:NCName) as xs:int and also for
function(xs:string) as xs:decimal), and choice types
(A is substitutable for the choice type (A | B) and also
for (A | C)). Record types provide an alternative way of categorizing
maps: the instances of record(longitude, latitude) overlap with
the instances of map(xs:string, xs:double). The diagram, which shows
only hierarchic relationships, is therefore a simplification of the
full model.
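The function-type example above follows the usual pattern in which parameter types vary contravariantly and result types vary covariantly. The Python sketch below checks the two cases mentioned, under stated assumptions: `PARENT` lists only the relevant fragment of the built-in derivation chains, and `FunctionType`, `atomic_derives_from`, and `function_substitutable` are hypothetical helpers, not the specification's definition of the relationship.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Fragment of the built-in atomic type derivation chains (child -> parent).
PARENT = {
    "xs:NCName": "xs:Name",
    "xs:Name": "xs:token",
    "xs:token": "xs:normalizedString",
    "xs:normalizedString": "xs:string",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
}


def atomic_derives_from(a: str, b: str) -> bool:
    """True if atomic type `a` is `b` or is derived from `b`."""
    current: Optional[str] = a
    while current is not None:
        if current == b:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class FunctionType:
    params: Tuple[str, ...]
    result: str


def function_substitutable(sub: FunctionType, sup: FunctionType) -> bool:
    """Parameters are checked contravariantly, the result covariantly."""
    if len(sub.params) != len(sup.params):
        return False
    params_ok = all(
        atomic_derives_from(p_sup, p_sub)
        for p_sub, p_sup in zip(sub.params, sup.params)
    )
    return params_ok and atomic_derives_from(sub.result, sup.result)


f = FunctionType(("xs:string",), "xs:int")
```

In this model `f` is substitutable for two unrelated function types, which is why the relationships form a graph rather than a tree.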
The next diagram illustrates the schema type subsystem, in which
all types are derived from xs:anyType.
Schema types include built-in types defined in the XML Schema specification, and user-defined types defined using mechanisms described in the XML Schema specification. Schema types define the permitted contents of nodes. The main categories are complex types, which define the permitted content of elements, and simple types, which can be used to constrain the values of both elements and attributes.
The final diagram shows all of the atomic types, including the primitive simple types and the
built-in types derived from the primitive simple types.
This includes all the built-in datatypes defined in
Atomic types are both item types and schema types, so the root type xs:anyAtomicType may be found
in both the previous diagrams.
The terminology used to describe the functions and operators on types defined in
Following in the tradition of
The following definitions are adopted from
xs:untypedAtomic
defined in
xs:untypedAtomic), and these have non-overlapping value spaces, so each
datum belongs to exactly one primitive atomic type.
The term
This document uses the terms string, character, and codepoint
with meanings that are normatively defined in
This
definition excludes Unicode characters in the surrogate blocks as well as
xs:string datatype.
The set of codepoints is thus wider than the set of characters.
This specification spells “codepoint” as one word; the Unicode specification spells
it as “code point”.
Equivalent terms found in other specifications are
“character number” or “code position”. See
Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader’s attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.
It is
This specification adopts the Unicode notation U+xxxx to refer to a codepoint
by its hexadecimal value (always four to six hexadecimal digits). This is followed where appropriate
by the official Unicode character name and its graphical representation: for example
Unless explicitly stated, the functions in this document do not ensure that any
returned xs:string values are normalized in the sense of
In functions that involve character counting such
as
Wherever encoding names (such as UTF-8 and UTF-16) are used in this specification,
they are compared without regard to case: the strings "UTF-8" and "utf-8" both
refer to the same encoding.
This document uses the phrase “namespace URI” to identify the concept identified
in
It also uses the term expanded-QName
defined below.
xs:QName datatype as defined in the XDM data model
(see
The term URI is used as follows:
xs:anyURI datatype
as defined in
This means, in practice, that where this
specification requires a “URI Reference”, an IRI as defined in xs:anyURI is a wider definition than the definition in
In this specification:
The auxiliary verb
When the sentence relates to an implementation of a function (for example "All implementations
When the sentence relates to the result of a function (for example "The result $arg") then the implementation is not conformant unless it delivers a result as stated.
When the sentence relates to the arguments to a function (for example "The value of $arg
The auxiliary verb
The auxiliary verb
Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.
This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.
In this section the term
use-when attributes, are in a separate execution scope).
The following definition explains more precisely what it means for two function calls to return the same result:
$V1 and $V2 are
defined to be
Both items are atomic items, of precisely the same type, and the values are equal as defined using the eq operator,
using the Unicode codepoint collation when comparing strings.
Both items are nodes, and represent the same node.
Both items are maps, both maps have the same number of entries,
and for every entry E1 in the first map there is an entry E2 in the second map such
that the keys of E1 and E2 are
Both items are arrays, both arrays have the same number of members, and the members
are pairwise
Both items are function items,
neither item is a map or array, and the two function items have the same function identity.
The concept of function identity is explained in
Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.
The main categories of context-dependent functions are:
Functions that explicitly deliver the value of a component of the static or dynamic context,
for example
Functions with an optional parameter whose default value is taken from the static
or dynamic context of the caller, usually either the context value (for example,
Functions that use the static context of the caller to expand or disambiguate
the values of supplied arguments: for example xs:QName expands its first argument
using the in-scope namespaces of the caller.
Some functions depend on aspects of the dynamic context that remain invariant
within an
User-defined functions in XQuery and XSLT may depend on the static context of the function definition (for example, the in-scope namespaces) and also in a limited way on the dynamic context (for example, the values of global variables). However, the only way they can depend on the static or dynamic context of the caller — which is what concerns us here — is by defining optional parameters whose default values are context-dependent.
Because the focus is a specific part of the dynamic context, all
A
The result of a dynamic call to a function item never depends on the static or dynamic context of the dynamic function call, only (where relevant) on the captured context held within the function item itself.
The
All functions defined in this specification are
Some functions (such as is
operator). However, if non-identical nodes are returned, their content will be the
same in the sense of the
Some functions (such as "stable":false() that makes
them nondeterministic as a user option, and implementations
Where the results of a function are described as being (to a greater or lesser
extent)
A sequence is an ordered collection of zero or more items.
An item is a node, an atomic item, or a function, such as a map or an array. The terms
sequence and item are defined formally in
The following functions are defined on sequences. These functions work on any sequence, without performing any operations that are sensitive to the individual items in the sequence.
As in the previous section, for the illustrative examples below, assume an XQuery
or transformation operating on a non-empty Purchase Order document containing a
number of line-item elements. The variable $seq is bound to the
sequence of line-item nodes in document order. The variables
$item1, $item2, etc. are bound to separate, individual
line-item nodes in the sequence.
The functions in this section perform comparisons between the items in one or more sequences.
Many of these functions require atomic items to be compared for equality.
fn:compare(A, B)
returns zero when evaluated with a specified or context-determined collation
and implicit timezone.
Except where explicitly stated otherwise, an appeal to
NaN is treated as equal to NaN.
The following functions assert the cardinality of their sequence arguments.
The functions fn:remove($seq, fn:index-of($seq2, 'abc'))
requires the result of the call on fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc')))
would provide a suitable static type at query analysis time, and ensure that the length of the sequence is
correct with a dynamic check at query execution time.
The 4.0 specifications no longer define strict static typing as an option, so the utility of these functions has declined. They may still serve a purpose, however, as assertions signaling expected preconditions both to the processor and to anyone reading the code.
The type signatures for these functions deliberately declare the argument type as
item()*, permitting a sequence of any length. A more restrictive
signature would defeat the purpose of the function, which is to defer
cardinality checking until query execution time.
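The idea of deferring the cardinality check to execution time can be sketched in Python. The helpers `exactly_one`, `index_of`, and `remove` are simplified, hypothetical models named after the functions in the example above; `ValueError` stands in for a dynamic error.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def exactly_one(seq: Sequence[T]) -> T:
    """Accept a sequence of any length; fail at run time unless it has one item."""
    if len(seq) != 1:
        raise ValueError(f"expected exactly one item, got {len(seq)}")
    return seq[0]


def index_of(seq: Sequence[T], target: T) -> List[int]:
    """Return the 1-based positions at which `target` occurs."""
    return [i for i, item in enumerate(seq, start=1) if item == target]


def remove(seq: Sequence[T], position: int) -> List[T]:
    """Return a copy of `seq` without the item at the 1-based `position`."""
    return [item for i, item in enumerate(seq, start=1) if i != position]


seq = ["a", "b", "c"]
seq2 = ["xyz", "abc", "def"]
result = remove(seq, exactly_one(index_of(seq2, "abc")))
```

The parameter of `exactly_one` is typed as a sequence of any length, so the length is checked only when the call is evaluated.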
Aggregate functions take a sequence as argument and return a single value
computed from values in the sequence. Except for xs:untypedAtomic values are permitted in the
input sequence and handled by special conversion rules. The type of the items in
the sequence must also support certain operations.
The following functions take function items as an argument.
With all these functions, if the caller-supplied function fails with a dynamic error, this error is propagated as an error from the higher-order function itself.
This section defines functions and operators on the xs:boolean datatype.
Since no literals are defined in XPath to reference the constant boolean values true and false,
two functions are provided for the purpose.
The following functions are defined on boolean values:
This section specifies arithmetic operators on the numeric datatypes defined in
The operators described in this section are defined on the following atomic types.
They also apply to types derived by restriction from the above types.
The type xs:numeric is defined as a union type whose member types are
(in order) xs:double, xs:float, and xs:decimal. This type is implicitly imported
into the static context, so it can also be used in defining the signature of user-written functions. Apart from the fact that
it is implicitly imported, it behaves exactly like a user-defined type with the same definition. This means, for example:
If the expected type of a function parameter is given as xs:numeric, the actual value supplied
can be an instance of any of these three types, or any type derived from these three by restriction (this includes the built-in
type xs:integer, which is derived from xs:decimal).
If the expected type of a function parameter is given as xs:numeric, and the actual value supplied
is xs:untypedAtomic (or a node whose atomized value is xs:untypedAtomic), then it will
be cast to the union type xs:numeric using the rules in xs:double subsumes the lexical space of the other member types, and
xs:double is listed first, the effect is that if the untyped atomic item is in the lexical space of
xs:double, it will be converted to an xs:double, and if not, a dynamic error occurs.
When the return type of a function is given as xs:numeric, the actual value returned will be
an instance of one of the three member types (and perhaps also of types derived from these by restriction). The rules
for the particular function will specify how the type of the result depends on the values supplied as arguments.
In many cases, for the functions in this specification, the result is defined to be the same type as the first
argument.
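The effect described for untyped atomic input can be approximated in Python, whose `float` is a double-precision value on typical platforms. This is a sketch under assumptions: the regular expression is an informal transcription of the xs:double lexical space (including `+INF` as allowed by XSD 1.1), the function name `cast_untyped_to_numeric` is hypothetical, and `ValueError` stands in for a dynamic error.

```python
import re

# Informal pattern for the xs:double lexical space (assumed, per XSD 1.1).
_DOUBLE_LEXICAL = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    r"|[+-]?INF|NaN"
)


def cast_untyped_to_numeric(text: str) -> float:
    """Convert untyped text to a double, or fail if it is not a valid double."""
    lexical = text.strip(" \t\r\n")  # leading and trailing XML whitespace
    if not _DOUBLE_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not in the lexical space of xs:double: {text!r}")
    if lexical.endswith("INF"):
        return float("-inf") if lexical.startswith("-") else float("inf")
    # Python's float() handles ordinary numerals and the literal "NaN".
    return float(lexical)
```

The explicit pattern matters because Python's own `float()` accepts spellings such as `inf` and `1_000` that this sketch rejects.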
This specification uses IEEE 754 arithmetic for xs:float and xs:double values.
One consequence of this is that some operations result in the value NaN (not a number), which
has the unusual property that it is not equal to itself. Another consequence is that some operations return the value negative zero.
This differs from XSD 1.0, which defines NaN as being equal to itself and defines only a single zero in the value space.
The text accompanying several functions defines behavior for both positive and negative zero inputs and outputs
in the interest of alignment with IEEE 754. In particular, the expression -0.0e0 (which is actually a unary minus operator
applied to an xs:double value) will always return negative zero.
XML Schema 1.1 introduces support for positive and negative zero as distinct values, and also uses the IEEE 754 semantics for NaN.
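The two IEEE 754 properties described above can be observed in any language whose floating-point type is an IEEE 754 double. The following Python sketch is illustrative only: it uses Python's float as a stand-in for xs:double, which is an assumption of the example and not something this specification defines.

```python
import math

nan = float("nan")
neg_zero = -0.0  # unary minus applied to the double 0.0

# NaN is not equal to itself.
nan_equals_itself = (nan == nan)

# Negative zero compares equal to positive zero, but its sign is observable.
zeros_compare_equal = (neg_zero == 0.0)
neg_zero_sign = math.copysign(1.0, neg_zero)
pos_zero_sign = math.copysign(1.0, 0.0)

print(nan_equals_itself, zeros_compare_equal, neg_zero_sign, pos_zero_sign)
```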
The following functions define the semantics of arithmetic operators defined in
| Operator | Meaning |
|---|---|
| op:numeric-add | Addition |
| op:numeric-subtract | Subtraction |
| op:numeric-multiply | Multiplication |
| op:numeric-divide | Division |
| op:numeric-integer-divide | Integer division |
| op:numeric-mod | Modulus |
| op:numeric-unary-plus | Unary plus |
| op:numeric-unary-minus | Unary minus (negation) |
The parameters and return types for the above operators are in most cases declared to be of type
xs:numeric, which permits the basic numeric
types: xs:integer, xs:decimal, xs:float
and xs:double, and types derived from them.
In general the two-argument functions require that both arguments are of the same primitive type,
and they return a value of this same type.
The exceptions are op:numeric-divide, which returns
an xs:decimal if called with two xs:integer operands,
and op:numeric-integer-divide which always returns an xs:integer.
If the two operands of an arithmetic expression are not of the same type, they
may be converted to a common type as described in
The result type of operations depends on their argument datatypes and is defined in the following table:
| Operator | Returns |
|---|---|
| op:operation(xs:integer, xs:integer) | xs:integer (except for op:numeric-divide(integer, integer), which returns xs:decimal) |
| op:operation(xs:decimal, xs:decimal) | xs:decimal |
| op:operation(xs:float, xs:float) | xs:float |
| op:operation(xs:double, xs:double) | xs:double |
| op:operation(xs:integer) | xs:integer |
| op:operation(xs:decimal) | xs:decimal |
| op:operation(xs:float) | xs:float |
| op:operation(xs:double) | xs:double |
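As a rough illustration of the result-type table, the following Python sketch models xs:integer as int, xs:decimal as decimal.Decimal, and xs:double as float. This mapping and the helper names numeric_add and numeric_divide are assumptions made for the example; xs:float has no direct Python counterpart and is left out.

```python
from decimal import Decimal


def numeric_add(a, b):
    """Addition: both operands have the same type, and so does the result."""
    if type(a) is not type(b):
        raise TypeError("operands must first be converted to a common type")
    return a + b


def numeric_divide(a, b):
    """Division: two integers give a decimal; other types are preserved."""
    if type(a) is not type(b):
        raise TypeError("operands must first be converted to a common type")
    if isinstance(a, int):
        return Decimal(a) / Decimal(b)
    return a / b


print(type(numeric_add(3, 4)).__name__)     # integer + integer
print(type(numeric_divide(7, 2)).__name__)  # integer div integer
```

The sketch does not model error handling such as division by zero, which Python reports through its own exceptions.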
The basic rules for addition, subtraction, and multiplication
of ordinary numbers are not set out in this specification; they are taken as given. In the case of xs:double
and xs:float the rules are as defined in IEEE 754. The rules for special values such as NaN,
and exception conditions such as overflow and underflow, are described more explicitly since they are not necessarily obvious.
On overflow and underflow situations during arithmetic operations, conforming
implementations
For xs:float and xs:double operations, overflow
behavior
Raising a dynamic error
Returning INF or -INF.
Returning the largest (positive or negative) non-infinite number.
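The three overflow outcomes listed above can be sketched as a policy switch. This is a minimal Python illustration, assuming Python's float stands in for xs:double; the function name multiply_double and the policy labels "error", "inf", and "max" are hypothetical and chosen only for the example.

```python
import math
import sys


def multiply_double(a, b, on_overflow="inf"):
    """Multiply two doubles, handling overflow according to a chosen policy."""
    result = a * b  # Python float multiplication overflows silently to +/-inf
    overflowed = math.isinf(result) and not (math.isinf(a) or math.isinf(b))
    if not overflowed:
        return result
    if on_overflow == "error":
        # Option 1: report the overflow as an error.
        raise OverflowError("numeric overflow")
    if on_overflow == "max":
        # Option 3: the largest finite number with the appropriate sign.
        return math.copysign(sys.float_info.max, result)
    # Option 2: positive or negative infinity.
    return result


print(multiply_double(1e308, 10.0, "inf"))
print(multiply_double(-1e308, 10.0, "max"))
```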
For xs:float and xs:double operations,
underflow behavior
Raising a dynamic error
Returning 0.0E0 or +/- 2**Emin or a
denormalized value; where Emin is the smallest
possible xs:float or xs:double exponent.
For xs:decimal operations, overflow behavior 0.0 must be returned.
For xs:integer operations, implementations that support
limited-precision integer operations
They
They
The functions op:numeric-add, op:numeric-subtract,
op:numeric-multiply, op:numeric-divide,
op:numeric-integer-divide and op:numeric-mod are each
defined for pairs of numeric operands, each of which has the same
type: xs:integer, xs:decimal, xs:float, or
xs:double. The functions op:numeric-unary-plus and
op:numeric-unary-minus are defined for a single operand whose type
is one of those same numeric types.
For xs:float and xs:double arguments, if either
argument is NaN, the result is NaN.
For xs:decimal values, let N be the number of digits
of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits
required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication
operations, the returned result
This specification does not determine whether xs:decimal operations are fixed point or floating point.
In an implementation using floating point it is possible for very simple operations to require more digits of precision than
are available; for example, adding 1e100 to 1e-100 requires 200 digits of precision for an
accurate representation of the result.
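The precision problem described above can be reproduced with an arbitrary-precision decimal library. The sketch below uses Python's decimal module as a stand-in for a floating-point xs:decimal implementation. The helper name add_with_precision and the choice of 28 and 201 digits are assumptions made for the illustration (by this module's count, holding every digit of the exact sum takes 201 significant digits).

```python
from decimal import Decimal, Inexact, localcontext


def add_with_precision(a, b, digits):
    """Add two decimals given as strings, keeping at most `digits` digits.

    Returns the sum and a flag saying whether rounding lost information.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.clear_flags()
        total = ctx.add(Decimal(a), Decimal(b))
        return total, bool(ctx.flags[Inexact])


rounded, lost_digits = add_with_precision("1e100", "1e-100", 28)
print(rounded, lost_digits)

exact, lost_digits_exact = add_with_precision("1e100", "1e-100", 201)
print(len(exact.as_tuple().digits), lost_digits_exact)
```

With the smaller precision the small operand vanishes entirely, so the sum is numerically equal to the larger operand.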
The divideByZero and invalidOperation. The
IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by
operations such as log(0). The IEEE invalidOperation exception is raised by
attempts to call a function with an argument that is outside the function’s domain (for example,
sqrt(-1) or log(-1)).
Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in
which the operation returns a defined result, typically positive or negative infinity, or NaN. With this
function library,
these IEEE exceptions do not cause a dynamic error
at the application level; rather they result in the relevant function or operator returning
the defined non-error result.
The underlying IEEE exception -INF, +INF, or NaN) with no error.
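Not every host language follows the non-stop convention by default. Python's math.sqrt and math.log, for instance, raise ValueError for out-of-domain arguments. The following sketch wraps them so that they return the defined non-error results described above. The wrapper names ieee_sqrt and ieee_log are hypothetical, and Python's float is assumed to stand in for xs:double.

```python
import math


def ieee_sqrt(x):
    """Square root with non-stop handling: invalidOperation yields NaN."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    return math.sqrt(x)


def ieee_log(x):
    """Natural logarithm with non-stop handling of the two IEEE exceptions."""
    if math.isnan(x) or x < 0.0:
        return math.nan      # invalidOperation: argument outside the domain
    if x == 0.0:
        return -math.inf     # divideByZero: log(0) is negative infinity
    return math.log(x)


print(ieee_sqrt(-1.0), ieee_log(0.0), ieee_log(-1.0))
```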
The NaN values:
a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model:
the value spaces of xs:float and xs:double each include only a single
NaN value. This does not prevent the implementation distinguishing them internally,
and triggering different
Although comparison of numeric values across heterogeneous types has changed
to convert both values to xs:decimal, arithmetic operations continue
to use xs:double as the common type.
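The difference between the two conversion strategies can be seen in a small experiment. In this Python sketch float plays the role of xs:double and decimal.Decimal the role of xs:decimal. Both that mapping and the helper names compare_as_decimal and add_as_double are assumptions of the example, and infinities and NaN are not handled.

```python
from decimal import Decimal


def compare_as_decimal(a, b):
    """Compare by converting both operands exactly to decimal: -1, 0, or 1."""
    da, db = Decimal(a), Decimal(b)
    return (da > db) - (da < db)


def add_as_double(a, b):
    """Add by converting both operands to double first."""
    return float(a) + float(b)


# The double nearest to 0.1 is slightly larger than the decimal 0.1,
# so an exact decimal comparison does not report equality.
print(compare_as_decimal(0.1, Decimal("0.1")))

# Arithmetic converts the decimal to a double and proceeds in double.
print(add_as_double(0.1, Decimal("0.1")))
```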
Numeric values can be compared using the function
This function underpins the six value comparison operators eq, ne, lt,
le, gt, and ge and the six general comparison
operators =, !=, <, <=,
>, and >=, which are all defined in terms of
the
For a description of the different ways of comparing numeric
values using the operators = and eq and functions
such as
The following functions are defined on numeric types. Each function returns a value of the same type as the type of its argument.
If the argument is the empty sequence, the empty sequence is returned.
For xs:float and xs:double arguments, if the
argument is NaN, NaN is returned.
With the exception of xs:float and xs:double that are positive or
negative infinity return positive or negative infinity.
The
It is possible to convert strings to values of type xs:integer,
xs:float, xs:decimal, or xs:double
using the constructor functions described in cast expressions as described in
In addition the xs:double. It differs from the xs:double
constructor function in that any value outside the lexical space of the xs:double
datatype is converted to the xs:double value NaN.
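A conversion that maps every out-of-range lexical form to NaN can be sketched as follows. The helper name to_double_or_nan is hypothetical, and the regular expression is a transcription of the xs:double lexical space as defined in XSD 1.1 (including +INF), so it should be treated as an assumption and checked against that specification before being relied on.

```python
import math
import re

# Assumed transcription of the XSD 1.1 lexical space of xs:double.
_DOUBLE_LEXICAL = re.compile(
    r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?|(\+|-)?INF|NaN"
)


def to_double_or_nan(text):
    """Convert a string to a double, returning NaN instead of raising."""
    collapsed = text.strip(" \t\r\n")  # leading/trailing XML whitespace
    if _DOUBLE_LEXICAL.fullmatch(collapsed) is None:
        return math.nan
    return float(collapsed)


print(to_double_or_nan("12.5e1"), to_double_or_nan("abc"))
```

The explicit pattern matters because Python's own float() accepts forms such as "inf" or "1_000" that are not valid xs:double literals.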
This section defines a function for formatting decimal and floating point numbers.
This function can be used to format any numeric quantity, including an integer. For integers, however,
the
Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any)
are identified by a QName. There is always an unnamed decimal format available, but its contents are
Each decimal format provides a set of named properties.
A phrase such as "The
For any decimal format, the properties
representing characters used in a
This differs from the format-number function previously defined in XSLT 2.0 in that
any digit can be used in the picture string to represent a mandatory digit: for example the picture
strings "000", "001", and "999" are equivalent.
The digits will all be from the same decimal digit family,
specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero-digit property.
This change is to align format-number
(which previously used "000") with format-dateTime (which used 001).
A dynamic error is raised
A picture-string consists either of a sub-picture, or of
two sub-pictures separated by the
A sub-picture
A sub-picture
The mantissa part of a
sub-picture (defined below)
A sub-picture
A sub-picture
A sub-picture
The integer part of a sub-picture (defined below)
A character that matches the
A sub-picture that contains a
If a sub-picture contains a character treated as an
exponent-separator-sign then this
The mantissa part of the sub-picture is defined as the part that appears to the left of the exponent-separator-sign if there is one, or the entire sub-picture otherwise. The exponent part of the sub-picture is defined as the part that appears to the right of the exponent-separator-sign; if there is no exponent-separator-sign then the exponent part is absent.
The integer part of the sub-picture is defined as the part that
appears to the left of the
The fractional part of the sub-picture is defined as that
part of the mantissa part that
appears to the right of the
This phase of the algorithm analyzes
the
Several variables are associated with each sub-picture. If there are two sub-pictures, then these rules are applied to one sub-picture to obtain the values that apply to positive and unsigned zero numbers, and to the other to obtain the values that apply to negative numbers. If there is only one sub-picture, then the values for both cases are derived from this sub-picture.
The variables are as follows:
The integer-part-grouping-positions is a sequence of integers
representing the positions of grouping separators within the integer part of the
sub-picture. For each
The grouping is defined to be
There is at least one grouping-separator in the integer part of the sub-picture.
There is a positive integer G (the grouping size) such that the position of every grouping-separator in the integer part of the sub-picture is a positive integer multiple of G.
Every position in the integer part of the sub-picture that is a positive integer multiple of G is occupied by a grouping-separator.
If the grouping is regular, then the integer-part-grouping-positions sequence contains all integer multiples of G as far as necessary to accommodate the largest possible number.
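The regularity test can be sketched in Python as follows. The sketch assumes that the position of a grouping separator is the number of digit characters to its right within the integer part, that the separator is "," and that the digit characters are "#" and the ASCII digits. The function names are hypothetical. If a grouping size G exists it must equal the smallest separator position, because position G itself has to be occupied.

```python
def grouping_positions(integer_part, separator=",", digit_chars="#0123456789"):
    """Return (separator positions, number of digit characters)."""
    positions = []
    digits_seen = 0
    for ch in reversed(integer_part):
        if ch == separator:
            positions.append(digits_seen)
        elif ch in digit_chars:
            digits_seen += 1
    return positions, digits_seen


def regular_grouping_size(integer_part, separator=","):
    """Return the grouping size G if the grouping is regular, else None."""
    positions, digit_count = grouping_positions(integer_part, separator)
    if not positions:
        return None                      # no grouping separator at all
    g = min(positions)
    if g == 0 or any(p % g != 0 for p in positions):
        return None                      # some position is not a multiple of G
    required = set(range(g, digit_count, g))
    if not required.issubset(positions):
        return None                      # a multiple of G has no separator
    return g


print(regular_grouping_size("#,##0"), regular_grouping_size("#,##,##0"))
```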
The minimum-integer-part-size is an integer indicating the minimum number of digits that will
appear to the left of the decimal-separator character. It is initially set to
the number of
There is no maximum integer part size. All significant digits in the integer part of the
number will be displayed, even if this exceeds the number of
The scaling factor is a non-negative integer used to determine the scaling of the mantissa
in exponential notation. It is set to the number of
The prefix is set to contain all passive characters
in the sub-picture to the left of the leftmost active character.
If the picture string contains only one sub-picture,
the prefix
for the negative sub-picture is set by concatenating the
The fractional-part-grouping-positions is a sequence of integers
representing the positions of grouping separators within the fractional part of the
sub-picture. For each
There is no need to extrapolate grouping positions on the fractional side,
because the number of digits in the output will never exceed the number of
The minimum-fractional-part-size is set to the number of
The maximum-fractional-part-size is set to the total number of
If the effect of the above rules is that minimum-integer-part-size and maximum-fractional-part-size are both zero, then an adjustment is applied as follows:
If an exponent separator is present then:
minimum-fractional-part-size is changed to 1 (one).
maximum-fractional-part-size is changed to 1 (one).
This has the effect that with the picture #.e9, the value 0.123 is formatted as 0.1e0
Otherwise:
minimum-integer-part-size is changed to 1 (one).
This has the effect that with the picture #, the value 0.23 is formatted
as 0
If all the following conditions are true:
An exponent separator is present
The minimum-integer-part-size is zero
There is at least one
then the minimum-integer-part-size is changed to 1 (one).
This has the effect that with the picture .9e9, the value 0.1 is formatted
as .1e0, while with the picture #.9e9, it is formatted as 0.1e0
If (after making the above adjustments) the minimum-integer-part-size and the minimum-fractional-part-size are both zero, then the minimum-fractional-part-size is set to 1 (one).
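The adjustment rules above can be written as a short function. In this Python sketch the function name and parameter names are hypothetical. The flag has_optional_integer_digit is also an assumption: it reflects a reading, suggested by the #.9e9 example, that the third condition refers to an optional digit sign in the integer part.

```python
def adjust_sizes(min_int, min_frac, max_frac,
                 has_exponent, has_optional_integer_digit):
    """Apply the size adjustments in order and return the new sizes."""
    if min_int == 0 and max_frac == 0:
        if has_exponent:
            min_frac = 1
            max_frac = 1
        else:
            min_int = 1
    if has_exponent and min_int == 0 and has_optional_integer_digit:
        min_int = 1
    if min_int == 0 and min_frac == 0:
        min_frac = 1
    return min_int, min_frac, max_frac


# Picture "#.e9": no mandatory digits anywhere, exponent present.
print(adjust_sizes(0, 0, 0, True, True))
# Picture ".9e9": one mandatory fractional digit, empty integer part.
print(adjust_sizes(0, 1, 1, True, False))
```

The results agree with the examples in the text: a minimum integer size of 1 corresponds to output such as 0.1e0, and a minimum integer size of 0 to output such as .1e0.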
The minimum-exponent-size is set to the number of
The rules for the syntax of the picture string ensure that if an exponent separator is present, then the minimum-exponent-size will always be greater than zero.
The suffix is set to contain all passive characters to the right of the rightmost active character in the sub-picture.
If there is only one sub-picture, then all variables
for positive numbers and negative numbers will be the same, except for
prefix: the prefix for negative numbers will
be preceded by the
This section describes the second phase of processing of the
The algorithm for this second stage of processing is as follows:
If the input number is NaN (not a number), the result is the
value of the
In the rules below, the positive sub-picture and its associated variables are used
if the input number is positive, and the negative sub-picture and its associated
variables are used if it is negative. For xs:double and xs:float,
negative zero is taken as negative, positive zero as positive. For xs:decimal
and xs:integer, the positive sub-picture is used for zero.
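The choice of sub-picture can be sketched in Python, again assuming that float stands in for xs:double and xs:float, and that int and decimal.Decimal stand in for xs:integer and xs:decimal. The function name is hypothetical, and NaN is assumed to have been handled by the earlier rule.

```python
import math
from decimal import Decimal


def uses_negative_subpicture(value):
    """Return True if the negative sub-picture applies to this number."""
    if isinstance(value, float):
        # copysign exposes the sign bit, so -0.0 is treated as negative.
        return math.copysign(1.0, value) < 0
    # Integers and decimals: zero always uses the positive sub-picture.
    return value < 0


print(uses_negative_subpicture(-0.0), uses_negative_subpicture(0))
```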
The adjusted number is determined as follows:
If the sub-picture contains a
If the sub-picture contains a
Otherwise, the adjusted number is the input number.
If the multiplication causes numeric overflow, no error occurs, and the adjusted number is positive or negative infinity as appropriate.
If the adjusted number is positive or negative infinity, the result is the
concatenation of the appropriate prefix, the value of the
If the minimum exponent size is non-zero,
The primitive type of the mantissa is the same as the primitive type of the adjusted number (integer, decimal, float, or double).
The mantissa multiplied by ten to the power of the exponent is equal to the adjusted number.
The mantissa
If the minimum exponent size is zero, then the mantissa is the adjusted number and there is no exponent.
If the minimum exponent size is non-zero and the adjusted number is zero, then the mantissa is the adjusted number and the exponent is zero.
The mantissa is converted (if necessary) to
an xs:decimal value,
using an implementation of xs:decimal that imposes no limits on the
totalDigits or fractionDigits facets. If there are several
such values that
are numerically equal to the mantissa (bearing in mind that if the
mantissa is an xs:double or xs:float, the comparison will be done by
converting the decimal value back to an xs:double or xs:float), the one that
is chosen maximum-fractional-part-size digits in
its fractional part. The rounded number is defined to be the result of
converting the mantissa to an xs:decimal value, as described above,
and then calling the function maximum-fractional-part-size as the second
argument, again with no limits on the totalDigits or fractionDigits in the
result.
The absolute value of the rounded number is converted to a string in decimal notation,
using the digits in the
If the number of digits to the left of the
If the number of digits to the right of the
For each integer N in the integer-part-grouping-positions list,
a
For each integer N in the fractional-part-grouping-positions list,
a
If there is no
If an exponent exists, then the string
produced from the mantissa as described above is extended with
the following, in order:
(a) the
The result of the function is the concatenation of the appropriate prefix, the string conversion of the number as obtained above, and the appropriate suffix.
The functions in this section perform trigonometric and other mathematical calculations on xs:double values. They
are provided primarily for use in applications performing geometrical computation, for example when generating
SVG graphics.
Functions are provided to support the six most commonly used trigonometric calculations: sine, cosine, and tangent, and their inverses arc sine, arc cosine, and arc tangent. Other functions such as secant, cosecant, and cotangent are not provided because they are easily computed in terms of these six.
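As an illustration of how the omitted functions follow from the six that are provided, the following Python sketch derives secant, cosecant, and cotangent from sine and cosine. Python's math module is used as a stand-in for the functions of this section, and the names sec, csc, and cot are chosen for the example.

```python
import math


def sec(x):
    """Secant: reciprocal of the cosine."""
    return 1.0 / math.cos(x)


def csc(x):
    """Cosecant: reciprocal of the sine."""
    return 1.0 / math.sin(x)


def cot(x):
    """Cotangent: cosine divided by sine."""
    return math.cos(x) / math.sin(x)


print(sec(0.0), csc(math.pi / 2), cot(math.pi / 4))
```

In Python a zero divisor raises ZeroDivisionError, whereas division of xs:double values under the rules of this specification would yield an infinity or NaN.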
The functions in this section (with the exception of math:pi)
are specified by reference to xs:double values. The IEEE specification
applies with the following caveats:
IEEE states that the preferred quantum is language-defined. In this
specification, it is
IEEE states that certain functions should raise the inexact exception if the result is inexact. In this specification, this exception if it occurs does not result in an error. Any diagnostic information is outside the scope of this specification.
IEEE defines various rounding algorithms for inexact results, and states
that the choice of rounding direction, and the mechanisms for influencing this choice,
are language-defined. In this specification, the rounding direction and any mechanisms for
influencing it are
Certain operations (such as taking the square root of a negative number)
are defined in IEEE to signal the invalid operation exception and return a
quiet NaN. In this specification, such operations return NaN
and do not raise an error. The same policy applies to operations (such as taking
the logarithm of zero) that raise a divide-by-zero exception. Any diagnostic
information is outside the scope of this specification.
Operations whose mathematical result is greater than the largest finite xs:double
value are defined in IEEE to signal the overflow exception; operations whose mathematical
result is closer to zero than the smallest non-zero xs:double value are similarly
defined in IEEE to signal the underflow exception. The treatment of these exceptions in
this specification is defined in
The function makes use of the record structure defined in the next section.
This section specifies functions and operators on the xs:string datatype and the datatypes derived from it.
The operators described in this section are defined on the following types.
They also apply to user-defined types derived by restriction from the above types.
The
Collations can indicate that two different codepoints are to be considered equal for comparison purposes (for example, “v” and “w” are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner.
Some sources, for example
This specification defines some collation URIs that provide interoperable sorting behavior across applications. Other collation URIs are defined only partially (leaving some aspects implementation-defined). Implementations may define further collation URIs, or may allow users or third parties to define them.
The
Collations may or may not perform Unicode normalization on strings before comparing them.
This specification allows a collation
name to be provided as an argument to many string functions. Although
collations are defined to be URIs, they are supplied as instances of
xs:string.
The XQuery/XPath static context supplies a default collation
for use when the collation argument is not specified.
(see
If the collation is specified using a relative URI reference,
it is resolved relative to an
Previous versions of this specification stated that it must
be resolved against the
This specification does not define whether or not the collation URI is
dereferenced. The collation URI may be an abstract identifier, or it may
refer to an actual resource describing the collation. If it refers to a
resource, this specification does not define the nature of that resource.
One possible candidate is that the resource is a locale description
expressed using the Locale Data Markup Language: see
The ability to access external resources depends on whether the
calling code is
XML allows elements to specify the xml:lang attribute to
indicate the language associated with the content of such an element.
This specification does not use xml:lang to identify the
default collation because using
xml:lang does not produce desired effects when the two
strings to be compared have different xml:lang values or
when a string is multilingual.
All collations support the ability to compare two strings to decide whether they are equal, and if not, which one should sort first. This must always define a total ordering, which implies that the comparison is transitive.
A collation may (or may not) support the ability to derive a
Furthermore, a collation may (or may not) support the ability to determine whether
one string is a substring of another under that collation. The use of collations
in substring matching is described in
The capabilities of a collation may be determined using the
http://www.w3.org/2005/xpath-functions/collation/codepoint identifies
a collation which must be recognized by every implementation: it is referred to as
the
The Unicode codepoint collation does not perform any normalization on the supplied strings.
The collation is defined as follows. Each of the two strings is
converted to a sequence of integers representing its codepoints. The resulting sequences $A and $B are then compared as follows:
If both sequences are empty, the strings are equal.
If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.
If the first integer in $A is less than the first integer in $B, then
the string corresponding to $A is less than the string corresponding to
$B.
If the first integer in $A is greater than the first integer in $B, then
the string corresponding to $A is greater than the string corresponding to
$B.
Otherwise (the first pair of integers are equal), the result is obtained
by applying the same rules recursively to fn:tail($A) and
fn:tail($B)
While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.
The Unicode codepoint collation differs from the default sort order used in programming languages that sort strings based on UTF-16 code units, which may include surrogate pairs.
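The comparison rules, and the contrast with UTF-16 code unit ordering, can be sketched in Python. The function names are hypothetical. The loop below is an iterative equivalent of the recursive definition, and utf16_units is included only to show how a code-unit-based sort order can disagree for characters outside the Basic Multilingual Plane.

```python
def codepoint_compare(a, b):
    """Compare two strings by codepoint: returns -1, 0, or 1."""
    seq_a = [ord(ch) for ch in a]
    seq_b = [ord(ch) for ch in b]
    for x, y in zip(seq_a, seq_b):
        if x < y:
            return -1
        if x > y:
            return 1
    # All compared pairs were equal: an exhausted (empty) remainder sorts first.
    return (len(seq_a) > len(seq_b)) - (len(seq_a) < len(seq_b))


def utf16_units(s):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "\uFF5E"         # U+FF5E, a character in the Basic Multilingual Plane
astral_char = "\U00010000"  # U+10000, encoded in UTF-16 as a surrogate pair

print(codepoint_compare(bmp_char, astral_char))
print(utf16_units(bmp_char) > utf16_units(astral_char))
```

By codepoint, U+FF5E sorts before U+10000; by UTF-16 code unit, the surrogate pair D800 DC00 sorts before FF5E, so the order is reversed.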
This specification defines a family of collation URIs representing tailorings of the Unicode Collation
Algorithm (UCA) as defined in
This family of URIs uses the scheme and path http://www.w3.org/2013/collation/UCA
followed by an optional query part. The query part, if present, consists of a question mark followed
by a sequence of zero or more semicolon-separated parameters. Each parameter is a keyword-value pair, the
keyword and value being separated by an equals sign.
All implementations must recognize URIs in this family in the collation argument of functions that
take a collation argument.
If the fallback parameter is
present with the value no, then the implementation fallback parameter
is omitted or takes the value yes, and if the collation URI is well-formed according to the rules in this section,
then the implementation http://www.w3.org/2013/collation/UCA?lang=se;fallback=yes and the implementation does not include a fully
conformant version of the UCA tailored for Swedish, then it
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not
defined in this specification then the meaning is fallback parameter is present with the value no it should reject
the collation as unsupported, otherwise it should ignore the unrecognized parameter.
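The URI syntax described above is simple enough to parse by hand. The following Python sketch splits the query part into keyword-value pairs and lets the last occurrence of a keyword win. The function name parse_uca_uri is hypothetical, and the sketch does not validate keywords or values against the table of defined parameters.

```python
UCA_BASE = "http://www.w3.org/2013/collation/UCA"


def parse_uca_uri(uri):
    """Return the query parameters of a UCA collation URI as a dict."""
    if uri == UCA_BASE:
        return {}
    prefix = UCA_BASE + "?"
    if not uri.startswith(prefix):
        raise ValueError("not a UCA collation URI: " + uri)
    params = {}
    for part in uri[len(prefix):].split(";"):
        if not part:
            continue  # tolerate an empty query or a stray semicolon
        keyword, equals, value = part.partition("=")
        if not equals:
            raise ValueError("parameter without '=': " + part)
        params[keyword] = value  # a later duplicate overwrites an earlier one
    return params


print(parse_uca_uri(UCA_BASE + "?lang=se;fallback=yes"))
```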
The following query parameters are defined. If any parameter is absent, the default is
| Keyword | Values | Meaning |
|---|---|---|
| fallback | yes \| no (default yes) | Determines whether the processor uses a fallback collation if a conformant collation is not available. |
| lang | language code: a string in the lexical space of xs:language. | The language whose collation conventions are to be used. |
| version | string | The version number of the UCA to be used. |
| strength | primary \| secondary \| tertiary \| quaternary \| identical, or 1\|2\|3\|4\|5 as synonyms (default tertiary / 3) | The collation strength as defined in UCA. Primary strength takes only the base form of the character into account (so A=a=Ä=ä); secondary strength ignores case but considers accents and diacritics as significant (so A=a and Ä=ä but ä≠a); tertiary considers case as significant (A≠a≠Ä≠ä); quaternary strength always considers as significant spaces and punctuation (data-base≠database; if maxVariable is punct or higher and alternate is not non-ignorable, lower strengths will treat data-base=database). |
| maxVariable | space \| punct \| symbol \| currency (default punct) | Given the sequence space, punct, symbol, currency, all characters in the specified group and earlier groups are treated as “noise” characters to be handled as defined by the alternate parameter. For example, maxVariable=punct indicates that characters classified as whitespace or punctuation get this treatment. |
| alternate | non-ignorable \| shifted \| blanked (default non-ignorable) | Controls the handling of characters such as spaces and hyphens; specifically, the "noise" characters in the groups selected by the maxVariable parameter. The value non-ignorable indicates that such characters are treated as distinct at the primary level (so data base sorts before database); shifted indicates that they are used to differentiate two strings only at the quaternary level, and blanked indicates that they are taken into account only at the identical level. |
| backwards | yes \| no (default no) | The value backwards=yes indicates that the last accent in the string is the most significant. |
| normalization | yes \| no (default no) | Indicates whether strings are converted to normalization form D. |
| caseLevel | yes \| no (default no) | When used with primary strength, setting caseLevel=yes has the effect of ignoring accents while taking account of case. |
| caseFirst | upper \| lower (default lower) | Indicates whether upper-case precedes lower-case or vice versa. |
| numeric | yes \| no (default no) | When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number, for example chap2 sorts before chap12. |
| reorder | a comma-separated sequence of reorder codes, where a reorder code is one of space, punct, symbol, currency, digit, or a four-letter script code defined in | Determines the relative ordering of text in different scripts; for example the value digit,Grek,Latn indicates that digits precede Greek letters, which precede Latin letters. |
This list excludes parameters that are inconvenient to express in a URI, or that are applicable only to substring matching.
UCA collation URIs can be conveniently generated using the
The collation URI http://www.w3.org/2005/xpath-functions/collation/unicode-case-insensitive must be recognized
by every implementation.
The collation is defined as follows:
Let $UCI be the collation URI
"http://www.w3.org/2005/xpath-functions/collation/unicode-case-insensitive".
Let $UCC be the Unicode Codepoint Collation URI
http://www.w3.org/2005/xpath-functions/collation/codepoint.
For any two strings $A and $B, the result
of the comparison fn:compare($A, $B, $UCI) is defined to be the same as
the result of fn:compare(lower-case($A), lower-case($B), $UCC).
The collation supports collation units and can therefore
be used with functions such as
The collation URI http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive must be recognized
by every implementation. It is class attribute values.
The collation is defined as follows:
Let $HACI be the collation URI
"http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive".
Let $UCC be the Unicode Codepoint Collation URI
http://www.w3.org/2005/xpath-functions/collation/codepoint.
Let $lc be the function
fn:translate(?, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz").
Then for any two strings $A and $B, the result
of the comparison fn:compare($A, $B, $HACI) is defined to be the same as
the result of fn:compare($lc($A), $lc($B), $UCC).
HTML5 defines the semantics of equality matching using this collation;
The corresponding HTML5 definition is: A string A is an ASCII case-insensitive match for a string B, if the ASCII lowercase of A is the ASCII lowercase of B.
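The two case-insensitive collations defined above differ in which characters they fold. The Python sketch below implements both by delegation to a codepoint comparison, as the definitions do. The function names are hypothetical, and Python's str.lower() is assumed to be an adequate stand-in for lower-case, which may not hold for every character.

```python
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")


def codepoint_compare(a, b):
    """Python compares strings by codepoint; normalise to -1, 0, or 1."""
    return (a > b) - (a < b)


def html_ascii_ci_compare(a, b):
    """Fold only the 26 ASCII upper-case letters, then compare codepoints."""
    return codepoint_compare(a.translate(_ASCII_LOWER), b.translate(_ASCII_LOWER))


def unicode_ci_compare(a, b):
    """Fold with full lower-casing, then compare codepoints."""
    return codepoint_compare(a.lower(), b.lower())


kelvin = "\u212A"  # KELVIN SIGN, which lower-cases to the ASCII letter k
print(html_ascii_ci_compare("CLASS", "class"))
print(unicode_ci_compare(kelvin, "k"), html_ascii_ci_compare(kelvin, "k"))
```

The Kelvin sign shows the difference: the full lower-casing treats it as equal to "k", while the ASCII-only translation leaves it untouched.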
Many functions have a signature that includes a $collation
argument, which is generally optional and takes default-collation()
as its default value.
The collation to use for these functions is determined by the following rules:
If the function specifies an explicit collation, CollationA (e.g., if
the optional collation argument is specified in a call of the
If CollationA is supported by the implementation, then CollationA is used.
Otherwise, a dynamic error is raised
If no collation is explicitly specified for the function (that is, if the
$collation argument is omitted or is set to an
empty sequence), and the default collation is CollationB, then:
If CollationB is supported by the implementation, then CollationB is used.
Otherwise, a dynamic error is raised
Because the set of collations that are supported is
If the value of the collation argument is a relative URI reference, it is resolved against the base-URI from the
static context. If it is a relative URI reference and cannot be resolved, perhaps because the base-URI property in the static context
is absent, a dynamic error is raised
There is no explicit requirement that the string used as a collation URI be a valid URI.
Implementations will in many cases reject such strings on the grounds that they do not identify a supported collation; they
may also cause an error if they cannot be resolved against the
The following functions are defined on values of type xs:string and
types derived from it.
When the above operators and functions are applied to datatypes derived from
xs:string, they are guaranteed to return values that are instances of
xs:string, but the value might or might not be an instance of the
particular subtype of xs:string to which they were applied.
The strings returned by
The functions described in this section examine a string $arg1 to see
whether it contains another string $arg2 as a substring. The result
depends on whether $arg2 is a substring of $arg1, and
if so, on the range of $arg1 which $arg2 matches.
When the $arg1 contains a
contiguous sequence of characters whose $arg2.
When a collation is specified, the rules are more complex.
All collations support the capability of deciding whether two
The string
Q is then considered to contain P as a
substring if the sequence of collation units corresponding to P
is a subsequence of the sequence of collation units corresponding to
Q. The characters in P that match are the
characters corresponding to these collation units.
This rule may occasionally lead to surprises. For example, consider a collation
that treats "Jaeger" and "Jäger"
as equal. It might do this by treating "ä" as representing
two collation units, in which case the
expression fn:contains("Jäger", "eg") will return
true. Alternatively, a collation might treat "ae" as a single
collation unit, in which case the expression fn:contains("Jaeger",
"eg") will return false. The results of these functions thus
depend strongly on the properties of the collation that is used.
In addition,
collations may specify that some collation units should be ignored during matching. If hyphen is an ignored
collation unit, then fn:contains("code-point", "codepoint") will be true,
and fn:contains("codepoint", "-") will also be true.
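The surprises described above can be reproduced with toy collations. In the following Python sketch a collation is modelled simply as a function that splits a string into a list of collation units. The three splitting functions and the contains helper are hypothetical constructions for illustration and do not correspond to any real collation.

```python
def units_split_umlaut(s):
    """Toy collation: "ä" is represented by the two units "a" and "e"."""
    units = []
    for ch in s:
        units.extend(["a", "e"] if ch == "ä" else [ch])
    return units


def units_merge_ae(s):
    """Toy collation: the pair "ae" forms one unit, the same unit as "ä"."""
    units, i = [], 0
    while i < len(s):
        if s[i:i + 2] == "ae":
            units.append("ä")
            i += 2
        else:
            units.append(s[i])
            i += 1
    return units


def units_ignore_hyphen(s):
    """Toy collation: the hyphen is an ignored collation unit."""
    return [ch for ch in s if ch != "-"]


def contains(q, p, to_units):
    """True if the units of p occur contiguously within the units of q."""
    uq, up = to_units(q), to_units(p)
    n = len(up)
    return any(uq[i:i + n] == up for i in range(len(uq) - n + 1))


print(contains("Jäger", "eg", units_split_umlaut))
print(contains("Jaeger", "eg", units_merge_ae))
print(contains("codepoint", "-", units_ignore_hyphen))
```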
In the rules for the functions defined in this section, we use the following terms
taken from
In the definitions in
C is the collation; that is, the value of the $collation
argument if specified, otherwise the default collation.
P is the (candidate) substring, the value of the $substring
argument to the function.
Q is the (candidate) containing string, the value of the $value
argument to the function.
The boundary condition B is satisfied at the start and end of a
string, and between any two characters that belong to different collation units
(“collation elements” in the language of
It is possible to define collations that do not have the ability to decompose a
string into units suitable for substring matching. An argument to a function
defined in this section may be a URI that identifies a collation that is able to
compare two strings, but that does not have the capability to split the string
into collation units. Such a collation may cause the function to fail, or to
give unexpected results, or it may be rejected as an unsuitable argument. The
ability to decompose strings into collation units is an
The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.
#)
if the c flag is set.
^ and $) can no longer be followed
by a quantifier.
The regular expression syntax used by these functions is defined in terms of
the regular expression syntax specified in XSD 1.1 (see
Implementers should consult
The regular expression syntax and semantics are identical to those
defined in
In
XSD 1.1 is therefore used as the specification baseline, even for processors that only support XSD 1.0.
As well as extending the XSD 1.1 syntax for regular expressions, this specification also extends the processing model.
In XSD, a regular expression is defined to denote a set of strings, and the only functionality offered is to test whether a string matches a regular expression: that is, whether it is a member of the set of strings denoted by the regular expression.
In this specification, matching a string S against a regular expression delivers a more complex outcome.
First some terminology:
The operation of matching a string S against a regular expression delivers:
A set of matching
For each matching
The semantics of particular constructs in a regular expression are affected by
a set of flags. The available flags and their effect are defined in
The different functions available, such as fn:tokenize, are defined in terms of this outcome. For example:
The function
The function
The function
In principle the set of segments that match a regular expression can be determined by enumerating all the segments of the input string and examining each one independently to establish whether it matches. In practice, however:
If several matching segments have the same starting position, then only one of them is returned. This is chosen as follows:
In the case of a choice (operator "|") the first matching
branch is chosen.
In the case of a repetition with a greedy quantifier (for example "+"
or "*") the longest matching segment is chosen.
In the case of a repetition with a reluctant quantifier (for example "+?"
or "*?") the shortest matching segment is chosen.
A matching segment is not included in the result if it overlaps an earlier matching
segment: specifically, a segment with start position S1 is excluded if there
is a segment that has start position S0 and length L0, where
S0 < S1 < S0+L0.
Two segments can be adjacent: that is, the start position of one a*(?=x) has two non-overlapping matches against
the string aaax, one at position 1 and the other at position 4.
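The adjacency case can be observed with an existing regular expression engine. The sketch below uses Python's `re` module (version 3.7 or later is assumed, since earlier versions treat empty matches differently) purely as a stand-in: its dialect is not the one defined by this specification, and the helper name `match_segments` is hypothetical. Positions are reported 1-based to agree with the text above.

```python
import re


def match_segments(pattern, text):
    """Return (start, length) pairs, with 1-based start positions,
    for the successive non-overlapping matches of pattern in text."""
    return [(m.start() + 1, m.end() - m.start())
            for m in re.finditer(pattern, text)]


# A segment of length 3 at position 1, then a zero-length segment at position 4.
print(match_segments(r"a*(?=x)", "aaax"))
```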
The semantics of a regular expression are thus defined by stating which segments of an input string it matches, and what the captured groups corresponding to this match are. This is defined recursively for each construct that may appear within a regular expression, in terms of the outcome of applying its subexpressions.
For constructs defined in XSD 1.1 (branch, piece,
NormalChar, charClass), XSD defines a set of strings
denoted by the construct. The corresponding semantics for this specification
are that the segments matched by such a construct are the segments whose string value
is contained in this set.
For constructs added to the XSD 1.1 baseline by this specification, the semantics are defined in the sections that follow.
Comments are enabled in regular expressions if the c flag is present.
A comment starts with a # character that is not escaped with an immediately
preceding backslash, and that is not contained in a CharClassExpr (that is,
in square brackets). It ends with the following # character, or with the
end of the string containing the regular expression.
Whether or not the c flag is present, the production for
SingleCharEsc allows the # character
to be escaped.
The grammar for regular expressions is summarized here. Rules that
differ from their definition in XSD 1.1 are marked with the character §
against their names.
In these rules the notation 【abc】 matches
any of the characters 'a', 'b', or 'c',
while 【0➜9】 matches any character whose Unicode codepoint is within a given range,
and ¬【abc】 matches any character other than 'a', 'b', or 'c'.
These symbols are used in place of the more conventional notation to allow special characters
such as square brackets and hyphens to appear directly without escaping. Within
the lenticular brackets, all characters other than ➜ (including hyphen and backslash)
represent themselves.
This grammar applies to the regular expression after removal of whitespace and comments
if enabled by the x and c flags respectively: see
XSD 1.1 defines additional rules to disambiguate this grammar.
?
following a quantifier. Specifically:
X?? matches X, once or not at all
X*? matches X, zero or more times
X+? matches X, one or more times
X{n}? matches X, exactly n times
X{n,}? matches X, at least n times
X{n,m}? matches X, at least n times, but
not more than m times
Quantifiers that are not
When a quantifier appears at the outermost level of a regular expression, the distinction between greedy and reluctant quantifiers affects the set of matching segments delivered by the matching operation. With a greedy quantifier, the longest matching segment at a given start position is returned; with a reluctant quantifier, the shortest matching segment at a given start position is returned.
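As an informal illustration, the following Python sketch shows how the choice of segment at a given start position depends on the quantifier. Python's `re` module is used only as a convenient stand-in; it is an assumption that it behaves like a conforming processor for these simple patterns.

```python
import re

text = "aaab"

# Greedy: the longest segment starting at position 1 is selected.
greedy = re.match(r"a+", text).group()

# Reluctant: the shortest segment starting at position 1 is selected.
reluctant = re.match(r"a+?", text).group()

# Choice: the first matching branch wins, even though a later
# branch would match a longer segment.
first_branch = re.match(r"a|ab", "ab").group()

print(greedy, reluctant, first_branch)
```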
When a quantifier appears within a subexpression, the quantified subexpression
matches the
Reluctant quantifiers have no effect on the results of the
boolean
The regular expression syntax defined by
? or * (see below),
is not within a character group (square brackets),
and is not escaped with a backslash. The sub-expression enclosed by a capturing left
parenthesis and its matching right parenthesis is referred to as a
More specifically, the
For example, in the regular expression A(BC(?:D(EF(GH[()])))), the
subexpression BC(?:D(EF(GH[()]))) is capturing subexpression 1, the
subexpression EF(GH[()]) is capturing subexpression 2, and the subexpression
GH[()] is capturing subexpression 3.
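The numbering in this example can be checked mechanically. The sketch below uses Python's `re` module, which happens to number capturing parentheses in the same left-to-right way and also supports `(?:...)`; it is an illustration only, and the input string is an arbitrary choice.

```python
import re

pattern = re.compile(r"A(BC(?:D(EF(GH[()]))))")

# Only three of the four left parentheses outside the character
# class are capturing; the (?: ... ) group is not counted.
print(pattern.groups)

m = pattern.fullmatch("ABCDEFGH(")
print(m.group(1))  # text matched by capturing subexpression 1
print(m.group(2))  # text matched by capturing subexpression 2
print(m.group(3))  # text matched by capturing subexpression 3
```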
When, in the course of evaluating a regular expression, a particular
When a (a*)+ and the input string "aaaa", an implementation
might legitimately capture either "aaaa" or a zero length string as the content
of the captured subgroup.
Parentheses that are required to group terms within the regular expression, but which are
not required for capturing of substrings, can be represented using
the syntax (?:xxxx).
In the absence of back-references (see below),
the presence of the optional ?: has no effect on the set of strings
that match the regular expression, but causes the left parenthesis not to be counted
by operations (such as
Back-references are allowed
outside a character class expression.
A back-reference is an additional kind of atom.
The construct \N where
N is a single digit is always recognized as a
back-reference; if this is followed by further digits, these
digits are taken to be part of the back-reference if and only if
the resulting number NN is such that
the back-reference is preceded by the opening parenthesis of the NNth
capturing left parenthesis.
The regular expression is invalid if a back-reference refers to a
capturing sub-expression that does not exist or whose
closing right parenthesis occurs after the back-reference.
A back-reference with number N matches a string that is the same as
the value of the Nth captured substring.
For example, the regular expression
('|").*\1 matches a sequence of characters
delimited either by an apostrophe at the start and end, or by a
quotation mark at the start and end.
If no string has been matched by the Nth capturing
sub-expression, the back-reference is interpreted as matching
a zero-length string.
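A brief sketch of the quoted-string example follows, again using Python's `re` module as a stand-in and with a hypothetical helper name. Python's treatment of a back-reference to a group that has not matched may differ from the zero-length rule stated above, so that case is deliberately not exercised here.

```python
import re

# Group 1 captures the opening delimiter; \1 requires the same
# delimiter to appear again at the end.
quoted = re.compile(r"""('|").*\1""")


def is_quoted(s):
    """True if the whole of s is delimited by a matching pair of quotes."""
    return quoted.fullmatch(s) is not None


print(is_quoted("'abc'"))    # apostrophes at both ends
print(is_quoted('"abc"'))    # quotation marks at both ends
print(is_quoted("\"abc'"))   # mismatched delimiters
```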
Within a character class expression,
\ followed by a digit is invalid.
Some other regular expression languages interpret this as an octal character reference.
A regular expression that uses a Unicode block name that is not defined in the version(s) of Unicode
supported by the processor (for example \p{IsBadBlockName}) is deemed to be invalid
XSD 1.0 does not say how this situation should be handled; XSD 1.1 says that it should be handled by treating all characters as matching.
Assertions (sometimes called
Assertions fall into the following categories:
The startOfString assertion ^ tests whether the current
position is at the start of the string.
The endOfString assertion $ tests whether the current position
is at the end of the string.
The boundary assertions \b and \B test
whether the current position is at the start or end of a word.
The positive and negative lookahead assertions test whether there is (or is not) a substring starting at the current position that matches a given regular expression.
The positive and negative lookbehind assertions test whether there is (or is not) a substring ending at the current position that matches a given regular expression.
An assertion
Previous versions of this specification allowed a quantifier to follow the
startOfString and endOfString assertions, though this
served no practical purpose. Processors
Two meta-characters, ^ and $ are
added. By default, the meta-character ^ matches if the current position is the
start of the entire string, while $ matches if the current position is the end
of the entire string. In multi-line mode, ^ matches
the start of any line (that is, the start of the entire string,
and the position immediately after a newline character), while
$ matches the end of any line (that is, the end of
the entire string, and the position immediately before a newline
character). Newline here means the character
Single character escapes are extended to allow the
$ character to be escaped.
The assertion \b matches at any position where one of the following conditions is true:
The current position is the start of the string, the string is not empty, and the first
character in the string matches \w.
The current position is the end of the string, the string is not empty, and the
last character in the string matches \w.
The character before the current position matches \w and the character
after the current position matches \W.
The character before the current position matches \W and the character
after the current position matches \w.
Informally, \b matches if the current position is the start or end
of a word, where a word is defined as a sequence of consecutive characters other than
codepoints in Unicode groups P (punctuation), Z (separator),
or C (other).
The assertion \B matches at any position where \b does not match.
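The four conditions for \b can be transcribed directly. The following Python sketch is an informal model, not normative: the function names are hypothetical, positions are 0-based offsets between characters, and \w is approximated as "any character whose Unicode general category is not in P, Z, or C", following the informal description above.

```python
import unicodedata


def is_word_char(ch):
    """Approximation of \\w: not punctuation, separator, or other."""
    return unicodedata.category(ch)[0] not in "PZC"


def at_word_boundary(s, pos):
    """True if \\b would match at offset pos (0 <= pos <= len(s))."""
    before = s[pos - 1] if pos > 0 else None
    after = s[pos] if pos < len(s) else None
    if before is None and after is None:
        return False                      # empty string: no boundary
    if before is None:
        return is_word_char(after)        # start of a non-empty string
    if after is None:
        return is_word_char(before)       # end of a non-empty string
    # Interior position: exactly one neighbour is a word character.
    return is_word_char(before) != is_word_char(after)


text = "hello world"
print([p for p in range(len(text) + 1) if at_word_boundary(text, p)])
```

Under this model, \B matches at exactly those offsets for which `at_word_boundary` returns false.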
There are two equivalent ways of writing a positive lookahead assertion:
(?=xyz)
(*positive_lookahead:xyz)
In both cases, the assertion matches at a particular position in the input string only if there is a substring starting at that position that matches the regular expression xyz.
As with all assertions, evaluation of the assertion does not cause the current position to advance.
For example, Chapter(?=\s+[1-9]) will match "Chapter" only if
followed by a number, with intervening whitespace.
A parenthesized expression within a lookahead assertion can capture a substring in the normal way. There are some minor complications, however:
Substrings captured while evaluating a lookahead assertion are represented
differently in the result of the
If an assertion is satisfied, then any substrings that are captured are based on the first evaluation of the assertion that matches; alternative evaluations of the assertion that also match, but which capture different substrings, are not considered.
A positive lookahead assertion that matches a zero-length string is permitted but pointless, since it will always match, and thus cause the assertion to succeed.
There are two equivalent ways of writing a negative lookahead assertion:
(?!xyz)
(*negative_lookahead:xyz)
In both cases, the assertion matches at a particular position in the input string only if there is no substring starting at that position that matches the regular expression xyz.
As with all assertions, evaluation of the assertion does not cause the current position to advance.
For example, Chapter(?!\s*[1-9]) will match "Chapter" only if
it is not followed by a number, with optional intervening whitespace.
Any capturing parentheses within a negative lookahead assertion are counted for the purpose of numbering captured groups, but they cannot capture any result because the pattern in the assertion must fail to match.
A negative lookahead assertion that matches a zero-length string is permitted but pointless, since it will always match, and thus cause the assertion to fail.
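The two lookahead examples can be tried out as follows. This is a sketch using Python's `re` module, which supports the `(?=...)` and `(?!...)` forms but not the `(*positive_lookahead:...)` and `(*negative_lookahead:...)` spellings; the sample input strings are arbitrary.

```python
import re

positive = re.compile(r"Chapter(?=\s+[1-9])")
negative = re.compile(r"Chapter(?!\s*[1-9])")

# Positive lookahead: the digit must be present, but is not consumed.
m = positive.search("Chapter 7")
print(m.group(), m.span())
print(positive.search("Chapter Seven"))

# Negative lookahead: succeeds only when no number follows.
print(negative.search("Chapter Seven").group())
print(negative.search("Chapter 7"))
print(negative.search("Chapter7"))
```

In each successful match the matched segment is just "Chapter": the text examined by the assertion is not part of the segment.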
There are two equivalent ways of writing a positive lookbehind assertion:
(?<=xyz)
(*positive_lookbehind:xyz)
The second form may be more convenient when the expression appears within an XML-based host language such as XSLT, where the angle bracket would need to be escaped.
In both cases, the assertion matches at a particular position in the input string only if there is a substring ending at that position that matches the regular expression xyz.
For efficiency and ease of implementation, the regular expression contained
within a lookbehind assertion is constrained. It must consist of one or more
alternatives separated by "|", and each alternative must be fixed-length,
consisting only of the following constructs, each of which matches a single character:
NormalChar (for example "A", "3")
SingleCharEsc (for example "\(", "\[")
charClassEsc (for example "\s", "\p{Lu}")
charClassExpr (for example "[a-z]")
WildcardEsc (".")
As with all assertions, evaluation of the assertion does not cause the current position to advance.
Parenthesized expressions cannot appear within lookbehind assertions.
For example, (?<=\[)[0-9]+(?=\]) matches a sequence of digits immediately
preceded by an opening square bracket and followed by a closing square bracket, without
matching the brackets.
There are two equivalent ways of writing a negative lookbehind assertion:
(?<!xyz)
(*negative_lookbehind:xyz)
The second form may be more convenient when the expression appears within an XML-based host language such as XSLT, where the angle bracket would need to be escaped.
In both cases, the assertion matches at a particular position in the input string only if there is no substring ending at that position that matches the regular expression xyz.
The regular expression within a negative lookbehind assertion is subject to the
same restrictions as for a positive lookbehind assertion: see
For example, (?<!\$)[0-9]+ matches any sequence of digits that
is not immediately preceded by a dollar sign.
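The lookbehind examples can be sketched in the same way. Python's `re` module is used as a stand-in (it supports `(?<=...)` and `(?<!...)` with fixed-length content, but not the `(*positive_lookbehind:...)` spelling), the digit sequence is written as `[0-9]+`, and the input strings are arbitrary.

```python
import re

# Digits enclosed in square brackets, without matching the brackets.
bracketed = re.findall(r"(?<=\[)[0-9]+(?=\])", "a[12] b[x] [345]")
print(bracketed)

# Digits not immediately preceded by a dollar sign.
not_dollar = re.findall(r"(?<!\$)[0-9]+", "$100 and 200")
print(not_dollar)
```

The second result includes "00" as well as "200": the assertion fails at the "1" that follows the dollar sign, but succeeds at the next position, where the preceding character is a digit. A pattern intended to skip the whole amount would need a stricter assertion, such as one that also excludes a preceding digit.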
All these functions provide an optional parameter, $flags,
to set options for the interpretation of the regular expression. The
parameter accepts an xs:string, in which individual letters
are used to set options. The presence of a letter within the string
indicates that the option is on; its absence indicates that the option
is off. Letters may appear in any order and may be repeated. They are case-sensitive. If there
are characters present that are not defined here as flags, then a dynamic error
is raised
The following options are defined:
s: If present, the match operates in “dot-all”
mode. (Perl calls this the single-line mode.) If the
s flag is not specified, the meta-character
. matches any character except a newline
(#x0A) or carriage return (#x0D)
character. In dot-all mode, the
meta-character . matches any character whatsoever.
Suppose the input contains the strings "hello" and
"world" on two lines.
This will not be matched by the regular expression
"hello.*world" unless dot-all mode is enabled.
m: If present, the match operates in multi-line
mode. By default, the meta-character ^ matches the
start of the entire string, while $ matches the end of the
entire string. In multi-line mode, ^ matches the
start of any line (that is, the start of the entire string, and
the position immediately after a newline character
other than a newline
that appears as the last character in the string), while
$ matches the end of any line
(that is, the position immediately
before a newline character, and the end of the entire string if there is no
newline character at the end of the string).
Newline here means the character #x0A only.
i: If present, the match operates in
case-insensitive mode. The detailed rules are as follows.
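In these
rules, a character C2 is considered to be a case-variant of another character C1 if the following XPath expression returns true when the two characters
are considered as strings of length one, and the
Unicode codepoint collation is used: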
fn:lower-case(C1) eq fn:lower-case(C2) or
fn:upper-case(C1) eq fn:upper-case(C2)
Note that the case-variants of a character under this definition are always single characters.
When a normal character (Char) is used as an atom,
it represents
the set containing that character and all its case-variants.
For example, the regular expression "z" will
match both "z" and "Z".
A character range (production charRange
in the XSD 1.0 grammar, replaced by productions charRange and singleChar
in XSD 1.1) represents the set
containing all the characters that it would match in the absence
of the i flag, together with their case-variants.
For example,
the regular expression "[A-Z]" will match all
the letters A to Z and all the letters
a to z. It will also match
certain other characters such as #x212A (KELVIN SIGN), since
fn:lower-case("#x212A") is k.
This rule applies also to a character range used in a character
class subtraction (charClassSub): thus [A-Z-[IO]] will match
characters such as A, B, a, and b, but will not match
I, O, i, or o.
The rule also applies to a character range used as part of a
negative character group: thus "[^Q]" will match every character
except Q and q (these being the only case-variants of Q in
Unicode).
A back-reference is compared using case-blind comparison:
that is, each character must either be the same as the
corresponding character of the previously matched string, or must
be a case-variant of that character. For example, the strings
"Mum", "mom", "Dad",
and "DUD" all match the regular
expression "([md])[aeiou]\1" when the i flag is used.
All other constructs are unaffected by the i flag.
For example,
"\p{Lu}" continues to match upper-case letters only.
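x: If present, whitespace characters in the regular expression are removed prior to matching, with one exception: whitespace characters within character class expressions
(that is, charClassExpr) are not removed. This flag can be used,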
for example, to break up long regular expressions into readable lines.
Examples:
fn:matches("helloworld", "hello world", "x") returns true()
fn:matches("helloworld", "hello[ ]world", "x") returns false()
fn:matches("hello world", "hello\ sworld", "x") returns true()
fn:matches("hello world", "hello world", "x") returns false()
Whitespace is treated as a lexical construct to be removed before the regular expression is parsed; it is therefore not explicit in the regular expression grammar.
q: if present, all characters in the regular expression
are treated as representing themselves, not as metacharacters. In effect, every
character that would normally have a special meaning in a regular expression is implicitly escaped
by preceding it with a backslash.
Furthermore, when this flag is present, the characters $ and
\ have no special significance when used in the replacement string
supplied to the
This flag can be used in conjunction with the i flag. If it is used
together with the m, s, x,
c
Examples:
tokenize("12.3.5.6", ".", "q") returns ("12", "3", "5", "6")
replace("a\b\c", "\", "\\", "q") returns "a\\b\\c"
replace("a/b/c", "/", "$", "q") returns "a$b$c"
matches("abcd", ".*", "q") returns false()
matches("Mr. B. Obama", "B. OBAMA", "iq") returns true()
c: if present, comments are enabled
in the regular expression. This flag has no effect if the q flag is
present. A comment is recognized by the presence of a # character that
is not escaped by a backslash or contained in a character class expression
(charClassExpr), and it is terminated by the following #
character or by the end of the regular expression string.
For example:
replace("03/24/2025", "(..#month#)/(..#day#)/(....#year#)", "$3-$1-$2", "c")
Comments are treated as a lexical construct to be removed before the regular expression is parsed; they are therefore not explicit in the regular expression grammar.
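The lexical treatment of whitespace (x flag) and comments (c flag) can be pictured as a preprocessing pass over the regular expression. The following Python sketch is one possible reading, not a definitive implementation: the function name `strip_lexical` is hypothetical, the whitespace set (tab, newline, carriage return, space) is an assumption, and the handling of a backslash separated from the next character by removed whitespace is chosen to agree with the "hello\ sworld" example above.

```python
import re

WHITESPACE = "\t\n\r "  # assumed set of whitespace characters


def strip_lexical(regex, flags=""):
    """Remove whitespace (x) and #...# comments (c) outside [...]."""
    out = []
    depth = 0          # > 0 while inside a character class expression
    escaped = False    # True just after an unescaped backslash
    in_comment = False
    for ch in regex:
        if in_comment:
            if ch == "#":
                in_comment = False   # closing # ends the comment
            continue
        if depth == 0 and "x" in flags and ch in WHITESPACE:
            continue  # dropped; `escaped` is kept, so "\ s" becomes "\s"
        if escaped:
            out.append(ch)
            escaped = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch == "#" and depth == 0 and "c" in flags:
            in_comment = True
            continue
        out.append(ch)
    return "".join(out)


print(strip_lexical("hello world", "x"))
print(strip_lexical("hello[ ]world", "x"))
print(strip_lexical("(..#month#)/(..#day#)/(....#year#)", "c"))
```

The stripped expression can then be handed to an ordinary regular expression engine. With Python's `re.sub`, whose replacement syntax uses `\3` where this specification uses `$3`, the date example rearranges "03/24/2025" into "2025-03-24".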
This section specifies functions that manipulate URI values, either as instances
of xs:anyURI or as strings.
This section specifies functions that parse strings as URIs, to identify their structure, and construct URI strings from their structured representation.
Some URI schemes are hierarchical and some are non-hierarchical.
Implementations must treat the following schemes as non-hierarchical:
jar, mailto, news, tag,
tel, and urn. Whether additional schemes
are known to be non-hierarchical
Both functions use a structured representation of a URI as defined in the next section.
The segmented forms of the path and query parameters provide convenient access to commonly used information.
The path, if there is one, is tokenized on “/” characters and
each segment is unescaped (as per the http://example.com/path/to/a%2fb.
The path portion has to be returned as /path/to/a%2fb because
decoding the %2f would change the nature of the path.
The unescaped form is easily accessible from path-segments:
Note that the presence or absence of a leading slash on the path will affect whether or not the sequence begins with a zero-length string.
The query parameters are decoded into a map. Consider the URI:
http://example.com/path?a=1&b=2%264&a=3.
The decoded form in the query-parameters is the following map:
Note that both keys and values are unescaped. If a key
is repeated in the query string, the map will contain a
sequence of values for that key, as seen for a
in this example.
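For comparison, the same two decompositions can be approximated with Python's standard `urllib.parse` module. This is an analogy only: the helper names are hypothetical, and Python's `parse_qs` always returns a list of values for each key, whereas the map described here holds a sequence only where a key is repeated.

```python
from urllib.parse import urlsplit, unquote, parse_qs


def path_segments(uri):
    """Tokenize the path on "/" and unescape each segment."""
    return [unquote(seg) for seg in urlsplit(uri).path.split("/")]


def query_parameters(uri):
    """Decode the query string into a mapping from keys to value lists."""
    return parse_qs(urlsplit(uri).query, keep_blank_values=True)


uri = "http://example.com/path/to/a%2fb"
print(urlsplit(uri).path)      # path is left escaped
print(path_segments(uri))      # leading "" reflects the leading slash

print(query_parameters("http://example.com/path?a=1&b=2%264&a=3"))
```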
Operators are defined on the following type:
xs:duration
and on the two defined subtypes (see
xs:yearMonthDuration
xs:dayTimeDuration
Arithmetic on durations is defined only on these subtypes: this is because the results of some operations (for example one month minus one day) have no representation in the value space.
Two xs:duration values may however be compared.
A value of type xs:duration is considered to comprise two parts:
The total number of months, represented as a signed integer.
The total number of seconds, represented as a signed decimal number.
If one of these values is negative (less than zero), the other must not be positive (greater than zero).
In effect this means that operations on durations (including equality comparison,
casting to string, and extraction of components)
all treat the duration as normalized. The duration PT1M30S (one minute and
thirty seconds), for example,
is precisely equivalent to the duration PT90S (ninety seconds); these are
different representations of the same value, and the result of any operation will be
the same regardless of which representation is used. For example, the function
The information content of an xs:duration
value can be reduced to an xs:integer number of months, and an xs:decimal
number of seconds. For the two defined subtypes this is further simplified so that one of these two
components is fixed at zero. Operations such as comparison of durations and arithmetic on durations
can be expressed in terms of numeric operations applied to these two components.
Two subtypes of xs:duration, namely xs:yearMonthDuration
and xs:dayTimeDuration, are defined in
The significance of these subtypes is that arithmetic and ordering become well defined; this is not the
case for xs:duration values in general, because of the variable number of days in a month. For this reason, many of the functions
and operators on durations require the arguments/operands to belong to these two subtypes.
In an xs:yearMonthDuration, the seconds component is always zero.
In an xs:dayTimeDuration, the months component is always zero.
All conforming processors
The total number of months can be represented as a signed xs:int value;
The total number of seconds can be represented as a signed xs:decimal
value with facets totalDigits=18 and fractionalDigits=3. That is,
durations must be supported to millisecond precision.
Processors
A processor that limits the range or precision of duration values
may encounter overflow and underflow conditions when it
tries to evaluate operations on durations. In
these situations, the processor
Similarly, a processor may be unable accurately to represent the result of dividing a duration
by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component
of duration values
Duration values may be compared using the
In this version of the specification, all xs:duration values
are mutually comparable: comparison operations are no longer restricted
to the two subtypes xs:yearMonthDuration and xs:dayTimeDuration.
However, although a total ordering is defined over all durations, the result
is not always meaningful: while it makes sense that P1Y1D (one year and a day)
is greater than P1Y (one year), it makes little sense that P32D
(thirty-two days) is less than P1M (one month).
Durations are treated as tuples with two components, the months and seconds components, and they are compared by treating the months as the primary key and the seconds as the secondary key.
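The two-component model makes comparison straightforward to sketch. The following Python fragment is a simplified illustration: the function name `duration_key` is hypothetical, and the lexical parser accepts some strings (such as a bare "P") that are not valid xs:duration literals. It reduces a duration to a (months, seconds) tuple, so that Python's ordinary tuple ordering compares months first and seconds second.

```python
import re
from decimal import Decimal

_DURATION = re.compile(
    r"(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?"
)


def duration_key(lexical):
    """Reduce a duration literal to (total months, total seconds)."""
    m = _DURATION.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a duration: {lexical!r}")
    sign, y, mo, d, h, mi, s = m.groups()
    months = int(y or 0) * 12 + int(mo or 0)
    seconds = (int(d or 0) * 86400 + int(h or 0) * 3600
               + int(mi or 0) * 60 + Decimal(s or "0"))
    if sign:
        months, seconds = -months, -seconds
    return (months, seconds)


# Different lexical forms, same value.
print(duration_key("PT1M30S") == duration_key("PT90S"))

# Months are compared first, so P32D sorts before P1M.
print(duration_key("P32D") < duration_key("P1M"))
print(duration_key("P1Y1D") > duration_key("P1Y"))
```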
The duration datatype may be considered to be a composite datatype
in that it contains distinct properties or components. The extraction functions specified
below extract a single component from a duration value.
For xs:duration and its subtypes, including the two subtypes xs:yearMonthDuration and
xs:dayTimeDuration, the components are normalized: this means that the seconds and minutes
components will always be less than 60, the hours component less than 24, and the months component less than 12.
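Normalization of components can be sketched as repeated division of the two underlying totals. The function below is illustrative only: its name and its dictionary result are assumptions made for this example, and it takes the total months (an integer) and total seconds (a decimal) as input rather than a lexical duration.

```python
from decimal import Decimal


def duration_components(months, seconds):
    """Split (months, seconds) into normalized components.

    The sign of the duration is applied to every component.
    """
    sign = -1 if (months < 0 or seconds < 0) else 1
    years, mon = divmod(abs(months), 12)
    days, rest = divmod(abs(Decimal(seconds)), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return {
        "years": sign * years,
        "months": sign * mon,
        "days": sign * int(days),
        "hours": sign * int(hours),
        "minutes": sign * int(minutes),
        "seconds": sign * secs,   # kept as a Decimal to avoid rounding
    }


# 14 months and 93784.5 seconds: 1 year 2 months, 1 day 2 h 3 min 4.5 s.
print(duration_components(14, Decimal("93784.5")))

# A negative duration of ninety seconds: minus one minute, minus thirty seconds.
print(duration_components(0, Decimal("-90")))
```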
This section describes the xs:dayTimeDuration value representing a decimal number of seconds.
For operators that combine a duration and a date/time value, see
This section defines operations on the
See
xs:dateTime, xs:date, xs:time, xs:gYearMonth,
xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay
are referred to collectively as the
This section describes operations on atomic items of these types.
Values of these types are modeled as comprising one or more of the seven components year, month, day, hour, minute, second, and timezone.
The only operations defined on
xs:gYearMonth, xs:gYear,
xs:gMonthDay, xs:gMonth, and xs:gDay values are
equality comparison and component extraction.
For other types, further operations are provided, including
order comparisons, arithmetic, formatted display, and timezone
adjustment.
All conforming processors
s.sss). However, processors
A processor that limits the number of digits in date and time datatype
representations may encounter overflow and underflow conditions when it
tries to execute the functions in
Similarly, a processor that limits the precision of the seconds component
of date and time or duration values may need to deliver a rounded result for arithmetic operations.
Such a processor
As defined in xs:dateTime,
xs:date, xs:time, xs:gYearMonth, xs:gYear,
xs:gMonthDay, xs:gMonth, xs:gDay values,
referred to collectively as date/time values, are represented as seven components or properties:
year, month, day, hour, minute,
second and timezone. The first five components are
xs:integer values. The value of the second component is an xs:decimal
and the value of the timezone component is an xs:dayTimeDuration.
For all the primitive date/time datatypes, the timezone property is optional and may or may not
be present. Depending on the datatype, some of the remaining six properties must be present and
some must be xs:dateTime values, this local value
For xs:time, 00:00:00 and 24:00:00 are alternate lexical forms
for the same value, whose canonical representation is 00:00:00. For xs:dateTime,
a time component 24:00:00 translates to 00:00:00 of the following day.
An xs:dateTime with lexical
representation 1999-05-31T05:00:00
is represented in the datamodel by { 1999, 5, 31, 5, 0, 0.0, () }.
An xs:dateTime with lexical
representation 1999-05-31T13:20:00-05:00
is represented by { 1999, 5, 31, 13, 20, 0.0, xs:dayTimeDuration("-PT5H") }.
An xs:dateTime with lexical
representation 1999-12-31T24:00:00
is represented by { 2000, 1, 1, 0, 0, 0.0, () }.
An xs:date with lexical
representation 2005-02-28+08:00
is represented by { 2005, 2, 28, (), (), (), xs:dayTimeDuration("PT8H") }.
An xs:time with lexical
representation 24:00:00
is represented by { (), (), (), 0, 0, 0, () }.
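The mapping from lexical form to seven components can be sketched for xs:dateTime as follows. This Python fragment is a simplified model under stated assumptions: the function name is hypothetical, only four-digit non-negative years are accepted, `None` stands for the empty sequence, and the timezone is represented as a `datetime.timedelta` rather than an xs:dayTimeDuration.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

_DATETIME = re.compile(
    r"(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d(?:\.\d+)?)"
    r"(Z|[+-]\d\d:\d\d)?"
)


def datetime_components(lexical):
    """Return (year, month, day, hour, minute, second, timezone)."""
    m = _DATETIME.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a dateTime: {lexical!r}")
    y, mo, d, h, mi, s, tz = m.groups()
    day = date(int(y), int(mo), int(d))
    hour, minute, second = int(h), int(mi), Decimal(s)
    if hour == 24:
        if minute or second:
            raise ValueError("only 24:00:00 is permitted")
        # 24:00:00 means 00:00:00 of the following day.
        hour = 0
        day += timedelta(days=1)
    if tz is None:
        offset = None
    elif tz == "Z":
        offset = timedelta(0)
    else:
        sign = -1 if tz[0] == "-" else 1
        offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
    return (day.year, day.month, day.day, hour, minute, second, offset)


print(datetime_components("1999-05-31T13:20:00-05:00"))
print(datetime_components("1999-12-31T24:00:00"))
```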
Date and time values can be compared using the function
An xs:dateTime can be considered to consist of seven components:
year, month, day, hour, minute,
second and timezone. For xs:dateTime six components (year,
month, day, hour, minute and second) are required
and timezone is optional. For other date/time values, of the first six components, some are required
and others must be absent. Timezone is always optional. For example, for xs:date,
the year, month and day components are required and hour,
minute and second components must be absent; for xs:time the hour,
minute and second components are required and year, month and
day are missing; for xs:gDay, day is required and year,
month, hour, minute and second are missing.
In explicitTimezone facet is available with values
optional, required, or prohibited to
enable the timezone to be defined as mandatory or disallowed.
Values of the date/time datatypes xs:time, xs:gMonthDay, xs:gMonth,
and xs:gDay, can be considered to represent a sequence of recurring time instants or time periods.
An xs:time occurs every day. An xs:gMonth occurs every year. Comparison operators
on these datatypes compare the starting instants of equivalent occurrences in the recurring series.
These xs:dateTime values are calculated as described below.
Comparison operators on xs:date, xs:gYearMonth and xs:gYear compare
their starting instants. These xs:dateTime values are calculated as described below.
The starting instant of an occurrence of a date/time value is an xs:dateTime
calculated by filling
in the missing components of the local value from a reference xs:dateTime. An example of a suitable
reference xs:dateTime is 1972-01-01T00:00:00. Then, for example, the starting
instant corresponding to the xs:date value 2009-03-12 is
2009-03-12T00:00:00; the starting instant corresponding to the xs:time value
13:30:02 is 1972-01-01T13:30:02; and the starting instant corresponding to the
gMonthDay value --02-29 is 1972-02-29T00:00:00 (which explains
why a leap year was chosen for the reference).
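The filling-in of missing components can be sketched in a few lines. The Python fragment below is illustrative: the function name and keyword parameters are hypothetical, fractional seconds and timezones are left out for brevity, and the reference value is the one suggested above.

```python
from datetime import datetime

# Reference xs:dateTime suggested in the text; 1972 is a leap year.
REFERENCE = datetime(1972, 1, 1, 0, 0, 0)


def starting_instant(year=None, month=None, day=None,
                     hour=None, minute=None, second=None):
    """Fill any missing components from the reference dateTime."""
    supplied = {"year": year, "month": month, "day": day,
                "hour": hour, "minute": minute, "second": second}
    present = {k: v for k, v in supplied.items() if v is not None}
    return REFERENCE.replace(**present)


# xs:date 2009-03-12
print(starting_instant(year=2009, month=3, day=12).isoformat())

# xs:time 13:30:02
print(starting_instant(hour=13, minute=30, second=2).isoformat())

# xs:gMonthDay --02-29 (valid only because the reference year is a leap year)
print(starting_instant(month=2, day=29).isoformat())
```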
In the previous version of this specification, the reference date/time chosen was
1972-12-31T00:00:00. While this gives the same results, it produces a "starting instant" for
a gMonth or gMonthDay that bears no
relation to the ordinary meaning of the term, and it also required special handling of short months.
The original choice was made to allow for leap seconds; but since leap seconds are not recognized
in date/time arithmetic, this is not actually necessary.
If the xs:time value written as
24:00:00 is to be compared, filling in the missing components gives 1972-01-01T00:00:00,
because 24:00:00 is an alternative representation of 00:00:00 (the lexical value
"24:00:00" is
converted to the time components { 0, 0, 0 } before the missing components are filled
in). This has the consequence that when ordering xs:time values,
24:00:00 is
considered to be earlier than 23:59:59. However, when ordering
xs:dateTime
values, a time component of 24:00:00 is considered equivalent to 00:00:00 on the
following day.
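The calculation of starting instants described above can be sketched informally in Python. The sketch is illustrative only and is not part of this specification: the function name starting_instant and its keyword parameters are hypothetical, timezones and fractional seconds are ignored, and the reference xs:dateTime is the value 1972-01-01T00:00:00 used in this section.

```python
from datetime import datetime, timedelta

# Reference xs:dateTime used in this section (it has no timezone).
REFERENCE = datetime(1972, 1, 1, 0, 0, 0)


def starting_instant(year=None, month=None, day=None,
                     hour=None, minute=None, second=None):
    """Fill the missing components of a local value from REFERENCE.

    Hypothetical helper: components that are absent are passed as None.
    """
    roll_over = False
    if (hour, minute, second) == (24, 0, 0):
        # 24:00:00 is converted to { 0, 0, 0 } before filling in.
        hour = 0
        # Only a value carrying a full date moves to the following day.
        roll_over = None not in (year, month, day)

    def pick(value, default):
        return default if value is None else value

    instant = datetime(
        pick(year, REFERENCE.year), pick(month, REFERENCE.month),
        pick(day, REFERENCE.day), pick(hour, REFERENCE.hour),
        pick(minute, REFERENCE.minute), pick(second, REFERENCE.second),
    )
    return instant + timedelta(days=1) if roll_over else instant


print(starting_instant(year=2009, month=3, day=12))    # xs:date 2009-03-12
print(starting_instant(hour=13, minute=30, second=2))  # xs:time 13:30:02
print(starting_instant(month=2, day=29))               # xs:gMonthDay --02-29
```

Under these assumptions the xs:time value 24:00:00 yields the same starting instant as 00:00:00, and therefore sorts before 23:59:59.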
Note that the reference xs:dateTime does not have a timezone. The timezone component
is never filled in from the reference xs:dateTime. In some cases, if the date/time value does not
have a timezone, the implicit timezone from the dynamic context is used as the timezone.
This specification uses the reference xs:dateTime 1972-01-01T00:00:00 in the description of the
comparison operators. Implementations may use other reference xs:dateTime values
as long as they yield the same results. The reference xs:dateTime used must meet the following
constraints: when it is used to supply components into xs:gMonthDay values, the year must allow
for February 29 and so must be a leap year; when it is used to supply missing components into xs:gDay
values, the month must allow for 31 days. Different reference xs:dateTime values may be used for
different operators.
The date and time datatypes may be considered to be composite datatypes in that they contain distinct properties or components. The extraction functions specified below extract a single component from a date or time value. In all cases the local value (that is, the original value as written, without any timezone adjustment) is used.
A time written as 24:00:00 is treated as 00:00:00 on the
following day.
These functions adjust the timezone component of an xs:dateTime, xs:date or
xs:time value. The $timezone argument to these functions is defined as an
xs:dayTimeDuration but must be a valid timezone value.
These functions support adding or subtracting a duration value to or from an
xs:dateTime, an xs:date or an xs:time
value. Appendix E of
A processor that limits the number of digits in date and time datatype
representations may encounter overflow and underflow conditions when it
tries to execute the functions in this section. In
these situations, the processor
The value spaces of the two totally ordered subtypes of
xs:duration are xs:integer months for xs:yearMonthDuration
and xs:decimal seconds for xs:dayTimeDuration. If
a processor limits the number of digits allowed in the representation of
xs:integer and xs:decimal then overflow and
underflow situations can arise when it tries to execute the functions in
Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar,
language, and country. The functions are presented in their customary fashion,
except for the rules and examples, which are described en bloc at
The functions format $value as a string using
the picture string specified by the $picture argument,
the calendar specified by the $calendar argument,
the language specified by the $language argument,
and the country or other place name specified by the $place argument.
The result of the function is the formatted string representation of the supplied
xs:dateTime, xs:date, or xs:time value.
If $value is the empty sequence, the function returns the empty sequence.
Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to the empty sequence.
For details of the $language, $calendar, and
$place arguments, see
In general, the use of an invalid $picture,
$language, $calendar, or
$place argument results in a dynamic error
The picture consists of a sequence of variable markers and literal substrings.
A substring enclosed in square brackets is interpreted as a variable marker; substrings
not enclosed in square brackets are taken as literal substrings.
The literal substrings are optional and if present are rendered unchanged, including any whitespace.
If an opening or closing square bracket
is required within a literal substring, it must be doubled.
A variable marker consists of a component specifier followed optionally by one or two presentation modifiers and/or optionally by a width modifier. Whitespace within a variable marker is ignored.
The variable marker may be separated into its components by applying the following rules:
The component specifier is always present and is always a single letter.
The width modifier may be recognized by the presence of a comma.
The substring between the component specifier and the comma (if present) or the end of the string (if there is no comma) contains the first and second presentation modifiers, both of which are optional. If this substring contains a single character, this is interpreted as the first presentation modifier. If it contains more than one character, the last character is examined: if it is valid as a second presentation modifier then it is treated as such, and the preceding part of the substring constitutes the first presentation modifier. Otherwise, the second presentation modifier is presumed absent and the whole substring is interpreted as the first presentation modifier.
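The rules above for separating a variable marker into its components can be sketched informally in Python. The sketch is illustrative only: the names VariableMarker and split_variable_marker are hypothetical, the marker is supplied without its enclosing square brackets, and two details are taken from later parts of this section, namely that the second presentation modifier is one of a, t, c, or o, and that when several commas are present the last one introduces the width modifier.

```python
import re
from typing import NamedTuple, Optional


class VariableMarker(NamedTuple):
    specifier: str
    first_modifier: Optional[str]
    second_modifier: Optional[str]
    width: Optional[str]


# Characters that are valid as a second presentation modifier.
SECOND_MODIFIERS = {"a", "t", "c", "o"}


def split_variable_marker(marker: str) -> VariableMarker:
    """Split the content of a variable marker (without the brackets)."""
    body = re.sub(r"\s+", "", marker)   # whitespace in a marker is ignored
    if not body:
        raise ValueError("empty variable marker")
    specifier, rest = body[0], body[1:]  # specifier is a single letter

    width = None
    comma = rest.rfind(",")              # the last comma introduces the width
    if comma >= 0:
        rest, width = rest[:comma], rest[comma + 1:]

    first = second = None
    if len(rest) == 1:
        first = rest
    elif len(rest) > 1:
        if rest[-1] in SECOND_MODIFIERS:
            first, second = rest[:-1], rest[-1]
        else:
            first = rest
    return VariableMarker(specifier, first, second, width)


print(split_variable_marker("D1o"))
print(split_variable_marker("Y9,999,*"))
```

For example, the marker content MN,*-3 is split into the component specifier M, the first presentation modifier N, and the width modifier *-3.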
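The component specifier indicates the component of the date or time that is required, and takes the following values: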
| Specifier | Meaning | Default Presentation Modifier |
|---|---|---|
| Y | year (absolute value) | 1 |
| M | month in year | 1 |
| D | day in month | 1 |
| d | day in year | 1 |
| F | day of week | n |
| W | week in year | 1 |
| w | week in month | 1 |
| H | hour in day (24 hours) | 1 |
| h | hour in half-day (12 hours) | 1 |
| P | am/pm marker | n |
| m | minute in hour | 01 |
| s | second in minute | 01 |
| f | fractional seconds | 1 |
| Z | timezone | 01:01 |
| z | timezone (Same as Z, but modified where appropriate to include a prefix as a time offset using GMT, for example GMT+1 or GMT-05:00. For this component there is a fixed prefix of GMT, or a localized variation thereof for the chosen language, and the remainder of the value is formatted as for specifier Z.) | 01:01 |
| C | calendar: the name or abbreviation of a calendar name | n |
| E | era: the name of a baseline for the numbering of years, for example the reign of a monarch | n |
A dynamic error is reported
A dynamic error is reported $value,
for example if the picture supplied to the
It is not an error to include a timezone component when the supplied value has no timezone. In these circumstances the timezone component will be ignored.
The first presentation modifier is either
any format token permitted as a primary format token in the second argument
of the format-integer function (for example 1, 01, i, I, w, W,
or Ww), or
the format token n, N,
or Nn, indicating that the value of the component is to be output by name,
in lower-case, upper-case, or title-case respectively. Components that can be output by name
include (but are not limited to) months, days of the week, timezones, and eras.
If the processor cannot output these components by name for the chosen calendar and language
then it must use an
If a comma is to be used as a grouping separator within the format token, then there must be a width
specifier. More specifically: if a variable marker
contains one or more commas, then the last comma is treated as introducing the width modifier, and all others
are treated as grouping separators. So [Y9,999,*] will output the year as 2,008.
It is not possible to use a closing square bracket as a grouping separator within the format token.
If the implementation does not support the use of the requested format token, it
If the first presentation modifier is present, then it may optionally be followed by a second presentation modifier as follows:
| Modifier | Meaning |
|---|---|
| either a or t | indicates alphabetic or traditional numbering respectively, the default being |
| either c or o | indicates cardinal or ordinal numbering respectively, for example 7 or seven for a cardinal number, or 7th, seventh, or 7º for an ordinal number. This has the same meaning as in the second argument of |
Although the formatting rules are expressed in terms of the rules
for format tokens in primo) for the first day of the month, and cardinal numbers
(due, tre, quattro ...) for the remaining days. A processor may therefore use
this convention to number days of the month, ignoring the presence or absence of the ordinal
presentation modifier.
Whether or not a presentation modifier is included, a width modifier may be supplied. This indicates the number of characters to be included in the representation of the value.
The width modifier, if present, is introduced by a comma. It takes the form:
"," min-width ("-" max-width)?
where min-width is either an unsigned integer indicating the minimum number of characters to
be output, or * indicating that there is no explicit minimum, and
max-width is either an unsigned integer indicating the maximum number of characters to
be output, or * indicating that there is no explicit maximum; if max-width
is omitted then * is assumed.
A dynamic error is reported if min-width is present and less than one, or if
max-width is present and less than one or less than min-width.
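The syntax and constraints of the width modifier can be sketched informally in Python. The function name parse_width is hypothetical, the argument is the text following the comma, None is used to represent * (no explicit bound), and a Python ValueError stands in for the dynamic error.

```python
from typing import Optional, Tuple


def parse_width(width: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (min_width, max_width) for the text after the comma."""

    def bound(text: str) -> Optional[int]:
        if text == "*":
            return None                  # no explicit bound
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"invalid width: {text!r}")
        return int(text)

    low, dash, high = width.partition("-")
    minimum = bound(low)
    maximum = bound(high) if dash else None   # omitted max-width means *

    if minimum is not None and minimum < 1:
        raise ValueError("min-width is less than one")
    if maximum is not None:
        if maximum < 1:
            raise ValueError("max-width is less than one")
        if minimum is not None and maximum < minimum:
            raise ValueError("max-width is less than min-width")
    return minimum, maximum


print(parse_width("*-3"))   # (None, 3)
print(parse_width("6-6"))   # (6, 6)
print(parse_width("2"))     # (2, None)
```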
A format token containing more than one digit, such as 001 or 9999, sets the
minimum and maximum width to the number of digits appearing in the format token; if a width
modifier is also present, then the width modifier takes precedence.
The rules in this section apply to the majority of integer-valued components: specifically M D d F W w H h m s.
In the rules below, the term
If the first presentation modifier takes the form of a decimal digit pattern:
If there is no width modifier, then the value is formatted according to
the rules of the format-integer function.
If there is a width modifier, then the first presentation modifier is adjusted as follows:
If the decimal digit pattern includes a grouping separator, the output is implementation-defined (but this is not an error).
Use of a width modifier together with grouping separators is inadvisable for this reason. It is never necessary to use a width modifier with a decimal digit pattern, since the same effect can be achieved by use of optional digit signs.
Otherwise, the number of mandatory-digit-sign characters in the presentation modifier is increased if necessary. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the right, and then prepending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier if there are any, or ASCII digits otherwise.
The maximum width, if specified, is ignored.
The output is then as defined using the format-integer function with this adjusted decimal digit pattern.
If the first presentation modifier is one of N, n, or Nn:
Let FN be the full name of the component, that is, the form of the name that would be used in the absence of any width modifier.
If FN is shorter than the minimum width, then it is padded by appending spaces to the end of the name.
If FN is longer than the maximum width, then it is abbreviated, either by choosing a conventional abbreviation that fits within the maximum width (for example, “Wednesday” might be abbreviated to “Weds”), or by removing characters from the end of FN until it fits within the maximum width.
For other presentation modifiers:
Any adjustment of the value to fit within the requested width range is implementation-defined.
The value should not be truncated if this results in output that will not be meaningful to users (for example, there is no sensible way to truncate Roman numerals).
If shorter than the minimum width, the value should be padded to the minimum width, either by appending spaces, or in some other way appropriate to the numbering scheme.
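Two of the rules above can be sketched informally in Python: the adjustment of a decimal digit pattern to a minimum width, and the padding or abbreviation of a component output by name. The function names widen_digit_pattern and fit_name are hypothetical. The sketch assumes ASCII digits, uses # as the optional-digit-sign and 0 for any mandatory-digit-sign that it adds, does not handle grouping separators, and abbreviates names by removing characters from the end rather than by choosing a conventional abbreviation.

```python
ASCII_DIGITS = "0123456789"


def widen_digit_pattern(pattern: str, min_width: int) -> str:
    """Increase the number of mandatory digit signs to min_width."""
    chars = list(pattern)
    mandatory = sum(c in ASCII_DIGITS for c in chars)
    # First replace optional digit signs, starting from the right.
    for i in range(len(chars) - 1, -1, -1):
        if mandatory >= min_width:
            break
        if chars[i] == "#":
            chars[i] = "0"
            mandatory += 1
    # Then prepend mandatory digit signs until the minimum is reached.
    if mandatory < min_width:
        chars = ["0"] * (min_width - mandatory) + chars
    return "".join(chars)


def fit_name(full_name: str, min_width=None, max_width=None) -> str:
    """Pad or abbreviate the full name FN of a component."""
    if max_width is not None and len(full_name) > max_width:
        return full_name[:max_width]        # remove characters from the end
    if min_width is not None and len(full_name) < min_width:
        return full_name.ljust(min_width)   # pad by appending spaces
    return full_name


print(widen_digit_pattern("1", 2))     # 01
print(fit_name("DECEMBER", None, 3))   # DEC
```

With these assumptions, a marker such as [MN,*-3] applied to December produces DEC, and [D1,2] is treated in the same way as [D01].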
The rules for the year component (Y) are the same as those in
If the width modifier is present and defines a finite maximum width, then N is that maximum width.
Otherwise, if the first presentation modifier takes the form of a decimal-digit-pattern, then:
Let W be the number of optional-digit-signs and mandatory-digit-signs in that decimal-digit-pattern.
If W is 2 or more, then N is W.
Otherwise, N is infinity (that is, the year is output in full).
The output for the fractional seconds component (f) is equivalent to the result of the following algorithm:
If the first presentation modifier contains no Unicode digit, then the output is implementation-defined.
Otherwise, the value of the fractional seconds is output as follows:
If there is no width modifier and the first presentation modifier comprises in its
entirety a single mandatory-digit-sign (for example the default 1), then
the presentation modifier is extended on the right with as many optional-digit-signs as
are needed to accommodate the actual fractional seconds precision encountered in the
value to be formatted.
If there is a width modifier, then the first presentation modifier is adjusted as follows:
If a minimum width is specified, and if this exceeds the number of mandatory-digit-sign characters in the first presentation modifier, then the first presentation modifier is adjusted. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the left, and then appending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier.
If a maximum width is specified, the first presentation modifier is extended on the right with as many optional-digit-signs as are needed to ensure that the number of mandatory-digit-signs and optional-digit-signs is at least equal to the maximum width.
The sequence of characters in the (adjusted) first presentation modifier is reversed (for example,
999'### becomes ###'999).
If the result is not a valid
The sequence of digits in the conventional decimal representation of the fractional seconds component
is reversed, with insignificant zeroes removed, and the result is treated as an integer. For example, if the
seconds value is 25.8235, the reversed fractional seconds value is 5328.
The reversed fractional seconds value is formatted using the reversed decimal digit pattern according to the
rules of the format-integer function. In our example, the result is 5'328.
The resulting string is reversed. In our example, the result is 823'5.
If the result contains more digits than the number of mandatory-digit-signs and optional-digit-signs in the decimal digit pattern, then excess digits are removed from the right hand end (that is, the value is truncated towards zero rather than being rounded). Any grouping separator that immediately precedes a removed digit is also removed.
The reason for presenting the algorithm in this way is that it enables maximum reuse of the rules defined for
A format token consisting of a single digit,
such as 1, does not constrain the number of digits in the output.
In the case of fractional seconds in particular, [f001] requests three decimal digits,
[f01] requests two digits, but [f1] will retain all digits in the
supplied date/time value (the maximum number of digits is implementation-defined).
If exactly one digit is required, this can be achieved using the component specifier
[f1,1-1].
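The effect of the rules for fractional seconds can be sketched informally in Python for the simple case of a pattern with no grouping separators. The sketch does not perform the reversal steps literally; under that restriction it pads on the right to the number of mandatory-digit-signs and truncates to the total number of digit signs, which is assumed here to give the same result. The function name format_fraction is hypothetical, the fractional seconds are supplied as a string of digits, # is used as the optional-digit-sign, and the width is given as a (minimum, maximum) pair in which None represents *.

```python
from typing import Optional, Tuple

ASCII_DIGITS = "0123456789"


def format_fraction(
    fraction_digits: str,
    pattern: str = "1",
    width: Optional[Tuple[Optional[int], Optional[int]]] = None,
) -> str:
    """Format the digits after the decimal point of the seconds value."""
    mandatory = sum(c in ASCII_DIGITS for c in pattern)
    optional = pattern.count("#")
    significant = fraction_digits.rstrip("0")   # drop insignificant zeroes

    if width is None:
        if len(pattern) == 1 and mandatory == 1:
            # A single digit does not constrain the number of digits.
            optional = max(0, len(significant) - 1)
    else:
        minimum, maximum = width
        if minimum is not None and minimum > mandatory:
            # Optional signs become mandatory; more are appended if needed.
            optional -= min(optional, minimum - mandatory)
            mandatory = minimum
        if maximum is not None and mandatory + optional < maximum:
            optional = maximum - mandatory

    digits = significant.ljust(mandatory, "0")  # pad to the mandatory digits
    return digits[:mandatory + optional]        # truncate, do not round


print(format_fraction("762", "001"))        # 762
print(format_fraction("8235", "01"))        # 82
print(format_fraction("8235"))              # 8235
print(format_fraction("8235", "1", (1, 1))) # 8
```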
Special rules apply to the formatting of timezones. When the component specifiers Z
or z are used, the rules in this section override any rules given elsewhere in the case of
discrepancies.
If the date/time value to be formatted does not include a timezone offset, then the timezone component
specifier is generally ignored (results in no output). The exception is where military timezones are used
(format ZZ) in which case the string "J" is output, indicating local time.
When the component specifier is z, the output is the same as for component specifier
Z, except that it is prefixed by the characters GMT or some localized
equivalent. The prefix is omitted, however, in cases where the timezone is identified by name rather than by
a numeric offset from UTC.
If the first presentation modifier is numeric and comprises one or two digits
with no grouping-separator (for example 1
or 01), then the timezone is formatted as a displacement from UTC in hours, preceded by a plus or minus
sign: for example -5 or +03. If the actual timezone offset is not an integral number of hours,
then the minutes part of the offset is appended, separated by a colon: for example +10:30 or
-1:15.
If the first presentation modifier is numeric with a grouping-separator (for example 1:01
or 01.01), then the timezone offset is output in hours and minutes, separated by the grouping separator,
even if the number of minutes is zero: for example +5:00 or +10.30.
If the first presentation modifier is numeric and comprises three or four digits with no
grouping-separator, for example 001 or 0001, then the timezone offset
is shown in hours and minutes with no separator, for example -0500 or +1030.
If the first presentation modifier is numeric, in any of the above formats, and the second
presentation modifier is t, then a zero timezone offset (that is, UTC) is output as Z instead
of a signed numeric value. If this presentation modifier is absent or if the timezone offset is non-zero,
then the displayed timezone offset is preceded by a - sign for negative offsets
or a + sign for non-negative offsets.
If the first presentation modifier is Z, then the timezone is formatted
as a military timezone letter, using the convention Z = +00:00, A = +01:00, B = +02:00, ..., M = +12:00,
N = -01:00, O = -02:00, ... Y = -12:00. The letter J (meaning local time) is used in the case of a
value that does not specify a timezone offset. Timezone offsets that have no representation in this system
(for example Indian Standard Time, +05:30) are output as if the format 01:01 had been requested.
If the first presentation modifier is N, then the timezone is output
(where possible) as a timezone name, for example EST or CET. The same timezone
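offset has different names in different places; it is therefore recommended that this option be used only if a country code or IANA timezone name is supplied in the $place argument.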
In the absence of this information, the implementation may apply a default, for example by using the timezone
names that are conventional in North America. If no timezone name can be identified, the timezone offset is
output using the fallback format 01:01.
The following examples illustrate options for timezone formatting. The five right-hand columns show the output for each timezone offset (with time = 12:00:00).
| Variable marker | $place | -10:00 | -05:00 | +00:00 | +05:30 | +13:00 |
|---|---|---|---|---|---|---|
| [Z] | () | -10:00 | -05:00 | +00:00 | +05:30 | +13:00 |
| [Z0] | () | -10 | -5 | +0 | +5:30 | +13 |
| [Z0:00] | () | -10:00 | -5:00 | +0:00 | +5:30 | +13:00 |
| [Z00:00] | () | -10:00 | -05:00 | +00:00 | +05:30 | +13:00 |
| [Z0000] | () | -1000 | -0500 | +0000 | +0530 | +1300 |
| [Z00:00t] | () | -10:00 | -05:00 | Z | +05:30 | +13:00 |
| [z] | () | GMT‑10:00 | GMT‑05:00 | GMT+00:00 | GMT+05:30 | GMT+13:00 |
| [ZZ] | () | W | R | Z | +05:30 | +13:00 |
| [ZN] | "us" | HST | EST | GMT | IST | +13:00 |
| [H00]:[m00] [ZN] | "America/New_York" | 17:00 EST | 12:00 EST | 07:00 EST | 01:30 EST | 18:00 EST |
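The numeric and military timezone formats in the table above can be reproduced by the following informal Python sketch. The function names are hypothetical, the timezone offset is supplied as a whole number of minutes (or None where the value has no timezone), and timezone names, the z prefix, the t modifier, and width modifiers are not modelled.

```python
# Military letters: A..M (omitting J) for +1..+12, N..Y for -1..-12.
MILITARY_POSITIVE = "ABCDEFGHIKLM"
MILITARY_NEGATIVE = "NOPQRSTUVWXY"


def split_offset(offset_minutes: int):
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return sign, hours, minutes


def format_hours(offset_minutes: int, digits: int = 1) -> str:
    """One or two digits, no separator: minutes appear only if non-zero."""
    sign, hours, minutes = split_offset(offset_minutes)
    text = f"{sign}{hours:0{digits}d}"
    return text + (f":{minutes:02d}" if minutes else "")


def format_hours_minutes(offset_minutes: int, hour_digits: int = 2,
                         separator: str = ":") -> str:
    """Hours and minutes are always shown, joined by the separator."""
    sign, hours, minutes = split_offset(offset_minutes)
    return f"{sign}{hours:0{hour_digits}d}{separator}{minutes:02d}"


def format_military(offset_minutes) -> str:
    if offset_minutes is None:
        return "J"                       # local time, no timezone offset
    if offset_minutes == 0:
        return "Z"
    sign, hours, minutes = split_offset(offset_minutes)
    if minutes == 0 and 1 <= hours <= 12:
        letters = MILITARY_NEGATIVE if sign == "-" else MILITARY_POSITIVE
        return letters[hours - 1]
    return format_hours_minutes(offset_minutes)   # fallback format 01:01


offsets = [-600, -300, 0, 330, 780]
print([format_hours(o) for o in offsets])                  # as for [Z0]
print([format_hours_minutes(o, 2, "") for o in offsets])   # as for [Z0000]
print([format_military(o) for o in offsets])               # as for [ZZ]
```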
If a width specifier is present when formatting a timezone, then the representation as defined in this section is padded to the minimum
width as described in
This section applies to the remaining components: P (am/pm marker), C (calendar),
and E (era).
The output for these components is entirely implementation-defined. The default presentation modifier for these components is n, indicating that they are output as names (or
conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see
The set of languages, calendars, and places that are supported in the
If the fallback representation uses a different calendar from that requested,
the output string [Calendar: X] (where X is the calendar actually used),
localized as appropriate to the
requested language. If the fallback representation uses a different language
from that requested, the output string [Language: Y] (where Y is the language
actually used) localized in an
implementation-dependent way. If a particular component of the value cannot be output in
the requested format, it
The $language argument specifies the language to be used for the result string
of the function. The value of the argument xml:lang attribute (see
If the $language
argument is omitted or is set to the empty sequence, or if it is set to an invalid value or a
value that the implementation does not recognize,
then the processor uses the default language defined in the dynamic context.
The language is used to select the appropriate language-dependent forms of:
names (for example, of months)
numbers expressed as words or as ordinals (twenty, 20th, twentieth)
hour convention (0-23 vs 1-24, 0-11 vs 1-12)
first day of week, first week of year
Where appropriate this choice may also take into account the value of the
$place argument, though this language
argument.
The choice of the names and abbreviations used in any given language is
implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German,
one implementation might represent Saturday as Samstag while another
uses Sonnabend. Implementations
Where ordinal numbers are used, the selection of the correct representation of the
ordinal (for example, the grammatical gender)
The calendar attribute specifies that the dateTime, date,
or time supplied in the $value argument
The calendar value if present EQName
(dynamic error: QName then it is expanded into an expanded QName
using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace.
If the expanded QName is in no namespace,
then it
If the $calendar argument is omitted or is set to the empty sequence
then the default calendar defined in the dynamic context is used.
The calendars listed below were known to be in use during the last hundred years. Many other calendars have been used in the past.
This specification does not define any of these calendars, nor the way that they
map to the value space of the xs:date datatype in $place and/or $language arguments, with the
$place
argument taking precedence.
Information about some of these calendars, and algorithms for converting between them, may
be found in
| Designator | Calendar |
|---|---|
| AD | Anno Domini (Christian Era) |
| AH | Anno Hegirae (Islamic Era) |
| AME | Mauludi Era (solar years since Muhammad’s birth) |
| AM | Anno Mundi (Jewish Calendar) |
| AP | Anno Persici |
| AS | Aji Saka Era (Java) |
| BE | Buddhist Era |
| CB | Cooch Behar Era |
| CE | Common Era |
| CL | Chinese Lunar Era |
| CS | Chula Sakarat Era |
| EE | Ethiopian Era |
| FE | Fasli Era |
| ISO | ISO 8601 calendar |
| JE | Japanese Calendar |
| KE | Khalsa Era (Sikh calendar) |
| KY | Kali Yuga |
| ME | Malabar Era |
| MS | Monarchic Solar Era |
| NS | Nepal Samwat Era |
| OS | Old Style (Julian Calendar) |
| RS | Rattanakosin (Bangkok) Era |
| SE | Saka Era |
| SH | Solar Hijri (Islamic Era, used in Iran and Afghanistan) |
| SS | Saka Samvat |
| TE | Tripurabda Era |
| VE | Vikrama Era |
| VS | Vikrama Samvat Era |
At least one of the above calendars
The ISO 8601 calendar, designated ISO,
is very similar to the Gregorian calendar designated AD, but it
differs in several ways. The ISO calendar
is intended to ensure that date and time formats can be read
easily by other software, as well as being legible for human
users. The ISO calendar
prescribes the use of particular numbering conventions as defined in
ISO 8601, rather than allowing these to be localized on a per-language basis.
In particular it
provides a numeric “week date” format which identifies dates by
year, week of the year, and day in the week;
in the ISO calendar the days of the week are numbered from 1 (Monday) to 7 (Sunday), and
week 1 in any calendar year is the week (from Monday to Sunday) that includes the first Thursday
of that year. The numeric values of the components year, month, day, hour, minute, and second
are the same in the ISO calendar as the values used in the lexical representation of the date and
time as defined in XML Schema. The era (E component)
with this calendar is either a minus sign (for negative years) or a zero-length string (for positive years).
For dates before 1 January, AD 1, year numbers in
the ISO and AD calendars are off by one from each other: ISO year
0000 is 1 BC, -0001 is 2 BC, etc.
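The off-by-one relationship between ISO year numbers and the AD calendar can be illustrated by a short Python sketch. The function name iso_year_to_ad_bc and the form of the returned string are hypothetical; only the arithmetic is taken from the text above.

```python
def iso_year_to_ad_bc(iso_year: int) -> str:
    """Map an ISO 8601 year number to a year in the AD calendar."""
    if iso_year > 0:
        return f"AD {iso_year}"
    # ISO year 0 is 1 BC, ISO year -1 is 2 BC, and so on.
    return f"{1 - iso_year} BC"


print(iso_year_to_ad_bc(0))     # 1 BC
print(iso_year_to_ad_bc(-1))    # 2 BC
print(iso_year_to_ad_bc(2002))  # AD 2002
```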
ISO 8601 does not define a numbering for weeks within a month. When the w
component is used, the convention to be adopted is that each Monday-to-Sunday week is considered to
fall within a particular month if its Thursday occurs in that month; the weeks that fall in a particular
month under this definition are numbered starting from 1. Thus, for example,
29 January 2013 falls in week 5 because the Thursday of the week (31 January 2013) is the fifth Thursday
in January, and 1 February 2013 is also in week 5 for the same reason.
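The convention for numbering weeks within a month can be sketched informally in Python. The function name iso_week_in_month is hypothetical; it returns the year and month to which the week belongs together with the week number, found by locating the Thursday of the Monday-to-Sunday week that contains the date.

```python
from datetime import date, timedelta


def iso_week_in_month(d: date):
    """Return (year, month, week) for the week containing d."""
    # isoweekday() numbers the days from 1 (Monday) to 7 (Sunday),
    # so the Thursday of the same week is 4 - isoweekday() days away.
    thursday = d + timedelta(days=4 - d.isoweekday())
    # The n-th Thursday of a month falls on days 7n-6 to 7n.
    week = (thursday.day - 1) // 7 + 1
    return thursday.year, thursday.month, week


print(iso_week_in_month(date(2013, 1, 29)))  # (2013, 1, 5)
print(iso_week_in_month(date(2013, 2, 1)))   # (2013, 1, 5)
```

Both dates in the example above are therefore reported as falling in week 5 of January 2013.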
The value space of the date and time datatypes, as defined in XML Schema, is based on
absolute points in time. The lexical space of these datatypes defines a
representation of these absolute points in time using the proleptic Gregorian calendar,
that is, the modern Western calendar extrapolated into the past and the future; but the value space
is calendar-neutral. For example, the xs:date value
1502-01-11
(the day on which Pope Gregory XIII was born) might be
formatted using the Old Style (Julian) calendar as 1 January 1502. This reflects the fact
that there was at that time a ten-day difference between the two calendars. It would be
incorrect, and would produce incorrect results, to represent this date in an element or attribute
of type xs:date as 1502-01-01, even though this might reflect the way
the date was recorded in contemporary documents.
When referring to years occurring in antiquity, modern historians generally
use a numbering system in which there is no year zero (the year before 1 CE
is thus 1 BCE). This is the convention that xs:date and xs:dateTime
does not include a year zero: however, XSD 1.1 endorses the ISO 8601 convention. This means that the date on
which Julius Caesar was assassinated has the ISO 8601 lexical representation
-0043-03-13, but will be formatted as 15 March 44 BCE in the Julian calendar
or 13 March 44 BCE in the Gregorian calendar (dependent on the chosen
localization of the names of months and eras).
The intended use of the $place argument is to identify
the place where an event
represented by the dateTime, date,
or time supplied in the $value argument took place or will take place.
If the $place argument is omitted or is set
to the empty sequence, then the default place defined in the dynamic context is used.
If the value is supplied, and is not the empty sequence, then it
Country codes are defined in "de" for Germany
and "jp" for Japan. Implementations
IANA timezone names are defined in the IANA timezone database. Examples are "America/New_York" and "Europe/Rome".
This argument is not intended to identify the location of the user
for whom the date or time is being formatted;
that should be done by means of the $language argument.
This information
The geographical area identified by a country code is defined by the boundaries as they existed at the time of the date to be formatted, or the present-day boundaries for dates in the future.
If the $place argument is supplied in the form
of an IANA timezone name that is recognized by the implementation, then the date or
time being formatted is adjusted to the timezone offset applicable in that timezone.
For example, if the xs:dateTime value 2010-02-15T12:00:00Z
is formatted with the $place argument set to
America/New_York, then the output will be as if the value
2010-02-15T07:00:00-05:00 had been supplied. This adjustment takes daylight
savings time into account where possible; if the date in question falls during
daylight savings time in New York, then it is adjusted to timezone offset -PT4H
rather than -PT5H. Adjustment using daylight savings time is only possible
where the value includes a date, and where the date is within the range covered
by the timezone database.
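The adjustment described above can be illustrated by a short Python sketch. The function name adjust_to_offset is hypothetical, and the timezone offset is supplied by hand as an assumption: an implementation would obtain the offset applicable on the date in question (-PT5H, or -PT4H during daylight savings time) from the timezone database, a step that is not shown here.

```python
from datetime import datetime, timedelta, timezone


def adjust_to_offset(value: datetime, offset: timedelta) -> datetime:
    """Express the same instant using the given timezone offset."""
    return value.astimezone(timezone(offset))


# 2010-02-15T12:00:00Z, with the winter offset for America/New_York
# assumed to be -05:00.
utc_value = datetime(2010, 2, 15, 12, 0, 0, tzinfo=timezone.utc)
adjusted = adjust_to_offset(utc_value, timedelta(hours=-5))
print(adjusted.isoformat())  # 2010-02-15T07:00:00-05:00
```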
The following examples show a selection of dates and times and the way they might be formatted. These examples assume the use of the Gregorian calendar as the default calendar.
| Required Output | Expression |
|---|---|
| 2002-12-31 | format-date($d, "[Y0001]-[M01]-[D01]") |
| 12-31-2002 | format-date($d, "[M]-[D]-[Y]") |
| 31-12-2002 | format-date($d, "[D]-[M]-[Y]") |
| 31 XII 2002 | format-date($d, "[D1] [MI] [Y]") |
| 31st December, 2002 | format-date($d, "[D1o] [MNn], [Y]", "en", (), ()) |
| 31 DEC 2002 | format-date($d, "[D01] [MN,*-3] [Y0001]", "en", (), ()) |
| December 31, 2002 | format-date($d, "[MNn] [D], [Y]", "en", (), ()) |
| 31 Dezember, 2002 | format-date($d, "[D] [MNn], [Y]", "de", (), ()) |
| Tisdag 31 December 2002 | format-date($d, "[FNn] [D] [MNn] [Y]", "sv", (), ()) |
| [2002-12-31] | format-date($d, "[[[Y0001]-[M01]-[D01]]]") |
| Two Thousand and Three | format-date($d, "[YWw]", "en", (), ()) |
| einunddreißigste Dezember | format-date($d, "[Dwo] [MNn]", "de", (), ()) |
| 3:58 PM | format-time($t, "[h]:[m01] [PN]", "en", (), ()) |
| 3:58:45 pm | format-time($t, "[h]:[m01]:[s01] [Pn]", "en", (), ()) |
| 3:58:45 PM PDT | format-time($t, "[h]:[m01]:[s01] [PN] [ZN,*-3]", "en", (), ()) |
| 3:58:45 o'clock PM PDT | format-time($t, "[h]:[m01]:[s01] o'clock [PN] [ZN,*-3]", "en", (), ()) |
| 15:58 | format-time($t, "[H01]:[m01]") |
| 15:58:45.762 | format-time($t, "[H01]:[m01]:[s01].[f001]") |
| 15:58:45 GMT+02:00 | format-time($t, "[H01]:[m01]:[s01] [z,6-6]", "en", (), ()) |
| 15.58 Uhr GMT+2 | format-time($t, "[H01]:[m01] Uhr [z]", "de", (), ()) |
| 3.58pm on Tuesday, 31st December | format-dateTime($dt, "[h].[m01][Pn] on [FNn], [D1o] [MNn]") |
| 12/31/2002 at 15:58:45 | format-dateTime($dt, "[M01]/[D01]/[Y0001] at [H01]:[m01]:[s01]") |
The following examples use calendars other than the Gregorian calendar.
| Description | Request | Result |
|---|---|---|
| Islamic | format-date($d, "[D١] [Mn] [Y١]", "ar", "AH", ()) | ٢٦ ﺸﻭّﺍﻝ ١٤٢٣ |
| Jewish (with Western numbering) | format-date($d, "[D] [Mn] [Y]", "he", "AM", ()) | 26 טבת 5763 |
| Jewish (with traditional numbering) | format-date($d, "[Dאt] [Mn] [Yאt]", "he", "AM", ()) | כ״ו טבת תשס״ג |
| Julian (Old Style) | format-date($d, "[D] [MNn] [Y]", "en", "OS", ()) | 18 December 2002 |
| Thai | format-date($d, "[D๑] [Mn] [Y๑]", "th", "BE", ()) | ๓๑ ธันวาคม ๒๕๔๕ |
A function is provided to parse dates and times expressed using syntax that is commonly encountered in internet protocols.
In XPath 4.0, statically-known QNames can be expressed using a QName literal such as
#xml:space. Where the QName is not known statically,
the xs:QName constructor function can be used.
In addition to the xs:QName constructor function, QName values can
be constructed by combining a namespace URI, prefix, and local name, or by resolving
a lexical QName against the in-scope namespaces of an element node. This section
defines functions that perform these operations.
Leading and trailing whitespace, if present, is stripped from
string arguments before the result is constructed.
This section specifies functions on QNames as defined in
There are no functions designed explicitly to process xs:NOTATION items.
However, some generic functions such as fn:compare can be used on xs:NOTATION items.
Binary data is represented using the data types xs:hexBinary and
xs:base64Binary. Both types have the same value space: a sequence of octets,
which can be considered as integers in the range 0 to 255.
The coercion rules of XPath 4.0 ensure that the two types are interoperable; a function
that declares an argument of type xs:hexBinary will always accept a value of
type xs:base64Binary, and vice versa.
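The statement that both types share a single value space can be illustrated by a short Python sketch, which decodes a hexadecimal and a base64 lexical form to the same sequence of octets. The function names octets_of_hex and octets_of_base64 are hypothetical, and the sample value (the ASCII encoding of the string Hello) is chosen purely for illustration.

```python
import base64


def octets_of_hex(lexical: str):
    """Octets denoted by an xs:hexBinary lexical form."""
    return list(bytes.fromhex(lexical))


def octets_of_base64(lexical: str):
    """Octets denoted by an xs:base64Binary lexical form."""
    return list(base64.b64decode(lexical))


# Two lexical forms, one value: a sequence of integers in the range 0 to 255.
print(octets_of_hex("48656C6C6F"))     # [72, 101, 108, 108, 111]
print(octets_of_base64("SGVsbG8="))    # [72, 101, 108, 108, 111]
```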
There are no functions defined in this document provided exclusively for processing binary data. A number of generic functions are available:
Functions such as fn:deep-equal, and
Functions such as
The function
The constructor functions xs:string, xs:hexBinary,
and xs:base64Binary can be used to convert binary values to and from their
string representation.
The function xs:base64Binary value.
A library of functions specific to processing of binary data can be found in
Accessors and their semantics are described in
Each of these functions has an arity-zero signature which is equivalent to the arity-one
form, with the context value supplied as the implicit first argument. In addition, each of the
arity-one functions accepts the empty sequence as the argument, in which case it generally delivers
the empty sequence as the result: the exception is
| Function | Accessor | Accepts | Returns |
|---|---|---|---|
| | node-name | node (optional) | xs:QName (optional) |
| | nilled | node (optional) | xs:boolean (optional) |
| | string-value | item (optional) | xs:string |
| | typed-value | zero or more items | a sequence of atomic items |
| | base-uri | node (optional) | xs:anyURI (optional) |
| | document-uri | node (optional) | xs:anyURI (optional) |
This section specifies further functions that return properties of nodes.
Nodes are formally defined in
This section specifies functions on sequences of nodes.
This section defines a number of functions used to find elements by ID or IDREF value,
or to generate identifiers.
The functions included in this section operate on function items, that is, values referring to a function.
Some functions such as
Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.
A map is a kind of item.
Two atomic items K1 and K2 are the same key if fn:atomic-equal($K1, $K2) returns true.
It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).
Maps are immutable, and have no identity separate from their content.
For example, the
A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a
function item. The function corresponding to the map has the signature
function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling
the get function: the expression $map($key) returns the same result as get($map, $key). For example, if $books-by-isbn
is a map whose keys are ISBNs and whose associated values are book elements, then the expression
$books-by-isbn("0470192747") returns the book element with the given ISBN.
The fact that a map is a function item allows it to be passed as an argument to higher-order functions
that expect a function item as one of their arguments.
In 4.0, the entries in a map are ordered. The
The entry order of the entries in a map is defined by the function or expression
that creates the map, and affects the result of functions and expressions
that process multiple entries in a map, for example the function for key $k value $v return EXPR. The ordering
is also reflected in the output
of the json and adaptive serialization methods.
Order is maintained in maps for two main reasons:
To make the representation of a map (such as its JSON serialization) easier for human readers to process: for example when visually inspecting the result of a JSON transformation;
To make the result of different implementations interoperable.
Although it is possible to use the ordering of a map to capture
semantic information, the design of functions such as
It is often useful to decompose a map into a sequence of entries, or key-value pairs (in which the key is an atomic item and the value is an arbitrary sequence). Subsequently it may be necessary to reconstruct a map from these components, typically after modification.
There are two conventional ways of representing a map as a sequence of key-value pairs, each with its own advantages and disadvantages. These are described below:
A map can be represented as a sequence of single-entry maps.
It is possible to decompose any map into a sequence of
For example the map
{ "x": 1, "y": 2 } can be decomposed to the sequence ({ "x": 1 }, { "y": 2 }).
A map can be represented as a sequence of JNodes.
A JNode holds the map key in its
The following table summarizes the way in which these two representations can be used to compose and decompose maps:
| Operation | Single-Entry Maps | JNodes |
|---|---|---|
| Decompose a map | | |
| Compose a map | | |
| Create a single entry | | |
| Extract the key part of a single entry | | |
| Extract the value part of a single entry | | |
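The single-entry-map representation can be modelled informally in Python, whose dictionaries also preserve insertion order. This is a sketch only: the names decompose and compose are hypothetical and are not functions of this specification, and the treatment of duplicate keys (the later value replaces the earlier one) is an assumption of the sketch.

```python
def decompose(m):
    """Split a map into a list of single-entry maps, preserving entry order."""
    return [{key: value} for key, value in m.items()]


def compose(entries):
    """Rebuild a map from single-entry maps.

    Assumption of this sketch: if a key occurs more than once, the later
    value replaces the earlier one (the key keeps its original position).
    """
    result = {}
    for entry in entries:
        result.update(entry)
    return result


parts = decompose({"x": 1, "y": 2})
rebuilt = compose(parts)
```

Modifying, filtering, or reordering the list between the two calls is the typical reason for decomposing a map in this way.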
It is also possible to decompose a map using:
The function
The expression for key $k value $v in $map return ....
The examples below show several ways of constructing a map with the same entries as an input map, but with the entries sorted by key.
Using map:entries and map:merge:
Using JNodes:
Using map:for-each:
Using an XQuery FLWOR expression:
The XDM data model (
dm:empty-map constructs the empty map.
dm:map-put adds or replaces an entry in a map.
dm:iterate-map applies a supplied function to every entry in a map.
The functions in this section are all specified by means of equivalent expressions that either call these primitives directly, or invoke other functions that rely on these primitives. The specifications avoid relying on XPath language constructs that manipulate maps, such as map constructor syntax, lookup expressions, or FLWOR expressions. This is done to allow these language constructs to be specified by reference to this function library, without risk of circularity.
There is one exception to this rule: for convenience, the notation {} is used to represent
the empty map, in preference to a call on dm:empty-map().
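The approach of defining every function in terms of three primitives can be sketched in Python. Everything here is illustrative: the Python names are hypothetical stand-ins for dm:empty-map, dm:map-put, and dm:iterate-map, a map is modelled as a tuple of key-value pairs, Python equality stands in for the key comparison, and keeping a replaced entry at its original position is an assumption of the sketch.

```python
def empty_map():
    """Model of dm:empty-map: a map with no entries."""
    return ()


def map_put(m, key, value):
    """Model of dm:map-put: return a new map with the entry added or replaced."""
    if any(k == key for k, _ in m):
        # Assumption: a replaced entry keeps its original position.
        return tuple((k, value if k == key else v) for k, v in m)
    return m + ((key, value),)


def iterate_map(m, action):
    """Model of dm:iterate-map: apply action to each entry, concatenating results."""
    out = []
    for k, v in m:
        out.extend(action(k, v))
    return out


# Derived functions, written only in terms of the three primitives.
def keys(m):
    return iterate_map(m, lambda k, v: [k])


def get(m, key):
    return iterate_map(m, lambda k, v: [v] if k == key else [])


def remove(m, key):
    result = empty_map()
    for k, v in iterate_map(m, lambda k, v: [(k, v)]):
        if k != key:
            result = map_put(result, k, v)
    return result
```

The derived functions are deliberately inefficient (each lookup scans every entry); as with the formal equivalents, the point is to pin down the result, not the implementation strategy.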
The formal equivalents are not intended to provide a realistic way of implementing the
functions (in particular, any real implementation might be expected to implement
map:find().
The functions defined in this section use a conventional namespace prefix map, which
is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map.
The function call map:get($map, $key) can be used to retrieve the value associated with a given key.
There is no operation to atomize a map or convert it to a string. The function
Note that when the required type of an argument to a function such as
The
This section describes the mappings used by this function.
This mapping is designed with three objectives:
It should be possible to represent any XML element as a map suitable for JSON serialization.
The resulting JSON should be intuitive and easy to use.
The JSON should be consistent and stable: small variations in the input should not result in large variations in the output.
Achieving all three objectives requires design compromises. It also requires sacrificing some other desiderata. In consequence:
The conversion is not lossless (see
The conversion is not streamable.
The results are not necessarily compatible with those produced by other popular libraries.
The requirement for consistency and stability is particularly challenging. An element such as
<name>John</name> maps naturally to the map { "name": "John" };
but adding an attribute (so it becomes <name role="first">John</name>)
then requires an incompatible change in the JSON representation. The format could be made extensible
by converting <name>John</name> to { "name": {"#content":"John"} }
and <name role="first">John</name> to
{ "name": { "@role":"first", "#content":"John" } },
but this imposes unwanted complexity on the simplest cases. The solution adopted is threefold:
It is possible to analyze a corpus of XML documents to develop a conversion plan, which can then be applied consistently to individual input documents, whether or not these documents were present in the corpus. The conversion plan can be serialized and subsequently reused, so that it can be applied to input documents that might not have existed at the time the conversion plan was formulated.
Alternatively, the function can make use of schema information where available, so it considers not just the structure of an individual element instance, but the rules governing the element type.
It is possible to override the choices made by the system, and explicitly specify the format to be used for elements or attributes having a given name.
The key challenge in mapping XML to JSON is in deciding how element content is to be represented. To illustrate the variety of mappings that are possible, the following table lists some examples of typical XML elements and their JSON equivalents:
| XML element | JSON equivalent |
|---|---|
This specification defines a
number of named mappings, called
The layout to be used for specific elements can be explicitly selected by
supplying a conversion plan as input
to the
It is possible to construct a conversion plan by analyzing a corpus of documents
using the
It is also possible to construct a conversion plan manually, or to modify
the conversion plan produced by the
In the absence of an explicit conversion plan, if the data has been schema-validated, the layout is inferred from the content model for the element type as defined in the schema.
When the data is untyped and no specific layout has been selected, a default layout is chosen based on the properties of the individual element instance.
The advantage of using schema information is that it gives a consistent representation for all elements of a particular type, even if they vary in content: for example if an element type allows optional attributes, the JSON representation will be consistent between those elements that have attributes and those without. In the absence of a schema, consistency can be achieved by supplying a conversion plan that applies uniformly to multiple documents.
The different layouts available are defined in the following sections. For each layout there is a table showing:
$elements argument; when used to convert
a descendant element, the corresponding key-value pair may appear as part of a larger map, depending
on the layout chosen for its parent element.
The
xsi:nil="true". These rules only apply if the
element has been schema-validated.
empty layout cannot be used for
an element that is not empty. In such a situation the recovery action is as follows, in order:
Attributes are dropped, and if this is sufficient to enable the layout to be used, then the element is converted without its attributes.
If the type of an element or attribute in the conversion
plan is given as boolean or numeric, but the actual
value of the element or attribute is not castable to xs:boolean
or xs:numeric respectively, then the node is output ignoring
the type property, that is, as an instance of xs:untypedAtomic.
If the conversion plan supplies a fallback layout (an entry with key
"*"), then the fallback layout is used.
The
The rules for selecting the layout for a particular element are given later,
in
Note that it is possible to request any layout for any element. If an inappropriate layout
is chosen for a particular element (for example, empty layout for an element
that is not empty), then the rules for that layout specify what happens.
It is possible to specify a fallback layout for use when the selected layout fails: this will typically
be a layout such as xml or mixed that can handle any element.
Acknowledgements for this categorization: see
| Layout name | |
|---|---|
| Usage | Intended for XML elements that have no content and no attributes. |
| Example input | |
| Example output | |
| Mapping rules | The content is represented by the zero-length |
| Mapping for nilled elements | The content is represented by the QName |
| Errors | Attributes are discarded, along with child comment nodes, processing instructions, and whitespace-only text nodes. If any other child nodes are present, this layout fails. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that have no content but may have attributes. |
| Example input | |
| Example output | |
| Mapping rules | The content is represented by a map containing one entry for each attribute in the XML element; if there are no attributes, the content is represented as the empty map. The rules for attribute names are defined in |
| Mapping for nilled elements | An additional key-value pair |
| Errors | Child comment nodes, processing instructions, and whitespace-only text nodes are discarded. If any other child nodes are present, this layout fails. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that have simple content and no attributes. |
| Example input | |
| Example output | |
| Mapping rules | The element is atomized and the resulting atomized value is handled as described in If the element is untyped, the atomized value will always appear in the result as an instance of |
| Mapping for nilled elements | The content is represented by the value |
| Errors | Attributes are discarded, along with child comment nodes and processing instructions; whitespace is retained. If any child elements are present, this layout fails. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that have simple content and (optionally) attributes. |
| Example input | |
| Example output | |
| Mapping rules | The element is represented by a map containing one entry for each of its attributes, plus an entry with key The rules for attribute names are defined in If the element is untyped, the value of each attribute, and of If the element has been schema-validated, the types of the items in the atomized value are retained. |
| Mapping for nilled elements | The |
| Errors | Child comment nodes and processing instructions are discarded; whitespace is retained. If any child elements are present, this layout fails. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that act as wrappers for a list of child elements, all having the same element name; neither the element itself nor any of its children should have any attributes. The expected child element name may be present in the conversion plan. The names of the child elements are not retained in the output. |
| Example input (1) | |
| Example output (1) | |
| Example input (2) | |
| Example output (2) | |
| Mapping rules | The content is represented by an array, whose members correspond one-to-one with the children of the element. Each child element is converted to a map as if it were a top-level element: the resulting map contains a single key-value pair. The key part is discarded, and the value part is used as a member in the resulting array. If there are no children then the content is represented by the empty array. |
| Mapping for nilled elements | The array is replaced by the value |
| Errors | Attributes are discarded for both the element itself, and its children. Comments, processing instructions, and whitespace text nodes in the content are discarded. This layout fails if any child element is present with a name that differs from the expected child element name, or if there are non-whitespace text node children. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that act as wrappers for a list of child elements, all having the same element name. The wrapper element may have attributes, but the children should not. The name of the child elements is retained in the output. |
| Example input (1) | |
| Example output (1) | |
| Example input (2) | |
| Example output (2) | |
| Mapping rules | The content is represented by a map containing one entry for each attribute in the XML element, plus a property named after the child elements (the If there are no children and the element is untyped (which can occur when this layout is chosen explicitly via the options to |
| Mapping for nilled elements | The array-valued entry in the result is replaced by the entry |
| Errors | Any attributes on the element's children are discarded. Comments, processing instructions, and whitespace text nodes in the content are discarded. This layout fails if any child element is present with a name that differs from the expected child element name, or if there are non-whitespace text node children. |
| Layout name | |
|---|---|
| Usage | Intended primarily for XML elements that contain multiple child elements, with different names, where the order of the child elements is not significant. Also used for elements whose content is a single element node child. The element may or may not have attributes. |
| Example input (1) | |
| Example output (1) | |
| Example input (2) | |
| Example output (2) | |
| Mapping rules | The content is represented by a map containing one entry for each attribute in the XML element, plus one entry for each child element, whose value is formatted according to the rules for that element. If two or more child elements have the same name, or names that are represented by the same string (taking into account the chosen The |
| Mapping for nilled elements | Alongside any attributes, the value includes the additional entry |
| Errors | Although this layout is intended primarily for elements whose children are unordered and uniquely named, it is also viable to use it in cases where elements can repeat, so long as order relative to other elements is not significant. Comments, processing instructions, and whitespace text nodes in the content are discarded. This layout fails if there are non-whitespace text node children. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that contain a sequence of element node children, whose order is significant. The element may or may not have attributes. |
| Example input | Lorem ipsum. Dolor sit amet. ]]> |
| Example output | |
| Mapping rules | The mapping rules are identical to the rules for the |
| Mapping for nilled elements | A nilled element is indicated by including an additional map |
| Errors | This layout fails if there are non-whitespace text node children. |
| Layout name | |
|---|---|
| Usage | Intended for XML elements that contain mixed content (that is, elements that contain both child elements and child text nodes, intermingled). The element may or may not have attributes. |
| Example input | |
| Example output | |
| Mapping rules | The content is represented by an XDM array containing one entry for each attribute in the XML element, and one entry for each child node, in order. Each attribute node is represented within this array by a single-entry map: the rules for attribute names are defined in Child nodes are represented within the array as follows: A text node child is represented as an atomic item of type An element node child is represented as a map containing a single entry, with the key representing the element name and the value representing the element's content, formatted according to the chosen layout for that element. A comment node is represented as a map containing a single entry whose key is the string A processing instruction node is represented as a map containing a single entry whose key is the string Whitespace text nodes are retained. |
| Mapping for nilled elements | A nilled element is indicated by including an additional map |
| Errors | All children are retained, including comments, processing instructions, and text nodes, whether or not they are whitespace-only. This layout never fails. |
Serialized layout allows an element node to be represented as lexical XML, contained within a map.
| Layout name | |
|---|---|
| Usage | This layout is useful when the input contains a mix of structured data and marked-up textual content. It allows the textual content to be output as serialized XML. It is also used as a fallback representation when the selected element layout is inappropriate for a particular element. |
| Example input | |
| Example output | |
| Mapping rules | The element node is serialized as if by the The serialization parameter The serialization parameter The serialization parameter Other serialization parameters take their default values. The outermost element name will typically be repeated, for example |
| Mapping for nilled elements | A nilled element is represented using its normal XML serialization, that is, the output serialization includes the attribute |
| Errors | This layout never fails. |
It is possible to create a conversion plan by analyzing a collection of sample input documents.
The function
The output of this function (the conversion plan) holds information about how elements and attributes (identified by name) should be converted.
For elements, the information is primarily a mapping from element names
(xs:QName instances) to layout names. In some cases additional information beyond
the layout name is also included. The conversion plan is represented as an XDM map, whose structure
is defined in this specification. A conversion plan can be constructed directly, or the plan
produced by calling
The
Let $EE be
the set of all elements named N, specifically
$input/descendant-or-self::*[node-name(.) eq N].
If empty($EE/(* | text())) (that is, if there
are no child elements or text nodes) then:
If empty($EE/@*) (that is, if there
are no attributes),
then the layout is empty: see
Otherwise, the layout is empty-plus: see
If empty($EE/*) (that is, if there are no child elements) then:
If empty($EE/@*) (that is, if there
are no attributes)
then the layout is simple: see
Otherwise, simple-plus: see
The plan also includes the property type. If all the elements
in $EE are castable as xs:boolean, then the type is boolean;
otherwise, if all the elements in $EE are castable as xs:numeric,
then the type is numeric; otherwise, the type is string.
If empty($EE/text()[normalize-space()]) (that is, there are no text node
children other than whitespace), then:
If all-equal($EE/*/node-name()) and exists($EE/*[2])
(that is, if all child elements have the same name, and at least one element has multiple child
elements), then:
If empty($EE/@*) (that is, if there
are no attributes)
then list: see
Otherwise, list-plus: see
If every $e in $EE satisfies all-different($e/*/node-name())
(that is, the child elements are uniquely named among their siblings),
then record: see
Otherwise, sequence: see
Otherwise, mixed: see
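The decision procedure above can be sketched in Python using the standard xml.etree.ElementTree module. This is an informal model, not the normative algorithm: the function names are hypothetical, the special treatment of attributes in the xsi namespace is not modelled, and the default ElementTree parser discards comments and processing instructions, so only element and text children are considered.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"


def text_nodes(elem):
    """Return the non-empty text node children of elem, as strings."""
    chunks = [elem.text] + [child.tail for child in elem]
    return [c for c in chunks if c]


def choose_layout(root, name):
    """Choose a layout for all elements called `name` at or below `root`."""
    ee = [e for e in root.iter() if e.tag == name]
    has_attrs = any(e.attrib for e in ee)
    children = [c for e in ee for c in e]
    texts = [t for e in ee for t in text_nodes(e)]

    if not children and not texts:
        return "empty-plus" if has_attrs else "empty"
    if not children:
        return "simple-plus" if has_attrs else "simple"
    if not any(t.strip(XML_WHITESPACE) for t in texts):
        same_name = len({c.tag for c in children}) == 1
        repeated = any(len(e) >= 2 for e in ee)
        if same_name and repeated:
            return "list-plus" if has_attrs else "list"
        # Child names are unique among siblings in every instance.
        if all(len({c.tag for c in e}) == len(e) for e in ee):
            return "record"
        return "sequence"
    return "mixed"
```

The function inspects every instance of the name in the supplied tree, so the outcome depends on the corpus as a whole and not on any single element.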
For elements with simple content (more specifically, elements where the chosen layout is
simple or simple-plus) the conversion plan also includes an entry indicating
whether the content should be represented as a boolean, a number, or a string. If every instance of
the element name has content that is castable to xs:boolean, the plan indicates
"type": "boolean". If every instance of
the element name has content that is castable to xs:numeric, the plan indicates
"type": "numeric". In other cases, the plan indicates "type": "string"; however,
this may be omitted because it is the default.
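The choice between boolean, numeric, and string can be sketched as follows. The sketch makes two assumptions that should be checked against the casting rules: a value is treated as castable to xs:boolean if, after trimming XML whitespace, it is one of true, false, 1, or 0; and castability to xs:numeric is approximated by the XSD 1.1 lexical space of xs:double. The name infer_type is hypothetical.

```python
import re

# Approximation of "castable as xs:numeric": the XSD 1.1 lexical space
# of xs:double (an assumption of this sketch).
NUMERIC = re.compile(
    r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?|[+-]?INF|NaN"
)
BOOLEAN = {"true", "false", "1", "0"}
XML_WHITESPACE = " \t\r\n"


def infer_type(values):
    """Return 'boolean', 'numeric', or 'string' for a list of string values."""
    trimmed = [v.strip(XML_WHITESPACE) for v in values]
    # Boolean is tested first, so a corpus containing only "0" and "1"
    # is classified as boolean rather than numeric.
    if all(v in BOOLEAN for v in trimmed):
        return "boolean"
    if all(NUMERIC.fullmatch(v) for v in trimmed):
        return "numeric"
    return "string"
```

Because the boolean test comes first, values drawn only from 0 and 1 yield boolean; a user who wants such content treated as numbers would need to edit the generated plan.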
For attributes, the conversion plan identifies whether attributes (with a given name)
should be represented as booleans, numbers, or strings; alternatively, it may indicate
that attributes with a given name should be discarded. For every distinct attribute name present
in the input, an entry is output associating the attribute name with one of the types
boolean or numeric; the entry is generally omitted when the values are to
be represented as strings, though the type can also be given explicitly as string.
An entry with type boolean
is generated for an attribute name if all the attributes with that name are castable as xs:boolean.
Similarly, an entry with type numeric is generated for an attribute name if all the
attributes with that name are castable as xs:numeric. In other cases, the attributes are
treated as being of type string. Entries with type
string may be omitted, since that is the default.
The entry for an attribute may also specify "type": "skip"
to indicate that the attribute should be discarded.
A plan that is produced by analyzing a corpus of input documents can then be customized by the user if required. For example:
If simple layout is chosen for a particular element name, but
it is known that some documents might be encountered in which that element
has attributes, then simple might be changed to simple-plus.
If record layout is chosen for a particular element name, but
it is known that some documents might be encountered in which child elements
can be repeated, then record might be changed to sequence.
If a generated plan determines that phone numbers should be represented as numbers, it might be modified to treat them as strings.
The conversion plan is a map of type map(xs:string, record(*)).
The key is an element or attribute name, representing element names in the form Q{uri}local,
and attributes in the form @Q{uri}local: in both cases the Q{uri}
part is omitted for a name in no namespace. Strings are used in preference to xs:QName instances to allow the plan to be serialized
in JSON format.
A more detailed definition of the structure is given in
A small example might be (in its JSON serialization):
xsi namespace
This section defines modifications to the above rules that apply to elements having
attributes in the xsi namespace (that is,
http://www.w3.org/2001/XMLSchema-instance).
When analyzing a corpus using xsi:nil="true" are ignored. If all
elements with a given name have this attribute, allocate the layout mixed.
When deciding whether an element has any attributes (for example to decide
between the layouts empty and empty-plus),
all attributes in the xsi namespace are ignored.
When converting an individual element to a map,
all attributes in the xsi namespace are ignored.
Notwithstanding the above, elements having the nilled property
(which essentially means they are schema-validated and have the attribute xsi:nil="true"),
are treated specially by each of the possible element layouts.
This section provides a definition of the structure of the
conversion plan that is output by the
The structure is defined by the following item type:
The rules relating to this structure are as follows:
The keys of the map entries are strings of the form:
local-name representing the name of an element in no namespace.
Q{uri}local-name representing the name of an element in a namespace.
* representing a fallback rule for use with elements where either
(a) there is no more specific rule, or (b) processing using the selected layout
fails.
@local-name representing the name of an attribute in no namespace.
@Q{uri}local-name representing the name of an attribute in a namespace.
Any entries whose keys are not in this format will be ignored.
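The key formats listed above can be recognized with a single regular expression. The following Python sketch is illustrative only: parse_plan_key is a hypothetical helper, and the test applied to the local name is a loose approximation of the NCName rules.

```python
import re

# Optional "@" marker, optional Q{uri} part, then a local name.
# The local-name pattern is a loose approximation of NCName.
KEY = re.compile(
    r"(?P<attr>@?)(?:Q\{(?P<uri>[^{}]*)\})?(?P<local>[^@{}:*\s]+)"
)


def parse_plan_key(key):
    """Classify a conversion-plan key.

    Returns (kind, namespace_uri, local_name), where kind is 'element',
    'attribute', or 'fallback'; returns None for a key that is not in a
    recognized format (such entries are ignored).
    """
    if key == "*":
        return ("fallback", None, None)
    match = KEY.fullmatch(key)
    if match is None:
        return None
    kind = "attribute" if match.group("attr") else "element"
    return (kind, match.group("uri") or "", match.group("local"))
```

A key written as a lexical QName with a prefix, such as a:b, is rejected by this sketch because the colon is not permitted in the local name.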
The layout entry is present if and only if the key represents the name of an element.
The child entry is present if and only if the value of layout is
list or list-plus. It represents an element name in the format
local-name for a name in no namespace, or Q{uri}local-name for a
name in a namespace.
The type entry is present if, and only if, one of the following conditions
applies:
The key represents the name of an attribute.
The layout is simple or simple-plus. In this
case the value must not be "skip".
If additional entries (beyond those described above) are present in any of the maps, they are ignored, provided that the map is coercible to the given type definition.
The fallback rule (with key "*") is used to process elements whose name has no
specific entry, and also for elements where normal processing fails (for example when the
selected layout is "empty", but the element has children). If no fallback rule
is present then "error" is assumed: this causes processing to fail with a dynamic
error. The fallback rule will typically set the layout property to one of the following:
error: this causes the function to fail with a dynamic error.
deep-skip: this causes the element and its content (recursively)
to be omitted from the output.
mixed: this causes the element to be output using layout mixed.
xml: this causes the element to be output using layout xml,
which represents the content as a string containing serialized XML.
However, any layout may be used as the fallback; if it fails, the error is unrecoverable.
As an alternative to constructing a conversion plan by analyzing a corpus of specimen documents, conversion may be controlled using type annotations derived from schema validation.
If the function xs:anyType or xs:untyped, then
the following rules apply:
This section uses the notation {prop} to refer to properties
of schema components, as defined in {open content} will inevitably be absent.
Let zeroLength(ST) be true for a simple type ST
if any of the following conditions is true:
ST.{variety} = list, and ST.{facets} includes
a length or maxLength facet whose value is 0 (zero).
ST.{variety} = atomic, and ST.{facets} includes
a length or maxLength facet whose value is 0 (zero).
ST.{variety} = atomic, and ST.{facets} includes
an enumeration facet constraining the value to be zero-length.
ST.{variety} = atomic, and ST.{facets} includes
a pattern facet with the value "" (a zero-length string).
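The zeroLength predicate can be modelled with a small data structure. In this Python sketch the SimpleType class is a hypothetical stand-in for a schema simple type definition: facets are held in a dictionary keyed by facet name, the enumeration facet is a list of strings, and only a single pattern facet is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleType:
    """Hypothetical model of a simple type definition."""
    variety: str                               # "atomic", "list", or "union"
    facets: dict = field(default_factory=dict)


def zero_length(st):
    """Return True if the type only permits zero-length content."""
    facets = st.facets
    if st.variety in ("list", "atomic"):
        if facets.get("length") == 0 or facets.get("maxLength") == 0:
            return True
    if st.variety == "atomic":
        enumeration = facets.get("enumeration")
        # Every permitted value is the zero-length string.
        if enumeration and all(value == "" for value in enumeration):
            return True
        if facets.get("pattern") == "":
            return True
    return False
```

Note that the enumeration and pattern conditions apply only to atomic types, so a list type constrained by an empty pattern is not treated as zero-length by this model.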
If T is a simple type:
If zeroLength(T), then the selected layout is empty
(see
Otherwise,
the selected layout is simple (see boolean if T is derived from
xs:boolean; numeric if T is derived from
xs:decimal, xs:double, or xs:float;
or string otherwise.
Otherwise (if T is a complex type):
Let $noAttributes be true if
T.{attribute uses} is empty and T.{attribute wildcard}
is absent.
If T.{content type}.{variety} = empty, then:
If $noAttributes and if empty layout is not disabled,
then the selected layout is empty (see
Otherwise, the selected layout is empty-plus (see
If T.{content type}.{variety} = simple
(a complex type with simple content), then:
Let ST be T.{content type}.{simple type definition}
(the corresponding simple type).
If zeroLength(ST), then:
If $noAttributes, the selected layout is empty
(see
Otherwise, the selected layout is empty-plus
(see
Otherwise:
If $noAttributes, the selected layout is
simple (see
Otherwise the selected layout is simple-plus
(see
In both cases the selected type is one of boolean,
numeric, or string, chosen in the same way
as for elements having a simple type.
If T.{content type}.{variety} = element-only (a complex type with
an element-only content model):
Let $noWildcards be true if T.{content type}.{open content}
is absent, and T.{content type}.{particle}, expanded recursively, contains
no wildcard term.
Let $childCardinalities be a set of (xs:QName,
xs:double) pairs
representing the expanded names of the element declaration terms within
T.{content type}.{particle},
expanded recursively, and for each one, the maximum number of occurrences of elements
with that name, computed
using the value of the {maxOccurs} property of the particles at each level, taking the value
unbounded as positive infinity.
If $noWildcards is true, and if $childCardinalities
contains a single entry, and that entry has a cardinality greater than one, then:
If $noAttributes then
the selected layout is list (see
Otherwise, the selected layout is list-plus
(see
If $noWildcards is true, and if every entry in $childCardinalities
has a cardinality of one, then the selected layout is
record (see
Otherwise, the selected layout is sequence (see
Otherwise (that is, when T.{content type}.{variety} = mixed), the
selected layout is mixed (see
For attribute nodes, the selected type is boolean if the type annotation is derived from
xs:boolean; numeric if the type annotation is derived from
xs:decimal, xs:double, or xs:float;
and string otherwise.
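The rules for a complex type with an element-only content model reduce to a small decision function once $noAttributes, $noWildcards, and $childCardinalities have been computed. The Python sketch below assumes those three values are supplied by the caller; the function name is hypothetical, and the cardinalities are modelled as a dictionary from child element name to maximum occurrence count, with math.inf standing for unbounded.

```python
import math


def element_only_layout(no_attributes, no_wildcards, child_cardinalities):
    """Select a layout for an element-only content model."""
    counts = list(child_cardinalities.values())
    # A single child name that may occur more than once: a list.
    if no_wildcards and len(counts) == 1 and counts[0] > 1:
        return "list" if no_attributes else "list-plus"
    # Every child name occurs at most once: a record.
    if no_wildcards and all(count == 1 for count in counts):
        return "record"
    return "sequence"


# A wrapper whose only child element is unbounded, with no attributes.
example = element_only_layout(True, True, {"item": math.inf})
```

Any wildcard in the content model rules out both list and record, since the set of possible child names is then not known in advance.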
The various layouts available for elements are described in
If an explicit layout is given for the element name of E in the
conversion plan supplied to the deep-skip,
then no output is produced for that element. If the selected layout is error,
then the function fails with a dynamic error.
If the selected layout fails for the element instance, then
the fallback layout (identified with the key "*" in the conversion plan)
is used; in the absence of a fallback layout, the function fails with a dynamic error.
Otherwise (when no explicit layout is given for E), if the
type annotation of the element is something other than xs:untyped
or xs:anyType, then a schema-determined layout is used as defined
in
Otherwise, if the conversion plan supplies a fallback layout
(identified with the key "*"), then the fallback layout is used.
If the above rules do not provide a layout for E, then
a conversion plan for E is determined by applying the rules in
The name-format option gives control over how element and attribute names are formatted.
There are four options:
The default option (which may be explicitly requested by specifying "name-format": "default")
retains the namespace URI for any element that is either (a) the top-level element of a tree being
converted, or (b) has a name that is in a different namespace from its parent element. In such cases
the format "Q{uri}local" is used. For other elements, the name is output using the
local part of the element name alone. For attributes, the form "Q{uri}local" is used
for an attribute in a namespace, and the local name alone is used for a no-namespace name.
Namespace prefixes are not retained.
The option eqname uses the format "Q{uri}local" for all
element and attribute names that are in a namespace, or the local name alone for all names
that are not in a namespace.
The option local discards all namespace information: all elements and attributes
are output using the local name alone.
The option lexical outputs element and attribute names in the form
obtained by calling the function
Regardless of the chosen name-format, and regardless of the above rules,
attributes in the xml namespace (http://www.w3.org/XML/1998/namespace)
are output using a lexical QName, with the prefix xml.
Attribute names in the output are typically prefixed with the character "@".
The option attribute-marker allows this to be changed to a different
prefix or none.
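The naming rules above can be sketched in a few lines of Python. This is a non-normative illustration: the function format_name and all of its parameters are hypothetical names introduced here, the lexical form is assumed to be prefix:local, and the default rule is applied literally as worded above, without special treatment of no-namespace elements.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def format_name(uri, local, prefix="", *, kind="element",
                name_format="default", parent_uri=None, is_top=False,
                attribute_marker="@"):
    """Return the output key for an element or attribute name (illustrative only)."""
    marker = attribute_marker if kind == "attribute" else ""
    # Attributes in the xml namespace always use a lexical QName with prefix "xml".
    if kind == "attribute" and uri == XML_NS:
        return f"{marker}xml:{local}"
    eqname = f"Q{{{uri}}}{local}"
    if name_format == "local":
        name = local
    elif name_format == "lexical":
        # Assumption: the lexical form is prefix:local, or local when unprefixed.
        name = f"{prefix}:{local}" if prefix else local
    elif name_format == "eqname":
        name = eqname if uri else local
    elif name_format == "default":
        if kind == "attribute":
            name = eqname if uri else local
        elif is_top or uri != parent_uri:
            # Top-level element, or namespace differs from the parent's.
            name = eqname
        else:
            name = local
    else:
        raise ValueError(f"unknown name-format: {name_format}")
    return marker + name
```

For instance, a top-level element in namespace http://example.com/ns is keyed as Q{http://example.com/ns}doc, while a child in the same namespace is keyed by its local name alone.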
Whichever format of names is chosen, if the rules for the selected layout would result in an output
map having two entries with the same key, the conflict is resolved by combining these
entries into an array. For example if name-format is set to local
then the element ]]> becomes either
{ "data": { "@val": ["3", "4"] } } or (because attribute order is unpredictable)
{ "data": { "@val": ["4", "3"] } }.
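The conflict-resolution rule can be illustrated with a small Python sketch. The helper combine_entries is a hypothetical name, and modelling the output map as a Python dict and the combined array as a Python list is an assumption made for illustration only.

```python
def combine_entries(entries):
    """Build a map from (key, value) pairs, merging duplicate keys into an array."""
    result = {}
    combined = set()  # keys whose value is already a merged array
    for key, value in entries:
        if key not in result:
            result[key] = value
        elif key in combined:
            result[key].append(value)
        else:
            # Second occurrence of the key: wrap both values in an array.
            result[key] = [result[key], value]
            combined.add(key)
    return result
```

Feeding the pairs ("@val", "3") and ("@val", "4") in that order yields { "@val": ["3", "4"] }; the reverse input order yields the other permitted result.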
The conversion plan may indicate that element content is to be output
as type string, numeric, or boolean: the default
is string. In the case of untyped elements and attributes, the value is
output as an instance of a string, numeric, or boolean type, according to this
prescription. Specifically:
If the prescribed type is boolean and the value is castable
as xs:boolean, then it is output as an instance of xs:boolean.
If the prescribed type is numeric and the value is castable
as xs:numeric, then it is output as an instance of xs:integer,
xs:decimal, or xs:double depending on the lexical form of the
value, following the same rules as for XPath numeric literals. For example, "-1"
becomes an xs:integer, 12.00 becomes an xs:decimal,
and 1e-3 becomes an xs:double. The special xs:double
values NaN and INF (which cannot be used as numeric literals)
are also recognized.
In all other cases the value is output as an instance of xs:untypedAtomic,
retaining its original lexical form.
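The following Python sketch mirrors the three rules above for untyped values. It is an approximation under stated assumptions: convert_untyped is a hypothetical name, the XDM types are represented only by their names paired with a Python value, the regular expressions approximate the XPath numeric literal grammar with an optional leading sign, and signed forms of INF are assumed to be accepted.

```python
import re
from decimal import Decimal

_INTEGER = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)")
_DOUBLE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)[eE][+-]?[0-9]+")
_SPECIAL = {"NaN", "INF", "-INF", "+INF"}  # assumption: signed INF is accepted
_BOOLEANS = {"true": True, "1": True, "false": False, "0": False}

def convert_untyped(value, prescribed="string"):
    """Return (type name, converted value) for an untyped lexical value."""
    text = value.strip(" \t\r\n")
    if prescribed == "boolean" and text in _BOOLEANS:
        return ("xs:boolean", _BOOLEANS[text])
    if prescribed == "numeric":
        if _INTEGER.fullmatch(text):
            return ("xs:integer", int(text))
        if _DECIMAL.fullmatch(text):
            return ("xs:decimal", Decimal(text))
        if _DOUBLE.fullmatch(text) or text in _SPECIAL:
            return ("xs:double", float(text))
    # All other cases: keep the original lexical form.
    return ("xs:untypedAtomic", value)
```

A value that is not castable to the prescribed type, such as "abc" with a numeric prescription, falls through to the final rule and is returned unchanged.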
Where the element or attribute is schema-validated, however:
If an element has the nilled property (that is, xsi:nil="true"),
then the mapping for nilled elements with the chosen layout is used.
Let AV be the typed value of the node (that is, the result of atomization).
If, however, an element is annotated with a type that does not allow atomization
(specifically, a complex type with element-only content) then let AV be the string value
of the element, as an atomic item of type xs:untypedAtomic.
If an attribute is annotated as having a simple type of {variety} list,
or if an element using layout simple or simple-plus
is annotated as having either a simple type of {variety} list
or a complex type with simple content of {variety} list
then the atomized value AV is represented in the result as the array
represented by the XPath expression array{AV}. This applies whether or not the
atomized value actually contains multiple atomic items. The individual atomic items in the array retain their type;
for example, items of type xs:date remain items of type xs:date
in the result.
In all other cases AV will be a single atomic item, and this value
is used
Atomic items in the result of the
This section is non-normative. Its purpose is to explain what information available
in the XDM nodes supplied as input to the
name-format is default or eqname,
then local names and namespace URIs of elements and attributes are retained,
but namespace prefixes are lost. If the chosen name-format is
lexical, then prefixes are retained but namespace URIs are lost.
If the chosen name-format is local then only
local names are retained; namespace URIs and prefixes are lost.
In addition, element names are lost when the parent element is mapped
using list layout: see
sequence, mixed or xml layouts.
empty,
empty-plus, list, list-plus, record,
or sequence layouts. Non-whitespace text nodes are never discarded.
is-id,
is-idref, and is-nilled properties of a node are lost.
record layout is used and the element has multiple children with the same name.
xsi namespace (for example,
xsi:type and xsi:nil) are not represented in the result.
The following examples show the effect of transforming some simple XML documents with default options,
and then serializing the result as JSON with indent set to true.
The actual indentation is implementation dependent.
| XDM element | JSON serialization of result |
|---|---|
The following more complex example demonstrates a case where the default conversion is inadequate (for example, it wrongly assumes that for the third production, the order of child elements is immaterial). A better result, shown below, can be achieved by using a schema-aware conversion.
| XDM element | JSON serialization of result |
|---|---|
In the above example, the schema used to validate the source document was simplified to eliminate
options that do not actually arise in this input instance (such as the g:string
element having attributes). This is a legitimate technique that may be useful when trying to obtain
the simplest possible JSON representation.
Further improvements to the usability of the JSON output could be achieved by doing some
simple transformation of the XML prior to conversion. For example, the name
attribute of various productions could be converted to a child element, and
<ref name="x"/> could be transformed to <ref>x</ref>.
Because a map is a function item, functions that apply to functions also apply
to maps. A map is an anonymous function, so 1.
Maps may be compared using the
There is no function or operator to atomize a map or convert it to a string (other than
XPath 4.0 defines a number of syntactic constructs that operate on maps. These all have equivalents in the function library:
The expression {} creates the empty map
(see dm:empty-map(). Using user-visible functions
the same can be achieved by calling
The map constructor { K1 : V1, K2 : V2,
... , Kn : Vn } is equivalent to
map:merge((map:entry(K1, V1), map:entry(K2, V2), ..., map:entry(Kn, Vn)), { "duplicates": "reject" })
The lookup expression $map?*
(see map:items($map).
The lookup expression $map?K, where K is a key value, is equivalent to
map:get($map, K)
The expression for key $k value $v in $map return EXPR
(see map:for-each($map, fn($k, $v) { EXPR }).
Maps can be filtered using the construct $map?[predicate]
(see
Arrays were introduced as a new datatype in XDM 3.1. This section describes functions that operate on arrays.
An array is an additional kind of item. An array of size N is a mapping from the integers (1 to N) to a set of values, called the members of the array, each of which is an arbitrary sequence. Because an array is an item, and therefore a sequence, arrays can be nested.
An array acts as a function from integer positions to associated values, so the
function call $array($index) can be used to retrieve the array member at a given position.
The function corresponding to the array has the signature
function($index as xs:integer) as item()*.
The fact that an array is a function item allows it to be passed as an argument to higher-order functions
that expect a function item as one of their arguments.
The XDM data model (
dm:empty-array constructs the empty array.
dm:array-append adds a member to an array.
dm:iterate-array applies a supplied function to every member of an array, in order.
The functions in this section are all specified by means of equivalent expressions that either call these primitives directly, or invoke other functions that rely on these primitives. The specifications avoid relying on XPath language constructs that manipulate arrays, such as array constructor syntax, lookup expressions, or FLWOR expressions. This is done to allow these language constructs to be specified by reference to this function library, without risk of circularity.
There is one exception to this rule: for convenience, the notation [] is used to represent
the empty array, in preference to a call on dm:empty-array().
The formal equivalents are not intended to provide a realistic way of implementing the functions. They do, however, provide a framework that allows the correctness of a practical implementation to be verified.
The functions defined in this section use a conventional namespace prefix array, which
is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/array.
As with all other values, arrays are treated as immutable.
For example, the
All functionality on arrays is defined in terms of two primitives:
The function
The function
A record(value as item()*), that is, a map containing a single
entry whose key is the string "value" and whose value is the encapsulated sequence.
Note that when the required type of an argument to a function such as
Arrays may be compared using the
The XPath language provides explicit syntax for certain operations on arrays. These constructs can all be specified in terms of function primitives:
The empty array can be constructed using either of the expressions
[] or array{}. The effect is the same as the data model primitive
dm:empty-array(()) (see array:build(()) or array:of-members(()).
The expression array { $sequence } constructs an array whose members
are the items in $sequence. Every member of this array will
be a singleton item. The effect is the same as
array:build($sequence).
The expression [E1, E2, E3, ..., En] constructs an array in which
E1 is the first member, E2 is the second member,
and so on. The result is equivalent to the expression
[] => array:append(E1) => array:append(E2) => ... => array:append(En).
The lookup expression $array?* returns the
array:fold-left($array, (), fn($result, $next){ $result, $next }).
The lookup expression $array?$N, where $N
is an integer within the bounds of the array, is equivalent to
array:get($array, $N).
Similarly, applying the array as a function, $array($N),
is also equivalent to array:get($array, $N).
The expression for member $m in $array return EXPR
is equivalent to array:for-each($array, fn($m){ EXPR })
(see
Arrays can be filtered using the construct $array?[predicate]
(see
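The equivalences in this list can be modelled outside XPath. The Python sketch below is illustrative only: an array is represented as a list of members, each member being a list that stands for a sequence, and all function names are hypothetical stand-ins for the XPath constructs named in their docstrings.

```python
from functools import reduce

def array_append(array, member):
    """Model of array:append: return a new array with one more member."""
    return array + [list(member)]

def square_constructor(*members):
    """[E1, E2, ..., En]: fold array:append over the member expressions."""
    return reduce(array_append, members, [])

def curly_constructor(sequence):
    """array { $sequence }: every item becomes a singleton member."""
    return [[item] for item in sequence]

def lookup_all(array):
    """$array?*: concatenate the members, as a left fold over the array."""
    return reduce(lambda result, nxt: result + nxt, array, [])

def array_get(array, n):
    """$array?$N or $array($N): 1-based access to a member."""
    if not 1 <= n <= len(array):
        raise IndexError(f"position {n} is outside the bounds of the array")
    return array[n - 1]
```

Because array_append builds a new list rather than modifying its argument, the sketch also reflects the rule that arrays are treated as immutable.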
A $jnode/descendant::title, as described
at
In addition to the functions defined in this section, functions that operate on JNodes include:
The functions in this section access resources external to a query or stylesheet, and convert between external file formats and their XPath and XQuery data model representation.
The functions in this section provide access to resources (such as files) in the external environment.
These functions convert between the lexical representation of XML and the tree representation.
(The
This section describes a process called
The validation process takes the following inputs:
A schema to be used for validation,
called the
A boolean indicating whether any xsi:schemaLocation
or xsi:noNamespaceSchemaLocation attributes are to be taken
into consideration.
A document, element, or attribute node to be validated;
this is called the
A validation mode, which is one of strict,
lax, or by-type.
XSLT also allows the value strip, but
this does not invoke validation (instead, it invokes stripping
of existing type annotations, and re-annotation of nodes as
xs:untyped.)
If the validation mode is by-type, then
a schema type to be used for validating the operand node.
This may be any simple or complex type present in the effective
schema: it must not be xs:untyped or xs:untypedAtomic.
An XQuery ValidateExpr allows the type to be
specified as xs:untyped or xs:untypedAtomic,
but this does not invoke validation (instead, it invokes stripping of
existing type annotations and re-annotation of nodes as untyped.)
The output of the validation process comprises one or more of the following:
A boolean indicating whether the operand node was found to be valid.
If the operand node was found to be valid, a deep copy of
the operand node augmented with
This creates a new node with its own identity and with no parent.
The base URI property of every node in the resulting XDM tree is the same as the base URI property of the corresponding node in the input tree.
If the operand node was not found to be valid, then optionally, a set of error diagnostics in implementation-defined format.
The operand node must be one of:
An element node
An attribute node
A well-formed document node, that is, a document node having among its children exactly one element node and zero or more comment and processing instruction nodes.
The term
Note that a
The result of the validation process is defined by the following rules.
The invoking application determines whether the validity assessment
process takes account of any xsi:schemaLocation or xsi:noNamespaceSchemaLocation
attributes in the tree being validated. If it does so, then it
Any schema loaded using these attributes must be
Any schema loaded using these attributes must not override or redefine any schema components in the effective schema.
Any schema components loaded using this mechanism must be used for this validity assessment only, and must not affect the outcome of any subsequent validity assessments of other documents.
A processor may choose to cache such schema components but the existence of such a cache should only affect performance, not the validation outcome.
A consequence of validating a document using schema components that are not
in the static context is that nodes may be annotated with types
that are not in the static context. But the rules for
If the instance being validated contains any xml:id attributes,
such attributes are validated against the type
xs:ID, making the containing element
eligible as a target for the xs:ID,
however, is carried out only if the operand node is a document node.
If the operand node is a document node:
The children of the document node
The element node child is validated, as described below.
The validation rule Validation Root Valid (ID/IDREF)
is
applied to the single element node child of the document node. This means
that validation will fail if there are non-unique ID values or dangling
IDREF values in the document tree.
This rule is
There is no check that the tree contains unparsed entities whose names match
the values of nodes of type xs:ENTITY or
xs:ENTITIES. This is because it is not possible
(either in XSLT or XQuery) to construct a tree containing
unparsed entities. It is possible to
add unparsed entity declarations to the result document by referencing a
suitable DOCTYPE during serialization.
All other children of the document node (comments and processing instructions) are copied unchanged, and the results become the children of a new document node, which is returned as the validation result.
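The practical effect of the ID/IDREF rule on a document node can be sketched as follows. This Python fragment is non-normative and deliberately simplified: check_id_idref is a hypothetical name, and the sketch assumes that the values of all nodes typed as xs:ID and xs:IDREF have already been collected into two lists.

```python
from collections import Counter

def check_id_idref(ids, idrefs):
    """Return (duplicate ID values, dangling IDREF values) as sorted lists."""
    counts = Counter(ids)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    dangling = sorted(set(idrefs) - set(counts))
    return duplicates, dangling
```

Under this model, validation of a document node fails when either returned list is non-empty.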
If the operand node is an element node, then:
For specification purposes, because the XSD specifications
require the input document to be expressed as an XML Information Set
(
Validity assessment is carried out on the root element information item of the resulting Infoset, using the supplied schema. The process of validation applies recursively to contained elements and attributes to the extent required by the supplied schema.
A practical implementation is unlikely to perform any physical conversion, but the process is defined this way in order to align with the XSD specification.
If the validation mode is by-type, then
Schema-validity assessment is
carried out according to the rules defined in processor-stipulated type definition
for validation.
If validation mode is strict, then
strict validation is carried out as described in
have a name that matches a top-level element declaration in the effective schema, or
have an xsi:type attribute
whose value matches the name of a top-level type definition
in the effective schema
If there is no such element declaration or type definition, the element is assessed as invalid.
If validation mode is lax, then schema-validity
assessment is carried out in accordance with
If validation mode is lax and the root element
information item has neither a top-level element
declaration nor an xsi:type attribute, XSD 1.0
and XSD 1.1 define the recursive checking of children
and attributes as optional. This specification prescribes that this
recursive checking is required.
This means, for example, that when an instance document is structured as having an envelope in one namespace wrapping a payload in a different namespace, and when schema definitions are available for the payload but not for the envelope, lax validation of the envelope may trigger validation of the payload.
If the operand node is an element node, the validation rules named “Validation Root Valid (ID/IDREF)” are not applied. This means that document-level constraints relating to uniqueness and referential integrity are not enforced.
There is no check that the document contains unparsed entities whose names match the
values of nodes of type xs:ENTITY or xs:ENTITIES.
If the operand node is an attribute node, in particular when it is a parentless attribute node, then validation cannot be defined directly in terms of the XSD-defined validation process. Instead, conceptually, a copy of the attribute is first added to an element node that is created for the purpose, and namespace fixup is performed on this element node to ensure that it has an in-scope namespace binding for the prefix and namespace of the attribute name. The name of this element is of no consequence, but it must be the same as the name of a synthesized element declaration of the form:
where A is the name of the attribute being validated.
This synthetic element is then validated using the procedure given above for
validating elements, and if it is found to be valid, a copy of the validated
attribute is made, retaining its
The XDM data model does not permit an attribute node with no parent to have a
typed value that includes a namespace-qualified name, that is, a value whose
type is derived from xs:QName or xs:NOTATION. This
restriction is imposed because these types rely on the in-scope namespaces of a
containing element to resolve namespace prefixes. Therefore,
a parentless attribute is considered to be invalid against such a type.
The outcome of the validation expression depends on the
validity property of the root element information item in the PSVI that results
from the XSD validation process.
If the validity property of the root element
information item is valid,
or if validation mode is
lax and the validity property of the root
element information item is notKnown,
the PSVI is converted back into a data model instance
as described in validate
expression.
Otherwise, the operand node is deemed invalid.
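The decision rule above reduces to a single boolean test. The Python sketch below is illustrative; validation_succeeds is a hypothetical name, and the validity property and validation mode are assumed to be passed as plain strings.

```python
def validation_succeeds(root_validity, mode):
    """True when the PSVI is converted back to XDM, False when the operand is invalid."""
    if root_validity == "valid":
        return True
    # In lax mode, a root whose validity is notKnown is also accepted.
    return mode == "lax" and root_validity == "notKnown"
```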
During conversion of the PSVI into an XDM instance
after validation, any element information items whose validity property is notKnown are
converted into element nodes with type annotation xs:anyType, and any attribute information items whose validity property is
notKnown are converted into attribute nodes with type annotation xs:untypedAtomic, as described in
This function converts between the lexical representation of HTML and the XDM tree representation.
The
The lexical HTML (supplied as a string) is parsed into an HTML DOM
as defined by the HTML5 specification: see
The resulting DOM is converted to an XDM tree as described in this
section. This is described by defining the actions of the accessor functions
defined in
Because the
An implementation must match the semantics of the mapping described in this section, but
the specific way it achieves that is
Some possible implementation strategies are:
Parse the HTML to an HTML DOM and then convert the HTML DOM to an XDM node tree.
Parse the HTML to an HTML DOM and then implement a wrapper or facade that presents an XDM interface to the HTML DOM.
Parse the lexical HTML directly to an XDM node tree, bypassing the HTML DOM.
The http://www.w3.org/1999/xhtml and the content type
application/xhtml+xml, and is popularly referred to as XHTML.
The HTML parsing algorithm constructs
an HTML DOM HTMLDocument document object for the HTML document. The XHTML parsing
algorithm constructs an HTML DOM XMLDocument object for the HTML document, following
XML parsing rules. This mapping supports both of these document types.
The
The HTML DOM Document interface maps to
The HTML DOM Element interface maps to template element.
The HTML DOM Attr interface maps to
Any HTML DOM Attr instances in an HTML DOM HTMLDocument that represent
namespace declarations will have been filtered out: see
The HTML DOM ProcessingInstruction interface maps to
The HTML parsing algorithm does not generate processing instruction nodes. If encountered
they are parsed as comment nodes. The HTML DOM ProcessingInstruction
interface is relevant only when the XHTML parsing algorithm is used.
The HTML DOM Comment interface maps to
The HTML DOM Text interface maps to Text nodes are combined into a single
The HTML DOM CDATASection interface is an instance of HTML DOM
Text, so CDATA sections also map to
The use of CDATA sections can result in the HTML DOM containing adjacent text nodes, which the mapping to XDM will merge into a single node.
An HTML template element is mapped to an XDM template
element with children corresponding to the children of the HTML DOM DocumentFragment
that is the value of the template element.
Given source HTML such as Lorem ipsum,
the HTML DOM represents the element Lorem ipsum
template element, but as the child of a free-standing
document fragment which is accessible (in the DOM API) as the value of the
template.content property of the element node. The XDM representation produced
by the Lorem ipsum]]> appears as an ordinary child node of the
template element.
The HTML DOM DocumentFragment interface is not supported as an XML node.
There are two places in the HTML DOM where this is used:
The HTML DOM ShadowRoot interface is not present in the main HTML DOM
tree. It is only accessible via JavaScript.
The template element’s content property contains
the child nodes of the template element. The behaviour of this
is described above.
If an implementation allows these nodes to be passed in via an API or similar mechanism,
their behaviour is
The result of the dm:attributes($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then the result
is the value of the Element.attributes property mapped to a
sequence as described below;
Otherwise, the result is the empty sequence.
An HTML DOM NamedNodeMap is mapped to a sequence as follows:
NamedNodeMap.length is the length of the sequence, where a length
of 0 results in the empty sequence;
NamedNodeMap.item(n) is the nth element of the sequence.
That sequence is then filtered as follows:
If the Attr.namespaceURI property is
"http://www.w3.org/2000/xmlns/", the attribute is not included in
this sequence;
If the Attr.localName property is "xmlns", the attribute
is not included in this sequence;
If the Attr.localName property starts with "xmlns:",
the attribute is not included in this sequence;
Otherwise, the attribute is included in this sequence using the XDM mapping rules described in this section.
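The filtering rules above can be sketched in Python. The Attr class here is a hypothetical stand-in for the HTML DOM Attr interface, carrying only the properties the rules consult, and xdm_attributes is likewise an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional

XMLNS_NS = "http://www.w3.org/2000/xmlns/"

@dataclass(frozen=True)
class Attr:
    local_name: str
    value: str
    namespace_uri: Optional[str] = None

def xdm_attributes(named_node_map):
    """Drop namespace declarations; keep every other attribute in order."""
    kept = []
    for attr in named_node_map:
        if attr.namespace_uri == XMLNS_NS:
            continue
        if attr.local_name == "xmlns":
            continue
        if attr.local_name.startswith("xmlns:"):
            continue
        kept.append(attr)
    return kept
```

An empty input produces an empty list, which corresponds to the empty sequence.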
The HTML DOM Element.attributes property includes namespace and non-namespace
attributes in the list when the HTML or XML parser is used. As such, the namespace attributes
have to be filtered from the resulting XDM attribute sequence.
When the resulting document is an HTML DOM HTMLDocument, the
Attr.localName and Attr.name properties of HTML DOM
Attr nodes are both set to the qualified name. This includes
namespace declarations which are filtered out by the logic in this section.
The Attr.localName property will be ASCII lowercase. The
The result of the dm:base-uri($node) for an HTML DOM Node is the value of the
Node.baseURI property mapped as follows:
If the value is null or the zero-length string, then the result is the empty sequence;
Otherwise, the string value is cast to an xs:anyURI.
The result of the dm:children($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Document then the result
is the value of the Node.childNodes property mapped to a sequence;
If the node is an instance of HTML DOM HTMLTemplateElement then the
result is the HTML DOM DocumentFragment’s Node.childNodes
property, mapped to a sequence;
If the node is an instance of HTML DOM Element then the result is the
value of the Node.childNodes property mapped to a sequence;
Otherwise, the result is the empty sequence.
An HTML DOM NodeList is mapped to a sequence as follows:
NodeList.length is the length of the sequence, where a length
of 0 results in the empty sequence;
NodeList.item(n) is the nth element of the sequence.
That sequence is then filtered as follows:
If the child is an instance of HTML DOM DocumentType, that child
is not included in this sequence;
A sequence of consecutive HTML DOM Text nodes is combined into a
single XDM text node;
Otherwise, the HTML DOM Node nodes are mapped to XDM according to
the rules in this section.
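A minimal Python sketch of this filtering step follows. It assumes, purely for illustration, that each DOM child is represented as a (kind, value) tuple; xdm_children is a hypothetical name.

```python
def xdm_children(child_nodes):
    """Skip DocumentType nodes and merge runs of consecutive Text nodes."""
    result = []
    for kind, value in child_nodes:
        if kind == "doctype":
            # DocumentType children are not represented in XDM.
            continue
        if kind == "text" and result and result[-1][0] == "text":
            # Extend the previous text node instead of adding a new one.
            result[-1] = ("text", result[-1][1] + value)
        else:
            result.append((kind, value))
    return result
```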
The result of the dm:document-uri($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Document then the value
of the Document.documentURI property mapped as follows:
If the value is null or the zero-length string, then the result is the empty sequence;
Otherwise, the string value is cast to an xs:anyURI.
Otherwise, the result is the empty sequence.
The result of the dm:is-id($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Attr then:
If the Attr.name property (its qualified name) is
"id", then:
If the Attr.value is castable to an xs:NCName,
the result is true;
Otherwise, the result is false;
Otherwise, the result is false;
Otherwise, the result is false.
In HTML, an id attribute is defined as being unique in the element’s tree,
containing at least one character, and not having any ASCII whitespace
characters. This means that an HTML id attribute may not
conform to an xs:NCName.
If an HTML id is not a valid xs:NCName then that
attribute is not an XML ID.
The result of the dm:is-idrefs($node) for an HTML DOM Node is the empty sequence.
The result of the dm:namespace-nodes($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then an
Otherwise, the result is the empty sequence.
For the XHTML parsing algorithm, this will be equivalent to constructing the namespace nodes from an XML infoset, PSVI, or similar mapping.
For the HTML parsing algorithm, the
Section 2.1.3 http://www.w3.org/1999/xhtml.
Section 4.8.15 http://www.w3.org/1998/Math/MathML).
The default element namespace for these elements is the MathML namespace.
Section 4.8.16 http://www.w3.org/2000/svg).
The default element namespace for these elements is the SVG namespace.
Section 13.1.2.3
The supported namespace prefixes are:
xlink in the http://www.w3.org/1999/xlink namespace;
xml in the http://www.w3.org/XML/1998/namespace namespace; and
xmlns in the http://www.w3.org/2000/xmlns/ namespace.
No other namespaces are supported by the HTML parser.
Section number references to
The result of the dm:nilled($node) for an HTML DOM Node is false().
The result of the dm:node-kind($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Document then the result is
"document".
If the node is an instance of HTML DOM Element then the result is
"element".
If the node is an instance of HTML DOM Attr then the result is
"attribute".
If the node is an instance of HTML DOM ProcessingInstruction then
the result is "processing-instruction".
If the node is an instance of HTML DOM Comment then the result is
"comment".
If the node is an instance of HTML DOM Text then the result is
"text".
The result of the dm:node-name($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then the result is
determined as follows:
The Element.localName property. This is derived as follows:
The local name is initially set to the ASCII lowercase tag name. The
If the local name is an SVG element name, the case-sensitive name is used.
If the local name contains a character that is not a valid XML
NameStartChar or NameChar, then an
NCName.
Unnnnnn escape sequence. That would map :
to U00003A.
This local name escaping applies only to the HTML parsing algorithm.
If the XHTML parsing algorithm is used, the localName and
prefix will be correctly set for QName-based
node names.
The Element.prefix property, or empty if the value is null;
The Element.namespaceURI property, or empty if the value
is null.
If the element is an HTML element, the namespace URI is
"http://www.w3.org/1999/xhtml".
If the element is an SVG element, the namespace URI is
"http://www.w3.org/2000/svg".
If the element is a MathML element, the namespace URI is
"http://www.w3.org/1998/Math/MathML".
If the node is an instance of HTML DOM Attr then the result is
determined as follows:
The
The Attr.localName property. This is derived as follows:
The local name is initially set to the
If the local name is an SVG or MathML attribute name, the case-sensitive name
is used.
If the local name is an allowed xlink, xml, or
xmlns attribute name the local name is the value of the local name
column of the attribute name mapping table in
If the local name contains a character that is not a valid XML
NameStartChar or NameChar, then an
NCName.
Unnnnnn escape sequence. That would map :
to U00003A.
This local name escaping applies only to the HTML parsing algorithm.
If the XHTML parsing algorithm is used, the localName and
prefix will be correctly set for QName-based
node names.
The Attr.prefix property, or empty if the value is null.
If the attribute name is an allowed xlink, xml, or
xmlns attribute name the namespace prefix is the value of the
prefix column of the attribute name mapping table in
The Attr.namespaceURI property, or empty if the value is null;
If the attribute name is an allowed xlink, xml, or
xmlns attribute name the namespace URI is the value of the
namespace column of the attribute name mapping table in
If the node is an instance of HTML DOM ProcessingInstruction then
the result is an xs:QName constructed as follows:
The ProcessingInstruction.target property;
The
The
Otherwise, the result is the empty sequence.
When the resulting document is an HTML DOM HTMLDocument, the
Element.localName and Element.name properties of
HTML DOM Element nodes are both set to the qualified name.
When the resulting document is an HTML DOM HTMLDocument, the
Attr.localName and Attr.name properties of HTML DOM
Attr nodes are both set to the qualified name.
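The local name escaping used with the HTML parsing algorithm can be sketched as follows. This Python fragment rests on assumptions: escape_local_name is a hypothetical name, each offending character is assumed to be replaced by U followed by its code point as six uppercase hexadecimal digits (consistent with the mapping of : to U00003A), and a character that is a NameChar but not a NameStartChar is assumed to be escaped when it appears first. The character classes are those of the XML Name productions with the colon excluded.

```python
import re

# XML NameStartChar ranges, excluding ":" so that the result is an NCName.
_START = (r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_NAME_START = re.compile("[" + _START + "]")
_NAME_CHAR = re.compile("[" + _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040]")

def escape_local_name(name):
    """Replace characters not allowed at their position with UNNNNNN."""
    out = []
    for i, ch in enumerate(name):
        allowed = _NAME_START if i == 0 else _NAME_CHAR
        if allowed.fullmatch(ch):
            out.append(ch)
        else:
            # Assumption: six uppercase hex digits of the code point.
            out.append(f"U{ord(ch):06X}")
    return "".join(out)
```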
The result of the dm:parent($node) for an HTML DOM Node is as follows:
Let $parent be the Node.parentNode property of the
node;
If $parent is an instance of HTML DOM DocumentFragment,
then for each HTML DOM HTMLTemplateElement $template in
the parsed DOM tree:
Let $content be the value of the
HTMLTemplateElement.content property of $template;
If $content is the same node as $parent, then the
result is $template using the XDM mapping rules described in
this section;
If there are no more $template nodes, then the result is an
empty sequence;
If $parent is null, then the result is the empty sequence;
Otherwise, the result is $parent using the XDM mapping rules
described in this section.
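The parent lookup, including the search through template elements, can be sketched in Python. The classes below are hypothetical stand-ins for the HTML DOM interfaces of the same names, and returning None models the empty sequence.

```python
class Node:
    def __init__(self, name, parent_node=None):
        self.name = name
        self.parent_node = parent_node

class DocumentFragment(Node):
    pass

class TemplateElement(Node):
    def __init__(self, name="template", parent_node=None):
        super().__init__(name, parent_node)
        # The template's children live in a free-standing fragment.
        self.content = DocumentFragment("#document-fragment")

def xdm_parent(node, templates):
    """Return the XDM parent of node; templates lists every template in the tree."""
    parent = node.parent_node
    if isinstance(parent, DocumentFragment):
        for template in templates:
            if template.content is parent:
                return template
        return None  # no template owns this fragment
    return parent  # None when the DOM parent is null
```

The linear scan over templates reflects the absence of a back-reference from the fragment to its template element; an implementation with access to such a reference could avoid the scan.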
The current node can have an HTML DOM DocumentFragment parent node only
if the include-template-content key of the html-parser-options
is true().
The HTML DOM DocumentFragment’s Node.parentNode property
is null, and a DocumentFragment attached to the HTMLTemplateElement.content
property does not have a host property connecting the fragment back to
the template element.
If a future version of the HTML DOM defines a DocumentFragment.host
property that references the node’s template element, or the implementation
has access to that internal property, the implementation may choose to use that
instead of traversing the parsed HTML tree.
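The dm:parent rules above can be sketched as follows. This is a non-normative illustration: the Node class, its attribute names, and the node-name strings are hypothetical stand-ins for the HTML DOM, and the final step of mapping the result to XDM is omitted.

```python
class Node:
    """Minimal stand-in for an HTML DOM node (hypothetical, for illustration only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent_node = parent  # mirrors Node.parentNode; None plays the role of null
        self.children = []
        self.content = None        # mirrors HTMLTemplateElement.content (templates only)
        if parent is not None:
            parent.children.append(self)


def descendants(node):
    """Yield the descendants of a node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def dm_parent(node, document):
    """Return the parent of node, or None for the empty sequence."""
    parent = node.parent_node
    if parent is None:
        return None
    if parent.name == "#document-fragment":
        # Search the parsed tree for the template whose content is this fragment.
        for candidate in descendants(document):
            if candidate.name == "template" and candidate.content is parent:
                return candidate
        return None
    return parent


# A document containing <html><template><p/></template></html>.
doc = Node("#document")
html = Node("html", doc)
template = Node("template", html)
fragment = Node("#document-fragment")  # its parent_node stays None, as in the DOM
template.content = fragment
p = Node("p", fragment)
```

In this sketch dm_parent(p, doc) returns the template element, even though the fragment that directly contains p has no parent of its own.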
The result of the dm:string-value($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Document, then use the
algorithm described in
If the node is an instance of HTML DOM Element, then use the
algorithm described in
If the node is an instance of HTML DOM Text, then use the
algorithm described in
Otherwise, the result is the value of the Node.nodeValue property.
The following algorithm is used to construct the concatenated string value of a node in the HTML DOM tree:
Let $text be the string value "";
For each descendant node $node in document order:
If $node is not an instance of HTML DOM
Text, process the next node in document order;
Append the value of the Node.nodeValue property for
$node to $text;
The result is $text.
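As a non-normative illustration, the concatenation algorithm can be sketched with Python's xml.dom.minidom. That library is an XML DOM rather than an HTML DOM, so it is used here only as an assumption-laden stand-in; the helper names are hypothetical.

```python
from xml.dom import minidom, Node


def descendants(node):
    """Yield the descendants of a DOM node in document order."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)


def concatenated_string_value(node):
    """Concatenate the nodeValue of every descendant Text node."""
    text = ""
    for descendant in descendants(node):
        if descendant.nodeType != Node.TEXT_NODE:
            continue  # not a Text node: process the next node in document order
        text += descendant.nodeValue
    return text


doc = minidom.parseString("<p>Hello <b>big</b> world<!-- note --></p>")
result = concatenated_string_value(doc)
```

The comment node contributes nothing, so result holds only the text of the three Text nodes.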
The following algorithm is used to construct the string value of a maximal sequence of adjacent HTML DOM Text nodes, starting from the first such node $node:
Let $text be the string value "";
Append the value of the Node.nodeValue property for
$node to $text;
Let $next be the value of Node.nextSibling;
If $next is null, or not an instance of HTML DOM
Text, the result is $text;
Otherwise, repeat from step 2 using $next as $node.
Adjacent text nodes in the HTML DOM are treated as a single XDM text node: only the first text node of each run is included, and the text content of the whole run is merged into that node's string value.
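The merging of adjacent text nodes can be sketched in the same non-normative way. Here xml.dom.minidom again stands in for an HTML DOM, and merged_text_value is a hypothetical helper name.

```python
from xml.dom import minidom, Node


def merged_text_value(node):
    """Merge the text of node and the Text siblings that immediately follow it."""
    text = ""
    while True:
        text += node.nodeValue
        next_node = node.nextSibling
        if next_node is None or next_node.nodeType != Node.TEXT_NODE:
            return text
        node = next_node  # repeat using the next sibling as the current node


# Build <p> with three adjacent Text nodes, an element, and a further Text node.
doc = minidom.parseString("<p/>")
p = doc.documentElement
for piece in ("Hello, ", "wor", "ld"):
    p.appendChild(doc.createTextNode(piece))
p.appendChild(doc.createElement("br"))
p.appendChild(doc.createTextNode("next line"))

merged = merged_text_value(p.firstChild)
```

The run stops at the br element, so the trailing Text node forms a separate XDM text node.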
The result of the dm:type-name($node) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then the result is
xs:untyped.
If the node is an instance of HTML DOM Attr then the result is
xs:untypedAtomic.
If the node is an instance of HTML DOM Text then the result is
xs:untypedAtomic.
Otherwise, the result is the empty sequence.
The result of the dm:typed-value($node) for an HTML DOM Node is as follows:
Let $string-value be the
If the node is an instance of HTML DOM Document then the result is
$string-value as an xs:untypedAtomic;
If the node is an instance of HTML DOM Element then the result is
$string-value as an xs:untypedAtomic;
If the node is an instance of HTML DOM Attr then the result is
$string-value as an xs:untypedAtomic;
If the node is an instance of HTML DOM Text then the result is
$string-value as an xs:untypedAtomic;
Otherwise, the result is $string-value.
The result of the dm:unparsed-entity-public-id($node) for an HTML DOM Node
is the empty sequence.
The result of the dm:unparsed-entity-system-id($node) for an HTML DOM Node
is the empty sequence.
The functions listed in this section parse or serialize JSON data.
JSON is a popular format for exchange of structured data on the web: it is specified in
This specification describes two ways of representing JSON data losslessly using XDM constructs. The first method uses XDM maps to represent JSON objects, and XDM arrays to represent JSON arrays. The second method represents all JSON constructs using XDM element and attribute nodes.
Note also:
The function
The function
This section defines a mapping from JSON data to XDM maps and arrays. Two functions are available
to support this mapping:
The conversion is lossless if recommended JSON good practice is followed. Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point, or (b) duplicate key values appear within a JSON object.
The representation of JSON data produced by the { "Sun": 1, "Mon": 2, "Tue": 3, ... } produces a simple map, so if the result
of parsing is held in $weekdays, the number for a given weekday can be extracted
using an expression such as $weekdays?Tue. Similarly, a simple array such as
[ "Sun", "Mon", "Tue", ... ] produces an array that can be addressed as, for example,
$weekdays(3). A more deeply nested structure can be addressed in a similar way:
for example if the JSON text is an array of person objects, each of which has a property named
phones which is an array of strings containing phone numbers, then the first phone number of
each person in the data can be addressed as $data?*?phones?1.
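For readers more familiar with general-purpose languages, the same addressing patterns can be sketched in Python. This is only an analogy: Python dictionaries and lists stand in for XDM maps and arrays, the sample data is invented, and the handling of duplicate keys shown is the behavior of Python's json module, not a statement about the functions defined in this specification.

```python
import json

weekdays = json.loads('{ "Sun": 1, "Mon": 2, "Tue": 3 }')
tue = weekdays["Tue"]    # plays the role of $weekdays?Tue

names = json.loads('[ "Sun", "Mon", "Tue" ]')
third = names[3 - 1]     # XDM arrays are 1-based; Python lists are 0-based

data = json.loads(
    '[ { "name": "Ann", "phones": ["111", "222"] },'
    '  { "name": "Bob", "phones": ["333"] } ]'
)
# First phone number of each person in the data.
first_phones = [person["phones"][0] for person in data]

# A duplicate key cannot survive in a map-like structure: one value is lost.
duplicates = json.loads('{ "a": 1, "a": 2 }')
```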
This section defines a mapping from JSON data to XML (specifically, to XDM element and attribute nodes). A
function
The XML representation is designed to be capable of representing any valid JSON text including one that uses characters which are not valid in XML. The transformation is normally lossless: that is, distinct JSON texts convert to distinct XML representations. When converting JSON to XML, options are provided to reject unsupported characters, to replace them with a substitute character, or to leave them in backslash-escaped form.
The following example demonstrates the correspondence of a JSON text and the corresponding XML representation.
Consider the following JSON text:
The XML representation of this text is as follows. Whitespace is included in the XML representation for purposes of illustration,
but it will not necessarily be present in the output of the
An XSD 1.0 schema for the XML representation is provided in http://www.w3.org/2005/xpath-functions, then:
Unless the host language specifies otherwise, the processor (if it is schema-aware)
If a schema location is provided, then the schema document at that location
The rules governing the mapping from JSON to XML are as follows. In these rules, the phrase
“an element named N” is to be interpreted as meaning “an element node whose local name is N and whose
namespace URI is http://www.w3.org/2005/xpath-functions”.
The JSON value null is represented by an element named null, with empty content.
The JSON values true and false are represented by an element named boolean,
with content conforming to the type xs:boolean. When the element is created by the
true or false.
The xs:boolean,
for example 1 and 0. Leading and trailing whitespace is accepted.
A JSON number is represented by an element named number,
with content conforming to the type xs:double, with the additional restriction that the value
must not be positive or negative infinity, nor NaN. The
xs:double and then casting the result to xs:string.
Leading and trailing whitespace is accepted.
Since JSON does not impose limits on the range or precision
of numbers, these rules mean that conversion from JSON to XML will always succeed, and will retain full precision
in the lexical representation unless the data model implementation is one that reconstructs the string value from
the typed value. In the reverse direction, conversion from XML to JSON may fail if the value is infinity or NaN,
or if the string value is such that casting to xs:double produces positive or negative infinity.
A JSON string is represented by an element named string, with
content conforming to the type xs:string. The string element has two
alternative representations: escaped form, and unescaped form.
A JSON array is represented by an element named array. The content is a sequence of
child elements representing the members of the array in order, each such element being the representation
of the array member obtained by applying these rules recursively.
A JSON object is represented by an element named map. The content is a sequence
of child elements each of which represents one of the name/value pairs in the object. The representation of the
name/value pair N:V is obtained by taking the element that represents the value V (by applying these
rules recursively) and adding an attribute with name key (in no namespace), whose
value is N as an instance of xs:string. The functions
The attribute escaped="true" may be specified on a string element to indicate
that the string value contains backslash-escaped characters that are to be interpreted according to the JSON
rules. The attribute escaped-key="true" may be specified on any element with a key attribute to indicate
that the key contains backslash-escaped characters that are to be interpreted according to the JSON
rules. Both attributes have the default value false, signifying that the relevant value is in unescaped form.
In unescaped form, the backslash character has no special significance (it represents itself).
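The mapping rules above can be sketched as follows. This is a non-normative illustration, not an implementation of the functions defined in this specification: it always produces unescaped form, it ignores all options, and the names JsonNumber, to_element, and json_to_xml_sketch are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"


class JsonNumber(str):
    """Hypothetical marker type: a JSON number kept in its original lexical form."""


def to_element(value, key=None):
    """Build the element that represents one JSON value."""
    if value is None:
        elem = ET.Element(f"{{{FN_NS}}}null")
    elif isinstance(value, bool):
        elem = ET.Element(f"{{{FN_NS}}}boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, JsonNumber):
        elem = ET.Element(f"{{{FN_NS}}}number")
        elem.text = str(value)
    elif isinstance(value, str):
        elem = ET.Element(f"{{{FN_NS}}}string")
        elem.text = value
    elif isinstance(value, list):
        elem = ET.Element(f"{{{FN_NS}}}array")
        for member in value:
            elem.append(to_element(member))
    elif isinstance(value, dict):
        elem = ET.Element(f"{{{FN_NS}}}map")
        for name, member in value.items():
            elem.append(to_element(member, key=name))
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    if key is not None:
        elem.set("key", key)  # the key attribute is in no namespace
    return elem


def json_to_xml_sketch(text):
    """Parse JSON text, keeping numbers as strings, and build the element tree."""
    parsed = json.loads(text, parse_int=JsonNumber, parse_float=JsonNumber)
    return to_element(parsed)


root = json_to_xml_sketch('{ "a": [1, true, null, "x"], "b": 2.50 }')
```

Keeping numbers as strings means that the lexical form 2.50 is carried into the number element unchanged, and the entries of the object appear in their original order.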
The JSON grammar for number is a subset of the lexical space of
the XSD type xs:double. The mapping from JSON number values to xs:double
values is defined by the XPath rules for casting from xs:string to xs:double. Note that
these rules will never generate an error for out-of-range values; instead very large or very small values will be
converted to +INF or -INF. Since JSON does not impose limits on the range or precision
of numbers, the conversion is not guaranteed to retain full precision.
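The effect of mapping JSON numbers to double-precision values can be observed in any language that uses IEEE 754 doubles. The following Python sketch is an analogy only: float() is assumed to behave like the cast from xs:string to xs:double for these particular inputs.

```python
import math

# Out-of-range values do not raise an error; they become infinities.
too_large = float("1e999")
too_small = float("-1e999")

# A 20-digit integer is within range but cannot be represented exactly.
lexical = "12345678901234567890"
as_double = float(lexical)
exact = int(as_double) == int(lexical)

overflowed = math.isinf(too_large) and math.isinf(too_small)
```

Here overflowed is True and exact is False, which corresponds to the two losses described above: range and precision.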
Although the order of entries in a JSON object is generally considered to have no significance, the functions
json-to-xml and xml-to-json both retain order.
The XDM representation of a JSON value may either be untyped (all elements annotated as xs:untyped, attributes
as xs:untypedAtomic), or it may be typed. If it is typed, then it http://www.w3.org/2005/xpath-functions
are ignored, including attributes such as xsi:type and xsi:nil that would normally influence the process
of schema validation.
The namespace prefix associated with the namespace http://www.w3.org/2005/xpath-functions (if any) is immaterial.
The effect of the
The set of characters that may appear in JSON texts is not the same as the set of characters allowed in XML. Specifically:
As plain unescaped characters, JSON allows any codepoint in the
numeric range 0x20 to 0x10FFFF, with the exception of
As a backslash-escaped character, JSON allows any codepoint in the numeric range 0x00 to 0xFFFF.
Whether escaped or not, the JSON grammar allows codepoints in the surrogate range to appear, and does not explicitly require that they be properly paired. However, the JSON specifications recognize that unpaired surrogates are likely to lead to interoperability problems.
Ignoring unpaired surrogates, this means that JSON allows codepoints that are not allowed by XML:
Not allowed by XML 1.0: 0x00 to 0x1F (other than 0x09, 0x0A, and 0x0D); 0xFFFE; 0xFFFF.
Not allowed by XML 1.1: 0x00; 0xFFFE; 0xFFFF.
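The two lists above can be checked mechanically. The following non-normative sketch encodes the Char productions of XML 1.0 and XML 1.1 as the author of this sketch understands them, and enumerates the non-surrogate codepoints up to 0xFFFF that each version rejects; the function names are hypothetical.

```python
def is_xml10_char(cp):
    """True if the codepoint matches the XML 1.0 Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp):
    """True if the codepoint matches the XML 1.1 Char production."""
    return (
        0x01 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_surrogate(cp):
    return 0xD800 <= cp <= 0xDFFF


# Codepoints expressible as a JSON backslash escape, ignoring surrogates.
json_escapable = [cp for cp in range(0x0000, 0x10000) if not is_surrogate(cp)]

rejected_by_xml10 = [cp for cp in json_escapable if not is_xml10_char(cp)]
rejected_by_xml11 = [cp for cp in json_escapable if not is_xml11_char(cp)]
```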
The XDM data model (see xs:string data type in such a way that any
Unicode codepoint assigned to a character (which excludes surrogates)
is allowed. However, this
is not required: a conformant implementation
In consequence, parsing of conformant JSON texts may fail if they contain codepoints that the implementation does not support. However, if such codepoints are represented in the input using JSON escape sequences, these specifications define mechanisms for dealing with them, for example by substituting a replacement character.
This section describes functions that parse CSV data.
A CSV is a 2-dimensional tabular data structure consisting of multiple
CSV has developed informally for decades, and many variations are found.
This specification refers to
This specification uses the term
Line endings are normalized: specifically, the character sequences
Row delimiters other than newline are recognized.
Field delimiters other than
Quote characters other than
Non-ASCII characters are recognized.
This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.
The most basic function for parsing CSV is xs:string.
The other two functions recognize column names, and make it easier to address
individual fields using these names. The parse-csv function
delivers this capability using XDM maps and functions, while the csv-to-xml
function represents the information using XDM element nodes.
The delimiters used for rows, columns, and quoting are configurable. An error
is raised if the same delimiter string is used in multiple roles
Rows in CSV files are typically delimited with CRLF (
The last row in the file may or may not be followed by a row delimiter. An empty file is treated as containing zero rows, while a file consisting solely of a row delimiter is treated as containing one empty row. In all other cases, a file that does not end with a row delimiter is treated as if a row delimiter were added at the end.
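The treatment of the final row delimiter can be sketched as follows. This is a non-normative illustration that deliberately ignores quoting and line-ending normalization; split_rows is a hypothetical helper name.

```python
def split_rows(text, row_delimiter="\n"):
    """Split CSV text into rows, ignoring quoting (a simplifying assumption)."""
    if text == "":
        return []  # the empty file contains zero rows
    if text.endswith(row_delimiter):
        # A final row delimiter terminates the last row; it does not start a new one.
        text = text[: -len(row_delimiter)]
    return text.split(row_delimiter)


empty_file = split_rows("")
only_delimiter = split_rows("\n")
without_final = split_rows("a,b\nc,d")
with_final = split_rows("a,b\nc,d\n")
```

The last two calls yield the same two rows, while a file consisting solely of a row delimiter yields a single empty row.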
Fields in CSV are frequently delimited with a comma. Other field
delimiters are useful, for
example when numeric data uses comma as a decimal separator. The
chosen field delimiter is then often
The column delimiter thus defaults to column-delimiter option is set to a multi-character string.
CSVs, as specified in
If a field is to contain the quote character, the character must be escaped by doubling it,
as with escaping of quotes in XPath string literals (see
The quotes surrounding quoted fields are not included in the result. The following input string, when parsed, produces a sequence of strings, as shown below:
The quote character defaults to
No space is allowed between the column delimiter and a quote. An error is raised
The following example is therefore invalid and parsing it will raise an error.
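The quoting rules can be sketched with a small field parser. This is a non-normative illustration under stated assumptions: the delimiter and quote are single characters, the input is one row with its row delimiter already removed, and the choice of ValueError and its messages is specific to this sketch rather than an error code defined by this specification.

```python
def parse_row(row, delimiter=",", quote='"'):
    """Split one CSV row into fields, handling quoted fields and doubled quotes."""
    fields = []
    i, n = 0, len(row)
    while True:
        if i < n and row[i] == quote:
            i += 1  # skip the opening quote
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if row[i] == quote:
                    if i + 1 < n and row[i + 1] == quote:
                        chars.append(quote)  # doubled quote: one literal quote
                        i += 2
                    else:
                        i += 1  # closing quote
                        break
                else:
                    chars.append(row[i])
                    i += 1
            if i < n and row[i] != delimiter:
                raise ValueError("unexpected text after closing quote")
            fields.append("".join(chars))
        else:
            j = row.find(delimiter, i)
            end = n if j == -1 else j
            field = row[i:end]
            if quote in field:
                # Covers a space between the delimiter and an opening quote.
                raise ValueError("quote inside unquoted field")
            fields.append(field)
            i = end
        if i >= n:
            return fields
        i += 1  # skip the delimiter


fields = parse_row('a,"b ""c"" d",e')
```

The surrounding quotes are dropped and each doubled quote is reduced to a single one, so the second field in fields reads b "c" d.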
The result of xs:string values.
The first row of the CSV is returned in the same way as all the other rows.
For example, given the input:
the
It is common practice for all rows in a CSV to have the same number of columns, but this is not required.
produces
While options parameter.
The
If no non-empty column names are available, then the columns
element and all column attributes are absent.
If non-empty column names are available for some columns but not for others,
then (a) the empty column element is included
within the columns element if and only if there is a subsequent
column with a non-empty name, and (b) the column attribute
for the corresponding field elements is absent.
For example (when no column names are available):
An XSD 1.0 schema for the XML representation is provided in
The following examples illustrate more complex applications making use of CSV parsing functions.
A variable $crlf is assumed to be in scope representing the CRLF string:
Direct conversion is a matter of iterating across the records and fields to
generate <tr> and <td> elements.
Using XQuery:
Using XSLT:
The
And in XSLT:
This section describes functions that support
Invisible XML defines a BNF-like language for specifying grammars, together with
a mapping from sentences in that grammar to an XML representation. By defining an
Invisible XML grammar, a great variety of non-XML data formats can be manipulated
as if they were XML. The function
The following functions allow dynamic loading and evaluation of XQuery queries, XSLT stylesheets, and XPath binary operators.
The functions in this section deliver information about schema types (including simple types and complex
types). These may represent built-in types (such as xs:dateTime),
user-defined types found in the static context (typically because they appear in an imported schema),
or types used as type annotations on schema-validated nodes.
For more information on schema types, see
The structured representation of a schema type is described in
Simple properties of a schema type that can be expressed as strings or booleans are
represented in this record structure directly as atomic field values, while complex properties
whose values are themselves types (for example, base-type and primitive-type)
are represented as functions. This is done partly to make it easier for implementations to compute
complex properties on demand rather than in advance, and partly to ensure that the overall
structure is always acyclic. For example, the primitive type of xs:decimal is itself
xs:decimal, and if this were represented as a field value without a guarding function,
serialization of the map using the JSON output method would not terminate.
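The role of the guarding function can be sketched as follows. This is a non-normative analogy: a Python dictionary stands in for the record, and the field names name, is_simple, and primitive_type are chosen for illustration only.

```python
def make_decimal_type():
    """Build a record-like description of xs:decimal."""
    record = {"name": "xs:decimal", "is_simple": True}
    # The primitive type of xs:decimal is xs:decimal itself. Holding it behind
    # a function keeps the data structure acyclic and defers the computation.
    record["primitive_type"] = lambda: record
    return record


decimal_type = make_decimal_type()
primitive = decimal_type["primitive_type"]()

# No field value is itself a record, so a walk over the plain data terminates.
nested_records = [v for v in decimal_type.values() if isinstance(v, dict)]
```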
The following functions are defined to obtain information from the static or dynamic context.
In this document, as well as in an error is raised
is used. Raising an error is equivalent to calling the fn:string(fn:abs#1). Host languages may allow type errors
to be reported statically if they are discovered during static analysis.
When function specifications indicate that an error is to be raised, the notation
[
is used to specify an error code. Each error defined
in this document is identified by an xs:QName that is in the
http://www.w3.org/2005/xqt-errors namespace, represented in this document by the err prefix. It is this
xs:QName that is actually passed as an argument to the
The xs:QName argument.
Constructor functions are used to convert a supplied value to a given type, and the name of the function is the same as the name of the target type. This section describes constructor functions corresponding to the following types:
Simple types (atomic types, union types, and list types as
defined in
These constructor functions always take a single argument.
Record types defined as
These take one argument for each named field of the record type.
Constructor functions for record types are defined in
Constructor functions are defined for all user-defined named simple types, and for most built-in atomic, list,
and union types. The only named simple types that have no constructor function are those that have no instances
other than instances of their derived types: specifically, xs:anySimpleType, xs:anyAtomicType,
and xs:NOTATION.
Every built-in atomic
type that is defined in xs:anyAtomicType and xs:NOTATION, has an
associated constructor function. The type xs:untypedAtomic, defined
in xs:yearMonthDuration and xs:dayTimeDuration defined
in xs:dateTimeStamp introduced in
A constructor function is not defined for xs:anyAtomicType as there are no atomic items with type annotation xs:anyAtomicType at runtime, although this can be a statically inferred type.
A constructor function is not defined for xs:NOTATION since it is defined as an abstract type in xs:NOTATION then a constructor function is defined for it.
See
The form of the constructor function for an atomic type
If $arg is the empty sequence, the empty sequence is returned. For
example, the signature of the constructor function corresponding to the
xs:unsignedInt type defined in
Calling the constructor function xs:unsignedInt(12) returns
the xs:unsignedInt value 12. Another call of that constructor
function that returns the same xs:unsignedInt value is
xs:unsignedInt("12").
The same result would also be returned if the
constructor function were to be called with a node that had a typed value equal
to the xs:unsignedInt 12.
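The behavior described for xs:unsignedInt can be sketched as follows. This is a non-normative analogy: None stands in for the empty sequence, a Python int stands in for the xs:unsignedInt value, the lexical check is a simplification, and atomization of nodes is not modeled.

```python
import re

UNSIGNED_INT_MAX = 4294967295


def unsigned_int(arg):
    """Sketch of the xs:unsignedInt constructor for integer and string input."""
    if arg is None:
        return None  # empty sequence in, empty sequence out
    if isinstance(arg, str):
        text = arg.strip(" \t\r\n")
        # Simplified lexical check (an assumption of this sketch).
        if not re.fullmatch(r"\+?[0-9]+", text):
            raise ValueError(f"not in the lexical space: {arg!r}")
        value = int(text)
    elif isinstance(arg, int) and not isinstance(arg, bool):
        value = arg
    else:
        raise TypeError(f"unsupported argument: {arg!r}")
    if not 0 <= value <= UNSIGNED_INT_MAX:
        raise ValueError(f"out of range: {value}")
    return value


from_integer = unsigned_int(12)
from_string = unsigned_int("12")
from_empty = unsigned_int(None)
```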
Because the declared parameter type for the argument is xs:anyAtomicType?,
the coercion rules will atomize the supplied argument
(see
If the value passed to a constructor function, after atomization, is not in the lexical space
of the datatype to be constructed,
and cannot be converted to a value in the value space of the datatype under the rules in
The semantics of the constructor function xs:TYPE(arg) are identical to the semantics of arg cast as xs:TYPE?. See
If the argument to a constructor function is a literal, the result of the
function
Special rules apply to constructor functions for xs:QName and types derived from xs:QName and xs:NOTATION. See
The argument is optional, and defaults to the context value (which will be atomized if necessary).
The following constructor functions for the built-in atomic types are supported:
Implementations xs:float("-0.0E0").
But because
Implementations xs:double("-0.0E0").
But because
See
See xs:ENTITY and types derived from it.
Special rules apply to constructor functions for the types xs:QName and xs:NOTATION, for two reasons:
Values cannot belong directly to the type xs:NOTATION, only to its subtypes.
The lexical representation of these types uses namespace prefixes, whose meaning is context-dependent.
These constraints result in the following rules:
There is no constructor function for xs:NOTATION. Constructors are defined, however, for xs:QName,
for types derived or constructed from xs:QName, and for types
derived or constructed from xs:NOTATION.
When converting from an xs:string, the prefix within the lexical
xs:QName supplied
as the argument is resolved to a namespace URI using the statically known
namespaces from the static context. If the lexical xs:QName
has no prefix, the
namespace URI of the resulting expanded-QName is the default namespace for elements and types,
taken from the static context. Components of the static context are
defined in
When a constructor function for a namespace-sensitive type is used as a literal function item
or in a partial function application (for example, xs:QName#1 or xs:QName(?)) the namespace
bindings that are relevant are those from the static context of the literal function item or partial function application.
When a constructor function for a namespace-sensitive type is obtained by means of the
When the supplied argument to the xs:QName constructor
function is a node, the node is atomized in the usual way, and if the result is xs:untypedAtomic it is then
converted as if a string had been supplied. The effect might not be what is desired.
For example, given the attribute xsi:type="my:type", the expression
xs:QName(@xsi:type) might fail on the grounds that the prefix my
is undeclared. This is because the namespace bindings are taken from the static context
(that is, from the query or stylesheet), and not from the source document containing the
@xsi:type attribute. The solution to this problem is to use the function call
resolve-QName(@xsi:type, .) instead.
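The difference between the two sets of namespace bindings can be sketched as follows. This is a non-normative analogy: dictionaries stand in for the statically known namespaces and for the in-scope namespaces of the element, and the URI http://example.com/my-types and the function name are hypothetical.

```python
def resolve_lexical_qname(lexical, namespaces, default_namespace=""):
    """Resolve a lexical QName to a (namespace URI, local name) pair."""
    prefix, separator, local = lexical.partition(":")
    if not separator:
        return (default_namespace, lexical)  # unprefixed name
    if prefix not in namespaces:
        raise LookupError(f"undeclared prefix: {prefix}")
    return (namespaces[prefix], local)


# Bindings from the query or stylesheet (the static context).
static_context = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Bindings in scope on the element that carries xsi:type="my:type".
element_in_scope = {
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "my": "http://example.com/my-types",
}

resolved = resolve_lexical_qname("my:type", element_in_scope)
```

Resolving the same string against static_context raises an error in this sketch, which corresponds to the failure described for xs:QName(@xsi:type).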
Each of the three built-in list
types defined in xs:NMTOKENS, xs:ENTITIES, and xs:IDREFS, has an
associated constructor function.
The function signatures are as follows:
The semantics are equivalent to casting to the corresponding types from xs:string.
All three of these types have the facet minLength = 1 meaning that there must
always be at least one item in the list. The return type, however, allows for the fact that when the argument to
the function is the empty sequence, the result is the empty sequence.
In the case of atomic types, it is possible to use an expression such as
xs:date(@date-of-birth) to convert an attribute value to an instance of xs:date,
knowing that this will work both in the case where the attribute is already annotated as xs:date,
and also in the case where it is xs:untypedAtomic. This approach does not work with list types,
because it is not permitted to use a value of type xs:NMTOKEN* as input to the constructor
function xs:NMTOKENS. Instead, it is necessary to use conditional logic that performs the conversion
only in the case where the input is untyped:
if (@x instance of attribute(*, xs:untypedAtomic)) then xs:NMTOKENS(@x) else data(@x)
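The same conditional logic can be sketched outside XPath. This is a non-normative analogy: a Python str stands in for an untyped value, a list of strings stands in for an already-typed xs:NMTOKEN* value, None stands in for the empty sequence, and the token pattern is a rough approximation of the NMTOKEN lexical rules.

```python
import re

# Approximation of the NMTOKEN lexical space (an assumption of this sketch).
_NMTOKEN = re.compile(r"[\w.\-:]+")


def nmtokens(arg):
    """Sketch of the xs:NMTOKENS constructor applied to a string."""
    if arg is None:
        return []  # empty sequence in, empty sequence out
    tokens = arg.split()
    if not tokens:
        raise ValueError("minLength = 1: at least one token is required")
    for token in tokens:
        if not _NMTOKEN.fullmatch(token):
            raise ValueError(f"not an NMTOKEN: {token!r}")
    return tokens


def as_nmtokens(value):
    """Convert only untyped input; pass already-typed lists through unchanged."""
    if isinstance(value, str):
        return nmtokens(value)
    return list(value)


from_untyped = as_nmtokens("red  green blue")
from_typed = as_nmtokens(["red", "green"])
```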
There is a constructor function for the union type xs:numeric
defined in
The semantics are determined by the rules in
If the argument is an instance of xs:double, xs:float, or xs:decimal,
then the result is an instance of the same primitive type, with the same value;
If the argument is an instance of xs:boolean, the result is the xs:double value
0.0e0 or 1.0e0;
If the argument is an instance of xs:string or xs:untypedAtomic, then:
If the value is in the lexical space of xs:double, the result will be the
corresponding xs:double value;
Otherwise, a dynamic error
The result will never be an instance of xs:float, xs:decimal,
or xs:integer. This is because xs:double appears first in the list of member
types of xs:numeric, and its lexical space subsumes the lexical space of the other numeric
types. Thus, unlike XPath numeric literals, the result does not depend on the lexical form of the supplied
value. The reason for this design choice is to retain compatibility with the function conversion rules:
functions such as xs:numeric as their first or only argument, and compatibility with the function conversion
rules defined in earlier versions of these specifications demands that when an untyped atomic item
(or untyped node) is supplied as the argument, it is converted to an xs:double value
even if its lexical form is that (say) of an integer.
In all other cases, a dynamic error
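The rules for the xs:numeric constructor can be sketched as follows. This is a non-normative analogy with several assumptions: Python float stands in for xs:double, Decimal for xs:decimal, int for xs:integer, None for the empty sequence, xs:float is not modeled, and the regular expression is this sketch's reading of the xs:double lexical space.

```python
import re
from decimal import Decimal

_DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|[+-]?INF|NaN"
)


def numeric(arg):
    """Sketch of the xs:numeric constructor."""
    if arg is None:
        return None
    if isinstance(arg, bool):
        return 1.0 if arg else 0.0  # booleans become doubles
    if isinstance(arg, (float, Decimal, int)):
        return arg  # already numeric: same type, same value
    if isinstance(arg, str):
        text = arg.strip(" \t\r\n")
        if _DOUBLE.fullmatch(text):
            return float(text)  # strings always become doubles
        raise ValueError(f"not in the lexical space of xs:double: {arg!r}")
    raise TypeError(f"cannot convert {arg!r} to a numeric value")


from_string = numeric("12")
from_boolean = numeric(True)
from_decimal = numeric(Decimal("1.5"))
```

The value from_string is a double even though the string has the lexical form of an integer, which is the point made in the rules above.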
In the case of an implementation that supports XSD 1.1, there is a constructor function
associated with the built-in union type xs:error.
The function signature is as follows:
The semantics are equivalent to casting to the corresponding union type (see
Because xs:error has no member types, and therefore has an empty value space, casting
will always fail with a dynamic error except in the case where the supplied argument is the empty
sequence, in which case the result is also the empty sequence.
For every
For named atomic types, the rules
are the same as the rules for constructing built-in derived atomic types defined in T,
the signature of the function takes the form T($value as xs:anyAtomicType? := .) as T?,
and the semantics are the same as casting to derived types: see
For named union types, the rules
follow the same principles as the rules for constructing built-in union types defined in U,
the signature of the function takes the form U($value as xs:anyAtomicType? := .) as U?,
and the semantics are the same as casting to union types: see
For named list types, the rules
follow the same principles as the rules for constructing built-in list types defined in L,
where the item type of L is I,
the signature of the function takes the form L($value as xs:string? := .) as I*,
and the semantics are the same as casting to list types: see
Constructor functions are available both for named types defined in an imported schema (that is,
named simple types in the xs:string, and named local union types follow the same rules as
union types defined in a schema.
Special rules apply to constructor functions for namespace-sensitive types, that is,
atomic types derived from xs:QName and xs:NOTATION, list types that have
a namespace-sensitive item type, and union types that have a namespace-sensitive member type. See
Consider a situation where the static context contains an atomic type
called hatSize defined in a schema whose target namespace is bound
to the prefix eg. In such a case the following constructor function is available to users:
The resulting function may be used in an expression such as eg:hatSize("10½").
In the case of an atomic type A, the return type of the function is A?, reflecting
the fact that the result will be the empty sequence if the input is the empty sequence. For a union or list type,
the return type of the function is specified only as xs:anyAtomicType*. Implementations performing
static type checking will often be able to compute a more specific result type. For example, if the target type
is a list type whose item type is the atomic type A, the result will always be an instance of A*;
if the target type is a pure union type U then the result will always be an instance of U?.
In general, however, applications needing interoperable behavior on implementations that do strict static type
checking will need to use a treat as expression to assert the specific type of the result.
To construct an instance of a user-defined type
that is not in a namespace, it is possible to use an
EQName (for example Q{}hatsize(17)). Alternatives are
to use a cast expression (17 cast as hatsize) or (if the host language allows it)
to undeclare the default function namespace.
Both XQuery 4.0 and XSLT 4.0 provide syntax to declare named record types;
such a declaration implicitly adds a constructor function for values of that
type to the (See
For example, if there is a named item type with the XQuery definition:
then there will be a function definition equivalent to:
Equivalently using XSLT syntax, if there is a named item type with the XSLT definition:
then there will be a function definition equivalent to:
The rules defining the relationship of the function definition to the
record type are given for XQuery 4.0 in
Constructor functions and cast expressions accept an expression and return a value of a given type. Both convert a source value SV of a source type ST to a target value TV of the given target type TT.
Constructor functions and cast expressions have identical semantics
but different syntax. The name of the
constructor function is the same as the name of the built-in xs:date("2003-01-01")
means exactly the same as
"2003-01-01" cast as xs:date?.
The cast expression takes a type name to indicate the target type of the conversion.
See
Where the argument to a cast is a literal, the result of the function
The general rules for casting from primitive types to primitive types are defined in
xs:string (and xs:untypedAtomic)
follow in
Casting is not supported to or from xs:anySimpleType.
Casting to xs:anySimpleType is not permitted and raises a static error:
Similarly, casting is not supported to or from xs:anyAtomicType and will raise
a static error: xs:anyAtomicType, although this can be a
statically inferred type.
xs:untypedAtomic. The three types xs:integer,
xs:dayTimeDuration, and xs:yearMonthDuration, which have custom
casting rules but are not, strictly speaking, primitive, are now handled in other subsections.
This section defines casting between xs:untypedAtomic.
The type conversions
that are supported between primitive atomic types are indicated in the table below;
casts between other (non-primitive) types are defined in terms of these primitives.
Where the target type TT is a primitive type, the result TV will always
be an instance of TT. The result xs:NCName SV
to xs:string
In this table, there is a
row for each
Y indicates that a conversion from values of the type to which
the row applies to the type to which the column applies is supported;
N indicates that there are no supported conversions from values
of the type to which the row applies to the type to which the column applies;
M indicates that a conversion from values of the type to
which the row applies to the type to which the column applies may succeed for
some values in the value space and fail for others.
There is no row or column for xs:untypedAtomic because the casting rules are exactly the same
as for xs:string. When casting from xs:string or xs:untypedAtomic
the semantics in
xs:NOTATION as an abstract type.
Thus, casting to xs:NOTATION from any other type including xs:NOTATION
is not permitted and raises a static error xs:NOTATION to another subtype of
xs:NOTATION is permitted.
Casting is not supported to or from xs:anySimpleType. Thus, there is no row
or column for this type in the table below. For any node that has not been validated or
has been validated as xs:anySimpleType, the typed value of the node is an
atomic item of type xs:untypedAtomic. There are no atomic items with the
type annotation xs:anySimpleType at runtime.
Casting to
xs:anySimpleType is not permitted and raises a static error:
Similarly, casting is not supported to or from xs:anyAtomicType and will raise
a static error: xs:anyAtomicType at runtime, although this can be a
statically inferred type.
If casting is attempted from an
In the following table, the columns and rows are identified by short codes that identify simple types as follows:
In the following table, the notation S\T indicates that the source (S) of the conversion is indicated in the column below the notation and that the target (T) is indicated in the row to the right of the notation.
| S\T | str | flt | dbl | dec | dur | dT | tim | dat | gYM | gYr | gMD | gDay | gMon | bool | b64 | hxB | aURI | QN | NOT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | Y | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M |
| flt | Y | Y | Y | M | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
| dbl | Y | Y | Y | M | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
| dec | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
| dur | Y | N | N | N | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N |
| dT | Y | N | N | N | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N |
| tim | Y | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N | N | N | N |
| dat | Y | N | N | N | N | Y | N | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N |
| gYM | Y | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N | N |
| gYr | Y | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N |
| gMD | Y | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N |
| gDay | Y | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N |
| gMon | Y | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N |
| bool | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
| b64 | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | Y | N | N | N |
| hxB | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | Y | N | N | N |
| aURI | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | N | N |
| QN | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | M |
| NOT | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | M |
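The table can be read mechanically. The following Python sketch is illustrative only and is not part of this specification: it transcribes the rows above into strings and looks up the mark for a source and target code. The names `CODES`, `ROWS`, and `cast_status` are assumptions introduced for the example.

```python
# Column order of the table above (targets), also used as row keys (sources).
CODES = ("str flt dbl dec dur dT tim dat gYM gYr gMD gDay gMon "
         "bool b64 hxB aURI QN NOT").split()

# One string per source row; character i is the mark for target CODES[i].
ROWS = {
    "str":  "Y" + "M" * 18,
    "flt":  "YYYM" + "N" * 9 + "Y" + "N" * 5,
    "dbl":  "YYYM" + "N" * 9 + "Y" + "N" * 5,
    "dec":  "YYYY" + "N" * 9 + "Y" + "N" * 5,
    "dur":  "YNNNY" + "N" * 14,
    "dT":   "YNNNN" + "Y" * 8 + "N" * 6,
    "tim":  "YNNNNN" + "Y" + "N" * 12,
    "dat":  "YNNNN" + "YN" + "Y" * 6 + "N" * 6,
    "gYM":  "Y" + "N" * 7 + "Y" + "N" * 10,
    "gYr":  "Y" + "N" * 8 + "Y" + "N" * 9,
    "gMD":  "Y" + "N" * 9 + "Y" + "N" * 8,
    "gDay": "Y" + "N" * 10 + "Y" + "N" * 7,
    "gMon": "Y" + "N" * 11 + "Y" + "N" * 6,
    "bool": "YYYY" + "N" * 9 + "Y" + "N" * 5,
    "b64":  "Y" + "N" * 13 + "YY" + "N" * 3,
    "hxB":  "Y" + "N" * 13 + "YY" + "N" * 3,
    "aURI": "Y" + "N" * 15 + "Y" + "N" * 2,
    "QN":   "Y" + "N" * 16 + "YM",
    "NOT":  "Y" + "N" * 16 + "YM",
}

# Guard against transcription slips: every row needs one mark per column.
assert all(len(marks) == len(CODES) for marks in ROWS.values())


def cast_status(source: str, target: str) -> str:
    """Return the table mark (Y, M, or N) for casting source to target."""
    return ROWS[source][CODES.index(target)]
```

For instance, `cast_status("dT", "dat")` gives `Y`, while `cast_status("dat", "tim")` gives `N`.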
xs:untypedAtomic
Any atomic item SV can be cast to xs:untypedAtomic.
The effect is the same as casting to xs:string and then converting the result to an xs:untypedAtomic value comprising the same sequence of
characters.
xs:string
Any atomic item SV can be cast to xs:string.
The resulting xs:string value TV depends on
the source type ST as follows.
If SV is an instance of xs:string,
TV is an instance of xs:string comprising
the same sequence of characters as SV.
The implementation is free to return SV unchanged, including its original type annotation.
If SV is an instance of xs:anyURI, the result TV is an instance
of xs:string comprising the same sequence of characters
as SV, but with a type annotation of xs:string.
No escaping of special characters takes place.
If SV is an instance of xs:QName or xs:NOTATION:
if the qualified name has a prefix, then TV is the concatenation of the prefix of SV, a single colon (:), and the local name of SV.
otherwise TV is the local name of SV.
If SV is an instance of xs:numeric,
the rules in
If SV is an instance of xs:dateTime, xs:date
or xs:time, the rules in
If ST is xs:duration, or any subtype thereof including
xs:yearMonthDuration and xs:dayTimeDuration, then the rules
in
In all other cases, TV is the
To cast as xs:untypedAtomic the value is cast as
xs:string, as described above, and the type annotation changed
to xs:untypedAtomic.
xs:string
The following rules apply when the source type ST is xs:decimal,
xs:double, or xs:float, or any subtype of these including
xs:integer.
If SV is an instance of xs:decimal,
then the canonical representation of
SV is returned, as defined in
Unlike previous versions of this specification, no special
rule is given for the case where SV is an instance of
xs:integer. This is because the general rule for
xs:decimal gives the same result. The result
in this case will be a sequence of decimal digits, preceded by a minus sign if the value is negative, for example
42, -1, 0, or 1000000000.
An xs:decimal that is equal to an integer is converted
to a string as if it were first cast to an xs:integer.
Specifically, there will be no decimal point and no fractional part.
If the value is not equal to an integer, then there will be a decimal
point and a fractional part, which will be a sequence of decimal digits
with no trailing zeroes. For example: 42.3, -1.5,
or 0.00001.
If SV is an instance of xs:float or
xs:double, then:
TV will be an xs:string in the lexical space
of xs:double or xs:float that, when
converted back to an xs:double or xs:float,
produces a value equal to SV, or is NaN
if SV is NaN.
In addition, TV must satisfy the constraints in the
following sub-bullets.
If SV has an absolute value that is
greater than or equal to 0.000001 (one millionth)
and less than 1000000 (one million), then the value
is converted to an xs:decimal and the
resulting xs:decimal is converted to an
xs:string according to the rules above, as though using an
implementation of xs:decimal that imposes no limits on the
totalDigits or
fractionDigits facets.
If SV has the value positive or negative zero, TV
is "0" or "-0" respectively.
If SV is positive or negative infinity,
TV is the string "INF" or "-INF" respectively.
In other cases, the result consists of a mantissa, which has the lexical form
of an xs:decimal, followed by the letter "E", followed by an exponent which has
the lexical form of an xs:integer. Leading zeroes and "+" signs are prohibited
in the exponent. For the mantissa, there must be a decimal point, and there must
be exactly one digit before the decimal point, which must be non-zero. The "+"
sign is prohibited. There must be at least one digit after the decimal point.
Apart from this mandatory digit, trailing zero digits are prohibited.
The above rules allow more than one representation of the same value.
For example, the xs:float value whose exact decimal representation is 1.26743223E15
might be represented by any of the strings "1.26743223E15",
"1.26743222E15" or "1.26743224E15" (inter alia).
It is implementation-dependent which of these representations is chosen.
The string representations of numeric values are backwards compatible
with XPath 1.0 except for the special values positive and negative
infinity, negative zero and values outside the range 1.0e-6 to 1.0e+6.
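The numeric-to-string rules above can be sketched in Python. This is an illustrative model only, under stated assumptions: Python's `Decimal` stands in for xs:decimal, a Python `float` stands in for xs:double (xs:float is not modelled), and `repr` is used to pick one of the permitted digit sequences (the shortest one that round-trips). The function names are hypothetical.

```python
import math
from decimal import Decimal


def decimal_to_string(value: Decimal) -> str:
    """No decimal point for whole numbers, no trailing zeroes otherwise."""
    text = format(value, "f")                 # fixed notation, never exponent form
    if "." in text:
        text = text.rstrip("0").rstrip(".")   # drop trailing zeroes, then a bare point
    return "0" if text == "-0" else text


def double_to_string(value: float) -> str:
    """Model of the xs:double rules, using Python floats as doubles."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "INF" if value > 0 else "-INF"
    if value == 0:
        return "-0" if math.copysign(1.0, value) < 0 else "0"
    digits = Decimal(repr(value))             # assumption: shortest round-trip digits
    if 1e-6 <= abs(value) < 1e6:
        return decimal_to_string(digits)      # plain decimal form in this range
    sign, digit_tuple, exponent = digits.as_tuple()
    all_digits = "".join(str(d) for d in digit_tuple)
    exp10 = exponent + len(all_digits) - 1    # exponent with one digit before the point
    fraction = all_digits[1:].rstrip("0") or "0"   # at least one digit after the point
    return f"{'-' if sign else ''}{all_digits[0]}.{fraction}E{exp10}"
```

Because the rules permit more than one representation for some values, a conforming processor may legitimately return a different string from this sketch.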
xs:string
If SV is an instance of xs:dateTime,
xs:date, xs:time, xs:gYear,
xs:gYearMonth, xs:gMonth, xs:gMonthDay, or xs:gDay,
then TV is the
canonical representation of SV as defined in
The result TV includes the original timezone if a timezone is present.
All these data types contain different combinations of the components year, month, day, hour, minute, second, and timezone; all the components relevant to the data type (with the exception of the timezone) are output, and the results are concatenated together with suitable punctuation. Specifically:
The year component is
represented as an xs:string of four digits, or more if needed. A leading minus
sign is present for BCE years.
The month, day, hour and minute
components are represented as two digits (with a leading zero if needed).
For example, February is represented as 02.
The hours component will never be "24": midnight
is always represented as "00:00:00".
The second component is output as a two-digit integer
if it is a whole number (for example, 30, 05, or 00),
or if it is fractional, as two digits followed by a decimal point followed by as many digits as
are necessary, with no trailing zeroes (for example 30.5 or 00.001).
The timezone component, if present, is
cast to xs:string by applying the function eg:convertTZtoString,
giving for example Z, +01:00,
-05:00, or +05:30.
xs:duration values to xs:string
If SV is an instance of xs:duration (including its subtypes
xs:yearMonthDuration and xs:dayTimeDuration), then TV is the
canonical representation of SV as defined in
The rules have the effect of normalizing the value so that the number of months is always
less than 12, the number of hours less than 24, and the number of minutes and seconds less
than 60. Zero-valued components are omitted. Fractional seconds follow the same rules
as xs:decimal. For example, the duration P15MT30H
is represented as P1Y3M1DT6H. A zero-length duration is output as PT0S.
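The normalization described above can be sketched as follows. This Python fragment is illustrative only: it assumes a duration is held as a pair of a months count and a seconds count with the same sign, and it models xs:duration itself rather than its subtypes. The function name `duration_to_string` is hypothetical.

```python
from decimal import Decimal


def duration_to_string(months: int, seconds: Decimal) -> str:
    """Normalized string for a duration held as (months, seconds)."""
    negative = months < 0 or seconds < 0
    years, mon = divmod(abs(months), 12)            # months always below 12
    days, rest = divmod(abs(Decimal(seconds)), 86400)
    hours, rest = divmod(rest, 3600)                # hours always below 24
    minutes, secs = divmod(rest, 60)                # minutes and seconds below 60

    # Zero-valued components are omitted.
    date_part = ((f"{years}Y" if years else "")
                 + (f"{mon}M" if mon else "")
                 + (f"{int(days)}D" if days else ""))
    time_part = ((f"{int(hours)}H" if hours else "")
                 + (f"{int(minutes)}M" if minutes else ""))
    if secs:
        # Fractional seconds: no trailing zeroes, as for xs:decimal.
        time_part += format(secs.normalize(), "f") + "S"

    if not date_part and not time_part:
        return "PT0S"                               # zero-length duration
    return (("-" if negative else "") + "P" + date_part
            + ("T" + time_part if time_part else ""))
```

With this sketch, 15 months and 30 hours (the value of P15MT30H) comes out as P1Y3M1DT6H, matching the example in the text.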
At the time of writing, the published XSD 1.1 recommendation contains
cut-and-paste errors in the definition
of the dayTimeDuration canonical mapping. The binding of variable s
should be to dt's ·seconds· (not ·months·) component, and the return
expression given as sgn & 'P' & ·duYearMonthCanonicalFragmentMap·(|s|)
should read sgn & 'P' & ·duDayTimeCanonicalFragmentMap·(|s|)
In reading these XSD formulations, be aware that a & b represents
string concatenation, while |s| computes the absolute value of a number.
This section defines the rules for casting to the primitive numeric types xs:float,
xs:double, and xs:decimal. Rules for casting to the derived type
xs:integer are given in
When a value of any simple type is cast as xs:float, the xs:float
TV is derived from the ST and the
SV as follows:
If ST is xs:float, then TV
is SV and the conversion is complete.
If ST is xs:double, then
TV is obtained as follows:
if SV is the xs:double value
INF, -INF, NaN,
positive zero, or negative zero, then TV is
the xs:float value INF,
-INF, NaN, positive zero, or
negative zero respectively.
otherwise, SV can be expressed in the form
m × 2^e where the mantissa
m and exponent e are signed
xs:integers whose value range is defined in
if m (the mantissa of
SV) is outside the permitted range
for the mantissa of an xs:float
value (-(2^24 - 1) to +(2^24 - 1)), then it
is divided by 2^N where
N is the lowest positive
xs:integer that brings the result
of the division within the permitted range, and
the exponent e is increased by
N. This is integer division (in
effect, the binary value of the mantissa is
truncated on the right). Let M be
the mantissa and E the exponent
after this adjustment.
if E exceeds 104 (the
maximum exponent value in the value space of
xs:float) then TV is
the xs:float value INF
or -INF depending on the sign of M.
if E is less than -149
(the minimum exponent value in the value space
of xs:float) then TV is
the xs:float value positive or
negative zero depending on the sign of M
otherwise, TV is the
xs:float value M × 2^E.
If ST is xs:decimal, or
xs:integer, then TV is xs:float(
SV
cast as xs:string) and the conversion is complete.
If ST is xs:boolean, SV is
converted to 1.0E0 if SV is
true and to 0.0E0 if SV
is false and the conversion is complete.
If ST is xs:untypedAtomic
or xs:string, see
XSD 1.1 adds the value +INF to the lexical space,
as an alternative to INF. XSD 1.1 also adds negative zero
to the value space.
Implementations xs:float("-0.0E0").
But because
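The xs:double to xs:float rule given above (truncate the mantissa, then test the exponent) can be sketched in Python. This is an illustrative, literal reading of the text and not a definitive implementation: it assumes the mantissa of the source value is taken as a full 53-bit integer, it holds the resulting xs:float value in a Python float, and as a consequence it neither rounds to nearest nor produces subnormal xs:float values, both of which an IEEE 754 conversion would do. The name `narrow_double_to_float` is hypothetical.

```python
import math

FLOAT_MANTISSA_MAX = 2**24 - 1          # largest mantissa magnitude for xs:float
FLOAT_EXP_MAX, FLOAT_EXP_MIN = 104, -149


def narrow_double_to_float(sv: float) -> float:
    """Apply the mantissa-truncation rule to an xs:double value."""
    if math.isnan(sv) or math.isinf(sv) or sv == 0.0:
        return sv                        # special values map to themselves
    fraction, exp2 = math.frexp(sv)      # sv == fraction * 2**exp2, 0.5 <= |fraction| < 1
    m = int(fraction * 2**53)            # exact: a double has at most 53 significant bits
    e = exp2 - 53                        # so that sv == m * 2**e
    sign = -1 if m < 0 else 1
    magnitude = abs(m)
    while magnitude > FLOAT_MANTISSA_MAX:
        magnitude >>= 1                  # integer division by 2, truncating on the right
        e += 1
    if e > FLOAT_EXP_MAX:
        return math.copysign(math.inf, sign)
    if e < FLOAT_EXP_MIN:
        return math.copysign(0.0, sign)
    return math.ldexp(sign * magnitude, e)
```

For example, the double nearest to 0.1 is narrowed to 13421772 × 2^-27, one unit in the last place below the value that round-to-nearest would select.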
When a value of any simple type is cast as xs:double, the
xs:double value TV is derived from the
ST and the SV as follows:
If ST is xs:double, then
TV is SV and the conversion is complete.
If ST is xs:float or a type derived
from xs:float, then TV is obtained as follows:
if SV is the xs:float value
INF, -INF, NaN,
positive zero, or negative zero, then TV is
the xs:double value INF,
-INF, NaN, positive zero, or
negative zero respectively.
otherwise, SV can be expressed in the form
m × 2^e where the
mantissa m and exponent e are
signed xs:integer values whose value range
is defined in xs:double value
m × 2^e.
If ST is xs:decimal or
xs:integer, then TV is xs:double(
SV
cast as xs:string) and the conversion is complete.
If ST is xs:boolean, SV is
converted to 1.0E0 if SV is
true and to 0.0E0 if SV
is false and the conversion is complete.
If ST is xs:untypedAtomic
or xs:string, see
XSD 1.1 adds the value +INF to the lexical space,
as an alternative to INF. XSD 1.1 also adds negative zero
to the value space.
Implementations xs:double("-0.0E0").
But because
This section defines the rules for casting to the primitive type xs:decimal.
The rules are also invoked implicitly as part of the process of converting to types
derived from xs:decimal. There are special rules, however, if the
target type TT is xs:integer, or a type derived from
xs:integer: those rules are given in
When the target type TT is xs:decimal, the
resulting xs:decimal value TV is derived from
ST and SV as follows:
If ST is xs:decimal or a subtype thereof
(including xs:integer), then
the result TV has the same datum as SV, and its type annotation is xs:decimal or any
subtype of xs:decimal for which this is a valid instance, including
the original type ST.
If ST is xs:float or
xs:double, then TV is the
xs:decimal value, within the set of
xs:decimal values that the implementation is
capable of representing, that is numerically closest to
SV. If two values are equally close, then the one
that is closest to zero is chosen. If SV is too
large to be accommodated as an xs:decimal, a dynamic error is raised.
If SV is one of the special xs:float or xs:double values
NaN, INF, or -INF, a dynamic
error is raised.
If ST is xs:boolean, the result TV is
1.0 if SV is
1 or true, and 0.0 if
SV is 0 or false.
The type annotation of the result may be any subtype of xs:decimal
whose value space includes the integer values 0 and 1.
If ST is xs:untypedAtomic
or xs:string, see
This section defines the rules for casting to the primitive duration type xs:duration.
Rules for casting to the derived types
xs:yearMonthDuration and xs:dayTimeDuration
are given in
If the source value SV is an instance of xs:duration
(including instances of subtypes such as xs:yearMonthDuration
and xs:dayTimeDuration), then the datum of the result
TV is the same as the datum of SV, and the
type annotation is xs:duration or any subtype thereof that
includes this datum in its value space (in particular, it
If ST is xs:untypedAtomic
or xs:string, see
In several situations, casting to date and time types requires the extraction
of a component from SV or from the result of
xs:string. These conversions must follow certain rules. For
example, converting an xs:integer year value requires
converting to an xs:string with four or more characters, preceded
by a minus sign if the value is negative.
This document defines four functions to perform these conversions. These functions are for illustrative purposes only and make no recommendations as to style or efficiency. References to these functions from the following text are not normative.
The arguments to these functions come from functions defined in this document. Thus, the functions below assume that they are correct and do no range checking on them.
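As a rough guide to what such helper functions do, the following Python analogues are offered. They are illustrative only and are not the definitions used by this specification: the Python names are hypothetical, a timezone is assumed to be supplied as an offset in minutes (or `None` when absent), and, as in the text, no range checking is performed.

```python
from decimal import Decimal
from typing import Optional


def convert_year_to_string(year: int) -> str:
    """Four or more digits, preceded by a minus sign if negative."""
    return f"{'-' if year < 0 else ''}{abs(year):04d}"


def convert_to_2char_string(value: int) -> str:
    """Two digits with a leading zero if needed (month, day, hour, minute)."""
    return f"{value:02d}"


def convert_seconds_to_string(seconds: Decimal) -> str:
    """Two integer digits, plus a fraction without trailing zeroes."""
    whole = int(seconds)
    fraction = seconds - whole
    text = f"{whole:02d}"
    if fraction:
        text += format(fraction.normalize(), "f")[1:]   # drop the leading "0"
    return text


def convert_tz_to_string(tz_minutes: Optional[int]) -> str:
    """Empty when absent, Z for UTC, otherwise a signed hh:mm offset."""
    if tz_minutes is None:
        return ""
    if tz_minutes == 0:
        return "Z"
    hours, minutes = divmod(abs(tz_minutes), 60)
    return f"{'-' if tz_minutes < 0 else '+'}{hours:02d}:{minutes:02d}"


def date_to_datetime_lexical(year: int, month: int, day: int,
                             tz_minutes: Optional[int]) -> str:
    """Lexical form built when an xs:date is cast to xs:dateTime."""
    return (convert_year_to_string(year) + "-"
            + convert_to_2char_string(month) + "-"
            + convert_to_2char_string(day) + "T00:00:00"
            + convert_tz_to_string(tz_minutes))
```

The last function shows how the pieces are concatenated for one conversion; the other date and time casts combine the same helpers with different punctuation.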
Conversion from
When a value of any primitive type is cast as
xs:dateTime, the xs:dateTime value
TV is derived from ST and SV
as follows:
If ST is xs:dateTime, then
TV is SV.
If ST is xs:date, then let SYR be
eg:convertYearToString( year-from-date( SV )), let SMO be eg:convertTo2CharString( month-from-date( SV )), let SDA be eg:convertTo2CharString( day-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:dateTime( concat( SYR, '-', SMO, '-', SDA, 'T00:00:00', STZ ) ).
If ST is xs:untypedAtomic or
xs:string, see
When a value of any primitive type is cast as xs:time,
the xs:time value TV is derived from
ST and SV as follows:
If ST is xs:time, then
TV is SV.
If ST is xs:dateTime, then
TV is xs:time( concat(
eg:convertTo2CharString( hours-from-dateTime(
SV
)), ':', eg:convertTo2CharString( minutes-from-dateTime(
SV
)), ':', eg:convertSecondsToString( seconds-from-dateTime(
SV
)), eg:convertTZtoString( timezone-from-dateTime(
SV
)) )).
If ST is xs:untypedAtomic
or xs:string, see
When a value of any primitive type is cast as xs:date,
the xs:date value TV is derived from
ST and SV as follows:
If ST is xs:date, then
TV is SV.
If ST is xs:dateTime, then let SYR be
eg:convertYearToString( year-from-dateTime( SV )), let SMO be eg:convertTo2CharString( month-from-dateTime( SV )), let SDA be eg:convertTo2CharString( day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:date( concat( SYR, '-', SMO, '-', SDA, STZ ) ).
If ST is xs:untypedAtomic
or xs:string, see
When a value of any primitive type is cast as
xs:gYearMonth, the xs:gYearMonth value
TV is derived from ST and SV
as follows:
If ST is xs:gYearMonth, then
TV is SV.
If ST is xs:dateTime, then let SYR be
eg:convertYearToString( year-from-dateTime( SV )), let SMO be eg:convertTo2CharString( month-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gYearMonth( concat( SYR, '-', SMO, STZ ) ).
If ST is xs:date, then let SYR be
eg:convertYearToString( year-from-date( SV )), let SMO be eg:convertTo2CharString( month-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gYearMonth( concat( SYR, '-', SMO, STZ ) ).
If ST is xs:untypedAtomic
or xs:string, see
When a value of any primitive type is cast as xs:gYear,
the xs:gYear value TV is derived from
ST and SV as follows:
If ST is xs:gYear, then
TV is SV.
If ST is xs:dateTime, let SYR be
eg:convertYearToString( year-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gYear( concat( SYR, STZ ) ).
If ST is xs:date, let SYR be
eg:convertYearToString( year-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gYear( concat( SYR, STZ ) ).
If ST is xs:untypedAtomic
or xs:string, see
When a value of any primitive type is cast as
xs:gMonthDay, the xs:gMonthDay value
TV is derived from ST and SV
as follows:
If ST is xs:gMonthDay, then
TV is SV.
If ST is xs:dateTime, then let SMO be
eg:convertTo2CharString( month-from-dateTime( SV )), let SDA be eg:convertTo2CharString( day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gMonthDay( concat( '--', SMO, '-', SDA, STZ ) ).
If ST is xs:date, then let SMO be
eg:convertTo2CharString( month-from-date( SV )), let SDA be eg:convertTo2CharString( day-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gMonthDay( concat( '--', SMO, '-', SDA, STZ ) ).
If ST is xs:untypedAtomic
or xs:string, see
When a value of any primitive type is cast as xs:gDay,
the xs:gDay value TV is derived from
ST and SV as follows:
If ST is xs:gDay, then
TV is SV.
If ST is xs:dateTime, then let SDA be
eg:convertTo2CharString( day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gDay( concat( '---', SDA, STZ ) ).
If ST is xs:date, then let SDA be
eg:convertTo2CharString( day-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gDay( concat( '---', SDA, STZ ) ).
If ST is xs:untypedAtomic
or xs:string, see
When a value of any primitive type is cast as xs:gMonth,
the xs:gMonth value TV is derived from
ST and SV as follows:
If ST is xs:gMonth, then
TV is SV.
If ST is xs:dateTime, then let SMO be
eg:convertTo2CharString( month-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gMonth( concat( '--', SMO, STZ ) ).
If ST is xs:date, then let SMO be
eg:convertTo2CharString( month-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gMonth( concat( '--', SMO, STZ ) ).
If ST is xs:untypedAtomic
or xs:string, see
xs:boolean
When the target type TT is xs:boolean, the
resulting xs:boolean value TV is derived from
the source value SV as follows:
If SV is an instance of xs:boolean, then TV
is SV.
If SV is an instance of xs:numeric and
SV is 0, +0, -0,
0.0, 0.0E0 or NaN, then
TV is false.
If SV is an instance of xs:numeric and
SV is not one of the above values, then TV
is true.
If ST is xs:untypedAtomic
or xs:string, see
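The numeric cases of the boolean rule reduce to a single test. A minimal Python sketch, assuming Python numbers stand in for instances of xs:numeric (the function name is hypothetical):

```python
import math


def numeric_to_boolean(value) -> bool:
    """False for any zero (positive or negative) and for NaN, otherwise true."""
    if isinstance(value, float) and math.isnan(value):
        return False
    return value != 0
```

Negative zero compares equal to zero, so it needs no separate branch.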
xs:base64Binary and xs:hexBinary
Values of type xs:base64Binary can be cast as
xs:hexBinary and vice versa, since the two types have the same
value space. Casting to xs:base64Binary and
xs:hexBinary is also supported from the same type and from
xs:untypedAtomic, xs:string and subtypes of
xs:string using
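Because the two binary types share one value space, converting between them amounts to decoding one lexical form to octets and encoding those octets in the other. The following Python sketch is illustrative only; the function names are hypothetical, and the use of uppercase hexadecimal digits in the output is an assumption about the preferred lexical form.

```python
import base64


def base64_to_hex(lexical: str) -> str:
    """Decode a base64 lexical form and re-encode the octets as hex."""
    octets = base64.b64decode(lexical, validate=True)
    return octets.hex().upper()


def hex_to_base64(lexical: str) -> str:
    """Decode a hex lexical form and re-encode the octets as base64."""
    octets = bytes.fromhex(lexical)
    return base64.b64encode(octets).decode("ascii")
```

A round trip through either function returns the same octets, which is what makes the cast lossless in both directions.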
Casting to xs:anyURI is supported only from the same type,
xs:untypedAtomic or xs:string.
When a value of any simple type is cast as xs:anyURI, the
xs:anyURI value TV is derived from the
ST and SV as follows:
If ST is xs:untypedAtomic or xs:string see
Casting from xs:string or xs:untypedAtomic to
xs:QName or xs:NOTATION is described in
It is also possible to cast from xs:NOTATION to xs:QName,
or from xs:QName to
any type derived by restriction from xs:NOTATION. (Casting to xs:NOTATION
itself is not allowed, because xs:NOTATION is an abstract type.) The resulting
xs:QName or xs:NOTATION has the same prefix, local name, and namespace URI
parts as the supplied value.
See
The
value space of ENTITY is the set of all strings that match the
NCName production ... and have been
declared as an unparsed entity in a document type definition.
However,
xs:ENTITY match declared unparsed entities. Thus, this rule is relaxed in this specification and, in casting to xs:ENTITY and types derived from it, no check is made that the values correspond to declared unparsed entities.
This section applies when the supplied value SV
is an instance of xs:string or xs:untypedAtomic,
including types derived from these by restriction. If the value is
xs:untypedAtomic, it is treated in exactly the same way as a
string containing the same sequence of characters.
The supplied string is mapped to a typed value of the target type as defined by XML Schema, after whitespace normalization as indicated by the whiteSpace facet for the datatype. The resulting whitespace-normalized string
must be a valid lexical form for the datatype. The semantics of casting follow the rules of
XML Schema validation. For example, "13" cast as xs:unsignedInt returns
the xs:unsignedInt typed
value 13. This could also be written xs:unsignedInt("13").
The target type can be any simple type other than an abstract type. Specifically, it can be a type whose variety is atomic, union, or list. In each case the effect of casting to the target type is the same as constructing an element with the supplied value as its content, validating the element using the target type as the governing type, and atomizing the element to obtain its typed value.
When the target type is a derived type that is restricted by a pattern facet, the
lexical form is first checked against the pattern before further casting
is attempted (See
For example, consider a user-defined type my:boolean which is derived by
restriction from xs:boolean and specifies the pattern facet value="0|1".
The expression "true" cast as my:boolean would fail with a dynamic
error
Facets other than pattern are checked after the conversion. For example, given a user-defined type my:height
defined as a restriction of xs:integer with the facet <maxInclusive value="84"/>,
the expression "100" cast as my:height would fail with a dynamic
error.
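The two user-defined types used as examples here, my:boolean and my:height, can be modelled in a few lines of Python. The sketch is illustrative only: the function names are hypothetical, whitespace is collapsed before any check, the pattern facet is tested against the lexical form before conversion, and the maxInclusive facet is applied to the converted value.

```python
import re


def cast_my_boolean(text: str) -> bool:
    """xs:boolean restricted by the pattern facet 0|1."""
    lexical = " ".join(text.split())             # collapse whitespace
    if not re.fullmatch(r"0|1", lexical):        # pattern is checked first
        raise ValueError(f"{lexical!r} does not match the pattern facet")
    return lexical == "1"


def cast_my_height(text: str) -> int:
    """xs:integer restricted by maxInclusive = 84."""
    lexical = " ".join(text.split())
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"{lexical!r} is not a valid xs:integer")
    value = int(lexical)
    if value > 84:                               # value facet checked on the result
        raise ValueError(f"{value} exceeds maxInclusive 84")
    return value
```

Here `cast_my_boolean("true")` fails even though "true" is a valid xs:boolean, because the pattern rejects it before conversion is attempted.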
Casting to the types xs:NOTATION, xs:anySimpleType,
or xs:anyAtomicType is not permitted because these types are abstract (they have
no immediate instances).
Special rules apply when casting to namespace-sensitive types. The types xs:QName
and xs:NOTATION are namespace-sensitive. Any type derived by restriction from
a namespace-sensitive type is itself namespace-sensitive, as is any union type having a
namespace-sensitive type among its members, and any list type having a namespace-sensitive type
as its item type. For details, see
Since version 3.0 of this specification, casting has been allowed between xs:QName
and xs:NOTATION in either direction; this was not permitted in previous Recommendations. Version 3.0 also removed
the rule that only a string literal (rather than a dynamic string) may be cast to an xs:QName
When casting to a numeric type:
If the value is too large or too small to be accurately represented by the implementation,
it is handled as an overflow or underflow as defined in
If the target type is xs:float or xs:double, the string -0 (and equivalents
such as -0.0 or -000)
In casting to xs:decimal or to a type derived from xs:decimal,
if the value is not too large or too small but nevertheless cannot be represented accurately
with the number of decimal digits available to the implementation, the implementation may round
to the nearest representable value or may raise a dynamic error
When casting to xs:duration, xs:dateTime, or xs:time,
if the seconds component has more fractional digits than are supported by the implementation,
excess digits are truncated rather than rounded. For example, xs:dateTime('2023-12-31T23:59:59.999999999')
is guaranteed to deliver an xs:dateTime value whose year component is 2023 rather than 2024.
Implementations are required to support millisecond precision or greater.
In casting to xs:date, xs:dateTime, xs:gYear,
or xs:gYearMonth
(or types derived from these), if the value is too large or too
small to be represented by the implementation, a dynamic error is raised.
In casting to a duration value, if the value is too large or too small to be represented by the
implementation, a dynamic error is raised.
For xs:anyURI, the extent to which an implementation validates the
lexical form of xs:anyURI is implementation-dependent.
If the cast fails for any other reason, a dynamic error is raised.
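The treatment of excess fractional digits in a seconds component, mentioned in the rules above, can be illustrated with a small Python sketch. It assumes an implementation that keeps three fractional digits (the minimum precision the text requires) and that discards the remaining digits instead of rounding; the function name and the `digits` parameter are hypothetical.

```python
def truncate_seconds(lexical_seconds: str, digits: int = 3) -> str:
    """Keep at most `digits` fractional digits of a seconds component."""
    whole, _, fraction = lexical_seconds.partition(".")
    fraction = fraction[:digits].rstrip("0")     # cut, never round up
    return f"{whole}.{fraction}" if fraction else whole
```

Applied to the seconds component of 2023-12-31T23:59:59.999999999 this yields 59.999, so the value stays in 2023 and does not roll over to the next year.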
Casting from xs:string and xs:untypedAtomic to any other type
(primitive or non-primitive) has been described in
Casting a value to a derived type can be separated into a number of cases. In these rules:
The types xs:integer, xs:yearMonthDuration,
and xs:dayTimeDuration are treated as quasi-primitive types
(alongside the 20 truly
For any atomic type T, let P(T) denote the most specific primitive or quasi-primitive type
such that itemType-subtype(T, P(T)) is true.
The rules are then:
When the source type ST is the same type as the target type TT: this case always succeeds, returning the source value SV unchanged.
When itemType-subtype(ST, TT) is true:
see
When TT is the quasi-primitive type xs:integer
and SV is an instance of xs:numeric:
see
When TT is the quasi-primitive type xs:yearMonthDuration
or xs:dayTimeDuration and SV is an instance of xs:duration:
see
When P(ST) is the same type as P(TT):
see
Otherwise (P(ST) is not the same type as P(TT)):
see
When an atomic item SV is cast as xs:integer, the
resulting xs:integer value TV is obtained as follows:
If ST is
xs:decimal, xs:float or
xs:double, then TV is SV
with the fractional part discarded and the value converted to
xs:integer. Thus, casting 3.1456
returns 3 while -17.89 returns
-17. Casting 3.124E1
returns 31. If SV is too large to be
accommodated as an integer, a dynamic error is raised. If SV is one of the special xs:float or
xs:double values NaN,
INF, or -INF, a dynamic error is raised.
In all other cases, the general rules of
When casting to a subtype of xs:integer (for example, xs:long), the
rules in xs:integer as a quasi-primitive type.
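Casting a non-integer numeric value to xs:integer discards the fractional part, that is, it truncates toward zero. A minimal Python sketch, assuming `Decimal` and `float` stand in for xs:decimal and xs:double, and using `ValueError` as a stand-in for the dynamic error (the function name is hypothetical):

```python
import math
from decimal import Decimal
from typing import Union


def cast_to_integer(value: Union[int, float, Decimal]) -> int:
    """Discard the fractional part; reject NaN and the infinities."""
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        raise ValueError("NaN, INF and -INF cannot be cast to xs:integer")
    return math.trunc(value)
```

Python integers are unbounded, so the overflow case described in the text does not arise in this sketch.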
xs:yearMonthDuration and xs:dayTimeDuration
When the source value SV is an instance of xs:duration (including
any subtype of xs:duration), then:
If the target type TT is xs:yearMonthDuration, the result
is an instance of xs:yearMonthDuration whose months component
is equal to the months component of SV. The seconds
component of SV is ignored.
If the target type TT is xs:dayTimeDuration, the result
is an instance of xs:dayTimeDuration whose seconds component
is equal to the seconds component of SV. The months
component of SV is ignored.
In all other cases, the general rules of
In general, casting to xs:yearMonthDuration or xs:dayTimeDuration
loses information.
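The loss of information can be seen in a small model. The Python sketch below is illustrative only and assumes a duration is represented as a pair of a months component and a seconds component; the class and function names are hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple


class Duration(NamedTuple):
    months: int
    seconds: Decimal


def to_year_month_duration(value: Duration) -> Duration:
    """Keep the months component; the seconds component is ignored."""
    return Duration(value.months, Decimal(0))


def to_day_time_duration(value: Duration) -> Duration:
    """Keep the seconds component; the months component is ignored."""
    return Duration(0, value.seconds)
```

For a duration of 14 months and 10800 seconds (P1Y2MT3H), the first function keeps only the 14 months and the second keeps only the 10800 seconds.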
When casting to a subtype of xs:dayTimeDuration or
xs:yearMonthDuration, the
rules in xs:dayTimeDuration and xs:yearMonthDuration as quasi-primitive types.
It is always possible to cast an atomic item A to a type T
if the relation A instance of T is true, provided that T
is not an abstract type.
For example, it is
possible to cast an xs:unsignedShort to an
xs:unsignedInt, to an xs:integer, to an
xs:decimal, or to a union type
whose member types are xs:integer and xs:double.
Since the value space of the original type is a subset of the value space of the target type, such a cast is always successful.
For the expression A instance of T to be true, T must be
either an atomic type, or a union type that has no constraining facets. It cannot
be a list type, nor a union type derived by restriction from another union type, nor
a union type that has a list type among its member types.
The result will have the same value as the original, but will have a new type annotation:
If T is an atomic type, then the type annotation of the result is T.
If T is a union type, then the type of the result is an atomic type M
such that M is one of the atomic types in the transitive membership of
the union type T and A instance of M is true; if there is more
than one type M that satisfies these conditions (which could happen, for example,
if T is the union of two overlapping types such as xs:int
and xs:positiveInteger) then the first one is used, taking the member types
in the order in which they appear within the definition of the union type.
It is possible to cast an SV to a TT if the type of the
SV and the TT type are both derived by restriction
(directly or indirectly) from the same primitive or quasi-primitive type. For example, an instance of xs:byte can be cast as
xs:unsignedShort, provided the value is not negative.
If the value does not conform to the facets defined for the target type, then a dynamic
error is raised xs:string, in the case of types that have no canonical
representation defined for them).
Note that this will cause casts to fail if the pattern excludes the canonical
lexical representation of the source type. For example, if the type
my:distance is defined as a restriction of xs:decimal
with a pattern that requires two digits after the decimal point, casting of an
xs:integer to my:distance will always fail, because
the canonical representation of an xs:integer does not conform to
this pattern.
In some cases, casting from a parent type to a derived type requires special
rules. See xs:yearMonthDuration and xs:dayTimeDuration. See xs:ENTITY and types derived from it.
When the ST and the TT are derived, directly or
indirectly, from different
Cast the SV, up the hierarchy, to the
If SV is an instance of xs:string or xs:untypedAtomic, check its value against the
pattern facet of TT, and raise a dynamic error
Let P(TT) be the most specific primitive or quasi-primitive type of which TT
is a subtype, as described in
Cast the value to P(TT), as described in
If TT is derived from xs:NOTATION, assume for the
purposes of this rule that casting to xs:NOTATION succeeds.
Cast the value down to the target type TT, as described in
If the target type of a cast expression (or a constructor function) is a type with variety union, the supplied value must be one of the following:
A value of type xs:string or xs:untypedAtomic.
This case follows the general rules for casting from strings, and has already been
described in
If the union type has a pattern facet, the pattern is tested against the supplied
value after whitespace normalization, using the whiteSpace
normalization rules of the member datatype against which validation succeeds.
A value that is an instance of one of the atomic types in the transitive
membership of the union type, and of the union type itself. This case has already been described in
This situation only applies when the value is an instance of the union type, which means it will never apply when the union is derived by facet-based restriction from another union type.
A value that is castable to one or more of the atomic types in the transitive membership
of the union type (in the sense that the castable as operator returns true).
In this case the supplied value is cast to each atomic type in the transitive membership
of the union type in turn (in the order in which the member types appear in the declaration)
until one of these casts is successful; if none of them is successful, a dynamic error occurs
If the union type has a pattern facet, the pattern is tested against the canonical representation of the result value.
Only the atomic types in the transitive membership of the union type are considered. The
union type may have list types in its transitive membership, but (unless the supplied value
is of type xs:string or xs:untypedAtomic, in which case the
rules in
If more than one of these conditions applies, then the casting is done according to the rules for the first condition that applies.
If none of these conditions applies, the cast fails with a dynamic error
Example: consider a type U whose member types are xs:integer
and xs:date.
The expression "123" cast as U returns the
xs:integer value 123.
The expression current-date() cast as U returns
the current date as an instance of xs:date.
The expression 23.1 cast as U returns the xs:integer
value 23.
Example: consider a type V whose member types are xs:short
and xs:negativeInteger.
The expression "-123" cast as V returns the
xs:short value -123.
The expression "-100000" cast as V returns the
xs:negativeInteger value -100000.
The expression 93.7 cast as V returns the
xs:short value 93.
The expression "93.7" cast as V raises
a dynamic error because "93.7" is not in the lexical space of the union type.
Example: consider a type W that is derived from the above type V
by restriction, with a pattern facet of -?\d\d.
The expression "12" cast as W returns the
xs:short value 12.
The expression "123" cast as W raises
a dynamic error because "123" does not match the pattern facet.
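The string cases in the examples for V and W can be modelled by trying each member type in declaration order. The following Python sketch is illustrative only: it handles string input alone, it collapses whitespace before testing the pattern (adequate here because both member types use the collapse rule), and it uses `ValueError` in place of the dynamic error. All names are hypothetical.

```python
import re
from typing import Callable, Optional, Sequence, Tuple


def _lexical_integer(text: str) -> int:
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError(f"{text!r} is not in the lexical space of xs:integer")
    return int(text)


def cast_short(text: str) -> int:
    value = _lexical_integer(text)
    if not -32768 <= value <= 32767:
        raise ValueError(f"{value} is out of range for xs:short")
    return value


def cast_negative_integer(text: str) -> int:
    value = _lexical_integer(text)
    if value >= 0:
        raise ValueError(f"{value} is not negative")
    return value


Member = Tuple[str, Callable[[str], int]]
V: Sequence[Member] = [("xs:short", cast_short),
                       ("xs:negativeInteger", cast_negative_integer)]


def cast_string_to_union(text: str, members: Sequence[Member],
                         pattern: Optional[str] = None) -> Tuple[str, int]:
    """Return (member type name, value) for the first member that accepts text."""
    lexical = " ".join(text.split())
    if pattern is not None and not re.fullmatch(pattern, lexical):
        raise ValueError(f"{lexical!r} does not match the pattern facet")
    for name, caster in members:
        try:
            return name, caster(lexical)
        except ValueError:
            continue                      # try the next member type
    raise ValueError(f"{lexical!r} is not in the lexical space of the union")
```

Passing `pattern=r"-?\d\d"` models the derived type W: "12" is accepted as an xs:short, while "123" is rejected by the pattern before any member type is tried.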
If the target type of a cast expression (or a constructor function) is a
type with variety list, the supplied value must be of type xs:string or
xs:untypedAtomic. The rules follow the general principle for
all casts from xs:string outlined in
If the supplied value is not of type xs:string or
xs:untypedAtomic, a type error is raised
The semantics of the operation are consistent with validation: that is, the effect of casting a string S to a list type L is the same as constructing an element or attribute node whose string value is S, validating it using L as the governing type, and atomizing the resulting node. The result will always be either failure, or a sequence of zero or more atomic items each of which is an instance of the item type of L (or if the item type of L is a union type, an instance of one of the atomic types in its transitive membership).
If the item type of the list type is namespace-sensitive, then the
namespace bindings in the static context will be used to
resolve any namespace prefix, in the same way as when the target type is
xs:QName.
If the list type has a pattern facet, the pattern must match
the supplied value after collapsing whitespace (an operation equivalent to the
use of the
For example, the expression "A B C D" cast as xs:NMTOKENS
produces a sequence of four xs:NMTOKEN values,
("A", "B", "C", "D").
For example, given a user-defined type my:coordinates defined
as a list of xs:integer with the facet <xs:length value="2"/>,
the expression my:coordinates("2 -1") will return a sequence of two
xs:integer values (2, -1), while the expression my:coordinates("1 2 3")
will result in a dynamic error because the length of the list does not conform to the
length facet. The expression my:coordinates("1.0 3.0")
will also fail because the strings 1.0 and 3.0
are not in the lexical space of xs:integer.
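The my:coordinates example can be traced with a small Python model. This is a sketch under stated assumptions rather than a normative algorithm: the item type is fixed to xs:integer, only the length facet is modeled, and the names CastError and cast_to_integer_list are hypothetical.

```python
import re

INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


class CastError(ValueError):
    """Stands in for the dynamic error raised when a cast fails."""


def cast_to_integer_list(text, length=None):
    """Model of casting a string to a list type whose item type is xs:integer."""
    # Collapse whitespace, then split the string into tokens.
    tokens = [t for t in re.split(r"[ \t\r\n]+", text) if t]
    items = []
    for token in tokens:
        # Each token must be in the lexical space of the item type.
        if not INTEGER_LEXICAL.fullmatch(token):
            raise CastError(f"{token!r} is not in the lexical space of xs:integer")
        items.append(int(token))
    # The length facet constrains the number of items in the list.
    if length is not None and len(items) != length:
        raise CastError(f"expected {length} items, found {len(items)}")
    return items


def my_coordinates(text):
    # Hypothetical stand-in for the constructor function my:coordinates.
    return cast_to_integer_list(text, length=2)


print(my_coordinates("2 -1"))
```

In this model my_coordinates("1 2 3") fails on the length facet, and my_coordinates("1.0 3.0") fails earlier, on the lexical check of the first token.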
The error text provided with these errors is non-normative.
Error code used by
Raised when
This error is raised whenever an attempt is made to divide by zero.
This error is raised whenever numeric operations result in an overflow or underflow.
This error is raised when an integer used to select a member of an array is outside the range of values for that array.
This error is raised when the $length argument to
Raised when casting to xs:decimal if the supplied value exceeds the
implementation-defined limits for the datatype.
Raised by NaN or Infinity.
Raised when casting to xs:integer if the supplied value exceeds the
implementation-defined limits for the datatype.
Raised when multiplying or dividing a duration by a number, if the number supplied is NaN.
Raised when casting a string to xs:decimal if the string has more digits of precision
than the implementation can represent (the implementation also has the option of rounding).
Raised by
Raised by any function that uses a collation if the requested collation is not recognized.
Raised by
Raised by functions such as
Raised by
Raised when parsing CSV input if a syntax error in the input CSV is found.
Raised when parsing CSV input if the field-separator,
record-separator, or quote-character option is set to
an invalid value.
Raised when parsing CSV input if the same delimiter character is assigned to more than one role.
Raised by the function from the get entry of
csv-columns-record, if its $key argument is an
xs:string and is not one of the known column names.
Raised by
Raised by
Raised by
Raised by xs:anyURI.
Raised (optionally) by xs:anyURI.
Raised by
Raised by
Raised when the xsd-validation option to type Q{U}NNN
is used, and Q{U}NNN does not identify a type in the static context.
Raised when the xsd-validation option to skip, if the processor is not schema-aware.
Raised when
Raised by
Raised when the dtd-validation option to dtd-validation option, but there
may be environments (such as web browsers) where this is not practically feasible.
Raised by
Raised by
This error is raised if the decimal format name supplied to
This error is raised if a decimal format value supplied to
This error is raised if the picture string supplied to
Raised when casting to date/time datatypes, or performing arithmetic with date/time values, if arithmetic overflow or underflow occurs.
Raised when casting to duration datatypes, or performing arithmetic with duration values, if arithmetic overflow or underflow occurs.
Raised by adjust-date-to-timezone and related functions if the supplied timezone is invalid.
Raised by civil-timezone if no timezone data is available for the given date/time and place.
Raised by build-dateTime if the set of fields supplied does not correspond to those present in one of the
Raised by build-dateTime if one of the fields supplied has a value that is outside the
supported range.
This error is raised if the picture string or calendar supplied to
This error is raised if the picture string supplied to
Raised by
Raised by functions such as
Raised by functions such as
Raised by
Raised by functions such as $options map contains an invalid entry.
Raised by
Raised by escaped="true" or escaped-key="true", and the corresponding string
or key contains an invalid JSON escape sequence.
Raised by
Raised by
Raised by
Raised by origin
option is not an ancestor of the $node whose relative path is required.
Raised by
Raised by
Raised by
Raised by
Raised by
A general-purpose error raised when casting, if a cast between two datatypes is allowed in principle,
but the supplied value cannot be converted: for example when attempting to cast the string "nine" to an integer.
Raised when either argument to
Raised by
Raised by
Raised by
Raised by functions such as
Raised by
A catch-all error for
Raised when the input to
Raised when the radix supplied to
Raised when the digits in the string supplied to
Raised by regular expression functions such as i, m, q, s, or x.
Raised by regular expression functions such as
Raised by
Raised by $replacement
and $action arguments are supplied.
Raised by
Raised by
Raised by
Raised by $source argument contains a fragment identifier,
or if it cannot be resolved to an absolute URI (for example, because the
base-URI property in the static context is absent), or if it cannot be used to
retrieve the string representation of a resource.
Raised by $encoding argument is not a valid encoding name,
if the processor does not support the specified encoding, if the string
representation of the retrieved resource contains octets that cannot be decoded
into Unicode
Raised by $encoding argument is absent and the processor
cannot infer the encoding using external information and the
encoding is not UTF-8.
A dynamic error is raised if the authority component of a URI contains an open square bracket but no corresponding close square bracket.
A dynamic error is raised if no XSLT processor suitable for evaluating a call on
A dynamic error is raised if the parameters supplied to initial-template does not exist in the stylesheet), that error code should
be used in preference.
A dynamic error is raised if an XSLT transformation invoked using
A dynamic error is raised if the
A dynamic error is raised if the result of the
This appendix lists the named record types that are used in function signatures in this function library, and that are available in the static context of every application.
These definitions are all in the standard function namespace http://www.w3.org/2005/xpath-functions,
which is normally bound to the prefix fn. Because this will not usually be the default namespace for
types, the names will usually be written with the prefix fn.
fn namespace.
Two functions in this specification, http://www.w3.org/2005/xpath-functions, which is therefore the target namespace
of the relevant schema.
A processor xs:redefine or xs:override, adding members to substitution
groups, or defining derived types. Processors are not xsi:schemaLocation or
xsi:type attributes in the instance being validated.
The schema for this namespace is organized as three schema documents. The first is a simple
umbrella document that includes the other two. A copy can be found at
This schema describes the output of the function
The schema is reproduced below, and can also be found in
This schema describes the output of the function
The schema is reproduced below, and can also be found in
This schema describes the output of the function
The schema is reproduced below, and can also be found in
This Appendix describes some sources of functions that fall outside the scope of the function library defined in this specification. It includes both function specifications and function implementations. Inclusion of a function in this appendix does not constitute any kind of recommendation or endorsement; neither is omission from this appendix to be construed negatively. This Appendix does not attempt to give any information about licensing arrangements for these function specifications or implementations.
A number of W3C Recommendations make use of XPath, and in some cases such Recommendations define additional functions to be made available when XPath is used in a specific host language.
The various versions of XSLT have all included additional functions intended to be available only when XPath is used within XSLT, and not in other host language environments. Some of these functions were originally defined in XSLT, and subsequently migrated into the core function library defined in this specification.
Generally, the reason that functions have been defined in XSLT rather than in the core library has been that they required additional static or dynamic context information.
XSLT-defined functions share the core namespace http://www.w3.org/2005/xpath-functions (but in XPath 1.0
and XSLT 1.0, no namespace was defined for these functions).
The following table lists all functions that have been defined in XSLT, and summarizes their current status.
| Function name | Availability |
|---|---|
| fn:accumulator-after | XSLT 3.0 and later |
| fn:accumulator-before | XSLT 3.0 and later |
| fn:apply-templates | XSLT 4.0 |
| fn:available-system-properties | XSLT 3.0 and later |
| fn:character-map | XSLT 4.0 |
| fn:collation-key | Originally XSLT 3.0, then XPath 3.1 and later |
| fn:copy-of | XSLT 3.0 and later |
| fn:current | XSLT 1.0 and later |
| fn:current-group | XSLT 2.0 and later |
| fn:current-grouping-key | XSLT 2.0 and later |
| fn:current-merge-group | XSLT 3.0 and later |
| fn:current-merge-key | XSLT 3.0 and later |
| fn:current-merge-key-array | XSLT 4.0 |
| fn:current-output-uri | XSLT 3.0 and later |
| fn:document | XSLT 1.0 and later |
| fn:element-available | XSLT 1.0 and later |
| fn:format-date | Originally XSLT 2.0, then XPath 3.0 and later |
| fn:format-dateTime | Originally XSLT 2.0, then XPath 3.0 and later |
| fn:format-number | Originally XSLT 1.0 and 2.0; then XPath 3.0 and later |
| fn:format-time | Originally XSLT 2.0; then XPath 3.0 and later |
| fn:function-available | XSLT 1.0 and later |
| fn:generate-id | Originally XSLT 1.0 and 2.0; then XPath 3.0 and later |
| fn:json-to-xml | Originally XSLT 3.0, then XPath 3.1 and later |
| fn:key | XSLT 1.0 and later |
| fn:map-for-key | XSLT 4.0 |
| fn:regex-group | XSLT 2.0 and later |
| fn:regex-groups | XSLT 4.0 |
| fn:snapshot | XSLT 3.0 and later |
| fn:stream-available | XSLT 3.0 and later |
| fn:system-property | XSLT 1.0 and later |
| fn:type-available | XSLT 2.0 and later |
| fn:unparsed-entity-public-id | XSLT 2.0 and later |
| fn:unparsed-entity-uri | XSLT 1.0 and later |
| fn:unparsed-text | Originally XSLT 2.0; then XPath 3.0 and later |
| fn:unparsed-text-available | Originally XSLT 2.0; then XPath 3.0 and later |
| fn:xml-to-json | Originally XSLT 3.0, then XPath 3.1 and later |
| map:contains | Originally XSLT 3.0, then XPath 3.1 and later |
| map:entry | Originally XSLT 3.0, then XPath 3.1 and later |
| map:find | Originally XSLT 3.0, then XPath 3.1 and later |
| map:for-each | Originally XSLT 3.0, then XPath 3.1 and later |
| map:get | Originally XSLT 3.0, then XPath 3.1 and later |
| map:keys | Originally XSLT 3.0, then XPath 3.1 and later |
| map:merge | Originally XSLT 3.0, then XPath 3.1 and later |
| map:put | Originally XSLT 3.0, then XPath 3.1 and later |
| map:remove | Originally XSLT 3.0, then XPath 3.1 and later |
| map:size | Originally XSLT 3.0, then XPath 3.1 and later |
XSLT 3.0 was well advanced when work started on XPath 3.1, but XPath 3.1 appeared as a Recommendation before XSLT 3.0 reached that status.
XForms 1.1 is based on XPath 1.0. It adds the following functions to the set defined in XPath 1.0, using the same namespace:
boolean-from-string, is-card-number, avg, min, max,
count-non-empty, index, power, random, compare,
if, property,
digest, hmac, local-date, local-dateTime, now,
days-from-date, days-to-date, seconds-from-dateTime, seconds-to-dateTime,
adjust-dateTime-to-timezone, seconds, months, instance,
current, id, context, choose, event.
XForms 2.0 was first published as a W3C Working Draft, and subsequently as a W3C Community Group specification. These draft specifications do not include any additional functions beyond those in the core XPath specification.
The XQuery Update 1.0 specification defines one additional function in the core namespace
http://www.w3.org/2005/xpath-functions, namely fn:put. This function can be used
to write a document to external storage. It is thus unusual in that it has side-effects; the XQuery Update 1.0
specification defines semantics for updating expressions including this function.
Although XQuery Update 1.0 is defined as an extension of XQuery 1.0, a number of implementers have adapted it, in a fairly intuitive way, to work with later versions of XQuery. At the time of this publication, later versions of the XQuery Update specification remain at Working Draft status.
A number of community groups, with varying levels of formal organization, have defined specifications for additional function libraries to augment the core functions defined in this specification. Many of the resulting function specifications have implementations available for popular XPath, XQuery, and XSLT processors, though the level of support is highly variable.
The first such group was EXSLT. This activity was primarily concerned with augmenting the capability of XSLT 1.0, and many of its specifications were overtaken by core functions that became available in XPath 2.0. EXSLT defined a number of function modules covering:
node-set function)
max, min, abs, and trigonometric functions)
Specifications from the EXSLT group can be found at
A renewed attempt to define additional function libraries using XPath 2.0 as its baseline formed under the name EXPath. Again, the specifications are in various states of maturity and stability, and implementation across popular processors is patchy. At the time of this publication the function libraries that exist in stable published form include:
The EXPath community has also been engaged in other related projects, such as defining packaging
standards for distribution of XSLT/XQuery components, and tools for unit testing. Its specifications
can be found at
A third activity has operated under the name EXQuery, which as the name suggests has focused
on extensions to XQuery. EXQuery has published a single specification, RestXQ, which is primarily a
system of function annotations allowing XQuery functions to act as endpoints for RESTful services.
It also includes some simple functions to assist with the creation of such services. The RestXQ specification
can be found at
Many useful functions can be written in XSLT or XQuery, and in this case the function implementations themselves can be portable across different XSLT and XQuery processors. This section describes one such library.
FunctX is an open-source library of general-purpose functions, supplied in the form of XQuery 1.0 and XSLT 2.0 implementations. It contains over a hundred functions. Typical examples of these functions are:
The FunctX library can be found at
The keyword for the argument has changed from arg to value.
The argument is now optional, and defaults to the context value (which is atomized if necessary).
This change aligns constructor functions such as xs:string, xs:boolean,
and xs:numeric with
The semantics of the HTML case-insensitive collation
"http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive"
are now defined normatively in this specification rather than by reference to the
living HTML5 specification (which has changed since 3.1); and the rules now make ordering explicit rather than leaving
it implementation-defined.
An option in an
These changes are not highlighted in the change-marked version of the specification.
The value comparison operators such as eq, lt, and gt
are now defined in op:XX-equal,
op:XX-less-than, and op:XX-greater-than
have therefore been dropped.
The names of parameters appearing in function signatures have been changed. This is to reflect the introduction of keyword arguments in XPath 4.0; the names chosen for parameters are now more consistent across the function library.
In 3.1 and earlier versions, the keywords used in the specification were for documentation purposes only, so these changes do not affect backwards compatibility.
Where appropriate, the phrase "the value of $x" has been replaced
by the simpler $x. No change in meaning is intended.
For functions that take a variable number of arguments, wherever possible the specification now gives a single function signature indicating default values for arguments that may be omitted, rather than multiple signatures.
The formal specifications of array functions have been rewritten to use two new
primitives: array:members which converts an array to a sequence of value records,
and array:of-members which does the inverse. This has enabled many of the
functions to be specified more concisely, and with less duplication between similar functions
for sequences and arrays.
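The idea behind the two primitives can be illustrated with a Python model. The sketch assumes that an array is represented as a list whose members are themselves lists (sequences), and that a value record is a map with a single entry named value; the function names array_members, array_of_members, and array_reverse are illustrative stand-ins, not definitions taken from this specification.

```python
def array_members(array):
    """Model of array:members: wrap each member in a value record."""
    return [{"value": member} for member in array]


def array_of_members(records):
    """Model of array:of-members: the inverse, unwrapping each record."""
    return [record["value"] for record in records]


def array_reverse(array):
    # An array function expressed as an ordinary sequence operation
    # (reversal) applied between the two primitives.
    return array_of_members(list(reversed(array_members(array))))


# An array with three members: the sequence (1, 2), the empty sequence, and 3.
example = [[1, 2], [], [3]]
print(array_members(example))
print(array_reverse(example))
```

Wrapping each member in a record keeps member boundaries intact, so that an empty-sequence member is not lost when the array is treated as a flat sequence of items.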
The appendix containing illustrative user-written functions has been dropped; many of these functions are no longer needed.
This section summarizes the extent to which this specification is compatible with previous versions.
Version 4.0 of this function library is fully backwards compatible with version 3.1, except as noted below:
In xs:double and xs:decimal) have changed.
In previous versions of the specification, xs:decimal values were converted
to xs:double, leading to a possible loss of precision. This could make
comparisons non-transitive, leading to problems when grouping,
and potentially (depending on the sort algorithm) with sorting. The problem has been fixed by requiring
comparisons to be performed based on the exact mathematical value without any loss of precision.
This means, for example, that deep-equal(0.2, 0.2e0) is now false, whereas in previous
versions it was true. The two values are not mathematically equal, because the exact decimal equivalent
of the xs:double value written as 0.2e0 is
0.200000000000000011102230246251565404236316680908203125.
The corresponding change has not been made to the = and eq operators,
because it was found to be too disruptive. For example, if the context node is the element
<e price="10.0" discount="0.2"/>, there is an expectation that the expression
@price - @discount = 9.8 should return true. But (assuming untyped data), the result of
the subtraction is an xs:double whose precise value is
9.800000000000000710542735760100185871124267578125, so comparing the two values as
decimals would return false.
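The figures quoted above can be reproduced with Python's exact-arithmetic types, on the assumption that a Python float behaves as an IEEE 754 double in the same way as xs:double. The helper names exact_equal and promoted_equal are hypothetical; they model the 4.0 rule for deep-equal and the retained rule for the = and eq operators respectively.

```python
from decimal import Decimal
from fractions import Fraction


def exact_equal(decimal_literal, double_value):
    """Compare by exact mathematical value, with no loss of precision."""
    return Fraction(decimal_literal) == Fraction(double_value)


def promoted_equal(decimal_literal, double_value):
    """Compare after converting the decimal to a double."""
    return float(decimal_literal) == double_value


# Decimal(float) converts the double exactly, exposing its true value.
print(Decimal(0.2))

print(exact_equal("0.2", 0.2))     # models deep-equal(0.2, 0.2e0) in 4.0
print(promoted_equal("0.2", 0.2))  # models 0.2 eq 0.2e0

# Untyped attribute values are treated as doubles in the subtraction.
difference = float("10.0") - float("0.2")
print(Decimal(difference))
print(promoted_equal("9.8", difference))
print(exact_equal("9.8", difference))
```

The last two lines show why the operators were left unchanged: the promoted comparison reports that the difference equals 9.8, whereas the exact comparison does not.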
In previous versions, unrecognized options supplied to the $options
parameter of functions such as
In version 4.0, omitting the $value of
In version 3.1, the abcdef]]>
and abcdef]]> were considered non-equal. In version 4.0,
the text nodes are now merged prior to comparison, so these two elements compare equal.
In version 3.1, the atomic types xs:hexBinary and xs:base64Binary
were not mutually comparable under the eq operator, and always compared not equal
as map keys or under operations such as fn:distinct-values and fn:deep-equal.
In version 4.0, instances of xs:hexBinary and xs:base64Binary are
equal if they represent the same octet sequence. This means, for example, that the zero-length
values xs:hexBinary("") and xs:base64Binary("") can no longer co-exist
as keys in the same map.
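The effect on map keys can be modeled in Python by reducing both binary types to their octet sequences. This is an illustrative sketch: the functions hex_binary and base64_binary are hypothetical stand-ins for the constructor functions, and a Python dict stands in for an XDM map.

```python
import base64


def hex_binary(lexical):
    """Decode an xs:hexBinary lexical form to its octet sequence."""
    return bytes.fromhex(lexical)


def base64_binary(lexical):
    """Decode an xs:base64Binary lexical form to its octet sequence."""
    return base64.b64decode(lexical, validate=True)


# Two different lexical forms representing the same five octets.
print(hex_binary("48656C6C6F") == base64_binary("SGVsbG8="))

# Under the 4.0 rule the two zero-length values are the same key,
# so the second entry replaces the first.
keys = {}
keys[hex_binary("")] = "from hexBinary"
keys[base64_binary("")] = "from base64Binary"
print(len(keys))
```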
The format of numeric values in the output of xs:double and then serialized
using the casting rules, resulting in an input value of 10000000 being output as 1e7.
In version 4.0, the value is output
In version 4.0, the function signature of xs:NCName or a zero-length string (the new coercion rules
mean that any string in the form of an xs:NCName is acceptable). If a string is supplied
that does not meet these requirements, a type error will be raised. In version 3.1, this was not an error:
it came under the rule that when no namespace binding existed for the supplied prefix, the function
would return the empty sequence.
Furthermore, because the expected type of this parameter is no longer xs:string, the
special coercion rules for xs:string parameters in XPath 1.0 compatibility mode no longer apply.
For example, supplying xs:duration('PT1H') as the first argument will now raise a
type error, rather than looking for a namespace binding for the prefix PT1H.
Version 4.0 makes it clear that the casting of a value other than xs:string
or xs:untypedAtomic to a list type (whether using a cast expression or a
constructor function) is a type error xs:string?.
The way that xs:integer or xs:decimal values, the result is an xs:integer or
xs:decimal, rather than the result of converting this to an xs:double.
The type of the third argument of xs:string to (xs:string | xs:QName).
Because the expected type of this parameter is no longer xs:string, the
special coercion rules for xs:string parameters no longer apply.
For example, it is no
longer possible to supply an instance of xs:anyURI or (when XPath 1.0 compatibility
mode is in force) an instance of xs:boolean or xs:duration.
When
In regular expressions, the assertions ^ and $ can no longer be
followed by a quantifier. This is because (a) a quantifier that allows zero occurrences means
that the assertion will always match, and (b) a quantifier that allows multiple occurrences
has no effect. Processors may provide an option that allows such regular expressions to be
accepted for compatibility reasons.
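Stylesheets and queries that may be affected can be screened with a simple scan. The Python sketch below is one possible approach, not a conformance check: the function name assertion_has_quantifier is hypothetical, and the scan assumes that the x flag is not in use (so that no whitespace separates an assertion from a quantifier) and that the q flag is not in use (so that metacharacters have their usual meaning).

```python
def assertion_has_quantifier(regex):
    """Report whether an unescaped ^ or $ outside a character class
    is immediately followed by a quantifier (?, *, +, or {n,m})."""
    depth = 0  # nesting depth of character-class brackets
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch in "^$" and depth == 0:
            # Outside a character class, ^ and $ are assertions.
            if i + 1 < len(regex) and regex[i + 1] in "?*+{":
                return True
        i += 1
    return False


print(assertion_has_quantifier("^abc$"))
print(assertion_has_quantifier("^*abc"))
print(assertion_has_quantifier("abc$?"))
```

Tracking bracket depth keeps the scan from misreading the negation marker in [^a]+ or a literal $ inside a character class as an assertion.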
The NaN as equal to NaN.
For compatibility issues regarding earlier versions, see the 3.1 version of this specification.