This document will be considered ready for transition to Proposed Recommendation at the same time that the XQuery 3.1 specification is ready for transition to Proposed Recommendation.
This &doc.w3c-doctype-full; specifies XSLT and XQuery Functions and Operators (F&O) version 4.0, a fully compatible extension of
This document defines constructor functions, operators, and functions on the datatypes defined in
A summary of changes since version 3.1 is provided at
This version of the specification is work in progress. It is produced by the QT4 Working Group, officially
the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at
different stages of review, reflected in their
The purpose of this document is to define functions and operators for inclusion in
XPath 4.0, XQuery 4.0, and XSLT 4.0.
The exact syntax used to call these
functions and operators is specified in
This document defines three classes of functions:
General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions,
whose arguments and results are values defined by the
Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.
Functions that specify the semantics of operators defined in
xs:dateTimeStamp
, and it
incorporates as built-in types the two types xs:yearMonthDuration
and xs:dayTimeDuration
which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many
aspects of the definitions of the existing datatypes: for example, it extends the value space of
xs:double
to allow both positive and negative zero, and extends the lexical space to allow +INF
;
it modifies the value space of xs:Name
to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime
values; and it allows any character string to appear as the value of an xs:anyURI
item.
Implementations of this specification
References to specific sections of some of the above documents are indicated by
cross-document links in this document. Each such link consists of a pointer to a
specific section followed by a superscript specifying the linked document. The
superscripts have the following meanings: XQ
Despite its title, this document does not attempt to define the semantics of all the operators available
in the x/y
, x!y
, and x[y]
,
as well as simple operators such as x,y
, x and y
, x or y
,
x<<y
, x>>y
, x is y
, x||y
, x|y
,
x union y
, x except y
, x intersect y
, x to y
and x otherwise y
) are now defined entirely within
The remaining operators that are described in this publication are those where the semantics of the operator
depend on the types of the arguments. For these operators, the language specification describes rules for selecting
an internal function defined in this specification to underpin the operator. For example, when the operator x+y
is applied to two operands of type xs:double
, the function op:numeric-add
is selected.
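The selection step described above can be pictured as a lookup keyed by the operator and the operand types. The following Python sketch is illustrative only: the dispatch table, the helper `select_function`, and the mapping of Python types to XPath types are assumptions made for the example; only the name `op:numeric-add` comes from the text.

```python
from decimal import Decimal


def numeric_add(x, y):
    """Hypothetical stand-in for the internal function op:numeric-add."""
    return x + y


# Hypothetical dispatch table keyed by (operator symbol, left type, right type).
# Python float stands in for xs:double and Decimal for xs:decimal.
DISPATCH = {
    ("+", float, float): ("op:numeric-add", numeric_add),
    ("+", Decimal, Decimal): ("op:numeric-add", numeric_add),
}


def select_function(symbol, left, right):
    """Return the (name, function) pair selected for the operand types."""
    key = (symbol, type(left), type(right))
    if key not in DISPATCH:
        raise TypeError(
            f"no function for {symbol!r} on "
            f"{type(left).__name__} and {type(right).__name__}"
        )
    return DISPATCH[key]


name, fn = select_function("+", 1.5, 2.25)
result = fn(1.5, 2.25)
```

In this sketch the operator itself carries no semantics; everything is delegated to the function that the operand types select.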
XPath defines a range of comparison operators x=y
, x!=y
, x<y
,
x>y
, x<=y
, x>=y
, x eq y
, x ne y
, x lt y
,
x gt y
, x le y
, x ge y
, which apply to a variety of operand types including
for example numeric values, strings, dates and times, and durations. For each relevant data type, two functions
are defined in this specification, for example op:date-equal
and op:date-less-than
.
These define the semantics of the eq
and lt
operators applied to operands of that data type. The operators
x ne y
, x gt y
, x le y
, and x ge y
are defined by reference to
these two; and the =
, !=
, <
,
>
, <=
, and >=
operators are defined by reference to
eq
, ne
, lt
,
gt
, le
, and ge
respectively.
Previous versions of this specification also defined a third comparison function of the form
op:date-greater-than
. This has been dropped, as it is always the inverse of the -less-than
form.
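As a rough illustration of how two primitive functions can underpin all six value comparisons, the Python sketch below derives `ne`, `gt`, `le`, and `ge` from an equality function and a less-than function. The particular derivations (for example, `le` as "less than or equal") are an assumption made for the example; the normative definitions are in the language specification.

```python
import math


def equal(a, b):
    """Stands in for a function such as op:date-equal."""
    return a == b


def less_than(a, b):
    """Stands in for a function such as op:date-less-than."""
    return a < b


# Derived comparisons (assumed derivations, for illustration only).
def eq(a, b):
    return equal(a, b)


def lt(a, b):
    return less_than(a, b)


def ne(a, b):
    return not equal(a, b)


def gt(a, b):
    # "Greater than" is "less than" with the operands swapped, which is
    # why a separate -greater-than function is unnecessary.
    return less_than(b, a)


def le(a, b):
    return less_than(a, b) or equal(a, b)


def ge(a, b):
    return less_than(b, a) or equal(a, b)
```

Defining `le` as `lt or eq` rather than `not gt` matters for values such as NaN, where neither ordering nor equality holds.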
This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:
For all combinations of valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.
For all invalid inputs to the function, the implementation raises (in some way appropriate to the calling environment) a dynamic error.
For a sequence of calls within the same
Other recommendations (“host languages”) that reference this document may dictate:
Subsets or supersets of this set of functions to be available in particular environments;
Mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling errors;
A concrete realization of concepts such as
Which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used.
Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.
Adding such constraints in a host language, however, is discouraged because it makes it difficult to reuse implementations of the function library across host languages.
This specification allows flexibility in the choice of versions of specifications on which it depends:
It is
It is
It is
The XML Schema 1.1 recommendation
introduces one new concrete datatype: xs:dateTimeStamp
; it also incorporates
the types xs:dayTimeDuration
, xs:yearMonthDuration
,
and xs:anyAtomicType
which were previously defined in earlier versions of xs:NCName
based on the rules in XML 1.1 rather than 1.0.
The
In this document, text labeled as an example or as a note is provided for explanatory purposes and is not normative.
The functions and operators defined in this document are contained in one of
several namespaces (see xs:QName
.
This document uses conventional prefixes to refer to these namespaces. User-written
applications can choose a different prefix to refer to the namespace, so long as it is
bound to the correct URI. The host language may also define a default namespace for
function calls, in which case function names in that namespace need not be prefixed
at all. In many cases the default namespace will be
http://www.w3.org/2005/xpath-functions
, allowing a call on the fn:name
function (for example) to be written as name()
rather than fn:name()
;
in this document, however, all example function calls are explicitly prefixed.
The URIs of the namespaces and the conventional prefixes associated with them are:
http://www.w3.org/2001/XMLSchema
for constructors —
associated with xs
.
The section http://www.w3.org/2001/XMLSchema
,
and are named in this document using the xs
prefix.
http://www.w3.org/2005/xpath-functions
for functions — associated with fn
.
The namespace
prefix used in this document for most functions that are available to users is
fn
.
http://www.w3.org/2005/xpath-functions/math
for functions — associated with math
.
This namespace is used for some mathematical functions. The namespace
prefix used in this document for these functions is math
.
These functions are available to users in exactly the same way as those in the
fn
namespace.
http://www.w3.org/2005/xpath-functions/map
for functions — associated with map
.
This namespace is used for some functions that manipulate maps (see
map
.
These functions are available to users in exactly the same way as those in the
fn
namespace.
http://www.w3.org/2005/xpath-functions/array
for functions — associated with array
.
This namespace is used for some functions that manipulate arrays (see
array
.
These functions are available to users in exactly the same way as those in the
fn
namespace.
http://www.w3.org/2005/xqt-errors
— associated with
err
.
There are no functions in this namespace; it is used for error codes.
This document uses the prefix err
to represent the namespace URI
http://www.w3.org/2005/xqt-errors
, which is the namespace for all XPath
and XQuery error codes and messages. This namespace prefix is not predeclared and
its use in this document is not normative.
http://www.w3.org/2010/xslt-xquery-serialization
— associated with
output
.
There are no functions in this namespace: it is
used for serialization parameters, as described in
Functions defined with the op
prefix are described here to
underpin the definitions of the operators in op
prefix. For example, multiplication is generally
associated with the *
operator, but it is described as a function
in this document:
Sometimes there is a need to use an operator as a function.
To meet this requirement, the function fn:op
takes any simple binary operator as its argument,
and returns a corresponding function. So for example fn:for-each-pair($seq1, $seq2, op("+"))
performs a pairwise addition of the values in two input sequences.
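The idea of turning an operator into a function can be sketched in Python as follows. The names `op` and `for_each_pair` mirror the functions mentioned above, but the implementation, including the small operator table, is an assumption made for illustration.

```python
import operator

# Hypothetical table mapping a few operator symbols to functions.
_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def op(symbol):
    """Return a two-argument function corresponding to the operator symbol."""
    return _OPERATORS[symbol]


def for_each_pair(seq1, seq2, action):
    """Apply action to corresponding items of the two sequences."""
    return [action(a, b) for a, b in zip(seq1, seq2)]


sums = for_each_pair([1, 2, 3], [10, 20, 30], op("+"))
```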
The above namespace URIs are not expected to change from one version of this document to another. The contents of these namespaces may be extended to allow additional functions (and errors, and serialization parameters) to be defined.
A function is uniquely defined by its name and arity (number of arguments); it is therefore
not possible to have two different functions that have the same name and arity, but different
types in their signature. That is, function overloading in this sense of the term is not permitted.
Consequently, functions such as fn:string
which accept arguments of many different
types have a signature that defines a very general argument type, in this case item()?
which accepts any single item; supplying an inappropriate item (such as a function item) causes
a dynamic error.
Some functions on numeric types include the type xs:numeric
in their signature
as an argument or result type. In this version of the specification, xs:numeric
has been redefined as a built-in union type representing the union of
xs:decimal
, xs:float
, xs:double
(and thus automatically
accepting types derived from these, including xs:integer
).
Operators such as +
may be overloaded: they map to different underlying functions depending
on the dynamic types of the supplied operands.
It is possible for two functions to have the same name provided they have different arity (number of arguments). For the functions defined in this specification, where two functions have the same name and different arity, they also have closely related behavior, so they are defined in the same section of this document.
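One way to picture the "name plus arity" rule is a registry whose key is the pair (name, arity). The Python sketch below is a hypothetical model, not part of this specification; the class `FunctionLibrary` and the function name `ex:greet` are invented for the example.

```python
class FunctionLibrary:
    """Hypothetical registry in which (name, arity) identifies a function."""

    def __init__(self):
        self._functions = {}

    def register(self, name, arity, implementation):
        key = (name, arity)
        if key in self._functions:
            # Same name and arity: overloading on argument types is not allowed.
            raise ValueError(f"{name}#{arity} is already defined")
        self._functions[key] = implementation

    def lookup(self, name, arity):
        return self._functions[(name, arity)]


library = FunctionLibrary()
library.register("ex:greet", 0, lambda: "hello")
library.register("ex:greet", 1, lambda who: f"hello {who}")
```

Two entries with the same name coexist because their arities differ; a second `ex:greet` with one parameter would be rejected regardless of its parameter type.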
Each function (or group of functions having the same name) is defined in this specification using a standard proforma.
The function name is a QName
as defined in -
). Abbreviations are
used only where there is a strong precedent in other programming languages (as with math:sin
and
math:cos
for sine and cosine). If a
function name contains a fn:timezone-from-dateTime
.
The first section in the proforma is a short summary of what the function does. This is intended to be informative rather than normative.
Each function is then defined by specifying its signature(s), which define the types of the parameters and of the result value.
Where functions take a variable number of arguments, two conventions are used:
Wherever possible, a single function signature is used giving default values for those parameters that can be omitted.
If this is not possible, because the effect of omitting a parameter cannot be specified by giving a default value, multiple signatures are given for the function.
Each function signature is presented in a form like this:
In this notation, http://www.w3.org/2005/xpath-functions
:
this is one of the conventional prefixes listed in ()
; otherwise, the name is followed by a parenthesized list of
parameter declarations. Each parameter declaration includes:
The name of the parameter (which in 4.0 is significant because it can be used as a keyword in a function call)
The static type of the parameter (in italics)
If this is the last parameter of a variadic function, an ellipsis (...
)
If the parameter is optional, then an expression giving the default value
(preceded by the symbol :=
).
The default value expression is evaluated using the static and
dynamic context of the function caller (or of a named function reference). For example,
if the default value is given as .
, then it evaluates to the context value
from the dynamic context of the function caller; if it is given as default-collation
,
then its value is the default collation from the static context of the function caller;
if it is given as deep-equal#2
, then the third argument supplied to deep-equal
is the default collation from the static context of the caller.
If there are two or more parameter declarations, they are separated by a comma.
The return-type
One function, fn:concat
, has a variable number of arguments (zero or more).
More strictly, there is an infinite set of functions having the name fn:concat
, with arity
ranging from 0 to infinity. For this special case, a single function signature is given, with an ellipsis
indicating an indefinite number of arguments.
The next section in the proforma defines the semantics of the function as a set of rules.
The order in which the rules appear is significant; they are to be applied in the order in which
they are written. Error conditions, however, are generally listed in a separate section that follows
the main rules, and take precedence over non-error rules except where otherwise stated. The principles outlined
in
Where the proforma includes sections headed
Rules for passing parameters to operators are described in the relevant sections
of xs:untypedAtomic
and the empty sequence are specified in this section.
As is customary, the parameter type name indicates that the function or operator
accepts arguments of that type, or types derived from it, in that position. This
is called xs:anyURI
can be promoted to produce an argument
of the required type. (See
xs:integer
may be used
where xs:decimal
is expected.
xs:decimal
may be
promoted to xs:float
or xs:double
.
Promotion to xs:double
should be done directly, not via
xs:float
, to avoid loss of precision.
xs:anyURI
can be promoted to the
type xs:string
.
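The reason for promoting directly to xs:double can be demonstrated numerically. In the Python sketch below, `Decimal` stands in for xs:decimal, a Python `float` for xs:double, and single precision is approximated by round-tripping through a 4-byte IEEE value with `struct`; the helper names are assumptions made for the example.

```python
import struct
from decimal import Decimal


def to_double(value):
    """Convert a decimal value directly to double precision."""
    return float(value)


def to_single(value):
    """Round to IEEE single precision (returned as a Python float)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


d = Decimal("0.1")
direct = to_double(d)            # decimal -> double
via_float = float(to_single(d))  # decimal -> float -> double

# The indirect route has already discarded bits that the double could hold.
difference = abs(via_float - direct)
```

The direct result is the double nearest to 0.1, while the indirect result is only the single-precision value nearest to 0.1 widened to a double.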
Some functions accept a single value or the empty sequence as an argument and
some may return a single value or the empty sequence. This is indicated in the
function signature by following the parameter or return type name with a
question mark: ?
, indicating that either a single value or the
empty sequence must appear. See below.
Note that this function signature is different from a signature in which the
parameter is omitted. See, for example, the two signatures
for fn:string
. In the first signature, the parameter is omitted
and the argument defaults to the context value, referred to as .
.
In the second signature, the argument must be present but may be the empty
sequence, written as ()
.
Some functions accept a sequence of zero or more values as an argument. This is
indicated by following the name of the type of the items in the sequence with
*
. The sequence may contain zero or more items of the named type.
For example, the function below accepts a sequence of xs:double
and
returns an xs:double
or the empty sequence.
In XPath 4.0, the arguments in a function call can be supplied by
keyword as an alternative to supplying them positionally. For example, the call
resolve-uri(@href, static-base-uri())
can now be written
resolve-uri(base: static-base-uri(), relative: @href)
. The order in which
arguments are supplied can therefore differ from the order in which they are declared.
The specification, however, continues to use phrases such as “the second argument” as a
convenient shorthand for "the value of the argument that is bound to the second parameter
declaration".
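Python supports the same calling convention, so the equivalence of positional and keyword calls can be sketched directly. The function below is only a rough analogue of `resolve-uri` built on `urllib.parse.urljoin`; the parameter names `relative` and `base` are taken from the example above, and the URIs are invented.

```python
from urllib.parse import urljoin


def resolve_uri(relative, base):
    """Rough analogue of resolve-uri: resolve relative against base."""
    return urljoin(base, relative)


# Positional call: arguments are bound in declaration order.
positional = resolve_uri("chapter2.html", "http://example.com/book/")

# Keyword call: the order of arguments differs from the declaration order,
# but "the second argument" still means the value bound to base.
by_keyword = resolve_uri(base="http://example.com/book/", relative="chapter2.html")
```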
As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.
For example, the function fn:xml-to-json
has an options parameter
allowing specification of whether the output is to be indented. A call might be written:
Where a function adopts the
The value of the relevant argument must be a map. The entries in the map are
referred to as options: the key of the entry is called the option name, and the
associated value is the option value. Option names defined in this specification
are always strings (single xs:string
values). Option values may
be of any type.
The type of the options parameter in the function signature is always
given as map(*)
.
Although option names are described above as strings, the actual key may be
any value that compares equal to the required string (using the eq
operator
with Unicode codepoint collation; or equivalently, the fn:atomic-equal
relation).
For example, instances of xs:untypedAtomic
or xs:anyURI
are equally acceptable.
This means that the implementation of the function can check for the
presence and value of particular options using the functions map:contains
and/or map:get
.
Implementations xs:QName
as the option
names, using an appropriate namespace.
If an option is present whose key is not described in the specification,
then a type error xs:QName
with a non-absent namespace.
All entries in the options map are optional, and supplying an empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.
For each named option, the function
specification defines a required type for the option value. The value that is actually
supplied in the map is converted to this required type using the
It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.
In cases where an option is list-valued, by convention the function should accept
either a sequence or an array: but this rule applies only if the specification
of the option explicitly accepts either. Accepting a sequence is convenient if the
value is generated programmatically using an XPath expression; while accepting an array
allows the options to be held in an external file in JSON format, to be read using
a call on the fn:json-doc
function.
In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
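Some of the option conventions above can be sketched in Python: every option is optional, an empty map behaves like an omitted argument, and the function itself is responsible for bringing each supplied value to its required type. The option name `indent` comes from the earlier example; the helper names, the default value, and the simplified type check are assumptions. Handling of unrecognized keys is deliberately not modeled.

```python
def require_boolean(value):
    """Simplified stand-in for converting a value to a required boolean type."""
    if not isinstance(value, bool):
        raise TypeError(f"expected a boolean, got {type(value).__name__}")
    return value


# Hypothetical option declarations: name -> (converter, default value).
OPTION_SPEC = {
    "indent": (require_boolean, False),
}


def resolve_options(options=None):
    """Return the effective option values for a call."""
    supplied = {} if options is None else options
    effective = {}
    for name, (convert, default) in OPTION_SPEC.items():
        if name in supplied:
            # The function, not the calling mechanism, invokes the conversion.
            effective[name] = convert(supplied[name])
        else:
            effective[name] = default
    return effective
```

With this shape, `resolve_options()` and `resolve_options({})` yield the same effective options.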
The diagrams in this section show how nodes, functions, primitive simple types, and user defined types fit together into a type system. This type system comprises two distinct subsystems that both include the primitive atomic types. In the diagrams, connecting lines represent relationships between derived types and the types from which they are derived; the former are always below and to the right of the latter.
The xs:IDREFS
, xs:NMTOKENS
,
xs:ENTITIES
types, and xs:numeric
and both the
user-defined list types
and
user-defined union types
are special types in that these types are lists or unions
rather than types derived by extension or restriction.
The first diagram illustrates the relationship of various item types.
Item types are used to characterize the various types of item that can appear in a sequence (nodes, atomic values, and functions), and they are therefore used in declaring the types of variables or the argument types and result types of functions.
Item types in the data model form a directed graph, rather than a
hierarchy or lattice: in the relationship defined by the
derived-from(A, B)
function, some types are derived from
more than one other type. Examples include functions
(function(xs:string) as xs:int
is substitutable for
function(xs:NCName) as xs:int
and also for
function(xs:string) as xs:decimal
), and union types
(A
is substitutable for the union type (A | B)
and also
for (A | C)
). In XDM, item types include node types,
function types, and built-in atomic types. The diagram, which shows
only hierarchic relationships, is therefore a simplification of the
full model.
The next diagram illustrates the schema type subsystem, in which
all types are derived from xs:anyType
.
Schema types include built-in types defined in the XML Schema specification, and user-defined types defined using mechanisms described in the XML Schema specification. Schema types define the permitted contents of nodes. The main categories are complex types, which define the permitted content of elements, and simple types, which can be used to constrain the values of both elements and attributes.
The final diagram shows all of the atomic types, including the primitive simple types and the
built-in types derived from the primitive simple types.
This includes all the built-in datatypes defined in
Atomic types are both item types and schema types, so the root type xs:anyAtomicType
may be found
in both the previous diagrams.
The terminology used to describe the functions and operators on types defined in
Following in the tradition of
This document uses the terms string
, character
, and codepoint
with meanings that are normatively defined in
This
definition excludes Unicode characters in the surrogate blocks as well as
xs:string
datatype.
The set of codepoints is thus wider than the set of characters.
This specification spells “codepoint” as one word; the Unicode specification spells
it as “code point”.
Equivalent terms found in other specifications are
“character number” or “code position”. See
Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader’s attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.
It is
This specification adopts the Unicode notation U+xxxx
to refer to a codepoint
by its hexadecimal value (always four to six hexadecimal digits). This is followed where appropriate
by the official Unicode character name and its graphical representation: for example
Unless explicitly stated, the functions in this document do not ensure that any
returned xs:string
values are normalized in the sense of
In functions that involve character counting such
as fn:substring
, fn:string-length
and
fn:translate
, what is counted is the number of XML
This document uses the phrase “namespace URI” to identify the concept identified
in
It also uses the term expanded-QName
defined below.
xs:QName
datatype as defined in the XDM data model
(see
The term URI is used as follows:
xs:anyURI
datatype
as defined in
This means, in practice, that where this
specification requires a “URI Reference”, an IRI as defined in xs:anyURI
is a wider definition than the definition in
In this specification:
The auxiliary verb
When the sentence relates to an implementation of a function (for example "All implementations
When the sentence relates to the result of a function (for example "The result $arg
") then the implementation is not conformant unless it delivers a result as stated.
When the sentence relates to the arguments to a function (for example "The value of $arg
The auxiliary verb
The auxiliary verb
Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.
This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.
In this section the term
fn:current-dateTime
within the same execution scope will return the same result.
The execution scope is defined by the host language that invokes the function library.use-when
attributes, are in a separate execution scope).
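The guarantee that repeated calls within one execution scope return the same result can be modeled by capturing the value once per scope. The Python sketch below is a hypothetical model; the class `ExecutionScope` and its method are invented for illustration, and the use of UTC is an arbitrary choice.

```python
from datetime import datetime, timezone


class ExecutionScope:
    """Hypothetical model of an execution scope that fixes the current time."""

    def __init__(self):
        self._current = None

    def current_date_time(self):
        # The clock is read at most once; later calls reuse the stored value.
        if self._current is None:
            self._current = datetime.now(timezone.utc)
        return self._current


scope = ExecutionScope()
first = scope.current_date_time()
second = scope.current_date_time()
```

A second `ExecutionScope` instance would read the clock independently, so its value may differ from the first scope's.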
The following definition explains more precisely what it means for two function calls to return the same result:
$V1
and $V2
are
defined to be
Both items are atomic values, of precisely the same type, and the values are equal as defined using the eq
operator,
using the Unicode codepoint collation when comparing strings.
Both items are nodes, and represent the same node.
Both items are maps, both maps have the same number of entries,
and for every entry E1 in the first map there is an entry E2 in the second map such
that the keys of E1 and E2 are
Both items are arrays, both arrays have the same number of members, and the members
are pairwise
Both items are function items,
neither item is a map or array, and the two function items have the same function identity.
The concept of function identity is explained in
Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.
fn:name#0
is context-dependent
while fn:name#1
is context-independent.
The main categories of context-dependent functions are:
Functions that explicitly deliver the value of a component of the static or dynamic context,
for example fn:static-base-uri
, fn:default-collation
,
fn:position
, or fn:last
.
Functions with an optional parameter whose default value is taken from the static
or dynamic context of the caller, usually either the context value (for example, fn:node-name
)
or the default collation (for example, fn:index-of
).
Functions that use the static context of the caller to expand or disambiguate
the values of supplied arguments: for example fn:doc
expands its first
argument using the static base URI of the caller, and xs:QName
expands its first argument
using the in-scope namespaces of the caller.
Some functions depend on aspects of the dynamic context that remain invariant
within an
User-defined functions in XQuery and XSLT may depend on the static context of the function definition (for example, the in-scope namespaces) and also in a limited way on the dynamic context (for example, the values of global variables). However, the only way they can depend on the static or dynamic context of the caller — which is what concerns us here — is by defining optional parameters whose default values are context-dependent.
Because the focus is a specific part of the dynamic context, all
A fn:function-lookup
.
The principle in such cases is that the static context used for the function evaluation
is taken from the static context of the named function reference, partial function application, or the call
on fn:function-lookup
; and the dynamic context for the function evaluation is taken from the dynamic
context of the evaluation of the named function reference, partial function application, or the call
of fn:function-lookup
.
The result of a dynamic call to a function item never depends on the static or dynamic context of the dynamic function call, only (where relevant) on the captured context held within the function item itself.
Context-dependent functions fall into a number of categories:
The functions fn:current-date
, fn:current-dateTime
, fn:current-time
,
fn:default-language
, fn:implicit-timezone
,
fn:adjust-date-to-timezone
, fn:adjust-dateTime-to-timezone
, and
fn:adjust-time-to-timezone
depend on properties of the dynamic context that are
fixed within the op:
namespace that manipulate dates and times and
that make use of the implicit timezone. These functions will return the same
result if called repeatedly during a single
A number of functions including fn:base-uri#0
, fn:data#0
,
fn:document-uri#0
, fn:element-with-id#1
, fn:id#1
,
fn:idref#1
, fn:lang#1
, fn:last#0
, fn:local-name#0
,
fn:name#0
, fn:namespace-uri#0
, fn:normalize-space#0
,
fn:number#0
, fn:path#0
, fn:position#0
,
fn:root#0
, fn:string#0
, and
fn:string-length#0
depend on the
A function is
A function that
is not
The function fn:default-collation
and many
string-handling operators and functions depend
on the default collation and the in-scope collations, which are both properties
of the static context. If a particular call of one of these functions is
evaluated twice with the same arguments then it will return the same result
each time (because the static context, by definition, does not change at run
time). However, two distinct calls (that is, two calls on the function
appearing in different places in the source code) may produce different results
even if the explicit arguments are the same.
Functions such as fn:static-base-uri
, fn:doc
, and fn:collection
depend on
other aspects of the static context. As with functions that depend on
collations, a single call will produce the same results on each call if the
explicit arguments are the same, but two calls appearing in different places in
the source code may produce different results.
The fn:function-lookup
function is a special case because it is
potentially dependent on everything in the static and dynamic context. This is because the static and dynamic
context of the call to fn:function-lookup
fn:function-lookup
returns.
All functions defined in this specification are
fn:distinct-values
, fn:unordered
, map:keys
,
and map:for-each
) produce results in an
Some functions (such as fn:analyze-string
,
fn:parse-xml
, fn:parse-xml-fragment
,
fn:parse-html
, and fn:json-to-xml
)
construct a tree of nodes to
represent their results. There is no guarantee that repeated calls with the same
arguments will return the same identical node (in the sense of the is
operator). However, if non-identical nodes are returned, their content will be the
same in the sense of the fn:deep-equal
function. Such a function is said
to be
Some functions (such as fn:doc
and fn:collection
) create new nodes by reading external
documents. Such functions are guaranteed to be
Where the results of a function are described as being (to a greater or lesser
extent)
Accessors and their semantics are described in
Each of these functions has an arity-zero signature which is equivalent to the arity-one
form, with the context value supplied as the implicit first argument. In addition, each of the
arity-one functions accepts an empty sequence as the argument, in which case it generally delivers
an empty sequence as the result: the exception is fn:string
, which delivers
a zero-length string.
Function | Accessor | Accepts | Returns |
---|---|---|---|
fn:node-name | node-name | node (optional) | xs:QName (optional) |
fn:nilled | nilled | node (optional) | xs:boolean (optional) |
fn:string | string-value | item (optional) | xs:string |
fn:data | typed-value | zero or more items | a sequence of atomic values |
fn:base-uri | base-uri | node (optional) | xs:anyURI (optional) |
fn:document-uri | document-uri | node (optional) | xs:anyURI (optional) |
This section specifies further functions on nodes. Nodes are formally defined
in
This section specifies functions on sequences of nodes.
In this document, as well as in an error is raised
is used. Raising an error is equivalent to calling the fn:error
function defined in this section with the provided error code. Except where otherwise
specified, errors defined in this specification are dynamic errors. Some errors,
however, are classified as type errors. Type errors are typically used where the presence
of the error can be inferred from knowledge of the type of the actual arguments to a function, for
example with a call such as fn:string(fn:abs#1)
. Host languages may allow type errors
to be reported statically if they are discovered during static analysis.
When function specifications indicate that an error is to be raised, the notation
[
is used to specify an error code. Each error defined
in this document is identified by an xs:QName
that is in the
http://www.w3.org/2005/xqt-errors
namespace, represented in this document by the err
prefix. It is this
xs:QName
that is actually passed as an argument to the
fn:error
function. Calling this function raises an error. For a
more detailed treatment of error handling, see
The fn:error
function is a general function that may be called as above
but may also be called from xs:QName
argument.
This section specifies arithmetic operators on the numeric datatypes defined in
The operators described in this section are defined on the following atomic types.
&common-numeric-types.xml;
They also apply to types derived by restriction from the above types.
The type xs:numeric
is defined as a union type whose member types are
(in order) xs:double
, xs:float
, and xs:decimal
. This type is implicitly imported
into the static context, so it can also be used in defining the signature of user-written functions. Apart from the fact that
it is implicitly imported, it behaves exactly like a user-defined type with the same definition. This means, for example:
If the expected type of a function parameter is given as xs:numeric
, the actual value supplied
can be an instance of any of these three types, or any type derived from these three by restriction (this includes the built-in
type xs:integer
, which is derived from xs:decimal
).
If the expected type of a function parameter is given as xs:numeric
, and the actual value supplied
is xs:untypedAtomic
(or a node whose atomized value is xs:untypedAtomic
), then it will
be cast to the union type xs:numeric
using the rules in xs:double
subsumes the lexical space of the other member types, and
xs:double
is listed first, the effect is that if the untyped atomic value is in the lexical space of
xs:double
, it will be converted to an xs:double
, and if not, a dynamic error occurs.
When the return type of a function is given as xs:numeric
, the actual value returned will be
an instance of one of the three member types (and perhaps also of types derived from these by restriction). The rules
for the particular function will specify how the type of the result depends on the values supplied as arguments.
In many cases, for the functions in this specification, the result is defined to be the same type as the first
argument.
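The untyped-atomic rule above can be illustrated with a short Python sketch. It is not the normative casting algorithm: the helper name `cast_untyped_to_numeric` and the regular expression approximating the xs:double lexical space (including the `+INF` form admitted by XSD 1.1) are assumptions made for this example.

```python
import math
import re

# Approximation of the xs:double lexical space (assumption for this sketch).
_XS_DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|[+-]?INF|NaN"
)

def cast_untyped_to_numeric(text: str) -> float:
    """Cast an untyped atomic value to xs:numeric (hypothetical helper).

    xs:double is the first member type and its lexical space subsumes the
    others, so a successful cast always yields a double.
    """
    lexical = text.strip(" \t\r\n")  # leading and trailing whitespace is ignored
    if not _XS_DOUBLE.fullmatch(lexical):
        raise ValueError(f"not in the lexical space of xs:double: {text!r}")
    if lexical == "NaN":
        return math.nan
    if lexical.endswith("INF"):
        return -math.inf if lexical.startswith("-") else math.inf
    return float(lexical)
```

Note that an untyped value such as `12` becomes a double in this sketch, not an integer, which mirrors the consequence described in the rule.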
This specification uses IEEE arithmetic for xs:float
and xs:double
values.
One consequence of this is that some operations result in the value NaN
(not a number), which
has the unusual property that it is not equal to itself. Another consequence is that some operations return the value negative zero.
This differs from XSD 1.0, which defines NaN
as being equal to itself and defines only a single zero in the value space.
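The text accompanying several functions defines behavior for both positive and negative zero inputs and outputs
in the interest of alignment with IEEE. In consequence, the expression -0.0e0
(which is actually a unary minus operator
applied to an xs:double
value) will always return negative zero: see op:numeric-unary-minus
. When casting from string to double, however, the lexical form -0
may be converted to positive zero by implementations that rely on XSD 1.0, though negative zero is recommended.
XML Schema 1.1 introduces support for positive and negative zero as distinct values, and also uses the IEEE semantics for comparisons involving NaN
.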
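The two IEEE consequences described above can be observed in any language with IEEE doubles. The following Python fragment is only an illustration; the variable names are chosen for this example.

```python
import math

nan = math.nan
neg_zero = -0.0   # unary minus applied to a double, as with -0.0e0

# NaN is not equal to itself under IEEE comparison.
nan_equals_itself = (nan == nan)

# Positive and negative zero compare as numerically equal ...
zeros_compare_equal = (neg_zero == 0.0)

# ... but the sign of negative zero remains observable.
sign_of_neg_zero = math.copysign(1.0, neg_zero)
```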
The following functions define the semantics of arithmetic operators defined in
Operator | Meaning |
---|---|
op:numeric-add | Addition |
op:numeric-subtract | Subtraction |
op:numeric-multiply | Multiplication |
op:numeric-divide | Division |
op:numeric-integer-divide | Integer division |
op:numeric-mod | Modulus |
op:numeric-unary-plus | Unary plus |
op:numeric-unary-minus | Unary minus (negation) |
The parameters and return types for the above operators are in most cases declared to be of type
xs:numeric
, which permits the basic numeric
types: xs:integer
, xs:decimal
, xs:float
and xs:double
, and types derived from them.
In general the two-argument functions require that both arguments are of the same primitive type,
and they return a value of this same type.
The exceptions are op:numeric-divide
, which returns
an xs:decimal
if called with two xs:integer
operands,
and op:numeric-integer-divide
which always returns an xs:integer
.
If the two operands of an arithmetic expression are not of the same type,
subtype substitution and numeric type promotion are used to obtain two operands of the same type.
The result type of operations depends on their argument datatypes and is defined in the following table:
Operator | Returns |
---|---|
op:operation(xs:integer, xs:integer) | xs:integer (except for op:numeric-divide(integer, integer), which returns xs:decimal) |
op:operation(xs:decimal, xs:decimal) | xs:decimal |
op:operation(xs:float, xs:float) | xs:float |
op:operation(xs:double, xs:double) | xs:double |
op:operation(xs:integer) | xs:integer |
op:operation(xs:decimal) | xs:decimal |
op:operation(xs:float) | xs:float |
op:operation(xs:double) | xs:double |
These rules define any operation on any pair of arithmetic types. Consider the following example:
For this operation, xs:int
must be converted to
xs:double
. This can be done, since by the rules above:
xs:int
can be substituted for xs:integer
,
xs:integer
can be substituted for xs:decimal
,
xs:decimal
can be promoted to xs:double
. As far as possible, the promotions should be done in a
single step. Specifically, when an xs:decimal
is promoted to an
xs:double
, it should not be converted to an xs:float
and then to xs:double
, as this risks loss of precision.
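The substitution and promotion steps above can be sketched as a small lookup in Python. The function `result_type`, the rank table, and the one-entry substitution table are hypothetical names introduced for this illustration; only the types and operators named in this section are covered.

```python
# Promotion order assumed for this sketch: integer -> decimal -> float -> double.
_RANK = {"xs:integer": 0, "xs:decimal": 1, "xs:float": 2, "xs:double": 3}

# Derived type -> the basic numeric type it can be substituted for
# (only the example from the text is listed here).
_SUBSTITUTE = {"xs:int": "xs:integer"}

def result_type(operator: str, left: str, right: str) -> str:
    """Return the result type of a binary op:numeric-* call (hypothetical helper)."""
    left = _SUBSTITUTE.get(left, left)
    right = _SUBSTITUTE.get(right, right)
    if operator == "op:numeric-integer-divide":
        return "xs:integer"                    # always an integer
    # Promote the lower-ranked operand to the type of the other, in one step.
    common = left if _RANK[left] >= _RANK[right] else right
    if operator == "op:numeric-divide" and common == "xs:integer":
        return "xs:decimal"                    # integer / integer yields a decimal
    return common
```

For the example in the text, `result_type("op:numeric-add", "xs:int", "xs:double")` selects xs:double directly, without passing through xs:float.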
As another example, a user may define height
as a derived type of
xs:integer
with a minimum value of 20 and a maximum value of 100.
They may then derive fenceHeight
using an enumeration to restrict the
permitted set of values to, say, 36, 48 and 60.
fenceHeight
can be substituted for its base type
height
and height
can be substituted for its base type
xs:integer
.
The basic rules for addition, subtraction, and multiplication
of ordinary numbers are not set out in this specification; they are taken as given. In the case of xs:double
and xs:float
the rules are as defined in IEEE. The rules for operations involving special values such as infinity and NaN
,
and for exception conditions such as overflow and underflow, are described more explicitly since they are not necessarily obvious.
On overflow and underflow situations during arithmetic operations, conforming
implementations
For xs:float
and xs:double
operations, overflow
behavior
Raising a dynamic error
Returning INF
or -INF
.
Returning the largest (positive or negative) non-infinite number.
For xs:float
and xs:double
operations,
underflow behavior
Raising a dynamic error
Returning 0.0E0
or +/- 2**Emin
or a
denormalized value; where Emin
is the smallest
possible xs:float
or xs:double
exponent.
For xs:decimal
operations, overflow behavior 0.0
must be returned.
For xs:integer
operations, implementations that support
limited-precision integer operations
They
They
The functions op:numeric-add
, op:numeric-subtract
,
op:numeric-multiply
, op:numeric-divide
,
op:numeric-integer-divide
and op:numeric-mod
are each
defined for pairs of numeric operands, each of which has the same
type: xs:integer
, xs:decimal
, xs:float
, or
xs:double
. The functions op:numeric-unary-plus
and
op:numeric-unary-minus
are defined for a single operand whose type
is one of those same numeric types.
For xs:float
and xs:double
arguments, if either
argument is NaN
, the result is NaN
.
For xs:decimal
values, let N be the number of digits
of precision supported by the implementation, and let M (M <= N
) be the minimum limit on the number of digits
required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication
operations, the returned result
This specification does not determine whether xs:decimal
operations are fixed point or floating point.
In an implementation using floating point it is possible for very simple operations to require more digits of precision than
are available; for example, adding 1e100
to 1e-100
requires 200 digits of precision for an
accurate representation of the result.
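The precision requirement in this example can be checked with Python's `decimal` module, which is a floating-point decimal implementation with configurable precision. This is an illustration only; the precisions 28 and 201 are choices made for this sketch (201 significant digits are enough to hold every digit position from 10^100 down to 10^-100).

```python
from decimal import Decimal, localcontext

big, tiny = Decimal("1e100"), Decimal("1e-100")

with localcontext() as ctx:
    ctx.prec = 28                      # limited working precision
    rounded_sum = big + tiny           # the small operand is lost to rounding
    lost_small_operand = (rounded_sum == big)

with localcontext() as ctx:
    ctx.prec = 201                     # enough digits to represent the sum exactly
    exact_sum = big + tiny
    kept_small_operand = (exact_sum - big == tiny)
```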
The IEEE specification also describes the handling of two exception conditions called divideByZero
and invalidOperation
. The
IEEE divideByZero
exception is raised not only by a direct attempt to divide by zero, but also by
operations such as log(0)
. The IEEE invalidOperation
exception is raised by
attempts to call a function with an argument that is outside the function’s domain (for example,
sqrt(-1)
or log(-1)
).
Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in
which the operation returns a defined result, typically positive or negative infinity, or NaN
. With this
function library,
these IEEE exceptions do not cause a dynamic error
at the application level; rather they result in the relevant function or operator returning
the defined non-error result.
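As a sketch of this non-stop behavior: Python's `math.sqrt` and `math.log` raise `ValueError` for out-of-domain arguments, so a wrapper is needed to obtain the results described here. The names `xp_sqrt` and `xp_log` are hypothetical and are not functions defined by this specification.

```python
import math

def xp_sqrt(x: float) -> float:
    """Square root with IEEE default handling: NaN instead of an error."""
    if math.isnan(x) or x < 0:
        return math.nan            # invalidOperation: result is NaN
    return math.sqrt(x)

def xp_log(x: float) -> float:
    """Natural logarithm with IEEE default handling."""
    if math.isnan(x) or x < 0:
        return math.nan            # invalidOperation: result is NaN
    if x == 0:
        return -math.inf           # divideByZero: result is negative infinity
    return math.log(x)
```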
The underlying IEEE exception -INF
, +INF
, or NaN
) with no error.
The IEEE specification distinguishes two NaN
values:
a quiet NaN
and a signaling NaN
. These two values are not distinguishable in the XDM model:
the value spaces of xs:float
and xs:double
each include only a single
NaN
value. This does not prevent the implementation distinguishing them internally,
and triggering different
The six value comparison operators eq
, ne
, lt
,
le
, gt
, and ge
are defined in terms of two
underlying functions: op:numeric-equal
and op:numeric-less-than
.
These functions are defined to operate on values of the same type.
If the arguments are of different types, one argument is promoted to the type of the other
as described above in NaN
,
false
is returned.
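One way to picture how six operators can rest on two primitives is the following Python sketch. The derivations in the table are an assumption made for this illustration (for example, `le` as "less than or equal"), and the Python function names merely stand in for op:numeric-equal and op:numeric-less-than.

```python
import math

def numeric_equal(a: float, b: float) -> bool:
    return a == b          # False whenever either operand is NaN

def numeric_less_than(a: float, b: float) -> bool:
    return a < b           # False whenever either operand is NaN

# Assumed derivations of the six value comparison operators.
COMPARISONS = {
    "eq": lambda a, b: numeric_equal(a, b),
    "ne": lambda a, b: not numeric_equal(a, b),
    "lt": lambda a, b: numeric_less_than(a, b),
    "le": lambda a, b: numeric_less_than(a, b) or numeric_equal(a, b),
    "gt": lambda a, b: numeric_less_than(b, a),
    "ge": lambda a, b: numeric_less_than(b, a) or numeric_equal(a, b),
}
```

Under these derivations, a comparison involving NaN is false for every operator except `ne`, because both primitives return false.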
For a description of the different ways of comparing numeric
values using the operators =
and eq
and the functions
fn:deep-equal
and fn:atomic-equal
,
see
See also the function fn:compare
.
The following functions are defined on numeric types. Each function returns a value of the same type as the type of its argument.
If the argument is the empty sequence, the empty sequence is returned.
For xs:float
and xs:double
arguments, if the
argument is NaN
, NaN
is returned.
With the exception of fn:abs
, functions with arguments of
type xs:float
and xs:double
that are positive or
negative infinity return positive or negative infinity.
fn:round
and fn:round-half-to-even
produce the same
result in all cases except when the argument is exactly midway between two values
with the required precision.
Other ways of rounding midway values can be achieved as follows:
Towards negative infinity: -round(-$x)
Away from zero: round(abs($x)) * compare($x, 0)
Towards zero: round(-abs($x)) * -compare($x, 0)
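These alternatives can be tried out in Python using exact decimal arithmetic. The sketch assumes that fn:round resolves ties towards positive infinity; `xp_round` and `sign` are hypothetical stand-ins for fn:round and for `compare($x, 0)`. The towards-zero case is written here as `round(-abs(x)) * -sign(x)`, which maps 2.5 to 2 and -2.5 to -2.

```python
from decimal import Decimal, ROUND_FLOOR

def xp_round(x: Decimal) -> Decimal:
    """Round to an integer; ties go towards positive infinity."""
    return (x + Decimal("0.5")).to_integral_value(rounding=ROUND_FLOOR)

def sign(x: Decimal) -> int:
    """Stand-in for compare($x, 0): returns -1, 0, or 1."""
    return (x > 0) - (x < 0)

def round_half_down(x: Decimal) -> Decimal:
    """Ties towards negative infinity."""
    return -xp_round(-x)

def round_half_away_from_zero(x: Decimal) -> Decimal:
    """Ties away from zero."""
    return xp_round(abs(x)) * sign(x)

def round_half_towards_zero(x: Decimal) -> Decimal:
    """Ties towards zero."""
    return xp_round(-abs(x)) * -sign(x)
```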
It is possible to convert strings to values of type xs:integer
,
xs:float
, xs:decimal
, or xs:double
using the constructor functions described in cast
expressions as described in
In addition the fn:number
function is available to convert strings
to values of type xs:double
. It differs from the xs:double
constructor function in that any value outside the lexical space of the xs:double
datatype is converted to the xs:double
value NaN
.
This section defines a function for formatting decimal and floating point numbers.
This function can be used to format any numeric quantity, including an integer. For integers, however,
the fn:format-integer
function offers additional possibilities. Note also that the picture
strings used by the two functions are not 100% compatible, though they share some options in common.
Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any)
are identified by a QName. There is always an unnamed decimal format available, but its contents are
Each decimal format provides a set of named properties.
A phrase such as "The
For any decimal format, the properties
representing characters used in a
This differs from the format-number
function previously defined in XSLT 2.0 in that
any digit can be used in the picture string to represent a mandatory digit: for example the picture
strings "000"
, "001"
, and "999"
are equivalent.
The digits will all be from the same decimal digit family,
specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero-digit property.
This change is to align format-number
(which previously used "000"
) with format-dateTime
(which used 001
).
A dynamic error is raised
A picture-string consists either of a sub-picture, or of
two sub-pictures separated by the
A sub-picture
A sub-picture
The mantissa part of a
sub-picture (defined below)
A sub-picture
A sub-picture
A sub-picture
The integer part of a sub-picture (defined below)
A character that matches the
A sub-picture that contains a
If a sub-picture contains a character treated as an
exponent-separator-sign then this
The mantissa part of the sub-picture is defined as the part that appears to the left of the exponent-separator-sign if there is one, or the entire sub-picture otherwise. The exponent part of the sub-picture is defined as the part that appears to the right of the exponent-separator-sign; if there is no exponent-separator-sign then the exponent part is absent.
The integer part of the sub-picture is defined as the part that
appears to the left of the
The fractional part of the sub-picture is defined as that
part of the mantissa part that
appears to the right of the
This phase of the algorithm analyzes
the
Several variables are associated with each sub-picture. If there are two sub-pictures, then these rules are applied to one sub-picture to obtain the values that apply to positive and unsigned zero numbers, and to the other to obtain the values that apply to negative numbers. If there is only one sub-picture, then the values for both cases are derived from this sub-picture.
The variables are as follows:
The integer-part-grouping-positions is a sequence of integers
representing the positions of grouping separators within the integer part of the
sub-picture. For each
The grouping is defined to be regular if all of the following conditions apply:
There is at least one grouping-separator in the integer part of the sub-picture.
There is a positive integer G (the grouping size) such that the position of every grouping-separator in the integer part of the sub-picture is a positive integer multiple of G.
Every position in the integer part of the sub-picture that is a positive integer multiple of G is occupied by a grouping-separator.
If the grouping is regular, then the integer-part-grouping-positions sequence contains all integer multiples of G as far as necessary to accommodate the largest possible number.
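The regularity test can be sketched in Python as follows. Several points are assumptions made for this illustration: the position of a grouping separator is taken to be the number of digit characters to its right within the integer part, every character other than the separator is treated as a digit placeholder, and a multiple of G counts as lying "in the integer part" only if it is smaller than the number of digit characters. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

def grouping_positions(integer_part: str, separator: str = ",") -> Tuple[List[int], int]:
    """Return (separator positions, number of digit characters)."""
    positions: List[int] = []
    digits_seen = 0
    for ch in reversed(integer_part):      # scan from the right-hand end
        if ch == separator:
            positions.append(digits_seen)  # digits to the right of this separator
        else:
            digits_seen += 1
    return positions, digits_seen

def regular_group_size(integer_part: str, separator: str = ",") -> Optional[int]:
    """Return the grouping size G if the grouping is regular, else None."""
    positions, digit_count = grouping_positions(integer_part, separator)
    if not positions:
        return None                        # no grouping separator at all
    g = min(positions)                     # the only possible candidate for G
    if g <= 0:
        return None
    if any(p % g != 0 for p in positions):
        return None                        # a separator is not at a multiple of G
    occupied = set(positions)
    if any(m not in occupied for m in range(g, digit_count, g)):
        return None                        # a multiple of G has no separator
    return g
```

The smallest position is the only candidate for G: every position must be a multiple of G, and G itself must be occupied by a separator.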
The minimum-integer-part-size is an integer indicating the minimum number of digits that will
appear to the left of the decimal-separator character. It is initially set to
the number of
There is no maximum integer part size. All significant digits in the integer part of the
number will be displayed, even if this exceeds the number of
The scaling factor is a non-negative integer used to determine the scaling of the mantissa
in exponential notation. It is set to the number of
The prefix is set to contain all passive characters
in the sub-picture to the left of the leftmost active character.
If the picture string contains only one sub-picture,
the prefix
for the negative sub-picture is set by concatenating the
The fractional-part-grouping-positions is a sequence of integers
representing the positions of grouping separators within the fractional part of the
sub-picture. For each
There is no need to extrapolate grouping positions on the fractional side,
because the number of digits in the output will never exceed the number of
The minimum-fractional-part-size is set to the number of
The maximum-fractional-part-size is set to the total number of
If the effect of the above rules is that minimum-integer-part-size and maximum-fractional-part-size are both zero, then an adjustment is applied as follows:
If an exponent separator is present then:
minimum-fractional-part-size is changed to 1 (one).
maximum-fractional-part-size is changed to 1 (one).
This has the effect that with the picture #.e9
, the value 0.123
is formatted as 0.1e0
Otherwise:
minimum-integer-part-size is changed to 1 (one).
This has the effect that with the picture #
, the value 0.23
is formatted
as 0
If all the following conditions are true:
An exponent separator is present
The minimum-integer-part-size is zero
There is at least one
then the minimum-integer-part-size is changed to 1 (one).
This has the effect that with the picture .9e9
, the value 0.1
is formatted
as .1e0
, while with the picture #.9e9
, it is formatted as 0.1e0
If (after making the above adjustments) the minimum-integer-part-size and the minimum-fractional-part-size are both zero, then the minimum-fractional-part-size is set to 1 (one).
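The adjustment rules above can be condensed into a short Python sketch. The class `PartSizes` and the function `adjust_sizes` are hypothetical. The third condition of the second rule is assumed, on the basis of the `.9e9` and `#.9e9` examples, to be the presence of an optional digit placeholder (`#`) in the integer part.

```python
from dataclasses import dataclass

@dataclass
class PartSizes:
    min_integer: int
    min_fraction: int
    max_fraction: int

def adjust_sizes(sizes: PartSizes, has_exponent: bool,
                 optional_digit_in_integer_part: bool) -> PartSizes:
    """Apply the size adjustments to a copy of `sizes` (hypothetical helper)."""
    s = PartSizes(sizes.min_integer, sizes.min_fraction, sizes.max_fraction)
    # Rule 1: neither an integer digit nor a fraction digit would be shown.
    if s.min_integer == 0 and s.max_fraction == 0:
        if has_exponent:
            s.min_fraction = 1
            s.max_fraction = 1
        else:
            s.min_integer = 1
    # Rule 2 (third condition assumed, see the lead-in).
    if has_exponent and s.min_integer == 0 and optional_digit_in_integer_part:
        s.min_integer = 1
    # Rule 3: after the adjustments, at least one digit must be mandatory.
    if s.min_integer == 0 and s.min_fraction == 0:
        s.min_fraction = 1
    return s
```

For the picture `#.e9` the sketch yields sizes (1, 1, 1), consistent with 0.123 being formatted as 0.1e0; for `.9e9` the sizes stay at (0, 1, 1), consistent with .1e0.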
The minimum-exponent-size is set to the number of
The rules for the syntax of the picture string ensure that if an exponent separator is present, then the minimum-exponent-size will always be greater than zero.
The suffix is set to contain all passive characters to the right of the rightmost active character in the sub-picture.
If there is only one sub-picture, then all variables
for positive numbers and negative numbers will be the same, except for
prefix: the prefix for negative numbers will
be preceded by the
This section describes the second phase of processing of the
fn:format-number
function. This phase takes as input a number to be formatted
(referred to as the fn:format-number
function.
The algorithm for this second stage of processing is as follows:
If the input number is NaN
(not a number), the result is the
value of the
In the rules below, the positive sub-picture and its associated variables are used
if the input number is positive, and the negative sub-picture and its associated
variables are used if it is negative. For xs:double
and xs:float
,
negative zero is taken as negative, positive zero as positive. For xs:decimal
and xs:integer
, the positive sub-picture is used for zero.
The adjusted number is determined as follows:
If the sub-picture contains a
If the sub-picture contains a
Otherwise, the adjusted number is the input number.
If the multiplication causes numeric overflow, no error occurs, and the adjusted number is positive or negative infinity as appropriate.
If the adjusted number is positive or negative infinity, the result is the
concatenation of the appropriate prefix, the value of the
If the minimum exponent size is non-zero,
The primitive type of the mantissa is the same as the primitive type of the adjusted number (integer, decimal, float, or double).
The mantissa multiplied by ten to the power of the exponent is equal to the adjusted number.
The mantissa
If the minimum exponent size is zero, then the mantissa is the adjusted number and there is no exponent.
If the minimum exponent size is non-zero and the adjusted number is zero, then the mantissa is the adjusted number and the exponent is zero.
The mantissa is converted (if necessary) to
an xs:decimal
value,
using an implementation of xs:decimal
that imposes no limits on the
totalDigits
or fractionDigits
facets. If there are several
such values that
are numerically equal to the mantissa (bearing in mind that if the
mantissa is an xs:double
or xs:float
, the comparison will be done by
converting the decimal value back to an xs:double
or xs:float
), the one that
is chosen maximum-fractional-part-size
digits in
its fractional part. The rounded number is defined to be the result of
converting the mantissa to an xs:decimal
value, as described above,
and then calling the function fn:round-half-to-even
with this converted number
as the first argument and the maximum-fractional-part-size
as the second
argument, again with no limits on the totalDigits
or fractionDigits
in the
result.
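The conversion and rounding step can be approximated in Python with the `decimal` module. This is a sketch under stated assumptions: the helper name `rounded_number` is hypothetical, a binary floating-point mantissa is converted via its shortest round-tripping string (`repr`), the working precision of 1000 digits merely stands in for "no limits", and only finite values are handled.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext
from typing import Union

def rounded_number(mantissa: Union[int, float, Decimal],
                   max_fraction_digits: int) -> Decimal:
    """Round a finite mantissa half-to-even to the given number of fraction digits."""
    if isinstance(mantissa, float):
        # Assumption: pick the shortest decimal that converts back to the same float.
        as_decimal = Decimal(repr(mantissa))
    else:
        as_decimal = Decimal(mantissa)
    quantum = Decimal(1).scaleb(-max_fraction_digits)   # e.g. 0.01 for two digits
    with localcontext() as ctx:
        ctx.prec = 1000                                 # generous working precision
        return as_decimal.quantize(quantum, rounding=ROUND_HALF_EVEN)
```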
The absolute value of the rounded number is converted to a string in decimal notation,
using the digits in the
If the number of digits to the left of the
If the number of digits to the right of the
For each integer N in the integer-part-grouping-positions list,
a
For each integer N in the fractional-part-grouping-positions list,
a
If there is no
If an exponent exists, then the string
produced from the mantissa as described above is extended with
the following, in order:
(a) the
The result of the function is the concatenation of the appropriate prefix, the string conversion of the number as obtained above, and the appropriate suffix.
The functions in this section perform trigonometric and other mathematical calculations on xs:double
values. They
are provided primarily for use in applications performing geometrical computation, for example when generating
SVG graphics.
Functions are provided to support the six most commonly used trigonometric calculations: sine, cosine and tangent, and their inverses arc sine, arc cosine, and arc tangent. Other functions such as secant, cosecant, and cotangent are not provided because they are easily computed in terms of these six.
The functions in this section (with the exception of math:pi
)
are specified by reference to xs:double
values. The IEEE specification
applies with the following caveats:
IEEE states that the preferred quantum is language-defined. In this
specification, it is
IEEE states that certain functions should raise the inexact exception if the result is inexact. In this specification, this exception if it occurs does not result in an error. Any diagnostic information is outside the scope of this specification.
IEEE defines various rounding algorithms for inexact results, and states
that the choice of rounding direction, and the mechanisms for influencing this choice,
are language-defined. In this specification, the rounding direction and any mechanisms for
influencing it are
Certain operations (such as taking the square root of a negative number)
are defined in IEEE to signal the invalid operation exception and return a
quiet NaN
. In this specification, such operations return NaN
and do not raise an error. The same policy applies to operations (such as taking
the logarithm of zero) that raise a divide-by-zero exception. Any diagnostic
information is outside the scope of this specification.
Operations whose mathematical result is greater than the largest finite xs:double
value are defined in IEEE to signal the overflow exception; operations whose mathematical
result is closer to zero than the smallest non-zero xs:double
value are similarly
defined in IEEE to signal the underflow exception. The treatment of these exceptions in
this specification is defined in
This section specifies functions and operators on the xs:string
datatype and the datatypes derived from it.
The operators described in this section are defined on the following types.
&common-string-types.xml;
They also apply to user-defined types derived by restriction from the above types.
A collation is a specification of the manner in which strings are compared and, by extension, ordered. When values whose type is xs:string
or a type derived from xs:string
are
compared (or, equivalently, sorted), the comparisons are inherently
performed according to some collation (even if that collation is defined
entirely on codepoint values). The
Collations can indicate that two different codepoints are, in fact, equal for comparison purposes (e.g., “v” and “w” are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner, as defined by the collation.
Some collations, especially those based on the
Unicode Collation Algorithm (see
The
Collations may or may not perform Unicode normalization on strings before comparing them.
This specification assumes that collations are named and that the collation
name may be provided as an argument to string functions. Functions that
allow specification of a collation do so with an argument whose type is
xs:string
but whose lexical form must conform to an
xs:anyURI
.
This specification also defines the manner in which a
default collation is determined if the collation argument is not specified
in calls of functions that use a collation but allow it to be omitted.
If the collation is specified using a relative URI reference,
it is resolved relative to an
Previous versions of this specification stated that it must
be resolved against the
This specification does not define whether or not the collation URI is
dereferenced. The collation URI may be an abstract identifier, or it may
refer to an actual resource describing the collation. If it refers to a
resource, this specification does not define the nature of that resource.
One possible candidate is that the resource is a locale description
expressed using the Locale Data Markup Language: see
Functions such as fn:compare
and fn:max
that
compare xs:string
values use a single collation URI to identify
all aspects of the collation rules. This means that any parameters such as
the strength of the collation must be specified as part of the collation
URI. For example, suppose there is a collation
http://www.example.com/collations/French
that refers to a French collation that compares on the basis of
base characters. Collations that use the same basic rules, but with higher
strengths, for example, base characters and accents, or base characters,
accents and case, would need to be given different names, say
http://www.example.com/collations/French1
and
http://www.example.com/collations/French2
.
Note that some specifications use the term collation to refer to
an algorithm that can be parameterized, but in this specification, each
possible parameterization is considered to be a distinct collation.
The XQuery/XPath static context includes a provision for a default collation
that can be used for string comparisons and ordering operations. See the
description of the static context in
XML allows elements to specify the xml:lang
attribute to
indicate the language associated with the content of such an element.
This specification does not use xml:lang
to identify the
default collation because using
xml:lang
does not produce desired effects when the two
strings to be compared have different xml:lang
values or
when a string is multilingual.
http://www.w3.org/2005/xpath-functions/collation/codepoint
identifies
a collation which must be recognized by every implementation: it is referred to as
the
The Unicode codepoint collation does not perform any normalization on the supplied strings.
The collation is defined as follows. Each of the two strings is
converted to a sequence of integers using the fn:string-to-codepoints
function. These two sequences $A
and $B
are then compared as follows:
If both sequences are empty, the strings are equal.
If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.
If the first integer in $A
is less than the first integer in $B
, then
the string corresponding to $A
is less than the string corresponding to
$B
.
If the first integer in $A
is greater than the first integer in $B
, then
the string corresponding to $A
is greater than the string corresponding to
$B
.
Otherwise (the first pair of integers are equal), the result is obtained
by applying the same rules recursively to fn:tail($A)
and
fn:tail($B)
While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.
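The comparison rules of the Unicode codepoint collation can be sketched in Python. The sketch uses a loop in place of the recursive formulation, which gives the same result; the function names are chosen for this example, and the return values -1, 0, and 1 denote less than, equal, and greater than.

```python
from typing import List

def string_to_codepoints(s: str) -> List[int]:
    """Python counterpart of fn:string-to-codepoints."""
    return [ord(ch) for ch in s]

def codepoint_compare(a: str, b: str) -> int:
    """Compare two strings using the codepoint collation rules."""
    seq_a, seq_b = string_to_codepoints(a), string_to_codepoints(b)
    for x, y in zip(seq_a, seq_b):
        if x < y:
            return -1
        if x > y:
            return 1
    # All compared pairs were equal: the sequence that ran out first is the lesser.
    return (len(seq_a) > len(seq_b)) - (len(seq_a) < len(seq_b))
```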
This specification defines a family of collation URIs representing tailorings of the Unicode Collation
Algorithm (UCA) as defined in
This family of URIs uses the scheme and path http://www.w3.org/2013/collation/UCA
followed by an optional query part. The query part, if present, consists of a question mark followed
by a sequence of zero or more semicolon-separated parameters. Each parameter is a keyword-value pair, the
keyword and value being separated by an equals sign.
All implementations must recognize URIs in this family in the collation
argument of functions that
take a collation argument.
If the fallback
parameter is
present with the value no
, then the implementation fallback
parameter
is omitted or takes the value yes
, and if the collation URI is well-formed according to the rules in this section,
then the implementation http://www.w3.org/2013/collation/UCA?lang=se;fallback=yes
and the implementation does not include a fully
conformant version of the UCA tailored for Swedish, then it
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not
defined in this specification then the meaning is fallback
parameter is present with the value no
it should reject
the collation as unsupported, otherwise it should ignore the unrecognized parameter.
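The URI syntax described in this section (a fixed scheme and path, an optional query part, semicolon-separated keyword=value pairs, last occurrence wins) can be sketched as a small Python parser. The function name `parse_uca_uri` is hypothetical, and the sketch does not validate keywords or values, nor does it implement the fallback behavior.

```python
from typing import Dict

UCA_BASE = "http://www.w3.org/2013/collation/UCA"

def parse_uca_uri(uri: str) -> Dict[str, str]:
    """Split a UCA collation URI into its query parameters."""
    if uri == UCA_BASE:
        return {}                              # no query part
    prefix = UCA_BASE + "?"
    if not uri.startswith(prefix):
        raise ValueError(f"not a UCA collation URI: {uri!r}")
    params: Dict[str, str] = {}
    for part in uri[len(prefix):].split(";"):
        if not part:
            continue                           # zero parameters are allowed
        keyword, equals, value = part.partition("=")
        if not equals:
            raise ValueError(f"parameter without '=': {part!r}")
        params[keyword] = value                # a repeated keyword: the last one wins
    return params
```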
The following query parameters are defined. If any parameter is absent, the default is
Keyword | Values | Meaning |
---|---|---|
fallback | yes | no (default yes) | Determines whether the processor uses a fallback collation if a conformant collation is not available. |
lang | language code: a string in the lexical space of xs:language . | The language whose collation conventions are to be used. |
version | string | The version number of the UCA to be used. |
strength | primary | secondary | tertiary | quaternary | identical, or 1|2|3|4|5 as synonyms (default tertiary / 3) | The collation strength as defined in UCA. Primary strength takes only the base form of the character into account (so A=a=Ä=ä); secondary strength ignores case but considers accents and diacritics as significant (so A=a and Ä=ä but ä≠a); tertiary considers case as significant (A≠a≠Ä≠ä); quaternary strength always considers as significant spaces and punctuation (data-base≠database; if maxVariable is punct or higher and alternate is not non-ignorable, lower strengths will treat data-base=database). |
maxVariable | space | punct | symbol | currency (default punct) | Given the sequence space, punct, symbol, currency, all characters in the specified group and earlier groups are treated as “noise” characters to be handled as defined by the alternate parameter. For example, maxVariable=punct indicates that characters classified as whitespace or punctuation get this treatment. |
alternate | non-ignorable | shifted | blanked (default non-ignorable) | Controls the handling of characters such as spaces and hyphens; specifically, the “noise” characters in the groups selected by the maxVariable parameter. The value non-ignorable indicates that such characters are treated as distinct at the primary level (so data base sorts before database); shifted indicates that they are used to differentiate two strings only at the quaternary level, and blanked indicates that they are taken into account only at the identical level. |
backwards | yes | no (default no) | The value backwards=yes indicates that the last accent in the string is the most significant. |
normalization | yes | no (default no) | Indicates whether strings are converted to normalization form D. |
caseLevel | yes | no (default no) | When used with primary strength, setting caseLevel=yes has the effect of ignoring accents while taking account of case. |
caseFirst | upper | lower (default lower) | Indicates whether upper-case precedes lower-case or vice versa. |
numeric | yes | no (default no) | When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number, for example chap2 sorts before chap12. |
reorder | a comma-separated sequence of reorder codes, where a reorder code is one of space, punct, symbol, currency, digit, or a four-letter script code defined in | Determines the relative ordering of text in different scripts; for example the value digit,Grek,Latn indicates that digits precede Greek letters, which precede Latin letters. |
This list excludes parameters that are inconvenient to express in a URI, or that are applicable only to substring matching.
UCA collation URIs can be conveniently generated using the
fn:collation
function.
The collation URI http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive
must be recognized
by every implementation. It is class
attribute values.
The collation is defined as follows:
Let $HACI
be the collation URI
"http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive"
.
Let $UCC
be the Unicode Codepoint Collation URI
http://www.w3.org/2005/xpath-functions/collation/codepoint
.
Let $lc
be the function
fn:translate(?, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")
.
Then for any two strings $A
and $B
, the result
of the comparison fn:compare($A, $B, $HACI)
is defined to be the same as
the result of fn:compare($lc($A), $lc($B), $UCC)
.
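HTML5 defines the semantics of equality matching using this collation; this specification additionally defines ordering rules. The collation supports collation units and can therefore be used with functions such as fn:contains
; each Unicode codepoint is a single collation unit.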
The corresponding HTML5 definition is: A string A is an ASCII case-insensitive match for a string B, if the ASCII lowercase of A is the ASCII lowercase of B.
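The definition above can be modelled outside XPath. The following Python sketch is illustrative only and is not part of the specification; the names `lc` and `compare_haci` are hypothetical. It relies on the fact that Python orders strings by code point, which stands in for the Unicode Codepoint Collation.

```python
# Illustrative model of the html-ascii-case-insensitive collation.
# Only the 26 ASCII letters A-Z are folded; all other characters are untouched.
ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "abcdefghijklmnopqrstuvwxyz",
)


def lc(s: str) -> str:
    """Counterpart of the $lc function in the definition above."""
    return s.translate(ASCII_LOWER)


def compare_haci(a: str, b: str) -> int:
    """Return -1, 0 or 1, in the manner of fn:compare($A, $B, $HACI)."""
    x, y = lc(a), lc(b)
    # Python compares str values code point by code point.
    return (x > y) - (x < y)


print(compare_haci("Content-Type", "content-type"))  # 0
print(compare_haci("É", "é"))                        # -1 (non-ASCII letters are not folded)
```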
Many functions have a signature that includes a $collation
argument, which is generally optional and takes default-collation()
as its default value.
The collation to use for these functions is determined by the following rules:
If the function specifies an explicit collation, CollationA (e.g., if
the optional collation argument is specified in a call of the
fn:compare
function), then:
If CollationA is supported by the implementation, then CollationA is used.
Otherwise, a dynamic error is raised
If no collation is explicitly specified for the function (that is, if the
$collation
argument is omitted or is set to an
empty sequence), then the default collation, CollationB, is selected:
If CollationB is supported by the implementation, then CollationB is used.
Otherwise, a dynamic error is raised
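As a rough illustration of these selection rules, the following Python sketch chooses between an explicit and a default collation. It is a sketch under stated assumptions, not a normative algorithm: the function name, the exception class, and the representation of "supported collations" as a set of URI strings are all hypothetical.

```python
from typing import Optional, Set


class UnsupportedCollation(Exception):
    """Hypothetical stand-in for the dynamic error required by the rules."""


def select_collation(explicit: Optional[str], default: str, supported: Set[str]) -> str:
    # An omitted argument and an empty sequence are both modelled as None.
    chosen = explicit if explicit is not None else default
    if chosen not in supported:
        raise UnsupportedCollation(chosen)
    return chosen


CODEPOINT = "http://www.w3.org/2005/xpath-functions/collation/codepoint"
print(select_collation(None, CODEPOINT, {CODEPOINT}) == CODEPOINT)  # True
```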
Because the set of collations that are supported is
If the value of the collation argument is a relative URI reference, it is resolved against the base-URI from the
static context. If it is a relative URI reference and cannot be resolved, perhaps because the base-URI property in the static context
is absent, a dynamic error is raised
There is no explicit requirement that the string used as a collation URI be a valid URI.
Implementations will in many cases reject such strings on the grounds that they do not identify a supported collation; they
may also cause an error if they cannot be resolved against the
The following functions are defined on values of type xs:string
and
types derived from it.
When the above operators and functions are applied to datatypes derived from
xs:string
, they are guaranteed to return values that are instances of
xs:string
, but the value might or might not be an instance of the
particular subtype of xs:string
to which they were applied.
The strings returned by fn:concat
and fn:string-join
are not guaranteed to be normalized.
But see note in fn:concat
.
The functions described in this section examine a string $arg1
to see
whether it contains another string $arg2
as a substring. The result
depends on whether $arg2
is a substring of $arg1
, and
if so, on the range of $arg1
which $arg2
matches.
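When the Unicode codepoint collation is used, this simply involves determining whether $arg1
contains a
contiguous sequence of characters whose codepoints are the same, one for one, as the codepoints of the characters in $arg2
.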
When a collation is specified, the rules are more complex.
All collations support the capability of deciding whether two strings are considered equal and, if not, which of them precedes the other. For functions such as fn:compare
, this is
all that is required. For other functions, such as fn:contains
,
the collation needs to support an additional property: it must be able to
decompose the string into a sequence of collation units, each unit consisting of
one or more characters, such that two strings can be compared by pairwise
comparison of these units. ("collation unit" is equivalent to "collation
element" as defined in the Unicode Collation Algorithm.) The string $arg1
is then considered to contain $arg2
as a
substring if the sequence of collation units corresponding to $arg2
is a subsequence of the sequence of the collation units corresponding to
$arg1
. The characters in $arg1
that match are the
characters corresponding to these collation units.
This rule may occasionally lead to surprises. For example, consider a collation
that treats "Jaeger"
and "Jäger"
as equal. It might do this by treating "ä"
as representing
two collation units, in which case the
expression fn:contains("Jäger", "eg")
will return
true
. Alternatively, a collation might treat "ae" as a single
collation unit, in which case the expression fn:contains("Jaeger",
"eg")
will return false
. The results of these functions thus
depend strongly on the properties of the collation that is used.
In addition,
collations may specify that some collation units should be ignored during matching. If hyphen is an ignored
collation unit, then fn:contains("code-point", "codepoint")
will be true
,
and fn:contains("codepoint", "-")
will also be true
.
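The examples above can be reproduced with a small model in which a collation is reduced to a function that decomposes a string into collation units. The Python sketch below is illustrative only: the three decomposition functions are hypothetical collations invented for the purpose, and real collations are considerably more elaborate.

```python
from typing import Callable, List

Units = Callable[[str], List[str]]


def contains(arg1: str, arg2: str, units: Units) -> bool:
    """True if the units of arg2 occur contiguously within the units of arg1."""
    q, p = units(arg1), units(arg2)
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def split_umlaut(s: str) -> List[str]:
    # Hypothetical collation A: "ä" decomposes into the two units "a" and "e".
    out: List[str] = []
    for ch in s:
        out.extend(["a", "e"] if ch == "ä" else [ch])
    return out


def merge_ae(s: str) -> List[str]:
    # Hypothetical collation B: "ä" and the pair "ae" both form the single unit "ae".
    out: List[str] = []
    i = 0
    while i < len(s):
        if s[i] == "ä":
            out.append("ae")
            i += 1
        elif s[i:i + 2] == "ae":
            out.append("ae")
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out


def ignore_hyphen(s: str) -> List[str]:
    # Hypothetical collation C: the hyphen is an ignored collation unit.
    return [ch for ch in s if ch != "-"]


print(contains("Jäger", "eg", split_umlaut))               # True
print(contains("Jaeger", "eg", merge_ae))                  # False
print(contains("code-point", "codepoint", ignore_hyphen))  # True
print(contains("codepoint", "-", ignore_hyphen))           # True
```

Under both hypothetical collations A and B, the strings "Jaeger" and "Jäger" decompose to the same units and so compare equal, yet the substring test gives different answers.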
In the definitions below, we refer to the terms
C is the collation; that is, the value of the $collation
argument if specified, otherwise the default collation.
P is the (candidate) substring $arg2
Q is the (candidate) containing string $arg1
The boundary condition B is satisfied at the start and end of a
string, and between any two characters that belong to different collation units
(“collation elements” in the language of
It is possible to define collations that do not have the ability to decompose a
string into units suitable for substring matching. An argument to a function
defined in this section may be a URI that identifies a collation that is able to
compare two strings, but that does not have the capability to split the string
into collation units. Such a collation may cause the function to fail, or to
give unexpected results, or it may be rejected as an unsuitable argument. The
ability to decompose strings into collation units is an
The four functions described in this section make use of a regular expression syntax for pattern matching, described below.
The regular expression syntax used by these functions is defined in terms of
the regular expression syntax specified in XML Schema (see
It is recommended that implementers consult
The regular expression syntax and semantics are identical to those
defined in
In
Implementers, even in cases where XSD 1.1 is not supported, are advised to consult the XSD 1.1 regular expression specification for guidance on how to handle cases where the XSD 1.0 specification is unclear or inconsistent.
Two meta-characters, ^
and $
are
added. By default, the meta-character ^
matches the
start of the entire string, while $
matches the end
of the entire string. In multi-line mode, ^
matches
the start of any line (that is, the start of the entire string,
and the position immediately after a newline character), while
$
matches the end of any line (that is, the end of
the entire string, and the position immediately before a newline
character). Newline here means the character #x0A
only.
This means that the production in
[10] Char ::= [^.\?*+()|#x5B#x5D]
is modified to read:
[10] Char ::= [^.\?*+{}()|^$#x5B#x5D]
The XSD 1.1 grammar for regular expressions uses the same
production rule, but renumbered and renamed [73] NormalChar
; it
is affected in the same way.
The characters #x5B
and #x5D
correspond
to [
and ]
respectively.
The definition of Char (production [10]) in XSD 1.0 has a known error: it omits the left brace ({
) and right brace (}
). That error is corrected here.
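The difference between the default and multi-line interpretations of ^ and $ described above can be observed in most regular expression engines. The following Python sketch uses the standard `re` module purely as an analogy: Python is not an XPath regular expression processor, and its default-mode $ also matches before a trailing newline, which differs from the rules given here.

```python
import re

text = "one\ntwo"

# Default mode: ^ and $ anchor to the entire string, so no single line matches.
print(re.findall(r"^\w+$", text))                # []

# Multi-line mode: ^ and $ anchor to each line (newline is #x0A).
print(re.findall(r"^\w+$", text, re.MULTILINE))  # ['one', 'two']
```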
The following production:
[11] charClass ::= charClassEsc | charClassExpr | WildCardEsc
is modified to read:
[11] charClass ::= charClassEsc | charClassExpr |
WildCardEsc | "^" | "$"
Using XSD 1.1 as the baseline the equivalent is to change the production:
[74] charClass ::= SingleCharEsc | charClassEsc | charClassExpr | WildCardEsc
to read:
[74] charClass ::= SingleCharEsc | charClassEsc | charClassExpr |
WildCardEsc | "^" | "$"
Single character escapes are extended to allow the
$
character to be escaped. In addition, the #
character may be escaped: see the rules for comments below. The following production is changed from:
[24]SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
to
[24]SingleCharEsc ::= '\' [nrt\|.?*+(){}$#x2D#x5B#x5D#x5E\#]
(In the XSD 1.1 version of the regular expression grammar, the production rule
for SingleCharEsc
is unchanged from 1.0, but is renumbered [84])
Reluctant quantifiers are supported. They are indicated by a
?
following a quantifier. Specifically:
X??
matches X, once or not at all
X*?
matches X, zero or more times
X+?
matches X, one or more times
X{n}?
matches X, exactly n times
X{n,}?
matches X, at least n times
X{n,m}?
matches X, at least n times, but
not more than m times
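The effect of these quantifiers is that the regular expression
matches the shortest possible substring consistent with the match as a whole succeeding. Without the ?
, the regular expression matches the longest possible substring.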
To achieve this, the production in
[4] quantifier ::= [?*+] | ( '{' quantity '}' )
is changed to:
[4] quantifier ::= ( [?*+] | ( '{' quantity '}' ) ) '?'?
(In the XSD 1.1 version of the regular expression grammar, this rule is unchanged from 1.0, but is renumbered [67])
Reluctant quantifiers have no effect on the results of the
boolean fn:matches
function, since this
function is only interested in discovering whether a match
exists, and not where it exists.
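The effect of a reluctant quantifier can be illustrated with Python's `re` module, which uses the same `?` suffix. This is an analogy only, since Python's regular expression dialect differs from the one defined here in other respects.

```python
import re

s = "<a><b>"

greedy = re.search(r"<.+>", s).group()      # longest possible substring
reluctant = re.search(r"<.+?>", s).group()  # shortest possible substring

print(greedy)     # <a><b>
print(reluctant)  # <a>

# Whether a match exists at all is unaffected, which is why a boolean
# test such as fn:matches cannot distinguish the two forms.
print(bool(re.search(r"<.+>", s)) == bool(re.search(r"<.+?>", s)))  # True
```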
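Sub-expressions (groups) within the regular expression are
recognized. The regular expression syntax defined by XML Schema allows parenthesized sub-expressions but attaches no special significance to them. Some operations associated with regular expressions (for example, back-references and the fn:replace
function) allow access to the parts of the
input string that matched a sub-expression (called captured substrings).
A left parenthesis is recognized as a capturing left parenthesis provided it is not immediately followed by
?:
(see below), is not within a character group (square brackets),
and is not escaped with a backslash. The sub-expression enclosed by a capturing left
parenthesis and its matching right parenthesis is referred to as a capturing sub-expression.
More specifically, the Nth capturing sub-expression is the one enclosed by the Nth capturing left parenthesis, counting from one in order of position within the regular expression.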
For example, in the regular expression A(BC(?:D(EF(GH[()]))))
, the string matched
by the sub-expression BC(?:D(EF(GH[()])))
is capturing sub-expression 1, the string
matched by EF(GH[()])
is capturing sub-expression 2, and the string matched by
GH[()]
is capturing sub-expression 3.
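The numbering in this example can be checked with any engine that supports non-capturing groups. The following Python sketch uses the standard `re` module as an analogy; the input string "ABCDEFGH(" is chosen here for illustration and does not come from the specification.

```python
import re

pattern = re.compile(r"A(BC(?:D(EF(GH[()]))))")

# The (?: group and the parentheses inside [()] are not counted.
print(pattern.groups)  # 3

m = pattern.fullmatch("ABCDEFGH(")
print(m.group(1))  # BCDEFGH(
print(m.group(2))  # EFGH(
print(m.group(3))  # GH(
```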
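When, in the course of evaluating a regular expression, a particular substring of the input
matches a capturing sub-expression, that substring becomes available as a captured substring.
When a capturing sub-expression is matched more than once (because it is within a construct that allows repetition), only the last substring that it matched is captured. This rule does not always ensure an unambiguous result: for example, given the regular expression (a*)+
and the input string "aaaa"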
, an implementation
might legitimately capture either "aaaa"
or a zero length string as the content
of the captured subgroup.
Parentheses that are required to group terms within the regular expression, but which are
not required for capturing of substrings, can be represented using
the syntax (?:xxxx)
. To achieve this, the production rule for atom
in the XSD regular expression grammar is changed to replace the alternative:
( '(' regExp ')' )
with:
( '(' '?:'? regExp ')' )
(For the new versions of the XSD 1.0 and XSD 1.1 production rules for
atom
, see below.)
In the absence of back-references (see below),
the presence of the optional ?:
has no effect on the set of strings
that match the regular expression, but causes the left parenthesis not to be counted
by operations (such as fn:replace
and back-references) that number the capturing sub-expressions
within a regular expression.
Back-references are allowed
outside a character class expression.
A back-reference is an additional kind of atom.
The construct \N
where
N
is a single digit is always recognized as a
back-reference; if this is followed by further digits, these
digits are taken to be part of the back-reference if and only if
the resulting number NN is such that
the back-reference is preceded by the NNth
capturing left parenthesis.
The regular expression is invalid if a back-reference refers to a
capturing sub-expression that does not exist or whose
closing right parenthesis occurs after the back-reference.
A back-reference with number N matches a string that is the same as
the value of the N
th captured substring.
For example, the regular expression
('|").*\1
matches a sequence of characters
delimited either by an apostrophe at the start and end, or by a
quotation mark at the start and end.
If no string has been matched by the N
th capturing
sub-expression, the back-reference is interpreted as matching
a zero-length string.
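The quoted-string example can be tried in Python, whose `re` module uses the same `\N` notation for back-references. This is an analogy rather than a conformance test; in particular, Python's handling of a back-reference to a group that has not matched may differ from the rule stated above.

```python
import re

# Matches text delimited by the same kind of quote at both ends.
quoted = re.compile(r"""('|").*\1""")

print(quoted.fullmatch("'abc'") is not None)   # True
print(quoted.fullmatch('"abc"') is not None)   # True
print(quoted.fullmatch("'abc\"") is not None)  # False (mismatched delimiters)
```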
Combining this change with the introduction of non-capturing groups (see above), back-references change the following production:
[9] atom ::= Char | charClass | ( '(' regExp ')' )
to
[9] atom ::= Char | charClass | ( '(' '?:'? regExp ')' ) | backReference
[9a] backReference ::= "\" [1-9][0-9]*
With respect to the XSD 1.1 version of the regular expression grammar, the effect is to change:
[72] atom ::= NormalChar | charClass | ( '(' regExp ')' )
to
[72] atom ::= NormalChar | charClass | ( '(' '?:'? regExp ')' ) | backReference
[72a] backReference ::= "\" [1-9][0-9]*
Within a character class expression,
\
followed by a digit is invalid.
Some other regular expression languages interpret this as an octal character reference.
A regular expression that uses a Unicode block name that is not defined in the version(s) of Unicode
supported by the processor (for example \p{IsBadBlockName}
) is deemed to be invalid
XSD 1.0 does not say how this situation should be handled; XSD 1.1 says that it should be handled by treating all characters as matching.
Comments are enabled in regular expressions if the c
flag is present.
A comment starts with a #
character that is not escaped with an immediately
preceding backslash, and that is not contained in a CharClassExpr
(that is,
in square brackets). It ends with the following #
character, or with the
end of the string containing the regular expression.
Whether or not the c
flag is present, the production for
SingleCharEsc
is extended to allow the #
character
to be escaped.
All these functions provide an optional parameter, $flags
,
to set options for the interpretation of the regular expression. The
parameter accepts an xs:string
, in which individual letters
are used to set options. The presence of a letter within the string
indicates that the option is on; its absence indicates that the option
is off. Letters may appear in any order and may be repeated. They are case-sensitive. If there
are characters present that are not defined here as flags, then a dynamic error
is raised
The following options are defined:
s
: If present, the match operates in “dot-all”
mode. (Perl calls this the single-line mode.) If the
s
flag is not specified, the meta-character
.
matches any character except a newline
(#x0A
) or carriage return (#x0D
)
character. In dot-all mode, the
meta-character .
matches any character whatsoever.
Suppose the input contains the strings "hello"
and
"world"
on two lines.
This will not be matched by the regular expression
"hello.*world"
unless dot-all mode is enabled.
m
: If present, the match operates in multi-line
mode. By default, the meta-character ^
matches the
start of the entire string, while $ matches the end of the
entire string. In multi-line mode, ^
matches the
start of any line (that is, the start of the entire string, and
the position immediately after a newline character
other than a newline
that appears as the last character in the string), while
$
matches the end of any line
(that is, the position immediately
before a newline character, and the end of the entire string if there is no
newline character at the end of the string).
Newline here means the character #x0A
only.
i
: If present, the match operates in
case-insensitive mode. The detailed rules are as follows.
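In these
rules, a character C2 is considered to be a case-variant of another character C1 if the following XPath expression returns true
when the two characters
are considered as strings of length one, and the Unicode codepoint collation is used: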
fn:lower-case(C1) eq fn:lower-case(C2) or
fn:upper-case(C1) eq fn:upper-case(C2)
Note that the case-variants of a character under this definition are always single characters.
When a normal character (Char
) is used as an atom,
it represents
the set containing that character and all its case-variants.
For example, the regular expression "z"
will
match both "z"
and "Z"
.
A character range (production charRange
in the XSD 1.0 grammar, replaced by productions charRange
and singleChar
in XSD 1.1) represents the set
containing all the characters that it would match in the absence
of the i
flag, together with their case-variants.
For example,
the regular expression "[A-Z]"
will match all
the letters A
to Z
and all the letters
a
to z
. It will also match
certain other characters such as #x212A
(KELVIN SIGN), since
fn:lower-case("#x212A")
is k
.
This rule applies also to a character range used in a character
class subtraction (charClassSub
): thus [A-Z-[IO]] will match
characters such as A
, B
, a
, and b
, but will not match
I
, O
, i
, or o
.
The rule also applies to a character range used as part of a
negative character group: thus "[^Q]"
will match every character
except Q
and q
(these being the only case-variants of Q
in
Unicode).
A back-reference is compared using case-blind comparison:
that is, each character must either be the same as the
corresponding character of the previously matched string, or must
be a case-variant of that character. For example, the strings
"Mum"
, "mom"
, "Dad"
,
and "DUD"
all match the regular
expression "([md])[aeiou]\1"
when the i
flag is used.
All other constructs are unaffected by the i
flag.
For example,
"\p{Lu}"
continues to match upper-case letters only.
x
: If present, whitespace characters
(#x9
, #xA
, #xD
and #x20
)
in the regular expression are removed prior to matching with one exception:
whitespace characters within character class expressions
(charClassExpr
) are not removed. This flag can be used,
for example, to break up long regular expressions into readable lines.
Examples:
fn:matches("helloworld", "hello world", "x")
returns true()
fn:matches("helloworld", "hello[ ]world", "x")
returns false()
fn:matches("hello world", "hello\ sworld", "x")
returns true()
fn:matches("hello world", "hello world", "x")
returns false()
Whitespace is treated as a lexical construct to be removed before the regular expression is parsed; it is therefore not explicit in the regular expression grammar.
q
: if present, all characters in the regular expression
are treated as representing themselves, not as metacharacters. In effect, every
character that would normally have a special meaning in a regular expression is implicitly escaped
by preceding it with a backslash.
Furthermore, when this flag is present, the characters $
and
\
have no special significance when used in the replacement string
supplied to the fn:replace
function.
This flag can be used in conjunction with the i
flag. If it is used
together with the m
, s
, x
,
or c
flag, that flag has no effect.
Examples:
tokenize("12.3.5.6", ".", "q")
returns ("12", "3", "5", "6")
replace("a\b\c", "\", "\\", "q")
returns "a\\b\\c"
replace("a/b/c", "/", "$", "q")
returns "a$b$c"
matches("abcd", ".*", "q")
returns false()
matches("Mr. B. Obama", "B. OBAMA", "iq")
returns true()
c
: if present, comments are enabled
in the regular expression. This flag has no effect if the q
flag is
present. A comment is recognized by the presence of a #
character that
is not escaped by a backslash or contained in a character class expression
(charClassExpr
), and it is terminated by the following #
character or by the end of the regular expression string.
For example:
replace("03/24/2025", "(..#month#)/(..#day#)/(....#year#)", "$3-$1-$2", "c")
Comments are treated as a lexical construct to be removed before the regular expression is parsed; they are therefore not explicit in the regular expression grammar.
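Because whitespace (x flag) and comments (c flag) are both removed lexically before the regular expression is parsed, their effect can be modelled as a preprocessing step. The Python sketch below is one possible reading of the rules above, not a normative algorithm; the function `strip_lexical` is hypothetical, it does not model the q flag, and Python's `re` module is used only to exercise the cleaned-up expressions.

```python
import re

WHITESPACE = "\t\n\r "  # #x9, #xA, #xD and #x20


def strip_lexical(regex: str, flags: str) -> str:
    """Remove x-flag whitespace and c-flag comments outside character classes."""
    skip_ws = "x" in flags
    comments = "c" in flags
    out = []
    depth = 0  # nesting depth of square brackets
    i = 0
    while i < len(regex):
        ch = regex[i]
        if skip_ws and depth == 0 and ch in WHITESPACE:
            i += 1
        elif ch == "\\":
            # Copy the backslash and the character it escapes; with the x flag,
            # intervening whitespace is dropped, so "\ s" becomes "\s".
            out.append(ch)
            i += 1
            while skip_ws and depth == 0 and i < len(regex) and regex[i] in WHITESPACE:
                i += 1
            if i < len(regex):
                out.append(regex[i])
                i += 1
        elif comments and depth == 0 and ch == "#":
            # A comment runs to the next "#" or to the end of the string.
            end = regex.find("#", i + 1)
            i = len(regex) if end == -1 else end + 1
        else:
            if ch == "[":
                depth += 1
            elif ch == "]" and depth > 0:
                depth -= 1
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_lexical("hello world", "x"))    # helloworld
print(strip_lexical("hello[ ]world", "x"))  # hello[ ]world

pattern = strip_lexical("(..#month#)/(..#day#)/(....#year#)", "c")
print(pattern)                                      # (..)/(..)/(....)
print(re.sub(pattern, r"\3-\1-\2", "03/24/2025"))  # 2025-03-24
```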
This section specifies functions that manipulate URI values, either as instances
of xs:anyURI
or as strings.
This section specifies functions that parse strings as URIs, to identify their structure, and construct URI strings from their structured representation.
Some URI schemes are hierarchical and some are non-hierarchical.
Implementations must treat the following schemes as non-hierarchical:
jar
, mailto
, news
, tag
,
tel
, and urn
. Whether additional schemes
are known to be non-hierarchical
The structured representation of a URI is described by the
uri-structure-record
:
The parts of this structure are:
The original URI. This element is returned by fn:parse-uri
,
but ignored by fn:build-uri
.
The URI scheme (e.g., “https” or “file”).
Whether the URI is hierarchical or not.
The authority portion of the URI (e.g., “example.com:8080”).
The segmented forms of the path and query parameters provide convenient access to commonly used information.
The path, if there is one, is tokenized on “/” characters and
each segment is unescaped (as per the fn:decode-from-uri
function). Consider the URI
http://example.com/path/to/a%2fb
.
The path portion has to be returned as /path/to/a%2fb
because
decoding the %2f
would change the nature of the path.
The unescaped form is easily accessible from path-segments
:
Note that the presence or absence of a leading slash on the path will affect whether or not the sequence begins with an empty string.
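The relationship between the escaped path and its unescaped segments can be illustrated with Python's standard `urllib.parse` module. This is a sketch by analogy: `unquote` stands in for fn:decode-from-uri and may differ from it in edge cases, and the variable names are chosen here for illustration.

```python
from urllib.parse import unquote, urlsplit

uri = "http://example.com/path/to/a%2fb"

# The path is kept in escaped form so that "%2f" is not mistaken for a separator.
path = urlsplit(uri).path
print(path)  # /path/to/a%2fb

# Tokenize on "/" first, then unescape each segment individually.
segments = [unquote(segment) for segment in path.split("/")]
print(segments)  # ['', 'path', 'to', 'a/b']
```

The leading empty string in the result reflects the leading slash on the path.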
The query parameters are decoded into a map. Consider the URI:
http://example.com/path?a=1&b=2%264&a=3
.
The decoded form in the query-parameters is the following map:
Note that both keys and values are unescaped. If a key
is repeated in the query string, the map will contain a
sequence of values for that key, as seen for a
in this example.
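The decoding of query parameters can be approximated with `urllib.parse.parse_qs` from the Python standard library. This is an analogy only: Python always returns a list of values for each key, whereas the map described above holds a sequence only where a key is repeated.

```python
from urllib.parse import parse_qs, urlsplit

uri = "http://example.com/path?a=1&b=2%264&a=3"

# The query is split on "&" before unescaping, so "%26" survives as part of a value.
params = parse_qs(urlsplit(uri).query)
print(params)  # {'a': ['1', '3'], 'b': ['2&4']}
```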
This section defines functions and operators on the xs:boolean
datatype.
Since no literals are defined in XPath to reference the constant boolean values true
and false
,
two functions are provided for the purpose.
The following functions define the semantics of operators on boolean values in
The ordering operator op:boolean-less-than
is provided for application purposes
and for compatibility with XPath 1.0. In XML Schema, the datatype xs:boolean
is not ordered.
The following functions are defined on boolean values:
Operators are defined on the following type:
xs:duration
and on the two defined subtypes (see
xs:yearMonthDuration
xs:dayTimeDuration
No ordering relation is defined on xs:duration
values.
Two xs:duration
values may however be compared for equality.
A value of type xs:duration
is considered to comprise two parts:
The total number of months, represented as a signed integer.
The total number of seconds, represented as a signed decimal number.
If one of these values is negative (less than zero), the other must not be positive (greater than zero).
In effect this means that operations on durations (including equality comparison,
casting to string, and extraction of components)
all treat the duration as normalized. The duration PT1M30S
(one minute and
thirty seconds), for example,
is precisely equivalent to the duration PT90S
(ninety seconds); these are
different representations of the same value, and the result of any operation will be
the same regardless of which representation is used. For example, the function
fn:seconds-from-duration
returns 30 in both cases.
The information content of an xs:duration
value can be reduced to an xs:integer
number of months, and an xs:decimal
number of seconds. For the two defined subtypes this is further simplified so that one of these two
components is fixed at zero. Operations such as comparison of durations and arithmetic on durations
can be expressed in terms of numeric operations applied to these two components.
Two subtypes of xs:duration
, namely xs:yearMonthDuration
and xs:dayTimeDuration
, are defined in
The significance of these subtypes is that arithmetic and ordering become well defined; this is not the
case for xs:duration
values in general, because of the variable number of days in a month. For this reason, many of the functions
and operators on durations require the arguments/operands to belong to these two subtypes.
In an xs:yearMonthDuration
, the seconds component is always zero.
In an xs:dayTimeDuration
, the months component is always zero.
All
The total number of months can be represented as a signed xs:int
value;
The total number of seconds can be represented as a signed xs:decimal
value with facets totalDigits=18
and fractionalDigits=3
. That is,
durations must be supported to millisecond precision.
Processors
A processor that limits the range or precision of duration values
may encounter overflow and underflow conditions when it
tries to evaluate operations on durations. In
these situations, the processor
Similarly, a processor may be unable accurately to represent the result of dividing a duration
by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component
of duration values
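The following comparison operators are defined on the duration datatypes. Each operator takes two operands of the same type and returns an xs:boolean
result. As discussed in XML Schema, the order relation on xs:duration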
is a partial order rather than
a total order. For this reason, only equality is defined on xs:duration
.
A full complement of comparison and
arithmetic functions are defined on the two subtypes of duration described in
The duration datatype may be considered to be a composite datatype
in that it contains distinct properties or components. The extraction functions specified
below extract a single component from a duration value.
For xs:duration
and its subtypes, including the two subtypes xs:yearMonthDuration
and
xs:dayTimeDuration
, the components are normalized: this means that the seconds and minutes
components will always be less than 60, the hours component less than 24, and the months component less than 12.
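The normalization of components follows directly from the two-part model of a duration (total months and total seconds). The Python sketch below derives the six components from that model. It is illustrative only: the names are hypothetical, and carrying the sign on every component of a negative duration is an assumption of this sketch.

```python
from decimal import Decimal
from typing import NamedTuple


class Components(NamedTuple):
    years: int
    months: int
    days: int
    hours: int
    minutes: int
    seconds: Decimal


def components(total_months: int, total_seconds: Decimal) -> Components:
    """Split a duration, held as months and seconds, into normalized components."""
    # The two parts never have opposite signs, so one sign covers both.
    sign = -1 if (total_months < 0 or total_seconds < 0) else 1
    years, months = divmod(abs(total_months), 12)
    days, rest = divmod(abs(total_seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return Components(
        sign * years, sign * months,
        sign * int(days), sign * int(hours), sign * int(minutes),
        sign * seconds,
    )


# PT1M30S and PT90S are the same value: 90 seconds in total.
print(components(0, Decimal("90")).seconds)  # 30
# P15M normalizes to one year and three months.
print(components(15, Decimal("0"))[:2])      # (1, 3)
```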
This section describes the fn:seconds
function, which constructs
an xs:dayTimeDuration
value representing a decimal number of seconds.
For operators that combine a duration and a date/time value, see
This section defines operations on the
See
The operators described in this section are defined on the following date and time types:
xs:dateTime
xs:date
xs:time
xs:gYearMonth
xs:gYear
xs:gMonthDay
xs:gMonth
xs:gDay
The only operation defined on
xs:gYearMonth
, xs:gYear
,
xs:gMonthDay
, xs:gMonth
and xs:gDay
values is
equality comparison.
For other types, further operations are provided, including component extraction,
order comparisons, arithmetic, formatted display, and timezone
adjustment.
All
A processor that limits the number of digits in date and time datatype
representations may encounter overflow and underflow conditions when it
tries to execute the functions in
Similarly, a processor that limits the precision of the seconds component
of date and time or duration values may need to deliver a rounded result for arithmetic operations.
Such a processor
As defined in the data model, xs:dateTime
,
xs:date
, xs:time
, xs:gYearMonth
, xs:gYear
,
xs:gMonthDay
, xs:gMonth
, xs:gDay
values,
referred to collectively as date/time values, are represented as seven components or properties:
year
, month
, day
, hour
, minute
,
second
and timezone
. The first five components are
xs:integer
values. The value of the second
component is an xs:decimal
and the value of the timezone
component is an xs:dayTimeDuration
.
For all the primitive date/time datatypes, the timezone
property is optional and may or may not
be present. Depending on the datatype, some of the remaining six properties must be present and
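some must be absent. The resulting value is referred to as the local value, in that it retains its original timezone. Before comparing or subtracting xs:dateTime
values, this local value must be translated or normalized to UTC.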
For xs:time
, 00:00:00
and 24:00:00
are alternate lexical forms
for the same value, whose canonical representation is 00:00:00
. For xs:dateTime
,
a time component 24:00:00
translates to 00:00:00
of the following day.
An xs:dateTime
with lexical
representation 1999-05-31T05:00:00
is represented in the datamodel by { 1999, 5, 31, 5, 0, 0.0, () }
.
An xs:dateTime
with lexical
representation 1999-05-31T13:20:00-05:00
is represented by { 1999, 5, 31, 13, 20, 0.0, xs:dayTimeDuration("-PT5H") }
.
An xs:dateTime
with lexical
representation 1999-12-31T24:00:00
is represented by { 2000, 1, 1, 0, 0, 0.0, () }
.
An xs:date
with lexical
representation 2005-02-28+08:00
is represented by { 2005, 2, 28, (), (), (), xs:dayTimeDuration("PT8H") }
.
An xs:time
with lexical
representation 24:00:00
is represented by { (), (), (), 0, 0, 0, () }
.
A function is provided for constructing an
xs:dateTime
value from an xs:date
value and an
xs:time
value.
The following comparison operators are defined on the date/time datatypes. Each operator takes two operands of the same type and returns an xs:boolean
result.
An xs:dateTime
can be considered to consist of seven components:
year
, month
, day
, hour
, minute
,
second
and timezone
. For xs:dateTime
six components (year
,
month
, day
, hour
, minute
and second
) are required
and timezone
is optional. For other date/time values, of the first six components, some are required
and others must be absent. Timezone
is always optional. For example, for xs:date
,
the year
, month
and day
components are required and hour
,
minute
and second
components must be absent; for xs:time
the hour
,
minute
and second
components are required and year
, month
and
day
are missing; for xs:gDay
, day
is required and year
,
month
, hour
, minute
and second
are missing.
In XSD 1.1, an explicitTimezone
facet is available with values
optional
, required
, or prohibited
to
enable the timezone to be defined as mandatory or disallowed.
Values of the date/time datatypes xs:time
, xs:gMonthDay
, xs:gMonth
,
and xs:gDay
, can be considered to represent a sequence of recurring time instants or time periods.
An xs:time
occurs every day. An xs:gMonth
occurs every year. Comparison operators
on these datatypes compare the starting instants of equivalent occurrences in the recurring series.
These xs:dateTime
values are calculated as described below.
Comparison operators on xs:date
, xs:gYearMonth
and xs:gYear
compare
their starting instants. These xs:dateTime
values are calculated as described below.
The starting instant of an occurrence of a date/time value is an xs:dateTime
calculated by filling
in the missing components of the local value from a reference xs:dateTime
. An example of a suitable
reference xs:dateTime
is 1972-01-01T00:00:00
. Then, for example, the starting
instant corresponding to the xs:date
value 2009-03-12
is
2009-03-12T00:00:00
; the starting instant corresponding to the xs:time
value
13:30:02
is 1972-01-01T13:30:02
; and the starting instant corresponding to the
gMonthDay
value --02-29
is 1972-02-29T00:00:00
(which explains
why a leap year was chosen for the reference).
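The calculation of a starting instant amounts to filling in absent components from the reference value. The following Python sketch models this with the standard `datetime` type. It is illustrative only: the function name is hypothetical, absent components are represented by `None`, seconds are restricted to whole numbers, and the timezone component is not modelled.

```python
from datetime import datetime
from typing import Optional

# Reference xs:dateTime used in this section: 1972-01-01T00:00:00.
REFERENCE = datetime(1972, 1, 1, 0, 0, 0)


def starting_instant(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    hour: Optional[int] = None,
    minute: Optional[int] = None,
    second: Optional[int] = None,
) -> datetime:
    """Fill absent (None) components of a local value from the reference."""
    def pick(value: Optional[int], fallback: int) -> int:
        return fallback if value is None else value

    return datetime(
        pick(year, REFERENCE.year),
        pick(month, REFERENCE.month),
        pick(day, REFERENCE.day),
        pick(hour, REFERENCE.hour),
        pick(minute, REFERENCE.minute),
        pick(second, REFERENCE.second),
    )


print(starting_instant(year=2009, month=3, day=12).isoformat())      # 2009-03-12T00:00:00
print(starting_instant(hour=13, minute=30, second=2).isoformat())    # 1972-01-01T13:30:02
print(starting_instant(month=2, day=29).isoformat())                 # 1972-02-29T00:00:00
```

The last call succeeds only because the reference year is a leap year; with a non-leap reference year, `datetime` would reject 29 February.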
In the previous version of this specification, the reference date/time chosen was
1972-12-31T00:00:00
. While this gives the same results, it produces a "starting instant" for
a gMonth
or gMonthDay
that bears no
relation to the ordinary meaning of the term, and it also required special handling of short months.
The original choice was made to allow for leap seconds; but since leap seconds are not recognized
in date/time arithmetic, this is not actually necessary.
If the xs:time
value written as
24:00:00
is to be compared, filling in the missing components gives 1972-01-01T00:00:00
,
because 24:00:00
is an alternative representation of 00:00:00
(the lexical value
"24:00:00"
is
converted to the time components { 0, 0, 0 } before the missing components are filled
in). This has the consequence that when ordering xs:time
values,
24:00:00
is
considered to be earlier than 23:59:59
. However, when ordering
xs:dateTime
values, a time component of 24:00:00
is considered equivalent to 00:00:00
on the
following day.
Note that the reference xs:dateTime
does not have a timezone. The timezone
component
is never filled in from the reference xs:dateTime
. In some cases, if the date/time value does not
have a timezone, the implicit timezone from the dynamic context is used as the timezone.
This specification uses the reference xs:dateTime 1972-01-01T00:00:00
in the description of the
comparison operators. Implementations may use other reference xs:dateTime
values
as long as they yield the same results. The reference xs:dateTime
used must meet the following
constraints: when it is used to supply components into xs:gMonthDay
values, the year must allow
for February 29 and so must be a leap year; when it is used to supply missing components into xs:gDay
values, the month must allow for 31 days. Different reference xs:dateTime
values may be used for
different operators.
The date and time datatypes may be considered to be composite datatypes in that they contain distinct properties or components. The extraction functions specified below extract a single component from a date or time value. In all cases the local value (that is, the original value as written, without any timezone adjustment) is used.
A time written as 24:00:00
is treated as 00:00:00
on the
following day.
These functions adjust the timezone component of an xs:dateTime
, xs:date
or
xs:time
value. The $timezone
argument to these functions is defined as an
xs:dayTimeDuration
but must be a valid timezone value.
These functions support adding or subtracting a duration value to or from an
xs:dateTime
, an xs:date
or an xs:time
value. Appendix E of
A processor that limits the number of digits in date and time datatype
representations may encounter overflow and underflow conditions when it
tries to execute the functions in this section. In
these situations, the processor
The value spaces of the two totally ordered subtypes of
xs:duration
described earlier are xs:integer
months for xs:yearMonthDuration
and xs:decimal
seconds for xs:dayTimeDuration
. If
a processor limits the number of digits allowed in the representation of
xs:integer
and xs:decimal
then overflow and
underflow situations can arise when it tries to execute the functions in
Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar,
language, and country. The functions are presented in their customary fashion,
except for the rules and examples, which are described en bloc at
The fn:format-dateTime
, fn:format-date
,
and fn:format-time
functions format $value
as a string using
the picture string specified by the $picture
argument,
the calendar specified by the $calendar
argument,
the language specified by the $language
argument,
and the country or other place name specified by the $place
argument.
The result of the function is the formatted string representation of the supplied
xs:dateTime
, xs:date
, or xs:time
value.
fn:format-dateTime
, fn:format-date
,
and fn:format-time
are referred to collectively as the
If $value
is the empty sequence, the function returns the empty sequence.
Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to an empty sequence.
For details of the $language
, $calendar
, and
$place
arguments, see
In general, the use of an invalid $picture
,
$language
, $calendar
, or
$place
argument results in a dynamic error
The picture consists of a sequence of variable markers and literal substrings.
A substring enclosed in square brackets is interpreted as a variable marker; substrings
not enclosed in square brackets are taken as literal substrings.
The literal substrings are optional and if present are rendered unchanged, including any whitespace.
If an opening or closing square bracket
is required within a literal substring, it must be doubled.
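The division of a picture into literal substrings and variable markers can be sketched as follows. This is a non-normative illustration in Python, not part of the specification; the function name `tokenize_picture` is hypothetical. It assumes that a square bracket inside a literal substring is written doubled (as in the picture `[[[Y0001]-[M01]-[D01]]]`) and that a lone closing bracket is rejected.

```python
def tokenize_picture(picture):
    """Split a picture string into ("literal", text) and ("marker", text) tokens."""
    tokens = []
    literal = []

    def flush():
        # Emit any pending literal text as a single token.
        if literal:
            tokens.append(("literal", "".join(literal)))
            literal.clear()

    i, n = 0, len(picture)
    while i < n:
        ch = picture[i]
        if ch == "[":
            if i + 1 < n and picture[i + 1] == "[":
                literal.append("[")          # doubled bracket: literal "["
                i += 2
                continue
            close = picture.find("]", i + 1)
            if close == -1:
                raise ValueError("unterminated variable marker")
            flush()
            # Whitespace within a variable marker is ignored.
            tokens.append(("marker", "".join(picture[i + 1:close].split())))
            i = close + 1
        elif ch == "]":
            if i + 1 < n and picture[i + 1] == "]":
                literal.append("]")          # doubled bracket: literal "]"
                i += 2
                continue
            raise ValueError("unmatched closing bracket")  # assumption: treated as an error
        else:
            literal.append(ch)
            i += 1
    flush()
    return tokens
```

For example, `tokenize_picture("[D01] [MN]")` yields a marker `D01`, a literal containing one space, and a marker `MN`.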
A variable marker consists of a component specifier followed optionally by one or two presentation modifiers and/or optionally by a width modifier. Whitespace within a variable marker is ignored.
The variable marker may be separated into its components by applying the following rules:
The component specifier is always present and is always a single letter.
The width modifier may be recognized by the presence of a comma.
The substring between the component specifier and the comma (if present) or the end of the string (if there is no comma) contains the first and second presentation modifiers, both of which are optional. If this substring contains a single character, this is interpreted as the first presentation modifier. If it contains more than one character, the last character is examined: if it is valid as a second presentation modifier then it is treated as such, and the preceding part of the substring constitutes the first presentation modifier. Otherwise, the second presentation modifier is presumed absent and the whole substring is interpreted as the first presentation modifier.
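The rules above can be illustrated with a short non-normative sketch in Python. The function name `split_marker` is hypothetical, and the set of letters accepted as a second presentation modifier (`a`, `t`, `c`, `o`) is an assumption taken from the modifiers listed in this section. Where a marker contains several commas, the sketch treats the last one as introducing the width modifier.

```python
# Assumption: these are the letters that are valid as a second presentation modifier.
SECOND_MODIFIERS = {"a", "t", "c", "o"}

def split_marker(marker):
    """Split the content of a variable marker (the text between the brackets).

    Returns (specifier, first_modifier, second_modifier, width); absent parts are None.
    """
    text = "".join(marker.split())           # whitespace in a marker is ignored
    if not text or not text[0].isalpha():
        raise ValueError("a variable marker must start with a component specifier")
    specifier, rest = text[0], text[1:]

    width = None
    if "," in rest:
        # The last comma introduces the width modifier; earlier commas stay
        # in the format token as grouping separators.
        rest, width = rest.rsplit(",", 1)

    first, second = rest, None
    # A single character is always the first presentation modifier.
    if len(rest) > 1 and rest[-1] in SECOND_MODIFIERS:
        first, second = rest[:-1], rest[-1]
    return specifier, (first or None), second, width
```

For example, `split_marker("D1o")` separates the specifier `D`, the first presentation modifier `1`, and the second presentation modifier `o`, whereas `split_marker("MNn")` keeps `Nn` whole because `n` is not a second presentation modifier.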
The
Specifier | Meaning | Default Presentation Modifier |
---|---|---|
Y | year (absolute value) | 1 |
M | month in year | 1 |
D | day in month | 1 |
d | day in year | 1 |
F | day of week | n |
W | week in year | 1 |
w | week in month | 1 |
H | hour in day (24 hours) | 1 |
h | hour in half-day (12 hours) | 1 |
P | am/pm marker | n |
m | minute in hour | 01 |
s | second in minute | 01 |
f | fractional seconds | 1 |
Z | timezone | 01:01 |
z | timezone (same as Z, but modified where appropriate to include a prefix as a time offset using GMT, for example GMT+1 or GMT-05:00. For this component there is a fixed prefix of GMT, or a localized variation thereof for the chosen language, and the remainder of the value is formatted as for specifier Z.) | 01:01 |
C | calendar: the name or abbreviation of a calendar name | n |
E | era: the name of a baseline for the numbering of years, for example the reign of a monarch | n |
A dynamic error is reported
A dynamic error is reported if the picture refers to components that are not available in the type of $value,
for example if the picture supplied to fn:format-time
refers
to the year, month, or day component.
It is not an error to include a timezone component when the supplied value has no timezone. In these circumstances the timezone component will be ignored.
The first
any format token permitted as a primary format token in the second argument
of the fn:format-integer
function, indicating
that the value of the component is to be output numerically using the specified number format (for example,
1
, 01
, i
, I
, w
, W
,
or Ww
) or
the format token n
, N
,
or Nn
, indicating that the value of the component is to be output by name,
in lower-case, upper-case, or title-case respectively. Components that can be output by name
include (but are not limited to) months, days of the week, timezones, and eras.
If the processor cannot output these components by name for the chosen calendar and language
then it must use an
If a comma is to be used as a grouping separator within the format token, then there must be a width
specifier. More specifically: if a variable marker
contains one or more commas, then the last comma is treated as introducing the width modifier, and all others
are treated as grouping separators. So [Y9,999,*]
will output the year as 2,008
.
It is not possible to use a closing square bracket as a grouping separator within the format token.
If the implementation does not support the use of the requested format token, it
If the first presentation modifier is present, then it may optionally be followed by a second presentation modifier as follows:
Modifier | Meaning |
---|---|
either a or t | indicates alphabetic or traditional numbering respectively, the default being fn:format-integer. |
either c or o | indicates cardinal or ordinal numbering respectively, for example 7 or seven for a cardinal number, or 7th, seventh, or 7º for an ordinal number. This has the same meaning as in the second argument of fn:format-integer. The actual representation of the ordinal form of a number may depend not only on the language, but also on the grammatical context (for example, in some languages it must agree in gender). |
Although the formatting rules are expressed in terms of the rules
for format tokens in fn:format-integer
, the formats actually used may be specialized
to the numbering of date components where appropriate. For example, in Italian, it is conventional to
use an ordinal number (primo
) for the first day of the month, and cardinal numbers
(due, tre, quattro ...
) for the remaining days. A processor may therefore use
this convention to number days of the month, ignoring the presence or absence of the ordinal
presentation modifier.
Whether or not a presentation modifier is included, a width modifier may be supplied. This indicates the number of characters to be included in the representation of the value.
The width modifier, if present, is introduced by a comma. It takes the form:
"," min-width ("-" max-width)?
where min-width
is either an unsigned integer indicating the minimum number of characters to
be output, or *
indicating that there is no explicit minimum, and
max-width
is either an unsigned integer indicating the maximum number of characters to
be output, or *
indicating that there is no explicit maximum; if max-width
is omitted then *
is assumed.
A dynamic error is raised if min-width is present and less than one, or if
max-width is present and less than one or less than min-width.
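The width modifier grammar and its constraints can be sketched as follows. This is a non-normative Python illustration; the function name `parse_width` is hypothetical, and a Python `ValueError` stands in for the dynamic error.

```python
def parse_width(width):
    """Parse a width modifier (without its leading comma) into (min, max).

    None means that no explicit bound applies, as for "*".
    """
    def bound(text):
        if text == "*":
            return None
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"invalid width bound: {text!r}")
        value = int(text)
        if value < 1:
            raise ValueError("a width bound must be at least one")
        return value

    low_text, separator, high_text = width.partition("-")
    low = bound(low_text)
    high = bound(high_text) if separator else None   # omitted max-width means "*"
    if low is not None and high is not None and high < low:
        raise ValueError("max-width is less than min-width")
    return low, high
```

For example, the width modifier in `[MN,*-3]` parses to no minimum and a maximum of 3, and the one in `[z,6-6]` to a minimum and maximum of 6.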
A format token containing more than one digit, such as 001
or 9999
, sets the
minimum and maximum width to the number of digits appearing in the format token; if a width
modifier is also present, then the width modifier takes precedence.
The rules in this section apply to the majority of integer-valued components: specifically M D d F W w H h m s
.
In the rules below, the term
If the first presentation modifier takes the form of a
If there is no width modifier, then the value is formatted according to
the rules of the format-integer
function.
If there is a width modifier, then the first presentation modifier is adjusted as follows:
If the decimal digit pattern includes a grouping separator, the output is implementation-defined (but this is not an error).
Use of a width modifier together with grouping separators is inadvisable for this reason. It is never necessary to use a width modifier with a decimal digit pattern, since the same effect can be achieved by use of optional digit signs.
Otherwise, the number of mandatory-digit-sign characters in the presentation modifier is increased if necessary. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the right, and then prepending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier if there are any, or ASCII digits otherwise.
The maximum width, if specified, is ignored.
The output is then as defined using the format-integer
function with this adjusted decimal digit pattern.
If the first presentation modifier is one of N
, n
, or Nn
:
Let FN be the full name of the component, that is, the form of the name that would be used in the absence of any width modifier.
If FN is shorter than the minimum width, then it is padded by appending spaces to the end of the name.
If FN is longer than the maximum width, then it is abbreviated, either by choosing a conventional abbreviation that fits within the maximum width (for example, “Wednesday” might be abbreviated to “Weds”), or by removing characters from the end of FN until it fits within the maximum width.
For other presentation modifiers:
Any adjustment of the value to fit within the requested width range is implementation-defined.
The value should not be truncated if this results in output that will not be meaningful to users (for example, there is no sensible way to truncate Roman numerals).
If shorter than the minimum width, the value should be padded to the minimum width, either by appending spaces, or in some other way appropriate to the numbering scheme.
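For components output by name, the padding and abbreviation rule can be sketched as follows. This non-normative Python illustration (the function name `fit_name` is hypothetical) always abbreviates by removing characters from the end; a processor may equally choose a conventional abbreviation that fits within the maximum width.

```python
def fit_name(full_name, min_width=None, max_width=None):
    """Fit the full name FN into the requested width range.

    Widths are counted in characters; None means no bound.
    """
    name = full_name
    if max_width is not None and len(name) > max_width:
        name = name[:max_width]        # abbreviate by dropping trailing characters
    if min_width is not None and len(name) < min_width:
        name = name.ljust(min_width)   # pad by appending spaces
    return name
```

With this approach `fit_name("DECEMBER", None, 3)` gives `DEC`, matching the kind of output requested by `[MN,*-3]`.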
The rules for the year component (Y) are the same as those in
If the width modifier is present and defines a finite maximum width, then that maximum width.
Otherwise, if the first presentation modifier takes the form of a decimal-digit-pattern, then:
Let W be the number of optional-digit-signs and mandatory-digit-signs in that decimal-digit-pattern.
If W is 2 or more, then W.
Otherwise, N is infinity (that is, the year is output in full).
The output for the fractional seconds component (f
) is equivalent to the result of the following algorithm:
If the first presentation modifier contains no Unicode digit, then the output is implementation-defined.
Otherwise, the value of the fractional seconds is output as follows:
If there is no width modifier and the first presentation modifier comprises in its
entirety a single mandatory-digit-sign (for example the default 1
), then
the presentation modifier is extended on the right with as many optional-digit-signs as
are needed to accommodate the actual fractional seconds precision encountered in the
value to be formatted.
If there is a width modifier, then the first presentation modifier is adjusted as follows:
If a minimum width is specified, and if this exceeds the number of mandatory-digit-sign characters in the first presentation modifier, then the first presentation modifier is adjusted. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the left, and then appending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier.
If a maximum width is specified, the first presentation modifier is extended on the right with as many optional-digit-signs as are needed to ensure that the number of mandatory-digit-signs and optional-digit-signs is at least equal to the maximum width.
The sequence of characters in the (adjusted) first presentation modifier is reversed (for example,
999'###
becomes ###'999
).
If the result is not a valid
The sequence of digits in the conventional decimal representation of the fractional seconds component
is reversed, with insignificant zeroes removed, and the result is treated as an integer. For example, if the
seconds value is 25.8235
, the reversed fractional seconds value is 5328
.
The reversed fractional seconds value is formatted using the reversed decimal digit pattern according to the
rules of the fn:format-integer
function. Given the examples above, the result is 5'328.
The resulting string is reversed. In our example, the result is 823'5
.
If the result contains more digits than the number of mandatory-digit-signs and optional-digit-signs in the decimal digit pattern, then excess digits are removed from the right hand end (that is, the value is truncated towards zero rather than being rounded). Any grouping separator that immediately precedes a removed digit is also removed.
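For pictures without grouping separators, the net effect of the algorithm above can be sketched directly, without the reversal steps. The following Python illustration is non-normative and deliberately simplified: the function name `format_fraction` and its parameters are hypothetical, the fractional digits are passed as text (for example `"8235"` for a seconds value of 25.8235), and the first presentation modifier is described only by its counts of mandatory-digit-signs and optional-digit-signs. Grouping separators and non-ASCII digit families are not handled.

```python
def format_fraction(digits, mandatory, optional=0, min_width=None, max_width=None):
    """Format fractional seconds given as a string of decimal digits."""
    has_width = min_width is not None or max_width is not None
    # A lone mandatory-digit-sign with no width modifier keeps every digit.
    unlimited = not has_width and mandatory == 1 and optional == 0

    if min_width is not None and min_width > mandatory:
        # Optional signs are converted to mandatory signs first; any further
        # mandatory signs are appended.
        converted = min(optional, min_width - mandatory)
        optional -= converted
        mandatory = min_width
    if max_width is not None and mandatory + optional < max_width:
        # Extend with optional signs up to the maximum width.
        optional = max_width - mandatory

    significant = digits.rstrip("0")                  # drop insignificant zeroes
    if not unlimited:
        significant = significant[:mandatory + optional]  # truncate, never round
    return significant.ljust(mandatory, "0")          # pad to the mandatory digits
```

Under these assumptions, `format_fraction("762345", 3)` models `[f001]` and gives `762`, while `format_fraction("762", 1, min_width=1, max_width=1)` models `[f1,1-1]` and gives `7`.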
The reason for presenting the algorithm in this way is that it enables maximum reuse of the rules defined for
fn:format-integer
. Since the fractional seconds value is not properly an integer, the rules do not
work if used directly: for example, the positions of grouping separators need to be counted from the left rather
than from the right. Implementations, as always, are free to use a different algorithm that yields the same result.
A format token consisting of a single digit,
such as 1
, does not constrain the number of digits in the output.
In the case of fractional seconds in particular, [f001]
requests three decimal digits,
[f01]
requests two digits, but [f1]
will retain all digits in the
supplied date/time value (the maximum number of digits is implementation-defined).
If exactly one digit is required, this can be achieved using the component specifier
[f1,1-1]
.
Special rules apply to the formatting of timezones. When the component specifiers Z
or z
are used, the rules in this section override any rules given elsewhere in the case of
discrepancies.
If the date/time value to be formatted does not include a timezone offset, then the timezone component
specifier is generally ignored (results in no output). The exception is where military timezones are used
(format ZZ
) in which case the string "J"
is output, indicating local time.
When the component specifier is z
, the output is the same as for component specifier
Z
, except that it is prefixed by the characters GMT
or some localized
equivalent. The prefix is omitted, however, in cases where the timezone is identified by name rather than by
a numeric offset from UTC.
If the first presentation modifier is numeric and comprises one or two digits
with no grouping-separator (for example 1
or 01
), then the timezone is formatted as a displacement from UTC in hours, preceded by a plus or minus
sign: for example -5
or +03
. If the actual timezone offset is not an integral number of hours,
then the minutes part of the offset is appended, separated by a colon: for example +10:30
or
-1:15
.
If the first presentation modifier is numeric with a grouping-separator (for example 1:01
or 01.01
), then the timezone offset is output in hours and minutes, separated by the grouping separator,
even if the number of minutes is zero: for example +5:00
or +10.30
.
If the first presentation modifier is numeric and comprises three or four digits with no
grouping-separator, for example 001
or 0001
, then the timezone offset
is shown in hours and minutes with no separator, for example -0500
or +1030
.
If the first presentation modifier is numeric, in any of the above formats, and the second
presentation modifier is t
, then a zero timezone offset (that is, UTC) is output as Z
instead
of a signed numeric value. If this presentation modifier is absent or if the timezone offset is non-zero,
then the displayed timezone offset is preceded by a -
sign for negative offsets
or a +
sign for non-negative offsets.
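The numeric timezone formats described above can be sketched as follows. This is a non-normative Python illustration; the function name `format_offset` and its parameters are hypothetical. The offset is given in minutes (or `None` when the value has no timezone), the picture is the first presentation modifier written with ASCII digits, and `utc_as_z` stands for the second presentation modifier `t`.

```python
def format_offset(offset_minutes, picture="01:01", utc_as_z=False):
    """Format a timezone offset according to a numeric first presentation modifier."""
    if offset_minutes is None:
        return ""                       # no timezone: the component produces no output
    if utc_as_z and offset_minutes == 0:
        return "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)

    separators = [ch for ch in picture if not ch.isdigit()]
    digit_count = sum(ch.isdigit() for ch in picture)

    if separators:
        # Hours and minutes are always shown, joined by the grouping separator.
        separator = separators[0]
        hour_width = picture.index(separator)
        return f"{sign}{hours:0{hour_width}d}{separator}{minutes:02d}"
    if digit_count <= 2:
        # Hours only; minutes are appended after a colon when non-zero.
        text = f"{sign}{hours:0{digit_count}d}"
        return text if minutes == 0 else f"{text}:{minutes:02d}"
    # Three or four digits: hours and minutes with no separator.
    return f"{sign}{hours:0{digit_count - 2}d}{minutes:02d}"
```

For example, an offset of 330 minutes (+05:30) is rendered as `+5:30` with the picture `0`, as `+05:30` with `00:00`, and as `+0530` with `0000`.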
If the first presentation modifier is Z
, then the timezone is formatted
as a military timezone letter, using the convention Z = +00:00, A = +01:00, B = +02:00, ..., M = +12:00,
N = -01:00, O = -02:00, ... Y = -12:00. The letter J (meaning local time) is used in the case of a
value that does not specify a timezone offset. Timezone offsets that have no representation in this system
(for example Indian Standard Time, +05:30) are output as if the format 01:01
had been requested.
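The military timezone convention can be sketched as follows. This non-normative Python illustration uses a hypothetical function name, `military_zone`, and takes the offset in minutes (or `None` for a value with no timezone). It assumes that the letter J is skipped in the sequence A to M, since J is reserved for local time.

```python
POSITIVE_LETTERS = "ABCDEFGHIKLM"   # +01:00 to +12:00 (J is skipped)
NEGATIVE_LETTERS = "NOPQRSTUVWXY"   # -01:00 to -12:00

def military_zone(offset_minutes):
    """Return the military timezone letter, or a numeric fallback in the 01:01 format."""
    if offset_minutes is None:
        return "J"                                   # local time: no timezone offset
    hours, minutes = divmod(abs(offset_minutes), 60)
    if minutes == 0 and hours <= 12:
        if hours == 0:
            return "Z"
        letters = NEGATIVE_LETTERS if offset_minutes < 0 else POSITIVE_LETTERS
        return letters[hours - 1]
    # No letter exists for this offset: fall back to the numeric form.
    sign = "-" if offset_minutes < 0 else "+"
    return f"{sign}{hours:02d}:{minutes:02d}"
```

For example, an offset of -05:00 maps to `R`, while +05:30 has no letter and is output as `+05:30`.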
If the first presentation modifier is N
, then the timezone is output
(where possible) as a timezone name, for example EST
or CET
. The same timezone
offset has different names in different places; it is therefore $place
argument.
In the absence of this information, the implementation may apply a default, for example by using the timezone
names that are conventional in North America. If no timezone name can be identified, the timezone offset is
output using the fallback format 01:01.
The following examples illustrate options for timezone formatting.
The column headings give the timezone offset of the value being formatted (with time = 12:00:00).
Variable marker | $place | -10:00 | -05:00 | +00:00 | +05:30 | +13:00 |
---|---|---|---|---|---|---|
[Z] | () | -10:00 | -05:00 | +00:00 | +05:30 | +13:00 |
[Z0] | () | -10 | -5 | +0 | +5:30 | +13 |
[Z0:00] | () | -10:00 | -5:00 | +0:00 | +5:30 | +13:00 |
[Z00:00] | () | -10:00 | -05:00 | +00:00 | +05:30 | +13:00 |
[Z0000] | () | -1000 | -0500 | +0000 | +0530 | +1300 |
[Z00:00t] | () | -10:00 | -05:00 | Z | +05:30 | +13:00 |
[z] | () | GMT‑10:00 | GMT‑05:00 | GMT+00:00 | GMT+05:30 | GMT+13:00 |
[ZZ] | () | W | R | Z | +05:30 | +13:00 |
[ZN] | "us" | HST | EST | GMT | IST | +13:00 |
[H00]:[m00] [ZN] | "America/New_York" | 17:00 EST | 12:00 EST | 07:00 EST | 01:30 EST | 18:00 EST |
If a width specifier is present when formatting a timezone, then the representation as defined in this section is padded to the minimum
width as described in
This section applies to the remaining components: P
(am/pm marker), C
(calendar),
and E
(era).
The output for these components is entirely n
, indicating that they are output as names (or
conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see
The set of languages, calendars, and places that are supported in the
If the fallback representation uses a different calendar from that requested,
the output string [Calendar: X]
(where X is the calendar actually used),
localized as appropriate to the
requested language. If the fallback representation uses a different language
from that requested, the output string [Language: Y]
(where Y is the language
actually used) localized in an
implementation-dependent way. If a particular component of the value cannot be output in
the requested format, it
The $language
argument specifies the language to be used for the result string
of the function. The value of the argument xml:lang
attribute (see
If the $language
argument is omitted or is set to an empty sequence, or if it is set to an invalid value or a
value that the implementation does not recognize,
then the processor uses the default language defined in the dynamic context.
The language is used to select the appropriate language-dependent forms of:
names (for example, of months)
numbers expressed as words or as ordinals (twenty, 20th, twentieth
)
hour convention (0-23 vs 1-24, 0-11 vs 1-12)
first day of week, first week of year
Where appropriate this choice may also take into account the value of the
$place
argument, though this language
argument.
The choice of the names and abbreviations used in any given language is
Jul
while another uses Jly
. In German,
one implementation might represent Saturday as Samstag
while another
uses Sonnabend
. Implementations
Where ordinal numbers are used, the selection of the correct representation of the
ordinal (for example, the grammatical gender)
The $calendar
argument specifies that the dateTime
, date
,
or time
supplied in the $value
argument
The calendar value if present EQName
(dynamic error: QName
then it is expanded into an expanded QName
using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace.
If the expanded QName is in no namespace,
then it
If the $calendar
argument is omitted or is set to an empty sequence
then the default calendar defined in the dynamic context is used.
The calendars listed below were known to be in use during the last hundred years. Many other calendars have been used in the past.
This specification does not define any of these calendars, nor the way that they
map to the value space of the xs:date
datatype in $place
and/or $language
arguments, with the
$place
argument taking precedence.
Information about some of these calendars, and algorithms for converting between them, may
be found in
Designator | Calendar |
---|---|
AD | Anno Domini (Christian Era) |
AH | Anno Hegirae (Islamic Era) |
AME | Mauludi Era (solar years since Muhammad’s birth) |
AM | Anno Mundi (Jewish Calendar) |
AP | Anno Persici |
AS | Aji Saka Era (Java) |
BE | Buddhist Era |
CB | Cooch Behar Era |
CE | Common Era |
CL | Chinese Lunar Era |
CS | Chula Sakarat Era |
EE | Ethiopian Era |
FE | Fasli Era |
ISO | ISO 8601 calendar |
JE | Japanese Calendar |
KE | Khalsa Era (Sikh calendar) |
KY | Kali Yuga |
ME | Malabar Era |
MS | Monarchic Solar Era |
NS | Nepal Samwat Era |
OS | Old Style (Julian Calendar) |
RS | Rattanakosin (Bangkok) Era |
SE | Saka Era |
SH | Solar Hijri (Islamic Era, used in Iran and Afghanistan) |
SS | Saka Samvat |
TE | Tripurabda Era |
VE | Vikrama Era |
VS | Vikrama Samvat Era |
At least one of the above calendars
The ISO 8601 calendar (ISO
,
is very similar to the Gregorian calendar designated AD
, but it
differs in several ways. The ISO calendar
is intended to ensure that date and time formats can be read
easily by other software, as well as being legible for human
users. The ISO calendar
prescribes the use of particular numbering conventions as defined in
ISO 8601, rather than allowing these to be localized on a per-language basis.
In particular it
provides a numeric “week date” format which identifies dates by
year, week of the year, and day in the week;
in the ISO calendar the days of the week are numbered from 1 (Monday) to 7 (Sunday), and
week 1 in any calendar year is the week (from Monday to Sunday) that includes the first Thursday
of that year. The numeric values of the components year, month, day, hour, minute, and second
are the same in the ISO calendar as the values used in the lexical representation of the date and
time as defined in E
component)
with this calendar is either a minus sign (for negative years) or a zero-length string (for positive years).
For dates before 1 January, AD 1, year numbers in
the ISO and AD calendars are off by one from each other: ISO year
0000 is 1 BC, -0001 is 2 BC, etc.
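The off-by-one relationship between ISO year numbers and AD/BC year numbers can be sketched as follows. This is a non-normative Python illustration; the function name `iso_year_to_era` and the returned era labels are chosen for the example only.

```python
def iso_year_to_era(iso_year):
    """Convert an ISO 8601 year number to a (year, era) pair with no year zero."""
    if iso_year > 0:
        return iso_year, "AD"
    # ISO year 0 is 1 BC, ISO year -1 is 2 BC, and so on.
    return 1 - iso_year, "BC"
```

For example, ISO year `-0043` corresponds to 44 BC.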
ISO 8601 does not define a numbering for weeks within a month. When the w
component is used, the convention to be adopted is that each Monday-to-Sunday week is considered to
fall within a particular month if its Thursday occurs in that month; the weeks that fall in a particular
month under this definition are numbered starting from 1. Thus, for example,
29 January 2013 falls in week 5 because the Thursday of the week (31 January 2013) is the fifth Thursday
in January, and 1 February 2013 is also in week 5 for the same reason.
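The week-in-month convention described above can be sketched as follows. This non-normative Python illustration uses the standard `datetime` module; the function name `iso_week_in_month` is hypothetical. It returns the year and month to which the week belongs together with the week number, since a date may fall in a week that is attributed to an adjacent month.

```python
from datetime import date, timedelta

def iso_week_in_month(day):
    """Return (year, month, week) for the Monday-to-Sunday week containing `day`."""
    monday = day - timedelta(days=day.weekday())   # weekday(): Monday is 0
    thursday = monday + timedelta(days=3)
    # The week belongs to the month containing its Thursday, and its number
    # is the ordinal of that Thursday within the month.
    week = (thursday.day - 1) // 7 + 1
    return thursday.year, thursday.month, week
```

Both `iso_week_in_month(date(2013, 1, 29))` and `iso_week_in_month(date(2013, 2, 1))` give week 5 of January 2013, in line with the example above.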
The value space of the date and time datatypes, as defined in XML Schema, is based on
absolute points in time. The lexical space of these datatypes defines a
representation of these absolute points in time using the proleptic Gregorian calendar,
that is, the modern Western calendar extrapolated into the past and the future; but the value space
is calendar-neutral. The
1502-01-11
(the day on which Pope Gregory XIII was born) might be
formatted using the Old Style (Julian) calendar as 1 January 1502
. This reflects the fact
that there was at that time a ten-day difference between the two calendars. It would be
incorrect, and would produce incorrect results, to represent this date in an element or attribute
of type xs:date
as 1502-01-01
, even though this might reflect the way
the date was recorded in contemporary documents.
When referring to years occurring in antiquity, modern historians generally
use a numbering system in which there is no year zero (the year before 1 CE
is thus 1 BCE). This is the convention that xs:date
and xs:dateTime
does not include a year zero: however, XSD 1.1 endorses the ISO 8601 convention. This means that the date on
which Julius Caesar was assassinated has the ISO 8601 lexical representation
-0043-03-13, but will be formatted as 15 March 44 BCE in the Julian calendar
or 13 March 44 BCE in the Gregorian calendar (dependent on the chosen
localization of the names of months and eras).
The intended use of the $place
argument is to identify
the place where an event
represented by the dateTime
, date
,
or time
supplied in the $value
argument took place or will take place.
If the $place
argument is omitted or is set
to an empty sequence, then the default place defined in the dynamic context is used.
If the value is supplied, and is not the empty sequence, then it
Country codes are defined in "de"
for Germany
and "jp"
for Japan. Implementations
IANA timezone names are defined in the IANA timezone database "America/New_York"
and "Europe/Rome"
.
This argument is not intended to identify the location of the user
for whom the date or time is being formatted;
that should be done by means of the $language
argument.
This information
The geographical area identified by a country code is defined by the boundaries as they existed at the time of the date to be formatted, or the present-day boundaries for dates in the future.
If the $place
argument is supplied in the form
of an IANA timezone name that is recognized by the implementation, then the date or
time being formatted is adjusted to the timezone offset applicable in that timezone.
For example, if the xs:dateTime
value 2010-02-15T12:00:00Z
is formatted with the $place
argument set to
America/New_York
, then the output will be as if the value
2010-02-15T07:00:00-05:00
had been supplied. This adjustment takes daylight
savings time into account where possible; if the date in question falls during
daylight savings time in New York, then it is adjusted to timezone offset -PT4H
rather than -PT5H
. Adjustment using daylight savings time is only possible
where the value includes a date, and where the date is within the range covered
by the timezone database.
The following examples show a selection of dates and times and the way they might be formatted. These examples assume the use of the Gregorian calendar as the default calendar.
Required Output | Expression |
---|---|
2002-12-31 | format-date($d, "[Y0001]-[M01]-[D01]") |
12-31-2002 | format-date($d, "[M]-[D]-[Y]") |
31-12-2002 | format-date($d, "[D]-[M]-[Y]") |
31 XII 2002 | format-date($d, "[D1] [MI] [Y]") |
31st December, 2002 | format-date($d, "[D1o] [MNn], [Y]", "en", (), ()) |
31 DEC 2002 | format-date($d, "[D01] [MN,*-3] [Y0001]", "en", (), ()) |
December 31, 2002 | format-date($d, "[MNn] [D], [Y]", "en", (), ()) |
31 Dezember, 2002 | format-date($d, "[D] [MNn], [Y]", "de", (), ()) |
Tisdag 31 December 2002 | format-date($d, "[FNn] [D] [MNn] [Y]", "sv", (), ()) |
[2002-12-31] | format-date($d, "[[[Y0001]-[M01]-[D01]]]") |
Two Thousand and Three | format-date($d, "[YWw]", "en", (), ()) |
einunddreißigste Dezember | format-date($d, "[Dwo] [MNn]", "de", (), ()) |
3:58 PM | format-time($t, "[h]:[m01] [PN]", "en", (), ()) |
3:58:45 pm | format-time($t, "[h]:[m01]:[s01] [Pn]", "en", (), ()) |
3:58:45 PM PDT | format-time($t, "[h]:[m01]:[s01] [PN] [ZN,*-3]", "en", (), ()) |
3:58:45 o'clock PM PDT | format-time($t, "[h]:[m01]:[s01] o'clock [PN] [ZN,*-3]", "en", (), ()) |
15:58 | format-time($t, "[H01]:[m01]") |
15:58:45.762 | format-time($t, "[H01]:[m01]:[s01].[f001]") |
15:58:45 GMT+02:00 | format-time($t, "[H01]:[m01]:[s01] [z,6-6]", "en", (), ()) |
15.58 Uhr GMT+2 | format-time($t, "[H01]:[m01] Uhr [z]", "de", (), ()) |
3.58pm on Tuesday, 31st December | format-dateTime($dt, "[h].[m01][Pn] on [FNn], [D1o] [MNn]") |
12/31/2002 at 15:58:45 | format-dateTime($dt, "[M01]/[D01]/[Y0001] at [H01]:[m01]:[s01]") |
The following examples use calendars other than the Gregorian calendar.
Description | Request | Result |
---|---|---|
Islamic | format-date($d, "[D١] [Mn] [Y١]", "ar", "AH", ()) | ٢٦ ﺸﻭّﺍﻝ ١٤٢٣ |
Jewish (with Western numbering) | format-date($d, "[D] [Mn] [Y]", "he", "AM", ()) | 26 טבת 5763 |
Jewish (with traditional numbering) | format-date($d, "[Dאt] [Mn] [Yאt]", "he", "AM", ()) | כ״ו טבת תשס״ג |
Julian (Old Style) | format-date($d, "[D] [MNn] [Y]", "en", "OS", ()) | 18 December 2002 |
Thai | format-date($d, "[D๑] [Mn] [Y๑]", "th", "BE", ()) | ๓๑ ธันวาคม ๒๕๔๕ |
A function is provided to parse dates and times expressed using syntax that is commonly encountered in internet protocols.
In addition to the xs:QName
constructor function, QName values can
be constructed by combining a namespace URI, prefix, and local name, or by resolving
a lexical QName against the in-scope namespaces of an element node. This section
defines functions that perform these operations.
Leading and trailing whitespace, if present, is stripped from
string arguments before the result is constructed.
This section specifies functions and an operator on QNames as defined in
The following comparison operators on
xs:hexBinary
and xs:base64Binary
values are defined.
Each returns a boolean value.
These
functions can be used to compare any xs:hexBinary
or xs:base64Binary
value with any other xs:hexBinary
or xs:base64Binary
value:
both types have the same value space, namely a sequence of octets which are treated as integers
in the range 0 to 255.
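The shared value space can be illustrated with a non-normative Python sketch, in which both lexical forms are decoded to the same kind of octet sequence before comparison. The helper names are hypothetical; Python's `bytes` type stands in for the sequence of octets.

```python
import base64

def octets_from_hex(lexical):
    """Decode an xs:hexBinary lexical form into its octets."""
    return bytes.fromhex(lexical)

def octets_from_base64(lexical):
    """Decode an xs:base64Binary lexical form into its octets."""
    return base64.b64decode(lexical, validate=True)

# The same five octets written in the two lexical forms compare equal.
same = octets_from_hex("48656C6C6F") == octets_from_base64("SGVsbG8=")
```

Each octet is an integer in the range 0 to 255, so for example `list(octets_from_hex("00FF"))` is `[0, 255]`.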
This section specifies operators that take xs:NOTATION
values as arguments.
A sequence
is an ordered collection of zero or more items
.
An item
is a node, an atomic value, or a function, such as a map or an array. The terms
sequence
and item
are defined formally in
The following functions are defined on sequences. These functions work on any sequence, without performing any operations that are sensitive to the individual items in the sequence.
As in the previous section, for the illustrative examples below, assume an XQuery
or transformation operating on a non-empty Purchase Order document containing a
number of line-item elements. The variable $seq
is bound to the
sequence of line-item nodes in document order. The variables
$item1
, $item2
, etc. are bound to separate, individual
line-item nodes in the sequence.
The functions in this section perform comparisons between the items in one or more sequences.
The following functions test the cardinality of their sequence arguments.
The functions fn:zero-or-one
, fn:one-or-more
, and
fn:exactly-one
defined in this section, check that the cardinality
of a sequence is in the expected range. They are particularly useful with regard
to static typing. For example, the function call fn:remove($seq, fn:index-of($seq2, 'abc'))
requires the result of the call on fn:index-of
to be a singleton integer,
but the static type system cannot infer this; writing the expression as
fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc')))
will provide a suitable static type at query analysis time, and ensures that the length of the sequence is
correct with a dynamic check at query execution time.
The type signatures for these functions deliberately declare the argument type as
item()*
, permitting a sequence of any length. A more restrictive
signature would defeat the purpose of the function, which is to defer
cardinality checking until query execution time.
Aggregate functions take a sequence as argument and return a single value
computed from values in the sequence. Except for fn:count
, the
sequence must consist of values of a single type or one of its subtypes, or they
must be numeric. xs:untypedAtomic
values are permitted in the
input sequence and handled by special conversion rules. The type of the items in
the sequence must also support certain operations.
This section defines a number of functions used to find elements by ID
or IDREF
value,
or to generate IDs.
The functions in this section provide access to resources (such as files) in the external environment.
These functions convert between the lexical representation and XPath and XQuery data model representation of various file formats.
These functions convert between the lexical representation of XML and the tree representation.
(The fn:serialize
function also handles HTML and JSON output, but is included in this section
for editorial convenience.)
These functions convert between the lexical representation of HTML and the tree representation.
The fn:parse-html
function conceptually works in two phases:
The lexical HTML (supplied as a string) is parsed into an HTML DOM
as defined by the HTML5 specification: see
The resulting DOM is converted to an XDM tree as described in this
section. This is described by defining the actions of the accessor functions
defined in
Because the
An implementation must match the semantics of the mapping described in this section, but
the specific way it achieves that is
Some possible implementation strategies are:
Parse the HTML to an HTML DOM and then convert the HTML DOM to an XDM node tree.
Parse the HTML to an HTML DOM and then implement a wrapper or facade that presents an XDM interface to the HTML DOM.
Parse the lexical HTML directly to an XDM node tree, bypassing the HTML DOM.
The http://www.w3.org/1999/xhtml
and the content type
application/xhtml+xml
, and is popularly referred to as XHTML
.
The HTML parsing algorithm constructs
an HTML DOM HTMLDocument
document object for the HTML document. The XHTML parsing
algorithm constructs an HTML DOM XMLDocument
object for the HTML document, following
XML parsing rules. This mapping supports both of these document types.
The
The HTML DOM Document
interface maps to
The HTML DOM Element
interface maps to
The HTML DOM Attr
interface maps to
Any HTML DOM Attr
instances in an HTML DOM HTMLDocument
that represent
namespace declarations will have been filtered out: see
The HTML DOM ProcessingInstruction
interface maps to
The HTML parsing algorithm does not generate processing instruction nodes. If encountered
they are parsed as comment nodes. The HTML DOM ProcessingInstruction
interface is relevant only when the XHTML parsing algorithm is used.
The HTML DOM Comment
interface maps to
The HTML DOM Text
interface maps to Text
nodes are combined into a single
The HTML DOM CDATASection
interface is an instance of HTML DOM
Text
, so CDATA sections also map to
The use of CDATA sections can result in the HTML DOM containing adjacent text nodes, which the mapping to XDM will merge into a single node.
The HTML DOM DocumentFragment
interface is not supported as an XML node.
There are two places in the HTML DOM where this is used:
The HTML DOM ShadowRoot
interface is not present in the main HTML DOM
tree. It is only accessible via JavaScript.
The template
element’s content
property contains
the child nodes of the template
element. The behaviour of this
is defined by the include-template-content
key in the
If an implementation allows these nodes to be passed in via an API or similar mechanism,
their behaviour is
The result of the dm:attributes($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Element
then the result
is the value of the Element.attributes
property mapped to a
sequence as described below;
Otherwise, the result is an empty sequence.
An HTML DOM NamedNodeMap
is mapped to a sequence as follows:
NamedNodeMap.length
is the length of the sequence, where a length
of 0
results in an empty sequence;
NamedNodeMap.item(n)
is the nth element of the sequence.
That sequence is then filtered as follows:
If the Attr.namespaceURI
property is
"http://www.w3.org/2000/xmlns/"
, the attribute is not included in
this sequence;
If the Attr.localName
property is "xmlns"
, the attribute
is not included in this sequence;
If the Attr.localName
property starts with "xmlns:"
,
the attribute is not included in this sequence;
Otherwise, the attribute is included in this sequence using the XDM mapping rules described in this section.
The HTML DOM Element.attributes
property includes namespace and non-namespace
attributes in the list when the HTML or XML parser is used. As such, the namespace attributes
have to be filtered from the resulting XDM attribute sequence.
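The filtering rules above can be sketched as follows. This is a minimal illustration in Python, assuming a hypothetical `Attr` record that carries only the three properties the rules inspect; it is not an implementation of the HTML DOM interface.

```python
from dataclasses import dataclass
from typing import List, Optional

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


@dataclass
class Attr:
    """Hypothetical stand-in for an HTML DOM Attr node."""
    local_name: str
    value: str
    namespace_uri: Optional[str] = None


def is_namespace_declaration(attr: Attr) -> bool:
    # Any one of the three conditions excludes the attribute.
    return (
        attr.namespace_uri == XMLNS_NAMESPACE
        or attr.local_name == "xmlns"
        or attr.local_name.startswith("xmlns:")
    )


def filtered_attributes(attrs: List[Attr]) -> List[Attr]:
    # Keep document order; drop namespace declarations only.
    return [a for a in attrs if not is_namespace_declaration(a)]
```

The third condition matters for an `HTMLDocument`, where a declaration such as `xmlns:xlink` may appear with its whole qualified name in the local name.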
When the resulting document is an HTML DOM HTMLDocument
, the
Attr.localName
and Attr.name
properties of HTML DOM
Attr
nodes are both set to the qualified name. This includes
namespace declarations which are filtered out by the logic in this section.
The Attr.localName
property will be ASCII lowercase. The
The result of the dm:base-uri($node)
for an HTML DOM Node
is the value of the
Node.baseURI
property mapped as follows:
If the value is null or an empty string, then the result is an empty sequence;
Otherwise, the string value is cast to an xs:anyURI
.
The result of the dm:children($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Document
then the result
is the value of the Node.childNodes
property mapped to a sequence;
If the node is an instance of HTML DOM HTMLTemplateElement
then the
result is determined as follows:
If the include-template-content
key of the
parse-html-options
map is false()
, the result is
an empty sequence;
Select the HTML DOM DocumentFragment
from the
HTMLTemplateElement.content
property;
The HTML DOM DocumentFragment
’s Node.childNodes
property is mapped to a sequence;
If the node is an instance of HTML DOM Element
then the result is the
value of the Node.childNodes
property mapped to a sequence;
Otherwise, the result is an empty sequence.
An HTML DOM NodeList
is mapped to a sequence as follows:
NodeList.length
is the length of the sequence, where a length
of 0
results in an empty sequence;
NodeList.item(n)
is the nth element of the sequence.
That sequence is then filtered as follows:
If the child is an instance of HTML DOM DocumentType
, that child
is not included in this sequence;
A sequence of consecutive HTML DOM Text
nodes is combined into a
single XDM text
node;
Otherwise, the HTML DOM Node
nodes are mapped to XDM according to
the rules in this section.
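The NodeList filtering step can be sketched with Python's `xml.dom.minidom`, used here only as a convenient DOM stand-in (it is an XML DOM, not an HTML DOM). The sketch covers the filtering alone, not the template handling, and represents each merged XDM text node as a plain string, which is an assumption made for brevity.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# CDATASection is a kind of Text, so both node types count as text here.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def mapped_children(node):
    """Return the children with DocumentType dropped and text runs merged."""
    result = []
    for child in node.childNodes:
        if child.nodeType == Node.DOCUMENT_TYPE_NODE:
            continue  # DocumentType children are not included
        if child.nodeType in TEXT_TYPES:
            if result and isinstance(result[-1], str):
                result[-1] += child.nodeValue  # extend the current text run
            else:
                result.append(child.nodeValue)  # start a new text run
        else:
            result.append(child)
    return result


# Build a paragraph whose first three children are adjacent text nodes.
doc = minidom.Document()
para = doc.createElement("p")
doc.appendChild(para)
para.appendChild(doc.createTextNode("a"))
para.appendChild(doc.createCDATASection("b"))
para.appendChild(doc.createTextNode("c"))
para.appendChild(doc.createElement("b"))
para.appendChild(doc.createTextNode("d"))
children = mapped_children(para)
```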
The result of the dm:document-uri($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Document
then the result is the value
of the Document.documentURI
property mapped as follows:
If the value is null or an empty string, then the result is an empty sequence;
Otherwise, the string value is cast to an xs:anyURI
.
Otherwise, the result is an empty sequence.
The result of the dm:is-id($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Attr
then:
If the Attr.name
property (its qualified name) is
"id"
, then:
If the Attr.value
is castable to an xs:NCName
,
the result is true
;
Otherwise, the result is false
;
Otherwise, the result is false
;
Otherwise, the result is false
.
In id
attribute is defined as being unique in the element’s tree,
containing at least one character, and not having any ASCII whitespace
characters. This means that an HTML id
attribute may not
conform to an xs:NCName
.
If an HTML id
is not a valid xs:NCName
then that
attribute is not an XML ID.
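A sketch of the `id` test follows. It assumes the `NCName` production as defined by the XML and Namespaces in XML recommendations, and it strips leading and trailing XML whitespace before checking, on the reading that casting to `xs:NCName` collapses whitespace first. The function names are hypothetical.

```python
# Character ranges for NameStartChar, without the colon (NCName excludes it).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed after the first position: "-", ".", digits, etc.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch: str, ranges) -> bool:
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ranges)


def is_ncname(value: str) -> bool:
    if not value or not _in_ranges(value[0], NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ch, NAME_START_RANGES) or _in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in value[1:]
    )


def is_id(attr_name: str, attr_value: str) -> bool:
    # Only an attribute whose qualified name is "id" can be an ID.
    if attr_name != "id":
        return False
    # Assumption: whitespace is collapsed before the NCName check.
    return is_ncname(attr_value.strip(" \t\n\r"))
```

For example, `is_id("id", "123")` is false because an `NCName` cannot start with a digit, even though `123` is an acceptable HTML `id` value.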
The result of the dm:is-idrefs($node)
for an HTML DOM Node
is an empty sequence.
The result of the dm:namespace-nodes($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Element
then an
Otherwise, the result is the empty sequence.
For the XHTML parsing algorithm, this will be equivalent to constructing the namespace nodes from an XML infoset, PSVI, or similar mapping.
For the HTML parsing algorithm, the
Section 2.1.3 http://www.w3.org/1999/xhtml
.
Section 4.8.15 http://www.w3.org/1998/Math/MathML
).
The default element namespace for these elements is the MathML namespace.
Section 4.8.16 http://www.w3.org/2000/svg
).
The default element namespace for these elements is the SVG namespace.
Section 13.1.2.3
The supported namespace prefixes are:
xlink
in the http://www.w3.org/1999/xlink
namespace;
xml
in the http://www.w3.org/XML/1998/namespace
namespace; and
xmlns
in the http://www.w3.org/2000/xmlns/
namespace.
No other namespaces are supported by the HTML parser.
Section number references to
The result of the dm:nilled($node)
for an HTML DOM Node
is false()
.
The result of the dm:node-kind($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Document
then the result is
"document"
.
If the node is an instance of HTML DOM Element
then the result is
"element"
.
If the node is an instance of HTML DOM Attr
then the result is
"attribute"
.
If the node is an instance of HTML DOM ProcessingInstruction
then
the result is "processing-instruction"
.
If the node is an instance of HTML DOM Comment
then the result is
"comment"
.
If the node is an instance of HTML DOM Text
then the result is
"text"
.
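The node-kind rules amount to a lookup on the DOM node type. The sketch below uses Python's `xml.dom.minidom` as a stand-in DOM; returning `None` for any other node type is an assumption, since the rules above do not cover that case.

```python
from xml.dom.minidom import Node, parseString

NODE_KINDS = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
    Node.TEXT_NODE: "text",
    # CDATASection is a kind of Text, so it reports the same node kind.
    Node.CDATA_SECTION_NODE: "text",
}


def node_kind(node):
    # Assumption: node types outside the table yield None.
    return NODE_KINDS.get(node.nodeType)


doc = parseString('<a x="1"><!--c-->t<?pi d?></a>')
root = doc.documentElement
kinds = [node_kind(child) for child in root.childNodes]
```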
The result of the dm:node-name($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Element
then the result is
determined as follows:
The Element.localName
property. This is derived as follows:
The local name is initially set to the ASCII lowercase tag name. The
If the local name is an SVG element name, the case-sensitive name is used.
If the local name contains a character that is not a valid XML
NameStartChar
or NameChar
, then an
NCName
.
Unnnnnn
escape sequence. That would map :
to U00003A
.
This local name escaping applies only to the HTML parsing algorithm.
If the XHTML parsing algorithm is used, the localName
and
prefix
will be correctly set for QName-based
node names.
The Element.prefix
property, or empty if the value is null;
The Element.namespaceURI
property, or empty if the value
is null.
If the element is an HTML element, the namespace URI is
"http://www.w3.org/1999/xhtml"
.
If the element is an SVG element, the namespace URI is
"http://www.w3.org/2000/svg"
.
If the element is a MathML element, the namespace URI is
"http://www.w3.org/1998/Math/MathML"
.
If the node is an instance of HTML DOM Attr
then the result is
determined as follows:
The
The Attr.localName
property. This is derived as follows:
The local name is initially set to the
If the local name is an SVG or MathML attribute name, the case-sensitive name
is used.
If the local name is an allowed xlink
, xml
, or
xmlns
attribute name the local name is the value of the local name
column of the attribute name mapping table in
If the local name contains a character that is not a valid XML
NameStartChar
or NameChar
, then an
NCName
.
Unnnnnn
escape sequence. That would map :
to U00003A
.
This local name escaping applies only to the HTML parsing algorithm.
If the XHTML parsing algorithm is used, the localName
and
prefix
will be correctly set for QName-based
node names.
The Attr.prefix
property, or empty if the value is null.
If the attribute name is an allowed xlink
, xml
, or
xmlns
attribute name the namespace prefix is the value of the
prefix column of the attribute name mapping table in
The Attr.namespaceURI
property, or empty if the value is null;
If the attribute name is an allowed xlink
, xml
, or
xmlns
attribute name the namespace URI is the value of the
namespace column of the attribute name mapping table in
If the node is an instance of HTML DOM ProcessingInstruction
then
the result is an xs:QName
constructed as follows:
The ProcessingInstruction.target
property;
The
The
Otherwise, the result is an empty sequence.
When the resulting document is an HTML DOM HTMLDocument
, the
Element.localName
and Element.name
properties of
HTML DOM Element
nodes are both set to the qualified name.
When the resulting document is an HTML DOM HTMLDocument
, the
Attr.localName
and Attr.name
properties of HTML DOM
Attr
nodes are both set to the qualified name.
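The local name escaping used for element and attribute names under the HTML parsing algorithm can be sketched as follows. The text above gives only the `Unnnnnn` form and the mapping of `:` to `U00003A`, so this sketch makes assumptions: each character that is not allowed at its position in an `NCName` is replaced by `U` followed by six uppercase hexadecimal digits, and the first character is checked against the name-start ranges. The function name is hypothetical.

```python
# NameStartChar ranges without the colon, as in the NCName production.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _allowed(ch: str, first: bool) -> bool:
    code_point = ord(ch)
    ranges = NAME_START_RANGES if first else NAME_START_RANGES + NAME_EXTRA_RANGES
    return any(low <= code_point <= high for low, high in ranges)


def escape_local_name(name: str) -> str:
    parts = []
    for position, ch in enumerate(name):
        if _allowed(ch, first=(position == 0)):
            parts.append(ch)
        else:
            # Assumed escape form: "U" plus six uppercase hex digits.
            parts.append("U%06X" % ord(ch))
    return "".join(parts)
```

Under these assumptions `escape_local_name("a:b")` yields `aU00003Ab`, consistent with the mapping of `:` stated above.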
The result of the dm:parent($node)
for an HTML DOM Node
is as follows:
Let $parent
be the Node.parentNode
property of the
node;
If $parent
is an instance of HTML DOM DocumentFragment
,
then for each HTML DOM HTMLTemplateElement
$template
in
the parsed DOM tree:
Let $content
be the value of the
HTMLTemplateElement.content
property of $template
;
If $content
is the same node as $parent
, then the
result is $template
using the XDM mapping rules described in
this section;
If there are no more $template
nodes, then the result is an
empty sequence;
If $parent
is null, then the result is an empty sequence;
Otherwise, the result is $parent
using the XDM mapping rules
described in this section.
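The parent lookup can be sketched as below. The classes are hypothetical stand-ins for HTML DOM nodes that model only the `parentNode` and `content` properties, and the caller is assumed to supply the list of all template elements found in the parsed tree.

```python
from typing import List, Optional


class DomNode:
    """Hypothetical stand-in for an HTML DOM Node."""

    def __init__(self, name: str, parent_node: "Optional[DomNode]" = None):
        self.name = name
        self.parent_node = parent_node


class DocumentFragment(DomNode):
    """A fragment has no parent; it is reachable only via a template."""


class TemplateElement(DomNode):
    def __init__(self, name: str, parent_node: Optional[DomNode] = None):
        super().__init__(name, parent_node)
        self.content = DocumentFragment("#document-fragment")


def dm_parent(node: DomNode, templates: List[TemplateElement]) -> Optional[DomNode]:
    parent = node.parent_node
    if isinstance(parent, DocumentFragment):
        # The fragment does not point back to its template, so search for it.
        for template in templates:
            if template.content is parent:
                return template
        return None  # no template owns this fragment: empty sequence
    if parent is None:
        return None  # empty sequence
    return parent


body = DomNode("body")
template = TemplateElement("template", parent_node=body)
row = DomNode("tr", parent_node=template.content)
```

Here `dm_parent(row, [template])` returns the template element rather than the fragment, so the fragment never appears in the XDM view of the tree.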
The current node can have an HTML DOM DocumentFragment
parent node only
if the include-template-content
key of the html-parser-options
is true()
.
The HTML DOM DocumentFragment
’s Node.parentNode
property
is null, and a DocumentFragment
attached to HTMLTemplateElement.content
property does not have a host
property connecting the fragment back to
the template element.
If a future version of DocumentFragment.host
property that references the node’s template
element, or the implementation
has access to that internal property, the implementation may choose to use that
instead of traversing the parsed HTML tree.
The result of the dm:string-value($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Document
, then use the
algorithm described in
If the node is an instance of HTML DOM Element
, then use the
algorithm described in
If the node is an instance of HTML DOM Text
, then use the
algorithm described in
Otherwise, the result is the value of the Node.nodeValue
property.
The following algorithm is used to construct the concatenated string value of a node in the HTML DOM tree:
Let $text
be the string value ""
;
For each descendant node $node
in document order:
If $node
is not an instance of HTML DOM
Text
, process the next node in document order;
Append the value of the Node.nodeValue
property for
$node
to $text
;
The result is $text
.
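The concatenation algorithm can be sketched with Python's `xml.dom.minidom` as a stand-in DOM. The helper names are hypothetical; CDATA sections are counted as text because `CDATASection` is a kind of `Text`.

```python
from xml.dom.minidom import Node, parseString

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def descendants(node):
    """Yield every descendant of node in document order."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)


def concatenated_string_value(node) -> str:
    text = ""
    for descendant in descendants(node):
        if descendant.nodeType not in TEXT_TYPES:
            continue  # skip elements, comments, processing instructions
        text += descendant.nodeValue
    return text


doc = parseString("<p>Hello <b>big</b><!-- skipped --> world</p>")
value = concatenated_string_value(doc.documentElement)
```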
The following algorithm is used to construct the maximal sequence of adjacent
Let $text
be the string value ""
;
Append the value of the Node.nodeValue
property for
$node
to $text
;
Let $next
be the value of Node.nextSibling
;
If $next
is null, or not an instance of HTML DOM
Text
, the result is $text
;
Otherwise, repeat from step 2 using $next
as $node
.
Adjacent text nodes in the HTML DOM are treated as a single XDM text node by only including the first text node and providing logic to ensure that the text content is merged into a single text block.
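The sibling walk for adjacent text nodes can be sketched as a loop, again using `xml.dom.minidom` as a stand-in DOM. The function is expected to be called on the first text node of a run; the name `merged_text` is hypothetical.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def merged_text(node) -> str:
    """Concatenate node with the text nodes that immediately follow it."""
    text = ""
    while True:
        text += node.nodeValue
        next_node = node.nextSibling
        if next_node is None or next_node.nodeType not in TEXT_TYPES:
            return text
        node = next_node  # repeat with the following text node


doc = minidom.Document()
para = doc.createElement("p")
doc.appendChild(para)
first = para.appendChild(doc.createTextNode("a"))
para.appendChild(doc.createTextNode("b"))
para.appendChild(doc.createElement("br"))
last = para.appendChild(doc.createTextNode("c"))
```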
The result of the dm:type-name($node)
for an HTML DOM Node
is as follows:
If the node is an instance of HTML DOM Element
then the result is
xs:untyped
.
If the node is an instance of HTML DOM Attr
then the result is
xs:untypedAtomic
.
If the node is an instance of HTML DOM Text
then the result is
xs:untypedAtomic
.
Otherwise, the result is an empty sequence.
The result of the dm:typed-value($node)
for an HTML DOM Node
is as follows:
Let $string-value
be the
If the node is an instance of HTML DOM Document
then the result is
$string-value
as an xs:untypedAtomic
;
If the node is an instance of HTML DOM Element
then the result is
$string-value
as an xs:untypedAtomic
;
If the node is an instance of HTML DOM Attr
then the result is
$string-value
as an xs:untypedAtomic
;
If the node is an instance of HTML DOM Text
then the result is
$string-value
as an xs:untypedAtomic
;
Otherwise, the result is $string-value
.
The result of the dm:unparsed-entity-public-id($node)
for an HTML DOM Node
is an empty sequence.
The result of the dm:unparsed-entity-system-id($node)
for an HTML DOM Node
is an empty sequence.
This section describes the record structure used to pass options to the
fn:parse-html
function.
The approach used to parse the HTML document into XDM nodes.
An implementation may use this to specify a specific algorithm, tool, or
library that is used, such as tidy
or tagsoup
.
An implementation may also use this to specify a non-standard variant of
HTML to support, such as word
for the Microsoft Word HTML variant.
The version of HTML to support when parsing HTML strings or sequences of octets.
Valid values an implementation must support for the html
method are:
3
, 3.2
for HTML 3.2 W3C Recommendation, 14 January 1997
4
, 4.01
for HTML 4.01 W3C Recommendation, 24 December 1999
5.0
for HTML5 W3C Recommendation, 28 October 2014
5.1
for HTML 5.1 W3C Recommendation, 1 November 2016
5.2
for HTML 5.2 W3C Recommendation, 14 December 2017
LS
for HTML Living Standard, WHATWG
5
may be equivalent to any of 5.0
, 5.1
, 5.2
, or LS
Valid values an implementation must support for the xhtml
method are:
1.0
for XHTML 1.0 W3C Recommendation, 26 January 2000
1.1
for XHTML 1.1 W3C Recommendation, 31 May 2001
Any other method
and html-version
combinations are
The character encoding to use to decode a sequence of octets that represents an HTML document.
Defines how to handle elements in the HTMLTemplateElement.content
property.
If this option is true()
, the template
element’s
children are the children of the content
property’s document
fragment node.
If this option is false()
, the template
element’s
children are the empty sequence.
The default behaviour is
This allows an implementation to support the behaviour defined in
template
elements with XSLT and XPath
This option would default to true()
for an XSLT processor
operating on an HTML DOM constructed from an XHTML document.
This option would default to false()
for an XPath processor
using the
Additional
An implementation may provide keys for options to the
The functions listed in this section parse or serialize JSON data.
JSON is a popular format for exchange of structured data on the web: it is specified in
This specification describes two ways of representing JSON data losslessly using XDM constructs. The first method uses XDM maps to represent JSON objects, and XDM arrays to represent JSON arrays. The second method represents all JSON constructs using XDM element and attribute nodes.
Note also that the function fn:serialize
has an option to act as the inverse function to fn:parse-json
.
This section defines a mapping from JSON data to XDM maps and arrays. Two functions are available
to support this mapping: fn:parse-json
and fn:serialize
(with options
selecting JSON as the output method).
The fn:parse-json
function will accept any JSON text as input, and convert it
to XDM data values. The fn:serialize
function (with JSON as the output method) will accept any XDM
value produced using fn:parse-json
and convert it back to the original JSON text
(subject to insignificant variations such as reordering the properties in a JSON object).
The conversion is lossless if recommended JSON good practice is followed. Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point, or (b) duplicate key values appear within a JSON object.
The representation of JSON data produced by the fn:parse-json
function
has been chosen with ease of manipulation as a design aim. For example, a simple JSON object
such as { "Sun": 1, "Mon": 2, "Tue": 3, ... }
produces a simple map, so if the result
of parsing is held in $weekdays
, the number for a given weekday can be extracted
using an expression such as $weekdays?Tue
. Similarly, a simple array such as
[ "Sun", "Mon", "Tue", ... ]
produces an array that can be addressed as, for example,
$weekdays(3)
. A more deeply nested structure can be addressed in a similar way:
for example if the JSON text is an array of person objects, each of which has a property named
phones
which is an array of strings containing phone numbers, then the first phone number of
each person in the data can be addressed as $data?*?phones(1)
.
This section defines a mapping from JSON data to XML (specifically, to XDM element and attribute nodes). A
function fn:json-to-xml
is provided to take a JSON string as input and convert it
to the XML representation, and a second function fn:xml-to-json
performs the reverse operation.
The XML representation is designed to be capable of representing any valid JSON text including one that uses characters which are not valid in XML. The transformation is normally lossless: that is, distinct JSON texts convert to distinct XML representations. When converting JSON to XML, options are provided to reject unsupported characters, to replace them with a substitute character, or to leave them in backslash-escaped form.
The conversion is lossless if recommended JSON good practice is followed. Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point, or (b) duplicate key values appear within a JSON object.
The following example demonstrates the correspondence of a JSON text and the corresponding XML representation.
Consider the following JSON text:
The XML representation of this text is as follows. Whitespace is included in the XML representation for purposes of illustration,
but it will not necessarily be present in the output of the
An XSD 1.0 schema for the XML representation is provided in http://www.w3.org/2005/xpath-functions
, then:
Unless the host language specifies otherwise, the processor (if it is schema-aware)
If a schema location is provided, then the schema document at that location
The rules governing the mapping from JSON to XML are as follows. In these rules, the phrase
“an element named N” is to be interpreted as meaning “an element node whose local name is N and whose
namespace URI is http://www.w3.org/2005/xpath-functions
”.
The JSON value null
is represented by an element named null
, with empty content.
The JSON values true
and false
are represented by an element named boolean
,
with content conforming to the type xs:boolean
. When the element is created by the
fn:json-to-xml
function, the string value of the element will be true
or false
.
The fn:xml-to-json
function also recognizes other strings that validate as xs:boolean
,
for example 1
and 0
. Leading and trailing whitespace is accepted.
A JSON number is represented by an element named number
,
with content conforming to the type xs:double
, with the additional restriction that the value
must not be positive or negative infinity, nor NaN
. The
fn:json-to-xml
function creates an element whose string value is lexically the same as the JSON representation
of the number. The fn:xml-to-json
function generates a JSON representation that is the result of casting the
(typed or untyped) value of the node to xs:double
and then casting the result to xs:string
.
Leading and trailing whitespace is accepted.
Since JSON does not impose limits on the range or precision
of numbers, these rules mean that conversion from JSON to XML will always succeed, and will retain full precision
in the lexical representation unless the data model implementation is one that reconstructs the string value from
the typed value. In the reverse direction, conversion from XML to JSON may fail if the value is infinity or NaN
,
or if the string value is such that casting to xs:double
produces positive or negative infinity.
A JSON string is represented by an element named string
, with
content conforming to the type xs:string
. The string
element has two
alternative representations: escaped form, and unescaped form.
A JSON array is represented by an element named array
. The content is a sequence of
child elements representing the members of the array in order, each such element being the representation
of the array member obtained by applying these rules recursively.
A JSON object is represented by an element named map
. The content is a sequence
of child elements each of which represents one of the name/value pairs in the object. The representation of the
name/value pair N:V is obtained by taking the element that represents the value V (by applying these
rules recursively) and adding an attribute with name key
(in no namespace), whose
value is N as an instance of xs:string
. The functions fn:json-to-xml
and
fn:xml-to-json
both retain the order of entries, subject to rules about how duplicate keys are handled. The
key may be represented in escaped or unescaped form.
The attribute escaped="true"
may be specified on a string
element to indicate
that the string value contains backslash-escaped characters that are to be interpreted according to the JSON
rules. The attribute escaped-key="true"
may be specified on any element with a key
attribute to indicate
that the key contains backslash-escaped characters that are to be interpreted according to the JSON
rules. Both attributes have the default value false
, signifying that the relevant value is in unescaped form.
In unescaped form, the backslash character has no special significance (it represents itself).
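The mapping rules can be sketched in Python using the standard `json` and `xml.etree.ElementTree` modules. This is an approximation, not `fn:json-to-xml`: it always produces unescaped form, does nothing about characters that are invalid in XML, and inherits the Python parser's behavior for duplicate keys (the last one wins). The names `JsonNumber` and `json_to_xml_sketch` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"


class JsonNumber(str):
    """Holds the lexical form of a JSON number so it is copied unchanged."""


def _reject_constant(name):
    # NaN and Infinity are accepted by Python's parser but are not JSON.
    raise ValueError(name + " is not valid JSON")


def _element(local_name, text=None):
    elem = ET.Element("{%s}%s" % (FN_NS, local_name))
    elem.text = text
    return elem


def _convert(value, key=None):
    if value is None:
        elem = _element("null")
    elif isinstance(value, bool):
        elem = _element("boolean", "true" if value else "false")
    elif isinstance(value, JsonNumber):  # test before str: it is a str subclass
        elem = _element("number", str(value))
    elif isinstance(value, str):
        elem = _element("string", value)
    elif isinstance(value, list):
        elem = _element("array")
        for member in value:
            elem.append(_convert(member))
    elif isinstance(value, dict):
        elem = _element("map")
        for name, member in value.items():  # insertion order is retained
            elem.append(_convert(member, key=name))
    else:
        raise TypeError("unexpected value: %r" % (value,))
    if key is not None:
        elem.set("key", key)  # the key attribute is in no namespace
    return elem


def json_to_xml_sketch(text):
    value = json.loads(
        text,
        parse_int=JsonNumber,
        parse_float=JsonNumber,
        parse_constant=_reject_constant,
    )
    return _convert(value)
```

Keeping numbers as their lexical form mirrors the rule that the `number` element has the same string value as the JSON representation, so `1.50` is not rewritten as `1.5`.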
The JSON grammar for number
is a subset of the lexical space of
the XSD type xs:double
. The mapping from JSON number
values to xs:double
values is defined by the XPath rules for casting from xs:string
to xs:double
. Note that
these rules will never generate an error for out-of-range values; instead very large or very small values will be
converted to +INF
or -INF
. Since JSON does not impose limits on the range or precision
of numbers, the conversion is not guaranteed to retain full precision.
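The effect of mapping JSON numbers to double precision can be observed in any environment that uses IEEE 754 binary64. The Python sketch below is an analogy only: Python's `float` constructor accepts inputs outside the JSON number grammar, so the argument is assumed to be a valid JSON number, and the function name is hypothetical.

```python
import math


def json_number_to_double(lexical: str) -> float:
    # Assumption: lexical matches the JSON number grammar.
    return float(lexical)


too_large = json_number_to_double("1e999")        # overflows without an error
too_negative = json_number_to_double("-1e999")
imprecise = json_number_to_double("9007199254740993")  # 2**53 + 1
```

The last value cannot be represented exactly, so it is rounded to the nearest double, `9007199254740992.0`.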
Although the order of entries in a JSON object is generally considered to have no significance, the functions
json-to-xml
and xml-to-json
both retain order.
The XDM representation of a JSON value may either be untyped (all elements annotated as xs:untyped
, attributes
as xs:untypedAtomic
), or it may be typed. If it is typed, then it http://www.w3.org/2005/xpath-functions
are ignored, including attributes such as xsi:type
and xsi:nil
that would normally influence the process
of schema validation.
The namespace prefix associated with the namespace http://www.w3.org/2005/xpath-functions
(if any) is immaterial.
The effect of the fn:xml-to-json
function does not depend on the choice of prefix, and the prefix (if any) generated by the
fn:json-to-xml
function is
This section describes functions that parse CSV data.
A CSV is a 2-dimensional tabular data structure consisting of multiple
CSV has developed informally for decades, and many variations are found.
This specification refers to
This specification uses the term
Row delimiters other than CRLF
are recognized.
Field delimiters other than comma (","
) are recognized.
Quote characters other than the double quotation mark ('"'
)
are recognized.
Non-ASCII characters are recognized.
This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.
The most basic function for parsing CSV is fn:csv-to-arrays
which recognizes the delimiters for rows and fields and returns a sequence
of arrays each corresponding to one row. The fields within each array are
represented as instances of xs:string
.
The other two functions recognize column names, and make it easier to address
individual fields using these names. The parse-csv
function
delivers this capability using XDM maps and functions, while the csv-to-xml
function represents the information using XDM element nodes.
The delimiters used for rows, columns, and quoting are configurable. An error
is raised if the same delimiter string is used in multiple roles
Rows in CSV files are typically delimited with CRLF (fn:unparsed-text
function normalizes these line endings to LF (fn:unparsed-text
function does it automatically.
The last row in the file may or may not be followed by a row delimiter.
Fields in CSV are frequently delimited with a comma. Other field
delimiters are useful, for
example when numeric data uses comma as a decimal separator. The
chosen field delimiter is then often
The column delimiter defaults to column-delimiter
option is set to a multi-character string.
CSVs, as specified in
If a field is to contain the quote character, the character must be escaped by doubling it,
as with escaping of quotes in XPath string literals (see
The quotes surrounding quoted fields are not included in the result. The following input string, when parsed, produces a sequence of strings, as shown below:
The quote character defaults to
No space is allowed between the column delimiter and a quote. An error is raised
The following example is therefore invalid and parsing it will raise an error.
The result of fn:csv-to-arrays
is a sequence of rows, where
each row is represented as an array of xs:string
values.
The first row of the CSV is returned in the same way as all the other rows.
fn:csv-to-arrays
does not distinguish between a header row and data
rows, and returns all of them.
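The shape of this result can be approximated with Python's standard `csv` module. This is an analogy, not `fn:csv-to-arrays`: the Python module follows similar but not identical rules (for example, it accepts only single-character delimiters), and the function name and parameters below are hypothetical.

```python
import csv
import io
from typing import List


def csv_to_rows(text: str, field_delimiter: str = ",",
                quote_char: str = '"') -> List[List[str]]:
    reader = csv.reader(
        io.StringIO(text, newline=""),  # leave line endings to the csv module
        delimiter=field_delimiter,
        quotechar=quote_char,
        doublequote=True,               # a doubled quote stands for one quote
    )
    return [row for row in reader]


rows = csv_to_rows('name,quote\r\nAda,"say ""hi"""\r\n')
```

The header row comes back as the first list, exactly like a data row, and the surrounding quotes are not part of the field value.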
For example, given the input:
the fn:csv-to-arrays
function produces
It is common practice for all rows in a CSV to have the same number of columns, but this is not required.
produces
fn:csv-to-xml
and fn:parse-csv
functions provide
facilities to enforce uniformity and an expected number of
columns.
While fn:csv-to-arrays
simply delivers the CSV content
as a sequence of arrays, the fn:parse-csv
function goes a step
further and enables access to the data using column names. The column
names may be taken either from the first row of the CSV data, or from
data supplied by the caller in the options
parameter.
The function returns a
parsed-csv-structure-record
:
The record has four parts, which are always present (though potentially empty):
The list of column names, in order.
A map from column names to the 1-based integer position of the column.
The contents of the non-header rows in the CSV data, as a
sequence of arrays of xs:string
values; each array represents
one row of the CSV data.
A function providing ready access to a given field in a given
row. The get
function has signature:
The function takes two arguments: the first is an integer giving the row number (1-based), the second identifies a column either by its name or by its 1-based position.
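The four parts of the record can be sketched as a Python dictionary. The key names `columns`, `column-index`, `rows`, and `get` are assumptions made for this illustration, as is the function name. The sketch assumes distinct column names and in-range arguments.

```python
from typing import Dict, List, Union


def build_parsed_csv(all_rows: List[List[str]], header: bool = True) -> dict:
    # Column names come from the first row when a header is present.
    columns = list(all_rows[0]) if header and all_rows else []
    data_rows = all_rows[1:] if header else all_rows
    # Map each column name to its 1-based position.
    column_index: Dict[str, int] = {
        name: position for position, name in enumerate(columns, start=1)
    }

    def get(row: int, column: Union[int, str]) -> str:
        # The column is identified by name or by 1-based position.
        position = column_index[column] if isinstance(column, str) else column
        return data_rows[row - 1][position - 1]

    return {
        "columns": columns,
        "column-index": column_index,
        "rows": data_rows,
        "get": get,
    }


record = build_parsed_csv([
    ["name", "city"],
    ["Ada", "London"],
    ["Alan", "Wilmslow"],
])
```

With this record, `record["get"](2, "city")` and `record["get"](2, 2)` address the same field.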
The fn:csv-to-xml
function returns an XDM node tree representing the CSV data.
Following is a CSV text and the XML serialization of the corresponding node tree.
If column names were not extracted, then implementations ]]>
element, and
]]>
elements column
attribute:
An XSD 1.0 schema for the XML representation is provided in
The following examples illustrate more complex applications making use of CSV parsing functions.
A variable $crlf
is assumed to be in scope representing the CRLF string:
fn:parse-csv
Direct conversion is a matter of iterating across the records and fields to
generate <tr>
and <td>
elements.
Using XQuery:
Using XSLT:
fn:csv-to-xml
The fn:csv-to-xml
function makes these kinds of
conversion-to-XML-table tasks simpler by providing a simple XML representation of the data. Here, in XQuery:
And in XSLT:
This section describes functions that support
Invisible XML defines a BNF-like language for specifying grammars, together with
a mapping from sentences in that grammar to an XML representation. By defining an
Invisible XML grammar, a great variety of non-XML data formats can be manipulated
as if they were XML. The function fn:invisible-xml
takes a grammar
as input, and returns a function which can be used for parsing data instances
and converting them to XML node trees.
The following functions are defined to obtain information from the static or dynamic context.
The functions included in this section operate on function items, that is, values referring to a function.
Some functions such as fn:parse-json
allow the option of supplying a callback function
for example to define exception behavior. Where this is not essential to the use of the function,
the function has not been classified as higher-order for this purpose; in applications where function items
cannot be created, these particular options will not be available.
The following functions take function items as an argument.
With all these functions, if the caller-supplied function fails with a dynamic error, this error is propagated as an error from the higher-order function itself.
The following functions allow dynamic loading and evaluation of XQuery queries, XSLT stylesheets, and XPath binary operators.
Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.
A map is an additional kind of item.
Two keys K1 and K2 are the same key if fn:atomic-equal($K1, $K2) returns true.
It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).
Maps are immutable, and have no identity separate from their content.
For example, the map:remove
function returns a map that differs
from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation.
Two calls on map:remove
with the same arguments return maps that are
indistinguishable from each other; there is no way of asking whether these are “the same map”.
A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a
function item. The function corresponding to the map has the signature
function($key as xs:anyAtomicType) as item()*
. Calling the function has the same effect as calling
the get
function: the expression
$map($key)
returns the same result as get($map, $key)
. For example, if $books-by-isbn
is a map whose keys are ISBNs and whose associated values are book
elements, then the expression
$books-by-isbn("0470192747")
returns the book
element with the given ISBN.
The fact that a map is a function item allows it to be passed as an argument to higher-order functions
that expect a function item as one of their arguments.
It is often useful to decompose a map into a sequence of entries, or key-value pairs (in which the key is an atomic value and the value is an arbitrary sequence). Subsequently it may be necessary to reconstruct a map from these components, typically after modification.
There are two conventional ways of representing key-value pairs, each with its own advantages and disadvantages. Both approaches are supported by functions in this library. These are described below:
It is possible to decompose any map into a sequence of
For example the map
{ "x": 1, "y": 2 }
can be decomposed to the sequence ({ "x": 1 }, { "y": 2 })
.
"key"
) containing the key part of a key-value pair, the other (with the key "value"
)
containing the value part of a key-value pair.
For example
the map { "x": 1, "y": 2 }
can be decomposed as
({ "key": "x", "value": 1 }, { "key": "y", "value": 2 })
A record(key as xs:anyAtomicType, value as item()*)
.
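The two representations can be modelled outside XPath as well. The following is a minimal Python sketch, not part of this specification, in which a map is modelled as a `dict`; the function names are hypothetical and do not correspond to functions in this library. When composing from singleton maps, the sketch assumes that a later entry with a duplicate key simply replaces an earlier one.

```python
def to_singleton_maps(m: dict) -> list:
    """Decompose a map into a sequence of single-entry maps."""
    return [{key: value} for key, value in m.items()]


def to_key_value_pairs(m: dict) -> list:
    """Decompose a map into a sequence of key-value pair maps."""
    return [{"key": key, "value": value} for key, value in m.items()]


def from_singleton_maps(parts: list) -> dict:
    """Compose a map from single-entry maps (assumption: last duplicate wins)."""
    result = {}
    for part in parts:
        result.update(part)
    return result


def from_key_value_pairs(pairs: list) -> dict:
    """Compose a map from key-value pair maps."""
    return {pair["key"]: pair["value"] for pair in pairs}


example = {"x": 1, "y": 2}
singletons = to_singleton_maps(example)
pairs = to_key_value_pairs(example)
```

Both decompositions can be reversed without loss, which is what makes either form suitable for modifying entries and then rebuilding the map.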
The following table summarizes the way in which these two representations can be used to compose and decompose maps:
Operation | Singleton Maps | Key-Value Pair Maps |
---|---|---|
Decompose a map | | |
Compose a map | | |
Create a single entry | | |
Extract the key part of a single entry | | |
Extract the value part of a single entry | | |
The functions defined in this section use a conventional namespace prefix map
, which
is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map
.
The function call map:get($map, $key)
can be used to retrieve the value associated with a given key.
There is no operation to atomize a map or convert it to a string. The function fn:serialize
can in some cases
be used to produce a JSON representation of a map.
Because a map is a function item, functions that apply to functions also apply
to maps. A map is an anonymous function, so fn:function-name
returns the empty
sequence; fn:function-arity
always returns 1
.
Maps may be compared using the fn:deep-equal
function.
There is no function or operator to atomize a map or convert it to a string (other than fn:serialize
,
which can be used to serialize some maps as JSON texts).
Arrays were introduced as a new datatype in XDM 3.1. This section describes functions that operate on arrays.
An array is an additional kind of item. An array of size N is a mapping from the integers (1 to N) to a set of values, called the members of the array, each of which is an arbitrary sequence. Because an array is an item, and therefore a sequence, arrays can be nested.
An array acts as a function from integer positions to associated values, so the
function call $array($index)
can be used to retrieve the array member at a given position.
The function corresponding to the array has the signature
function($index as xs:integer) as item()*
.
The fact that an array is a function item allows it to be passed as an argument to higher-order functions
that expect a function item as one of their arguments.
The functions defined in this section use a conventional namespace prefix array
, which
is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/array
.
As with all other values, arrays are treated as immutable.
For example, the array:reverse
function returns an array that differs from the supplied
array in the order of its members, but the supplied array is not changed by the operation. Two calls
on array:reverse
with the same argument will return arrays that are indistinguishable from
each other; there is no way of asking whether these are “the same array”. Like sequences, arrays have no identity.
All functionality on arrays is defined in terms of two primitives:
The function array:members
decomposes an array to a sequence of
The function array:of-members
composes an array from a sequence of
A record(value as item()*)
, that is, a map containing a single
entry whose key is the string "value"
and whose value is the encapsulated sequence.
The XPath language provides explicit syntax for certain operations on arrays. These constructs can also be specified in terms of function primitives:
array { $sequence }
constructs an array whose members
are the items in $sequence
. Every member of this array will
be a singleton item. This can be defined as
array:join($sequence ! array { . })
.
[E1, E2, E3, ..., En]
constructs an array in which
E1
is the first member, E2
is the second member,
and so on. The result is equivalent to the expression
array:join(([ E1 ], [ E2 ], ... [ En ])).
The lookup expression $array?*
is equivalent to
array:split($array)?*
.
The lookup expression $array?$N
, where $N
is an integer within the bounds of the array, is equivalent to
array:split($array)[$N]?*
.
Similarly, applying the array as a function, $array($N)
,
is also equivalent to
array:split($array)[$N]?*
array { $sequence }
constructs an array whose members
are the items in $sequence
. Every member of this array will
be a singleton item. This can be defined as
array:of-members($sequence ! { 'value': . })
.
[E1, E2, E3, ..., En]
constructs an array in which
E1
is the first member, E2
is the second member,
and so on. The result is equivalent to the expression
array:of-members(({ 'value': E1 }, { 'value': E2 },
{ 'value': E3 }, ... { 'value': En })).
The lookup expression $array?*
is equivalent to
array:members($array) ! ?value
.
The lookup expression $array?$N
, where $N
is an integer within the bounds of the array, is equivalent to
array:members($array)[$N]?value
.
Similarly, applying the array as a function, $array($N)
,
is also equivalent to
array:members($array)[$N]?value
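The equivalences above can be illustrated with a small model. The following Python sketch is an illustration only: it assumes an array is modelled as a list of members, each member being a list that stands for an arbitrary sequence, and the function names are hypothetical stand-ins for the XPath constructs they mimic.

```python
def members(array: list) -> list:
    """Model of array:members: one {'value': member} record per member."""
    return [{"value": list(member)} for member in array]


def of_members(records: list) -> list:
    """Model of array:of-members: rebuild an array from value records."""
    return [list(record["value"]) for record in records]


def curly_constructor(sequence: list) -> list:
    """Model of array { $sequence }: every item becomes a singleton member."""
    return of_members([{"value": [item]} for item in sequence])


def square_constructor(*expressions: list) -> list:
    """Model of [E1, E2, ...]: each operand becomes one member, whatever its length."""
    return of_members([{"value": list(e)} for e in expressions])


def lookup(array: list, n: int) -> list:
    """Model of $array?$N and $array($N), using 1-based positions."""
    return members(array)[n - 1]["value"]


def lookup_all(array: list) -> list:
    """Model of $array?*: the members concatenated into one sequence."""
    return [item for record in members(array) for item in record["value"]]


arr = square_constructor([1, 2], [], [3])
```

The difference between the two constructors shows up in the member count: the curly form always yields singleton members, while the square form can yield empty or multi-item members.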
Constructor functions are used to convert a supplied value to a given type, and the name of the function is the same as the name of the target type. This section describes constructor functions corresponding to the following types:
Simple types (atomic types, union types, and list types as
defined in
These constructor functions always take a single argument.
Record tests defined as
These take one argument for each named field of the record test.
Constructor functions for record types are defined in
Constructor functions are defined for all user-defined named simple types, and for most built-in atomic, list,
and union types. The only named simple types that have no constructor function are those that have no instances
other than instances of their derived types: specifically, xs:anySimpleType
, xs:anyAtomicType
,
and xs:NOTATION
.
Every built-in atomic
type that is defined in xs:anyAtomicType
and xs:NOTATION
, has an
associated constructor function. The type xs:untypedAtomic
, defined
in xs:yearMonthDuration
and xs:dayTimeDuration
defined
in xs:dateTimeStamp
introduced in
A constructor function is not defined for xs:anyAtomicType
as there are no atomic values with type annotation xs:anyAtomicType
at runtime, although this can be a statically inferred type.
A constructor function is not defined for xs:NOTATION
since it is defined as an abstract type. If the static context contains a type derived from xs:NOTATION
then a constructor function is defined for it.
See
The form of the constructor function for an atomic type
If $arg
is the empty sequence, the empty sequence is returned. For
example, the signature of the constructor function corresponding to the
xs:unsignedInt
type defined in
Calling the constructor function xs:unsignedInt(12)
returns
the xs:unsignedInt
value 12. Another call of that constructor
function that returns the same xs:unsignedInt
value is
xs:unsignedInt("12")
. The same result would also be returned if the
constructor function were to be called with a node that had a typed value equal
to the xs:unsignedInt
12. The standard features described in
The semantics of the constructor function
xs:TYPE(arg)
are identical to the semantics of
arg
cast as xs:TYPE?
. See
If the argument to a constructor function is a literal, the result of the
function
Special rules apply to constructor functions for xs:QName
and types derived from xs:QName
and xs:NOTATION
. See
The argument is optional, and defaults to the context value (which will be atomized if necessary).
The following constructor functions for the built-in atomic types are supported:
Implementations xs:float("-0.0E0")
.
But because
Implementations xs:double("-0.0E0")
.
But because
See
See xs:ENTITY
and types derived from it.
Special rules apply to constructor functions for the types xs:QName
and xs:NOTATION
, for two reasons:
Values cannot belong directly to the type xs:NOTATION
, only to its subtypes.
The lexical representation of these types uses namespace prefixes, whose meaning is context-dependent.
These constraints result in the following rules:
There is no constructor function for xs:NOTATION
. Constructors are defined, however, for xs:QName
,
for types derived or constructed from xs:QName
, and for types
derived or constructed from xs:NOTATION
.
When converting from an xs:string
, the prefix within the lexical
xs:QName
supplied
as the argument is resolved to a namespace URI using the statically known
namespaces from the static context. If the lexical xs:QName
has no prefix, the
namespace URI of the resulting expanded-QName is the default namespace for elements and types,
taken from the static context. Components of the static context are
defined in
When a constructor function for a namespace-sensitive type is used as a literal function item
or in a partial function application (for example, xs:QName#1
or xs:QName(?)
) the namespace
bindings that are relevant are those from the static context of the literal function item or partial function application.
When a constructor function for a namespace-sensitive type is obtained by means of the fn:function-lookup
function, the relevant namespace bindings are those from the static context of the call on fn:function-lookup
.
When the supplied argument to the xs:QName
constructor
function is a node, the node is atomized in the usual way, and if the result is xs:untypedAtomic
it is then
converted as if a string had been supplied. The effect might not be what is desired.
For example, given the attribute xsi:type="my:type"
, the expression
xs:QName(@xsi:type)
might fail on the grounds that the prefix my
is undeclared. This is because the namespace bindings are taken from the static context
(that is, from the query or stylesheet), and not from the source document containing the
@xsi:type
attribute. The solution to this problem is to use the function call
resolve-QName(@xsi:type, .)
instead.
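The difference between the two sources of namespace bindings can be sketched as follows. This Python model is an illustration under stated assumptions: namespace bindings are plain dictionaries from prefix to URI, an expanded QName is a `(namespace URI, local name)` tuple, and the helper name and the example URIs are hypothetical.

```python
def resolve_lexical_qname(lexical: str, bindings: dict, default_ns: str = "") -> tuple:
    """Resolve a lexical QName against a given set of namespace bindings."""
    prefix, separator, local = lexical.rpartition(":")
    if not separator:
        # No prefix: the applicable default namespace is used.
        return (default_ns, local)
    if prefix not in bindings:
        raise ValueError(f"no namespace binding for prefix {prefix!r}")
    return (bindings[prefix], local)


# Bindings known to the query or stylesheet (the static context).
static_bindings = {"xs": "http://www.w3.org/2001/XMLSchema"}
# Bindings in scope on the element that carries the xsi:type attribute.
element_bindings = {"my": "http://example.com/my"}

# Analogue of xs:QName(@xsi:type): only the static context is consulted.
try:
    resolve_lexical_qname("my:type", static_bindings)
    constructor_failed = False
except ValueError:
    constructor_failed = True

# Analogue of resolve-QName(@xsi:type, .): the element's bindings are consulted.
resolved = resolve_lexical_qname("my:type", element_bindings)
```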
Each of the three built-in list
types defined in xs:NMTOKENS
, xs:ENTITIES
, and xs:IDREFS
, has an
associated constructor function.
The function signatures are as follows:
The semantics are equivalent to casting to the corresponding types from xs:string
.
All three of these types have the facet minLength = 1
meaning that there must
always be at least one item in the list. The return type, however, allows for the fact that when the argument to
the function is an empty sequence, the result is an empty sequence.
In the case of atomic types, it is possible to use an expression such as
xs:date(@date-of-birth)
to convert an attribute value to an instance of xs:date
,
knowing that this will work both in the case where the attribute is already annotated as xs:date
,
and also in the case where it is xs:untypedAtomic
. This approach does not work with list types,
because it is not permitted to use a value of type xs:NMTOKEN*
as input to the constructor
function xs:NMTOKENS
. Instead, it is necessary to use conditional logic that performs the conversion
only in the case where the input is untyped:
if (@x instance of attribute(*, xs:untypedAtomic)) then xs:NMTOKENS(@x) else data(@x)
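The behaviour of a list-type constructor such as xs:NMTOKENS can be approximated by the following Python sketch. It is an illustration only: `None` stands for the empty sequence, and the regular expression is a rough approximation of the XML name-character rules, not the normative definition.

```python
import re

# Approximation (assumption): Python's \w plus ".", "-" and ":" stands in
# for the XML NameChar production.
_NMTOKEN = re.compile(r"[\w.\-:]+")
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def xs_nmtokens(arg):
    """Model of the xs:NMTOKENS constructor applied to a string."""
    if arg is None:
        # Empty sequence in, empty sequence out.
        return []
    tokens = [token for token in _XML_WHITESPACE.split(arg) if token]
    if not tokens:
        # The minLength = 1 facet requires at least one item in the list.
        raise ValueError("xs:NMTOKENS requires at least one token")
    for token in tokens:
        if not _NMTOKEN.fullmatch(token):
            raise ValueError(f"not a valid NMTOKEN: {token!r}")
    return tokens
```

The sketch accepts only a string or the empty sequence, mirroring the restriction that an already-typed list value cannot be passed to the constructor.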
There is a constructor function for the union type xs:numeric
defined in
The semantics are determined by the rules in
If the argument is an instance of xs:double
, xs:float
, or xs:decimal
,
then the result is an instance of the same primitive type, with the same value;
If the argument is an instance of xs:boolean
, the result is the xs:double
value
0.0e0
or 1.0e0
;
If the argument is an instance of xs:string
or xs:untypedAtomic
, then:
If the value is in the lexical space of xs:double
, the result will be the
corresponding xs:double
value;
Otherwise, a dynamic error
The result will never be an instance of xs:float
, xs:decimal
,
or xs:integer
. This is because xs:double
appears first in the list of member
types of xs:numeric
, and its lexical space subsumes the lexical space of the other numeric
types. Thus, unlike XPath numeric literals, the result does not depend on the lexical form of the supplied
value. The reason for this design choice is to retain compatibility with the function conversion rules:
functions such as fn:abs
and fn:round
are declared to expect an instance
of xs:numeric
as their first or only argument, and compatibility with the function conversion
rules defined in earlier versions of these specifications demands that when an untyped atomic value
(or untyped node) is supplied as the argument, it is converted to an xs:double
value
even if its lexical form is that (say) of an integer.
In all other cases, a dynamic error
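The rules for the xs:numeric constructor can be summarized in a short model. The following Python sketch is illustrative and makes several assumptions: `float` stands for xs:double, `decimal.Decimal` for xs:decimal, `int` for xs:integer, `bool` for xs:boolean, and `None` for the empty sequence; xs:float is not modelled. The pattern for the lexical space follows XSD 1.1, which admits `+INF`.

```python
import re
from decimal import Decimal

# Lexical space of xs:double (XSD 1.1 form, an assumption of this sketch).
_DOUBLE_LEXICAL = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?|[+-]?INF|NaN"
)


def xs_numeric(arg):
    """Model of the xs:numeric constructor function."""
    if arg is None:
        return None
    if isinstance(arg, bool):
        # Booleans become the xs:double values 1.0e0 or 0.0e0.
        return 1.0 if arg else 0.0
    if isinstance(arg, (float, Decimal, int)):
        # Numeric input keeps its primitive type and value.
        return arg
    if isinstance(arg, str):
        text = arg.strip(" \t\r\n")
        if _DOUBLE_LEXICAL.fullmatch(text):
            # Strings always become xs:double, whatever their lexical form.
            return float(text)
        raise ValueError(f"not in the lexical space of xs:double: {arg!r}")
    raise TypeError("value cannot be converted to xs:numeric")
```

Note that `xs_numeric("42")` yields a double rather than an integer, which reflects the compatibility argument given above.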
In the case of an implementation that supports XSD 1.1, there is a constructor function
associated with the built-in union type xs:error
.
The function signature is as follows:
The semantics are equivalent to casting to the corresponding union type (see
Because xs:error
has no member types, and therefore has an empty value space, casting
will always fail with a dynamic error except in the case where the supplied argument is an empty
sequence, in which case the result is also an empty sequence.
For every
For named atomic types, the rules
are the same as the rules for constructing built-in derived atomic types defined in T
,
the signature of the function takes the form T($value as xs:anyAtomicType? := .) as T?
,
and the semantics are the same as casting to derived types: see
For named union types, the rules
follow the same principles as the rules for constructing built-in union types defined in U
,
the signature of the function takes the form U($value as xs:anyAtomicType? := .) as U?
,
and the semantics are the same as casting to union types: see
For named list types, the rules
follow the same principles as the rules for constructing built-in list types defined in L
,
where the item type of L
is I
,
the signature of the function takes the form L($value as xs:string? := .) as I*
,
and the semantics are the same as casting to list types: see
Constructor functions are available both for named types defined in an imported schema (that is,
named simple types in the xs:string
, and named local union types follow the same rules as
union types defined in a schema.
Special rules apply to constructor functions for namespace-sensitive types, that is,
atomic types derived from xs:QName
and xs:NOTATION
, list types that have
a namespace-sensitive item type, and union types that have a namespace-sensitive member type. See
Consider a situation where the static context contains an atomic type
called hatSize
defined in a schema whose target namespace is bound
to the prefix eg
. In such a case the following constructor function is available to users:
The resulting function may be used in an expression such as eg:hatSize("10½")
.
In the case of an atomic type A, the return type of the function is A?
, reflecting
the fact that the result will be an empty sequence if the input is an empty sequence. For a union or list type,
the return type of the function is specified only as xs:anyAtomicType*
. Implementations performing
static type checking will often be able to compute a more specific result type. For example, if the target type
is a list type whose item type is the atomic type A, the result will always be an instance of A*;
if the target type is a pure union type U then the result will always be an instance of U?.
In general, however, applications needing interoperable behavior on implementations that do strict static type
checking will need to use a treat as
expression to assert the specific type of the result.
To construct an instance of a user-defined type
that is not in a namespace, it is possible to use an
EQName
(for example Q{}hatsize(17)
). Alternatives are
to use a cast expression (17 cast as hatsize
) or (if the host language allows it)
to undeclare the default function namespace.
For every named item type in the static context (See
For example, if there is a named item type with the XQuery definition:
then there will be a function definition equivalent to:
Equivalently using XSLT syntax, if there is a named item type with the XSLT definition:
then there will be a function definition equivalent to:
The rules defining the relationship of the function definition to the record test are as follows:
The name of the function is the same as the name of the named item type. A static error occurs if this clashes with the name and arity of other function definitions in the static context.
For every named field in the record test, in order, there is one parameter defined as follows:
If the name of the field is an NCName
, then
the name of the parameter is the name of the field.
Otherwise, the name of the parameter is argNZ
,
where arg
is the literal string "arg"
,
N
is the ordinal position of the field, counting from 1 (one),
and Z
is an implementation-defined suffix, added only when
needed to make the parameter name unique.
The declared type of the parameter is the same as the declared
type of the field, but if the field is declared optional, then the
occurrence indicator is adjusted to ?
or *
if needed to make the empty sequence a valid value for the parameter.
If the field is optional and if all subsequent fields are optional,
then the parameter is declared as optional with a default value of ()
(the empty sequence). In all other cases the parameter is declared as required.
It is immaterial whether the record test is extensible; the constructor function cannot be used to create entries in the resulting map other than entries corresponding to named fields.
The return type of the constructor function is the record test (with no occurrence indicator).
The body of the function constructs a map having one entry for
each mandatory field in the record test, and one entry for each optional field
in the record test for which an actual value other than the empty sequence
is supplied in the arguments of the function call. The key of the entry
is the field name as an instance of xs:string
, and the corresponding
value is the value supplied in the arguments to the constructor function call,
after applying the coercion rules.
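The rules for parameters and for the function body can be sketched in Python. This is a simplified model, not a definitive implementation: a record test is given as a list of `(field name, is optional)` pairs, the empty tuple `()` stands for the empty sequence, type checking and coercion are omitted, and the `person` record is a hypothetical example.

```python
def make_record_constructor(fields):
    """Build a constructor for a record test given as (name, optional) pairs."""
    # A parameter may be omitted only if its field and all later fields are optional.
    required_count = len(fields)
    while required_count > 0 and fields[required_count - 1][1]:
        required_count -= 1

    def construct(*args):
        if not required_count <= len(args) <= len(fields):
            raise TypeError("wrong number of arguments")
        # Omitted trailing parameters default to the empty sequence.
        padded = args + ((),) * (len(fields) - len(args))
        record = {}
        for (name, optional), value in zip(fields, padded):
            # Mandatory fields always get an entry; optional fields only
            # when a value other than the empty sequence was supplied.
            if not optional or value != ():
                record[name] = value
        return record

    return construct


# Hypothetical record: first and last are mandatory, middle and title optional.
person = make_record_constructor(
    [("first", False), ("middle", True), ("last", False), ("title", True)]
)
```

In this example only `title` can be omitted from a call, because the optional field `middle` is followed by the mandatory field `last`.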
Consider the record test (in XQuery syntax):
This will result in an implicit function declaration equivalent to:
Constructor functions and cast expressions accept an expression and return a value
of a given type. They both convert a source value, xs:date("2003-01-01")
means exactly the same as
"2003-01-01"
cast as xs:date?
.
The cast expression takes a type name to indicate the target type of the conversion.
See
Where the argument to a cast is a literal, the result of the function
The general rules for casting from primitive types to primitive types are defined in
xs:string
(and xs:untypedAtomic
)
follow in
xs:untypedAtomic
, xs:integer
, xs:yearMonthDuration
and xs:dayTimeDuration
; and where the text refers to types derived from a particular
primitive type T, the reference is to types for which T is the nearest
ancestor-or-self primitive type in the type hierarchy.
When casting from xs:string
or xs:untypedAtomic
the semantics in
This section defines casting between xs:untypedAtomic
,
xs:integer
and the two derived types of
xs:duration
: xs:yearMonthDuration
and xs:dayTimeDuration
which are treated as primitive types in this section. The type conversions
that are supported between primitive atomic types are indicated in the table below;
casts between other (non-primitive) types are defined in terms of these primitives.
In this table, there is a
row for each Y
indicates that a conversion from values of the type to which
the row applies to the type to which the column applies is supported;
N
indicates that there are no supported conversions from values
of the type to which the row applies to the type to which the column applies;
and M
indicates that a conversion from values of the type to
which the row applies to the type to which the column applies may succeed for
some values in the value space and fail for others.
xs:NOTATION
as an abstract type.
Thus, casting to xs:NOTATION
from any other type including xs:NOTATION
is not permitted and raises a static error xs:NOTATION
to another subtype of
xs:NOTATION
is permitted.
Casting is not supported to or from xs:anySimpleType
. Thus, there is no row
or column for this type in the table below. For any node that has not been validated or
has been validated as xs:anySimpleType
, the typed value of the node is an
atomic value of type xs:untypedAtomic
. There are no atomic values with the
type annotation xs:anySimpleType
at runtime.
Casting to
xs:anySimpleType
is not permitted and raises a static error:
Similarly, casting is not supported to or from xs:anyAtomicType
and will raise
a static error: xs:anyAtomicType
at runtime, although this can be a
statically inferred type.
If casting is attempted from an
In the following table, the columns and rows are identified by short codes that identify simple types as follows:
In the following table, the notation S\T
indicates that the source
(S
) of the conversion is indicated in the column below the
notation and that the target (T
) is indicated in the row to the
right of the notation.
S\T | uA | str | flt | dbl | dec | int | dur | yMD | dTD | dT | tim | dat | gYM | gYr | gMD | gDay | gMon | bool | b64 | hxB | aURI | QN | NOT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uA | Y | Y | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M |
str | Y | Y | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M | M |
flt | Y | Y | Y | Y | M | M | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
dbl | Y | Y | Y | Y | M | M | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
dec | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
int | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
dur | Y | Y | N | N | N | N | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N |
yMD | Y | Y | N | N | N | N | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N |
dTD | Y | Y | N | N | N | N | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N |
dT | Y | Y | N | N | N | N | N | N | N | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N |
tim | Y | Y | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N | N | N | N |
dat | Y | Y | N | N | N | N | N | N | N | Y | N | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N |
gYM | Y | Y | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N | N |
gYr | Y | Y | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N |
gMD | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N | N |
gDay | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N | N |
gMon | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N | N |
bool | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | N | N | Y | N | N | N | N | N |
b64 | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | Y | N | N | N |
hxB | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | Y | N | N | N |
aURI | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | N | N |
QN | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | M |
NOT | Y | Y | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | Y | M |
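For readers who want to consult the table programmatically, the following Python sketch encodes a subset of its rows (the untyped, string, numeric, duration, and boolean sources) and looks up a single cell. The names `TYPES`, `ROWS`, and `castability` are hypothetical, and the remaining rows are left out for brevity.

```python
# Column order, copied from the table header.
TYPES = ["uA", "str", "flt", "dbl", "dec", "int", "dur", "yMD", "dTD", "dT",
         "tim", "dat", "gYM", "gYr", "gMD", "gDay", "gMon", "bool", "b64",
         "hxB", "aURI", "QN", "NOT"]

_FROM_TEXT = "YY" + "M" * 21                          # uA and str rows
_FROM_BINARY_FP = "YYYYMM" + "N" * 11 + "Y" + "N" * 5  # flt and dbl rows
_FROM_EXACT = "YYYYYY" + "N" * 11 + "Y" + "N" * 5      # dec, int and bool rows
_FROM_DURATION = "YY" + "NNNN" + "YYY" + "N" * 14      # dur, yMD and dTD rows

# Subset of the rows: source type -> one code per target type.
ROWS = {
    "uA": _FROM_TEXT, "str": _FROM_TEXT,
    "flt": _FROM_BINARY_FP, "dbl": _FROM_BINARY_FP,
    "dec": _FROM_EXACT, "int": _FROM_EXACT, "bool": _FROM_EXACT,
    "dur": _FROM_DURATION, "yMD": _FROM_DURATION, "dTD": _FROM_DURATION,
}


def castability(source: str, target: str) -> str:
    """Return 'Y', 'M' or 'N' for a cast from source to target."""
    return ROWS[source][TYPES.index(target)]
```

For example, `castability("dbl", "int")` gives `"M"`, because the cast fails for the values NaN, INF, and -INF.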
Casting is permitted from any primitive type to xs:string
and xs:untypedAtomic
.
When a value of any simple type is cast as xs:string
, the
derivation of the xs:string
value
If xs:string
or a type derived from
xs:string
,
If xs:anyURI
, the type conversion is
performed without escaping any characters.
If xs:QName
or xs:NOTATION
:
if the qualified name
has a prefix, then
otherwise
If
If xs:integer
,
If xs:decimal
, then:
If xs:integer
, that is, if there are no
significant digits after the decimal point, then the
value is converted from an xs:decimal
to an xs:integer
and the resulting
xs:integer
is converted to an
xs:string
using the rule above.
Otherwise, the canonical lexical representation of
If xs:float
or
xs:double
, then:
xs:string
in the lexical space of xs:double
or xs:float
that when
converted to an xs:double
or xs:float
under the rules of NaN
if NaN
.
In addition,
If xs:decimal
and the
resulting xs:decimal
is converted to an
xs:string
according to the rules above, as though using an
implementation of xs:decimal
that imposes no limits on the
totalDigits
or
fractionDigits
facets.
If "0"
or "-0"
respectively.
If "INF"
or "-INF"
respectively.
In other cases, the result consists of a mantissa, which has the lexical form
of an xs:decimal
, followed by the letter "E", followed by an exponent which has
the lexical form of an xs:integer
. Leading zeroes and "+" signs are prohibited
in the exponent. For the mantissa, there must be a decimal point, and there must
be exactly one digit before the decimal point, which must be non-zero. The "+"
sign is prohibited. There must be at least one digit after the decimal point.
Apart from this mandatory digit, trailing zero digits are prohibited.
The above rules allow more than one representation of the same value.
For example, the xs:float
value whose exact decimal representation is 1.26743223E15
might be represented by any of the strings "1.26743223E15"
,
"1.26743222E15"
or "1.26743224E15"
(inter alia).
It is implementation-dependent which of these representations is chosen.
If xs:dateTime
, xs:date
or xs:time
, xs:string
using the functions
described in year
component is
cast to xs:string
using eg:convertYearToString
.
The month
, day
, hour
and minute
components are cast to xs:string
using eg:convertTo2CharString
.
The second
component is cast to xs:string
using
eg:convertSecondsToString
. The timezone component, if present, is
cast to xs:string
using eg:convertTZtoString
.
Note that the hours component of the resulting string
will never be "24"
. Midnight is always represented as "00:00:00"
.
If xs:yearMonthDuration
or xs:dayTimeDuration
,
If xs:duration
then let SV
cast as xs:yearMonthDuration
, and let SV
cast as xs:dayTimeDuration
; Now, let the next intermediate value, SYM
cast as
TT
SDT
cast as
TT
"P0M"
, then
If "PT0S"
, then
Otherwise,
In all other cases,
To cast as xs:untypedAtomic
the value is cast as
xs:string
, as described above, and the type annotation changed
to xs:untypedAtomic
.
The string representations of numeric values are backwards compatible
with XPath 1.0 except for the special values positive and negative
infinity, negative zero and values outside the range 1.0e-6
to 1.0e+6
.
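The conversion of an xs:double to xs:string can be sketched in Python. This is an illustration under assumptions: Python's `float` stands for xs:double, the boundaries of the decimal-notation range are taken from the range 1.0e-6 to 1.0e+6 mentioned above, and Python's shortest round-trip `repr` is used to choose among the permitted representations, which is only one of the implementation-dependent choices.

```python
import math
from decimal import Decimal


def double_to_string(x: float) -> str:
    """Model of casting an xs:double to xs:string."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    if x == 0:
        return "-0" if math.copysign(1.0, x) < 0 else "0"
    d = Decimal(repr(x))  # shortest decimal string that round-trips
    if 1e-6 <= abs(x) < 1e6:
        # Decimal notation; whole numbers are written without a fraction part.
        text = format(d, "f")
        if "." in text:
            text = text.rstrip("0").rstrip(".")
        return text
    # Scientific notation: one non-zero digit before the point, at least one
    # digit after it, no other trailing zeros, no "+" or leading zeros in
    # the exponent.
    digits = "".join(str(digit) for digit in d.as_tuple().digits).rstrip("0")
    mantissa = digits[0] + "." + (digits[1:] or "0")
    sign = "-" if x < 0 else ""
    return f"{sign}{mantissa}E{d.adjusted()}"
```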
When a value of any simple type is cast as xs:float
, the xs:float
If xs:float
, then
If xs:double
, then
if xs:double
value
INF
, -INF
, NaN
,
positive zero, or negative zero, then xs:float
value INF
,
-INF
, NaN
, positive zero, or
negative zero respectively.
otherwise, m × 2^e
where the mantissa
m
and exponent e
are signed
xs:integer
s whose value range is defined in
if m
(the mantissa of
xs:float
value (-2^24-1 to +2^24-1)
, then it
is divided by 2^N
where
N
is the lowest positive
xs:integer
that brings the result
of the division within the permitted range, and
the exponent e
is increased by
N
. This is integer division (in
effect, the binary value of the mantissa is
truncated on the right). Let M
be
the mantissa and E
the exponent
after this adjustment.
if E
exceeds 104
(the
maximum exponent value in the value space of
xs:float
) then xs:float
value INF
or -INF
depending on the sign of M
.
if E
is less than -149
(the minimum exponent value in the value space
of xs:float
) then xs:float
value positive or
negative zero depending on the sign of M
otherwise, xs:float
value M × 2^E
.
If xs:decimal
, or
xs:integer
, then xs:float(
cast as xs:string)
and the conversion is complete.
If xs:boolean
, 1.0E0
if true
and to 0.0E0
if false
and the conversion is complete.
If xs:untypedAtomic
or xs:string
, see
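The mantissa and exponent adjustment described for casting xs:double to xs:float can be modelled as follows. This Python sketch is an illustration only: Python's `float` holds both the input double and the resulting single-precision value, and `math.frexp` is used to obtain a 24-bit integer mantissa directly, which has the same effect as repeatedly halving the 53-bit mantissa with truncation.

```python
import math

MANTISSA_BITS = 24   # width of the xs:float mantissa
MAX_EXPONENT = 104   # maximum exponent for an integer mantissa
MIN_EXPONENT = -149  # minimum exponent for an integer mantissa


def double_to_float(x: float) -> float:
    """Model of casting xs:double to xs:float by truncating the mantissa."""
    if math.isnan(x) or math.isinf(x) or x == 0:
        # Special values map to the corresponding xs:float values.
        return x
    fraction, exponent = math.frexp(x)  # x == fraction * 2**exponent, 0.5 <= |fraction| < 1
    # Scale to an integer mantissa of at most 24 bits, truncating toward zero.
    m = int(fraction * 2 ** MANTISSA_BITS)
    e = exponent - MANTISSA_BITS
    if e > MAX_EXPONENT:
        return math.copysign(math.inf, x)
    if e < MIN_EXPONENT:
        return math.copysign(0.0, x)
    return math.ldexp(m, e)
```

Because the mantissa is truncated rather than rounded, the result for `0.1` lies slightly below `0.1`, which differs from conversions that round to nearest.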
XSD 1.1 adds the value +INF
to the lexical space,
as an alternative to INF
. XSD 1.1 also adds negative zero
to the value space.
Implementations xs:float("-0.0E0")
.
But because
When a value of any simple type is cast as xs:double
, the
xs:double
value
If xs:double
, then
If xs:float
or a type derived
from xs:float
, then
if xs:float
value
INF
, -INF
, NaN
,
positive zero, or negative zero, then xs:double
value INF
,
-INF
, NaN
, positive zero, or
negative zero respectively.
otherwise, m × 2^e
where the
mantissa m
and exponent e
are
signed xs:integer
values whose value range
is defined in xs:double
value
m × 2^e
.
If xs:decimal
or
xs:integer
, then xs:double(
cast as xs:string)
and the conversion is complete.
If xs:boolean
, 1.0E0
if true
and to 0.0E0
if false
and the conversion is complete.
If xs:untypedAtomic
or xs:string
, see
XSD 1.1 adds the value +INF
to the lexical space,
as an alternative to INF
. XSD 1.1 also adds negative zero
to the value space.
Implementations xs:double("-0.0E0")
.
But because
When a value of any simple type is cast as xs:decimal
, the
xs:decimal
value
If xs:decimal
,
xs:integer
or a type derived from them, then
xs:decimal
value if need be, and the conversion is complete.
If xs:float
or
xs:double
, then xs:decimal
value, within the set of
xs:decimal
values that the implementation is
capable of representing, that is numerically closest to
xs:decimal
, (see
xs:float
or xs:double
values
NaN
, INF
, or -INF
, a dynamic
error is raised
If xs:boolean
, 1.0
if 1
or true
and to 0.0
if
0
or false
and the
conversion is complete.
If xs:untypedAtomic
or xs:string
, see
When a value of any simple type is cast as xs:integer
, the
xs:integer
value
If xs:integer
, or a type derived
from xs:integer
, then xs:integer
value
if need be, and the conversion is complete.
If ST is xs:decimal, xs:float or xs:double, then the result is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3 and -17.89 returns -17. Casting 3.124E1 returns 31. If SV is one of the xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised.
If xs:boolean
, 1
if 1
or true
and to 0
if 0
or false
and the conversion is complete.
If xs:untypedAtomic
or xs:string
, see
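The examples above (3.1456, -17.89 and 3.124E1) all truncate toward zero. A minimal, non-normative Python sketch of that behaviour follows; the function name cast_to_integer is hypothetical, and Python's float and Decimal stand in for xs:double and xs:decimal.

```python
import math
from decimal import Decimal
from typing import Union


def cast_to_integer(value: Union[bool, int, float, Decimal]) -> int:
    """Cast a boolean or numeric value to an integer, truncating toward zero."""
    # Booleans map to 1 and 0 (checked first, because bool is a subclass of int).
    if isinstance(value, bool):
        return 1 if value else 0
    # NaN, INF and -INF have no integer equivalent, so the cast fails.
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        raise ValueError("NaN, INF and -INF cannot be cast to an integer")
    # int() discards the fractional part for both float and Decimal.
    return int(value)
```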
When a value of type xs:untypedAtomic
, xs:string
,
a type derived from xs:string
,
xs:yearMonthDuration
or xs:dayTimeDuration
is
cast as xs:duration
, xs:yearMonthDuration
or
xs:dayTimeDuration
,
If
If xs:duration
, or a type derived
from xs:duration
, but not
xs:dayTimeDuration
or a type derived from
xs:dayTimeDuration
, and xs:yearMonthDuration
, then
If xs:duration
, or a type derived
from xs:duration
, but not
xs:yearMonthDuration
or a type derived from
xs:yearMonthDuration
, and xs:dayTimeDuration
, then
If xs:yearMonthDuration
or xs:dayTimeDuration
, and xs:duration
, then
If xs:yearMonthDuration
and xs:dayTimeDuration
, the cast is permitted and
returns an xs:dayTimeDuration
with value
0.0
seconds.
If xs:dayTimeDuration
and xs:yearMonthDuration
, the cast is permitted and
returns an xs:yearMonthDuration
with value
0
months.
If xs:untypedAtomic
or xs:string
, see
Note that casting from xs:duration
to
xs:yearMonthDuration
or xs:dayTimeDuration
loses
information. To avoid this, users can cast the xs:duration
value to both an xs:yearMonthDuration
and an
xs:dayTimeDuration
and work with both values.
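The loss of information described in this note can be sketched as follows. This is a non-normative illustration which assumes that a duration is modelled as a number of months plus a number of seconds; the class name Duration and both function names are hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple


class Duration(NamedTuple):
    months: int        # the year and month components, in months
    seconds: Decimal   # the day, hour, minute and second components, in seconds


def to_year_month_duration(value: Duration) -> Duration:
    """Keep only the months part; the seconds part is lost."""
    return Duration(value.months, Decimal(0))


def to_day_time_duration(value: Duration) -> Duration:
    """Keep only the seconds part; the months part is lost."""
    return Duration(0, value.seconds)
```

Casting the same value both ways preserves both parts. Casting the year-month result onward to a day-time duration gives zero seconds, as the rules above require.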
In several situations, casting to date and time types requires the extraction
of a component from fn:current-dateTime
and converting it to an
xs:string
. These conversions must follow certain rules. For
example, converting an xs:integer
year value requires
converting to an xs:string
with four or more characters, preceded
by a minus sign if the value is negative.
This document defines four functions to perform these conversions. These functions are for illustrative purposes only and make no recommendations as to style or efficiency. References to these functions from the following text are not normative.
The arguments to these functions come from functions defined in this document. Thus, the functions below assume that they are correct and do no range checking on them.
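The following Python sketch suggests what the four helper functions might look like. It is illustrative only: the Python names are hypothetical, and representing the timezone as an optional offset in minutes is an assumption made for this example. Like the functions it mirrors, it does no range checking.

```python
from decimal import Decimal
from typing import Optional


def convert_year_to_string(year: int) -> str:
    """Four or more digits, preceded by a minus sign if the year is negative."""
    digits = f"{abs(year):04d}"
    return "-" + digits if year < 0 else digits


def convert_to_2char_string(value: int) -> str:
    """Two digits, zero-padded (months, days, hours, minutes)."""
    return f"{value:02d}"


def convert_seconds_to_string(seconds: Decimal) -> str:
    """Seconds with a two-digit integer part and any fractional digits."""
    text = format(seconds, "f")
    return "0" + text if seconds < 10 else text


def convert_tz_to_string(offset_minutes: Optional[int]) -> str:
    """Empty string when there is no timezone, 'Z' for UTC, else +hh:mm or -hh:mm."""
    if offset_minutes is None:
        return ""
    if offset_minutes == 0:
        return "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"
```

Concatenating the results with the appropriate separators yields a lexical form that can be passed to the constructor function of the target type.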
Conversion from
When a value of any primitive type is cast as
xs:dateTime
, the xs:dateTime
value
If xs:dateTime
, then
If xs:date
, then let
eg:convertYearToString( year-from-date(
))
, let eg:convertTo2CharString( month-from-date(
))
, let eg:convertTo2CharString( day-from-date(
))
and let eg:convertTZtoString( timezone-from-date(
))
; xs:dateTime( concat(
, '-',
, '-',
, 'T00:00:00'
, ) )
.
If xs:untypedAtomic
or
xs:string
, see
When a value of any primitive type is cast as xs:time
,
the xs:time
value
If xs:time
, then
If xs:dateTime
, then
xs:time( concat(
eg:convertTo2CharString( hours-from-dateTime(
)), ':', eg:convertTo2CharString( minutes-from-dateTime(
)), ':', eg:convertSecondsToString( seconds-from-dateTime(
)), eg:convertTZtoString( timezone-from-dateTime(
)) ))
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as xs:date
,
the xs:date
value
If xs:date
, then
If xs:dateTime
, then let
eg:convertYearToString( year-from-dateTime(
))
, let eg:convertTo2CharString( month-from-dateTime(
))
, let eg:convertTo2CharString( day-from-dateTime(
))
and let eg:convertTZtoString(timezone-from-dateTime(
))
; xs:date( concat(
, '-',
, '-',
) )
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as
xs:gYearMonth
, the xs:gYearMonth
value
If xs:gYearMonth
, then
If xs:dateTime
, then let
eg:convertYearToString( year-from-dateTime(
))
, let eg:convertTo2CharString( month-from-dateTime(
))
and let eg:convertTZtoString( timezone-from-dateTime(
))
; xs:gYearMonth( concat(
, '-',
) )
.
If xs:date
, then let
eg:convertYearToString( year-from-date(
))
, let eg:convertTo2CharString( month-from-date(
))
and let eg:convertTZtoString( timezone-from-date(
))
; xs:gYearMonth( concat(
, '-',
) )
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as xs:gYear
,
the xs:gYear
value
If xs:gYear
, then
If xs:dateTime
, let
eg:convertYearToString( year-from-dateTime(
))
and let eg:convertTZtoString( timezone-from-dateTime(
))
; xs:gYear(concat(
))
.
If xs:date
, let
eg:convertYearToString( year-from-date(
))
and let eg:convertTZtoString( timezone-from-date(
))
; xs:gYear(concat(
))
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as
xs:gMonthDay
, the xs:gMonthDay
value
If xs:gMonthDay
, then
If xs:dateTime
, then let
eg:convertTo2CharString( month-from-dateTime(
))
, let eg:convertTo2CharString( day-from-dateTime(
))
and let eg:convertTZtoString( timezone-from-dateTime(
))
; xs:gMonthDay( concat(
'--',
'-',
) )
.
If xs:date
, then let
eg:convertTo2CharString( month-from-date(
))
, let eg:convertTo2CharString( day-from-date(
))
and let eg:convertTZtoString( timezone-from-date(
))
; xs:gMonthDay( concat(
'--',
, '-',
) )
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as xs:gDay
,
the xs:gDay
value
If xs:gDay
, then
If xs:dateTime
, then let
eg:convertTo2CharString( day-from-dateTime(
))
and let eg:convertTZtoString( timezone-from-dateTime(
))
; xs:gDay(
concat( '---'
, ))
.
If xs:date
, then let
eg:convertTo2CharString( day-from-date(
))
and let eg:convertTZtoString( timezone-from-date(
))
; xs:gDay(
concat( '---'
, ))
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as xs:gMonth
,
the xs:gMonth
value
If xs:gMonth
, then
If xs:dateTime
, then let
eg:convertTo2CharString( month-from-dateTime(
))
and let eg:convertTZtoString( timezone-from-dateTime(
))
; xs:gMonth(
concat( '--'
, ))
.
If xs:date
, then let
eg:convertTo2CharString( month-from-date(
))
and let eg:convertTZtoString( timezone-from-date(
))
; xs:gMonth(
concat( '--'
, ))
.
If xs:untypedAtomic
or xs:string
, see
When a value of any primitive type is cast as xs:boolean
, the
xs:boolean
value
If ST is xs:boolean, then the result is SV.
If ST is xs:float, xs:double, xs:decimal or xs:integer and SV is 0, +0, -0, 0.0, 0.0E0 or NaN, then the result is false.
If ST is xs:float, xs:double, xs:decimal or xs:integer and SV is not one of the above values, then the result is true.
If xs:untypedAtomic
or xs:string
, see
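The numeric rules above reduce to a single test: zero of either sign and NaN map to false, and every other numeric value maps to true. A non-normative Python sketch, with the hypothetical function name numeric_to_boolean:

```python
import math
from decimal import Decimal
from typing import Union


def numeric_to_boolean(value: Union[int, float, Decimal]) -> bool:
    """Cast a numeric value to a boolean: zero and NaN are false, all else true."""
    # NaN is false even though it is not equal to zero.
    if isinstance(value, float) and math.isnan(value):
        return False
    # Positive and negative zero both compare equal to 0.
    return value != 0
```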
Values of type xs:base64Binary
can be cast as
xs:hexBinary
and vice versa, since the two types have the same
value space. Casting to xs:base64Binary
and
xs:hexBinary
is also supported from the same type and from
xs:untypedAtomic
, xs:string
and subtypes of
xs:string
using
Casting to xs:anyURI
is supported only from the same type,
xs:untypedAtomic
or xs:string
.
When a value of any primitive type is cast as xs:anyURI
, the
xs:anyURI
value
If xs:untypedAtomic
or xs:string
see
Casting from xs:string
or xs:untypedAtomic
to
xs:QName
or xs:NOTATION
is described in
It is also possible to cast from xs:NOTATION
to xs:QName
,
or from xs:QName
to
any type derived by restriction from xs:NOTATION
. (Casting to xs:NOTATION
itself is not allowed, because xs:NOTATION
is an abstract type.) The resulting
xs:QName
or xs:NOTATION
has the same prefix, local name, and namespace URI
parts as the supplied value.
See
The
value space of ENTITY is the set of all strings that match the
NCName production ... and have been
declared as an unparsed entity in a document type definition.
However,
xs:ENTITY
match declared unparsed entities. Thus, this rule is relaxed in this specification and, in casting to xs:ENTITY
and types derived from it, no check is made that the values correspond to declared unparsed entities.
This section applies when the supplied value SV
is an instance of xs:string
or xs:untypedAtomic
,
including types derived from these by restriction. If the value is
xs:untypedAtomic
, it is treated in exactly the same way as a
string containing the same sequence of characters.
The supplied string is mapped to a typed value of the target type as defined in whiteSpace
facet for the datatype. The resulting whitespace-normalized string
must be a valid lexical form for the datatype. The semantics of casting follow the rules of
XML Schema validation. For example, "13" cast as xs:unsignedInt
returns
the xs:unsignedInt
typed
value 13
. This could also be written xs:unsignedInt("13")
.
The target type can be any simple type other than an abstract type. Specifically, it can be a type whose variety is atomic, union, or list. In each case the effect of casting to the target type is the same as constructing an element with the supplied value as its content, validating the element using the target type as the governing type, and atomizing the element to obtain its typed value.
When the target type is a derived type that is restricted by a pattern facet, the
lexical form is first checked against the pattern before further casting
is attempted (See
For example, consider a user-defined type my:boolean
which is derived by
restriction from xs:boolean
and specifies the pattern facet value="0|1"
.
The expression "true" cast as my:boolean
would fail with a dynamic
error
Facets other than pattern are checked after the conversion. For example, if a user-defined type my:height is defined as a restriction of xs:integer with the facet <maxInclusive value="84"/>, then the expression "100" cast as my:height would fail with a dynamic error.
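The ordering of these checks for the my:boolean and my:height examples can be sketched in Python. This is a non-normative illustration: the function names are hypothetical, Python's re module is assumed to be an adequate stand-in for the XSD regular expression dialect in these simple patterns, and whitespace is collapsed as it would be for these two types.

```python
import re
from typing import Callable, Optional


def parse_boolean(text: str) -> bool:
    """Lexical space of xs:boolean: true, false, 1, 0."""
    mapping = {"true": True, "1": True, "false": False, "0": False}
    if text not in mapping:
        raise ValueError(f"not a valid xs:boolean: {text!r}")
    return mapping[text]


def parse_integer(text: str) -> int:
    """Lexical space of xs:integer: optional sign followed by decimal digits."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        raise ValueError(f"not a valid xs:integer: {text!r}")
    return int(text)


def cast_string(lexical: str, parser: Callable, pattern: Optional[str] = None,
                max_inclusive: Optional[int] = None):
    """Cast a string to a restricted type: pattern first, then value facets."""
    text = " ".join(lexical.split())            # collapse whitespace
    # Step 1: the pattern facet is tested against the lexical form.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        raise ValueError("lexical form does not match the pattern facet")
    # Step 2: convert to the value space of the base type.
    value = parser(text)
    # Step 3: other facets are tested against the converted value.
    if max_inclusive is not None and value > max_inclusive:
        raise ValueError("value exceeds maxInclusive")
    return value
```

With pattern "0|1" and parse_boolean, the string "true" is rejected at step 1 even though it is a valid xs:boolean. With max_inclusive=84 and parse_integer, the string "100" is rejected at step 3.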
Casting to the types xs:NOTATION
, xs:anySimpleType
,
or xs:anyAtomicType
is not permitted because these types are abstract (they have
no immediate instances).
Special rules apply when casting to namespace-sensitive types. The types xs:QName
and xs:NOTATION
are namespace-sensitive. Any type derived by restriction from
a namespace-sensitive type is itself namespace-sensitive, as is any union type having a
namespace-sensitive type among its members, and any list type having a namespace-sensitive type
as its item type. For details, see
This version of the specification allows casting between xs:QName
and xs:NOTATION
in either direction; this was not permitted in the previous Recommendation. This version also removes
the rule that only a string literal (rather than a dynamic string) may be cast to an xs:QName
When casting to a numeric type:
If the value is too large or too small to be accurately represented by the implementation,
it is handled as an overflow or underflow as defined in
If the target type is xs:float
or xs:double
, the string -0
(and equivalents
such as -0.0
or -000
)
In casting to xs:decimal
or to a type derived from xs:decimal
,
if the value is not too large or too small but nevertheless cannot be represented accurately
with the number of decimal digits available to the implementation, the implementation may round
to the nearest representable value or may raise a dynamic error
When casting to xs:duration
, xs:dateTime
, or xs:time
,
if the seconds component has more fractional digits than are supported by the implementation,
excess digits are truncated rather than rounded. For example, xs:dateTime('2023-12-31T23:59:59.999999999')
is guaranteed to deliver an xs:dateTime
value whose year component is 2023 rather than 2024.
Implementations are required to support millisecond precision or greater.
In casting to xs:date
, xs:dateTime
, xs:gYear
,
or xs:gYearMonth
(or types derived from these), if the value is too large or too
small to be represented by the implementation, a dynamic error
In casting to a duration value, if the value is too large or too small to be represented by the
implementation, a dynamic error
For xs:anyURI
, the extent to which an implementation validates the
lexical form of xs:anyURI
is
If the cast fails for any other reason, a dynamic error
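The xs:dateTime example in the list above implies that excess fractional digits in the seconds component are dropped rather than rounded. A non-normative Python sketch of that step follows, operating on the lexical form; the function name and its digits parameter (the precision supported by a hypothetical implementation) are assumptions made for this example.

```python
import re


def truncate_fractional_seconds(lexical: str, digits: int = 3) -> str:
    """Drop fractional-second digits beyond the supported precision."""
    if digits < 1:
        raise ValueError("at least one fractional digit must be retained")
    # Keep the first `digits` digits after the decimal point and discard the
    # rest. Any timezone suffix that follows is left untouched.
    pattern = r"(\.[0-9]{%d})[0-9]+" % digits
    return re.sub(pattern, r"\1", lexical, count=1)
```

Because the digits are discarded, the result for '2023-12-31T23:59:59.999999999' still lies in 2023; rounding would have carried into the next year.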
Casting from xs:string
and xs:untypedAtomic
to any other type
(primitive or non-primitive) has been described in
A
Casting a value to a derived type can be separated into four cases. In these rules:
The types xs:untypedAtomic
, xs:integer
, xs:yearMonthDuration
,
and xs:dayTimeDuration
are treated as primitive types (alongside the 19 primitive types defined in XSD).
For any atomic type T, let P(T) denote the most specific primitive type
such that itemType-subtype(T, P(T))
is true
.
The rules are then:
When ST is the same type as TT: this case always succeeds, returning SV unchanged.
When itemType-subtype(ST, TT)
is true
: This case is described in
When P(ST) is the same type as P(TT): This case is described in
Otherwise (P(ST) is not the same type as P(TT)): This case is described in
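The four cases can be illustrated with a small Python sketch. It is non-normative and covers only a hand-picked fragment of the type hierarchy; the table and function names are hypothetical. Note that xs:integer, xs:yearMonthDuration and xs:dayTimeDuration are listed as primitive, following the convention stated above.

```python
from typing import Iterator, Optional

# Base type of each derived type (a small fragment of the hierarchy).
PARENT = {
    "xs:byte": "xs:short", "xs:short": "xs:int", "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:unsignedShort": "xs:unsignedInt", "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:yearMonthDuration": "xs:duration",
    "xs:dayTimeDuration": "xs:duration",
}

# Types treated as primitive for the purpose of these rules.
PRIMITIVE = {
    "xs:decimal", "xs:integer", "xs:double", "xs:string", "xs:untypedAtomic",
    "xs:duration", "xs:yearMonthDuration", "xs:dayTimeDuration",
}


def ancestors(type_name: str) -> Iterator[str]:
    """Yield the type itself, then each of its base types in turn."""
    current: Optional[str] = type_name
    while current is not None:
        yield current
        current = PARENT.get(current)


def primitive_of(type_name: str) -> str:
    """P(T): the most specific primitive type that T is a subtype of."""
    for candidate in ancestors(type_name):
        if candidate in PRIMITIVE:
            return candidate
    raise ValueError(f"unknown type: {type_name}")


def classify(st: str, tt: str) -> str:
    """Decide which of the four cases applies to a cast from ST to TT."""
    if st == tt:
        return "same type"
    if tt in ancestors(st):
        return "supertype"
    if primitive_of(st) == primitive_of(tt):
        return "same primitive type"
    return "different primitive types"
```

For instance, xs:byte and xs:unsignedShort share the primitive type xs:integer, whereas a cast from xs:decimal to xs:integer falls under the last case because the two are distinct primitive types under this convention.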
It is always possible to cast an atomic value A to a type T
if the relation A instance of T
is true, provided that T
is not an abstract type.
For example, it is
possible to cast an xs:unsignedShort
to an
xs:unsignedInt
, to an xs:integer
, to an
xs:decimal
, or to a union type
whose member types are xs:integer
and xs:double
.
Since the value space of the original type is a subset of the value space of the target type, such a cast is always successful.
For the expression A instance of T
to be true, T must be
either an atomic type, or a union type that has no constraining facets. It cannot
be a list type, nor a union type derived by restriction from another union type, nor
a union type that has a list type among its member types.
The result will have the same value as the original, but will have a new type annotation:
If T is an atomic type, then the type annotation of the result is T
.
If T is a union type, then the type of the result is an atomic type M
such that M is one of the atomic types in the transitive membership of
the union type T and A instance of M
is true; if there is more
than one type M that satisfies these conditions (which could happen, for example,
if T is the union of two overlapping types such as xs:int
and xs:positiveInteger
) then the first one is used, taking the member types
in the order in which they appear within the definition of the union type.
It is possible to cast a value to another type in the same branch of the type hierarchy: for example, an xs:byte can be cast as xs:unsignedShort, provided the value is not negative.
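If the value does not conform to the facets defined for the target type, then a dynamic error is raised. In the case of the pattern facet (which applies to the lexical space rather than the value space), the pattern is tested against the canonical representation of the value, as defined for the source type (or the result of casting the value to an xs:string, in the case of types that have no canonical lexical representation defined for them).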
Note that this will cause casts to fail if the pattern excludes the canonical
lexical representation of the source type. For example, if the type
my:distance
is defined as a restriction of xs:decimal
with a pattern that requires two digits after the decimal point, casting of an
xs:integer
to my:distance
will always fail, because
the canonical representation of an xs:integer
does not conform to
this pattern.
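The my:distance example can be made concrete with a short Python sketch. It is non-normative: the pattern shown is one hypothetical way of requiring two digits after the decimal point, and Python's re module is assumed to behave like the XSD regular expression dialect for this simple pattern.

```python
import re
from decimal import Decimal

# Hypothetical pattern facet for my:distance: digits, a point, two digits.
DISTANCE_PATTERN = r"[0-9]+\.[0-9]{2}"


def cast_integer_to_distance(value: int) -> Decimal:
    """Cast an integer to my:distance, testing the pattern on its canonical form."""
    # The canonical representation of an integer has no decimal point.
    canonical = str(value)
    if re.fullmatch(DISTANCE_PATTERN, canonical) is None:
        raise ValueError(f"{canonical!r} does not match the pattern facet")
    return Decimal(canonical)
```

Every call raises an error, because no canonical integer representation contains a decimal point. Casting the string "5.00" instead would succeed, since string casts test the supplied lexical form.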
In some cases, casting from a parent type to a derived type requires special
rules. See xs:yearMonthDuration
and xs:dayTimeDuration
. See xs:ENTITY
and types derived from it.
When the
Cast the
If xs:string
or xs:untypedAtomic
, check its value against the
pattern facet of
Cast the value to the
If xs:NOTATION
, assume for the
purposes of this rule that casting to xs:NOTATION
succeeds.
Cast the value down to the
If the target type of a cast expression (or a constructor function) is a type with variety union, the supplied value must be one of the following:
A value of type xs:string
or xs:untypedAtomic
.
This case follows the general rules for casting from strings, and has already been
described in
If the union type has a pattern facet, the pattern is tested against the supplied
value after whitespace normalization, using the whiteSpace
normalization rules of the member datatype against which validation succeeds.
A value that is an instance of one of the atomic types in the transitive
membership of the union type, and of the union type itself. This case has already been described in
This situation only applies when the value is an instance of the union type, which means it will never apply when the union is derived by facet-based restriction from another union type.
A value that is castable to one or more of the atomic types in the transitive membership
of the union type (in the sense that the castable as
operator returns true
).
In this case the supplied value is cast to each atomic type in the transitive membership
of the union type in turn (in the order in which the member types appear in the declaration)
until one of these casts is successful; if none of them is successful, a dynamic error occurs
If the union type has a pattern facet, the pattern is tested against the canonical representation of the result value.
Only the atomic types in the transitive membership of the union type are considered. The
union type may have list types in its transitive membership, but (unless the supplied value
is of type xs:string
or xs:untypedAtomic
, in which case the
rules in
If more than one of these conditions applies, then the casting is done according to the rules for the first condition that applies.
If none of these conditions applies, the cast fails with a dynamic error
Example: consider a type U whose member types are xs:integer
and xs:date
.
The expression "123" cast as U
returns the
xs:integer
value 123
.
The expression current-date() cast as U
returns
the current date as an instance of xs:date
.
The expression 23.1 cast as U
returns the xs:integer
value 23
.
Example: consider a type V whose member types are xs:short
and xs:negativeInteger
.
The expression "-123" cast as V
returns the
xs:short
value -123
.
The expression "-100000" cast as V
returns the
xs:negativeInteger
value -100000
.
The expression 93.7 cast as V
returns the
xs:short
value 93
.
The expression "93.7" cast as V
raises
a dynamic error, because "93.7"
is not in the lexical space of the union type.
Example: consider a type W that is derived from the above type V
by restriction, with a pattern facet of -?\d\d
.
The expression "12" cast as W
returns the
xs:short
value 12
.
The expression "123" cast as W raises a dynamic error, because "123" does not match the pattern facet.
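The string cases in the examples for V and W can be sketched in Python. This is a non-normative illustration: the function names are hypothetical, the pattern -?\d\d is written with [0-9] because Python's \d also matches non-ASCII digits, and whitespace is simply collapsed before any test is applied.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

INTEGER_LEXICAL = r"[+-]?[0-9]+"


def parse_short(text: str) -> Tuple[str, int]:
    """Member type xs:short: an integer in the range -32768 to 32767."""
    if re.fullmatch(INTEGER_LEXICAL, text) is None:
        raise ValueError("not in the lexical space of xs:short")
    value = int(text)
    if not -32768 <= value <= 32767:
        raise ValueError("out of range for xs:short")
    return ("xs:short", value)


def parse_negative_integer(text: str) -> Tuple[str, int]:
    """Member type xs:negativeInteger: an integer less than zero."""
    if re.fullmatch(INTEGER_LEXICAL, text) is None:
        raise ValueError("not in the lexical space of xs:negativeInteger")
    value = int(text)
    if value >= 0:
        raise ValueError("not negative")
    return ("xs:negativeInteger", value)


def cast_string_to_union(text: str, members: Sequence[Callable],
                         pattern: Optional[str] = None) -> Tuple[str, int]:
    """Try each member type in declaration order; the first success wins."""
    normalized = " ".join(text.split())
    if pattern is not None and re.fullmatch(pattern, normalized) is None:
        raise ValueError("does not match the pattern facet")
    for parse in members:
        try:
            return parse(normalized)
        except ValueError:
            continue
    raise ValueError("not in the lexical space of the union type")


V = [parse_short, parse_negative_integer]   # member types of V, in order
W_PATTERN = r"-?[0-9][0-9]"                 # pattern facet of W
```

Each result is a pair giving the type annotation and the value, so cast_string_to_union("-100000", V) reports xs:negativeInteger because the xs:short member rejects the value first.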
If the target type of a cast expression (or a constructor function) is a
type with variety list
, the supplied value must be of type xs:string
or
xs:untypedAtomic
. The rules follow the general principle for
all casts from xs:string
outlined in
If the supplied value is not of type xs:string
or
xs:untypedAtomic
, a type error is raised
The semantics of the operation are consistent with validation: that is, the effect of casting a string S to a list type L is the same as constructing an element or attribute node whose string value is S, validating it using L as the governing type, and atomizing the resulting node. The result will always be either failure, or a sequence of zero or more atomic values each of which is an instance of the item type of L (or if the item type of L is a union type, an instance of one of the atomic types in its transitive membership).
If the item type of the list type is namespace-sensitive, then the
namespace bindings in the static context will be used to
resolve any namespace prefix, in the same way as when the target type is
xs:QName
.
If the list type has a pattern
facet, the pattern must match
the supplied value after collapsing whitespace (an operation equivalent to the
use of the fn:normalize-space
function).
For example, the expression "A B C D" cast as xs:NMTOKENS
produces a sequence of four xs:NMTOKEN
values,
("A", "B", "C", "D")
.
For example, given a user-defined type my:coordinates
defined
as a list of xs:integer
with the facet <xs:length value="2"/>
,
the expression my:coordinates("2 -1")
will return a sequence of two
xs:integer values (2, -1)
, while the expression my:coordinates("1 2 3")
will result in a dynamic error because the length of the list does not conform to the
length
facet. The expression my:coordinates("1.0 3.0")
will also fail because the strings 1.0
and 3.0
are not in the lexical space of xs:integer
.
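The my:coordinates example can be sketched in Python as follows. This is a non-normative illustration; the function name is hypothetical, and the list type is assumed to have item type xs:integer and a length facet of 2, as in the example.

```python
import re
from typing import List


def cast_to_coordinates(text: str) -> List[int]:
    """Cast a string to a list of exactly two integers."""
    # Splitting on whitespace corresponds to collapsing it and tokenizing.
    tokens = text.split()
    values = []
    for token in tokens:
        # Each token must be in the lexical space of the item type.
        if re.fullmatch(r"[+-]?[0-9]+", token) is None:
            raise ValueError(f"{token!r} is not a valid xs:integer")
        values.append(int(token))
    # The length facet is checked against the number of items.
    if len(values) != 2:
        raise ValueError("list length does not conform to the length facet")
    return values
```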
The error text provided with these errors is non-normative.
Error code used by fn:error
when no other error code is provided.
Raised when fn:apply
is called and the arity of the supplied function is not
the same as the number of members in the supplied array.
This error is raised whenever an attempt is made to divide by zero.
This error is raised whenever numeric operations result in an overflow or underflow.
This error is raised when an integer used to select a member of an array is outside the range of values for that array.
This error is raised when the $length
argument to array:subarray
is negative.
Raised when casting to xs:decimal
if the supplied value exceeds the
implementation-defined limits for the datatype.
Raised by fn:resolve-QName
and fn:QName
when a supplied value does not have the lexical
form of a QName or URI respectively; and when casting to decimal, if the supplied value is NaN
or Infinity.
Raised when casting to xs:integer
if the supplied value exceeds the
implementation-defined limits for the datatype.
Raised when multiplying or dividing a duration by a number, if the number supplied is NaN
.
Raised when casting a string to xs:decimal
if the string has more digits of precision
than the implementation can represent (the implementation also has the option of rounding).
Raised by fn:codepoints-to-string
if the input contains an integer that is not the codepoint
of a
Raised by any function that uses a collation if the requested collation is not recognized.
Raised by fn:normalize-unicode
if the requested normalization form is not
supported by the implementation.
Raised by functions such as fn:contains
if the requested collation does
not operate on a character-by-character basis.
Raised by fn:char
if the supplied character name is not recognized, or
if it represents a codepoint that is not
a
Raised when parsing CSV input if a syntax error in the input CSV is found.
Raised when parsing CSV input if the field-separator
,
record-separator
, or quote-character
option is set to
an invalid value.
Raised when parsing CSV input if the same delimiter character is assigned to more than one role.
Raised by the function from the get
entry of
csv-columns-record
, if its $key
argument is an
xs:string
and is not one of the known column names.
Raised by fn:id
, fn:idref
, and fn:element-with-id
if the node that identifies the tree to be searched is a node in a tree whose root is not
a document node.
Raised by fn:doc
, fn:collection
, and fn:uri-collection
to indicate that either the supplied URI cannot be dereferenced to obtain a resource, or the resource
that is returned is not parseable as XML.
Raised by fn:doc
, fn:collection
, and fn:uri-collection
to indicate that it is not possible to
return a result that is guaranteed deterministic.
Raised by fn:collection
and fn:uri-collection
if the argument is not a valid xs:anyURI
.
Raised (optionally) by fn:doc
if the argument
is not a valid xs:anyURI
.
Raised by fn:parse-xml
if the supplied string is not a well-formed and namespace-well-formed XML document;
or if DTD validation is requested and the document is not valid against its DTD.
Raised when fn:serialize
is called and the processor does not support serialization,
in cases where the host language makes serialization an optional feature.
Raised by fn:parse-html
if the supplied string is not a well-formed HTML document.
Raised by fn:parse-html
if a key passed to $options
, or its value,
is not supported by the implementation.
This error is raised if the decimal format name supplied to fn:format-number
is not a valid QName,
or if the prefix in the QName is undeclared, or if there is no decimal format in the static context with
a matching name.
This error is raised if a decimal format value supplied to
fn:format-number
is not valid for the associated property,
or if the properties of the decimal format resulting from a supplied map
do not have distinct values.
This error is raised if the picture string supplied to fn:format-number
or
fn:format-integer
has invalid syntax.
Raised when casting to date/time datatypes, or performing arithmetic with date/time values, if arithmetic overflow or underflow occurs.
Raised when casting to duration datatypes, or performing arithmetic with duration values, if arithmetic overflow or underflow occurs.
Raised by fn:adjust-date-to-timezone
and related functions if the supplied timezone is invalid.
This error is raised if the picture string or calendar supplied to fn:format-date
, fn:format-time
,
or fn:format-dateTime
has invalid syntax.
This error is raised if the picture string supplied to fn:format-date
selects a component that is not present in a date, or if the picture string supplied to fn:format-time
selects a component that is not present in a time.
Raised by fn:hash
if the effective value of the supplied
algorithm is not one of the values supported by the implementation.
Raised by functions such as fn:json-doc
, fn:parse-json
or fn:json-to-xml
if the string supplied as input does not conform to the JSON grammar (optionally with implementation-defined extensions).
Raised by functions such as map:merge
, fn:json-doc
,
fn:parse-json
or fn:json-to-xml
if the input contains duplicate keys, when the chosen policy is to reject duplicates.
Raised by fn:json-to-xml
if validation
is requested when the processor does not support schema validation or typed nodes.
Raised by functions such as map:merge
, fn:parse-json
,
and fn:xml-to-json
if the $options
map contains an invalid entry.
Raised by fn:xml-to-json
if the XML input does not
conform to the rules for the XML representation of JSON.
Raised by fn:xml-to-json
if the XML input uses the attribute
escaped="true"
or escaped-key="true"
, and the corresponding string
or key contains an invalid JSON escape sequence.
Raised by fn:resolve-QName
and analogous functions if a supplied QName has a
prefix that has no binding to a namespace.
Raised by fn:resolve-uri
if no base URI is available for resolving a relative URI.
Raised by fn:load-xquery-module
if the supplied module URI is zero-length.
Raised by fn:load-xquery-module
if no module can be found with the supplied module URI.
Raised by fn:load-xquery-module
if a static error
(including a statically detected type error) is encountered when processing the library module.
Raised by fn:load-xquery-module
if a value is supplied for the initial context
item or for an external variable, and the value does not conform to the required
type declared in the dynamically loaded module.
Raised by fn:load-xquery-module
if no XQuery processor is available supporting the requested
XQuery version (or if none is available at all).
A general-purpose error raised when casting, if a cast between two datatypes is allowed in principle,
but the supplied value cannot be converted: for example when attempting to cast the string "nine"
to an integer.
Raised when either argument to fn:resolve-uri
is not a valid URI/IRI.
Raised by fn:zero-or-one
if the supplied value contains more than one item.
Raised by fn:one-or-more
if the supplied value is an empty sequence.
Raised by fn:exactly-one
if the supplied value is not a singleton sequence.
Raised by functions such as fn:max
, fn:min
, fn:avg
, fn:sum
if the supplied sequence contains values inappropriate to this function.
Raised by fn:dateTime
if the two arguments both have timezones and the timezones are different.
A catch-all error for fn:resolve-uri
, recognizing that the implementation can choose between a variety
of algorithms and that some of these may fail for a variety of reasons.
Raised when the input to fn:parse-ietf-date
does not match the prescribed
grammar, or when it represents an invalid date/time such as 31 February.
Raised when the radix supplied to fn:parse-integer
is not in the range 2 to 36.
Raised when the digits in the string supplied to fn:parse-integer
are not in the range appropriate
to the chosen radix.
Raised if the option in an
Raised by regular expression functions such as fn:matches
and fn:replace
if the
regular expression flags contain a character other than i
, m
, q
, s
, or x
.
Raised by regular expression functions such as fn:matches
and fn:replace
if the
regular expression is syntactically invalid.
Raised by functions such as fn:replace
and fn:tokenize
if the supplied regular expression is capable of matching a zero-length string.
Raised by fn:replace
to report errors in the replacement string.
Raised by fn:replace
if both the $replacement
and $action
arguments are supplied.
Raised by fn:data
, or by implicit atomization, if applied to a node with no typed value,
the main example being an element validated against a complex type that defines it to have element-only content.
Raised by fn:data
, or by implicit atomization, if the sequence to be atomized contains
a function item other than an array.
Raised by fn:string
, or by implicit string conversion, if the input sequence contains
a function item.
Raised by fn:unparsed-text
or fn:unparsed-text-lines
if the $href
argument contains a fragment identifier,
or if it cannot be resolved to an absolute URI (for example, because the
base-URI property in the static context is absent), or if it cannot be used to
retrieve the string representation of a resource.
Raised by fn:unparsed-text
or fn:unparsed-text-lines
if the $encoding
argument is not a valid encoding name,
if the processor does not support the specified encoding, if the string
representation of the retrieved resource contains octets that cannot be decoded
into Unicode
Raised by fn:unparsed-text
or fn:unparsed-text-lines
if the $encoding
argument is absent and the processor
cannot infer the encoding using external information and the
encoding is not UTF-8.
A dynamic error is raised if the authority component of a URI contains an open square bracket but no corresponding close square bracket.
A dynamic error is raised if no XSLT processor suitable for evaluating a call on
A dynamic error is raised if the parameters supplied to fn:transform
are invalid, for example
if two mutually exclusive parameters are supplied. If a suitable XSLT error code is available (for example in the
case where the requested initial-template
does not exist in the stylesheet), that error code should
be used in preference.
A dynamic error is raised if an XSLT transformation invoked using fn:transform
fails with a
static or dynamic error. The XSLT error code is used if available; this error code provides a fallback when no XSLT
error code is returned, for example because the processor is an XSLT 1.0 processor.
A dynamic error is raised if the fn:transform
function is invoked when XSLT transformation (or a specific
transformation option) has been disabled for security or other reasons.
A dynamic error is raised if the result of the fn:transform
function contains characters available
only in XML 1.1 and the calling processor cannot handle such characters.
Two functions in this specification, fn:analyze-string
and
fn:json-to-xml
, produce results in the form of an XDM node tree that must conform
to a specified schema, defined in this appendix.
In both cases the elements in the result are in the namespace
http://www.w3.org/2005/xpath-functions
, which is therefore the target namespace
of the relevant schema.
A processor xs:redefine
or xs:override
, adding members to substitution
groups, or defining derived types. Processors are not xsi:schemaLocation
or
xsi:type
attributes in the instance being validated.
The schema for this namespace is organized as three schema documents. The first is a simple
umbrella document that includes the other two. A copy can be found at
fn:analyze-string
This schema describes the output of the function fn:analyze-string
.
The schema is reproduced below, and can also be found in
fn:json-to-xml
This schema describes the output of the function fn:json-to-xml
, and the input to the
function fn:xml-to-json
.
The schema is reproduced below, and can also be found in
fn:csv-to-xml
This schema describes the output of the function fn:csv-to-xml
.
The schema is reproduced below, and can also be found in
This Appendix describes some sources of functions that fall outside the scope of the function library defined in this specification. It includes both function specifications and function implementations. Inclusion of a function in this appendix does not constitute any kind of recommendation or endorsement; neither is omission from this appendix to be construed negatively. This Appendix does not attempt to give any information about licensing arrangements for these function specifications or implementations.
A number of W3C Recommendations make use of XPath, and in some cases such Recommendations define additional functions to be made available when XPath is used in a specific host language.
The various versions of XSLT have all included additional functions intended to be available only when XPath is used within XSLT, and not in other host language environments. Some of these functions were originally defined in XSLT, and subsequently migrated into the core function library defined in this specification.
Generally, the reason that functions have been defined in XSLT rather than in the core library has been that they required additional static or dynamic context information.
XSLT-defined functions share the core namespace http://www.w3.org/2005/xpath-functions
(but in XPath 1.0
and XSLT 1.0, no namespace was defined for these functions).
The conformance rules for XSLT 4.0 require implementations to support either XPath 3.0 or XPath 3.1. Some of the new functions in XPath 3.1, however, must be supported by all XSLT 4.0 implementations whether or not they implement other parts of XPath 3.1.
The following table lists all functions that have been defined in XSLT 1.0, 2.0, or 3.0, and summarizes their status.
Function name | Availability |
---|---|
fn:accumulator-after | XSLT 3.0 only |
fn:accumulator-before | XSLT 3.0 only |
fn:available-system-properties | XSLT 3.0 only |
fn:collation-key | Common to XSLT 3.0 and XPath 3.1 |
fn:copy-of | XSLT 3.0 only |
fn:current | XSLT 1.0, 2.0, and 3.0 |
fn:current-group | XSLT 2.0 and 3.0 |
fn:current-grouping-key | XSLT 2.0 and 3.0 |
fn:current-merge-group | XSLT 3.0 only |
fn:current-merge-key | XSLT 3.0 only |
fn:current-output-uri | XSLT 3.0 only |
fn:document | XSLT 1.0, 2.0, and 3.0 |
fn:element-available | XSLT 1.0, 2.0, and 3.0 |
fn:format-date | XSLT 2.0; migrated to XPath 3.0 and 3.1 |
fn:format-dateTime | XSLT 2.0; migrated to XPath 3.0 and 3.1 |
fn:format-number | XSLT 1.0 and 2.0; migrated to XPath 3.0 and 3.1 |
fn:format-time | XSLT 2.0; migrated to XPath 3.0 and 3.1 |
fn:function-available | XSLT 1.0, 2.0, and 3.0 |
fn:generate-id | XSLT 1.0 and 2.0; migrated to XPath 3.0 and 3.1 |
fn:json-to-xml | Common to XSLT 3.0 and XPath 3.1 |
fn:key | XSLT 1.0, 2.0, and 3.0 |
fn:regex-group | XSLT 2.0 and 3.0 |
fn:snapshot | XSLT 3.0 only |
fn:stream-available | XSLT 3.0 only |
fn:system-property | XSLT 1.0, 2.0, and 3.0 |
fn:type-available | XSLT 2.0 and 3.0 |
fn:unparsed-entity-public-id | XSLT 2.0 and 3.0 |
fn:unparsed-entity-uri | XSLT 1.0, 2.0, and 3.0 |
fn:unparsed-text | XSLT 2.0; migrated to XPath 3.0 and 3.1 |
fn:xml-to-json | Common to XSLT 3.0 and XPath 3.1 |
map:contains | Common to XSLT 3.0 and XPath 3.1 |
map:entry | Common to XSLT 3.0 and XPath 3.1 |
map:find | Common to XSLT 3.0 and XPath 3.1 |
map:for-each | Common to XSLT 3.0 and XPath 3.1 |
map:get | Common to XSLT 3.0 and XPath 3.1 |
map:keys | Common to XSLT 3.0 and XPath 3.1 |
map:merge | Common to XSLT 3.0 and XPath 3.1 |
map:put | Common to XSLT 3.0 and XPath 3.1 |
map:remove | Common to XSLT 3.0 and XPath 3.1 |
map:size | Common to XSLT 3.0 and XPath 3.1 |
XForms 1.1 is based on XPath 1.0. It adds the following functions to the set defined in XPath 1.0, using the same namespace:
boolean-from-string, is-card-number, avg, min, max, count-non-empty, index, power, random, compare, if, property, digest, hmac, local-date, local-dateTime, now, days-from-date, days-to-date, seconds-from-dateTime, seconds-to-dateTime, adjust-dateTime-to-timezone, seconds, months, instance, current, id, context, choose, event.
XForms 2.0 was first published as a W3C Working Draft, and subsequently as a W3C Community Group specification. These draft specifications do not include any additional functions beyond those in the core XPath specification.
The XQuery Update 1.0 specification defines one additional function in the core namespace
http://www.w3.org/2005/xpath-functions
, namely fn:put
. This function can be used
to write a document to external storage. It is thus unusual in that it has side-effects; the XQuery Update 1.0
specification defines semantics for updating expressions including this function.
Although XQuery Update 1.0 is defined as an extension of XQuery 1.0, a number of implementors have adapted it, in a fairly intuitive way, to work with later versions of XQuery. At the time of this publication, later versions of the XQuery Update specification remain at Working Draft status.
A number of community groups, with varying levels of formal organization, have defined specifications for additional function libraries to augment the core functions defined in this specification. Many of the resulting function specifications have implementations available for popular XPath, XQuery, and XSLT processors, though the level of support is highly variable.
The first such group was EXSLT. This activity was primarily concerned with augmenting the capability of XSLT 1.0, and many of its specifications were overtaken by core functions that became available in XPath 2.0. EXSLT defined a number of function modules covering:
node-set function)
max, min, abs, and trigonometric functions)
Specifications from the EXSLT group can be found at
A renewed attempt to define additional function libraries using XPath 2.0 as its baseline formed under the name EXPath. Again, the specifications are in various states of maturity and stability, and implementation across popular processors is patchy. At the time of this publication the function libraries that exist in stable published form include:
The EXPath community has also been engaged in other related projects, such as defining packaging
standards for distribution of XSLT/XQuery components, and tools for unit testing. Its specifications
can be found at
A third activity has operated under the name EXQuery, which as the name suggests has focused
on extensions to XQuery. EXQuery has published a single specification, RestXQ, which is primarily a
system of function annotations allowing XQuery functions to act as endpoints for RESTful services.
It also includes some simple functions to assist with the creation of such services. The RestXQ specification
can be found at
Many useful functions can be written in XSLT or XQuery, and in this case the function implementations themselves can be portable across different XSLT and XQuery processors. This section describes one such library.
FunctX is an open-source library of general-purpose functions, supplied in the form of XQuery 1.0 and XSLT 2.0 implementations. It contains over a hundred functions. Typical examples of these functions are:
The FunctX library can be found at
A number of new functions have been defined:
fn:all-different
fn:all-equal
fn:atomic-equal
fn:build-uri
fn:chain
fn:char
fn:characters
fn:contains-subsequence
fn:csv-to-xml
fn:csv-to-arrays
fn:decode-from-uri
fn:do-until
fn:duplicate-values
fn:ends-with-subsequence
fn:every
fn:expanded-QName
fn:foot
fn:function-annotations
fn:graphemes
fn:hash
fn:highest
fn:identity
fn:in-scope-namespaces
fn:index-where
fn:intersperse
fn:invisible-xml
fn:is-NaN
fn:items-at
fn:lowest
fn:message
fn:op
fn:parse-csv
fn:parse-html
fn:parse-integer
fn:parse-QName
fn:parse-uri
fn:partition
fn:replicate
fn:scan-left
fn:scan-right
fn:slice
fn:some
fn:sort-with
fn:stack-trace
fn:starts-with-subsequence
fn:subsequence-where
fn:transitive-closure
fn:trunk
fn:void
fn:while-do
fn:xdm-to-json
array:build
array:empty
array:foot
array:index-of
array:index-where
array:members
array:of-members
array:replace
array:slice
array:split
array:trunk
array:values
map:build
map:empty
map:entries
map:filter
map:keys-where
map:of-pairs
map:pair
map:pairs
map:replace
map:values
The keywords used for parameter names have been changed. Previously these names were
of no significance, but in 4.0 they can be used with keyword := value
argument syntax in function calls.
The fn:deep-equal
function has an options
argument
giving detailed control over how two values are compared.
The fn:compare
function has been enhanced to accept types
other than strings.
The fn:format-integer
function can produce output in non-decimal
radices, for example binary and hexadecimal.
The fn:json-doc
function accepts additional options.
The fn:remove
function allows several items to be removed from
a sequence in a single call.
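The effect of removing several items in one call can be sketched in Python. This is an illustrative model only, not the normative definition: the helper name remove, the use of a Python list for the sequence, and the acceptance of either one integer or a list of integers are assumptions made for the example. Positions are 1-based, and positions that match no item are assumed to be ignored.

```python
def remove(items, positions):
    """Return items without those at the given 1-based positions (sketch)."""
    # Accept a single position or several, standing in for a sequence of integers.
    if isinstance(positions, int):
        positions = [positions]
    drop = set(positions)
    # Positions outside 1..len(items) match nothing, so they have no effect.
    return [item for index, item in enumerate(items, start=1) if index not in drop]


print(remove(["a", "b", "c", "d"], [1, 3]))
print(remove(["a", "b", "c", "d"], 2))
```

A single call with the positions 1 and 3 gives the same result as two nested single-position calls, without the caller having to account for positions shifting after the first removal.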
The fn:replace
function has an additional optional argument
allowing the replacement string to be computed from the matched input string.
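As an analogy only, the idea of computing the replacement from the matched string resembles passing a callable to Python's re.sub. The helper name replace_with and its parameter name action are hypothetical, and Python regular expressions differ in detail from the regular expression dialect used by this specification.

```python
import re


def replace_with(value, pattern, action):
    """Replace each match of pattern with the string returned by action (sketch)."""
    # The callable receives the matched substring and returns its replacement.
    return re.sub(pattern, lambda match: action(match.group(0)), value)


# Increment every number that appears in the input string.
print(replace_with("chapter 9, chapter 10", r"[0-9]+", lambda s: str(int(s) + 1)))
```

A fixed replacement string cannot express this kind of transformation, because the new text depends on the value that was matched.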
The third argument of fn:format-number
can now be supplied
as an xs:QName
instead of as a string that can be converted to a QName.
Using an xs:QName
, especially in the (rare) cases when the value is
supplied dynamically, avoids the need to maintain the static namespace context
at execution time. In addition an extra argument has been added to fn:format-number
to allow the decimal format to be supplied explicitly.
The function fn:xml-to-json
accepts an additional option:
number-formatter
allows the user to control the formatting of numeric
values, for example by preventing the use of exponential notation for large integers.
In many functions including fn:substring
, fn:subsequence
,
fn:unparsed-text
, fn:unparsed-text-available
, fn:unparsed-text-lines
,
array:subarray
, fn:resolve-uri
, fn:error
, and fn:trace
,
arguments that can be omitted can now also be set to an empty sequence;
the effect of supplying an empty sequence is equivalent to the effect of not supplying the argument.
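The equivalence between omitting an argument and supplying an empty sequence can be modelled in Python by letting None stand for the empty sequence. The sketch below loosely follows fn:substring for integer arguments only; the rounding rules for non-integer arguments are deliberately left out, and the function is an assumption-laden illustration rather than a conforming implementation.

```python
def substring(value, start, length=None):
    """Integer-only sketch of a 1-based substring; None models the empty sequence."""
    first = max(start, 1)
    if length is None:
        # Omitted or empty length: take everything from the start position onward.
        return value[first - 1:]
    end = start + length  # 1-based position just past the last selected character
    return value[first - 1:max(end - 1, first - 1)]


print(substring("motor car", 6))        # length omitted
print(substring("motor car", 6, None))  # length supplied as "empty"
print(substring("metadata", 4, 3))
```

Callers that compute optional arguments dynamically can therefore pass the computed value through unconditionally, rather than choosing between two different call forms.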
The keyword for the argument has changed from arg
to value
.
The argument is now optional, and defaults to the context value (which is atomized if necessary).
This change aligns constructor functions such as xs:string
, xs:boolean
,
and xs:numeric
with fn:string
, fn:boolean
,
and fn:number
.
The semantics of the HTML case-insensitive collation
"http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive"
are now defined normatively in this specification rather than by reference to the
living HTML5 specification (which has changed since 3.1); and the rules now make ordering explicit rather than leaving
it implementation-defined.
An option in an
These changes are not highlighted in the change-marked version of the specification.
The operator mapping table has been simplified so all the value comparison operators
are now defined in terms of two functions (for each data type): op:XX-equal
,
and op:XX-less-than
. The entries for op:XX-greater-than
have therefore been removed.
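The following Python sketch illustrates why a greater-than primitive is redundant once equality and less-than are available: swapping the operands of less-than yields greater-than. The function names and the exact decomposition of le and ge are assumptions for illustration; the operator mapping table remains the authoritative definition.

```python
import math


def numeric_equal(a, b):
    return a == b


def numeric_less_than(a, b):
    return a < b


def value_compare(op, a, b):
    """Evaluate a value comparison using only the two primitives above (sketch)."""
    if op == "eq":
        return numeric_equal(a, b)
    if op == "ne":
        return not numeric_equal(a, b)
    if op == "lt":
        return numeric_less_than(a, b)
    if op == "gt":
        # Greater-than is less-than with the operands swapped.
        return numeric_less_than(b, a)
    if op == "le":
        return numeric_less_than(a, b) or numeric_equal(a, b)
    if op == "ge":
        return numeric_less_than(b, a) or numeric_equal(a, b)
    raise ValueError(f"unknown operator: {op}")


print(value_compare("gt", 3, 2))
print(value_compare("ge", math.nan, math.nan))
```

Defining le and ge as a disjunction, rather than as the negation of the opposite comparison, keeps comparisons involving NaN false in this model.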
The names of arguments appearing in function signatures have been changed. This is to reflect the introduction of keyword arguments in XPath 4.0; the names chosen for arguments are now more consistent across the function library.
Where appropriate, the phrase "the value of $x
" has been replaced
by the simpler $x
. No change in meaning is intended.
For functions that take a variable number of arguments, wherever possible the specification now gives a single function signature indicating default values for arguments that may be omitted, rather than multiple signatures.
The formal specifications of array functions have been rewritten to use two new
primitives: array:members
which converts an array to a sequence of value records,
and array:of-members
which does the inverse. This has enabled many of the
functions to be specified more concisely, and with less duplication between similar functions
for sequences and arrays.
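A rough Python model of the two primitives may help to show why they simplify the specification. Here an array is modelled as a list whose members are tuples (standing in for sequences), and a value record as a dictionary with the single key "value"; these representations, and the helper array_reverse, are assumptions made for the example.

```python
def members(array):
    """Wrap each member of the array in a value record (sketch)."""
    return [{"value": member} for member in array]


def of_members(records):
    """Inverse of members: rebuild an array from value records (sketch)."""
    return [record["value"] for record in records]


def array_reverse(array):
    # An array function expressed through the corresponding sequence operation.
    return of_members(list(reversed(members(array))))


example = [(1, 2), (), (3,)]
print(members(example))
print(array_reverse(example))
```

Because each member is wrapped in a record, a member that is an empty sequence, or one holding several items, survives the trip through a flat sequence, so an array function can be defined by reusing the corresponding sequence function.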
The appendix containing illustrative user-written functions has been dropped; many of these functions are no longer needed.
This section summarizes the extent to which this specification is compatible with previous versions.
Version 4.0 of this function library is fully backwards compatible with version 3.1, except as noted below:
In fn:deep-equal
, and in other functions such as fn:distinct-values
that refer to fn:deep-equal
, the rules for comparing values of different numeric types
(for example, xs:double
and xs:decimal
) have changed.
In previous versions of the specification, xs:decimal
values were converted
to xs:double
, leading to a possible loss of precision. This could make
comparisons non-transitive, leading to problems when grouping,
and potentially (depending on the sort algorithm) with sorting. The problem has been fixed by requiring
comparisons to be performed based on the exact mathematical value without any loss of precision.
This means, for example, that deep-equal(0.2, 0.2e0)
is now false, whereas in previous
versions it was true. The two values are not mathematically equal, because the exact decimal equivalent
of the xs:double
value written as 0.2e0
is
0.200000000000000011102230246251565404236316680908203125
.
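The arithmetic behind this change can be checked with a short Python sketch, in which Decimal stands in for xs:decimal and float for xs:double. The helper names are hypothetical. The first helper models the earlier rule (convert to double, then compare); the second models comparison by exact mathematical value.

```python
from decimal import Decimal
from fractions import Fraction


def via_double_equal(a, b):
    """Earlier rule (sketch): convert both operands to double, then compare."""
    return float(a) == float(b)


def exact_equal(a, b):
    """Newer rule (sketch): compare exact mathematical values, with no rounding."""
    return Fraction(a) == Fraction(b)


a = Decimal("0.1")
b = Decimal("0.10000000000000001")
d = 0.1  # a double

# Non-transitivity under the earlier rule: a matches d and d matches b,
# although a and b are different decimals.
print(via_double_equal(a, d), via_double_equal(d, b), a == b)

# The exact value of the double written as 0.2e0.
print(Decimal(0.2))
print(exact_equal(Decimal("0.2"), 0.2))
```

Under exact comparison, at most one of the two decimals can equal the double, which restores transitivity.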
The corresponding change has not been made to the =
and eq
operators,
because it was found to be too disruptive. For example, if the context node is the element
<e price="10.0" discount="0.2"/>, there is an expectation that the expression
@price - @discount = 9.8
should return true. But (assuming untyped data), the result of
the subtraction is an xs:double
whose precise value is
9.800000000000000710542735760100185871124267578125
, so comparing the two values as
decimals would return false.
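The same example can be reproduced in Python, where float plays the role of xs:double (the type that untyped attribute values are assumed to take in the subtraction) and Decimal plays the role of xs:decimal. The variable names are illustrative.

```python
from decimal import Decimal

# Untyped attribute values are treated as doubles in the subtraction.
price = float("10.0")
discount = float("0.2")
difference = price - discount

# Comparing as doubles: the literal 9.8 rounds to the same double.
print(difference == 9.8)

# Comparing by exact value: the double is not exactly 9.8.
print(Decimal(difference))
print(Decimal(difference) == Decimal("9.8"))
```

The double comparison succeeds because both operands are rounded to the same double, whereas the exact comparison exposes the rounding error introduced by the subtraction.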
In version 4.0, omitting the $value
of fn:error
has the same
effect as setting it to an empty sequence. In 3.1, the effects could be different (the effect of omitting
the argument was implementation-defined).
In version 3.1, the fn:deep-equal
function did not merge adjacent text nodes after stripping
comments and processing instructions, so an element in which the text abc and def was separated by a comment
and an element containing the single text node abcdef were considered non-equal. In version 4.0,
the text nodes are now merged prior to comparison, so these two elements compare equal.
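The difference between the two behaviours can be sketched in Python by modelling an element's children as a list of (kind, text) pairs. This representation and the helper name comparable_children are assumptions for illustration; the sketch covers only the stripping and merging step, not the rest of the fn:deep-equal rules.

```python
def comparable_children(children, merge_text):
    """Strip comments and processing instructions, optionally merging text (sketch)."""
    ignored = ("comment", "processing-instruction")
    stripped = [(kind, text) for kind, text in children if kind not in ignored]
    if not merge_text:
        return stripped
    merged = []
    for kind, text in stripped:
        if kind == "text" and merged and merged[-1][0] == "text":
            # Concatenate with the preceding text node instead of adding a new one.
            merged[-1] = ("text", merged[-1][1] + text)
        else:
            merged.append((kind, text))
    return merged


with_comment = [("text", "abc"), ("comment", "note"), ("text", "def")]
plain = [("text", "abcdef")]

# Without merging (3.1 behaviour in this model): two text nodes versus one.
print(comparable_children(with_comment, False) == comparable_children(plain, False))
# With merging (4.0 behaviour in this model): the children compare equal.
print(comparable_children(with_comment, True) == comparable_children(plain, True))
```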
In version 4.0, the function signature of fn:namespace-uri-for-prefix
constrains the
first argument to be either an xs:NCName
or a zero-length string (the new coercion rules
mean that any string in the form of an xs:NCName
is acceptable). If a string is supplied
that does not meet these requirements, a type error will be raised. In version 3.1, this was not an error:
it came under the rule that when no namespace binding existed for the supplied prefix, the function
would return an empty sequence.
Furthermore, because the expected type of this parameter is no longer xs:string
, the
special coercion rules for xs:string
parameters in XPath 1.0 compatibility mode no longer apply.
For example, supplying xs:duration('PT1H')
as the first argument will now raise a
type error, rather than looking for a namespace binding for the prefix PT1H
.
Version 4.0 makes it clear that the casting of a value other than xs:string
or xs:untypedAtomic
to a list type (whether using a cast expression or a
constructor function) is a type error xs:string?
.
In version 3.1, end-of-line characters were adopted unchanged when calling fn:unparsed-text
.
In version 4.0, they are normalized as known from XML (see
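As a minimal sketch of this kind of normalization, the Python function below applies the end-of-line rule of XML 1.0: a carriage return followed by a line feed, and any carriage return not followed by a line feed, each become a single line feed. Whether the rule applied by fn:unparsed-text also covers the additional characters normalized by XML 1.1 is not addressed here; the function name is hypothetical.

```python
import re


def normalize_line_endings(text):
    """Map CR LF pairs and lone CR characters to a single LF (XML 1.0 style sketch)."""
    return re.sub(r"\r\n?", "\n", text)


print(repr(normalize_line_endings("a\r\nb\rc\nd")))
```

After normalization, the same text file yields the same string regardless of the line-ending convention of the platform on which it was written.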
The way that fn:min
and fn:max
compare numeric values of different types
has changed. The most noticeable effect is that when these functions are applied to a sequence of
xs:integer
or xs:decimal
values, the result is an xs:integer
or
xs:decimal
, rather than the result of converting this to an xs:double
.
The type of the third argument of fn:format-number
has
changed from xs:string
to (xs:string | xs:QName)
.
Because the expected type of this parameter is no longer xs:string
, the
special coercion rules for xs:string
parameters no longer apply.
For example, it is no
longer possible to supply an instance of xs:anyURI
or (when XPath 1.0 compatibility
mode is in force) an instance of xs:boolean
or xs:duration
.
For compatibility issues regarding earlier versions, see the 3.1 version of this specification.