This document will be considered ready for transition to Proposed Recommendation at the same time that the XQuery 3.1 specification is ready for transition to Proposed Recommendation.

This &doc.w3c-doctype-full; specifies XSLT and XQuery Functions and Operators (F&O) version 4.0, a fully compatible extension of F&O version 3.1. This publication differs from version 3.1 primarily by the addition of a number of new functions. There are numerous smaller differences as well, all documented in the change log.

&language; &version; &doc.w3c-designation; W3C &doc.w3c-doctype-full; &date.day; &date.month; &date.year; &doc.publoc; Specification in XML format using HTML5 vocabulary XML function catalog HTML with change markings relative to version 3.0 &doc.latestloc; https://www.w3.org/TR/2017/REC-xpath-functions-31-20170321/ Michael Kay Saxonica http://www.saxonica.com/

This document defines constructor functions, operators, and functions on the datatypes defined in and the datatypes defined in . It also defines functions and operators on nodes and node sequences as defined in the . These functions and operators are defined for use in and and and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.

A summary of changes since version 3.1 is provided at .

This version of the specification is work in progress. It is produced by the QT4 Working Group, officially the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at different stages of review, reflected in their History notes. Comments are invited, in the form of GitHub issues at https://github.com/qt4cg/qtspecs.

English

Introduction

The purpose of this document is to define functions and operators for inclusion in XPath 4.0, XQuery 4.0, and XSLT 4.0. The exact syntax used to call these functions and operators is specified in , and .

This document defines three classes of functions:

General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions, whose arguments and results are values defined by the .

Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.

Functions that specify the semantics of operators defined in and . These exist for specification purposes only, and are not intended for direct calling from user-written code.

defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the other types (for example, nodes and sequences of nodes) defined in of the . These functions and operations are available for use in , and any other host language that chooses to reference them. In particular, they may be referenced in future versions of XSLT and related XML standards.

adds to the datatypes defined in . It introduces a new derived type xs:dateTimeStamp, and it incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many aspects of the definitions of the existing datatypes: for example, it extends the value space of xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF; it modifies the value space of xs:Name to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime values; and it allows any character string to appear as the value of an xs:anyURI item. Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both.

References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: XQ , XT , XP , and DM .

Operators

Despite its title, this document does not attempt to define the semantics of all the operators available in the language; indeed, in the interests of avoiding duplication, the majority of operators (including all higher-order operators such as x/y, x!y, and x[y], as well as simple operators such as x,y, x and y, x or y, x<<y, x>>y, x is y, x||y, x|y, x union y, x except y, x intersect y, x to y and x otherwise y) are now defined entirely within .

The remaining operators that are described in this publication are those where the semantics of the operator depend on the types of the arguments. For these operators, the language specification describes rules for selecting an internal function defined in this specification to underpin the operator. For example, when the operator x+y is applied to two operands of type xs:double, the function op:numeric-add is selected.
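The selection rule can be pictured as a lookup keyed on the operator and the operand types. The following Python sketch is only an illustration: the table, the select_function helper, and the use of plain strings for type names are assumptions made for this example, and it lists only function names that appear in this document. The actual selection rules are those of the language specification.

```python
# Illustrative lookup: (operator, left type, right type) -> internal function.
# Only function names mentioned in this document are listed here.
OPERATOR_TABLE = {
    ("+", "xs:double", "xs:double"): "op:numeric-add",
    ("eq", "xs:date", "xs:date"): "op:date-equal",
    ("lt", "xs:date", "xs:date"): "op:date-less-than",
}


def select_function(operator, left_type, right_type):
    """Return the name of the internal function that underpins the operator."""
    key = (operator, left_type, right_type)
    if key not in OPERATOR_TABLE:
        # Modelled as a Python TypeError; a real processor raises an XPath error.
        raise TypeError(f"no function for {left_type} {operator} {right_type}")
    return OPERATOR_TABLE[key]


print(select_function("+", "xs:double", "xs:double"))
```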

XPath defines a range of comparison operators x=y, x!=y, x<y, x>y, x<=y, x>=y, x eq y, x ne y, x lt y, x gt y, x le y, x ge y, which apply to a variety of operand types including for example numeric values, strings, dates and times, and durations. For each relevant data type, two functions are defined in this specification, for example op:date-equal and op:date-less-than. These define the semantics of the eq and lt operators applied to operands of that data type. The operators x ne y, x gt y, x le y, and x ge y are defined by reference to these two; and the general comparison operators =, !=, <, >, <=, and >= are defined by reference to eq, ne, lt, gt, le, and ge respectively.

Previous versions of this specification also defined a third comparison function of the form op:date-greater-than. This has been dropped, as it is always the inverse of the -less-than form.
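As an informal illustration of how four operators can be derived from the two defined functions, the Python sketch below uses datetime.date as a stand-in for xs:date. The function names and the particular derivations shown are assumptions made for this example (timezones are ignored); the normative definitions are those of the language specification.

```python
from datetime import date


# Stand-ins for the two functions defined per data type.
def date_equal(a, b):
    return a == b


def date_less_than(a, b):
    return a < b


# The remaining value comparisons, expressed in terms of the two above.
def ne(a, b):
    return not date_equal(a, b)


def gt(a, b):
    return date_less_than(b, a)  # "greater than" is "less than" with operands swapped


def le(a, b):
    return date_less_than(a, b) or date_equal(a, b)


def ge(a, b):
    return date_less_than(b, a) or date_equal(a, b)


earlier, later = date(2024, 1, 1), date(2024, 6, 1)
print(gt(later, earlier), le(earlier, earlier), ge(earlier, later))
```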

Conformance

This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:

For all combinations of valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.

For all invalid inputs to the function, the implementation raises (in some way appropriate to the calling environment) a dynamic error.

For a sequence of calls within the same execution scope, the requirements of this recommendation regarding the determinism of results are satisfied (see ).

Other recommendations (“host languages”) that reference this document may dictate:

Subsets or supersets of this set of functions to be available in particular environments;

Mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling errors;

A concrete realization of concepts such as execution scope;

Which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used.

Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.

Adding such constraints in a host language, however, is discouraged because it makes it difficult to reuse implementations of the function library across host languages.

This specification allows flexibility in the choice of versions of specifications on which it depends:

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.

It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1.

It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1.

The XML Schema 1.1 recommendation introduces one new concrete datatype: xs:dateTimeStamp; it also incorporates the types xs:dayTimeDuration, xs:yearMonthDuration, and xs:anyAtomicType which were previously defined in earlier versions of . Furthermore, XSD 1.1 includes the option of supporting revised definitions of types such as xs:NCName based on the rules in XML 1.1 rather than 1.0.

The allows flexibility in the repertoire of characters permitted during processing that goes beyond even what version of XML is supported. A processor may allow the user to construct nodes and atomic values that contain characters not allowed by any version of XML. A permitted character is one within the repertoire accepted by the implementation.

In this document, text labeled as an example or as a note is provided for explanatory purposes and is not normative.

Namespaces and prefixes

The functions and operators defined in this document are contained in one of several namespaces (see ) and referenced using an xs:QName.

This document uses conventional prefixes to refer to these namespaces. User-written applications can choose a different prefix to refer to the namespace, so long as it is bound to the correct URI. The host language may also define a default namespace for function calls, in which case function names in that namespace need not be prefixed at all. In many cases the default namespace will be http://www.w3.org/2005/xpath-functions, allowing a call on the fn:name function (for example) to be written as name() rather than fn:name(); in this document, however, all example function calls are explicitly prefixed.

The URIs of the namespaces and the conventional prefixes associated with them are:

http://www.w3.org/2001/XMLSchema for constructors — associated with xs.

The section defines constructor functions for the built-in datatypes defined in and in of . These datatypes and the corresponding constructor functions are in the XML Schema namespace, http://www.w3.org/2001/XMLSchema, and are named in this document using the xs prefix.

http://www.w3.org/2005/xpath-functions for functions — associated with fn.

The namespace prefix used in this document for most functions that are available to users is fn.

http://www.w3.org/2005/xpath-functions/math for functions — associated with math.

This namespace is used for some mathematical functions. The namespace prefix used in this document for these functions is math. These functions are available to users in exactly the same way as those in the fn namespace.

http://www.w3.org/2005/xpath-functions/map for functions — associated with map.

This namespace is used for some functions that manipulate maps (see ). The namespace prefix used in this document for these functions is map. These functions are available to users in exactly the same way as those in the fn namespace.

http://www.w3.org/2005/xpath-functions/array for functions — associated with array.

This namespace is used for some functions that manipulate arrays (see ). The namespace prefix used in this document for these functions is array. These functions are available to users in exactly the same way as those in the fn namespace.

http://www.w3.org/2005/xqt-errors — associated with err.

There are no functions in this namespace; it is used for error codes.

This document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors, which is the namespace for all XPath and XQuery error codes and messages. This namespace prefix is not predeclared and its use in this document is not normative.

http://www.w3.org/2010/xslt-xquery-serialization — associated with output.

There are no functions in this namespace: it is used for serialization parameters, as described in

Functions defined with the op prefix are described here to underpin the definitions of the operators in , and . These functions are not available directly to users, and there is no requirement that implementations should actually provide these functions. For this reason, no namespace is associated with the op prefix. For example, multiplication is generally associated with the * operator, but it is described as a function in this document:

Sometimes there is a need to use an operator as a function. To meet this requirement, the function fn:op takes any simple binary operator as its argument, and returns a corresponding function. So for example fn:for-each-pair($seq1, $seq2, op("+")) performs a pairwise addition of the values in two input sequences.
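The idea of turning an operator into a function value has a close analogue in many languages. The Python sketch below models it with the standard operator module; the op and for_each_pair helpers and the three-entry operator table are hypothetical stand-ins written for this example, not definitions of fn:op or fn:for-each-pair.

```python
import operator

# Hypothetical mapping from operator symbols to Python functions.
_OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def op(symbol):
    """Return a two-argument function corresponding to the operator symbol."""
    return _OPERATORS[symbol]


def for_each_pair(seq1, seq2, action):
    """Apply action to corresponding items of the two sequences."""
    return [action(a, b) for a, b in zip(seq1, seq2)]


result = for_each_pair([1, 2, 3], [10, 20, 30], op("+"))
print(result)
```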

The above namespace URIs are not expected to change from one version of this document to another. The contents of these namespaces may be extended to allow additional functions (and errors, and serialization parameters) to be defined.

Function overloading

A function is uniquely defined by its name and arity (number of arguments); it is therefore not possible to have two different functions that have the same name and arity, but different types in their signature. That is, function overloading in this sense of the term is not permitted. Consequently, functions such as fn:string which accept arguments of many different types have a signature that defines a very general argument type, in this case item()? which accepts any single item; supplying an inappropriate item (such as a function item) causes a dynamic error.

Some functions on numeric types include the type xs:numeric in their signature as an argument or result type. In this version of the specification, xs:numeric has been redefined as a built-in union type representing the union of xs:decimal, xs:float, xs:double (and thus automatically accepting types derived from these, including xs:integer).

Operators such as + may be overloaded: they map to different underlying functions depending on the dynamic types of the supplied operands.

It is possible for two functions to have the same name provided they have different arity (number of arguments). For the functions defined in this specification, where two functions have the same name and different arity, they also have closely related behavior, so they are defined in the same section of this document.

Function signatures and descriptions

Each function (or group of functions having the same name) is defined in this specification using a standard proforma.

The function name is a QName as defined in and must adhere to its syntactic conventions. Following the precedent set by , function names are generally composed of English words separated by hyphens (-). Abbreviations are used only where there is a strong precedent in other programming languages (as with math:sin and math:cos for sine and cosine). If a function name contains a datatype name, the datatype name keeps its own spelling within the function name, which may be intercapitalized. An example is fn:timezone-from-dateTime.

The first section in the proforma is a short summary of what the function does. This is intended to be informative rather than normative.

Each function is then defined by specifying its signature(s), which define the types of the parameters and of the result value.

Where functions take a variable number of arguments, two conventions are used:

Wherever possible, a single function signature is used giving default values for those parameters that can be omitted.

If this is not possible, because the effect of omitting a parameter cannot be specified by giving a default value, multiple signatures are given for the function.

Each function signature is presented in a form like this:

In this notation, function-name, in bold-face, is the local name of the function whose signature is being specified. The prefix fn indicates that the function is in the namespace http://www.w3.org/2005/xpath-functions: this is one of the conventional prefixes listed in . If the function takes no parameters, then the name is followed by an empty parameter list: (); otherwise, the name is followed by a parenthesized list of parameter declarations. Each parameter declaration includes:

The name of the parameter (which in 4.0 is significant because it can be used as a keyword in a function call)

The static type of the parameter (in italics)

If this is the last parameter of a variadic function, an ellipsis (...)

If the parameter is optional, then an expression giving the default value (preceded by the symbol :=).

The default value expression is evaluated using the static and dynamic context of the function caller (or of a named function reference). For example, if the default value is given as ., then it evaluates to the context value from the dynamic context of the function caller; if it is given as default-collation, then its value is the default collation from the static context of the function caller; if it is given as deep-equal#2, then the third argument supplied to deep-equal is the default collation from the static context of the caller.

If there are two or more parameter declarations, they are separated by a comma.

The return-type, also in italics, specifies the static type of the value returned by the function. The dynamic type of the value returned by the function is the same as its static type or derived from the static type. All parameter types and return types are specified using the SequenceType notation defined in .

One function, fn:concat, has a variable number of arguments (zero or more). More strictly, there is an infinite set of functions having the name fn:concat, with arity ranging from 0 to infinity. For this special case, a single function signature is given, with an ellipsis indicating an indefinite number of arguments.

The next section in the proforma defines the semantics of the function as a set of rules. The order in which the rules appear is significant; they are to be applied in the order in which they are written. Error conditions, however, are generally listed in a separate section that follows the main rules, and take precedence over non-error rules except where otherwise stated. The principles outlined in apply by default: to paraphrase, if the result of the function can be determined without evaluating all its arguments, then it is not necessary to evaluate the remaining arguments merely in order to determine whether any error conditions apply.

Where the proforma includes sections headed Notes or Examples, these are non-normative.

Rules for passing parameters to operators are described in the relevant sections of and . For example, the rules for passing parameters to arithmetic operators are described in . Specifically, rules for parameters of type xs:untypedAtomic and the empty sequence are specified in this section.

As is customary, the parameter type name indicates that the function or operator accepts arguments of that type, or types derived from it, in that position. This is called subtype substitution (See ). In addition, numeric type instances and instances of type xs:anyURI can be promoted to produce an argument of the required type. (See ).

Subtype Substitution: A derived type may substitute for its base type. In particular, xs:integer may be used where xs:decimal is expected.

Numeric Type Promotion: xs:decimal may be promoted to xs:float or xs:double. Promotion to xs:double should be done directly, not via xs:float, to avoid loss of precision.

anyURI Type Promotion: A value of type xs:anyURI can be promoted to the type xs:string.
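The reason for promoting directly to xs:double can be seen with a small numeric experiment. The Python sketch below is an illustration under stated assumptions: Python's float stands in for xs:double, a value packed into 32 bits with the struct module stands in for xs:float, and decimal.Decimal stands in for xs:decimal. The to_float32 helper is written for this example only.

```python
import struct
from decimal import Decimal


def to_float32(value):
    """Round a Python float to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


d = Decimal("0.1")

direct = float(d)                 # decimal promoted straight to double
via_float = to_float32(float(d))  # decimal narrowed to 32 bits, then widened again

# The detour through the 32-bit type discards bits that widening cannot restore.
print(direct == via_float)
```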

Some functions accept a single value or the empty sequence as an argument and some may return a single value or the empty sequence. This is indicated in the function signature by following the parameter or return type name with a question mark: ?, indicating that either a single value or the empty sequence must appear. See below.

Note that this function signature is different from a signature in which the parameter is omitted. See, for example, the two signatures for fn:string. In the first signature, the parameter is omitted and the argument defaults to the context value, referred to as .. In the second signature, the argument must be present but may be the empty sequence, written as ().
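The difference between omitting an argument and supplying the empty sequence can be modelled with a sentinel value. In the Python sketch below, None stands in for the empty sequence and the context value is passed explicitly as a keyword; both choices, and the helper itself, are assumptions made for this illustration and not part of the definition of fn:string.

```python
_OMITTED = object()  # sentinel: distinguishes "no argument" from "empty sequence"


def string(value=_OMITTED, *, context_value):
    """Model of a function whose argument defaults to the context value."""
    if value is _OMITTED:
        value = context_value  # argument omitted: fall back to the context value
    if value is None:          # None models the empty sequence ()
        return ""
    return str(value)


print(string(context_value="from context"))        # argument omitted
print(repr(string(None, context_value="ignored")))  # empty sequence supplied
```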

Some functions accept a sequence of zero or more values as an argument. This is indicated by following the name of the type of the items in the sequence with *. The sequence may contain zero or more items of the named type. For example, the function below accepts a sequence of xs:double and returns an xs:double or the empty sequence.

In XPath 4.0, the arguments in a function call can be supplied by keyword as an alternative to supplying them positionally. For example the call resolve-uri(@href, static-base-uri()) can now be written resolve-uri(base: static-base-uri(), relative: @href). The order in which arguments are supplied can therefore differ from the order in which they are declared. The specification, however, continues to use phrases such as “the second argument” as a convenient shorthand for "the value of the argument that is bound to the second parameter declaration".
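Keyword arguments behave much as they do in other languages that support them. The Python sketch below mirrors the resolve-uri example; the resolve_uri wrapper around urllib.parse.urljoin and the example.com URIs are assumptions for illustration, and urljoin is not claimed to match fn:resolve-uri in every detail.

```python
from urllib.parse import urljoin


def resolve_uri(relative, base):
    """Hypothetical stand-in: parameters declared in the order (relative, base)."""
    return urljoin(base, relative)


base_uri = "http://example.com/book/index.html"

# Positional call: arguments follow the declared parameter order.
positional = resolve_uri("chapter2.html", base_uri)

# Keyword call: the order of the arguments no longer matters.
by_keyword = resolve_uri(base=base_uri, relative="chapter2.html")

print(positional == by_keyword)
```

In the keyword call, "the second argument" still refers to the value bound to base, even though it is written first.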

Options

As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.

For example, the function fn:xml-to-json has an options parameter allowing specification of whether the output is to be indented. A call might be written:

Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.

Where a function adopts the option parameter conventions, the following rules apply:

The value of the relevant argument must be a map. The entries in the map are referred to as options: the key of the entry is called the option name, and the associated value is the option value. Option names defined in this specification are always strings (single xs:string values). Option values may be of any type.

The type of the options parameter in the function signature is always given as map(*).

Although option names are described above as strings, the actual key may be any value that compares equal to the required string (using the eq operator with Unicode codepoint collation; or equivalently, the fn:atomic-equal relation). For example, instances of xs:untypedAtomic or xs:anyURI are equally acceptable.

This means that the implementation of the function can check for the presence and value of particular options using the functions map:contains and/or map:get.

Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace.

If an option is present whose key is not described in the specification, then a type error must be raised unless either (a) the key is recognized by the implementation, or (b) the key is a value of type xs:QName with a non-absent namespace.

All entries in the options map are optional, and supplying an empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.

For each named option, the function specification defines a required type for the option value. The value that is actually supplied in the map is converted to this required type using the coercion rules. This will result in an error (typically or ) if conversion of the supplied value to the required type is not possible. A type error also occurs if this conversion delivers a coerced function whose invocation fails with a type error. A dynamic error occurs if the supplied value after conversion is not one of the permitted values for the option in question: the error codes for this error are defined in the specification of each function.

It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.

In cases where an option is list-valued, by convention the function should accept either a sequence or an array: but this rule applies only if the specification of the option explicitly accepts either. Accepting a sequence is convenient if the value is generated programmatically using an XPath expression; while accepting an array allows the options to be held in an external file in JSON format, to be read using a call on the fn:json-doc function.

In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
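Some of the conventions above can be sketched in Python. This is a simplified model under stated assumptions: a dict stands in for the options map, the QName class and the REQUIRED_TYPES and DEFAULTS tables are hypothetical, the conversion step is reduced to a strict type check rather than the full coercion rules, and a Python TypeError stands in for the XPath type error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QName:
    namespace: str  # "" models an absent namespace
    local: str


# Hypothetical option table for an imaginary function.
REQUIRED_TYPES = {"indent": bool}
DEFAULTS = {"indent": False}


def read_options(options=None):
    """Validate an options map and return the effective option values."""
    options = {} if options is None else options  # an empty map equals omission
    if not isinstance(options, dict):
        raise TypeError("options must be a map")
    effective = dict(DEFAULTS)
    for key, value in options.items():
        if key in REQUIRED_TYPES:
            # Stand-in for the coercion rules: accept only the required type.
            if not isinstance(value, REQUIRED_TYPES[key]):
                raise TypeError(f"option {key!r} has the wrong type")
            effective[key] = value
        elif isinstance(key, QName) and key.namespace:
            continue  # namespaced option: left to the implementation
        else:
            raise TypeError(f"unrecognized option {key!r}")
    return effective


print(read_options({"indent": True}))
```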

Type System

The diagrams in this section show how nodes, functions, primitive simple types, and user defined types fit together into a type system. This type system comprises two distinct subsystems that both include the primitive atomic types. In the diagrams, connecting lines represent relationships between derived types and the types from which they are derived; the former are always below and to the right of the latter.

The types xs:IDREFS, xs:NMTOKENS, xs:ENTITIES, and xs:numeric, together with user-defined list types and user-defined union types, are special in that they are lists or unions rather than types derived by extension or restriction.

Item Types

The first diagram illustrates the relationship of various item types.

Item types are used to characterize the various types of item that can appear in a sequence (nodes, atomic values, and functions), and they are therefore used in declaring the types of variables or the argument types and result types of functions.

Item types in the data model form a directed graph, rather than a hierarchy or lattice: in the relationship defined by the derived-from(A, B) function, some types are derived from more than one other type. Examples include functions (function(xs:string) as xs:int is substitutable for function(xs:NCName) as xs:int and also for function(xs:string) as xs:decimal), and union types (A is substitutable for the union type (A | B) and also for (A | C)). In XDM, item types include node types, function types, and built-in atomic types. The diagram, which shows only hierarchic relationships, is therefore a simplification of the full model.

&common-item-types.xml;
Schema Type Hierarchy

The next diagram illustrates the schema type subsystem, in which all types are derived from xs:anyType.

Schema types include built-in types defined in the XML Schema specification, and user-defined types defined using mechanisms described in the XML Schema specification. Schema types define the permitted contents of nodes. The main categories are complex types, which define the permitted content of elements, and simple types, which can be used to constrain the values of both elements and attributes.

&common-anyType.xml;
Atomic Type Hierarchy

The final diagram shows all of the atomic types, including the primitive simple types and the built-in types derived from the primitive simple types. This includes all the built-in datatypes defined in .

Atomic types are both item types and schema types, so the root type xs:anyAtomicType may be found in both the previous diagrams.

&common-anyAtomicType.xml;
Terminology

The terminology used to describe the functions and operators on types defined in is defined in the body of this specification. The terms defined in this section are used in building those definitions.

Following in the tradition of , the terms type and datatype are used interchangeably.

Strings, characters, and codepoints

This document uses the terms string, character, and codepoint with meanings that are normatively defined in , and which are paraphrased here for ease of reference:

A character is an instance of the Char production of .

This definition excludes Unicode characters in the surrogate blocks as well as U+FFFE and U+FFFF, while including characters with codepoints greater than U+FFFF which some programming languages treat as two characters. The valid characters are defined by their codepoints, and include some whose codepoints have not been assigned by the Unicode consortium to any character.

A string is a sequence of zero or more characters, or equivalently, a value in the value space of the xs:string datatype.

A codepoint is an integer assigned to a character by the Unicode consortium, or reserved for future assignment to a character.

The set of codepoints is thus wider than the set of characters.

This specification spells “codepoint” as one word; the Unicode specification spells it as “code point”. Equivalent terms found in other specifications are “character number” or “code position”. See

Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader’s attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.

This specification adopts the Unicode notation U+xxxx to refer to a codepoint by its hexadecimal value (always four to six hexadecimal digits). This is followed where appropriate by the official Unicode character name and its graphical representation: for example U+20AC (EURO SIGN, €).

Unless explicitly stated, the functions in this document do not ensure that any returned xs:string values are normalized in the sense of .

In functions that involve character counting such as fn:substring, fn:string-length and fn:translate, what is counted is the number of XML characters in the string (or equivalently, the number of Unicode codepoints). Some implementations may represent a codepoint above U+FFFF using two 16-bit values known as a surrogate pair. A surrogate pair counts as one character, not two.
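The difference between counting characters and counting 16-bit code units can be shown with a short Python sketch. Python strings are sequences of codepoints, so len gives the character count described above, while the UTF-16 encoding exposes the surrogate pair. The sample string is an arbitrary choice for this illustration.

```python
# "a", then U+1F600 (a codepoint above U+FFFF), then "b".
s = "a\U0001F600b"

# What a character-counting function counts: Unicode codepoints.
character_count = len(s)

# What an implementation using 16-bit units stores: the middle
# character occupies two units (a surrogate pair).
utf16_unit_count = len(s.encode("utf-16-le")) // 2

print(character_count, utf16_unit_count)
```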

Namespaces and URIs

This document uses the phrase “namespace URI” to identify the concept identified in as “namespace name”, and the phrase “local name” to identify the concept identified in as “local part”.

It also uses the term expanded-QName defined below.

An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see ): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. Two expanded QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded QName needs to be converted back to a string.
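The equality rule can be modelled as a small value class in which the prefix is carried along but excluded from comparison. The Python sketch below is an illustration only; the ExpandedQName class and its field names are assumptions, with None standing in for an absent prefix or namespace URI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ExpandedQName:
    local_name: str
    namespace_uri: Optional[str] = None  # None models an absent namespace URI
    # The prefix takes no part in equality or hashing.
    prefix: Optional[str] = field(default=None, compare=False)

    def __str__(self):
        # The prefix matters only when converting back to a string.
        return f"{self.prefix}:{self.local_name}" if self.prefix else self.local_name


FN = "http://www.w3.org/2005/xpath-functions"
a = ExpandedQName("name", FN, prefix="fn")
b = ExpandedQName("name", FN, prefix="f")

print(a == b, str(a), str(b))
```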

The term URI is used as follows:

Within this specification, the term URI refers to Uniform Resource Identifiers as defined in and extended in with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in .

This means, in practice, that where this specification requires a “URI Reference”, an IRI as defined in will be accepted, provided that other relevant specifications also permit an IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications. Note also that the definition of xs:anyURI is a wider definition than the definition in ; for example it does not require non-ASCII characters to be escaped.

Conformance terminology

In this specification:

The auxiliary verb must, when rendered in small capitals, indicates a precondition for conformance.

When the sentence relates to an implementation of a function (for example "All implementations must recognize URIs of the form ...") then an implementation is not conformant unless it behaves as stated.

When the sentence relates to the result of a function (for example "The result must have the same type as $arg") then the implementation is not conformant unless it delivers a result as stated.

When the sentence relates to the arguments to a function (for example "The value of $arg must be a valid regular expression") then the implementation is not conformant unless it enforces the condition by raising a dynamic error whenever the condition is not satisfied.

The auxiliary verb may, when rendered in small capitals, indicates optional or discretionary behavior. The statement “An implementation may do X” implies that it is implementation-dependent whether or not it does X.

The auxiliary verb should, when rendered in small capitals, indicates desirable or recommended behavior. The statement “An implementation should do X” implies that it is desirable to do X, but implementations may choose to do otherwise if this is judged appropriate.

Where behavior is described as implementation-defined, variations between processors are permitted, but a conformant implementation must document the choices it has made.

Where behavior is described as implementation-dependent, variations between processors are permitted, and conformant implementations are not required to document the choices they have made.

Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.

Properties of functions

This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.

In this section the term function, unless otherwise specified, applies equally to function definitions (which can be the target of a static function call) and function items (which can be the target of a dynamic function call).

An execution scope is a sequence of calls to the function library during which certain aspects of the state are required to remain invariant. For example, two calls to fn:current-dateTime within the same execution scope will return the same result. The execution scope is defined by the host language that invokes the function library. In XSLT, for example, any two function calls executed during the same transformation are in the same execution scope (except that static expressions, such as those used in use-when attributes, are in a separate execution scope).
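
As an informal illustration, a host language could model an execution scope as an object that fixes the invariant state once. The class `ExecutionScope` and its method name are hypothetical; this is a sketch of the idea, not a description of any particular processor.

```python
from datetime import datetime, timezone


class ExecutionScope:
    """Hypothetical host-language object holding state that stays invariant."""

    def __init__(self):
        # Captured once; every call made within this scope sees the same instant
        self._now = datetime.now(timezone.utc)

    def current_date_time(self) -> datetime:
        # Plays the role of fn:current-dateTime within this scope
        return self._now


scope = ExecutionScope()
first = scope.current_date_time()
second = scope.current_date_time()
```

Both calls return the same value because they share one scope; a second `ExecutionScope` instance would be free to report a different instant.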

The following definition explains more precisely what it means for two function calls to return the same result:

Two values $V1 and $V2 are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:

Both items are atomic values, of precisely the same type, and the values are equal as defined using the eq operator, using the Unicode codepoint collation when comparing strings.

Both items are nodes, and represent the same node.

Both items are maps, both maps have the same number of entries, and for every entry E1 in the first map there is an entry E2 in the second map such that the keys of E1 and E2 are the same key, and the corresponding values V1 and V2 are identical.

Both items are arrays, both arrays have the same number of members, and the members are pairwise identical.

Both items are function items, neither item is a map or array, and the two function items have the same function identity. The concept of function identity is explained in .
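
The recursive shape of this definition can be sketched as follows. The model is an assumption made for illustration only: a value is a Python tuple of items, an atomic value is a hypothetical `Atomic` record, a map is a `dict` keyed by `Atomic`, an array is a `list`, and function identity is approximated by Python object identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    """Hypothetical atomic value: a type name plus a Python value."""
    type_name: str
    value: object


class Node:
    """Nodes are compared by identity, never by content."""


def items_identical(a, b) -> bool:
    if isinstance(a, Atomic) and isinstance(b, Atomic):
        # Precisely the same type, and equal (Python == stands in for eq)
        return a.type_name == b.type_name and a.value == b.value
    if isinstance(a, Node) and isinstance(b, Node):
        return a is b
    if isinstance(a, dict) and isinstance(b, dict):  # maps
        return len(a) == len(b) and all(
            k in b and values_identical(v, b[k]) for k, v in a.items())
    if isinstance(a, list) and isinstance(b, list):  # arrays
        return len(a) == len(b) and all(
            values_identical(x, y) for x, y in zip(a, b))
    if callable(a) and callable(b):
        return a is b  # stand-in for "same function identity"
    return False


def values_identical(v1, v2) -> bool:
    # Same number of items, and the items are pairwise identical
    return len(v1) == len(v2) and all(
        items_identical(x, y) for x, y in zip(v1, v2))
```

Note that map entries are matched by key, so entry order is irrelevant, whereas array members are compared position by position.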

Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.

A function definition may have the property of being context-dependent: the result of such a function depends on the values of properties in the static and dynamic evaluation context of the caller as well as on the actual supplied arguments (if any). A function definition may be context-dependent for some arities in its arity range, and context-independent for others: for example fn:name#0 is context-dependent while fn:name#1 is context-independent.

A function definition that is not context-dependent is called context-independent.

The main categories of context-dependent functions are:

Functions that explicitly deliver the value of a component of the static or dynamic context, for example fn:static-base-uri, fn:default-collation, fn:position, or fn:last.

Functions with an optional parameter whose default value is taken from the static or dynamic context of the caller, usually either the context value (for example, fn:node-name) or the default collation (for example, fn:index-of).

Functions that use the static context of the caller to expand or disambiguate the values of supplied arguments: for example fn:doc expands its first argument using the static base URI of the caller, and xs:QName expands its first argument using the in-scope namespaces of the caller.

A function is focus-dependent if its result depends on the focus (that is, the context item, position, or size) of the caller.

A function that is not focus-dependent is called focus-independent.

Some functions depend on aspects of the dynamic context that remain invariant within an execution scope, such as the implicit timezone. Formally this is treated in the same way as any other context dependency, but internally, the implementation may be able to take advantage of the fact that the value is invariant.

User-defined functions in XQuery and XSLT may depend on the static context of the function definition (for example, the in-scope namespaces) and also in a limited way on the dynamic context (for example, the values of global variables). However, the only way they can depend on the static or dynamic context of the caller — which is what concerns us here — is by defining optional parameters whose default values are context-dependent.

Because the focus is a specific part of the dynamic context, all focus-dependent functions are also context-dependent. A context-dependent function, however, may be either focus-dependent or focus-independent.

A function definition that is context-dependent can be used as the target of a named function reference, can be partially applied, and can be found using fn:function-lookup. The principle in such cases is that the static context used for the function evaluation is taken from the static context of the named function reference, partial function application, or the call on fn:function-lookup; and the dynamic context for the function evaluation is taken from the dynamic context of the evaluation of the named function reference, partial function application, or the call of fn:function-lookup. These constructs all deliver a function item having a captured context based on the static and dynamic context of the construct that created the function item. This captured context forms part of the closure of the function item.

The result of a dynamic call to a function item never depends on the static or dynamic context of the dynamic function call, only (where relevant) on the captured context held within the function item itself.
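
A closure-based sketch may help to picture the captured context. The names `Context`, `named_function_reference`, and `dynamic_call` are hypothetical helpers introduced for this illustration, and the context is reduced to two properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical slice of the static and dynamic context."""
    default_collation: str
    context_value: str


def named_function_reference(name: str, ctx: Context):
    """Return a function item whose closure captures ctx."""
    if name == "default-collation":
        return lambda: ctx.default_collation
    if name == "string":
        # Models the arity-zero form, which reads the context value
        return lambda: ctx.context_value
    raise KeyError(name)


def dynamic_call(function_item, caller_ctx: Context):
    # caller_ctx is deliberately ignored: only the captured context matters
    return function_item()


creation = Context(default_collation="collation-A", context_value="captured")
caller = Context(default_collation="collation-B", context_value="caller")

f = named_function_reference("string", creation)
result = dynamic_call(f, caller)
```

The call yields the value from the context in which the function item was created, regardless of the context in which it is later invoked.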

Context-dependent functions fall into a number of categories:

The functions fn:current-date, fn:current-dateTime, fn:current-time, fn:default-language, fn:implicit-timezone, fn:adjust-date-to-timezone, fn:adjust-dateTime-to-timezone, and fn:adjust-time-to-timezone depend on properties of the dynamic context that are fixed within the execution scope. The same applies to a number of functions in the op: namespace that manipulate dates and times and that make use of the implicit timezone. These functions will return the same result if called repeatedly during a single execution scope.

A number of functions including fn:base-uri#0, fn:data#0, fn:document-uri#0, fn:element-with-id#1, fn:id#1, fn:idref#1, fn:lang#1, fn:last#0, fn:local-name#0, fn:name#0, fn:namespace-uri#0, fn:normalize-space#0, fn:number#0, fn:path#0, fn:position#0, fn:root#0, fn:string#0, and fn:string-length#0 depend on the focus. These functions will in general return different results on different calls if the focus is different.

A function is focus-dependent if its result depends on the focus (that is, the context value, position, or size).

A function that is not focus-dependent is called focus-independent.

The function fn:default-collation and many string-handling operators and functions depend on the default collation and the in-scope collations, which are both properties of the static context. If a particular call of one of these functions is evaluated twice with the same arguments then it will return the same result each time (because the static context, by definition, does not change at run time). However, two distinct calls (that is, two calls on the function appearing in different places in the source code) may produce different results even if the explicit arguments are the same.

Functions such as fn:static-base-uri, fn:doc, and fn:collection depend on other aspects of the static context. As with functions that depend on collations, a single call will produce the same results on each call if the explicit arguments are the same, but two calls appearing in different places in the source code may produce different results.

The fn:function-lookup function is a special case because it is potentially dependent on everything in the static and dynamic context. This is because the static and dynamic context of the call to fn:function-lookup form the captured context of the function item that fn:function-lookup returns.

For a context-dependent function, the parts of the context on which it depends are referred to as implicit arguments.

A function that is guaranteed to produce identical results from repeated calls within a single execution scope if the explicit and implicit arguments are identical is referred to as deterministic.

A function that is not deterministic is referred to as nondeterministic.

All functions defined in this specification are deterministic unless otherwise stated. Exceptions include the following:

Some functions (such as fn:distinct-values, fn:unordered, map:keys, and map:for-each) produce results in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.

Some functions (such as fn:analyze-string, fn:parse-xml, fn:parse-xml-fragment, fn:parse-html, and fn:json-to-xml) construct a tree of nodes to represent their results. There is no guarantee that repeated calls with the same arguments will return the same identical node (in the sense of the is operator). However, if non-identical nodes are returned, their content will be the same in the sense of the fn:deep-equal function. Such a function is said to be nondeterministic with respect to node identity.

Some functions (such as fn:doc and fn:collection) create new nodes by reading external documents. Such functions are guaranteed to be deterministic with the exception that an implementation is allowed to make them nondeterministic as a user option.

Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results.

Several functions defined in this specification are defined as variadic. This means that in a static function call, several arguments in the function call can be sequence-concatenated to supply the value of a single parameter in the function definition.

Functions on nodes and node sequences

Accessors

Accessors and their semantics are described in . Some of these accessors are exposed to the user through the functions described below.

Each of these functions has an arity-zero signature which is equivalent to the arity-one form, with the context value supplied as the implicit first argument. In addition, each of the arity-one functions accepts an empty sequence as the argument, in which case it generally delivers an empty sequence as the result: the exception is fn:string, which delivers a zero-length string.

Function         Accessor       Accepts              Returns
fn:node-name     node-name      node (optional)      xs:QName (optional)
fn:nilled        nilled         node (optional)      xs:boolean (optional)
fn:string        string-value   item (optional)      xs:string
fn:data          typed-value    zero or more items   a sequence of atomic values
fn:base-uri      base-uri       node (optional)      xs:anyURI (optional)
fn:document-uri  document-uri   node (optional)      xs:anyURI (optional)

Other functions on nodes

This section specifies further functions on nodes. Nodes are formally defined in .

Functions on sequences of nodes

This section specifies functions on sequences of nodes.

Errors and diagnostics

Raising errors

In this document, as well as in and , the phrase "an error is raised" is used. Raising an error is equivalent to calling the fn:error function defined in this section with the provided error code. Except where otherwise specified, errors defined in this specification are dynamic errors. Some errors, however, are classified as type errors. Type errors are typically used where the presence of the error can be inferred from knowledge of the type of the actual arguments to a function, for example with a call such as fn:string(fn:abs#1). Host languages may allow type errors to be reported statically if they are discovered during static analysis.

When function specifications indicate that an error is to be raised, the notation [error code] is used to specify an error code. Each error defined in this document is identified by an xs:QName that is in the http://www.w3.org/2005/xqt-errors namespace, represented in this document by the err prefix. It is this xs:QName that is actually passed as an argument to the fn:error function. Calling this function raises an error. For a more detailed treatment of error handling, see .

The fn:error function is a general function that may be called as above but may also be called from or applications with, for example, an xs:QName argument.

Diagnostic tracing

Functions and operators on numerics

This section specifies arithmetic operators on the numeric datatypes defined in .

Numeric types

The operators described in this section are defined on the following atomic types.

&common-numeric-types.xml;

They also apply to types derived by restriction from the above types.

The type xs:numeric is defined as a union type whose member types are (in order) xs:double, xs:float, and xs:decimal. This type is implicitly imported into the static context, so it can also be used in defining the signature of user-written functions. Apart from the fact that it is implicitly imported, it behaves exactly like a user-defined type with the same definition. This means, for example:

If the expected type of a function parameter is given as xs:numeric, the actual value supplied can be an instance of any of these three types, or any type derived from these three by restriction (this includes the built-in type xs:integer, which is derived from xs:decimal).

If the expected type of a function parameter is given as xs:numeric, and the actual value supplied is xs:untypedAtomic (or a node whose atomized value is xs:untypedAtomic), then it will be cast to the union type xs:numeric using the rules in . Because the lexical space of xs:double subsumes the lexical space of the other member types, and xs:double is listed first, the effect is that if the untyped atomic value is in the lexical space of xs:double, it will be converted to an xs:double, and if not, a dynamic error occurs.

When the return type of a function is given as xs:numeric, the actual value returned will be an instance of one of the three member types (and perhaps also of types derived from these by restriction). The rules for the particular function will specify how the type of the result depends on the values supplied as arguments. In many cases, for the functions in this specification, the result is defined to be the same type as the first argument.
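
The second of these consequences can be sketched as follows. The function name `cast_untyped_to_numeric` is hypothetical, and the regular expression is a simplified approximation of the xs:double lexical space (it accepts INF, -INF, and NaN as the special forms); it is offered as an assumption for illustration, not as the normative casting rules.

```python
import re

# Simplified approximation of the xs:double lexical space (an assumption)
_DOUBLE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
_SPECIAL = {"INF": float("inf"), "-INF": float("-inf"), "NaN": float("nan")}


def cast_untyped_to_numeric(text: str):
    """Cast an untyped atomic value to the union type xs:numeric.

    xs:double is the first member type and its lexical space subsumes the
    others, so a successful cast always yields an xs:double.
    """
    s = text.strip(" \t\r\n")
    if s in _SPECIAL:
        return ("xs:double", _SPECIAL[s])
    if _DOUBLE.fullmatch(s):
        return ("xs:double", float(s))
    # Models the dynamic error raised when no member type accepts the value
    raise ValueError(f"cannot cast {text!r} to xs:numeric")


as_double = cast_untyped_to_numeric("12")
```

Even though "12" is also in the lexical space of xs:integer, the result is the xs:double value 12.0, because member types are tried in order.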

This specification uses IEEE 754 arithmetic for xs:float and xs:double values. One consequence of this is that some operations result in the value NaN (not a number), which has the unusual property that it is not equal to itself. Another consequence is that some operations return the value negative zero. This differs from XSD 1.0, which defines NaN as being equal to itself and defines only a single zero in the value space. The text accompanying several functions defines behavior for both positive and negative zero inputs and outputs in the interest of alignment with IEEE 754. A conformant implementation must respect these semantics. In consequence, the expression -0.0e0 (which is actually a unary minus operator applied to an xs:double value) will always return negative zero: see . As a concession to implementations that rely on implementations of XSD 1.0, however, when casting from string to double the lexical form -0 may be converted to positive zero, though negative zero is recommended.

XML Schema 1.1 introduces support for positive and negative zero as distinct values, and also uses the IEEE 754 semantics for comparisons involving NaN.

Arithmetic operators on numeric values

The following functions define the semantics of arithmetic operators defined in and on these numeric types.

Operator Meaning
op:numeric-add Addition
op:numeric-subtract Subtraction
op:numeric-multiply Multiplication
op:numeric-divide Division
op:numeric-integer-divide Integer division
op:numeric-mod Modulus
op:numeric-unary-plus Unary plus
op:numeric-unary-minus Unary minus (negation)

The parameters and return types for the above operators are in most cases declared to be of type xs:numeric, which permits the basic numeric types: xs:integer, xs:decimal, xs:float and xs:double, and types derived from them. In general the two-argument functions require that both arguments are of the same primitive type, and they return a value of this same type. The exceptions are op:numeric-divide, which returns an xs:decimal if called with two xs:integer operands, and op:numeric-integer-divide which always returns an xs:integer.

If the two operands of an arithmetic expression are not of the same type, subtype substitution and numeric type promotion are used to obtain two operands of the same type. and describe the semantics of these operations in detail.

The result type of operations depends on their argument datatypes and is defined in the following table:

Operator Returns
op:operation(xs:integer, xs:integer) xs:integer (except for op:numeric-divide(integer, integer), which returns xs:decimal)
op:operation(xs:decimal, xs:decimal) xs:decimal
op:operation(xs:float, xs:float) xs:float
op:operation(xs:double, xs:double) xs:double
op:operation(xs:integer) xs:integer
op:operation(xs:decimal) xs:decimal
op:operation(xs:float) xs:float
op:operation(xs:double) xs:double

These rules define any operation on any pair of arithmetic types. Consider the following example:

op:operation(xs:int, xs:double) => op:operation(xs:double, xs:double)

For this operation, xs:int must be converted to xs:double. This can be done, since by the rules above: xs:int can be substituted for xs:integer, xs:integer can be substituted for xs:decimal, xs:decimal can be promoted to xs:double. As far as possible, the promotions should be done in a single step. Specifically, when an xs:decimal is promoted to an xs:double, it should not be converted to an xs:float and then to xs:double, as this risks loss of precision.

As another example, a user may define height as a derived type of xs:integer with a minimum value of 20 and a maximum value of 100. They may then derive fenceHeight using an enumeration to restrict the permitted set of values to, say, 36, 48 and 60.

op:operation(fenceHeight, xs:integer) => op:operation(xs:integer, xs:integer)

fenceHeight can be substituted for its base type height and height can be substituted for its base type xs:integer.
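
The two examples can be traced with a small sketch. The table `BASE_TYPE` and the helper names are hypothetical; `height` and `fenceHeight` are the user-defined types from the example, and the sketch treats the step from xs:integer to xs:decimal as subtype substitution and the steps to xs:float or xs:double as promotion.

```python
# Derivation chains used in the examples (user types are hypothetical)
BASE_TYPE = {
    "fenceHeight": "height",
    "height": "xs:integer",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
}

# The four operand types, ordered so that a later type can absorb an earlier one
RANK = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]


def operand_type(type_name: str) -> str:
    """Walk up the derivation chain until one of the four operand types is reached."""
    while type_name not in RANK:
        type_name = BASE_TYPE[type_name]
    return type_name


def resolve(t1: str, t2: str):
    """Return the pair of operand types after substitution and promotion."""
    a, b = operand_type(t1), operand_type(t2)
    # The conversion to the common type is made in a single step
    common = max(a, b, key=RANK.index)
    return (common, common)


first_example = resolve("xs:int", "xs:double")
second_example = resolve("fenceHeight", "xs:integer")
```

The first call selects xs:double for both operands, and the second selects xs:integer, matching the two rewrites shown above.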

The basic rules for addition, subtraction, and multiplication of ordinary numbers are not set out in this specification; they are taken as given. In the case of xs:double and xs:float the rules are as defined in . The rules for handling division and modulus operations, as well as the rules for handling special values such as infinity and NaN, and exception conditions such as overflow and underflow, are described more explicitly since they are not necessarily obvious.

On overflow and underflow situations during arithmetic operations, conforming implementations must behave as follows:

For xs:float and xs:double operations, overflow behavior must be conformant with . This specification allows the following options:

Raising a dynamic error via an overflow trap.

Returning INF or -INF.

Returning the largest (positive or negative) non-infinite number.

For xs:float and xs:double operations, underflow behavior must be conformant with . This specification allows the following options:

Raising a dynamic error via an underflow trap.

Returning 0.0E0 or +/- 2**Emin or a denormalized value; where Emin is the smallest possible xs:float or xs:double exponent.

For xs:decimal operations, overflow behavior must raise a dynamic error . On underflow, 0.0 must be returned.

For xs:integer operations, implementations that support limited-precision integer operations must select from the following options:

They may choose to always raise a dynamic error .

They may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See .

The functions op:numeric-add, op:numeric-subtract, op:numeric-multiply, op:numeric-divide, op:numeric-integer-divide and op:numeric-mod are each defined for pairs of numeric operands, each of which has the same type: xs:integer, xs:decimal, xs:float, or xs:double. The functions op:numeric-unary-plus and op:numeric-unary-minus are defined for a single operand whose type is one of those same numeric types.

For xs:float and xs:double arguments, if either argument is NaN, the result is NaN.

For xs:decimal values, let N be the number of digits of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication operations, the returned result should be accurate to N digits of precision, and for division and modulus operations, the returned result should be accurate to at least M digits of precision. The actual precision is implementation-defined. If the number of digits in the mathematical result exceeds the number of digits that the implementation retains for that operation, the result is truncated or rounded in an implementation-defined manner.

This specification does not determine whether xs:decimal operations are fixed point or floating point. In an implementation using floating point it is possible for very simple operations to require more digits of precision than are available; for example, adding 1e100 to 1e-100 requires 200 digits of precision for an accurate representation of the result.
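
The effect can be observed with a floating-point decimal library. The sketch below uses Python's `decimal` module purely as a stand-in for an implementation with limited precision; the precisions of 28 and 250 digits are arbitrary choices made for the illustration.

```python
from decimal import Context, Decimal

big = Decimal("1e100")
small = Decimal("1e-100")

# With 28 significant digits the small term is rounded away entirely
limited = Context(prec=28).add(big, small)

# With a generous precision the sum keeps every digit
generous = Context(prec=250)
exact = generous.add(big, small)
recovered = generous.subtract(exact, big)
```

With limited precision the sum is indistinguishable from `big`, whereas the high-precision sum still allows `small` to be recovered by subtraction.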

The IEEE 754 specification also describes handling of two exception conditions called divideByZero and invalidOperation. The IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by operations such as log(0). The IEEE invalidOperation exception is raised by attempts to call a function with an argument that is outside the function’s domain (for example, sqrt(-1) or log(-1)). Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in which the operation returns a defined result, typically positive or negative infinity, or NaN. With this function library, these IEEE exceptions do not cause a dynamic error at the application level; rather they result in the relevant function or operator returning the defined non-error result. The underlying IEEE exception may be notified to the application or to the user by some implementation-defined warning condition, but the observable effect on an application using the functions and operators defined in this specification is simply to return the defined result (typically -INF, +INF, or NaN) with no error.
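
A sketch of non-stop handling for xs:double division follows. The function name `double_divide` is hypothetical. Python itself raises ZeroDivisionError for float division by zero, so the sketch has to supply the defined non-error results explicitly.

```python
import math


def double_divide(x: float, y: float) -> float:
    """Model xs:double division with default non-stop exception handling."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if y == 0.0:
        if x == 0.0:
            # invalidOperation: 0 divided by 0 has no defined value
            return math.nan
        # divideByZero: infinity whose sign combines the signs of both operands
        sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        return math.copysign(math.inf, sign)
    return x / y


positive = double_divide(1.0, 0.0)
negative = double_divide(1.0, -0.0)
undefined = double_divide(0.0, 0.0)
```

No exception reaches the caller in any of these cases: the results are positive infinity, negative infinity (because the divisor is negative zero), and NaN.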

The IEEE 754 specification distinguishes two NaN values: a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model: the value spaces of xs:float and xs:double each include only a single NaN value. This does not prevent the implementation distinguishing them internally, and triggering different implementation-defined warning conditions, but such distinctions do not affect the observable behavior of an application using the functions and operators defined in this specification.

Comparison operators on numeric values

The six value comparison operators eq, ne, lt, le, gt, and ge are defined in terms of two underlying functions: op:numeric-equal and op:numeric-less-than. These functions are defined to operate on values of the same type.

If the arguments are of different types, one argument is promoted to the type of the other as described above in . Each comparison operator returns a boolean value. If either, or both, operands are NaN, false is returned.
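
The behavior of the two underlying functions in the presence of NaN can be sketched with Python floats, which follow the same comparison semantics for NaN and signed zero. The Python function names below merely mirror op:numeric-equal and op:numeric-less-than and are not a normative definition.

```python
import math


def numeric_equal(a: float, b: float) -> bool:
    # False whenever either operand is NaN
    return a == b


def numeric_less_than(a: float, b: float) -> bool:
    # False whenever either operand is NaN
    return a < b


nan = math.nan
nan_equals_itself = numeric_equal(nan, nan)
nan_below_one = numeric_less_than(nan, 1.0)
one_below_nan = numeric_less_than(1.0, nan)
zeros_equal = numeric_equal(0.0, -0.0)
```

All three comparisons involving NaN return false, while positive and negative zero compare equal.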

For a description of the different ways of comparing numeric values using the operators = and eq and the functions fn:deep-equal and fn:atomic-equal, see .

See also the function fn:compare.

Functions on numeric values

The following functions are defined on numeric types. Each function returns a value of the same type as the type of its argument.

If the argument is the empty sequence, the empty sequence is returned.

For xs:float and xs:double arguments, if the argument is NaN, NaN is returned.

With the exception of fn:abs, functions with arguments of type xs:float and xs:double that are positive or negative infinity return positive or negative infinity.

fn:round and fn:round-half-to-even produce the same result in all cases except when the argument is exactly midway between two values with the required precision.

Other ways of rounding midway values can be achieved as follows:

Towards negative infinity: -round(-$x)

Away from zero: round(abs($x)) * compare($x, 0)

Towards zero: -round(-abs($x)) * compare($x, 0)
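
These identities can be checked with a small model. The sketch assumes that fn:round sends midway values towards positive infinity, models it with a hypothetical helper `xp_round` on Python `Decimal` values, and uses `sign` in place of compare($x, 0). The towards-zero variant is written here as -round(-abs($x)) * compare($x, 0).

```python
from decimal import ROUND_FLOOR, Decimal


def xp_round(x: Decimal) -> Decimal:
    """Model of fn:round: midway values go towards positive infinity."""
    return (x + Decimal("0.5")).to_integral_value(rounding=ROUND_FLOOR)


def sign(x: Decimal) -> int:
    # Plays the role of compare($x, 0): -1, 0, or 1
    return (x > 0) - (x < 0)


def half_towards_negative_infinity(x: Decimal) -> Decimal:
    return -xp_round(-x)


def half_away_from_zero(x: Decimal) -> Decimal:
    return xp_round(abs(x)) * sign(x)


def half_towards_zero(x: Decimal) -> Decimal:
    return -xp_round(-abs(x)) * sign(x)


up, down = Decimal("2.5"), Decimal("-2.5")
```

For the midway values 2.5 and -2.5 the three variants give 2 and -3, 3 and -3, and 2 and -2 respectively; for values that are not midway they all agree with `xp_round`.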

Parsing numbers

It is possible to convert strings to values of type xs:integer, xs:float, xs:decimal, or xs:double using the constructor functions described in or using cast expressions as described in .

In addition the fn:number function is available to convert strings to values of type xs:double. It differs from the xs:double constructor function in that any value outside the lexical space of the xs:double datatype is converted to the xs:double value NaN.

Formatting integers

Formatting numbers

This section defines a function for formatting decimal and floating point numbers.

This function can be used to format any numeric quantity, including an integer. For integers, however, the fn:format-integer function offers additional possibilities. Note also that the picture strings used by the two functions are not 100% compatible, though they share some options in common.

Defining a decimal format

Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.

The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined.

Each decimal format provides a set of named properties.

A phrase such as "The minus-sign character" is to be read as “the character assigned to the minus-sign property in the relevant decimal format”.

The decimal digit family of a decimal format is the sequence of ten digits with consecutive Unicode codepoints starting with the character that is the value of the zero-digit property.

The optional digit character is the character that is the value of the digit property.

For any decimal format, the properties representing characters used in a picture string must have distinct values. These properties are decimal-separator, grouping-separator, exponent-separator, percent, per-mille, digit, and pattern-separator. Furthermore, none of these properties may be equal to any character in the decimal digit family.
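
The decimal digit family and the distinctness constraint can be sketched as follows. The function names and the dictionary representation of a decimal format are hypothetical, and the property values in `example_format` are sample values chosen for the illustration.

```python
PICTURE_PROPERTIES = (
    "decimal-separator", "grouping-separator", "exponent-separator",
    "percent", "per-mille", "digit", "pattern-separator",
)


def decimal_digit_family(zero_digit: str) -> list:
    """Ten digits with consecutive codepoints, starting at zero-digit."""
    return [chr(ord(zero_digit) + i) for i in range(10)]


def check_decimal_format(fmt: dict) -> None:
    """Raise ValueError if the constraints on picture characters are violated."""
    chars = [fmt[name] for name in PICTURE_PROPERTIES]
    if len(set(chars)) != len(chars):
        raise ValueError("picture-string properties must have distinct values")
    clash = set(decimal_digit_family(fmt["zero-digit"])).intersection(chars)
    if clash:
        raise ValueError(f"property value is in the decimal digit family: {clash}")


example_format = {  # sample values, assumed for illustration
    "decimal-separator": ".", "grouping-separator": ",",
    "exponent-separator": "e", "percent": "%", "per-mille": "\u2030",
    "digit": "#", "pattern-separator": ";", "zero-digit": "0",
}
check_decimal_format(example_format)  # passes silently
arabic_indic = decimal_digit_family("\u0660")
```

Setting zero-digit to U+0660 yields the family U+0660 to U+0669, so a picture string could then use any of those ten characters as a mandatory digit.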

Syntax of the picture string

This differs from the format-number function previously defined in XSLT 2.0 in that any digit can be used in the picture string to represent a mandatory digit: for example the picture strings "000", "001", and "999" are equivalent. The digits will all be from the same decimal digit family, specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero-digit property. This change is to align format-number (which previously used "000") with format-dateTime (which used 001).

The formatting of a number is controlled by a picture string. The picture string is a sequence of characters, in which the characters assigned to the properties decimal-separator, exponent-separator, grouping-separator, digit, and pattern-separator and the members of the decimal digit family, are classified as active characters, and all other characters (including the values of the properties percent and per-mille) are classified as passive characters.

A dynamic error is raised if the picture string does not conform to the following rules. Note that in these rules the words "preceded" and "followed" refer to characters anywhere in the string; they are not to be read as "immediately preceded" and "immediately followed".

A picture-string consists either of a sub-picture, or of two sub-pictures separated by the pattern-separator character. A picture-string must not contain more than one instance of the pattern-separator character. If the picture-string contains two sub-pictures, the first is used for positive and unsigned zero values and the second for negative values.

A sub-picture must not contain more than one instance of the decimal-separator character.

A sub-picture must not contain more than one instance of the percent or per-mille characters, and it must not contain one of each.

The mantissa part of a sub-picture (defined below) must contain at least one character that is either an optional digit character or a member of the decimal digit family.

A sub-picture must not contain a passive character that is preceded by an active character and that is followed by another active character.

A sub-picture must not contain a grouping-separator character that appears adjacent to a decimal-separator character, or in the absence of a decimal-separator character, at the end of the integer part.

A sub-picture must not contain two adjacent instances of the grouping-separator character.

The integer part of a sub-picture (defined below) must not contain a member of the decimal digit family that is followed by an instance of the optional digit character. The fractional part of a sub-picture (defined below) must not contain an instance of the optional digit character that is followed by a member of the decimal digit family.

A character that matches the exponent-separator property is treated as an exponent-separator-sign if it is both preceded and followed within the sub-picture by an active character. Otherwise, it is treated as a passive character. A sub-picture must not contain more than one character that is treated as an exponent-separator-sign.

A sub-picture that contains a percent or per-mille character must not contain a character treated as an exponent-separator-sign.

If a sub-picture contains a character treated as an exponent-separator-sign then this must be followed by one or more characters that are members of the decimal digit family, and it must not be followed by any active character that is not a member of the decimal digit family.

The mantissa part of the sub-picture is defined as the part that appears to the left of the exponent-separator-sign if there is one, or the entire sub-picture otherwise. The exponent part of the sub-picture is defined as the part that appears to the right of the exponent-separator-sign; if there is no exponent-separator-sign then the exponent part is absent.

The integer part of the sub-picture is defined as the part that appears to the left of the decimal-separator character if there is one, or the entire mantissa part otherwise.

The fractional part of the sub-picture is defined as that part of the mantissa part that appears to the right of the decimal-separator character if there is one, or the part that appears to the right of the rightmost active character otherwise. The fractional part may be zero-length.

Analyzing the picture string

This phase of the algorithm analyzes the picture string and the properties from the selected decimal format in the static context, and it has the effect of setting the values of various variables, which are used in the subsequent formatting phase. These variables are listed below. Each is shown with its initial setting and its datatype.

Several variables are associated with each sub-picture. If there are two sub-pictures, then these rules are applied to one sub-picture to obtain the values that apply to positive and unsigned zero numbers, and to the other to obtain the values that apply to negative numbers. If there is only one sub-picture, then the values for both cases are derived from this sub-picture.

The variables are as follows:

The integer-part-grouping-positions is a sequence of integers representing the positions of grouping separators within the integer part of the sub-picture. For each grouping-separator character that appears within the integer part of the sub-picture, this sequence contains an integer that is equal to the total number of optional digit character and decimal digit family characters that appear within the integer part of the sub-picture and to the right of the grouping-separator character.

The grouping is defined to be regular if the following conditions apply:

There is at least one grouping-separator in the integer part of the sub-picture.

There is a positive integer G (the grouping size) such that the position of every grouping-separator in the integer part of the sub-picture is a positive integer multiple of G.

Every position in the integer part of the sub-picture that is a positive integer multiple of G is occupied by a grouping-separator.

If the grouping is regular, then the integer-part-grouping-positions sequence contains all integer multiples of G as far as necessary to accommodate the largest possible number.
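The grouping rules above can be illustrated with a minimal Python sketch. It assumes the default characters of the unnamed decimal format (`#` as optional digit, `0` to `9` as the decimal digit family, `,` as grouping-separator); the function names are hypothetical, and the sketch reads "every position in the integer part" as every multiple of G smaller than the total number of digit placeholders, which is an interpretation rather than normative text.

```python
def grouping_positions(integer_part, grouping_sep=",", optional_digit="#",
                       digits="0123456789"):
    """For each grouping separator, count the digit placeholders to its right."""
    positions = []
    placeholders_to_right = 0
    # Scan right to left so the running count is "placeholders to the right".
    for ch in reversed(integer_part):
        if ch == optional_digit or ch in digits:
            placeholders_to_right += 1
        elif ch == grouping_sep:
            positions.append(placeholders_to_right)
    return positions


def regular_grouping_size(integer_part, grouping_sep=",", optional_digit="#",
                          digits="0123456789"):
    """Return the grouping size G if the grouping is regular, else None."""
    positions = grouping_positions(integer_part, grouping_sep,
                                   optional_digit, digits)
    if not positions:
        return None  # condition 1: at least one grouping separator
    total = sum(1 for ch in integer_part
                if ch == optional_digit or ch in digits)
    g = min(positions)  # the only possible candidate for G
    if g <= 0 or any(p % g for p in positions):
        return None  # condition 2: every position is a multiple of G
    # Condition 3 (assumed reading): every multiple of G below the total
    # number of placeholders must be occupied by a separator.
    expected = set(range(g, total, g))
    return g if expected <= set(positions) else None
```

With these assumptions, `#,##0` yields the single position 3 and is regular with G = 3, whereas `##,##,##0` yields positions 3 and 5 and is not regular.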

The minimum-integer-part-size is an integer indicating the minimum number of digits that will appear to the left of the decimal-separator character. It is initially set to the number of decimal digit family characters found in the integer part of the sub-picture, but may be adjusted as described below.

There is no maximum integer part size. All significant digits in the integer part of the number will be displayed, even if this exceeds the number of optional digit character and decimal digit family characters in the sub-picture.

The scaling factor is a non-negative integer used to determine the scaling of the mantissa in exponential notation. It is set to the number of decimal digit family characters found in the integer part of the sub-picture.

The prefix is set to contain all passive characters in the sub-picture to the left of the leftmost active character. If the picture string contains only one sub-picture, the prefix for the negative sub-picture is set by concatenating the minus-sign character and the prefix for the positive sub-picture (if any), in that order.

The fractional-part-grouping-positions is a sequence of integers representing the positions of grouping separators within the fractional part of the sub-picture. For each grouping-separator character that appears within the fractional part of the sub-picture, this sequence contains an integer that is equal to the total number of optional digit character and decimal digit family characters that appear within the fractional part of the sub-picture and to the left of the grouping-separator character.

There is no need to extrapolate grouping positions on the fractional side, because the number of digits in the output will never exceed the number of optional digit character and decimal digit family characters in the fractional part of the sub-picture.

The minimum-fractional-part-size is set to the number of decimal digit family characters found in the fractional part of the sub-picture.

The maximum-fractional-part-size is set to the total number of optional digit character and decimal digit family characters found in the fractional part of the sub-picture.

If the effect of the above rules is that minimum-integer-part-size and maximum-fractional-part-size are both zero, then an adjustment is applied as follows:

If an exponent separator is present then:

minimum-fractional-part-size is changed to 1 (one).

maximum-fractional-part-size is changed to 1 (one).

This has the effect that with the picture #.e9, the value 0.123 is formatted as 0.1e0

Otherwise:

minimum-integer-part-size is changed to 1 (one).

This has the effect that with the picture #, the value 0.23 is formatted as 0

If all the following conditions are true:

An exponent separator is present

The minimum-integer-part-size is zero

There is at least one optional digit character in the integer part of the sub-picture

then the minimum-integer-part-size is changed to 1 (one).

This has the effect that with the picture .9e9, the value 0.1 is formatted as .1e0, while with the picture #.9e9, it is formatted as 0.1e0

If (after making the above adjustments) the minimum-integer-part-size and the minimum-fractional-part-size are both zero, then the minimum-fractional-part-size is set to 1 (one).
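The three adjustments above can be traced with a short Python sketch. The function name and parameter names are hypothetical; the inputs are the values obtained from the sub-picture before adjustment.

```python
def adjust_sizes(min_int, min_frac, max_frac,
                 exponent_present, optional_digit_in_integer_part):
    """Apply the size adjustments in the order the text gives them."""
    # First adjustment: no mandatory integer digit and no fractional digits.
    if min_int == 0 and max_frac == 0:
        if exponent_present:
            min_frac = 1
            max_frac = 1
        else:
            min_int = 1
    # Second adjustment: exponent present, zero minimum integer size, and
    # at least one optional digit character in the integer part.
    if exponent_present and min_int == 0 and optional_digit_in_integer_part:
        min_int = 1
    # Third adjustment: never allow both minimum sizes to be zero.
    if min_int == 0 and min_frac == 0:
        min_frac = 1
    return min_int, min_frac, max_frac
```

For the picture `#.9e9` the inputs are (0, 1, 1) with an exponent and an optional digit in the integer part, giving (1, 1, 1), which is consistent with 0.1 being formatted as 0.1e0. For `.9e9` the values stay at (0, 1, 1), consistent with .1e0.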

The minimum-exponent-size is set to the number of decimal digit family characters found in the exponent part of the sub-picture if present, or zero otherwise.

The rules for the syntax of the picture string ensure that if an exponent separator is present, then the minimum-exponent-size will always be greater than zero.

The suffix is set to contain all passive characters to the right of the rightmost active character in the sub-picture.

If there is only one sub-picture, then all variables for positive numbers and negative numbers will be the same, except for prefix: the prefix for negative numbers will be preceded by the minus-sign character.

Formatting the number

This section describes the second phase of processing of the fn:format-number function. This phase takes as input a number to be formatted (referred to as the input number), and the variables set up by analyzing the decimal format in the static context and the picture string, as described above. The result of this phase is a string, which forms the return value of the fn:format-number function.

The algorithm for this second stage of processing is as follows:

If the input number is NaN (not a number), the result is the value of the NaN property (with no prefix or suffix).

In the rules below, the positive sub-picture and its associated variables are used if the input number is positive, and the negative sub-picture and its associated variables are used if it is negative. For xs:double and xs:float, negative zero is taken as negative, positive zero as positive. For xs:decimal and xs:integer, the positive sub-picture is used for zero.

The adjusted number is determined as follows:

If the sub-picture contains a percent character, the adjusted number is the input number multiplied by 100.

If the sub-picture contains a per-mille character, the adjusted number is the input number multiplied by 1000.

Otherwise, the adjusted number is the input number.

If the multiplication causes numeric overflow, no error occurs, and the adjusted number is positive or negative infinity as appropriate.
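As an illustration only, the following Python sketch computes the adjusted number for an xs:double input, using Python's `float` as a stand-in for xs:double. The function name is hypothetical, and the percent and per-mille characters are assumed to be the defaults `%` and U+2030.

```python
def adjusted_number(input_number, sub_picture,
                    percent="%", per_mille="\u2030"):
    """Scale the input by 100 or 1000 when the sub-picture asks for it."""
    if percent in sub_picture:
        return input_number * 100
    if per_mille in sub_picture:
        return input_number * 1000
    return input_number
```

Python float multiplication that overflows produces infinity without raising an error, which mirrors the rule that overflow here is not an error.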

If the adjusted number is positive or negative infinity, the result is the concatenation of the appropriate prefix, the value of the infinity property, and the appropriate suffix.

If the minimum exponent size is non-zero, and the adjusted number is non-zero, then the adjusted number is scaled to establish a mantissa and an integer exponent. The mantissa and exponent are chosen such that all the following conditions are true:

The primitive type of the mantissa is the same as the primitive type of the adjusted number (integer, decimal, float, or double).

The mantissa multiplied by ten to the power of the exponent is equal to the adjusted number.

The mantissa (unless it is zero) is less than 10^N, and at least 10^(N-1), where N is the scaling factor.

If the minimum exponent size is zero, then the mantissa is the adjusted number and there is no exponent.

If the minimum exponent size is non-zero and the adjusted number is zero, then the mantissa is the adjusted number and the exponent is zero.
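The choice of mantissa and exponent can be sketched in Python using the standard `decimal` module. The function name is hypothetical, the input is assumed to be a `Decimal`, and the bounds are applied to the magnitude of the number, which is an assumption about how negative values are treated at this stage.

```python
from decimal import Decimal


def scale(adjusted, scaling_factor, minimum_exponent_size):
    """Return (mantissa, exponent); exponent is None when there is none."""
    if minimum_exponent_size == 0:
        return adjusted, None
    if adjusted == 0:
        return adjusted, 0
    # Decimal.adjusted() is the power of ten of the most significant digit,
    # so 10**e <= abs(adjusted) < 10**(e + 1).
    e = adjusted.adjusted()
    exponent = e + 1 - scaling_factor
    # Shifting by -exponent leaves exactly scaling_factor digits before
    # the decimal point: 10**(N-1) <= abs(mantissa) < 10**N.
    mantissa = adjusted.scaleb(-exponent)
    return mantissa, exponent
```

For example, with a scaling factor of 2 the value 1234.5 becomes mantissa 12.345 and exponent 2, and with a scaling factor of 0 the value 0.123 keeps mantissa 0.123 with exponent 0.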

The mantissa is converted (if necessary) to an xs:decimal value, using an implementation of xs:decimal that imposes no limits on the totalDigits or fractionDigits facets. If there are several such values that are numerically equal to the mantissa (bearing in mind that if the mantissa is an xs:double or xs:float, the comparison will be done by converting the decimal value back to an xs:double or xs:float), the one that is chosen should be one with the smallest possible number of digits not counting leading or trailing zeroes (whether significant or insignificant). For example, 1.0 is preferred to 0.9999999999, and 100000000 is preferred to 100000001. This value is then rounded so that it uses no more than maximum-fractional-part-size digits in its fractional part. The rounded number is defined to be the result of converting the mantissa to an xs:decimal value, as described above, and then calling the function fn:round-half-to-even with this converted number as the first argument and the maximum-fractional-part-size as the second argument, again with no limits on the totalDigits or fractionDigits in the result.
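A rough Python sketch of this conversion and rounding for an xs:double mantissa follows. It assumes that Python's `repr` of a float, which yields the shortest decimal string that converts back to the same double, is an acceptable stand-in for the "smallest possible number of digits" choice; this is an approximation of the rule, not a statement of it. The function name is hypothetical, and the precision of 400 digits is an arbitrary setting chosen to avoid the default 28-digit limit of the `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext


def rounded_number(mantissa, maximum_fractional_part_size):
    """Convert a float mantissa to Decimal and round half to even."""
    with localcontext() as ctx:
        ctx.prec = 400  # assumption: generous enough for any double
        # Shortest round-tripping decimal representation of the double.
        as_decimal = Decimal(repr(float(mantissa)))
        # The quantum 10**(-size) fixes the number of fractional digits.
        quantum = Decimal(1).scaleb(-maximum_fractional_part_size)
        return as_decimal.quantize(quantum, rounding=ROUND_HALF_EVEN)
```

Under these assumptions 2.5 rounds to 2 and 3.5 rounds to 4 when maximum-fractional-part-size is zero, as expected for round-half-to-even.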

The absolute value of the rounded number is converted to a string in decimal notation, using the digits in the decimal digit family to represent the ten decimal digits, and the decimal-separator character to separate the integer part and the fractional part. This string must always contain a decimal-separator, and it must contain no leading zeroes and no trailing zeroes. The value zero will at this stage be represented by a decimal-separator on its own.

If the number of digits to the left of the decimal-separator character is less than minimum-integer-part-size, leading zero digit characters are added to pad out to that size.

If the number of digits to the right of the decimal-separator character is less than minimum-fractional-part-size, trailing zero digit characters are added to pad out to that size.

For each integer N in the integer-part-grouping-positions list, a grouping-separator character is inserted into the string immediately after that digit that appears in the integer part of the number and has N digits between it and the decimal-separator character, if there is such a digit.

For each integer N in the fractional-part-grouping-positions list, a grouping-separator character is inserted into the string immediately before that digit that appears in the fractional part of the number and has N digits between it and the decimal-separator character, if there is such a digit.
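The two insertion rules can be sketched as follows. This Python function is illustrative only: its name is hypothetical, it takes the integer and fractional digit strings separately (without the decimal-separator), and it expects the grouping positions as sets, already extended for regular grouping where applicable.

```python
def insert_grouping(integer_digits, fraction_digits,
                    integer_positions, fraction_positions,
                    grouping_sep=","):
    """Insert grouping separators into the integer and fractional digits."""
    int_out = []
    for i, digit in enumerate(integer_digits):
        int_out.append(digit)
        # Number of digits between this digit and the decimal separator.
        digits_to_right = len(integer_digits) - i - 1
        if digits_to_right in integer_positions:
            int_out.append(grouping_sep)  # separator goes after the digit
    frac_out = []
    for i, digit in enumerate(fraction_digits):
        # i digits lie between this digit and the decimal separator.
        if i in fraction_positions:
            frac_out.append(grouping_sep)  # separator goes before the digit
        frac_out.append(digit)
    return "".join(int_out), "".join(frac_out)
```

For instance, integer digits `1234567` with positions {3, 6} become `1,234,567`, while `12` with position {3} is left unchanged because no digit has three digits to its right.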

If there is no decimal-separator character in the sub-picture, or if there are no digits to the right of the decimal-separator character in the string, then the decimal-separator character is removed from the string (it will be the rightmost character in the string).

If an exponent exists, then the string produced from the mantissa as described above is extended with the following, in order: (a) the exponent-separator character; (b) if the exponent is negative, the minus-sign character; (c) the value of the exponent represented as a decimal integer, extended if necessary with leading zeroes to make it up to the minimum exponent size, using digits taken from the decimal digit family.

The result of the function is the concatenation of the appropriate prefix, the string conversion of the number as obtained above, and the appropriate suffix.

Trigonometric and exponential functions

The functions in this section perform trigonometric and other mathematical calculations on xs:double values. They are provided primarily for use in applications performing geometrical computation, for example when generating SVG graphics.

Functions are provided to support the six most commonly used trigonometric calculations: sine, cosine and tangent, and their inverses arc sine, arc cosine, and arc tangent. Other functions such as secant, cosecant, and cotangent are not provided because they are easily computed in terms of these six.

The functions in this section (with the exception of math:pi) are specified by reference to , where they appear as Recommended operations in section 9. IEEE defines these functions for a variety of floating point formats; this specification defines them only for xs:double values. The IEEE specification applies with the following caveats:

IEEE states that the preferred quantum is language-defined. In this specification, it is implementation-defined.

IEEE states that certain functions should raise the inexact exception if the result is inexact. In this specification, this exception if it occurs does not result in an error. Any diagnostic information is outside the scope of this specification.

IEEE defines various rounding algorithms for inexact results, and states that the choice of rounding direction, and the mechanisms for influencing this choice, are language-defined. In this specification, the rounding direction and any mechanisms for influencing it are implementation-defined.

Certain operations (such as taking the square root of a negative number) are defined in IEEE to signal the invalid operation exception and return a quiet NaN. In this specification, such operations return NaN and do not raise an error. The same policy applies to operations (such as taking the logarithm of zero) that raise a divide-by-zero exception. Any diagnostic information is outside the scope of this specification.

Operations whose mathematical result is greater than the largest finite xs:double value are defined in IEEE to signal the overflow exception; operations whose mathematical result is closer to zero than the smallest non-zero xs:double value are similarly defined in IEEE to signal the underflow exception. The treatment of these exceptions in this specification is defined in .

Random Numbers
Functions on strings

This section specifies functions and operators on the xs:string datatype and the datatypes derived from it.

String types

The operators described in this section are defined on the following types.

&common-string-types.xml;

They also apply to user-defined types derived by restriction from the above types.

Functions to assemble and disassemble strings

Comparison of strings

Collations

A collation is a specification of the manner in which strings are compared and, by extension, ordered. When values whose type is xs:string or a type derived from xs:string are compared (or, equivalently, sorted), the comparisons are inherently performed according to some collation (even if that collation is defined entirely on codepoint values). The observes that some applications may require different comparison and ordering behaviors than other applications. Similarly, some users having particular linguistic expectations may require different behaviors than other users. Consequently, the collation must be taken into account when comparing strings in any context. Several functions in this and the following section make use of a collation.

Collations can indicate that two different codepoints are, in fact, equal for comparison purposes (e.g., “v” and “w” are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner, as defined by the collation.

Some collations, especially those based on the Unicode Collation Algorithm (see ) can be “tailored” for various purposes. This document does not discuss such tailoring, nor does it provide a mechanism to perform tailoring. Instead, it assumes that the collation argument to the various functions below is a tailored and named collation.

The Unicode codepoint collation is a collation available in every implementation, which sorts based on codepoint values. For further details see .

Collations may or may not perform Unicode normalization on strings before comparing them.

This specification assumes that collations are named and that the collation name may be provided as an argument to string functions. Functions that allow specification of a collation do so with an argument whose type is xs:string but whose lexical form must conform to an xs:anyURI. This specification also defines the manner in which a default collation is determined if the collation argument is not specified in calls of functions that use a collation but allow it to be omitted.

If the collation is specified using a relative URI reference, it is resolved relative to an implementation-defined base URI.

Previous versions of this specification stated that it must be resolved against the , but this is not always operationally convenient. It is recommended that processors should provide a means of setting the base URI for resolving collation URIs independently of the , though for backwards compatibility, the Static Base URI or Executable Base URI should be used as a default.

This specification does not define whether or not the collation URI is dereferenced. The collation URI may be an abstract identifier, or it may refer to an actual resource describing the collation. If it refers to a resource, this specification does not define the nature of that resource. One possible candidate is that the resource is a locale description expressed using the Locale Data Markup Language: see .

Functions such as fn:compare and fn:max that compare xs:string values use a single collation URI to identify all aspects of the collation rules. This means that any parameters such as the strength of the collation must be specified as part of the collation URI. For example, suppose there is a collation http://www.example.com/collations/French that refers to a French collation that compares on the basis of base characters. Collations that use the same basic rules, but with higher strengths, for example, base characters and accents, or base characters, accents and case, would need to be given different names, say http://www.example.com/collations/French1 and http://www.example.com/collations/French2. Note that some specifications use the term collation to refer to an algorithm that can be parameterized, but in this specification, each possible parameterization is considered to be a distinct collation.

The XQuery/XPath static context includes a provision for a default collation that can be used for string comparisons and ordering operations. See the description of the static context in . If the default collation is not specified by the user or the system, the default collation is the Unicode codepoint collation.

XML allows elements to specify the xml:lang attribute to indicate the language associated with the content of such an element. This specification does not use xml:lang to identify the default collation because using xml:lang does not produce desired effects when the two strings to be compared have different xml:lang values or when a string is multilingual.

The Unicode Codepoint Collation

The collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint identifies a collation which must be recognized by every implementation: it is referred to as the Unicode codepoint collation (not to be confused with the Unicode collation algorithm).

The Unicode codepoint collation does not perform any normalization on the supplied strings.

The collation is defined as follows. Each of the two strings is converted to a sequence of integers using the fn:string-to-codepoints function. These two sequences $A and $B are then compared as follows:

If both sequences are empty, the strings are equal.

If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.

If the first integer in $A is less than the first integer in $B, then the string corresponding to $A is less than the string corresponding to $B.

If the first integer in $A is greater than the first integer in $B, then the string corresponding to $A is greater than the string corresponding to $B.

Otherwise (the first pair of integers are equal), the result is obtained by applying the same rules recursively to fn:tail($A) and fn:tail($B)
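The comparison rules above translate directly into a short Python sketch. Python's `ord` applied to each character is used as a stand-in for fn:string-to-codepoints, and the recursion on fn:tail is written as a loop; the function name is hypothetical, and the result convention (-1, 0, 1) is chosen to resemble fn:compare.

```python
def codepoint_compare(a, b):
    """Compare two strings under the Unicode codepoint collation."""
    seq_a = [ord(ch) for ch in a]
    seq_b = [ord(ch) for ch in b]
    while True:
        if not seq_a and not seq_b:
            return 0        # both sequences empty: equal
        if not seq_a:
            return -1       # the empty sequence is less
        if not seq_b:
            return 1
        if seq_a[0] < seq_b[0]:
            return -1
        if seq_a[0] > seq_b[0]:
            return 1
        # First integers are equal: continue with the tails.
        seq_a, seq_b = seq_a[1:], seq_b[1:]
```

Note that under this collation `Z` (codepoint 90) sorts before `a` (codepoint 97), which is one reason it is unsuitable for linguistic ordering.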

While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.

The Unicode Collation Algorithm

This specification defines a family of collation URIs representing tailorings of the Unicode Collation Algorithm (UCA) as defined in . The parameters used for tailoring the UCA are based on the parameters defined in the Locale Data Markup Language (LDML), defined in .

This family of URIs uses the scheme and path http://www.w3.org/2013/collation/UCA followed by an optional query part. The query part, if present, consists of a question mark followed by a sequence of zero or more semicolon-separated parameters. Each parameter is a keyword-value pair, the keyword and value being separated by an equals sign.

All implementations must recognize URIs in this family in the collation argument of functions that take a collation argument.

If the fallback parameter is present with the value no, then the implementation must either use a collation that conforms with the rules in the Unicode specifications for the requested tailoring, or fail with a static or dynamic error indicating that it does not provide the collation (the error code should be the same as if the collation URI were not recognized). If the fallback parameter is omitted or takes the value yes, and if the collation URI is well-formed according to the rules in this section, then the implementation must accept the collation URI, and should use the available collation that most closely reflects the user’s intentions. For example, if the collation URI requested is http://www.w3.org/2013/collation/UCA?lang=se;fallback=yes and the implementation does not include a fully conformant version of the UCA tailored for Swedish, then it may choose to use a Swedish collation that is known to differ from the UCA definition, or one whose conformance has not been established. It might even, as a last resort, fall back to using codepoint collation.

If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter.
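As a minimal sketch of the URI syntax just described, the following Python code splits a UCA collation URI into its parameters, applying the "last one wins" rule. The function names and the choice of `ValueError` are hypothetical; the sketch performs no validation of keywords or values and does no percent-decoding.

```python
UCA_BASE = "http://www.w3.org/2013/collation/UCA"


def parse_uca_uri(uri):
    """Return the query parameters of a UCA collation URI as a dict."""
    if uri == UCA_BASE:
        return {}
    if not uri.startswith(UCA_BASE + "?"):
        raise ValueError("not a UCA collation URI: " + uri)
    query = uri[len(UCA_BASE) + 1:]
    params = {}
    for part in query.split(";"):
        if not part:
            continue  # tolerate an empty query or a stray semicolon
        keyword, sep, value = part.partition("=")
        if not sep:
            raise ValueError("parameter without '=': " + part)
        params[keyword] = value  # a later duplicate overwrites an earlier one
    return params


def fallback_allowed(params):
    """fallback defaults to yes when the parameter is omitted."""
    return params.get("fallback", "yes") == "yes"
```

A caller could then use `fallback_allowed` to decide between ignoring an unrecognized parameter and rejecting the collation as unsupported.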

The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in .

Keyword: fallback
Values: yes | no (default yes)
Meaning: Determines whether the processor uses a fallback collation if a conformant collation is not available.

Keyword: lang
Values: language code: a string in the lexical space of xs:language.
Meaning: The language whose collation conventions are to be used.

Keyword: version
Values: string
Meaning: The version number of the UCA to be used.

Keyword: strength
Values: primary | secondary | tertiary | quaternary | identical, or 1|2|3|4|5 as synonyms (default tertiary / 3)
Meaning: The collation strength as defined in UCA. Primary strength takes only the base form of the character into account (so A=a=Ä=ä); secondary strength ignores case but considers accents and diacritics as significant (so A=a and Ä=ä but ä≠a); tertiary considers case as significant (A≠a≠Ä≠ä); quaternary strength always considers as significant spaces and punctuation (data-base≠database; if maxVariable is punct or higher and alternate is not non-ignorable, lower strengths will treat data-base=database).

Keyword: maxVariable
Values: space | punct | symbol | currency (default punct)
Meaning: Given the sequence space, punct, symbol, currency, all characters in the specified group and earlier groups are treated as “noise” characters to be handled as defined by the alternate parameter. For example, maxVariable=punct indicates that characters classified as whitespace or punctuation get this treatment.

Keyword: alternate
Values: non-ignorable | shifted | blanked (default non-ignorable)
Meaning: Controls the handling of characters such as spaces and hyphens; specifically, the “noise” characters in the groups selected by the maxVariable parameter. The value non-ignorable indicates that such characters are treated as distinct at the primary level (so data base sorts before database); shifted indicates that they are used to differentiate two strings only at the quaternary level, and blanked indicates that they are taken into account only at the identical level.

Keyword: backwards
Values: yes | no (default no)
Meaning: The value backwards=yes indicates that the last accent in the string is the most significant.

Keyword: normalization
Values: yes | no (default no)
Meaning: Indicates whether strings are converted to normalization form D.

Keyword: caseLevel
Values: yes | no (default no)
Meaning: When used with primary strength, setting caseLevel=yes has the effect of ignoring accents while taking account of case.

Keyword: caseFirst
Values: upper | lower (default lower)
Meaning: Indicates whether upper-case precedes lower-case or vice versa.

Keyword: numeric
Values: yes | no (default no)
Meaning: When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number, for example chap2 sorts before chap12.

Keyword: reorder
Values: a comma-separated sequence of reorder codes, where a reorder code is one of space, punct, symbol, currency, digit, or a four-letter script code defined in , the register of scripts maintained by the Unicode Consortium in its capacity as registration authority for .
Meaning: Determines the relative ordering of text in different scripts; for example the value digit,Grek,Latn indicates that digits precede Greek letters, which precede Latin letters.

This list excludes parameters that are inconvenient to express in a URI, or that are applicable only to substring matching.

UCA collation URIs can be conveniently generated using the fn:collation function.

The HTML ASCII Case-Insensitive Collation

The collation URI http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive must be recognized by every implementation. It is designed to be compatible with the HTML ASCII case-insensitive collation as defined in (section 4.6, Strings), which is used, for example, when matching HTML class attribute values.

The collation is defined as follows:

Let $HACI be the collation URI "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive".

Let $UCC be the Unicode Codepoint Collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint.

Let $lc be the function fn:translate(?, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz").

Then for any two strings $A and $B, the result of the comparison fn:compare($A, $B, $HACI) is defined to be the same as the result of fn:compare($lc($A), $lc($B), $UCC).
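The definition can be mirrored in a few lines of Python, shown here as an illustrative sketch. `str.translate` plays the role of fn:translate, and Python's built-in string ordering, which compares by codepoint, stands in for the Unicode codepoint collation. The function names are hypothetical.

```python
import string

# Maps only the 26 ASCII upper-case letters to lower case.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lc(value):
    """ASCII-only lower-casing, the analogue of $lc."""
    return value.translate(_ASCII_LOWER)


def html_ascii_ci_compare(a, b):
    """Return -1, 0, or 1, comparing lc(a) and lc(b) by codepoint."""
    la, lb = lc(a), lc(b)
    return (la > lb) - (la < lb)
```

Because only ASCII letters are folded, `É` and `é` remain distinct under this collation, unlike under a general case-insensitive comparison.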

HTML5 defines the semantics of equality matching using this collation; this specification additionally defines ordering rules. The collation supports collation units and can therefore be used with functions such as fn:contains; each Unicode codepoint is a single collation unit.

The corresponding HTML5 definition is: A string A is an ASCII case-insensitive match for a string B, if the ASCII lowercase of A is the ASCII lowercase of B.

Choosing a collation

Many functions have a signature that includes a $collation argument, which is generally optional and takes default-collation() as its default value.

The collation to use for these functions is determined by the following rules:

If the function specifies an explicit collation, CollationA (e.g., if the optional collation argument is specified in a call of the fn:compare function), then:

If CollationA is supported by the implementation, then CollationA is used.

Otherwise, a dynamic error is raised .

If no collation is explicitly specified for the function (that is, if the $collation argument is omitted or is set to an empty sequence), and the default collation in the XQuery/XPath static context is CollationB, then:

If CollationB is supported by the implementation, then CollationB is used.

Otherwise, a dynamic error is raised .

Because the set of collations that are supported is implementation-defined, an implementation has the option to support all collation URIs, in which case it will never raise this error.
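The selection rules can be summarized in a small Python sketch. The exception class, function name, and the representation of the supported collations as a set of URI strings are all hypothetical; `None` stands for an omitted argument or an empty sequence, and resolution of relative URI references is not modeled.

```python
CODEPOINT = "http://www.w3.org/2005/xpath-functions/collation/codepoint"


class UnsupportedCollation(Exception):
    """Stands in for the dynamic error raised for an unsupported collation."""


def choose_collation(explicit, default, supported):
    """Pick the explicit collation if given, otherwise the default."""
    chosen = default if explicit is None else explicit
    if chosen not in supported:
        raise UnsupportedCollation(chosen)
    return chosen
```

An implementation that accepts every collation URI would simply never take the error branch.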

If the value of the collation argument is a relative URI reference, it is resolved against the base-URI from the static context. If it is a relative URI reference and cannot be resolved, perhaps because the base-URI property in the static context is absent, a dynamic error is raised .

There is no explicit requirement that the string used as a collation URI be a valid URI. Implementations will in many cases reject such strings on the grounds that they do not identify a supported collation; they may also cause an error if they cannot be resolved against the relevant base URI.

Functions on string values

The following functions are defined on values of type xs:string and types derived from it.

When the above operators and functions are applied to datatypes derived from xs:string, they are guaranteed to return values that are instances of xs:string, but the value might or might not be an instance of the particular subtype of xs:string to which they were applied.

The strings returned by fn:concat and fn:string-join are not guaranteed to be normalized. But see note in fn:concat.

Functions based on substring matching

The functions described in this section examine a string $arg1 to see whether it contains another string $arg2 as a substring. The result depends on whether $arg2 is a substring of $arg1, and if so, on the range of characters in $arg1 which $arg2 matches.

When the Unicode codepoint collation is used, this simply involves determining whether $arg1 contains a contiguous sequence of characters whose codepoints are the same, one for one, with the codepoints of the characters in $arg2.

When a collation is specified, the rules are more complex.

All collations support the capability of deciding whether two strings are considered equal, and if not, which of the strings should be regarded as preceding the other. For functions such as fn:compare, this is all that is required. For other functions, such as fn:contains, the collation needs to support an additional property: it must be able to decompose the string into a sequence of collation units, each unit consisting of one or more characters, such that two strings can be compared by pairwise comparison of these units. (“collation unit” is equivalent to "collation element" as defined in .) The string $arg1 is then considered to contain $arg2 as a substring if the sequence of collation units corresponding to $arg2 is a subsequence of the sequence of the collation units corresponding to $arg1. The characters in $arg1 that match are the characters corresponding to these collation units.

This rule may occasionally lead to surprises. For example, consider a collation that treats "Jaeger" and "Jäger" as equal. It might do this by treating "ä" as representing two collation units, in which case the expression fn:contains("Jäger", "eg") will return true. Alternatively, a collation might treat "ae" as a single collation unit, in which case the expression fn:contains("Jaeger", "eg") will return false. The results of these functions thus depend strongly on the properties of the collation that is used.

In addition, collations may specify that some collation units should be ignored during matching. If hyphen is an ignored collation unit, then fn:contains("code-point", "codepoint") will be true, and fn:contains("codepoint", "-") will also be true.
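The effect of collation units on substring matching can be illustrated with a minimal Python sketch. It is not part of this specification: the helper contains and the three toy collations (expanding_units, contracting_units, hyphen_ignoring_units) are hypothetical names, and each collation is modelled simply as a function that maps a string to its sequence of collation units.

```python
def contains(units_of, arg1, arg2):
    """True if the collation units of arg2 form a contiguous run within those of arg1."""
    hay, needle = units_of(arg1), units_of(arg2)
    n = len(needle)
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


def expanding_units(s):
    # Toy collation (assumption): "ä" expands to the two units "a" and "e".
    units = []
    for ch in s:
        units.extend(["a", "e"] if ch == "\u00e4" else [ch])
    return units


def contracting_units(s):
    # Toy collation (assumption): "ä" and "ae" both map to the single unit "ae".
    units, i = [], 0
    while i < len(s):
        if s[i] == "\u00e4":
            units.append("ae")
            i += 1
        elif s.startswith("ae", i):
            units.append("ae")
            i += 2
        else:
            units.append(s[i])
            i += 1
    return units


def hyphen_ignoring_units(s):
    # Toy collation (assumption): the hyphen is an ignored collation unit.
    return [ch for ch in s if ch != "-"]


print(contains(expanding_units, "J\u00e4ger", "eg"))               # True
print(contains(contracting_units, "Jaeger", "eg"))                 # False
print(contains(hyphen_ignoring_units, "code-point", "codepoint"))  # True
print(contains(hyphen_ignoring_units, "codepoint", "-"))           # True
```

Both toy collations regard "Jaeger" and "Jäger" as equal, yet they disagree about whether "eg" is a substring, which is the surprise described above.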

In the definitions below, we refer to the terms match and minimal match as defined in definitions DS2 and DS4 of . In applying these definitions:

C is the collation; that is, the value of the $collation argument if specified, otherwise the default collation.

P is the (candidate) substring $arg2

Q is the (candidate) containing string $arg1

The boundary condition B is satisfied at the start and end of a string, and between any two characters that belong to different collation units (“collation elements” in the language of ). It is not satisfied between two characters that belong to the same collation unit.

It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation.

String functions that use regular expressions

The four functions described in this section make use of a regular expression syntax for pattern matching, described below.

Regular expression syntax

The regular expression syntax used by these functions is defined in terms of the regular expression syntax specified in XML Schema (see ), which in turn is based on the established conventions of languages such as Perl. However, because XML Schema uses regular expressions only for validity checking, it omits some facilities that are widely used with languages such as Perl. This section, therefore, describes extensions to the XML Schema regular expressions syntax that reinstate these capabilities.

It is recommended that implementers consult for information on using regular expression processing on Unicode characters.

The regular expression syntax and semantics are identical to those defined in with the additions described in the following subsections.

In there are no substantive technical changes to the syntax or semantics of regular expressions relative to XSD 1.0, but a number of errors and ambiguities have been resolved. For example, the rules for the interpretation of hyphens within square brackets in a regular expression have been clarified; and the semantics of regular expressions are no longer tied to a specific version of Unicode.

Implementers, even in cases where XSD 1.1 is not supported, are advised to consult the XSD 1.1 regular expression specification for guidance on how to handle cases where the XSD 1.0 specification is unclear or inconsistent.

Matching the Start and End of the String

Two meta-characters, ^ and $ are added. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string. In multi-line mode, ^ matches the start of any line (that is, the start of the entire string, and the position immediately after a newline character), while $ matches the end of any line (that is, the end of the entire string, and the position immediately before a newline character). Newline here means the character #x0A only.

This means that the production in :

[10] Char ::= [^.\?*+()|#x5B#x5D]

is modified to read:

[10] Char ::= [^.\?*+{}()|^$#x5B#x5D]

The XSD 1.1 grammar for regular expressions uses the same production rule, but renumbered and renamed [73] NormalChar; it is affected in the same way.

The characters #x5B and #x5D correspond to [ and ] respectively.

The definition of Char (production [10]) in has a known error in which it omits the left brace ({) and right brace (}). That error is corrected here.

The following production:

[11] charClass ::= charClassEsc | charClassExpr | WildCardEsc

is modified to read:

[11] charClass ::= charClassEsc | charClassExpr | WildCardEsc | "^" | "$"

Using XSD 1.1 as the baseline the equivalent is to change the production:

[74] charClass ::= SingleCharEsc | charClassEsc | charClassExpr | WildCardEsc

to read:

[74] charClass ::= SingleCharEsc | charClassEsc | charClassExpr | WildCardEsc | "^" | "$"

Single character escapes are extended to allow the $ character to be escaped. Furthermore, the # character may be escaped: see . The following production is changed:

[24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]

to

[24] SingleCharEsc ::= '\' [nrt\|.?*+(){}$#x2D#x5B#x5D#x5E\#]

(In the XSD 1.1 version of the regular expression grammar, the production rule for SingleCharEsc is unchanged from 1.0, but is renumbered [84].)
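The difference between the default and multi-line interpretations of ^ and $ can be seen in the following sketch, which uses Python's re module purely as a stand-in. Python's dialect is not the one defined here (for example, outside multi-line mode its $ also matches just before a newline that ends the string), so the input is chosen without a trailing newline, where the two dialects are assumed to agree.

```python
import re

text = "first\nsecond"  # two lines separated by #x0A, no trailing newline

# Default mode: ^ and $ anchor only to the start and end of the entire string.
default_hits = re.findall(r"^[a-z]+$", text)

# Multi-line mode: ^ and $ also anchor at the start and end of each line.
multiline_hits = re.findall(r"^[a-z]+$", text, flags=re.MULTILINE)

print(default_hits)    # []
print(multiline_hits)  # ['first', 'second']
```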

Reluctant Quantifiers

Reluctant quantifiers are supported. They are indicated by a ? following a quantifier. Specifically:

X?? matches X, once or not at all

X*? matches X, zero or more times

X+? matches X, one or more times

X{n}? matches X, exactly n times

X{n,}? matches X, at least n times

X{n,m}? matches X, at least n times, but not more than m times

The effect of these quantifiers is that the regular expression matches the shortest possible substring consistent with the match as a whole succeeding. Without the ?, the regular expression matches the longest possible substring.

To achieve this, the production in :

[4] quantifier ::= [?*+] | ( '{' quantity '}' )

is changed to:

[4] quantifier ::= ( [?*+] | ( '{' quantity '}' ) ) '?'?

(In the XSD 1.1 version of the regular expression grammar, this rule is unchanged from 1.0, but is renumbered [67].)

Reluctant quantifiers have no effect on the results of the boolean fn:matches function, since this function is only interested in discovering whether a match exists, and not where it exists.
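As an illustration only, the following sketch uses Python's re module, whose reluctant quantifiers are written the same way. It is a different regular expression dialect from the one defined here, so it is shown as an analogy rather than as a definition of behaviour.

```python
import re

text = "<a><b>"

# Greedy: .+ takes the longest substring that still lets the match succeed.
greedy = re.search(r"<.+>", text).group(0)

# Reluctant: .+? takes the shortest such substring.
reluctant = re.search(r"<.+?>", text).group(0)

print(greedy)     # <a><b>
print(reluctant)  # <a>

# A pure existence test gives the same answer either way.
print(bool(re.search(r"<.+>", text)) == bool(re.search(r"<.+?>", text)))  # True
```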

Captured Sub-Expressions

Sub-expressions (groups) within the regular expression are recognized. The regular expression syntax defined by allows a regular expression to contain parenthesized sub-expressions, but attaches no special significance to them. Some operations associated with regular expressions (for example, back-references, and the fn:replace function) allow access to the parts of the input string that matched a sub-expression (called captured substrings).

A left parenthesis is recognized as a capturing left parenthesis provided it is not immediately followed by ?: (see below), is not within a character group (square brackets), and is not escaped with a backslash. The sub-expression enclosed by a capturing left parenthesis and its matching right parenthesis is referred to as a capturing sub-expression.

More specifically, the capturing sub-expression enclosed by the Nth capturing left parenthesis within the regular expression (determined by its character position in left-to-right order, and counting from one) is referred to as the Nth capturing sub-expression.

For example, in the regular expression A(BC(?:D(EF(GH[()])))), the string matched by the sub-expression BC(?:D(EF(GH[()]))) is capturing sub-expression 1, the string matched by EF(GH[()]) is capturing sub-expression 2, and the string matched by GH[()] is capturing sub-expression 3.

When, in the course of evaluating a regular expression, a particular substring of the input matches a capturing sub-expression, that substring becomes available as a captured substring. The string matched by the Nth capturing sub-expression is referred to as the Nth captured substring. By convention, the substring captured by the entire regular expression is treated as captured substring 0 (zero).
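The numbering rule can be checked with a short sketch. It uses Python's re module as a stand-in (an assumption: Python numbers capturing groups by the position of their left parenthesis and skips (?:...) groups, in the same way as described above), and the input string "ABCDEFGH(" is an invented example.

```python
import re

regex = r"A(BC(?:D(EF(GH[()]))))"
match = re.fullmatch(regex, "ABCDEFGH(")

# Index 0 is the substring matched by the entire regular expression;
# indexes 1..3 are the captured substrings, in order of left parenthesis.
captured = [match.group(n) for n in range(match.re.groups + 1)]

print(captured)  # ['ABCDEFGH(', 'BCDEFGH(', 'EFGH(', 'GH(']
```

The parentheses inside the square brackets and the non-capturing (?: group do not contribute to the count, so only three capturing sub-expressions are numbered.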

When a capturing sub-expression is matched more than once (because it is within a construct that allows repetition), then only the last substring that it matched will be captured. Note that this rule is not sufficient in all cases to ensure an unambiguous result, especially in cases where (a) the regular expression contains nested repeating constructs, and/or (b) the repeating construct matches a zero-length string. In such cases it is implementation-dependent which substring is captured. For example given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero length string as the content of the captured subgroup.

Parentheses that are required to group terms within the regular expression, but which are not required for capturing of substrings, can be represented using the syntax (?:xxxx). To achieve this, the production rule for atom in is changed to replace the alternative:

( '(' regExp ')' )

with:

( '(' '?:'? regExp ')' )

(For the new versions of the XSD 1.0 and XSD 1.1 production rules for atom, see below.)

In the absence of back-references (see below), the presence of the optional ?: has no effect on the set of strings that match the regular expression, but causes the left parenthesis not to be counted by operations (such as fn:replace and back-references) that number the capturing sub-expressions within a regular expression.

Back-References

Back-references are allowed outside a character class expression. A back-reference is an additional kind of atom. The construct \N where N is a single digit is always recognized as a back-reference; if this is followed by further digits, these digits are taken to be part of the back-reference if and only if the resulting number NN is such that the back-reference is preceded by the opening parenthesis of the NNth capturing sub-expression. The regular expression is invalid if a back-reference refers to a capturing sub-expression that does not exist or whose closing right parenthesis occurs after the back-reference.

A back-reference with number N matches a string that is the same as the value of the Nth captured substring.

For example, the regular expression ('|").*\1 matches a sequence of characters delimited either by an apostrophe at the start and end, or by a quotation mark at the start and end.

If no string has been matched by the Nth capturing sub-expression, the back-reference is interpreted as matching a zero-length string.
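The quoting example can be tried out with the following sketch, which uses Python's re module as a stand-in for the dialect defined here. Dialects can differ in how they treat a back-reference to a group that has not matched, so the sketch deliberately avoids that case; the input strings are invented for illustration.

```python
import re

# \1 must repeat whichever delimiter was captured by the first group.
quoted = re.compile(r"""('|").*\1""")

single = quoted.search("say 'hello' now")
double = quoted.search('say "hello" now')
mixed = quoted.search("say 'hello\" now")  # opening ' is never closed by '

print(single.group(0))  # 'hello'
print(double.group(0))  # "hello"
print(mixed)            # None
```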

Combining this change with the introduction of non-capturing groups (see above), back-references change the following production:

[9] atom ::= Char | charClass | ( '(' regExp ')' )

to

[9] atom ::= Char | charClass | ( '(' '?:'? regExp ')' ) | backReference

[9a] backReference ::= "\" [1-9][0-9]*

With respect to the XSD 1.1 version of the regular expression grammar, the effect is to change:

[72] atom ::= NormalChar | charClass | ( '(' regExp ')' )

to

[72] atom ::= NormalChar | charClass | ( '(' '?:'? regExp ')' ) | backReference

[72a] backReference ::= "\" [1-9][0-9]*

Within a character class expression, \ followed by a digit is invalid. Some other regular expression languages interpret this as an octal character reference.

Unicode Block Names

A regular expression that uses a Unicode block name that is not defined in the version(s) of Unicode supported by the processor (for example \p{IsBadBlockName}) is deemed to be invalid.

XSD 1.0 does not say how this situation should be handled; XSD 1.1 says that it should be handled by treating all characters as matching.

Comments

Comments are enabled in regular expressions if the c flag is present.

A comment starts with a # character that is not escaped with an immediately preceding backslash, and that is not contained in a CharClassExpr (that is, in square brackets). It ends with the following # character, or with the end of the string containing the regular expression.

Whether or not the c flag is present, the production for SingleCharEsc is extended to allow the # character to be escaped.

Flags

All these functions provide an optional parameter, $flags, to set options for the interpretation of the regular expression. The parameter accepts an xs:string, in which individual letters are used to set options. The presence of a letter within the string indicates that the option is on; its absence indicates that the option is off. Letters may appear in any order and may be repeated. They are case-sensitive. If there are characters present that are not defined here as flags, then a dynamic error is raised.

The following options are defined:

s: If present, the match operates in “dot-all” mode. (Perl calls this the single-line mode.) If the s flag is not specified, the meta-character . matches any character except a newline (#x0A) or carriage return (#x0D) character. In dot-all mode, the meta-character . matches any character whatsoever. Suppose the input contains the strings "hello" and "world" on two lines. This will not be matched by the regular expression "hello.*world" unless dot-all mode is enabled.

m: If present, the match operates in multi-line mode. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string. In multi-line mode, ^ matches the start of any line (that is, the start of the entire string, and the position immediately after a newline character other than a newline that appears as the last character in the string), while $ matches the end of any line (that is, the position immediately before a newline character, and the end of the entire string if there is no newline character at the end of the string). Newline here means the character #x0A only.

i: If present, the match operates in case-insensitive mode. The detailed rules are as follows. In these rules, a character C2 is considered to be a case-variant of another character C1 if the following XPath expression returns true when the two characters are considered as strings of length one, and the Unicode codepoint collation is used:

fn:lower-case(C1) eq fn:lower-case(C2) or fn:upper-case(C1) eq fn:upper-case(C2)

Note that the case-variants of a character under this definition are always single characters.

When a normal character (Char) is used as an atom, it represents the set containing that character and all its case-variants. For example, the regular expression "z" will match both "z" and "Z".

A character range (production charRange in the XSD 1.0 grammar, replaced by productions charRange and singleChar in XSD 1.1) represents the set containing all the characters that it would match in the absence of the i flag, together with their case-variants. For example, the regular expression "[A-Z]" will match all the letters A to Z and all the letters a to z. It will also match certain other characters such as #x212A (KELVIN SIGN), since fn:lower-case("#x212A") is k.

This rule applies also to a character range used in a character class subtraction (charClassSub): thus [A-Z-[IO]] will match characters such as A, B, a, and b, but will not match I, O, i, or o.

The rule also applies to a character range used as part of a negative character group: thus "[^Q]" will match every character except Q and q (these being the only case-variants of Q in Unicode).

A back-reference is compared using case-blind comparison: that is, each character must either be the same as the corresponding character of the previously matched string, or must be a case-variant of that character. For example, the strings "Mum", "mom", "Dad", and "DUD" all match the regular expression "([md])[aeiou]\1" when the i flag is used.

All other constructs are unaffected by the i flag. For example, "\p{Lu}" continues to match upper-case letters only.
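The case-variant test used by the i flag can be sketched in Python as follows. This assumes that Python's str.lower and str.upper give the same results as fn:lower-case and fn:upper-case for the characters shown, which may not hold for every character or every Unicode version; the function name is_case_variant is hypothetical.

```python
def is_case_variant(c1, c2):
    """Mirror of: lower-case(C1) eq lower-case(C2) or upper-case(C1) eq upper-case(C2)."""
    return c1.lower() == c2.lower() or c1.upper() == c2.upper()


KELVIN_SIGN = "\u212a"

print(is_case_variant("z", "Z"))          # True
print(is_case_variant("k", KELVIN_SIGN))  # True: the Kelvin sign lower-cases to "k"
print(is_case_variant("Q", "R"))          # False
```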

x: If present, whitespace characters (#x9, #xA, #xD and #x20) in the regular expression are removed prior to matching with one exception: whitespace characters within character class expressions (charClassExpr) are not removed. This flag can be used, for example, to break up long regular expressions into readable lines.

Examples:

fn:matches("helloworld", "hello world", "x") returns true()

fn:matches("helloworld", "hello[ ]world", "x") returns false()

fn:matches("hello world", "hello\ sworld", "x") returns true()

fn:matches("hello world", "hello world", "x") returns false()

Whitespace is treated as a lexical construct to be removed before the regular expression is parsed; it is therefore not explicit in the regular expression grammar.
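One possible way to perform this lexical step is sketched below. The function name strip_x_whitespace is hypothetical, and the bracket tracking is a simplification that assumes a well-formed regular expression; the stripped patterns are then tried with Python's re module, which agrees with the dialect defined here for these simple cases.

```python
import re

WHITESPACE = "\t\n\r "  # #x9, #xA, #xD and #x20


def strip_x_whitespace(regex):
    """Remove whitespace outside character class expressions."""
    out, depth, escaped = [], 0, False
    for ch in regex:
        if ch in WHITESPACE and depth == 0:
            continue  # removed; a pending backslash now applies to the next character
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        out.append(ch)
    return "".join(out)


print(strip_x_whitespace("hello world"))     # helloworld
print(strip_x_whitespace("hello[ ]world"))   # hello[ ]world
print(strip_x_whitespace("hello\\ sworld"))  # hello\sworld
```

Applying the stripped patterns reproduces the four results listed above: for example, "hello\ sworld" becomes "hello\sworld", which matches "hello world".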

q: If present, all characters in the regular expression are treated as representing themselves, not as metacharacters. In effect, every character that would normally have a special meaning in a regular expression is implicitly escaped by preceding it with a backslash.

Furthermore, when this flag is present, the characters $ and \ have no special significance when used in the replacement string supplied to the fn:replace function.

This flag can be used in conjunction with the i flag. If it is used together with the m, s, x, or c flag, that flag has no effect.

Examples:

tokenize("12.3.5.6", ".", "q") returns ("12", "3", "5", "6")

replace("a\b\c", "\", "\\", "q") returns "a\\b\\c"

replace("a/b/c", "/", "$", "q") returns "a$b$c"

matches("abcd", ".*", "q") returns false()

matches("Mr. B. Obama", "B. OBAMA", "iq") returns true()
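The examples above can be imitated in Python, where re.escape plays the role of the implicit escaping and a function-valued replacement keeps $ and \ literal. This is an analogy only: the helper names q_matches, q_tokenize and q_replace are hypothetical, and they do not reproduce every rule of fn:matches, fn:tokenize and fn:replace.

```python
import re


def q_matches(value, pattern, ignore_case=False):
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(re.escape(pattern), value, flags) is not None


def q_tokenize(value, pattern):
    return re.split(re.escape(pattern), value)


def q_replace(value, pattern, replacement):
    # A function replacement is inserted as-is, so "$" and "\" stay literal.
    return re.sub(re.escape(pattern), lambda _match: replacement, value)


print(q_tokenize("12.3.5.6", "."))                           # ['12', '3', '5', '6']
print(q_replace("a/b/c", "/", "$"))                          # a$b$c
print(q_matches("abcd", ".*"))                               # False
print(q_matches("Mr. B. Obama", "B. OBAMA", ignore_case=True))  # True
```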

c: If present, comments are enabled in the regular expression. This flag has no effect if the q flag is present. A comment is recognized by the presence of a # character that is not escaped by a backslash or contained in a character class expression (charClassExpr), and it is terminated by the following # character or by the end of the regular expression string.

For example:

replace("03/24/2025", "(..#month#)/(..#day#)/(....#year#)", "$3-$1-$2", "c")

Comments are treated as a lexical construct to be removed before the regular expression is parsed; they are therefore not explicit in the regular expression grammar.
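A minimal sketch of this lexical step follows. The function name strip_comments is hypothetical and the bracket tracking assumes a well-formed regular expression. The stripped pattern is then applied with Python's re.sub, whose replacement syntax uses \3 where fn:replace uses $3.

```python
import re


def strip_comments(regex):
    """Remove #...# comments that are neither escaped nor inside square brackets."""
    out, depth, escaped, in_comment = [], 0, False, False
    for ch in regex:
        if in_comment:
            if ch == "#":
                in_comment = False  # closing # ends the comment
            continue
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "#" and depth == 0:
            in_comment = True
            continue
        elif ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        out.append(ch)
    return "".join(out)


pattern = strip_comments("(..#month#)/(..#day#)/(....#year#)")
print(pattern)                                       # (..)/(..)/(....)
print(re.sub(pattern, r"\3-\1-\2", "03/24/2025"))    # 2025-03-24
```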

Functions that manipulate URIs

This section specifies functions that manipulate URI values, either as instances of xs:anyURI or as strings.

Parsing and building URIs

This section specifies functions that parse strings as URIs, to identify their structure, and construct URI strings from their structured representation.

Some URI schemes are hierarchical and some are non-hierarchical. Implementations must treat the following schemes as non-hierarchical: jar, mailto, news, tag, tel, and urn. Whether additional schemes are known to be non-hierarchical is implementation-defined. If a scheme is not known to be non-hierarchical, it must be treated as hierarchical.

The structured representation of a URI is described by the uri-structure-record:

The parts of this structure are:

The original URI. This element is returned by fn:parse-uri, but ignored by fn:build-uri.

xs:string?

The URI scheme (e.g., “https” or “file”).

xs:string?

Whether the URI is hierarchical or not.

xs:boolean?

The authority portion of the URI (e.g., “example.com:8080”).

xs:string?
Any userinfo that was passed as part of the authority.

xs:string?

The host passed as part of the authority (e.g., “example.com”).

xs:string?

The port passed as part of the authority (e.g., “8080”).

xs:integer?

The path portion of the URI.

xs:string?

Any query string.

xs:string?

Any fragment identifier.

xs:string?

Parsed and unescaped path segments.

xs:string?

Parsed and unescaped query key-value pairs.

map(xs:string, xs:string*)?

The path of the URI, treated as a filepath.

xs:string?

The segmented forms of the path and query parameters provide convenient access to commonly used information.

The path, if there is one, is tokenized on “/” characters and each segment is unescaped (as per the fn:decode-from-uri function). Consider the URI http://example.com/path/to/a%2fb. The path portion has to be returned as /path/to/a%2fb because decoding the %2f would change the nature of the path. The unescaped form is easily accessible from path-segments:

("", "path", "to", "a/b")

Note that the presence or absence of a leading slash on the path will affect whether or not the sequence begins with an empty string.

The query parameters are decoded into a map. Consider the URI: http://example.com/path?a=1&b=2%264&a=3. The decoded form in the query-parameters is the following map:

{ "a": ("1", "3"), "b": "2&4" }

Note that both keys and values are unescaped. If a key is repeated in the query string, the map will contain a sequence of values for that key, as seen for a in this example.
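The two segmented forms can be approximated with Python's urllib.parse, as in the sketch below. This is not the algorithm used by fn:parse-uri; the helper names path_segments and query_parameters are hypothetical, and the sketch only aims to reproduce the two examples given above.

```python
from urllib.parse import unquote, urlsplit


def path_segments(uri):
    # Tokenize the raw path on "/" first, then unescape each segment.
    return [unquote(segment) for segment in urlsplit(uri).path.split("/")]


def query_parameters(uri):
    # Unescape keys and values; repeated keys accumulate a list of values.
    params = {}
    for pair in urlsplit(uri).query.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params.setdefault(unquote(key), []).append(unquote(value))
    return params


print(urlsplit("http://example.com/path/to/a%2fb").path)  # /path/to/a%2fb
print(path_segments("http://example.com/path/to/a%2fb"))  # ['', 'path', 'to', 'a/b']
print(query_parameters("http://example.com/path?a=1&b=2%264&a=3"))
# {'a': ['1', '3'], 'b': ['2&4']}
```

Splitting before unescaping is the design point: decoding %2f first would introduce an extra path separator and change the number of segments.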

Functions and operators on Boolean values

This section defines functions and operators on the xs:boolean datatype.

Boolean constant functions

Since no literals are defined in XPath to reference the constant boolean values true and false, two functions are provided for the purpose.

Operators on Boolean values

The following functions define the semantics of operators on boolean values in and :

The ordering operator op:boolean-less-than is provided for application purposes and for compatibility with . The datatype xs:boolean is not ordered.

Functions on Boolean values

The following functions are defined on boolean values:

Functions and operators on durations

Operators are defined on the following type:

xs:duration

and on the two defined subtypes (see ):

xs:yearMonthDuration

xs:dayTimeDuration

No ordering relation is defined on xs:duration values. Two xs:duration values may however be compared for equality.

Duration data types

A value of type xs:duration is considered to comprise two parts:

The total number of months, represented as a signed integer.

The total number of seconds, represented as a signed decimal number.

If one of these values is negative (less than zero), the other must not be positive (greater than zero).

In effect this means that operations on durations (including equality comparison, casting to string, and extraction of components) all treat the duration as normalized. The duration PT1M30S (one minute and thirty seconds), for example, is precisely equivalent to the duration PT90S (ninety seconds); these are different representations of the same value, and the result of any operation will be the same regardless of which representation is used. For example, the function fn:seconds-from-duration returns 30 in both cases.

The information content of an xs:duration value can be reduced to an xs:integer number of months, and an xs:decimal number of seconds. For the two defined subtypes this is further simplified so that one of these two components is fixed at zero. Operations such as comparison of durations and arithmetic on durations can be expressed in terms of numeric operations applied to these two components.
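The two-part model can be sketched in Python as follows. The class Duration and the helper from_parts are hypothetical names introduced for illustration, and the sketch ignores the limits on range and precision.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Duration:
    months: int       # total number of months
    seconds: Decimal  # total number of seconds

    def __post_init__(self):
        # If one part is negative, the other must not be positive.
        if (self.months < 0 < self.seconds) or (self.seconds < 0 < self.months):
            raise ValueError("months and seconds must not have opposite signs")


def from_parts(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    total_seconds = Decimal(days * 86400 + hours * 3600 + minutes * 60) + Decimal(seconds)
    return Duration(years * 12 + months, total_seconds)


# PT1M30S and PT90S are two representations of the same value.
print(from_parts(minutes=1, seconds=30) == from_parts(seconds=90))  # True
```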

Subtypes of duration

Two subtypes of xs:duration, namely xs:yearMonthDuration and xs:dayTimeDuration, are defined in . These types must be available in the data model whether or not the implementation supports other aspects of XSD 1.1.

The significance of these subtypes is that arithmetic and ordering become well defined; this is not the case for xs:duration values in general, because of the variable number of days in a month. For this reason, many of the functions and operators on durations require the arguments/operands to belong to these two subtypes.

In an xs:yearMonthDuration, the seconds component is always zero. In an xs:dayTimeDuration, the months component is always zero.

Limits and precision

All minimally conforming processors must support duration values in which:

The total number of months can be represented as a signed xs:int value;

The total number of seconds can be represented as a signed xs:decimal value with facets totalDigits=18 and fractionalDigits=3. That is, durations must be supported to millisecond precision.

Processors may support a greater range and/or precision. The limits are implementation-defined.

A processor that limits the range or precision of duration values may encounter overflow and underflow conditions when it tries to evaluate operations on durations. In these situations, the processor must return a zero-length duration in case of duration underflow, and must raise a dynamic error in case of overflow.

Similarly, a processor may be unable accurately to represent the result of dividing a duration by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component of duration values must deliver a result that is as close as possible to the mathematically precise result, given these limits; if two values are equally close, the one that is chosen is implementation-dependent.

Comparison operators on durations

The following comparison operators are defined on the duration datatypes. Each operator takes two operands of the same type and returns an xs:boolean result. As discussed in , the order relation on xs:duration is a partial order rather than a total order. For this reason, only equality is defined on xs:duration. A full complement of comparison and arithmetic functions are defined on the two subtypes of duration described in which do have a total order.

Component extraction functions on durations

The duration datatype may be considered to be a composite datatype in that it contains distinct properties or components. The extraction functions specified below extract a single component from a duration value. For xs:duration and its subtypes, including the two subtypes xs:yearMonthDuration and xs:dayTimeDuration, the components are normalized: this means that the seconds and minutes components will always be less than 60, the hours component less than 24, and the months component less than 12.
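The normalization of components can be sketched in Python, starting from the total months and total seconds of a duration. The function name duration_components is hypothetical; the sketch assumes, in line with the definitions of the extraction functions, that every component of a negative duration carries the negative sign.

```python
from decimal import Decimal


def duration_components(months, seconds):
    """Split total months and total seconds into normalized components."""
    sign = -1 if months < 0 or seconds < 0 else 1
    years, mon = divmod(abs(months), 12)
    days, rest = divmod(abs(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return {
        "years": sign * years,
        "months": sign * mon,
        "days": sign * int(days),
        "hours": sign * int(hours),
        "minutes": sign * int(minutes),
        "seconds": sign * secs,
    }


# PT90S (equivalently PT1M30S): minutes is 1 and seconds is 30.
print(duration_components(0, Decimal(90)))

# 26 months and 93784.5 seconds: P2Y2M1DT2H3M4.5S.
print(duration_components(26, Decimal("93784.5")))
```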

Constructing durations

This section describes the fn:seconds function, which constructs an xs:dayTimeDuration value representing a decimal number of seconds.

Arithmetic operators on durations

For operators that combine a duration and a date/time value, see .

Functions and operators on dates and times

This section defines operations on the date and time types.

See for a disquisition on working with date and time values with and without timezones.

Date and time types

The operators described in this section are defined on the following date and time types:

xs:dateTime

xs:date

xs:time

xs:gYearMonth

xs:gYear

xs:gMonthDay

xs:gMonth

xs:gDay

The only operation defined on xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth and xs:gDay values is equality comparison. For other types, further operations are provided, including component extraction, order comparisons, arithmetic, formatted display, and timezone adjustment.

Limits and precision

All minimally conforming processors must support positive year values with a minimum of 4 digits (i.e., YYYY) and a minimum fractional second precision of 1 millisecond or three digits (i.e., s.sss). However, conforming processors may set larger limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0 and years with negative values. The results of operations on dates that cross the year 0 are implementation-defined.

A processor that limits the number of digits in date and time datatype representations may encounter overflow and underflow conditions when it tries to execute the functions in . In these situations, the processor must return 00:00:00 in case of time underflow. It must raise a dynamic error in case of overflow.

Similarly, a processor that limits the precision of the seconds component of date and time or duration values may need to deliver a rounded result for arithmetic operations. Such a processor must deliver a result that is as close as possible to the mathematically precise result, given these limits: if two values are equally close, the one that is chosen is implementation-dependent.

Date/time datatype values

As defined in , xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay values, referred to collectively as date/time values, are represented as seven components or properties: year, month, day, hour, minute, second and timezone. The first five components are xs:integer values, the value of the second (seconds) component is an xs:decimal, and the value of the timezone component is an xs:dayTimeDuration. For all the primitive date/time datatypes, the timezone property is optional and may or may not be present. Depending on the datatype, some of the remaining six properties must be present and some must be absent. Absent, or missing, properties are represented by the empty sequence. This value is referred to as the local value in that the value retains its original timezone. Before comparing or subtracting xs:dateTime values, this local value must be translated or normalized to UTC.

For xs:time, 00:00:00 and 24:00:00 are alternate lexical forms for the same value, whose canonical representation is 00:00:00. For xs:dateTime, a time component 24:00:00 translates to 00:00:00 of the following day.

Examples

An xs:dateTime with lexical representation 1999-05-31T05:00:00 is represented in the datamodel by { 1999, 5, 31, 5, 0, 0.0, () }.

An xs:dateTime with lexical representation 1999-05-31T13:20:00-05:00 is represented by { 1999, 5, 31, 13, 20, 0.0, xs:dayTimeDuration("-PT5H") }.

An xs:dateTime with lexical representation 1999-12-31T24:00:00 is represented by { 2000, 1, 1, 0, 0, 0.0, () }.

An xs:date with lexical representation 2005-02-28+08:00 is represented by { 2005, 2, 28, (), (), (), xs:dayTimeDuration("PT8H") }.

An xs:time with lexical representation 24:00:00 is represented by { (), (), (), 0, 0, 0, () }.
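The xs:dateTime examples above can be reproduced with a short Python sketch. The function name datetime_components is hypothetical; None stands in for the empty sequence, a datetime.timedelta stands in for the xs:dayTimeDuration timezone, and only the 24:00:00 form is given special treatment.

```python
from datetime import date, timedelta
from decimal import Decimal


def datetime_components(year, month, day, hour, minute, second, timezone=None):
    """Return the seven components of an xs:dateTime as a tuple."""
    if hour == 24:
        # 24:00:00 denotes 00:00:00 on the following day.
        next_day = date(year, month, day) + timedelta(days=1)
        year, month, day, hour = next_day.year, next_day.month, next_day.day, 0
    return (year, month, day, hour, minute, Decimal(second), timezone)


print(datetime_components(1999, 5, 31, 5, 0, 0))
print(datetime_components(1999, 5, 31, 13, 20, 0, timedelta(hours=-5)))
print(datetime_components(1999, 12, 31, 24, 0, 0))  # rolls over to 2000-01-01
```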

Constructing a dateTime

A function is provided for constructing an xs:dateTime value from an xs:date value and an xs:time value.

Comparison operators on duration, date and time values

The following comparison operators are defined on the date/time datatypes. Each operator takes two operands of the same type and returns an xs:boolean result.

also states that the order relation on date and time datatypes is not a total order but a partial order because these datatypes may or may not have a timezone. This is handled as follows. If either operand to a comparison function on date or time values does not have an (explicit) timezone then, for the purpose of the operation, an implicit timezone, provided by the dynamic context, is assumed to be present as part of the value. This creates a total order for all date and time values.
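The role of the implicit timezone can be illustrated with Python's datetime module, in which a value without tzinfo plays the part of a value without an explicit timezone. The helper names effective and less_than, and the sample values, are assumptions made for this sketch.

```python
from datetime import datetime, timedelta, timezone


def effective(value, implicit_tz):
    # Supply the implicit timezone only when the value has none of its own.
    return value if value.tzinfo is not None else value.replace(tzinfo=implicit_tz)


def less_than(a, b, implicit_tz):
    return effective(a, implicit_tz) < effective(b, implicit_tz)


noon_utc = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
ten_local = datetime(2000, 1, 1, 10, 0)  # no explicit timezone

west = timezone(timedelta(hours=-5))
east = timezone(timedelta(hours=5))

print(less_than(ten_local, noon_utc, west))  # False: 10:00-05:00 is 15:00Z
print(less_than(ten_local, noon_utc, east))  # True:  10:00+05:00 is 05:00Z
```

The same pair of values therefore compares differently under different implicit timezones, which is why the result depends on the dynamic context.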

An xs:dateTime can be considered to consist of seven components: year, month, day, hour, minute, second and timezone. For xs:dateTime six components (year, month, day, hour, minute and second) are required and timezone is optional. For other date/time values, of the first six components, some are required and others must be absent. Timezone is always optional. For example, for xs:date, the year, month and day components are required and hour, minute and second components must be absent; for xs:time the hour, minute and second components are required and year, month and day are missing; for xs:gDay, day is required and year, month, hour, minute and second are missing.

In , a new explicitTimezone facet is available with values optional, required, or prohibited to enable the timezone to be defined as mandatory or disallowed.

Values of the date/time datatypes xs:time, xs:gMonthDay, xs:gMonth, and xs:gDay, can be considered to represent a sequence of recurring time instants or time periods. An xs:time occurs every day. An xs:gMonth occurs every year. Comparison operators on these datatypes compare the starting instants of equivalent occurrences in the recurring series. These xs:dateTime values are calculated as described below.

Comparison operators on xs:date, xs:gYearMonth and xs:gYear compare their starting instants. These xs:dateTime values are calculated as described below.

The starting instant of an occurrence of a date/time value is an xs:dateTime calculated by filling in the missing components of the local value from a reference xs:dateTime. An example of a suitable reference xs:dateTime is 1972-01-01T00:00:00. Then, for example, the starting instant corresponding to the xs:date value 2009-03-12 is 2009-03-12T00:00:00; the starting instant corresponding to the xs:time value 13:30:02 is 1972-01-01T13:30:02; and the starting instant corresponding to the gMonthDay value --02-29 is 1972-02-29T00:00:00 (which explains why a leap year was chosen for the reference).
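The filling-in of missing components can be sketched as follows. The function name starting_instant is hypothetical; local values are modelled as seven-element tuples with None for an absent component, and the timezone is passed through untouched because it is never taken from the reference.

```python
from datetime import datetime

# Reference xs:dateTime 1972-01-01T00:00:00 (year, month, day, hour, minute, second).
REFERENCE = (1972, 1, 1, 0, 0, 0)


def starting_instant(local):
    """Fill absent components (None) of a local value from the reference."""
    *fields, tz = local
    filled = [ref if value is None else value for value, ref in zip(fields, REFERENCE)]
    return (*filled, tz)


xs_date = (2009, 3, 12, None, None, None, None)     # 2009-03-12
xs_time = (None, None, None, 13, 30, 2, None)       # 13:30:02
g_month_day = (None, 2, 29, None, None, None, None) # --02-29

print(starting_instant(xs_date))      # (2009, 3, 12, 0, 0, 0, None)
print(starting_instant(xs_time))      # (1972, 1, 1, 13, 30, 2, None)
print(starting_instant(g_month_day))  # (1972, 2, 29, 0, 0, 0, None)

# 1972 is a leap year, so the last result is a real calendar date.
print(datetime(*starting_instant(g_month_day)[:6]))  # 1972-02-29 00:00:00
```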

In the previous version of this specification, the reference date/time chosen was 1972-12-31T00:00:00. While this gives the same results, it produces a "starting instant" for a gMonth or gMonthDay that bears no relation to the ordinary meaning of the term, and it also required special handling of short months. The original choice was made to allow for leap seconds; but since leap seconds are not recognized in date/time arithmetic, this is not actually necessary.

If the xs:time value written as 24:00:00 is to be compared, filling in the missing components gives 1972-01-01T00:00:00, because 24:00:00 is an alternative representation of 00:00:00 (the lexical value "24:00:00" is converted to the time components { 0, 0, 0 } before the missing components are filled in). This has the consequence that when ordering xs:time values, 24:00:00 is considered to be earlier than 23:59:59. However, when ordering xs:dateTime values, a time component of 24:00:00 is considered equivalent to 00:00:00 on the following day.

Note that the reference xs:dateTime does not have a timezone. The timezone component is never filled in from the reference xs:dateTime. In some cases, if the date/time value does not have a timezone, the implicit timezone from the dynamic context is used as the timezone.

This specification uses the reference xs:dateTime 1972-01-01T00:00:00 in the description of the comparison operators. Implementations may use other reference xs:dateTime values as long as they yield the same results. The reference xs:dateTime used must meet the following constraints: when it is used to supply components into xs:gMonthDay values, the year must allow for February 29 and so must be a leap year; when it is used to supply missing components into xs:gDay values, the month must allow for 31 days. Different reference xs:dateTime values may be used for different operators.

Component extraction functions on dates and times

The date and time datatypes may be considered to be composite datatypes in that they contain distinct properties or components. The extraction functions specified below extract a single component from a date or time value. In all cases the local value (that is, the original value as written, without any timezone adjustment) is used.

A time written as 24:00:00 is treated as 00:00:00 on the following day.

Timezone adjustment functions on dates and time values

These functions adjust the timezone component of an xs:dateTime, xs:date or xs:time value. The $timezone argument to these functions is defined as an xs:dayTimeDuration but must be a valid timezone value.

Arithmetic operators on durations, dates and times

These functions support adding or subtracting a duration value to or from an xs:dateTime, an xs:date or an xs:time value. Appendix E of describes an algorithm for performing such operations.

Limits and precision

A processor that limits the number of digits in date and time datatype representations may encounter overflow and underflow conditions when it tries to execute the functions in this section. In these situations, the processor must return P0M or PT0S in case of duration underflow and 00:00:00 in case of time underflow. It must raise a dynamic error in case of overflow.

The value spaces of the two totally ordered subtypes of xs:duration described in are xs:integer months for xs:yearMonthDuration and xs:decimal seconds for xs:dayTimeDuration. If a processor limits the number of digits allowed in the representation of xs:integer and xs:decimal then overflow and underflow situations can arise when it tries to execute the functions in . In these situations the processor must return zero in case of numeric underflow and P0M or PT0S in case of duration underflow. It must raise a dynamic error in case of overflow.

Formatting dates and times

Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar, language, and country. The functions are presented in their customary fashion, except for the rules and examples, which are described en bloc at and .

The date/time formatting functions

The fn:format-dateTime, fn:format-date, and fn:format-time functions format $value as a string using the picture string specified by the $picture argument, the calendar specified by the $calendar argument, the language specified by the $language argument, and the country or other place name specified by the $place argument. The result of the function is the formatted string representation of the supplied xs:dateTime, xs:date, or xs:time value.

The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.

If $value is the empty sequence, the function returns the empty sequence.

Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to an empty sequence.

For details of the $language, $calendar, and $place arguments, see .

In general, the use of an invalid $picture, $language, $calendar, or $place argument results in a dynamic error. By contrast, use of an option in any of these arguments that is valid but not supported by the implementation is not an error, and in these cases the implementation is required to output the value in a fallback representation. More detailed rules are given below.

The picture string

The picture consists of a sequence of variable markers and literal substrings. A substring enclosed in square brackets is interpreted as a variable marker; substrings not enclosed in square brackets are taken as literal substrings. The literal substrings are optional and if present are rendered unchanged, including any whitespace. If an opening or closing square bracket is required within a literal substring, it must be doubled. The variable markers are replaced in the result by strings representing aspects of the date and/or time to be formatted. These are described in detail below.
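As a non-normative illustration, the following Python sketch splits a picture string into literal substrings and variable markers, treating doubled square brackets as literal brackets. The function name tokenize_picture and the (kind, text) tuple representation are assumptions of this sketch, and a Python ValueError stands in for the dynamic error.

```python
def tokenize_picture(picture):
    """Return a list of ("literal", text) and ("marker", text) tuples."""
    tokens, literal = [], []
    i, n = 0, len(picture)

    def flush():
        # Emit any pending literal text as a single token.
        if literal:
            tokens.append(("literal", "".join(literal)))
            literal.clear()

    while i < n:
        ch = picture[i]
        if ch in "[]" and picture[i + 1:i + 2] == ch:
            literal.append(ch)          # doubled bracket: literal bracket
            i += 2
        elif ch == "[":
            close = picture.find("]", i + 1)
            if close == -1:
                raise ValueError("unclosed variable marker")
            # Whitespace within a variable marker is ignored.
            marker = "".join(picture[i + 1:close].split())
            if not marker:
                raise ValueError("empty variable marker")
            flush()
            tokens.append(("marker", marker))
            i = close + 1
        elif ch == "]":
            raise ValueError("unmatched closing bracket")
        else:
            literal.append(ch)
            i += 1
    flush()
    return tokens


print(tokenize_picture("[D1] [MNn], [Y]"))
print(tokenize_picture("[[[H01]]]"))
```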

A variable marker consists of a component specifier followed optionally by one or two presentation modifiers and/or optionally by a width modifier. Whitespace within a variable marker is ignored.

The variable marker may be separated into its components by applying the following rules:

The component specifier is always present and is always a single letter.

The width modifier may be recognized by the presence of a comma.

The substring between the component specifier and the comma (if present) or the end of the string (if there is no comma) contains the first and second presentation modifiers, both of which are optional. If this substring contains a single character, this is interpreted as the first presentation modifier. If it contains more than one character, the last character is examined: if it is valid as a second presentation modifier then it is treated as such, and the preceding part of the substring constitutes the first presentation modifier. Otherwise, the second presentation modifier is presumed absent and the whole substring is interpreted as the first presentation modifier.
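The decomposition rules above can be sketched in Python as follows. This is a non-normative illustration: the function name split_marker is hypothetical, the set of second presentation modifiers is assumed to be the single letters a, t, c, and o, and, where several commas are present, the last one is taken to introduce the width modifier.

```python
SECOND_MODIFIERS = {"a", "t", "c", "o"}  # assumed single-letter forms only


def split_marker(marker):
    """Split the content of a variable marker (without the brackets) into
    (specifier, first_modifier, second_modifier, width_modifier).
    Absent parts are returned as None."""
    marker = "".join(marker.split())        # whitespace is ignored
    if not marker or not marker[0].isalpha():
        raise ValueError("component specifier must be a single letter")
    specifier, rest = marker[0], marker[1:]

    # The last comma, if any, introduces the width modifier.
    body, comma, width = rest.rpartition(",")
    if not comma:
        body, width = rest, None

    first, second = body, None
    if len(body) > 1 and body[-1] in SECOND_MODIFIERS:
        first, second = body[:-1], body[-1]
    return specifier, first or None, second, width


print(split_marker("D1o"))        # ordinal day of month
print(split_marker("FNn,*-3"))    # day name, at most three characters
print(split_marker("Y9,999,*"))   # comma used as a grouping separator
```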

The component specifier indicates the component of the date or time that is required, and takes the following values:

Specifier Meaning Default Presentation Modifier
Y year (absolute value) 1
M month in year 1
D day in month 1
d day in year 1
F day of week n
W week in year 1
w week in month 1
H hour in day (24 hours) 1
h hour in half-day (12 hours) 1
P am/pm marker n
m minute in hour 01
s second in minute 01
f fractional seconds 1
Z timezone 01:01
z timezone (Same as Z, but modified where appropriate to include a prefix as a time offset using GMT, for example GMT+1 or GMT-05:00. For this component there is a fixed prefix of GMT, or a localized variation thereof for the chosen language, and the remainder of the value is formatted as for specifier Z.) 01:01
C calendar: the name or abbreviation of a calendar name n
E era: the name of a baseline for the numbering of years, for example the reign of a monarch n

A dynamic error is reported if the syntax of the picture is incorrect.

A dynamic error is reported if a component specifier within the picture refers to components that are not available in the given type of $value, for example if the picture supplied to the fn:format-time refers to the year, month, or day component.

It is not an error to include a timezone component when the supplied value has no timezone. In these circumstances the timezone component will be ignored.

The first presentation modifier indicates the style in which the value of a component is to be represented. Its value may be either:

any format token permitted as a primary format token in the second argument of the fn:format-integer function, indicating that the value of the component is to be output numerically using the specified number format (for example, 1, 01, i, I, w, W, or Ww) or

the format token n, N, or Nn, indicating that the value of the component is to be output by name, in lower-case, upper-case, or title-case respectively. Components that can be output by name include (but are not limited to) months, days of the week, timezones, and eras. If the processor cannot output these components by name for the chosen calendar and language then it must use a fallback representation.

If a comma is to be used as a grouping separator within the format token, then there must be a width modifier. More specifically: if a variable marker contains one or more commas, then the last comma is treated as introducing the width modifier, and all others are treated as grouping separators. So [Y9,999,*] will output the year as 2,008.

It is not possible to use a closing square bracket as a grouping separator within the format token.

If the implementation does not support the use of the requested format token, it must use the default presentation modifier for that component.

If the first presentation modifier is present, then it may optionally be followed by a second presentation modifier as follows:

Modifier Meaning
either a or t indicates alphabetic or traditional numbering respectively, the default being implementation-defined. This has the same meaning as in the second argument of fn:format-integer.
either c or o indicates cardinal or ordinal numbering respectively, for example 7 or seven for a cardinal number, or 7th or seventh for an ordinal number. This has the same meaning as in the second argument of fn:format-integer. The actual representation of the ordinal form of a number may depend not only on the language, but also on the grammatical context (for example, in some languages it must agree in gender).

Although the formatting rules are expressed in terms of the rules for format tokens in fn:format-integer, the formats actually used may be specialized to the numbering of date components where appropriate. For example, in Italian, it is conventional to use an ordinal number (primo) for the first day of the month, and cardinal numbers (due, tre, quattro ...) for the remaining days. A processor may therefore use this convention to number days of the month, ignoring the presence or absence of the ordinal presentation modifier.

The Width Modifier

Whether or not a presentation modifier is included, a width modifier may be supplied. This indicates the number of characters to be included in the representation of the value.

The width modifier, if present, is introduced by a comma. It takes the form:

   ","  min-width ("-" max-width)?

where min-width is either an unsigned integer indicating the minimum number of characters to be output, or * indicating that there is no explicit minimum, and max-width is either an unsigned integer indicating the maximum number of characters to be output, or * indicating that there is no explicit maximum; if max-width is omitted then * is assumed.

A dynamic error () is raised if min-width is present and less than one, or if max-width is present and less than one or less than min-width.
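A non-normative Python sketch of these rules follows. The function name parse_width_modifier is an assumption; it receives the text after the comma and returns a (min-width, max-width) pair in which None stands for *. A ValueError stands in for the dynamic error.

```python
def parse_width_modifier(text):
    """Parse 'min-width ("-" max-width)?' into a (minimum, maximum) pair."""
    min_text, dash, max_text = text.partition("-")
    if not dash:
        max_text = "*"                  # omitted max-width means "*"

    bounds = []
    for part in (min_text, max_text):
        if part == "*":
            bounds.append(None)         # no explicit bound
        elif part.isdecimal():
            bounds.append(int(part))
        else:
            raise ValueError(f"invalid width modifier: {text!r}")
    minimum, maximum = bounds

    if minimum is not None and minimum < 1:
        raise ValueError("min-width must be at least one")
    if maximum is not None:
        if maximum < 1 or (minimum is not None and maximum < minimum):
            raise ValueError("max-width is less than one or less than min-width")
    return minimum, maximum


print(parse_width_modifier("2"))     # minimum 2, no maximum
print(parse_width_modifier("*-3"))   # no minimum, maximum 3
print(parse_width_modifier("2-4"))   # minimum 2, maximum 4
```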

A format token containing more than one digit, such as 001 or 9999, sets the minimum and maximum width to the number of digits appearing in the format token; if a width modifier is also present, then the width modifier takes precedence.

Formatting Integer-Valued Date/Time Components

The rules in this section apply to the majority of integer-valued components: specifically M D d F W w H h m s.

In the rules below, the term decimal digit pattern has the meaning given in .

If the first presentation modifier takes the form of a decimal digit pattern:

If there is no width modifier, then the value is formatted according to the rules of the fn:format-integer function.

If there is a width modifier, then the first presentation modifier is adjusted as follows:

If the decimal digit pattern includes a grouping separator, the output is implementation-defined (but this is not an error).

Use of a width modifier together with grouping separators is inadvisable for this reason. It is never necessary to use a width modifier with a decimal digit pattern, since the same effect can be achieved by use of optional digit signs.

Otherwise, the number of mandatory-digit-sign characters in the presentation modifier is increased if necessary. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the right, and then prepending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier if there are any, or ASCII digits otherwise.

The maximum width, if specified, is ignored.

The output is then as defined using the fn:format-integer function with this adjusted decimal digit pattern.
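The adjustment of the decimal digit pattern can be sketched as follows. This non-normative Python illustration assumes a pattern made up only of ASCII digits (mandatory-digit-signs) and # characters (optional-digit-signs), with no grouping separators; the function name widen_pattern is hypothetical.

```python
def widen_pattern(pattern, min_width):
    """Increase the number of mandatory-digit-signs in `pattern` until it
    equals `min_width`; the maximum width plays no part."""
    signs = list(pattern)
    mandatory = sum(ch.isdecimal() for ch in signs)

    # First replace optional-digit-signs, starting from the right.
    for index in range(len(signs) - 1, -1, -1):
        if mandatory >= min_width:
            break
        if signs[index] == "#":
            signs[index] = "0"
            mandatory += 1

    # Then prepend mandatory-digit-signs if still too few.
    missing = max(0, min_width - mandatory)
    return "0" * missing + "".join(signs)


print(widen_pattern("1", 2))     # -> 01
print(widen_pattern("##1", 2))   # -> #01
print(widen_pattern("001", 2))   # -> 001 (already wide enough)
```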

If the first presentation modifier is one of N, n, or Nn:

Let FN be the full name of the component, that is, the form of the name that would be used in the absence of any width modifier.

If FN is shorter than the minimum width, then it is padded by appending spaces to the end of the name.

If FN is longer than the maximum width, then it is abbreviated, either by choosing a conventional abbreviation that fits within the maximum width (for example, “Wednesday” might be abbreviated to “Weds”), or by removing characters from the end of FN until it fits within the maximum width.
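A minimal, non-normative Python sketch of the padding and abbreviation rules for names follows. It takes the simpler of the two permitted abbreviation strategies (removing characters from the end) rather than looking up a conventional abbreviation; the function name fit_name is an assumption.

```python
def fit_name(full_name, min_width=None, max_width=None):
    """Pad or abbreviate the full name FN to fit the requested widths.
    None means that no explicit bound was given."""
    if max_width is not None and len(full_name) > max_width:
        # Abbreviate by removing characters from the end of FN.
        return full_name[:max_width]
    if min_width is not None and len(full_name) < min_width:
        # Pad by appending spaces to the end of the name.
        return full_name.ljust(min_width)
    return full_name


print(repr(fit_name("Wednesday", max_width=3)))   # 'Wed'
print(repr(fit_name("May", min_width=5)))         # 'May  '
```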

For other presentation modifiers:

Any adjustment of the value to fit within the requested width range is implementation-defined.

The value should not be truncated if this results in output that will not be meaningful to users (for example, there is no sensible way to truncate Roman numerals).

If shorter than the minimum width, the value should be padded to the minimum width, either by appending spaces, or in some other way appropriate to the numbering scheme.

Formatting the Year Component

The rules for the year component (Y) are the same as those in , except that the value of the year as output is the value of the year component of the supplied value modulo ten to the power N where N is determined as follows:

If the width modifier is present and defines a finite maximum width, then N is that maximum width.

Otherwise, if the first presentation modifier takes the form of a decimal-digit-pattern, then:

Let W be the number of optional-digit-signs and mandatory-digit-signs in that decimal-digit-pattern.

If W is 2 or more, then N is W.

Otherwise, N is infinity (that is, the year is output in full).
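The determination of N can be illustrated by the following non-normative Python sketch. The function name year_to_output is hypothetical; # is assumed to be the optional-digit-sign and any decimal digit a mandatory-digit-sign. The function returns the integer that is subsequently formatted, not the formatted string.

```python
def year_to_output(year, first_modifier="1", max_width=None):
    """Return the year value to be formatted: the absolute year modulo
    10**N, or the full year when N is infinite."""
    if max_width is not None:
        n = max_width
    else:
        # W counts optional and mandatory digit signs; it is zero when the
        # modifier is not a decimal digit pattern (for example "i" or "Ww").
        w = sum(ch == "#" or ch.isdecimal() for ch in first_modifier)
        n = w if w >= 2 else None       # None represents infinity
    value = abs(year)
    return value if n is None else value % 10 ** n


print(year_to_output(2008, "01"))     # 8, which the pattern 01 renders as 08
print(year_to_output(2008, "1"))      # 2008
print(year_to_output(2008, "1", 2))   # 8
```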

Formatting Fractional Seconds

The output for the fractional seconds component (f) is equivalent to the result of the following algorithm:

If the first presentation modifier contains no Unicode digit, then the output is implementation-defined.

Otherwise, the value of the fractional seconds is output as follows:

If there is no width modifier and the first presentation modifier comprises in its entirety a single mandatory-digit-sign (for example the default 1), then the presentation modifier is extended on the right with as many optional-digit-signs as are needed to accommodate the actual fractional seconds precision encountered in the value to be formatted.

If there is a width modifier, then the first presentation modifier is adjusted as follows:

If a minimum width is specified, and if this exceeds the number of mandatory-digit-sign characters in the first presentation modifier, then the first presentation modifier is adjusted. This is done first by replacing optional-digit-signs with mandatory-digit-signs, starting from the left, and then appending mandatory-digit-signs to the presentation modifier, until the number of mandatory-digit-signs is equal to the minimum width. Any mandatory-digit-signs that are added by this process must use the same decimal digit family as existing mandatory-digit-signs in the presentation modifier.

If a maximum width is specified, the first presentation modifier is extended on the right with as many optional-digit-signs as are needed to ensure that the number of mandatory-digit-signs and optional-digit-signs is at least equal to the maximum width.

The sequence of characters in the (adjusted) first presentation modifier is reversed (for example, 999'### becomes ###'999). If the result is not a valid decimal digit pattern, then the output is implementation-defined.

The sequence of digits in the conventional decimal representation of the fractional seconds component is reversed, with insignificant zeroes removed, and the result is treated as an integer. For example, if the seconds value is 25.8235, the reversed fractional seconds value is 5328.

The reversed fractional seconds value is formatted using the reversed decimal digit pattern according to the rules of the fn:format-integer function. Given the examples above, the result is 5'328.

The resulting string is reversed. In our example, the result is 823'5.

If the result contains more digits than the number of mandatory-digit-signs and optional-digit-signs in the decimal digit pattern, then excess digits are removed from the right hand end (that is, the value is truncated towards zero rather than being rounded). Any grouping separator that immediately precedes a removed digit is also removed.

The reason for presenting the algorithm in this way is that it enables maximum reuse of the rules defined for fn:format-integer. Since the fractional seconds value is not properly an integer, the rules do not work if used directly: for example, the positions of grouping separators need to be counted from the left rather than from the right. Implementations, as always, are free to use a different algorithm that yields the same result.

A format token consisting of a single digit, such as 1, does not constrain the number of digits in the output. In the case of fractional seconds in particular, [f001] requests three decimal digits, [f01] requests two digits, but [f1] will retain all digits in the supplied date/time value (the maximum number of digits is implementation-defined). If exactly one digit is required, this can be achieved using the variable marker [f1,1-1].
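For the common case of a pattern without grouping separators, the effect of the algorithm can be reproduced without the reversal steps. The following non-normative Python sketch does this directly; it assumes ASCII digits as mandatory-digit-signs and # as the optional-digit-sign, and it treats the absence of both widths as the absence of a width modifier. The function name format_fraction is hypothetical.

```python
from decimal import Decimal


def format_fraction(seconds, pattern="1", min_width=None, max_width=None):
    """Format the fractional part of `seconds` (a Decimal) as digits."""
    mandatory = sum(ch.isdecimal() for ch in pattern)
    optional = pattern.count("#")
    has_width = min_width is not None or max_width is not None

    # A lone mandatory digit with no width modifier keeps every digit.
    unlimited = not has_width and len(pattern) == 1 and pattern.isdecimal()

    if min_width is not None and min_width > mandatory:
        # Promote optional digit signs, then add mandatory ones.
        optional = max(0, optional - (min_width - mandatory))
        mandatory = min_width
    if max_width is not None and mandatory + optional < max_width:
        optional = max_width - mandatory

    # Fractional digits with insignificant (trailing) zeroes removed.
    digits = format(seconds % 1, "f").partition(".")[2].rstrip("0")
    digits = digits.ljust(mandatory, "0")       # pad up to mandatory digits
    if not unlimited:
        digits = digits[:mandatory + optional]  # truncate, never round
    return digits


value = Decimal("25.8235")
print(format_fraction(value))              # [f1]     -> 8235
print(format_fraction(value, "001"))       # [f001]   -> 823
print(format_fraction(value, "1", 1, 1))   # [f1,1-1] -> 8
```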

Formatting timezones

Special rules apply to the formatting of timezones. When the component specifiers Z or z are used, the rules in this section override any rules given elsewhere in the case of discrepancies.

If the date/time value to be formatted does not include a timezone offset, then the timezone component specifier is generally ignored (results in no output). The exception is where military timezones are used (format ZZ) in which case the string "J" is output, indicating local time.

When the component specifier is z, the output is the same as for component specifier Z, except that it is prefixed by the characters GMT or some localized equivalent. The prefix is omitted, however, in cases where the timezone is identified by name rather than by a numeric offset from UTC.

If the first presentation modifier is numeric and comprises one or two digits with no grouping-separator (for example 1 or 01), then the timezone is formatted as a displacement from UTC in hours, preceded by a plus or minus sign: for example -5 or +03. If the actual timezone offset is not an integral number of hours, then the minutes part of the offset is appended, separated by a colon: for example +10:30 or -1:15.

If the first presentation modifier is numeric with a grouping-separator (for example 1:01 or 01.01), then the timezone offset is output in hours and minutes, separated by the grouping separator, even if the number of minutes is zero: for example +5:00 or +10.30.

If the first presentation modifier is numeric and comprises three or four digits with no grouping-separator, for example 001 or 0001, then the timezone offset is shown in hours and minutes with no separator, for example -0500 or +1030.

If the first presentation modifier is numeric, in any of the above formats, and the second presentation modifier is t, then a zero timezone offset (that is, UTC) is output as Z instead of a signed numeric value. If this presentation modifier is absent or if the timezone offset is non-zero, then the displayed timezone offset is preceded by a - sign for negative offsets or a + sign for non-negative offsets.
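The numeric timezone formats can be sketched in Python as follows. This is a non-normative illustration: the function name format_offset is hypothetical, the offset is supplied in minutes, only ASCII digits and a single grouping separator are handled, and the split of a three-digit pattern into one hour digit and two minute digits is an assumption of the sketch.

```python
import re


def format_offset(offset_minutes, modifier="01:01", second=None):
    """Format a timezone offset given in minutes east of UTC."""
    if second == "t" and offset_minutes == 0:
        return "Z"                                  # UTC shown as Z
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)

    grouped = re.fullmatch(r"(\d+)(\D)(\d+)", modifier)
    if grouped:
        # Hours and minutes separated by the grouping separator.
        h_pat, sep, m_pat = grouped.groups()
        return f"{sign}{hours:0{len(h_pat)}d}{sep}{minutes:0{len(m_pat)}d}"
    if modifier.isdecimal() and len(modifier) <= 2:
        # Hours only; minutes are appended when they are non-zero.
        text = f"{sign}{hours:0{len(modifier)}d}"
        return text + (f":{minutes:02d}" if minutes else "")
    if modifier.isdecimal() and len(modifier) in (3, 4):
        # Hours and minutes with no separator.
        return f"{sign}{hours:0{len(modifier) - 2}d}{minutes:02d}"
    raise ValueError(f"unsupported numeric modifier: {modifier!r}")


print(format_offset(-300, "0"))            # -5
print(format_offset(330, "0"))             # +5:30
print(format_offset(-300, "0000"))         # -0500
print(format_offset(0, "00:00", "t"))      # Z
```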

If the first presentation modifier is Z, then the timezone is formatted as a military timezone letter, using the convention Z = +00:00, A = +01:00, B = +02:00, ..., M = +12:00, N = -01:00, O = -02:00, ... Y = -12:00. The letter J (meaning local time) is used in the case of a value that does not specify a timezone offset. Timezone offsets that have no representation in this system (for example Indian Standard Time, +05:30) are output as if the format 01:01 had been requested.
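The military timezone convention can be sketched as follows (non-normative). The function name military_timezone is hypothetical; the offset is given in minutes, None represents a value with no timezone, and the letter J is assumed to be skipped in the positive series because it is reserved for local time.

```python
EAST = "ABCDEFGHIKLM"   # +01:00 .. +12:00 (J is reserved for local time)
WEST = "NOPQRSTUVWXY"   # -01:00 .. -12:00


def military_timezone(offset_minutes):
    """Return the military timezone letter for an offset in minutes."""
    if offset_minutes is None:
        return "J"                      # no timezone: local time
    hours, minutes = divmod(abs(offset_minutes), 60)
    if minutes == 0 and hours <= 12:
        if hours == 0:
            return "Z"
        return (EAST if offset_minutes > 0 else WEST)[hours - 1]
    # No letter exists: fall back to the 01:01 format.
    sign = "-" if offset_minutes < 0 else "+"
    return f"{sign}{hours:02d}:{minutes:02d}"


print(military_timezone(-600))   # W
print(military_timezone(330))    # +05:30
print(military_timezone(None))   # J
```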

If the first presentation modifier is N, then the timezone is output (where possible) as a timezone name, for example EST or CET. The same timezone offset has different names in different places; it is therefore recommended that this option should be used only if a country code (see ) or IANA timezone name (see ) is supplied in the $place argument. In the absence of this information, the implementation may apply a default, for example by using the timezone names that are conventional in North America. If no timezone name can be identified, the timezone offset is output using the fallback format 01:01.

The following examples illustrate options for timezone formatting.

Output for each timezone offset (with time = 12:00:00)
Variable marker | $place | -10:00 | -05:00 | +00:00 | +05:30 | +13:00
[Z] | () | -10:00 | -05:00 | +00:00 | +05:30 | +13:00
[Z0] | () | -10 | -5 | +0 | +5:30 | +13
[Z0:00] | () | -10:00 | -5:00 | +0:00 | +5:30 | +13:00
[Z00:00] | () | -10:00 | -05:00 | +00:00 | +05:30 | +13:00
[Z0000] | () | -1000 | -0500 | +0000 | +0530 | +1300
[Z00:00t] | () | -10:00 | -05:00 | Z | +05:30 | +13:00
[z] | () | GMT-10:00 | GMT-05:00 | GMT+00:00 | GMT+05:30 | GMT+13:00
[ZZ] | () | W | R | Z | +05:30 | +13:00
[ZN] | "us" | HST | EST | GMT | IST | +13:00
[H00]:[m00] [ZN] | "America/New_York" | 17:00 EST | 12:00 EST | 07:00 EST | 01:30 EST | 18:00 EST

If a width modifier is present when formatting a timezone, then the representation as defined in this section is padded to the minimum width as described in , but it is never shortened.

Formatting Other Components

This section applies to the remaining components: P (am/pm marker), C (calendar), and E (era).

The output for these components is entirely implementation-defined. The default presentation modifier for these components is n, indicating that they are output as names (or conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see .

The language, calendar, and place arguments

The set of languages, calendars, and places that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is an empty sequence, an implementation-defined default value is used.

If the fallback representation uses a different calendar from that requested, the output string must identify the calendar actually used, for example by prefixing the string with [Calendar: X] (where X is the calendar actually used), localized as appropriate to the requested language. If the fallback representation uses a different language from that requested, the output string must identify the language actually used, for example by prefixing the string with [Language: Y] (where Y is the language actually used) localized in an implementation-dependent way. If a particular component of the value cannot be output in the requested format, it should be output in the default format for that component.

The $language argument specifies the language to be used for the result string of the function. The value of the argument should be either the empty sequence or a value that would be valid for the xml:lang attribute (see ). Note that this permits the identification of sublanguages based on country codes (from ) as well as identification of dialects and of regions within a country.

If the $language argument is omitted or is set to an empty sequence, or if it is set to an invalid value or a value that the implementation does not recognize, then the processor uses the default language defined in the dynamic context.

The language is used to select the appropriate language-dependent forms of:

names (for example, of months)

numbers expressed as words or as ordinals (twenty, 20th, twentieth)

hour convention (0-23 vs 1-24, 0-11 vs 1-12)

first day of week, first week of year

Where appropriate this choice may also take into account the value of the $place argument, though this should not be used to override the language or any sublanguage that is specified as part of the language argument.

The choice of the names and abbreviations used in any given language is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices.

Where ordinal numbers are used, the selection of the correct representation of the ordinal (for example, the grammatical gender) may depend on the component being formatted and on its textual context in the picture string.

The $calendar argument specifies that the dateTime, date, or time supplied in the $value argument must be converted to a value in the specified calendar and then converted to a string using the conventions of that calendar.

The calendar value if present must be a valid EQName (dynamic error: ). If it is a lexical QName then it is expanded into an expanded QName using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace. If the expanded QName is in no namespace, then it must identify a calendar with a designator specified below (dynamic error: ). If the expanded QName is in a namespace then it identifies the calendar in an implementation-defined way.

If the $calendar argument is omitted or is set to an empty sequence then the default calendar defined in the dynamic context is used.

The calendars listed below were known to be in use during the last hundred years. Many other calendars have been used in the past.

This specification does not define any of these calendars, nor the way that they map to the value space of the xs:date datatype in . There may be ambiguities when dates are recorded using different calendars. For example, the start of a new day is not simultaneous in different calendars, and may also vary geographically (for example, based on the time of sunrise or sunset). Translation of dates is therefore more reliable when the time of day is also known, and when the geographic location is known. When translating dates between one calendar and another, the processor may take account of the values of the $place and/or $language arguments, with the $place argument taking precedence.

Information about some of these calendars, and algorithms for converting between them, may be found in .

Designator Calendar
AD Anno Domini (Christian Era)
AH Anno Hegirae (Islamic Era)
AME Mauludi Era (solar years since Muhammad’s birth)
AM Anno Mundi (Jewish Calendar)
AP Anno Persici
AS Aji Saka Era (Java)
BE Buddhist Era
CB Cooch Behar Era
CE Common Era
CL Chinese Lunar Era
CS Chula Sakarat Era
EE Ethiopian Era
FE Fasli Era
ISO ISO 8601 calendar
JE Japanese Calendar
KE Khalsa Era (Sikh calendar)
KY Kali Yuga
ME Malabar Era
MS Monarchic Solar Era
NS Nepal Samwat Era
OS Old Style (Julian Calendar)
RS Rattanakosin (Bangkok) Era
SE Saka Era
SH Solar Hijri (Islamic Era, used in Iran and Afghanistan)
SS Saka Samvat
TE Tripurabda Era
VE Vikrama Era
VS Vikrama Samvat Era

At least one of the above calendars must be supported. It is implementation-defined which calendars are supported.

The ISO 8601 calendar (), which is included in the above list and designated ISO, is very similar to the Gregorian calendar designated AD, but it differs in several ways. The ISO calendar is intended to ensure that date and time formats can be read easily by other software, as well as being legible for human users. The ISO calendar prescribes the use of particular numbering conventions as defined in ISO 8601, rather than allowing these to be localized on a per-language basis. In particular it provides a numeric “week date” format which identifies dates by year, week of the year, and day in the week; in the ISO calendar the days of the week are numbered from 1 (Monday) to 7 (Sunday), and week 1 in any calendar year is the week (from Monday to Sunday) that includes the first Thursday of that year. The numeric values of the components year, month, day, hour, minute, and second are the same in the ISO calendar as the values used in the lexical representation of the date and time as defined in . The era (E component) with this calendar is either a minus sign (for negative years) or a zero-length string (for positive years). For dates before 1 January, AD 1, year numbers in the ISO and AD calendars are off by one from each other: ISO year 0000 is 1 BC, -0001 is 2 BC, etc.

ISO 8601 does not define a numbering for weeks within a month. When the w component is used, the convention to be adopted is that each Monday-to-Sunday week is considered to fall within a particular month if its Thursday occurs in that month; the weeks that fall in a particular month under this definition are numbered starting from 1. Thus, for example, 29 January 2013 falls in week 5 because the Thursday of the week (31 January 2013) is the fifth Thursday in January, and 1 February 2013 is also in week 5 for the same reason.
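The week-in-month convention can be illustrated with the following non-normative Python sketch. The function name week_in_month is hypothetical; it returns both the month to which the week is assigned and the week number within that month.

```python
from datetime import date, timedelta


def week_in_month(day):
    """Return (month, week) for the Monday-to-Sunday week containing `day`."""
    # date.weekday() numbers Monday as 0, so Thursday is 3.
    thursday = day + timedelta(days=3 - day.weekday())
    # The week belongs to the month containing its Thursday; its number is
    # the ordinal position of that Thursday within the month.
    return thursday.month, (thursday.day - 1) // 7 + 1


print(week_in_month(date(2013, 1, 29)))   # (1, 5)
print(week_in_month(date(2013, 2, 1)))    # (1, 5)
```

Both dates fall in the week whose Thursday is 31 January 2013, the fifth Thursday of January.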

The value space of the date and time datatypes, as defined in XML Schema, is based on absolute points in time. The lexical space of these datatypes defines a representation of these absolute points in time using the proleptic Gregorian calendar, that is, the modern Western calendar extrapolated into the past and the future; but the value space is calendar-neutral. The date formatting functions produce a representation of this absolute point in time, but denoted in a possibly different calendar. So, for example, the date whose lexical representation in XML Schema is 1502-01-11 (the day on which Pope Gregory XIII was born) might be formatted using the Old Style (Julian) calendar as 1 January 1502. This reflects the fact that there was at that time a ten-day difference between the two calendars. It would be incorrect, and would produce incorrect results, to represent this date in an element or attribute of type xs:date as 1502-01-01, even though this might reflect the way the date was recorded in contemporary documents.

When referring to years occurring in antiquity, modern historians generally use a numbering system in which there is no year zero (the year before 1 CE is thus 1 BCE). This is the convention that should be used when the requested calendar is OS (Julian) or AD (Gregorian). When the requested calendar is ISO, however, the conventions of ISO 8601 should be followed: here the year before +0001 is numbered zero. In (version 1.0), the value space for xs:date and xs:dateTime does not include a year zero: however, XSD 1.1 endorses the ISO 8601 convention. This means that the date on which Julius Caesar was assassinated has the ISO 8601 lexical representation -0043-03-13, but will be formatted as 15 March 44 BCE in the Julian calendar or 13 March 44 BCE in the Gregorian calendar (dependent on the chosen localization of the names of months and eras).
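The difference between the two year-numbering conventions is a simple offset for years before 1 CE. The following Python sketch illustrates it; the function name historian_year is hypothetical, and the sketch assumes the ISO 8601 / XSD 1.1 convention in which year zero exists.

```python
def historian_year(iso_year: int) -> str:
    """Map an ISO 8601 year number (which includes a year zero) to the
    numbering used by historians (which has no year zero)."""
    if iso_year > 0:
        return f"{iso_year} CE"
    # ISO year 0 is 1 BCE, ISO year -1 is 2 BCE, and so on.
    return f"{1 - iso_year} BCE"


print(historian_year(-43))  # the year of Caesar's assassination
print(historian_year(0))
print(historian_year(1))
```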

The intended use of the $place argument is to identify the place where an event represented by the dateTime, date, or time supplied in the $value argument took place or will take place. If the $place argument is omitted or is set to an empty sequence, then the default place defined in the dynamic context is used. If the value is supplied, and is not the empty sequence, then it should either be a country code or an IANA timezone name. If the value does not take this form, or if its value is not recognized by the implementation, then the default place defined in the dynamic context is used.

Country codes are defined in . Examples are "de" for Germany and "jp" for Japan. Implementations may also allow the use of codes representing subdivisions of a country from ISO 3166-2, or codes representing formerly used names of countries from ISO 3166-3.

IANA timezone names are defined in the IANA timezone database . Examples are "America/New_York" and "Europe/Rome".

This argument is not intended to identify the location of the user for whom the date or time is being formatted; that should be done by means of the $language argument. The $place value may be used to provide additional information when converting dates between calendars or when deciding how individual components of the date and time are to be formatted. For example, different countries using the Old Style (Julian) calendar started the new year on different days, and some countries used variants of the calendar that were out of synchronization as a result of differences in calculating leap years.

The geographical area identified by a country code is defined by the boundaries as they existed at the time of the date to be formatted, or the present-day boundaries for dates in the future.

If the $place argument is supplied in the form of an IANA timezone name that is recognized by the implementation, then the date or time being formatted is adjusted to the timezone offset applicable in that timezone. For example, if the xs:dateTime value 2010-02-15T12:00:00Z is formatted with the $place argument set to America/New_York, then the output will be as if the value 2010-02-15T07:00:00-05:00 had been supplied. This adjustment takes daylight savings time into account where possible; if the date in question falls during daylight savings time in New York, then it is adjusted to timezone offset -PT4H rather than -PT5H. Adjustment using daylight savings time is only possible where the value includes a date, and where the date is within the range covered by the timezone database.
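The timezone adjustment can be illustrated with any library that reads the IANA timezone database. The sketch below uses Python's zoneinfo module and assumes that timezone data is installed on the host; the function name adjust_to_place is hypothetical and is not part of this specification.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # requires IANA timezone data on the host


def adjust_to_place(value: datetime, tz_name: str) -> datetime:
    """Adjust an absolute point in time to the offset that applies in the
    named IANA timezone on that date, including daylight saving time."""
    return value.astimezone(ZoneInfo(tz_name))


winter = adjust_to_place(
    datetime(2010, 2, 15, 12, 0, tzinfo=timezone.utc), "America/New_York")
summer = adjust_to_place(
    datetime(2010, 7, 15, 12, 0, tzinfo=timezone.utc), "America/New_York")

print(winter.isoformat())  # offset -05:00 applies in February
print(summer.isoformat())  # offset -04:00 applies in July
```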

Examples of date and time formatting

The following examples show a selection of dates and times and the way they might be formatted. These examples assume the use of the Gregorian calendar as the default calendar.

Required Output Expression
2002-12-31 format-date($d, "[Y0001]-[M01]-[D01]")
12-31-2002 format-date($d, "[M]-[D]-[Y]")
31-12-2002 format-date($d, "[D]-[M]-[Y]")
31 XII 2002 format-date($d, "[D1] [MI] [Y]")
31st December, 2002 format-date($d, "[D1o] [MNn], [Y]", "en", (), ())
31 DEC 2002 format-date($d, "[D01] [MN,*-3] [Y0001]", "en", (), ())
December 31, 2002 format-date($d, "[MNn] [D], [Y]", "en", (), ())
31 Dezember, 2002 format-date($d, "[D] [MNn], [Y]", "de", (), ())
Tisdag 31 December 2002 format-date($d, "[FNn] [D] [MNn] [Y]", "sv", (), ())
[2002-12-31] format-date($d, "[[[Y0001]-[M01]-[D01]]]")
Two Thousand and Three format-date($d, "[YWw]", "en", (), ())
einunddreißigste Dezember format-date($d, "[Dwo] [MNn]", "de", (), ())
3:58 PM format-time($t, "[h]:[m01] [PN]", "en", (), ())
3:58:45 pm format-time($t, "[h]:[m01]:[s01] [Pn]", "en", (), ())
3:58:45 PM PDT format-time($t, "[h]:[m01]:[s01] [PN] [ZN,*-3]", "en", (), ())
3:58:45 o'clock PM PDT format-time($t, "[h]:[m01]:[s01] o'clock [PN] [ZN,*-3]", "en", (), ())
15:58 format-time($t, "[H01]:[m01]")
15:58:45.762 format-time($t, "[H01]:[m01]:[s01].[f001]")
15:58:45 GMT+02:00 format-time($t, "[H01]:[m01]:[s01] [z,6-6]", "en", (), ())
15.58 Uhr GMT+2 format-time($t, "[H01].[m01] Uhr [z]", "de", (), ())
3.58pm on Tuesday, 31st December format-dateTime($dt, "[h].[m01][Pn] on [FNn], [D1o] [MNn]")
12/31/2002 at 15:58:45 format-dateTime($dt, "[M01]/[D01]/[Y0001] at [H01]:[m01]:[s01]")

The following examples use calendars other than the Gregorian calendar.

Description Request Result
Islamic format-date($d, "[D&#x0661;] [Mn] [Y&#x0661;]", "ar", "AH", ()) ٢٦ ﺸﻭّﺍﻝ ١٤٢٣
Jewish (with Western numbering) format-date($d, "[D] [Mn] [Y]", "he", "AM", ()) ‏26 טבת 5763
Jewish (with traditional numbering) format-date($d, "[D&#x05D0;t] [Mn] [Y&#x05D0;t]", "he", "AM", ()) כ״ו טבת תשס״ג
Julian (Old Style) format-date($d, "[D] [MNn] [Y]", "en", "OS", ()) 18 December 2002
Thai format-date($d, "[D&#x0E51;] [Mn] [Y&#x0E51;]", "th", "BE", ()) ๓๑ ธันวาคม ๒๕๔๕
Parsing dates and times

A function is provided to parse dates and times expressed using syntax that is commonly encountered in internet protocols.

Functions related to QNames

Functions to create a QName

In addition to the xs:QName constructor function, QName values can be constructed by combining a namespace URI, prefix, and local name, or by resolving a lexical QName against the in-scope namespaces of an element node. This section defines functions that perform these operations. Leading and trailing whitespace, if present, is stripped from string arguments before the result is constructed.

Functions and operators on QNames

This section specifies functions and an operator on QNames as defined in .

Operators on base64Binary and hexBinary

Comparisons of base64Binary and hexBinary values

The following comparison operators on xs:hexBinary and xs:base64Binary values are defined. Each returns a boolean value.

These functions can be used to compare any xs:hexBinary or xs:base64Binary value with any other xs:hexBinary or xs:base64Binary value: both types have the same value space, namely a sequence of octets which are treated as integers in the range 0 to 255.
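Because both types share one value space, a comparison can be pictured as decoding each lexical form to octets and comparing the octet sequences. The Python sketch below illustrates this idea only; the helper names are hypothetical, and Python's bytes comparison is used as a stand-in for comparison of octet sequences treated as integers in the range 0 to 255.

```python
import base64


def octets_from_hex(lexical: str) -> bytes:
    """Decode the lexical form of an xs:hexBinary value to its octets."""
    return bytes.fromhex(lexical)


def octets_from_base64(lexical: str) -> bytes:
    """Decode the lexical form of an xs:base64Binary value to its octets."""
    return base64.b64decode(lexical, validate=True)


# The same five octets, written in the two lexical forms.
hex_value = octets_from_hex("48656C6C6F")
b64_value = octets_from_base64("SGVsbG8=")

print(hex_value == b64_value)
# Python compares bytes octet by octet as unsigned integers.
print(octets_from_hex("01FF") < octets_from_hex("02"))
```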

Operators on NOTATION

This section specifies operators that take xs:NOTATION values as arguments.

Functions and operators on sequences

A sequence is an ordered collection of zero or more items. An item is a node, an atomic value, or a function, such as a map or an array. The terms sequence and item are defined formally in and .

General functions and operators on sequences

The following functions are defined on sequences. These functions work on any sequence, without performing any operations that are sensitive to the individual items in the sequence.

As in the previous section, for the illustrative examples below, assume an XQuery or transformation operating on a non-empty Purchase Order document containing a number of line-item elements. The variable $seq is bound to the sequence of line-item nodes in document order. The variables $item1, $item2, etc. are bound to separate, individual line-item nodes in the sequence.

Comparison functions

The functions in this section perform comparisons between the items in one or more sequences.

Functions that test the cardinality of sequences

The following functions test the cardinality of their sequence arguments.

The functions fn:zero-or-one, fn:one-or-more, and fn:exactly-one, defined in this section, check that the cardinality of a sequence is in the expected range. They are particularly useful with regard to static typing. For example, the function call fn:remove($seq, fn:index-of($seq2, 'abc')) requires the result of the call on fn:index-of to be a singleton integer, but the static type system cannot infer this; writing the expression as fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc'))) will provide a suitable static type at query analysis time, and will ensure that the length of the sequence is correct with a dynamic check at query execution time.

The type signatures for these functions deliberately declare the argument type as item()*, permitting a sequence of any length. A more restrictive signature would defeat the purpose of the function, which is to defer cardinality checking until query execution time.
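The run-time behaviour of these checks can be sketched outside XPath. The following Python fragment is an illustration only: sequences are modelled as Python lists, the function names mirror the XPath names, and the error codes in the messages are those associated with these functions in earlier versions of this specification, quoted here as an assumption rather than as normative text.

```python
from typing import Any, List


def zero_or_one(seq: List[Any]) -> List[Any]:
    """Return seq unchanged if it has at most one item."""
    if len(seq) > 1:
        raise ValueError("FORG0003: sequence contains more than one item")
    return seq


def one_or_more(seq: List[Any]) -> List[Any]:
    """Return seq unchanged if it has at least one item."""
    if len(seq) == 0:
        raise ValueError("FORG0004: sequence contains no items")
    return seq


def exactly_one(seq: List[Any]) -> List[Any]:
    """Return seq unchanged if it has exactly one item."""
    if len(seq) != 1:
        raise ValueError("FORG0005: sequence does not contain exactly one item")
    return seq


# Any list is accepted as an argument; the cardinality is checked only
# when the call is evaluated, mirroring the item()* signature.
print(exactly_one([3]))
print(zero_or_one([]))
```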

Aggregate functions

Aggregate functions take a sequence as argument and return a single value computed from values in the sequence. Except for fn:count, the sequence must consist of values of a single type or one of its subtypes, or they must be numeric. xs:untypedAtomic values are permitted in the input sequence and handled by special conversion rules. The type of the items in the sequence must also support certain operations.

Functions on node identifiers

This section defines a number of functions used to find elements by ID or IDREF value, or to generate IDs.

Functions giving access to external information

The functions in this section provide access to resources (such as files) in the external environment.

Parsing and serializing

These functions convert between the lexical representation and XPath and XQuery data model representation of various file formats.

Functions on XML Data

These functions convert between the lexical representation of XML and the tree representation.

(The fn:serialize function also handles HTML and JSON output, but is included in this section for editorial convenience.)

Functions on HTML Data

These functions convert between the lexical representation of HTML and the tree representation.

XDM Mapping from HTML DOM Nodes

The fn:parse-html function conceptually works in two phases:

The lexical HTML (supplied as a string) is parsed into an HTML DOM as defined by the HTML5 specification: see and .

The resulting DOM is converted to an XDM tree as described in this section. This is described by defining the actions of the accessor functions defined in .

Because the and are not fixed, it is implementation-defined which versions are used.

An implementation must match the semantics of the mapping described in this section, but the specific way it achieves that is implementation-dependent.

Some possible implementation strategies are:

Parse the HTML to an HTML DOM and then convert the HTML DOM to an XDM node tree.

Parse the HTML to an HTML DOM and then implement a wrapper or facade that presents an XDM interface to the HTML DOM.

Parse the lexical HTML directly to an XDM node tree, bypassing the HTML DOM.

The defines parsing algorithms for two different formats, which it refers to as the HTML and XML serializations. The XML serialization is an XML document which typically uses the namespace http://www.w3.org/1999/xhtml and the content type application/xhtml+xml, and is popularly referred to as XHTML. The HTML parsing algorithm constructs an HTML DOM HTMLDocument document object for the HTML document. The XHTML parsing algorithm constructs an HTML DOM XMLDocument object for the HTML document, following XML parsing rules. This mapping supports both of these document types.

The specification defines HTML DOM nodes that are mapped to XDM nodes as follows:

The HTML DOM Document interface maps to .

The HTML DOM Element interface maps to .

The HTML DOM Attr interface maps to .

Any HTML DOM Attr instances in an HTML DOM HTMLDocument that represent namespace declarations will have been filtered out: see .

The HTML DOM ProcessingInstruction interface maps to .

The HTML parsing algorithm does not generate processing instruction nodes. If encountered they are parsed as comment nodes. The HTML DOM ProcessingInstruction interface is relevant only when the XHTML parsing algorithm is used.

The HTML DOM Comment interface maps to .

The HTML DOM Text interface maps to . Adjacent HTML DOM Text nodes are combined into a single .

The HTML DOM CDATASection interface is an instance of HTML DOM Text, so CDATA sections also map to .

The use of CDATA sections can result in the HTML DOM containing adjacent text nodes, which the mapping to XDM will merge into a single node.

The HTML DOM DocumentFragment interface is not supported as an XML node. There are two places in the HTML DOM where this is used:

The HTML DOM ShadowRoot interface is not present in the main HTML DOM tree. It is only accessible via JavaScript.

The template element’s content property contains the child nodes of the template element. The behaviour of this is defined by the include-template-content key in the map.

If an implementation allows these nodes to be passed in via an API or similar mechanism, their behaviour is implementation-defined.

attributes Accessor

The result of the dm:attributes($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Element then the result is the value of the Element.attributes property mapped to a sequence as described below;

Otherwise, the result is an empty sequence.

An HTML DOM NamedNodeMap is mapped to a sequence as follows:

NamedNodeMap.length is the length of the sequence, where a length of 0 results in an empty sequence;

NamedNodeMap.item(n) is the nth element of the sequence.

That sequence is then filtered as follows:

If the Attr.namespaceURI property is "http://www.w3.org/2000/xmlns/", the attribute is not included in this sequence;

If the Attr.localName property is "xmlns", the attribute is not included in this sequence;

If the Attr.localName property starts with "xmlns:", the attribute is not included in this sequence;

Otherwise, the attribute is included in this sequence using the XDM mapping rules described in this section.

The HTML DOM Element.attributes property includes namespace and non-namespace attributes in the list when the HTML or XML parser is used. As such, the namespace attributes have to be filtered from the resulting XDM attribute sequence.

When the resulting document is an HTML DOM HTMLDocument, the Attr.localName and Attr.name properties of HTML DOM Attr nodes are both set to the qualified name. This includes namespace declarations which are filtered out by the logic in this section.

The Attr.localName property will be ASCII lowercase. The section 13.2.5.33, Attribute name state specifies that ASCII upper alpha characters are appended to the attribute’s name in lowercase.
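The filtering rules for the attributes accessor can be sketched as follows. This Python fragment is an illustration under simplifying assumptions: the Attr class is a hypothetical stand-in for an HTML DOM Attr carrying only the properties the rules inspect, and the NamedNodeMap is modelled as a plain list.

```python
from dataclasses import dataclass
from typing import List, Optional

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


@dataclass
class Attr:
    """Hypothetical stand-in for an HTML DOM Attr node."""
    local_name: str
    value: str
    namespace_uri: Optional[str] = None


def xdm_attributes(named_node_map: List[Attr]) -> List[Attr]:
    """Drop namespace declarations, keeping every other attribute."""
    kept = []
    for attr in named_node_map:
        if attr.namespace_uri == XMLNS_NAMESPACE:
            continue  # declaration produced by the XML parser
        if attr.local_name == "xmlns":
            continue  # default namespace declaration
        if attr.local_name.startswith("xmlns:"):
            continue  # prefixed declaration seen by the HTML parser
        kept.append(attr)
    return kept


attrs = [
    Attr("xmlns", "http://www.w3.org/1999/xhtml"),
    Attr("xmlns:xlink", "http://www.w3.org/1999/xlink"),
    Attr("svg", "http://www.w3.org/2000/svg", XMLNS_NAMESPACE),
    Attr("class", "note"),
]
print([a.local_name for a in xdm_attributes(attrs)])
```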

base-uri Accessor

The result of the dm:base-uri($node) for an HTML DOM Node is the value of the Node.baseURI property mapped as follows:

If the value is null or an empty string, then the result is an empty sequence;

Otherwise, the string value is cast to an xs:anyURI.

children Accessor

The result of the dm:children($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Document then the result is the value of the Node.childNodes property mapped to a sequence;

If the node is an instance of HTML DOM HTMLTemplateElement then the result is determined as follows:

If the include-template-content key of the parse-html-options map is false(), the result is an empty sequence;

Select the HTML DOM DocumentFragment from the HTMLTemplateElement.content property;

The HTML DOM DocumentFragment’s Node.childNodes property is mapped to a sequence;

If the node is an instance of HTML DOM Element then the result is the value of the Node.childNodes property mapped to a sequence;

Otherwise, the result is an empty sequence.

An HTML DOM NodeList is mapped to a sequence as follows:

NodeList.length is the length of the sequence, where a length of 0 results in an empty sequence;

NodeList.item(n) is the nth element of the sequence.

That sequence is then filtered as follows:

If the child is an instance of HTML DOM DocumentType, that child is not included in this sequence;

A sequence of consecutive HTML DOM Text nodes is combined into a single XDM text node;

Otherwise, the HTML DOM Node nodes are mapped to XDM according to the rules in this section.
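The filtering applied to a NodeList can be sketched as follows. The Python fragment is illustrative only: DOM nodes are modelled as hypothetical (kind, data) pairs, which is an assumption of the example and not a structure defined by this specification.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # hypothetical (kind, data) pair standing in for a DOM node


def xdm_children(child_nodes: List[Node]) -> List[Node]:
    """Drop DocumentType nodes and merge runs of adjacent Text nodes."""
    result: List[Node] = []
    for kind, data in child_nodes:
        if kind == "doctype":
            continue  # DocumentType children are not part of the XDM tree
        if kind == "text" and result and result[-1][0] == "text":
            # Consecutive Text nodes become one XDM text node.
            result[-1] = ("text", result[-1][1] + data)
        else:
            result.append((kind, data))
    return result


dom_children = [("text", "a"), ("text", "b"), ("element", "p"), ("text", "c")]
print(xdm_children(dom_children))
print(xdm_children([("doctype", "html"), ("element", "html")]))
```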

document-uri Accessor

The result of the dm:document-uri($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Document then the result is the value of the Document.documentURI property mapped as follows:

If the value is null or an empty string, then the result is an empty sequence;

Otherwise, the string value is cast to an xs:anyURI.

Otherwise, the result is an empty sequence.

is-id Accessor

The result of the dm:is-id($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Attr then:

If the Attr.name property (its qualified name) is "id", then:

If the Attr.value is castable to an xs:NCName, the result is true;

Otherwise, the result is false;

Otherwise, the result is false;

Otherwise, the result is false.

In section 3.2.5, Global attributes, the id attribute is defined as being unique in the element’s tree, containing at least one character, and not having any ASCII whitespace characters. This means that an HTML id attribute may not conform to an xs:NCName.

If an HTML id is not a valid xs:NCName then that attribute is not an XML ID.
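The is-id test can be sketched as follows. This Python fragment is a non-normative illustration: the function name is_id is hypothetical, the character classes are transcribed from the Name productions of XML 1.0 with the colon removed, and the whitespace stripping approximates the whitespace collapsing performed when casting to xs:NCName.

```python
import re

# NameStartChar of XML 1.0 without ":" (an NCName may not contain a colon).
_START = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits and a few combining characters.
_REST = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NCNAME = re.compile(f"[{_START}][{_REST}]*")


def is_id(attr_name: str, attr_value: str) -> bool:
    """True if the attribute is named "id" and its value is an NCName."""
    if attr_name != "id":
        return False
    # Casting to xs:NCName collapses whitespace before validating.
    return _NCNAME.fullmatch(attr_value.strip(" \t\r\n")) is not None


print(is_id("id", "main"))
print(is_id("id", "123"))   # valid HTML id, but not an NCName
print(is_id("id", "a:b"))   # colon is not allowed in an NCName
```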

is-idrefs Accessor

The result of the dm:is-idrefs($node) for an HTML DOM Node is an empty sequence.

namespace-nodes Accessor

The result of the dm:namespace-nodes($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Element then the result is an implementation-dependent sequence of namespace nodes that is sufficient to define the namespace context of the node.

Otherwise, the result is the empty sequence.

For the XHTML parsing algorithm, this will be equivalent to constructing the namespace nodes from an XML infoset, PSVI, or similar mapping.

For the HTML parsing algorithm, the specification defines the namespace context in various places:

Section 2.1.3 XML compatibility defines the default element namespace to be http://www.w3.org/1999/xhtml.

Section 4.8.15 MathML defines rules for embedded MathML content in HTML documents. Section 13.1.2 Elements defines these elements as foreign elements, placing them in the MathML namespace (http://www.w3.org/1998/Math/MathML). The default element namespace for these elements is the MathML namespace.

Section 4.8.16 SVG defines rules for embedded SVG content in HTML documents. Section 13.1.2 Elements defines these elements as foreign elements, placing them in the SVG namespace (http://www.w3.org/2000/svg). The default element namespace for these elements is the SVG namespace.

Section 13.1.2.3 Attributes defines several namespaced attributes available on foreign elements. If any of these namespaced attributes are present, a namespace node for that namespace must be present on the element.

The supported namespace prefixes are:

xlink in the http://www.w3.org/1999/xlink namespace;

xml in the http://www.w3.org/XML/1998/namespace namespace; and

xmlns in the http://www.w3.org/2000/xmlns/ namespace.

No other namespaces are supported by the HTML parser.

Section number references to may change over time.

nilled Accessor

The result of the dm:nilled($node) for an HTML DOM Node is false().

node-kind Accessor

The result of the dm:node-kind($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Document then the result is "document".

If the node is an instance of HTML DOM Element then the result is "element".

If the node is an instance of HTML DOM Attr then the result is "attribute".

If the node is an instance of HTML DOM ProcessingInstruction then the result is "processing-instruction".

If the node is an instance of HTML DOM Comment then the result is "comment".

If the node is an instance of HTML DOM Text then the result is "text".

node-name Accessor

The result of the dm:node-name($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Element then the result is determined as follows:

The local name is the value of the Element.localName property. This is derived as follows:

The local name is initially set to the ASCII lowercase tag name. The section 13.2.5.8, Tag name state specifies that ASCII upper alpha characters are appended to the element’s name in lowercase.

If the local name is an SVG element name, the case-sensitive name is used. section 13.2.6.5, The rules for parsing tokens in foreign content has a table mapping the lowercase element names to their SVG names.

If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName.

section 13.2.9 Coercing an HTML DOM into an infoset uses a Unnnnnn escape sequence. That would map : to U00003A.

This local name escaping applies only to the HTML parsing algorithm. If the XHTML parsing algorithm is used, the localName and prefix will be correctly set for QName-based node names.

The namespace prefix is the value of the Element.prefix property, or empty if the value is null;

The namespace URI is the value of the Element.namespaceURI property, or empty if the value is null.

If the element is an HTML element, the namespace URI is "http://www.w3.org/1999/xhtml".

If the element is an SVG element, the namespace URI is "http://www.w3.org/2000/svg".

If the element is a MathML element, the namespace URI is "http://www.w3.org/1998/Math/MathML".

If the node is an instance of HTML DOM Attr then the result is determined as follows:

The attribute name is the tokenized attribute name. The section 13.2.5.33, Attribute name state specifies that ASCII upper alpha characters are appended to the attribute’s name in lowercase.

The local name is the value of the Attr.localName property. This is derived as follows:

The local name is initially set to the attribute name.

If the local name is an SVG or MathML attribute name, the case-sensitive name is used. section 13.2.6.1, Creating and inserting nodes has a table mapping the lowercase attribute names to their SVG/MathML names.

If the local name is an allowed xlink, xml, or xmlns attribute name the local name is the value of the local name column of the attribute name mapping table in section 13.2.6.1, Creating and inserting nodes.

If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName.

section 13.2.9 Coercing an HTML DOM into an infoset uses a Unnnnnn escape sequence. That would map : to U00003A.

This local name escaping applies only to the HTML parsing algorithm. If the XHTML parsing algorithm is used, the localName and prefix will be correctly set for QName-based node names.

The namespace prefix is the value of the Attr.prefix property, or empty if the value is null.

If the attribute name is an allowed xlink, xml, or xmlns attribute name the namespace prefix is the value of the prefix column of the attribute name mapping table in section 13.2.6.1, Creating and inserting nodes.

The namespace URI is the value of the Attr.namespaceURI property, or empty if the value is null;

If the attribute name is an allowed xlink, xml, or xmlns attribute name the namespace URI is the value of the namespace column of the attribute name mapping table in section 13.2.6.1, Creating and inserting nodes.

If the node is an instance of HTML DOM ProcessingInstruction then the result is an xs:QName constructed as follows:

The local name is the value of the ProcessingInstruction.target property;

The namespace prefix is empty;

The namespace URI is empty;

Otherwise, the result is an empty sequence.

When the resulting document is an HTML DOM HTMLDocument, the Element.localName and Element.name properties of HTML DOM Element nodes are both set to the qualified name.

When the resulting document is an HTML DOM HTMLDocument, the Attr.localName and Attr.name properties of HTML DOM Attr nodes are both set to the qualified name.
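One possible replacement strategy for local names that are not valid NCNames is the Unnnnnn convention mentioned above. The Python sketch below shows that convention only as an example, since the replacement string is implementation-defined; the function name escape_local_name is hypothetical, and the character classes are transcribed from the XML 1.0 Name productions with the colon removed.

```python
import re

# NameStartChar of XML 1.0 without ":" and the additional NameChar characters.
_START = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_REST = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_START_CHAR = re.compile(f"[{_START}]")
_NAME_CHAR = re.compile(f"[{_REST}]")


def escape_local_name(name: str) -> str:
    """Replace each character that is not allowed at its position in an
    NCName by "U" followed by six uppercase hexadecimal digits."""
    out = []
    for index, ch in enumerate(name):
        allowed = _START_CHAR if index == 0 else _NAME_CHAR
        out.append(ch if allowed.fullmatch(ch) else f"U{ord(ch):06X}")
    return "".join(out)


print(escape_local_name("a:b"))   # the colon becomes U00003A
print(escape_local_name("1x"))    # a digit cannot start an NCName
print(escape_local_name("div"))   # already valid, returned unchanged
```

The sketch does not handle the empty string, which cannot be made into an NCName by escaping alone.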

parent Accessor

The result of the dm:parent($node) for an HTML DOM Node is as follows:

Let $parent be the Node.parentNode property of the node;

If $parent is an instance of HTML DOM DocumentFragment, then for each HTML DOM HTMLTemplateElement $template in the parsed DOM tree:

Let $content be the value of the HTMLTemplateElement.content property of $template;

If $content is the same node as $parent, then the result is $template using the XDM mapping rules described in this section;

If there are no more $template nodes, then the result is an empty sequence;

If $parent is null, then the result is an empty sequence;

Otherwise, the result is $parent using the XDM mapping rules described in this section.

The current node can have an HTML DOM DocumentFragment parent node only if the include-template-content key of the html-parser-options is true().

The HTML DOM DocumentFragment’s Node.parentNode property is null, and a DocumentFragment attached to HTMLTemplateElement.content property does not have a host property connecting the fragment back to the template element.

If a future version of adds a DocumentFragment.host property that references the node’s template element, or the implementation has access to that internal property, the implementation may choose to use that instead of traversing the parsed HTML tree.
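The search for the owning template element can be sketched as follows. The Python classes below are hypothetical stand-ins for HTML DOM nodes, and the choice to descend into template content while searching (so that nested templates are found) is an assumption of this sketch rather than a requirement stated in this section.

```python
from typing import Iterator, Optional


class Node:
    """Hypothetical stand-in for an HTML DOM Node."""

    def __init__(self, name: str, children=()):
        self.name = name
        self.parent_node: Optional["Node"] = None
        self.child_nodes = list(children)
        for child in self.child_nodes:
            child.parent_node = self


class Fragment(Node):
    """Stand-in for DocumentFragment; its own parent_node stays None."""


class Template(Node):
    """Stand-in for HTMLTemplateElement; children live in .content."""

    def __init__(self, content: Fragment):
        super().__init__("template")
        self.content = content


def descendants(node: Node) -> Iterator[Node]:
    for child in node.child_nodes:
        yield child
        yield from descendants(child)
    if isinstance(node, Template):
        # Assumption: also search inside template content.
        yield from descendants(node.content)


def xdm_parent(node: Node, document: Node) -> Optional[Node]:
    parent = node.parent_node
    if isinstance(parent, Fragment):
        for candidate in descendants(document):
            if isinstance(candidate, Template) and candidate.content is parent:
                return candidate
        return None  # no template owns this fragment
    return parent  # None models the empty sequence


span = Node("span")
template = Template(Fragment("#document-fragment", [span]))
body = Node("body", [template])
doc = Node("#document", [body])
print(xdm_parent(span, doc).name)
```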

string-value Accessor

The result of the dm:string-value($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Document, then use the algorithm described in Tree string construction;

If the node is an instance of HTML DOM Element, then use the algorithm described in Tree string construction;

If the node is an instance of HTML DOM Text, then use the algorithm described in Text node string construction;

Otherwise, the result is the value of the Node.nodeValue property.

Tree string construction

The following algorithm is used to construct the concatenated string value of a node in the HTML DOM tree:

Let $text be the string value "";

For each descendant node $node in document order:

If $node is not an instance of HTML DOM Text, process the next node in document order;

Append the value of the Node.nodeValue property for $node to $text;

The result is $text.
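The tree string construction steps can be sketched as follows. The Python Node class is a hypothetical stand-in for an HTML DOM Node, introduced only for this illustration.

```python
from typing import Iterator


class Node:
    """Hypothetical stand-in for an HTML DOM Node."""

    def __init__(self, kind: str, node_value: str = "", children=()):
        self.kind = kind
        self.node_value = node_value
        self.child_nodes = list(children)


def descendants(node: Node) -> Iterator[Node]:
    """Yield the descendants of node in document order."""
    for child in node.child_nodes:
        yield child
        yield from descendants(child)


def tree_string(node: Node) -> str:
    text = ""
    for descendant in descendants(node):
        if descendant.kind != "text":
            continue  # only Text nodes contribute to the string value
        text += descendant.node_value
    return text


paragraph = Node("element", children=[
    Node("text", "Hello, "),
    Node("element", children=[Node("text", "world")]),
    Node("comment", "ignored"),
    Node("text", "!"),
])
print(tree_string(paragraph))
```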

Text node string construction

The following algorithm is used to construct the maximal sequence of adjacent character information items for text node children of an element:

Let $text be the string value "";

Append the value of the Node.nodeValue property for $node to $text;

Let $next be the value of Node.nextSibling;

If $next is null, or not an instance of HTML DOM Text, the result is $text;

Otherwise, repeat from step 2 using $next as $node.

Adjacent text nodes in the HTML DOM are treated as a single XDM text node by only including the first text node and providing logic to ensure that the text content is merged into a single text block.
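The text node string construction steps can be sketched as a loop over following siblings. The Python Node class and the link helper are hypothetical, introduced only to give the algorithm something to run on.

```python
from typing import Optional


class Node:
    """Hypothetical stand-in for an HTML DOM Node with a sibling link."""

    def __init__(self, kind: str, node_value: str = ""):
        self.kind = kind
        self.node_value = node_value
        self.next_sibling: Optional["Node"] = None


def link(*nodes: Node):
    """Chain the nodes together as consecutive siblings."""
    for left, right in zip(nodes, nodes[1:]):
        left.next_sibling = right
    return nodes


def text_node_string(node: Node) -> str:
    text = ""
    while True:
        text += node.node_value          # append the current Text node
        following = node.next_sibling
        if following is None or following.kind != "text":
            return text                  # the run of Text nodes has ended
        node = following                 # otherwise continue with the next one


first, second, element = link(
    Node("text", "foo"), Node("text", "bar"), Node("element"))
print(text_node_string(first))
```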

type-name Accessor

The result of the dm:type-name($node) for an HTML DOM Node is as follows:

If the node is an instance of HTML DOM Element then the result is xs:untyped.

If the node is an instance of HTML DOM Attr then the result is xs:untypedAtomic.

If the node is an instance of HTML DOM Text then the result is xs:untypedAtomic.

Otherwise, the result is an empty sequence.

typed-value Accessor

The result of the dm:typed-value($node) for an HTML DOM Node is as follows:

Let $string-value be the result of dm:string-value($node) for the node;

If the node is an instance of HTML DOM Document then the result is $string-value as an xs:untypedAtomic;

If the node is an instance of HTML DOM Element then the result is $string-value as an xs:untypedAtomic;

If the node is an instance of HTML DOM Attr then the result is $string-value as an xs:untypedAtomic;

If the node is an instance of HTML DOM Text then the result is $string-value as an xs:untypedAtomic;

Otherwise, the result is $string-value.

unparsed-entity-public-id Accessor

The result of the dm:unparsed-entity-public-id($node) for an HTML DOM Node is an empty sequence.

unparsed-entity-system-id Accessor

The result of the dm:unparsed-entity-system-id($node) for an HTML DOM Node is an empty sequence.

HTML parser options

This section describes the record structure used to pass options to the fn:parse-html function.

The approach used to parse the HTML document into XDM nodes.

An implementation may use this to specify a specific algorithm, tool, or library that is used, such as tidy or tagsoup.

An implementation may also use this to specify a non-standard variant of HTML to support, such as word for the Microsoft Word HTML variant.

xs:string

The version of HTML to support when parsing HTML strings or sequences of octets.

Valid values an implementation must support for the html method are:

3, 3.2 for HTML 3.2 W3C Recommendation, 14 January 1997

4, 4.01 for HTML 4.01 W3C Recommendation, 24 December 1999

5.0 for HTML5 W3C Recommendation, 28 October 2014

5.1 for HTML 5.1 W3C Recommendation, 1 November 2016

5.2 for HTML 5.2 W3C Recommendation, 14 December 2017

LS for HTML Living Standard, WHATWG

5 may be equivalent to any of 5.0, 5.1, 5.2, or LS

Valid values an implementation must support for the xhtml method are:

1.0 for XHTML 1.0 W3C Recommendation, 26 January 2000

1.1 for XHTML 1.1 W3C Recommendation, 31 May 2001

Any other method and html-version combinations are implementation-defined.

(enum('LS') | xs:decimal)

The character encoding to use to decode a sequence of octets that represents an HTML document.

xs:string?

Defines how to handle elements in the HTMLTemplateElement.content property.

If this option is true(), the template element’s children are the children of the content property’s document fragment node.

If this option is false(), the template element’s children are the empty sequence.

The default behaviour is implementation-defined.

This allows an implementation to support the behaviour defined in section 4.12.3.1, Interaction of template elements with XSLT and XPath:

This option would default to true() for an XSLT processor operating on an HTML DOM constructed from an XHTML document.

This option would default to false() for an XPath processor using the section 8, XPath APIs.

xs:boolean?

Additional parser options are allowed.

Example:

An implementation may provide keys for options to the tidy HTML parser, allowing a user to configure the behaviour of that parser.

Functions on JSON Data

The functions listed in this section parse or serialize JSON data.

JSON is a popular format for exchange of structured data on the web: it is specified in . This section describes facilities allowing JSON data to be converted to and from XDM values.

This specification describes two ways of representing JSON data losslessly using XDM constructs. The first method uses XDM maps to represent JSON objects, and XDM arrays to represent JSON arrays. The second method represents all JSON constructs using XDM element and attribute nodes.

Note also that the function fn:serialize has an option to act as the inverse function to fn:parse-json.

Representing JSON using maps and arrays

This section defines a mapping from JSON data to XDM maps and arrays. Two functions are available to support this mapping: fn:parse-json and fn:serialize (with options selecting JSON as the output method). The fn:parse-json function will accept any JSON text as input, and converts it to XDM data values. The fn:serialize function (with JSON as the output method) will accept any XDM value produced using fn:parse-json and convert it back to the original JSON text (subject to insignificant variations such as reordering the properties in a JSON object).

The conversion is lossless if recommended JSON good practice is followed. Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point, or (b) duplicate key values appear within a JSON object.
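Both kinds of information loss can be observed with any JSON parser that maps objects to a keyed structure and numbers to double-precision values. The sketch below uses Python's json module purely as an illustration; its handling of duplicate keys (the last value is kept) is Python's own behaviour and is not a statement about how fn:parse-json resolves duplicates.

```python
import json

# (a) A JSON number that is not exactly representable as a double is
# rounded, so two different lexical forms yield the same value.
rounded = json.loads("1.0000000000000000001")
exact = json.loads("1.0")
print(rounded == exact)

# (b) An object with a duplicate key cannot keep both values once it is
# held in a map-like structure.
duplicates = json.loads('{"a": 1, "a": 2}')
print(len(duplicates))
```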

The representation of JSON data produced by the fn:parse-json function has been chosen with ease of manipulation as a design aim. For example, a simple JSON object such as { "Sun": 1, "Mon": 2, "Tue": 3, ... } produces a simple map, so if the result of parsing is held in $weekdays, the number for a given weekday can be extracted using an expression such as $weekdays?Tue. Similarly, a simple array such as [ "Sun", "Mon", "Tue", ... ] produces an array that can be addressed as, for example, $weekdays(3). A more deeply nested structure can be addressed in a similar way: for example if the JSON text is an array of person objects, each of which has a property named phones which is an array of strings containing phone numbers, then the first phone number of each person in the data can be addressed as $data?*?phones?1.
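The addressing patterns described above have close analogues in languages that parse JSON into dictionaries and lists. The following Python sketch is an analogy only and is not part of this specification: Python's json module stands in for fn:parse-json, the person data is invented for illustration, and Python lists are indexed from 0 whereas XDM arrays are 1-based.

```python
import json

# Analogue of fn:parse-json: JSON objects become dicts, JSON arrays become lists.
weekdays = json.loads('{ "Sun": 1, "Mon": 2, "Tue": 3 }')
tue = weekdays["Tue"]              # plays the role of $weekdays?Tue

names = json.loads('[ "Sun", "Mon", "Tue" ]')
third = names[3 - 1]               # plays the role of $weekdays(3); XDM positions start at 1

# Hypothetical sample data: an array of person objects, each with a "phones" array.
data = json.loads('''[
  { "name": "A", "phones": ["111", "222"] },
  { "name": "B", "phones": ["333"] }
]''')

# First phone number of each person in the data.
first_phones = [person["phones"][0] for person in data]
```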

XML Representation of JSON

This section defines a mapping from JSON data to XML (specifically, to XDM element and attribute nodes). A function fn:json-to-xml is provided to take a JSON string as input and convert it to the XML representation, and a second function fn:xml-to-json performs the reverse operation.

The XML representation is designed to be capable of representing any valid JSON text including one that uses characters which are not valid in XML. The transformation is normally lossless: that is, distinct JSON texts convert to distinct XML representations. When converting JSON to XML, options are provided to reject unsupported characters, to replace them with a substitute character, or to leave them in backslash-escaped form.

The conversion is lossless if recommended JSON good practice is followed. Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point, or (b) duplicate key values appear within a JSON object.

The following example demonstrates the correspondence of a JSON text and the corresponding XML representation.

A JSON Text and its XML Representation

Consider the following JSON text:

The XML representation of this text is as follows. Whitespace is included in the XML representation for purposes of illustration, but it will not necessarily be present in the output of the json-to-xml function.

Distances between several cities, in kilometers. 2014-02-04T18:50:45 true London 322 Paris 265 Amsterdam 173 Brussels 322 Paris 344 Amsterdam 358 Brussels 265 London 344 Amsterdam 431 Brussels 173 London 358 Paris 431

An XSD 1.0 schema for the XML representation is provided in . It is not necessary to import this schema into the static context unless the stylesheet or query makes explicit reference to the components defined in the schema. If the stylesheet or query does import a schema for the namespace http://www.w3.org/2005/xpath-functions, then:

Unless the host language specifies otherwise, the processor (if it is schema-aware) must recognize an import declaration for this namespace, whether or not a schema location is supplied.

If a schema location is provided, then the schema document at that location must be equivalent to the schema document at ; the effect if it is not equivalent is implementation-dependent.

The rules governing the mapping from JSON to XML are as follows. In these rules, the phrase “an element named N” is to be interpreted as meaning “an element node whose local name is N and whose namespace URI is http://www.w3.org/2005/xpath-functions”.

The JSON value null is represented by an element named null, with empty content.

The JSON values true and false are represented by an element named boolean, with content conforming to the type xs:boolean. When the element is created by the fn:json-to-xml function, the string value of the element will be true or false. The fn:xml-to-json function also recognizes other strings that validate as xs:boolean, for example 1 and 0. Leading and trailing whitespace is accepted.

A JSON number is represented by an element named number, with content conforming to the type xs:double, with the additional restriction that the value must not be positive or negative infinity, nor NaN. The fn:json-to-xml function creates an element whose string value is lexically the same as the JSON representation of the number. The fn:xml-to-json function generates a JSON representation that is the result of casting the (typed or untyped) value of the node to xs:double and then casting the result to xs:string. Leading and trailing whitespace is accepted. Since JSON does not impose limits on the range or precision of numbers, these rules mean that conversion from JSON to XML will always succeed, and will retain full precision in the lexical representation unless the data model implementation is one that reconstructs the string value from the typed value. In the reverse direction, conversion from XML to JSON may fail if the value is infinity or NaN, or if the string value is such that casting to xs:double produces positive or negative infinity.

A JSON string is represented by an element named string, with content conforming to the type xs:string. The string element has two alternative representations: escaped form, and unescaped form.

A JSON array is represented by an element named array. The content is a sequence of child elements representing the members of the array in order, each such element being the representation of the array member obtained by applying these rules recursively.

A JSON object is represented by an element named map. The content is a sequence of child elements each of which represents one of the name/value pairs in the object. The representation of the name/value pair N:V is obtained by taking the element that represents the value V (by applying these rules recursively) and adding an attribute with name key (in no namespace), whose value is N as an instance of xs:string. The functions fn:json-to-xml and fn:xml-to-json both retain the order of entries, subject to rules about how duplicate keys are handled. The key may be represented in escaped or unescaped form.

The attribute escaped="true" may be specified on a string element to indicate that the string value contains backslash-escaped characters that are to be interpreted according to the JSON rules. The attribute escaped-key="true" may be specified on any element with a key attribute to indicate that the key contains backslash-escaped characters that are to be interpreted according to the JSON rules. Both attributes have the default value false, signifying that the relevant value is in unescaped form. In unescaped form, the backslash character has no special significance (it represents itself).

The JSON grammar for number is a subset of the lexical space of the XSD type xs:double. The mapping from JSON number values to xs:double values is defined by the XPath rules for casting from xs:string to xs:double. Note that these rules will never generate an error for out-of-range values; instead very large or very small values will be converted to +INF or -INF. Since JSON does not impose limits on the range or precision of numbers, the conversion is not guaranteed to retain full precision.
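The behaviour for out-of-range and over-precise numbers can be observed in any environment that uses IEEE 754 double precision. The following Python sketch is an analogy under that assumption: Python's float() stands in for the cast from xs:string to xs:double, and the sample literals are invented for illustration.

```python
import math

# Python floats are IEEE 754 doubles, so float() serves here as a stand-in
# for casting the lexical form of a JSON number to xs:double.
too_big = float("1e999")                    # beyond the double range: positive infinity
too_negative = float("-1e999")              # beyond the double range: negative infinity
rounded = float("1.00000000000000000001")   # more digits than a double can retain

no_error_raised = math.isinf(too_big) and math.isinf(too_negative)
precision_lost = (rounded == 1.0)
```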

Although the order of entries in a JSON object is generally considered to have no significance, the functions json-to-xml and xml-to-json both retain order.

The XDM representation of a JSON value may either be untyped (all elements annotated as xs:untyped, attributes as xs:untypedAtomic), or it may be typed. If it is typed, then it must have the type annotations obtained by validating the untyped representation against the schema given in . If it is untyped, then it must be an XDM instance such that validation against this schema would succeed; with the proviso that all attributes other than those in no namespace or in namespace http://www.w3.org/2005/xpath-functions are ignored, including attributes such as xsi:type and xsi:nil that would normally influence the process of schema validation.

The namespace prefix associated with the namespace http://www.w3.org/2005/xpath-functions (if any) is immaterial. The effect of the fn:xml-to-json function does not depend on the choice of prefix, and the prefix (if any) generated by the fn:json-to-xml function is implementation-dependent.
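The mapping rules in this section can be sketched as a short recursive function. The following Python code is a minimal illustration, not an implementation of fn:json-to-xml: the names JsonNumber, to_element and json_to_xml are hypothetical, escaped forms are not handled, and duplicate keys follow Python's json module (the last value wins), which is an assumption rather than the behaviour defined here.

```python
import json
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/xpath-functions"

class JsonNumber(str):
    """Keeps the lexical form of a JSON number exactly as written."""

def to_element(value, key=None):
    """Build the element that represents one JSON value (hypothetical helper)."""
    if value is None:
        name, text = "null", None
    elif isinstance(value, bool):
        name, text = "boolean", "true" if value else "false"
    elif isinstance(value, JsonNumber):       # must be tested before plain str
        name, text = "number", str(value)
    elif isinstance(value, str):
        name, text = "string", value
    elif isinstance(value, list):
        name, text = "array", None
    elif isinstance(value, dict):
        name, text = "map", None
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    elem = ET.Element(f"{{{NS}}}{name}")
    elem.text = text
    if key is not None:
        elem.set("key", key)                  # the key attribute is in no namespace
    if isinstance(value, list):
        elem.extend([to_element(member) for member in value])
    elif isinstance(value, dict):
        elem.extend([to_element(member, key=k) for k, member in value.items()])
    return elem

def json_to_xml(text):
    # parse_int/parse_float receive the original digits, so the lexical form survives.
    value = json.loads(text, parse_int=JsonNumber, parse_float=JsonNumber)
    return to_element(value)

root = json_to_xml('{"to": "London", "distance": 3.220e2, "ok": true, "who": null, "list": [1, "a"]}')
local_names = [child.tag.split("}")[1] for child in root]
keys = [child.get("key") for child in root]
```

The order of the child elements follows the order of the entries in the JSON object, and the number element keeps the digits as they appeared in the input.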

Functions on CSV Data

This section describes functions that parse CSV data.

The term comma separated values or CSV refers to a wide variety of plain-text tabular data formats with fields and records separated by standard character delimiters (often, but not invariably, commas).

A CSV is a 2-dimensional tabular data structure consisting of multiple rows (also known as records). Each row contains multiple fields. Fields occupying the same position in successive rows constitute a column. Columns are identified by position and optionally by name. Column names can be assigned within a CSV using an optional header row.

CSV has developed informally for decades, and many variations are found. This specification refers to RFC 4180, which provides a standardized grammar. This specification extends the grammar defined in RFC 4180 as follows:

This specification uses the term row where RFC 4180 uses record.

Row delimiters other than CRLF are recognized.

Field delimiters other than comma (",") are recognized.

Quote characters other than the double quotation mark ('"') are recognized.

Non-ASCII characters are recognized.

This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.

The most basic function for parsing CSV is fn:csv-to-arrays which recognizes the delimiters for rows and fields and returns a sequence of arrays each corresponding to one row. The fields within each array are represented as instances of xs:string.

The other two functions recognize column names, and make it easier to address individual fields using these names. The fn:parse-csv function delivers this capability using XDM maps and functions, while the fn:csv-to-xml function represents the information using XDM element nodes.

CSV delimiters

The delimiters used for rows, columns, and quoting are configurable. An error is raised if the same delimiter string is used in multiple roles.

Rows in CSV files are typically delimited with CRLF (U+000D, U+000A), LF (U+000A), or CR (U+000D) line endings, although RFC 4180 specifies CRLF. By contrast, the fn:unparsed-text function normalizes these line endings to LF (U+000A). The CSV parsing functions therefore use LF by default. An option is available to normalize line endings so that CR and CRLF are converted to U+000A (except when they appear in quoted fields). This option is off by default, because line ending normalization will usually have been carried out earlier: for example, the fn:unparsed-text function does it automatically.
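The effect of normalizing line endings outside quoted fields can be sketched as follows. This Python function is a minimal illustration under stated assumptions: the name normalize_newlines is hypothetical, the quote character is a single character, and quotes are assumed to be correctly paired.

```python
def normalize_newlines(text, quote='"'):
    """Convert CR and CRLF to LF, except inside quoted fields (hypothetical helper)."""
    out = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == quote:
            # A doubled quote toggles the state twice, leaving it unchanged.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\r" and not in_quotes:
            out.append("\n")
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1                      # swallow the LF of a CRLF pair
        else:
            out.append(ch)
        i += 1
    return "".join(out)

normalized = normalize_newlines('a,b\r\nc,"x\r\ny"\rd')
```

In this example the CRLF after the first row and the CR after the second row become LF, while the CRLF inside the quoted field is left untouched.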

The last row in the file may or may not be followed by a row delimiter.

Fields in CSV are frequently delimited with a comma. Other field delimiters are useful, for example when numeric data uses comma as a decimal separator. The chosen field delimiter is then often U+003B or U+0009.

The column delimiter defaults to U+002C. The value may be any single Unicode character. An error is raised if the column-delimiter option is set to a multi-character string.

Field quoting

CSVs, as specified in RFC 4180, require that fields be wrapped with a quote character if they contain either the row or column delimiter. For example:

"A single field, containing a comma","another field containing CRLF within it"

If a field is to contain the quote character, the character must be escaped by doubling it, as with escaping of quotes in XPath string literals (see ). An error is raised if a quote character appears within a field incorrectly escaped, for example:

incorrectly escaped " quote character

The quotes surrounding quoted fields are not included in the result. The following input string, when parsed, produces a sequence of strings, as shown below:

'"Field 1","Field 2","Field ""with quotes"" 3"'
('Field 1', 'Field 2', 'Field "with quotes" 3')

The quote character defaults to U+0022.

No space is allowed between the column delimiter and a quote. An error is raised if whitespace or other characters occur between a quote character and the nearest column delimiter.

The following example is therefore invalid and parsing it will raise an error.

'"Field 1", "Field 2", "Field 3"'
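The quoting rules in this section can be sketched as a small field splitter. The following Python function is a minimal illustration, not an implementation of fn:csv-to-arrays: the name split_row is hypothetical, it handles a single row that has already been isolated, and the error type is a Python ValueError rather than the error defined by this specification.

```python
def split_row(row, delimiter=",", quote='"'):
    """Split one CSV row into fields, applying the quoting rules (hypothetical helper)."""
    fields = []
    i, n = 0, len(row)
    while True:
        if i < n and row[i] == quote:
            # Quoted field: read up to the closing quote, undoubling "" as we go.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if row[i] == quote:
                    if i + 1 < n and row[i + 1] == quote:
                        chars.append(quote)   # a doubled quote is a literal quote
                        i += 2
                        continue
                    i += 1                    # closing quote
                    break
                chars.append(row[i])
                i += 1
            # Only a delimiter or the end of the row may follow the closing quote.
            if i < n and row[i] != delimiter:
                raise ValueError("unexpected character after closing quote")
            fields.append("".join(chars))
        else:
            # Unquoted field: runs to the next delimiter; a quote inside it is an error.
            j = row.find(delimiter, i)
            end = n if j == -1 else j
            if quote in row[i:end]:
                raise ValueError("quote character in unquoted field")
            fields.append(row[i:end])
            i = end
        if i >= n:
            return fields
        i += 1                                # skip the delimiter

def is_rejected(row):
    """Report whether split_row raises an error for the given row."""
    try:
        split_row(row)
    except ValueError:
        return True
    return False

good = split_row('"Field 1","Field 2","Field ""with quotes"" 3"')
```

With this sketch, the row containing spaces between the delimiters and the quotes is rejected, as is a field containing an unescaped quote character.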
Basic parsing of CSV to arrays

The result of fn:csv-to-arrays is a sequence of rows, where each row is represented as an array of xs:string values.

The first row of the CSV is returned in the same way as all the other rows. fn:csv-to-arrays does not distinguish between a header row and data rows, and returns all of them.

A CSV with fixed-width rows

For example, given the input:

'Column 1,Column 2,Column 3
Field 1A,Field 1B,Field 1C
Field 2A,Field 2B,Field 2C'

the fn:csv-to-arrays function produces

( [ "Column 1", "Column 2", "Column 3" ], [ "Field 1A", "Field 1B", "Field 1C" ], [ "Field 2A", "Field 2B", "Field 2C" ] )
A CSV with variable-width rows

It is common practice for all rows in a CSV to have the same number of columns, but this is not required.

'Column 1,Column 2,Column 3
Field 1A,Field 1B,Field 1C
Field 2A,Field 2B,Field 2C,Field 2D'

produces

( [ "Column 1", "Column 2", "Column 3" ], [ "Field 1A", "Field 1B", "Field 1C" ], [ "Field 2A", "Field 2B", "Field 2C", "Field 2D" ] )

RFC 4180 states that CSVs should contain the same number of fields in each row, so that there is a uniform number of columns. However, the reality is that CSVs can, and sometimes do, contain a variable number of fields in a row. As a result, this function does not truncate or pad the number of fields in each row for any reason. The fn:csv-to-xml and fn:parse-csv functions provide facilities to enforce uniformity and an expected number of columns.
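The shape of the result, one array of strings per row with no padding or truncation, matches what general-purpose CSV readers typically return. The following Python sketch uses the standard csv module as an analogy for fn:csv-to-arrays; the module's dialect rules differ in detail from those of this specification, so it is not a substitute for the function defined here.

```python
import csv
import io

text = (
    "Column 1,Column 2,Column 3\n"
    "Field 1A,Field 1B,Field 1C\n"
    "Field 2A,Field 2B,Field 2C,Field 2D"
)

# Each row becomes a list of strings; the header row is returned like any other row.
rows = list(csv.reader(io.StringIO(text, newline="")))

# Row widths are reported as found: nothing is padded or truncated.
widths = [len(row) for row in rows]
```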

Enhanced parsing of CSV data to maps and arrays

While fn:csv-to-arrays simply delivers the CSV content as a sequence of arrays, the fn:parse-csv function goes a step further and enables access to the data using column names. The column names may be taken either from the first row of the CSV data, or from data supplied by the caller in the options parameter.

The function returns a parsed-csv-structure-record:

The record has four parts, which are always present (though potentially empty):

The list of column names, in order.

xs:string*

A map from column names to the 1-based integer position of the column.

map(xs:string,xs:integer)?

The contents of the non-header rows in the CSV data, as a sequence of arrays of xs:string values; each array represents one row of the CSV data.

array(xs:string)*

A function providing ready access to a given field in a given row. The get function has signature:

function($row as xs:integer, $column as union(xs:string, xs:integer)) as xs:string?

The function takes two arguments: the first is an integer giving the row number (1-based), the second identifies a column either by its name or by its 1-based position.

function(xs:positiveInteger, (xs:positiveInteger | xs:string)) as xs:string?
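The four parts of the record can be pictured with a small data structure. The following Python sketch is an illustration only: the class name ParsedCsv, the field name column_index, and the helper make_parsed_csv are hypothetical, and returning None for a missing row, column, or field is an assumption made for the sketch rather than behaviour stated here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

@dataclass(frozen=True)
class ParsedCsv:
    columns: List[str]              # column names, in order
    column_index: Dict[str, int]    # column name -> 1-based position
    rows: List[List[str]]           # non-header rows, one list of strings per row

    def get(self, row: int, column: Union[int, str]) -> Optional[str]:
        """Return the field at a 1-based row and a column given by name or position."""
        pos = self.column_index.get(column) if isinstance(column, str) else column
        if pos is None or not 1 <= row <= len(self.rows):
            return None             # assumption: absent values yield None
        fields = self.rows[row - 1]
        return fields[pos - 1] if 1 <= pos <= len(fields) else None

def make_parsed_csv(all_rows: List[List[str]]) -> ParsedCsv:
    """Treat the first row as the header row (hypothetical helper)."""
    header, body = all_rows[0], all_rows[1:]
    index = {name: pos for pos, name in enumerate(header, start=1)}
    return ParsedCsv(header, index, body)

parsed = make_parsed_csv([
    ["Name", "Date", "Amount"],
    ["Alice", "2023-07-14", "1.23"],
    ["Bob", "2023-07-14", "2.34"],
])
amount_for_bob = parsed.get(2, "Amount")
first_field = parsed.get(1, 1)
```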
Representing CSV data as XML

The fn:csv-to-xml function returns an XDM node tree representing the CSV data. Following is a CSV text and the XML serialization of the corresponding node tree.

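Name,Date,Amount
Alice,2023-07-14,1.23
Bob,2023-07-14,2.34

<csv>
  <columns>
    <column>Name</column>
    <column>Date</column>
    <column>Amount</column>
  </columns>
  <rows>
    <row>
      <field column="Name">Alice</field>
      <field column="Date">2023-07-14</field>
      <field column="Amount">1.23</field>
    </row>
    <row>
      <field column="Name">Bob</field>
      <field column="Date">2023-07-14</field>
      <field column="Amount">2.34</field>
    </row>
  </rows>
</csv>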

If column names were not extracted, then implementations should not include the columns element, and field elements should not have the column attribute:

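<csv>
  <rows>
    <row>
      <field>Name</field>
      <field>Date</field>
      <field>Amount</field>
    </row>
    <row>
      <field>Alice</field>
      <field>2023-07-14</field>
      <field>1.23</field>
    </row>
    <row>
      <field>Bob</field>
      <field>2023-07-14</field>
      <field>2.34</field>
    </row>
  </rows>
</csv>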

An XSD 1.0 schema for the XML representation is provided in .

Illustrative examples of processing CSV data

The following examples illustrate more complex applications making use of CSV parsing functions.

A variable $crlf is assumed to be in scope representing the CRLF string:

let $crlf := fn:char(0x0D)||fn:char(0x0A)
Converting a CSV into an HTML-style table using fn:parse-csv

Direct conversion is a matter of iterating across the records and fields to generate <tr> and <td> elements.

Using XQuery:

<table>
  <thead>
    <tr>{ for $column in $csv?columns?fields return <th>{ $column }</th> }</tr>
  </thead>
  <tbody>{
    for $row in $csv?rows
    return <tr>{ for $field in $row?fields return <td>{ $field }</td> }</tr>
  }</tbody>
</table>

Using XSLT:

{ . } { . }
Converting a CSV into an HTML-style table using fn:csv-to-xml

The fn:csv-to-xml function makes these kinds of conversion-to-XML-table tasks simpler by providing a simple XML representation of the data. Here, in XQuery:

<table>
  <thead>
    <tr>{ for $column in $csv/csv/columns/column return <th>{ $column }</th> }</tr>
  </thead>
  <tbody>{
    for $row in $csv/csv/rows/row
    return <tr>{ for $field in $row/field return <td>{ $field }</td> }</tr>
  }</tbody>
</table>

And in XSLT:

{ . } { . }
Functions on Invisible XML

This section describes functions that support Invisible XML parsing.

Invisible XML defines a BNF-like language for specifying grammars, together with a mapping from sentences in that grammar to an XML representation. By defining an Invisible XML grammar, a great variety of non-XML data formats can be manipulated as if they were XML. The function fn:invisible-xml takes a grammar as input, and returns a function which can be used for parsing data instances and converting them to XML node trees.

Context functions

The following functions are defined to obtain information from the static or dynamic context.

Higher-order functions

Functions on functions

The functions included in this section operate on function items, that is, values referring to a function.

Functions that accept functions among their arguments, or that return functions in their result, are described in this specification as higher-order functions. Some host languages may exclude higher-order functions from the set of functions that they support, or may include such functions in an optional conformance feature.

Some functions, such as fn:parse-json, allow the option of supplying a callback function, for example to define exception behavior. Where this is not essential to the use of the function, the function has not been classified as higher-order for this purpose; in applications where function items cannot be created, these particular options will not be available.

Basic higher-order functions

The following functions take function items as an argument.

With all these functions, if the caller-supplied function fails with a dynamic error, this error is propagated as an error from the higher-order function itself.

Dynamic Evaluation

The following functions allow dynamic loading and evaluation of XQuery queries, XSLT stylesheets, and XPath binary operators.

Maps

Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.

A map is an additional kind of item.

A map consists of a set of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic value, and an arbitrary sequence called the associated value.

Within a map, no two entries have the same key. Two atomic values K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.

It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).

Maps are immutable, and have no identity separate from their content. For example, the map:remove function returns a map that differs from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation. Two calls on map:remove with the same arguments return maps that are indistinguishable from each other; there is no way of asking whether these are “the same map”.

A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.

Composing and Decomposing Maps

It is often useful to decompose a map into a sequence of entries, or key-value pairs (in which the key is an atomic value and the value is an arbitrary sequence). Subsequently it may be necessary to reconstruct a map from these components, typically after modification.

There are two conventional ways of representing key-value pairs, each with its own advantages and disadvantages. Both approaches are supported by functions in this library. These are described below:

A singleton map is a map containing a single entry.

It is possible to decompose any map into a sequence of singleton maps, and to construct a map from a sequence of singleton maps.

For example the map { "x": 1, "y": 2 } can be decomposed to the sequence ({ "x": 1 }, { "y": 2 }).

A key-value pair map is a map containing two entries, one (with the key "key") containing the key part of a key value pair, the other (with the key "value") containing the value part of a key value pair.

For example the map { "x": 1, "y": 2 } can be decomposed as ({ "key": "x", "value": 1 }, { "key": "y", "value": 2 })

A key-value pair map is an instance of the type record(key as xs:anyAtomicType, value as item()*).

The following table summarizes the way in which these two representations can be used to compose and decompose maps:

Operation | Singleton Maps | Key-Value Pair Maps
Decompose a map | map:entries($map) | map:pairs($map)
Compose a map | map:merge($entries) | map:of-pairs($key-value-pairs)
Create a single entry | map:entry($key, $value) | map:pair($key, $value)
Extract the key part of a single entry | map:keys($entry) | $key-value-pair?key
Extract the value part of a single entry | map:values($entry) | $key-value-pair?value
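The two representations can be compared side by side in a short sketch. The following Python functions are analogues only, with Python dicts standing in for XDM maps: the names entries, pairs, merge and of_pairs mirror the functions in the table but are hypothetical helpers, and duplicate keys are rejected here simply to keep the sketch small, which is an assumption and not the rule applied by map:merge.

```python
def entries(m):
    """Decompose a map into singleton maps, one per entry."""
    return [{key: value} for key, value in m.items()]

def pairs(m):
    """Decompose a map into key-value pair maps."""
    return [{"key": key, "value": value} for key, value in m.items()]

def merge(singletons):
    """Compose a map from singleton maps (duplicates rejected in this sketch)."""
    result = {}
    for singleton in singletons:
        for key, value in singleton.items():
            if key in result:
                raise ValueError(f"duplicate key: {key!r}")
            result[key] = value
    return result

def of_pairs(key_value_pairs):
    """Compose a map from key-value pair maps (duplicates rejected in this sketch)."""
    result = {}
    for pair in key_value_pairs:
        if pair["key"] in result:
            raise ValueError(f"duplicate key: {pair['key']!r}")
        result[pair["key"]] = pair["value"]
    return result

original = {"x": 1, "y": 2}
as_entries = entries(original)
as_pairs = pairs(original)
round_trip_entries = merge(as_entries)
round_trip_pairs = of_pairs(as_pairs)
```

Both round trips reproduce the original map, and neither decomposition modifies it.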

Functions that Operate on Maps

The functions defined in this section use a conventional namespace prefix map, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map.

The function call map:get($map, $key) can be used to retrieve the value associated with a given key.

There is no operation to atomize a map or convert it to a string. The function fn:serialize can in some cases be used to produce a JSON representation of a map.

Other Operations on Maps

Because a map is a function item, functions that apply to functions also apply to maps. A map is an anonymous function, so fn:function-name returns the empty sequence; fn:function-arity always returns 1.

Maps may be compared using the fn:deep-equal function.

There is no function or operator to atomize a map or convert it to a string (other than fn:serialize, which can be used to serialize some maps as JSON texts).

Arrays

Arrays were introduced as a new datatype in XDM 3.1. This section describes functions that operate on arrays.

An array is an additional kind of item. An array of size N is a mapping from the integers (1 to N) to a set of values, called the members of the array, each of which is an arbitrary sequence. Because an array is an item, and therefore a sequence, arrays can be nested.

An array acts as a function from integer positions to associated values, so the function call $array($index) can be used to retrieve the array member at a given position. The function corresponding to the array has the signature function($index as xs:integer) as item()*. The fact that an array is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.

Functions that Operate on Arrays

The functions defined in this section use a conventional namespace prefix array, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/array.

As with all other values, arrays are treated as immutable. For example, the array:reverse function returns an array that differs from the supplied array in the order of its members, but the supplied array is not changed by the operation. Two calls on array:reverse with the same argument will return arrays that are indistinguishable from each other; there is no way of asking whether these are “the same array”. Like sequences, arrays have no identity.

All functionality on arrays is defined in terms of two primitives:

The function array:members decomposes an array to a sequence of value records.

The function array:of-members composes an array from a sequence of value records.

A value record here is an item that encapsulates an arbitrary value; the representation chosen for a value record is record(value as item()*), that is, a map containing a single entry whose key is the string "value" and whose value is the encapsulated sequence.
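The role of the two primitives can be sketched with ordinary lists. In the following Python illustration an array is modelled as a list whose members are tuples (each tuple standing for an arbitrary sequence) and a value record is modelled as a dict with the single key "value"; the helper names are hypothetical, and array_reverse and array_get are shown only as examples of operations expressed through the primitives.

```python
def members(array):
    """Decompose an array into a sequence of value records."""
    return [{"value": member} for member in array]

def of_members(records):
    """Compose an array from a sequence of value records."""
    return [record["value"] for record in records]

def array_reverse(array):
    """Reverse an array using only the two primitives; the input is not modified."""
    return of_members(list(reversed(members(array))))

def array_get(array, position):
    """Return the member at a 1-based position using the decomposition."""
    return members(array)[position - 1]["value"]

arr = [(1, 2), (), ("a",)]          # three members; the second is an empty sequence
records = members(arr)
rebuilt = of_members(records)
reversed_arr = array_reverse(arr)
first_member = array_get(arr, 1)
```

Wrapping each member in a record is what allows an empty or multi-item member to survive as a single item in the decomposed sequence.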

Additional Operations on Arrays

This section is non-normative.

The XPath language provides explicit syntax for certain operations on arrays. These constructs can also be specified in terms of function primitives:

Singleton Arrays

array { $sequence } constructs an array whose members are the items in $sequence. Every member of this array will be a singleton item. This can be defined as array:join($sequence ! array { . }).

[E1, E2, E3, ..., En] constructs an array in which E1 is the first member, E2 is the second member, and so on. The result is equivalent to the expression array:join(([ E1 ], [ E2 ], ... [ En ])).

The lookup expression $array?* is equivalent to array:split($array)?*.

The lookup expression $array?$N, where $N is an integer within the bounds of the array, is equivalent to array:split($array)[$N]?*.

Similarly, applying the array as a function, $array($N), is also equivalent to array:split($array)[$N]?*

Value Maps

array { $sequence } constructs an array whose members are the items in $sequence. Every member of this array will be a singleton item. This can be defined as array:of-members($sequence ! { 'value': . }).

[E1, E2, E3, ..., En] constructs an array in which E1 is the first member, E2 is the second member, and so on. The result is equivalent to the expression array:of-members(({ 'value': E1 }, { 'value': E2 }, { 'value': E3 }, ... { 'value': En })).

The lookup expression $array?* is equivalent to array:members($array) ! ?value.

The lookup expression $array?$N, where $N is an integer within the bounds of the array, is equivalent to array:members($array)[$N]?value.

Similarly, applying the array as a function, $array($N), is also equivalent to array:members($array)[$N]?value

Constructor functions

Constructor functions are used to convert a supplied value to a given type, and the name of the function is the same as the name of the target type. This section describes constructor functions corresponding to the following types:

Simple types (atomic types, union types, and list types as defined in ), which are present in the static context either because they appear in the in-scope schema types or because they appear as named item types.

These constructor functions always take a single argument.

Record tests defined as named item types.

These take one argument for each named field of the record test. Constructor functions for record types are defined in .

Constructor functions are defined for all user-defined named simple types, and for most built-in atomic, list, and union types. The only named simple types that have no constructor function are those that have no instances other than instances of their derived types: specifically, xs:anySimpleType, xs:anyAtomicType, and xs:NOTATION.

Constructor functions for XML Schema built-in atomic types

Every built-in atomic type that is defined in , except xs:anyAtomicType and xs:NOTATION, has an associated constructor function. The type xs:untypedAtomic, defined in and the two derived types xs:yearMonthDuration and xs:dayTimeDuration defined in also have associated constructor functions. Implementations may additionally provide a constructor function for the new datatype xs:dateTimeStamp introduced in .

A constructor function is not defined for xs:anyAtomicType as there are no atomic values with type annotation xs:anyAtomicType at runtime, although this can be a statically inferred type. A constructor function is not defined for xs:NOTATION since it is defined as an abstract type in . If the static context (See ) contains a type derived from xs:NOTATION then a constructor function is defined for it. See .

The form of the constructor function for an atomic type eg:TYPE is:

If $arg is the empty sequence, the empty sequence is returned. For example, the signature of the constructor function corresponding to the xs:unsignedInt type defined in is:

Calling the constructor function xs:unsignedInt(12) returns the xs:unsignedInt value 12. Another call of that constructor function that returns the same xs:unsignedInt value is xs:unsignedInt("12"). The same result would also be returned if the constructor function were to be called with a node that had a typed value equal to the xs:unsignedInt 12. The standard features described in would atomize the node to extract its typed value and then call the constructor with that value. If the value passed to a constructor is not in the lexical space of the datatype to be constructed, and cannot be converted to a value in the value space of the datatype under the rules in this specification, then a dynamic error is raised.
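The behaviour described for xs:unsignedInt can be pictured with a small conversion function. The following Python sketch is a simplified analogy: the name unsigned_int is hypothetical, None stands for the empty sequence, only unsigned decimal digit strings are accepted as lexical forms, and Python exceptions stand in for the dynamic error.

```python
import re

MAX_UNSIGNED_INT = 4294967295       # 2**32 - 1

def unsigned_int(arg):
    """Simplified analogue of the xs:unsignedInt constructor (hypothetical helper)."""
    if arg is None:
        return None                 # empty sequence in, empty sequence out
    if isinstance(arg, str):
        text = arg.strip(" \t\r\n") # leading and trailing whitespace is ignored
        if not re.fullmatch(r"[0-9]+", text):
            raise ValueError(f"not a valid lexical form: {arg!r}")
        value = int(text)
    elif isinstance(arg, int) and not isinstance(arg, bool):
        value = arg
    else:
        raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    if not 0 <= value <= MAX_UNSIGNED_INT:
        raise ValueError(f"out of range for unsignedInt: {value}")
    return value

def is_rejected(arg):
    """Report whether unsigned_int raises ValueError for the argument."""
    try:
        unsigned_int(arg)
    except ValueError:
        return True
    return False

from_number = unsigned_int(12)
from_string = unsigned_int("12")
from_empty = unsigned_int(None)
```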

The semantics of the constructor function xs:TYPE(arg) are identical to the semantics of arg cast as xs:TYPE? . See .

If the argument to a constructor function is a literal, the result of the function may be evaluated statically; if an error is found during such evaluation, it may be reported as a static error.

Special rules apply to constructor functions for xs:QName and types derived from xs:QName and xs:NOTATION. See .

The argument is optional, and defaults to the context value (which will be atomized if necessary).

The following constructor functions for the built-in atomic types are supported:

Implementations should return negative zero for xs:float("-0.0E0"). But because does not distinguish between the values positive zero and negative zero, implementations may return positive zero in this case.

Implementations should return negative zero for xs:double("-0.0E0"). But because does not distinguish between the values positive zero and negative zero, implementations may return positive zero in this case.

See for special rules.

See for rules related to constructing values of type xs:ENTITY and types derived from it.

Available only if the implementation supports XSD 1.1.

Constructor functions for xs:QName and xs:NOTATION

Special rules apply to constructor functions for the types xs:QName and xs:NOTATION, for two reasons:

Values cannot belong directly to the type xs:NOTATION, only to its subtypes.

The lexical representation of these types uses namespace prefixes, whose meaning is context-dependent.

These constraints result in the following rules:

There is no constructor function for xs:NOTATION. Constructors are defined, however, for xs:QName, for types derived or constructed from xs:QName, and for types derived or constructed from xs:NOTATION.

When converting from an xs:string, the prefix within the lexical xs:QName supplied as the argument is resolved to a namespace URI using the statically known namespaces from the static context. If the lexical xs:QName has no prefix, the namespace URI of the resulting expanded-QName is the default namespace for elements and types, taken from the static context. Components of the static context are defined in . A dynamic error is raised if the prefix is not bound in the static context. As described in , the supplied prefix is retained as part of the expanded-QName value.

When a constructor function for a namespace-sensitive type is used as a literal function item or in a partial function application (for example, xs:QName#1 or xs:QName(?)) the namespace bindings that are relevant are those from the static context of the literal function item or partial function application. When a constructor function for a namespace-sensitive type is obtained by means of the fn:function-lookup function, the relevant namespace bindings are those from the static context of the call on fn:function-lookup.

When the supplied argument to the xs:QName constructor function is a node, the node is atomized in the usual way, and if the result is xs:untypedAtomic it is then converted as if a string had been supplied. The effect might not be what is desired. For example, given the attribute xsi:type="my:type", the expression xs:QName(@xsi:type) might fail on the grounds that the prefix my is undeclared. This is because the namespace bindings are taken from the static context (that is, from the query or stylesheet), and not from the source document containing the @xsi:type attribute. The solution to this problem is to use the function call resolve-QName(@xsi:type, .) instead.

Constructor functions for XML Schema built-in list types

Each of the three built-in list types defined in , namely xs:NMTOKENS, xs:ENTITIES, and xs:IDREFS, has an associated constructor function.

The function signatures are as follows:

The semantics are equivalent to casting to the corresponding types from xs:string.

All three of these types have the facet minLength = 1 meaning that there must always be at least one item in the list. The return type, however, allows for the fact that when the argument to the function is an empty sequence, the result is an empty sequence.

In the case of atomic types, it is possible to use an expression such as xs:date(@date-of-birth) to convert an attribute value to an instance of xs:date, knowing that this will work both in the case where the attribute is already annotated as xs:date, and also in the case where it is xs:untypedAtomic. This approach does not work with list types, because it is not permitted to use a value of type xs:NMTOKEN* as input to the constructor function xs:NMTOKENS. Instead, it is necessary to use conditional logic that performs the conversion only in the case where the input is untyped: if (@x instance of attribute(*, xs:untypedAtomic)) then xs:NMTOKENS(@x) else data(@x)

Constructor functions for XML Schema built-in union types

There is a constructor function for the union type xs:numeric defined in . The function signature is:

The semantics are determined by the rules in . These rules have the effect that:

If the argument is an instance of xs:double, xs:float, or xs:decimal, then the result is an instance of the same primitive type, with the same value;

If the argument is an instance of xs:boolean, the result is the xs:double value 0.0e0 or 1.0e0;

If the argument is an instance of xs:string or xs:untypedAtomic, then:

If the value is in the lexical space of xs:double, the result will be the corresponding xs:double value;

Otherwise, a dynamic error occurs;

The result will never be an instance of xs:float, xs:decimal, or xs:integer. This is because xs:double appears first in the list of member types of xs:numeric, and its lexical space subsumes the lexical space of the other numeric types. Thus, unlike XPath numeric literals, the result does not depend on the lexical form of the supplied value. The reason for this design choice is to retain compatibility with the function conversion rules: functions such as fn:abs and fn:round are declared to expect an instance of xs:numeric as their first or only argument, and compatibility with the function conversion rules defined in earlier versions of these specifications demands that when an untyped atomic value (or untyped node) is supplied as the argument, it is converted to an xs:double value even if its lexical form is that (say) of an integer.

In all other cases, a dynamic error occurs.
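The rules above can be modelled by the following non-normative Python sketch. It is an illustration only, under stated assumptions: Python's `float` stands in for xs:double (there is no separate xs:float here), `Decimal` for xs:decimal, `int` for xs:integer, and `None` for the empty sequence. The function name `numeric` and the regular expression `_DOUBLE_LEXICAL` are hypothetical names introduced for this example.

```python
import re
from decimal import Decimal

# Assumed lexical space of xs:double (XSD 1.1 form, which also admits "+INF").
_DOUBLE_LEXICAL = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|[+-]?INF|NaN"
)


def numeric(value):
    """Model of the xs:numeric constructor; names and type mapping are assumptions."""
    if value is None:                      # empty sequence in, empty sequence out
        return None
    if isinstance(value, bool):            # must be tested before int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float, Decimal)):
        return value                       # same primitive type, same value
    if isinstance(value, str):             # xs:string or xs:untypedAtomic
        text = value.strip(" \t\r\n")
        if _DOUBLE_LEXICAL.fullmatch(text):
            return float(text.replace("INF", "inf"))
        raise ValueError(f"not in the lexical space of xs:double: {value!r}")
    raise TypeError("no conversion to xs:numeric for this type")
```

Note that `numeric("42")` yields the double `42.0` rather than an integer, which mirrors the rule that the result does not depend on the lexical form of the supplied string.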

In the case of an implementation that supports XSD 1.1, there is a constructor function associated with the built-in union type xs:error.

The function signature is as follows:

The semantics are equivalent to casting to the corresponding union type (see ).

Because xs:error has no member types, and therefore has an empty value space, casting will always fail with a dynamic error except in the case where the supplied argument is an empty sequence, in which case the result is also an empty sequence.

Constructor functions for user-defined atomic and union types

For every named user-defined simple type in the static context (See ), there is a constructor function whose name is the same as the name of the type and whose effect is to create a value of that type from the supplied argument. The rules for constructing user-defined types are defined in the same way as the rules for constructing built-in derived types defined in .

For named atomic types, the rules are the same as the rules for constructing built-in derived atomic types defined in . For a named atomic type T, the signature of the function takes the form T($value as xs:anyAtomicType? := .) as T?, and the semantics are the same as casting to derived types: see .

For named union types, the rules follow the same principles as the rules for constructing built-in union types defined in . For a named union type U, the signature of the function takes the form U($value as xs:anyAtomicType? := .) as U?, and the semantics are the same as casting to union types: see .

For named list types, the rules follow the same principles as the rules for constructing built-in list types defined in . For a named list type L, where the item type of L is I, the signature of the function takes the form L($value as xs:string? := .) as I*, and the semantics are the same as casting to list types: see .

Constructor functions are available both for named types defined in an imported schema (that is, named simple types in the in-scope schema types), and for types defined by means of . Specifically, named enumeration types follow the same rules as schema types derived by restricting xs:string, and named local union types follow the same rules as union types defined in a schema.

Special rules apply to constructor functions for namespace-sensitive types, that is, atomic types derived from xs:QName and xs:NOTATION, list types that have a namespace-sensitive item type, and union types that have a namespace-sensitive member type. See .

Using a Constructor Function for a User-Defined Atomic Type

Consider a situation where the static context contains an atomic type called hatSize defined in a schema whose target namespace is bound to the prefix eg. In such a case the following constructor function is available to users:

The resulting function may be used in an expression such as eg:hatSize("10½").

In the case of an atomic type A, the return type of the function is A?, reflecting the fact that the result will be an empty sequence if the input is an empty sequence. For a union or list type, the return type of the function is specified only as xs:anyAtomicType*. Implementations performing static type checking will often be able to compute a more specific result type. For example, if the target type is a list type whose item type is the atomic type A, the result will always be an instance of A*; if the target type is a pure union type U then the result will always be an instance of U?. In general, however, applications needing interoperable behavior on implementations that do strict static type checking will need to use a treat as expression to assert the specific type of the result.

To construct an instance of a user-defined type that is not in a namespace, it is possible to use an EQName (for example Q{}hatsize(17)). Alternatives are to use a cast expression (17 cast as hatsize) or (if the host language allows it) to undeclare the default function namespace.

Constructor functions for named record tests

For every named item type in the static context (See ) whose expansion is a record test, there is a constructor function whose name is the same as the name of the type, and whose parameters correspond to the fields defined in the record test.

For example, if there is a named item type with the XQuery definition:

declare item type my:location as record( latitude as xs:double, longitude as xs:double )

then there will be a function definition equivalent to:

declare function my:location ( $latitude as xs:double, $longitude as xs:double ) as my:location { { 'latitude': $latitude, 'longitude': $longitude } }

Equivalently using XSLT syntax, if there is a named item type with the XSLT definition:


then there will be a function definition equivalent to:


The rules defining the relationship of the function definition to the record test are as follows:

The name of the function is the same as the name of the named item type. A static error occurs if this clashes with the name and arity of other function definitions in the static context.

For every named field in the record test, in order, there is one parameter defined as follows:

If the name of the field is an NCName, then the name of the parameter is the name of the field.

Otherwise, the name of the parameter is argNZ, where arg is the literal string "arg", N is the ordinal position of the field, counting from 1 (one), and Z is an implementation-defined suffix, added only when needed to make the parameter name unique.

The declared type of the parameter is the same as the declared type of the field, but if the field is declared optional, then the occurrence indicator is adjusted to ? or * if needed to make the empty sequence a valid value for the parameter.

If the field is optional and if all subsequent fields are optional, then the parameter is declared as optional with a default value of () (the empty sequence). In all other cases the parameter is declared as required.

It is immaterial whether the record test is extensible; the constructor function cannot be used to create entries in the resulting map other than entries corresponding to named fields.

The return type of the constructor function is the record test (with no occurrence indicator).

The body of the function constructs a map having one entry for each mandatory field in the record test, and one entry for each optional field in the record test for which an actual value other than the empty sequence is supplied in the arguments of the function call. The key of the entry is the field name as an instance of xs:string, and the corresponding value is the value supplied in the arguments to the constructor function call, after applying the coercion rules.

Record constructor with optional fields

Consider the record test (in XQuery syntax):

declare item type p:person as record( "1st-title"? as xs:string, "2nd-title"? as xs:string, first as xs:string, middle? as xs:string, last as xs:string, suffix? as xs:string, *)

This will result in an implicit function declaration equivalent to:

declare function p:person (
  $arg1 as xs:string?,
  $arg2 as xs:string?,
  $first as xs:string,
  $middle as xs:string?,
  $last as xs:string,
  $suffix as xs:string? := ()
) as p:person {
  map:merge((
    { "1st-title": $arg1 }[exists($arg1)],
    { "2nd-title": $arg2 }[exists($arg2)],
    { "first": $first },
    { "middle": $middle }[exists($middle)],
    { "last": $last },
    { "suffix": $suffix }[exists($suffix)]
  ))
};

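The behaviour of this implicit constructor can be illustrated by a non-normative Python sketch. It assumes that a Python `dict` stands in for the resulting map and that `None` stands in for the empty sequence; the function name `person` simply mirrors the record name and is not part of any real API.

```python
from typing import Optional


def person(
    arg1: Optional[str],
    arg2: Optional[str],
    first: str,
    middle: Optional[str],
    last: str,
    suffix: Optional[str] = None,   # only the trailing optional field gets a default
) -> dict:
    """Model of the implicit p:person constructor (illustrative assumption)."""
    # (field name, supplied value, is the field optional?)
    fields = [
        ("1st-title", arg1, True),
        ("2nd-title", arg2, True),
        ("first", first, False),
        ("middle", middle, True),
        ("last", last, False),
        ("suffix", suffix, True),
    ]
    # Mandatory fields always produce an entry; optional fields only when
    # a value other than the empty sequence (None here) was supplied.
    return {
        name: value
        for name, value, optional in fields
        if not (optional and value is None)
    }
```

Only `suffix` can be omitted from a call, because `middle` is followed by the required field `last`, and the two title fields are followed by the required field `first`.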
Casting

Constructor functions and cast expressions accept an expression and return a value of a given type. They both convert a source value, SV, of a source type, ST, to a target value, TV, of the given target type, TT, with identical semantics and different syntax. The name of the constructor function is the same as the name of the built-in datatype or the datatype defined in of (see ) or the user-derived datatype (see ) that is the target for the conversion, and the semantics are exactly the same as for a cast expression; for example, xs:date("2003-01-01") means exactly the same as "2003-01-01" cast as xs:date?.

The cast expression takes a type name to indicate the target type of the conversion. See . If the type name allows the empty sequence and the expression to be cast is the empty sequence, the empty sequence is returned. If the type name does not allow the empty sequence and the expression to be cast is the empty sequence, a type error is raised .

Where the argument to a cast is a literal, the result of the function may be evaluated statically; if an error is encountered during such evaluation, it may be reported as a static error.

The general rules for casting from primitive types to primitive types are defined in , and subsections describe the rules for specific target types. The general rules for casting from xs:string (and xs:untypedAtomic) follow in . Casting to non-primitive types, including atomic types derived by restriction, union types, and list types, is described in . Casting from derived types is defined in , and .

Throughout this section (), the term primitive type means either one of the 19 primitive types defined in , or one of the types xs:untypedAtomic, xs:integer, xs:yearMonthDuration and xs:dayTimeDuration; and where the text refers to types derived from a particular primitive type T, the reference is to types for which T is the nearest ancestor-or-self primitive type in the type hierarchy.

When casting from xs:string or xs:untypedAtomic the semantics in apply, regardless of target type.

Casting from primitive types to primitive types

This section defines casting between primitive types (specifically, the 19 primitive types defined in as well as xs:untypedAtomic, xs:integer and the two derived types of xs:duration: xs:yearMonthDuration and xs:dayTimeDuration, which are treated as primitive types in this section). The type conversions that are supported between primitive atomic types are indicated in the table below; casts between other (non-primitive) types are defined in terms of these primitives.

In this table, there is a row for each primitive type acting as the source of the conversion and there is a column for each primitive type acting as the target of the conversion. The intersections of rows and columns contain one of three characters: Y indicates that a conversion from values of the type to which the row applies to the type to which the column applies is supported; N indicates that there are no supported conversions from values of the type to which the row applies to the type to which the column applies; and M indicates that a conversion from values of the type to which the row applies to the type to which the column applies may succeed for some values in the value space and fail for others.

defines xs:NOTATION as an abstract type. Thus, casting to xs:NOTATION from any other type including xs:NOTATION is not permitted and raises a static error . However, casting from one subtype of xs:NOTATION to another subtype of xs:NOTATION is permitted.

Casting is not supported to or from xs:anySimpleType. Thus, there is no row or column for this type in the table below. For any node that has not been validated or has been validated as xs:anySimpleType, the typed value of the node is an atomic value of type xs:untypedAtomic. There are no atomic values with the type annotation xs:anySimpleType at runtime. Casting to xs:anySimpleType is not permitted and raises a static error: .

Similarly, casting is not supported to or from xs:anyAtomicType and will raise a static error: . There are no atomic values with the type annotation xs:anyAtomicType at runtime, although this can be a statically inferred type.

If casting is attempted from an ST to a TT for which casting is not supported, as defined in the table below, a type error is raised .

In the following table, the columns and rows are identified by short codes that identify simple types as follows:

uA = xs:untypedAtomic aURI = xs:anyURI b64 = xs:base64Binary bool = xs:boolean dat = xs:date gDay = xs:gDay dbl = xs:double dec = xs:decimal dT = xs:dateTime dTD = xs:dayTimeDuration dur = xs:duration flt = xs:float hxB = xs:hexBinary gMD = xs:gMonthDay gMon = xs:gMonth int = xs:integer NOT = xs:NOTATION QN = xs:QName str = xs:string tim = xs:time gYM = xs:gYearMonth yMD = xs:yearMonthDuration gYr = xs:gYear

In the following table, the notation S\T indicates that the source (S) of the conversion is indicated in the column below the notation and that the target (T) is indicated in the row to the right of the notation.

S\T  uA   str  flt  dbl  dec  int  dur  yMD  dTD  dT   tim  dat  gYM  gYr  gMD  gDay gMon bool b64  hxB  aURI QN   NOT
uA   Y    Y    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M
str  Y    Y    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M    M
flt  Y    Y    Y    Y    M    M    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
dbl  Y    Y    Y    Y    M    M    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
dec  Y    Y    Y    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
int  Y    Y    Y    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
dur  Y    Y    N    N    N    N    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N
yMD  Y    Y    N    N    N    N    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N
dTD  Y    Y    N    N    N    N    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N
dT   Y    Y    N    N    N    N    N    N    N    Y    Y    Y    Y    Y    Y    Y    Y    N    N    N    N    N    N
tim  Y    Y    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N    N    N    N    N
dat  Y    Y    N    N    N    N    N    N    N    Y    N    Y    Y    Y    Y    Y    Y    N    N    N    N    N    N
gYM  Y    Y    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N    N    N
gYr  Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N    N
gMD  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N    N
gDay Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N    N
gMon Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N    N
bool Y    Y    Y    Y    Y    Y    N    N    N    N    N    N    N    N    N    N    N    Y    N    N    N    N    N
b64  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    Y    N    N    N
hxB  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    Y    N    N    N
aURI Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    N    N
QN   Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    M
NOT  Y    Y    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    N    Y    M

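As a non-normative illustration, the table can be encoded as a lookup structure. The Python sketch below transcribes each row using the short codes defined above; the names `TYPES`, `ROWS` and `cast_support` are hypothetical and exist only for this example.

```python
# Column order of the casting table, using the short codes defined above.
TYPES = ("uA str flt dbl dec int dur yMD dTD dT tim dat "
         "gYM gYr gMD gDay gMon bool b64 hxB aURI QN NOT").split()

# One string per source type; character i is the entry for target TYPES[i].
ROWS = {
    "uA":   "YY" + "M" * 21,
    "str":  "YY" + "M" * 21,
    "flt":  "YYYYMM" + "N" * 11 + "Y" + "N" * 5,
    "dbl":  "YYYYMM" + "N" * 11 + "Y" + "N" * 5,
    "dec":  "YYYYYY" + "N" * 11 + "Y" + "N" * 5,
    "int":  "YYYYYY" + "N" * 11 + "Y" + "N" * 5,
    "dur":  "YYNNNNYYY" + "N" * 14,
    "yMD":  "YYNNNNYYY" + "N" * 14,
    "dTD":  "YYNNNNYYY" + "N" * 14,
    "dT":   "YY" + "N" * 7 + "Y" * 8 + "N" * 6,
    "tim":  "YY" + "N" * 8 + "Y" + "N" * 12,
    "dat":  "YY" + "N" * 7 + "YN" + "Y" * 6 + "N" * 6,
    "gYM":  "YY" + "N" * 10 + "Y" + "N" * 10,
    "gYr":  "YY" + "N" * 11 + "Y" + "N" * 9,
    "gMD":  "YY" + "N" * 12 + "Y" + "N" * 8,
    "gDay": "YY" + "N" * 13 + "Y" + "N" * 7,
    "gMon": "YY" + "N" * 14 + "Y" + "N" * 6,
    "bool": "YYYYYY" + "N" * 11 + "Y" + "N" * 5,
    "b64":  "YY" + "N" * 16 + "YY" + "NNN",
    "hxB":  "YY" + "N" * 16 + "YY" + "NNN",
    "aURI": "YY" + "N" * 18 + "Y" + "NN",
    "QN":   "YY" + "N" * 19 + "YM",
    "NOT":  "YY" + "N" * 19 + "YM",
}


def cast_support(source: str, target: str) -> str:
    """Return 'Y', 'N' or 'M' for a cast from source to target (short codes)."""
    return ROWS[source][TYPES.index(target)]
```

For example, `cast_support("dec", "int")` gives `"Y"`, whereas `cast_support("flt", "int")` gives `"M"` because NaN and the infinities cannot be cast to xs:integer.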
Casting to xs:string and xs:untypedAtomic

Casting is permitted from any primitive type to the primitive types xs:string and xs:untypedAtomic.

When a value of any simple type is cast as xs:string, the derivation of the xs:string value TV depends on the ST and on the SV, as follows.

If ST is xs:string or a type derived from xs:string, TV is SV.

If ST is xs:anyURI, the type conversion is performed without escaping any characters.

If ST is xs:QName or xs:NOTATION:

if the qualified name has a prefix, then TV is the concatenation of the prefix of SV, a single colon (:), and the local name of SV.

otherwise TV is the local-name.

If ST is a numeric type, the following rules apply:

If ST is xs:integer, TV is the canonical lexical representation of SV as defined in . There is no decimal point.

If ST is xs:decimal, then:

If SV is in the value space of xs:integer, that is, if there are no significant digits after the decimal point, then the value is converted from an xs:decimal to an xs:integer and the resulting xs:integer is converted to an xs:string using the rule above.

Otherwise, the canonical lexical representation of SV is returned, as defined in .

If ST is xs:float or xs:double, then:

TV will be an xs:string in the lexical space of xs:double or xs:float that when converted to an xs:double or xs:float under the rules of produces a value that is equal to SV, or is NaN if SV is NaN. In addition, TV must satisfy the constraints in the following sub-bullets.

If SV has an absolute value that is greater than or equal to 0.000001 (one millionth) and less than 1000000 (one million), then the value is converted to an xs:decimal and the resulting xs:decimal is converted to an xs:string according to the rules above, as though using an implementation of xs:decimal that imposes no limits on the totalDigits or fractionDigits facets.

If SV has the value positive or negative zero, TV is "0" or "-0" respectively.

If SV is positive or negative infinity, TV is the string "INF" or "-INF" respectively.

In other cases, the result consists of a mantissa, which has the lexical form of an xs:decimal, followed by the letter "E", followed by an exponent which has the lexical form of an xs:integer. Leading zeroes and "+" signs are prohibited in the exponent. For the mantissa, there must be a decimal point, and there must be exactly one digit before the decimal point, which must be non-zero. The "+" sign is prohibited. There must be at least one digit after the decimal point. Apart from this mandatory digit, trailing zero digits are prohibited.

The above rules allow more than one representation of the same value. For example, the xs:float value whose exact decimal representation is 1.26743223E15 might be represented by any of the strings "1.26743223E15", "1.26743222E15" or "1.26743224E15" (inter alia). It is implementation-dependent which of these representations is chosen.

If ST is xs:dateTime, xs:date or xs:time, TV is the local value. The components of TV are individually cast to xs:string using the functions described in and the results are concatenated together. The year component is cast to xs:string using eg:convertYearToString. The month, day, hour and minute components are cast to xs:string using eg:convertTo2CharString. The second component is cast to xs:string using eg:convertSecondsToString. The timezone component, if present, is cast to xs:string using eg:convertTZtoString.

Note that the hours component of the resulting string will never be "24". Midnight is always represented as "00:00:00".

If ST is xs:yearMonthDuration or xs:dayTimeDuration, TV is the canonical representation of SV as defined in .

If ST is xs:duration, then let SYM be SV cast as xs:yearMonthDuration, and let SDT be SV cast as xs:dayTimeDuration. Now let the next intermediate value, TYM, be SYM cast as TT, and let TDT be SDT cast as TT. If TYM is "P0M", then TV is TDT. Otherwise, TYM and TDT are merged according to the following rules:

If TDT is "PT0S", then TV is TYM.

Otherwise, TV is the concatenation of all the characters in TYM and all the characters except the first "P" and the optional negative sign in TDT.

In all other cases, TV is the canonical representation of SV. For datatypes that do not have a canonical lexical representation defined, an implementation-dependent canonical representation may be used.
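The merging rule for xs:duration given in the list above can be sketched in Python as follows. This is a non-normative illustration: it assumes that TYM and TDT have already been obtained as canonical strings, and the function name `merge_duration_strings` is hypothetical.

```python
def merge_duration_strings(tym: str, tdt: str) -> str:
    """Combine the string forms of the year-month and day-time parts.

    tym: SV cast as xs:yearMonthDuration, then cast as xs:string.
    tdt: SV cast as xs:dayTimeDuration, then cast as xs:string.
    """
    if tym == "P0M":          # no year-month part: the day-time string is the result
        return tdt
    if tdt == "PT0S":         # no day-time part: the year-month string is the result
        return tym
    # Append TDT without its optional negative sign and its first "P".
    return tym + tdt.lstrip("-")[1:]
```

For instance, merging "P1Y2M" with "P3DT4H" gives "P1Y2M3DT4H".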

To cast as xs:untypedAtomic the value is cast as xs:string, as described above, and the type annotation changed to xs:untypedAtomic.

The string representations of numeric values are backwards compatible with XPath 1.0 except for the special values positive and negative infinity, negative zero and values outside the range 1.0e-6 to 1.0e+6.
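The rules for casting xs:double to xs:string can be illustrated by the following non-normative Python sketch. It assumes that Python's `float` stands in for xs:double and relies on `repr`, which yields a shortest decimal string that converts back to the same value; this is one of the representations the rules permit, not the only one. The function name `double_to_string` is hypothetical.

```python
import math
from decimal import Decimal


def double_to_string(sv: float) -> str:
    """Model of casting an xs:double to xs:string (illustrative assumption)."""
    if math.isnan(sv):
        return "NaN"
    if math.isinf(sv):
        return "INF" if sv > 0 else "-INF"
    if sv == 0:
        return "-0" if math.copysign(1.0, sv) < 0 else "0"

    digits = Decimal(repr(sv))          # exact decimal value of the shortest form
    if 1e-6 <= abs(sv) < 1e6:
        # Decimal notation; integer values carry no decimal point.
        text = format(digits, "f")
        if "." in text:
            text = text.rstrip("0").rstrip(".")
        return text

    # Scientific notation: one non-zero digit, a point, at least one digit,
    # then "E" and an exponent with no "+" sign and no leading zeros.
    sign = "-" if sv < 0 else ""
    significant = "".join(str(d) for d in digits.as_tuple().digits).rstrip("0")
    fraction = significant[1:] or "0"
    return f"{sign}{significant[0]}.{fraction}E{digits.adjusted()}"
```

Under these assumptions `double_to_string(100.0)` gives "100", while `double_to_string(1e6)` gives "1.0E6" because one million lies outside the range that uses decimal notation.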

Casting to numeric types

Casting to xs:float

When a value of any simple type is cast as xs:float, the xs:float TV is derived from the ST and the SV as follows:

If ST is xs:float, then TV is SV and the conversion is complete.

If ST is xs:double, then TV is obtained as follows:

if SV is the xs:double value INF, -INF, NaN, positive zero, or negative zero, then TV is the xs:float value INF, -INF, NaN, positive zero, or negative zero respectively.

otherwise, SV can be expressed in the form m × 2^e where the mantissa m and exponent e are signed xs:integers whose value range is defined in , and the following rules apply:

if m (the mantissa of SV) is outside the permitted range for the mantissa of an xs:float value, that is, −(2^24 − 1) to +(2^24 − 1), then it is divided by 2^N where N is the lowest positive xs:integer that brings the result of the division within the permitted range, and the exponent e is increased by N. This is integer division (in effect, the binary value of the mantissa is truncated on the right). Let M be the mantissa and E the exponent after this adjustment.

if E exceeds 104 (the maximum exponent value in the value space of xs:float) then TV is the xs:float value INF or -INF depending on the sign of M.

if E is less than -149 (the minimum exponent value in the value space of xs:float) then TV is the xs:float value positive or negative zero depending on the sign of M.

otherwise, TV is the xs:float value M × 2^E.

If ST is xs:decimal, or xs:integer, then TV is xs:float( SV cast as xs:string) and the conversion is complete.

If ST is xs:boolean, SV is converted to 1.0E0 if SV is true and to 0.0E0 if SV is false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .
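The rule for casting xs:double to xs:float can be illustrated with a non-normative Python sketch. Several assumptions apply: the decomposition of SV into m × 2^e is the 53-bit one obtained from `math.frexp` (the same value has other decompositions, and the choice affects values near the limits of the xs:float range); the result is returned as a Python `float` holding a value that is exactly representable as an xs:float; and the function name `double_to_float` is hypothetical.

```python
import math

FLOAT_MANTISSA_MAX = 2**24 - 1   # largest permitted xs:float mantissa
FLOAT_EXPONENT_MAX = 104
FLOAT_EXPONENT_MIN = -149


def double_to_float(sv: float) -> float:
    """Model of the xs:double to xs:float rule, using truncation as described."""
    if math.isnan(sv) or math.isinf(sv) or sv == 0:
        return sv                                  # special values map to themselves

    fraction, exponent = math.frexp(sv)            # sv == fraction * 2**exponent
    m = int(fraction * 2**53)                      # exact integer mantissa (assumed form)
    e = exponent - 53

    # Find the lowest shift that brings the mantissa into the xs:float range.
    n = 0
    while (abs(m) >> n) > FLOAT_MANTISSA_MAX:
        n += 1
    big_m = (abs(m) >> n) * (1 if m > 0 else -1)   # truncate on the right, keep sign
    big_e = e + n

    if big_e > FLOAT_EXPONENT_MAX:
        return math.copysign(math.inf, big_m)
    if big_e < FLOAT_EXPONENT_MIN:
        return math.copysign(0.0, big_m)
    return math.ldexp(big_m, big_e)                # M * 2**E
```

Because the mantissa is truncated rather than rounded, the result for 0.1 is the xs:float value just below 0.1, not the nearest xs:float.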

XSD 1.1 adds the value +INF to the lexical space, as an alternative to INF. XSD 1.1 also adds negative zero to the value space.

Implementations should return negative zero for xs:float("-0.0E0"). But because does not distinguish between the values positive zero and negative zero, implementations may return positive zero in this case.

Casting to xs:double

When a value of any simple type is cast as xs:double, the xs:double value TV is derived from the ST and the SV as follows:

If ST is xs:double, then TV is SV and the conversion is complete.

If ST is xs:float or a type derived from xs:float, then TV is obtained as follows:

if SV is the xs:float value INF, -INF, NaN, positive zero, or negative zero, then TV is the xs:double value INF, -INF, NaN, positive zero, or negative zero respectively.

otherwise, SV can be expressed in the form m × 2^e where the mantissa m and exponent e are signed xs:integer values whose value range is defined in , and TV is the xs:double value m × 2^e.

If ST is xs:decimal or xs:integer, then TV is xs:double( SV cast as xs:string) and the conversion is complete.

If ST is xs:boolean, SV is converted to 1.0E0 if SV is true and to 0.0E0 if SV is false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .

XSD 1.1 adds the value +INF to the lexical space, as an alternative to INF. XSD 1.1 also adds negative zero to the value space.

Implementations should return negative zero for xs:double("-0.0E0"). But because does not distinguish between the values positive zero and negative zero, implementations may return positive zero in this case.

Casting to xs:decimal

When a value of any simple type is cast as xs:decimal, the xs:decimal value TV is derived from ST and SV as follows:

If ST is xs:decimal, xs:integer or a type derived from them, then TV is SV, converted to an xs:decimal value if need be, and the conversion is complete.

If ST is xs:float or xs:double, then TV is the xs:decimal value, within the set of xs:decimal values that the implementation is capable of representing, that is numerically closest to SV. If two values are equally close, then the one that is closest to zero is chosen. If SV is too large to be accommodated as an xs:decimal, (see for limits on numeric values) a dynamic error is raised . If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised .

If ST is xs:boolean, SV is converted to 1.0 if SV is 1 or true and to 0.0 if SV is 0 or false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .
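A non-normative Python sketch of these rules follows. It assumes an implementation whose xs:decimal has no precision limit, modelled by Python's `Decimal`; in that case every finite xs:double value is exactly representable, so the "numerically closest" value is the exact one and the overflow error never arises. The function name `cast_to_decimal` is hypothetical.

```python
import math
from decimal import Decimal


def cast_to_decimal(sv):
    """Model of casting to xs:decimal (illustrative assumption)."""
    if isinstance(sv, bool):                 # must be tested before int
        return Decimal("1.0") if sv else Decimal("0.0")
    if isinstance(sv, (int, Decimal)):       # xs:integer or xs:decimal source
        return Decimal(sv)
    if isinstance(sv, float):                # xs:float or xs:double source
        if math.isnan(sv) or math.isinf(sv):
            raise ValueError("NaN, INF and -INF cannot be cast to xs:decimal")
        return Decimal(sv)                   # exact binary-to-decimal conversion
    raise TypeError("cast to xs:decimal is not supported for this type")
```

Note that `cast_to_decimal(0.1)` is not equal to the decimal 0.1, because the xs:double nearest to 0.1 is a slightly different number.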

Casting to xs:integer

When a value of any simple type is cast as xs:integer, the xs:integer value TV is derived from ST and SV as follows:

If ST is xs:integer, or a type derived from xs:integer, then TV is SV, converted to an xs:integer value if need be, and the conversion is complete.

If ST is xs:decimal, xs:float or xs:double, then TV is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3 and -17.89 returns -17. Casting 3.124E1 returns 31. If SV is too large to be accommodated as an integer, (see for limits on numeric values) a dynamic error is raised . If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised .

If ST is xs:boolean, SV is converted to 1 if SV is 1 or true and to 0 if SV is 0 or false and the conversion is complete.

If ST is xs:untypedAtomic or xs:string, see .
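The truncation rule can be checked with a short non-normative Python sketch. It assumes Python's unbounded `int` as the model of xs:integer, so the "too large" error does not arise here; the function name `cast_to_integer` is hypothetical.

```python
import math
from decimal import Decimal


def cast_to_integer(sv):
    """Model of casting to xs:integer: discard the fractional part."""
    if isinstance(sv, bool):                 # must be tested before int
        return 1 if sv else 0
    if isinstance(sv, float) and (math.isnan(sv) or math.isinf(sv)):
        raise ValueError("NaN, INF and -INF cannot be cast to xs:integer")
    if isinstance(sv, (int, float, Decimal)):
        return math.trunc(sv)                # truncates toward zero
    raise TypeError("cast to xs:integer is not supported for this type")
```

Truncation is toward zero, so a negative value such as -17.89 becomes -17 rather than -18.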

Casting to duration types

When a value of type xs:untypedAtomic, xs:string, a type derived from xs:string, xs:yearMonthDuration or xs:dayTimeDuration is cast as xs:duration, xs:yearMonthDuration or xs:dayTimeDuration, TV is derived from ST and SV as follows:

If ST is the same as TT, then TV is SV.

If ST is xs:duration, or a type derived from xs:duration, but not xs:dayTimeDuration or a type derived from xs:dayTimeDuration, and TT is xs:yearMonthDuration, then TV is derived from SV by removing the day, hour, minute and second components from SV.

If ST is xs:duration, or a type derived from xs:duration, but not xs:yearMonthDuration or a type derived from xs:yearMonthDuration, and TT is xs:dayTimeDuration, then TV is derived from SV by removing the year and month components from SV.

If ST is xs:yearMonthDuration or xs:dayTimeDuration, and TT is xs:duration, then TV is derived from SV as defined in .

If ST is xs:yearMonthDuration and TT is xs:dayTimeDuration, the cast is permitted and returns an xs:dayTimeDuration with value 0.0 seconds.

If ST is xs:dayTimeDuration and TT is xs:yearMonthDuration, the cast is permitted and returns an xs:yearMonthDuration with value 0 months.

If ST is xs:untypedAtomic or xs:string, see .

Note that casting from xs:duration to xs:yearMonthDuration or xs:dayTimeDuration loses information. To avoid this, users can cast the xs:duration value to both an xs:yearMonthDuration and an xs:dayTimeDuration and work with both values.
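The loss of information can be seen in a small non-normative Python model. It assumes that a duration is held as a pair of a month count and a seconds count; the class `Duration` and the two functions are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Duration:
    """Assumed model of an xs:duration value: total months plus total seconds."""
    months: int
    seconds: Decimal


def to_year_month_duration(sv: Duration) -> Duration:
    # Drop the day, hour, minute and second components.
    return Duration(sv.months, Decimal(0))


def to_day_time_duration(sv: Duration) -> Duration:
    # Drop the year and month components.
    return Duration(0, sv.seconds)
```

Applying both functions to the same value and keeping both results preserves all the information in the original duration.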

Casting to date and time types

In several situations, casting to date and time types requires the extraction of a component from SV or from the result of fn:current-dateTime and converting it to an xs:string. These conversions must follow certain rules. For example, converting an xs:integer year value requires converting to an xs:string with four or more characters, preceded by a minus sign if the value is negative.

This document defines four functions to perform these conversions. These functions are for illustrative purposes only and make no recommendations as to style or efficiency. References to these functions from the following text are not normative.

The arguments to these functions come from functions defined in this document. Thus, the functions below assume that they are correct and do no range checking on them.

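declare function eg:convertYearToString($year as xs:integer) as xs:string {
  let $plusMinus := if ($year >= 0) then "" else "-"
  let $yearString := abs($year) cast as xs:string
  let $length := string-length($yearString)
  return
    if ($length = 1) then concat($plusMinus, "000", $yearString)
    else if ($length = 2) then concat($plusMinus, "00", $yearString)
    else if ($length = 3) then concat($plusMinus, "0", $yearString)
    else concat($plusMinus, $yearString)
};

declare function eg:convertTo2CharString($value as xs:integer) as xs:string {
  let $string := $value cast as xs:string
  return if (string-length($string) = 1) then concat("0", $string) else $string
};

declare function eg:convertSecondsToString($seconds as xs:decimal) as xs:string {
  let $string := $seconds cast as xs:string
  let $intLength := string-length(($seconds cast as xs:integer) cast as xs:string)
  return if ($intLength = 1) then concat("0", $string) else $string
};

declare function eg:convertTZtoString($tz as xs:dayTimeDuration?) as xs:string {
  if (empty($tz)) then ""
  else if ($tz eq xs:dayTimeDuration("PT0S")) then "Z"
  else
    let $tzh := hours-from-duration($tz)
    let $tzm := minutes-from-duration($tz)
    let $plusMinus := if ($tzh >= 0) then "+" else "-"
    let $tzhString := eg:convertTo2CharString(abs($tzh))
    let $tzmString := eg:convertTo2CharString(abs($tzm))
    return concat($plusMinus, $tzhString, ":", $tzmString)
};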
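For readers who want to experiment with these conversions, the four helper functions can be approximated in Python. This is a non-normative sketch under stated assumptions: the timezone is passed as a signed number of minutes, or `None` when absent; an absent timezone is assumed to give the empty string and a zero offset is assumed to give "Z"; and the sign is taken from the whole offset rather than from the hours component alone. All function names are Python renderings chosen for this example.

```python
from decimal import Decimal
from typing import Optional


def convert_year_to_string(year: int) -> str:
    # At least four digits, preceded by a minus sign if the year is negative.
    sign = "" if year >= 0 else "-"
    return sign + str(abs(year)).rjust(4, "0")


def convert_to_2char_string(value: int) -> str:
    # Month, day, hour and minute components are padded to two digits.
    return str(value).rjust(2, "0")


def convert_seconds_to_string(seconds: Decimal) -> str:
    # Seconds may have a fractional part; pad the integer part to two digits.
    text = format(seconds, "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    whole = text.split(".")[0]
    return "0" + text if len(whole) == 1 else text


def convert_tz_to_string(offset_minutes: Optional[int]) -> str:
    if offset_minutes is None:      # assumption: no timezone, nothing appended
        return ""
    if offset_minutes == 0:         # assumption: UTC is written as "Z"
        return "Z"
    sign = "+" if offset_minutes > 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return sign + convert_to_2char_string(hours) + ":" + convert_to_2char_string(minutes)
```

With these helpers, the lexical form of a date can be assembled by concatenation, for example the year, month and day strings joined by hyphens and followed by the timezone string.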

Conversion from primitive types to date and time types follows the rules below.

When a value of any primitive type is cast as xs:dateTime, the xs:dateTime value TV is derived from ST and SV as follows:

If ST is xs:dateTime, then TV is SV.

If ST is xs:date, then let SYR be eg:convertYearToString( year-from-date( SV )), let SMO be eg:convertTo2CharString( month-from-date( SV )), let SDA be eg:convertTo2CharString( day-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:dateTime( concat( SYR , '-', SMO , '-', SDA , 'T00:00:00', STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:time, the xs:time value TV is derived from ST and SV as follows:

If ST is xs:time, then TV is SV.

If ST is xs:dateTime, then TV is xs:time( concat( eg:convertTo2CharString( hours-from-dateTime( SV )), ':', eg:convertTo2CharString( minutes-from-dateTime( SV )), ':', eg:convertSecondsToString( seconds-from-dateTime( SV )), eg:convertTZtoString( timezone-from-dateTime( SV )) )).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:date, the xs:date value TV is derived from ST and SV as follows:

If ST is xs:date, then TV is SV.

If ST is xs:dateTime, then let SYR be eg:convertYearToString( year-from-dateTime( SV )), let SMO be eg:convertTo2CharString( month-from-dateTime( SV )), let SDA be eg:convertTo2CharString( day-from-dateTime( SV )) and let STZ be eg:convertTZtoString(timezone-from-dateTime( SV )); TV is xs:date( concat( SYR , '-', SMO , '-', SDA, STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gYearMonth, the xs:gYearMonth value TV is derived from ST and SV as follows:

If ST is xs:gYearMonth, then TV is SV.

If ST is xs:dateTime, then let SYR be eg:convertYearToString( year-from-dateTime( SV )), let SMO be eg:convertTo2CharString( month-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gYearMonth( concat( SYR , '-', SMO, STZ ) ).

If ST is xs:date, then let SYR be eg:convertYearToString( year-from-date( SV )), let SMO be eg:convertTo2CharString( month-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gYearMonth( concat( SYR , '-', SMO, STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gYear, the xs:gYear value TV is derived from ST and SV as follows:

If ST is xs:gYear, then TV is SV.

If ST is xs:dateTime, let SYR be eg:convertYearToString( year-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gYear(concat( SYR, STZ )).

If ST is xs:date, let SYR be eg:convertYearToString( year-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gYear(concat( SYR, STZ )).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gMonthDay, the xs:gMonthDay value TV is derived from ST and SV as follows:

If ST is xs:gMonthDay, then TV is SV.

If ST is xs:dateTime, then let SMO be eg:convertTo2CharString( month-from-dateTime( SV )), let SDA be eg:convertTo2CharString( day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gMonthDay( concat( '--', SMO , '-', SDA, STZ ) ).

If ST is xs:date, then let SMO be eg:convertTo2CharString( month-from-date( SV )), let SDA be eg:convertTo2CharString( day-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gMonthDay( concat( '--', SMO , '-', SDA, STZ ) ).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gDay, the xs:gDay value TV is derived from ST and SV as follows:

If ST is xs:gDay, then TV is SV.

If ST is xs:dateTime, then let SDA be eg:convertTo2CharString( day-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gDay( concat( '---', SDA, STZ )).

If ST is xs:date, then let SDA be eg:convertTo2CharString( day-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gDay( concat( '---', SDA, STZ )).

If ST is xs:untypedAtomic or xs:string, see .

When a value of any primitive type is cast as xs:gMonth, the xs:gMonth value TV is derived from ST and SV as follows:

If ST is xs:gMonth, then TV is SV.

If ST is xs:dateTime, then let SMO be eg:convertTo2CharString( month-from-dateTime( SV )) and let STZ be eg:convertTZtoString( timezone-from-dateTime( SV )); TV is xs:gMonth( concat( '--' , SMO, STZ )).

If ST is xs:date, then let SMO be eg:convertTo2CharString( month-from-date( SV )) and let STZ be eg:convertTZtoString( timezone-from-date( SV )); TV is xs:gMonth( concat( '--', SMO, STZ )).

If ST is xs:untypedAtomic or xs:string, see .
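The rules for the five Gregorian partial-date types all follow one pattern: keep a subset of the date components, mark each omitted leading component with a hyphen, and append the timezone. The following non-normative Python sketch tabulates the lexical forms side by side; the function name and the convention of passing the timezone as an already formatted string are assumptions made for illustration.

```python
def partial_date_lexical(target: str, year: int, month: int, day: int,
                         tz: str = "") -> str:
    # Component strings, as eg:convertYearToString and eg:convertTo2CharString would produce.
    yr = f"{'-' if year < 0 else ''}{abs(year):04d}"
    mo = f"{month:02d}"
    da = f"{day:02d}"
    forms = {
        "xs:gYearMonth": f"{yr}-{mo}",
        "xs:gYear": yr,
        "xs:gMonthDay": f"--{mo}-{da}",
        "xs:gDay": f"---{da}",
        "xs:gMonth": f"--{mo}",
    }
    # The timezone of the source value, if any, is preserved in every case.
    return forms[target] + tz
```

For a source value of 2024-03-05Z, for example, this yields 2024-03Z, 2024Z, --03-05Z, ---05Z and --03Z respectively.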

Casting to xs:boolean

When a value of any primitive type is cast as xs:boolean, the xs:boolean value TV is derived from ST and SV as follows:

If ST is xs:boolean, then TV is SV.

If ST is xs:float, xs:double, xs:decimal or xs:integer and SV is 0, +0, -0, 0.0, 0.0E0 or NaN, then TV is false.

If ST is xs:float, xs:double, xs:decimal or xs:integer and SV is not one of the above values, then TV is true.

If ST is xs:untypedAtomic or xs:string, see .
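The numeric cases above reduce to a single test: zero of either sign and NaN map to false, and every other number maps to true. A minimal, non-normative Python sketch, using Python's int, float and Decimal as rough analogues of the XSD numeric types (an assumption of the sketch):

```python
import math
from decimal import Decimal
from typing import Union


def numeric_to_boolean(value: Union[int, float, Decimal]) -> bool:
    # NaN is false; it must be tested explicitly because NaN != 0 is true.
    if isinstance(value, float) and math.isnan(value):
        return False
    # Positive and negative zero both compare equal to 0.
    return value != 0
```

Note that infinities and very small non-zero values are true under this rule.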

Casting to xs:base64Binary and xs:hexBinary

Values of type xs:base64Binary can be cast as xs:hexBinary and vice versa, since the two types have the same value space. Casting to xs:base64Binary and xs:hexBinary is also supported from the same type and from xs:untypedAtomic, xs:string and subtypes of xs:string using semantics.

Casting to xs:anyURI

Casting to xs:anyURI is supported only from the same type, xs:untypedAtomic or xs:string.

When a value of any primitive type is cast as xs:anyURI, the xs:anyURI value TV is derived from ST and SV as follows:

If ST is xs:untypedAtomic or xs:string, see .

Casting to xs:QName and xs:NOTATION

Casting from xs:string or xs:untypedAtomic to xs:QName or xs:NOTATION is described in .

It is also possible to cast from xs:NOTATION to xs:QName, or from xs:QName to any type derived by restriction from xs:NOTATION. (Casting to xs:NOTATION itself is not allowed, because xs:NOTATION is an abstract type.) The resulting xs:QName or xs:NOTATION has the same prefix, local name, and namespace URI parts as the supplied value.

See for a discussion of how the combination of atomization and casting might not produce the desired effect.

Casting to xs:ENTITY

says that The value space of ENTITY is the set of all strings that match the NCName production ... and have been declared as an unparsed entity in a document type definition. However, and do not check that constructed values of type xs:ENTITY match declared unparsed entities. Thus, this rule is relaxed in this specification and, in casting to xs:ENTITY and types derived from it, no check is made that the values correspond to declared unparsed entities.

Casting from xs:string and xs:untypedAtomic

This section applies when the supplied value SV is an instance of xs:string or xs:untypedAtomic, including types derived from these by restriction. If the value is xs:untypedAtomic, it is treated in exactly the same way as a string containing the same sequence of characters.

The supplied string is mapped to a typed value of the target type as defined in . Whitespace normalization is applied as indicated by the whiteSpace facet for the datatype. The resulting whitespace-normalized string must be a valid lexical form for the datatype. The semantics of casting follow the rules of XML Schema validation. For example, "13" cast as xs:unsignedInt returns the xs:unsignedInt typed value 13. This could also be written xs:unsignedInt("13").

The target type can be any simple type other than an abstract type. Specifically, it can be a type whose variety is atomic, union, or list. In each case the effect of casting to the target type is the same as constructing an element with the supplied value as its content, validating the element using the target type as the governing type, and atomizing the element to obtain its typed value.

When the target type is a derived type that is restricted by a pattern facet, the lexical form is first checked against the pattern before further casting is attempted (See ). If the lexical form does not conform to the pattern, a dynamic error is raised.

For example, consider a user-defined type my:boolean which is derived by restriction from xs:boolean and specifies the pattern facet value="0|1". The expression "true" cast as my:boolean would fail with a dynamic error .

Facets other than pattern are checked after the conversion. For example if there is a user-defined datatype called my:height defined as a restriction of xs:integer with the facet <maxInclusive value="84"/>, then the expression "100" cast as my:height would fail with a dynamic error .
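The ordering described above (whitespace normalization, then the pattern facet, then conversion, then the remaining facets) can be sketched as follows. This is a non-normative Python model: cast_string and its parameters are hypothetical names, Python regular expressions stand in for XSD patterns (adequate for these simple cases only), and ValueError stands in for the dynamic error.

```python
import re
from typing import Callable, Optional


def cast_string(lexical: str, pattern: Optional[str],
                parse: Callable[[str], object],
                facets_ok: Callable[[object], bool]) -> object:
    text = " ".join(lexical.split())  # whiteSpace facet "collapse"
    # 1. The pattern facet is checked against the lexical form first.
    if pattern is not None and not re.fullmatch(pattern, text):
        raise ValueError("pattern facet not satisfied")
    # 2. The string is converted to a value of the base type.
    value = parse(text)
    # 3. Facets other than pattern are checked after the conversion.
    if not facets_ok(value):
        raise ValueError("facet not satisfied after conversion")
    return value


def parse_boolean(text: str) -> bool:
    table = {"true": True, "1": True, "false": False, "0": False}
    if text not in table:
        raise ValueError("not a valid xs:boolean lexical form")
    return table[text]


def parse_integer(text: str) -> int:
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError("not a valid xs:integer lexical form")
    return int(text)


def cast_as_my_boolean(lexical: str) -> bool:
    # my:boolean: restriction of xs:boolean with pattern "0|1".
    return cast_string(lexical, r"0|1", parse_boolean, lambda v: True)


def cast_as_my_height(lexical: str) -> int:
    # my:height: restriction of xs:integer with maxInclusive 84.
    return cast_string(lexical, None, parse_integer, lambda v: v <= 84)
```

With these definitions, cast_as_my_boolean("true") fails at step 1 even though "true" is a valid xs:boolean, and cast_as_my_height("100") fails at step 3 even though "100" is a valid xs:integer.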

Casting to the types xs:NOTATION, xs:anySimpleType, or xs:anyAtomicType is not permitted because these types are abstract (they have no immediate instances).

Special rules apply when casting to namespace-sensitive types. The types xs:QName and xs:NOTATION are namespace-sensitive. Any type derived by restriction from a namespace-sensitive type is itself namespace-sensitive, as is any union type having a namespace-sensitive type among its members, and any list type having a namespace-sensitive type as its item type. For details, see .

This version of the specification allows casting between xs:QName and xs:NOTATION in either direction; this was not permitted in the previous Recommendation. This version also removes the rule that only a string literal (rather than a dynamic string) may be cast to an xs:QName.

When casting to a numeric type:

If the value is too large or too small to be accurately represented by the implementation, it is handled as an overflow or underflow as defined in .

If the target type is xs:float or xs:double, the string -0 (and equivalents such as -0.0 or -000) should be converted to the value negative zero. However, if the implementation is reliant on an implementation of XML Schema 1.0 in which negative zero is not part of the value space for these types, these lexical forms may be converted to positive zero.

In casting to xs:decimal or to a type derived from xs:decimal, if the value is not too large or too small but nevertheless cannot be represented accurately with the number of decimal digits available to the implementation, the implementation may round to the nearest representable value or may raise a dynamic error . The choice of rounding algorithm and the choice between rounding and error behavior is .

When casting to xs:duration, xs:dateTime, or xs:time, if the seconds component has more fractional digits than are supported by the implementation, excess digits must be truncated. This rule ensures that components other than the seconds component are unaffected: for example xs:dateTime('2023-12-31T23:59:59.999999999') is guaranteed to deliver an xs:dateTime value whose year component is 2023 rather than 2024.

Implementations are required to support millisecond precision or greater.

In casting to xs:date, xs:dateTime, xs:gYear, or xs:gYearMonth (or types derived from these), if the value is too large or too small to be represented by the implementation, a dynamic error is raised.

In casting to a duration value, if the value is too large or too small to be represented by the implementation, a dynamic error is raised.

For xs:anyURI, the extent to which an implementation validates the lexical form of xs:anyURI is .

If the cast fails for any other reason, a dynamic error is raised.
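The requirement above that excess fractional digits of the seconds component be truncated, not rounded, can be illustrated with a short non-normative Python sketch. The function names and the default of three fractional digits (the millisecond minimum) are choices made for this illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def truncate_seconds(seconds: str, digits: int = 3) -> Decimal:
    # Drop digits beyond the supported precision; never carry into other components.
    quantum = Decimal(1).scaleb(-digits)  # 0.001 when digits == 3
    return Decimal(seconds).quantize(quantum, rounding=ROUND_DOWN)


def round_seconds(seconds: str, digits: int = 3) -> Decimal:
    # Shown only for contrast: rounding can overflow the seconds component.
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(seconds).quantize(quantum, rounding=ROUND_HALF_UP)
```

For the seconds value 59.999999999, truncation yields 59.999, whereas rounding would yield 60.000 and so force a carry into the minutes, hours, day, month and year of 2023-12-31T23:59:59.999999999.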

Casting involving non-primitive types

Casting from xs:string and xs:untypedAtomic to any other type (primitive or non-primitive) has been described in . This section defines how other casts to non-primitive types operate, including casting to types derived by restriction, to union types, and to list types.

A non-primitive type here means any type that is not a primitive type according to the extended definition used in .

Casting to derived types

Casting a value to a derived type can be separated into four cases. In these rules:

The types xs:untypedAtomic, xs:integer, xs:yearMonthDuration, and xs:dayTimeDuration are treated as primitive types (alongside the 19 primitive types defined in XSD).

For any atomic type T, let P(T) denote the most specific primitive type such that itemType-subtype(T, P(T)) is true.

The rules are then:

When ST is the same type as TT: this case always succeeds, returning SV unchanged.

When itemType-subtype(ST, TT) is true: This case is described in .

When P(ST) is the same type as P(TT): This case is described in .

Otherwise (P(ST) is not the same type as P(TT)): This case is described in .
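The four-way case analysis above can be sketched as a small classifier. This non-normative Python model covers only a hypothetical slice of the type hierarchy, with xs:integer treated as primitive in accordance with the extended definition; the table and function names are illustrative.

```python
# Child type -> base type, for a small illustrative part of the hierarchy.
BASE = {
    "xs:byte": "xs:short", "xs:short": "xs:int", "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:unsignedShort": "xs:unsignedInt", "xs:unsignedInt": "xs:unsignedLong",
    "xs:unsignedLong": "xs:nonNegativeInteger",
    "xs:nonNegativeInteger": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:token": "xs:normalizedString", "xs:normalizedString": "xs:string",
}
# Primitive types in the extended sense used by these rules.
PRIMITIVE = {"xs:integer", "xs:decimal", "xs:string"}


def ancestors(t: str) -> list:
    # The type itself followed by its chain of base types.
    chain = [t]
    while chain[-1] in BASE:
        chain.append(BASE[chain[-1]])
    return chain


def primitive_of(t: str) -> str:
    # P(T): the most specific primitive type of which T is a subtype.
    return next(a for a in ancestors(t) if a in PRIMITIVE)


def classify(st: str, tt: str) -> str:
    if st == tt:
        return "same type"
    if tt in ancestors(st):
        return "derived type to parent type"
    if primitive_of(st) == primitive_of(tt):
        return "within a branch"
    return "across the hierarchy"
```

The order of the tests matters: an xs:byte cast as xs:decimal falls under the second case, although P(xs:byte) is xs:integer and P(xs:decimal) is xs:decimal.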

Casting from derived types to parent types

It is always possible to cast an atomic value A to a type T if the relation A instance of T is true, provided that T is not an abstract type.

For example, it is possible to cast an xs:unsignedShort to an xs:unsignedInt, to an xs:integer, to an xs:decimal, or to a union type whose member types are xs:integer and xs:double.

Since the value space of the original type is a subset of the value space of the target type, such a cast is always successful.

For the expression A instance of T to be true, T must be either an atomic type, or a union type that has no constraining facets. It cannot be a list type, nor a union type derived by restriction from another union type, nor a union type that has a list type among its member types.

The result will have the same value as the original, but will have a new type annotation:

If T is an atomic type, then the type annotation of the result is T.

If T is a union type, then the type of the result is an atomic type M such that M is one of the atomic types in the transitive membership of the union type T and A instance of M is true; if there is more than one type M that satisfies these conditions (which could happen, for example, if T is the union of two overlapping types such as xs:int and xs:positiveInteger) then the first one is used, taking the member types in the order in which they appear within the definition of the union type.

Casting within a branch of the type hierarchy

It is possible to cast an SV to a TT if the type of the SV and the TT type are both derived by restriction (directly or indirectly) from the same primitive type, provided that the supplied value conforms to the constraints implied by the facets of the target type. This includes the case where the target type is derived from the type of the supplied value, as well as the case where the type of the supplied value is derived from the target type. For example, an instance of xs:byte can be cast as xs:unsignedShort, provided the value is not negative.

If the value does not conform to the facets defined for the target type, then a dynamic error is raised . See . In the case of the pattern facet (which applies to the lexical space rather than the value space), the pattern is tested against the canonical lexical representation of the value, as defined for the source type (or the result of casting the value to an xs:string, in the case of types that have no canonical lexical representation defined for them).

Note that this will cause casts to fail if the pattern excludes the canonical lexical representation of the source type. For example, if the type my:distance is defined as a restriction of xs:decimal with a pattern that requires two digits after the decimal point, casting of an xs:integer to my:distance will always fail, because the canonical representation of an xs:integer does not conform to this pattern.
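The my:distance case can be made concrete with a non-normative Python sketch. The regular expression below is a hypothetical rendering of "two digits after the decimal point", and str() on a Python int is used as a stand-in for the canonical representation of an xs:integer.

```python
import re

# Hypothetical pattern facet for my:distance: digits, a point, exactly two digits.
DISTANCE_PATTERN = r"[0-9]+\.[0-9]{2}"


def canonical_integer(value: int) -> str:
    # Canonical xs:integer form: optional minus sign and digits, no decimal point.
    return str(value)


def satisfies_pattern(canonical: str, pattern: str) -> bool:
    # The pattern is tested against the canonical form of the source value.
    return re.fullmatch(pattern, canonical) is not None
```

Because canonical_integer never produces a decimal point, satisfies_pattern(canonical_integer(n), DISTANCE_PATTERN) is false for every integer n, which is why the cast always fails.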

In some cases, casting from a parent type to a derived type requires special rules. See for rules regarding casting to xs:yearMonthDuration and xs:dayTimeDuration. See , below, for casting to xs:ENTITY and types derived from it.

Casting across the type hierarchy

When the ST and the TT are derived, directly or indirectly, from different primitive types, this is called casting across the type hierarchy. Casting across the type hierarchy is logically equivalent to three separate steps performed in order. Errors can occur in either of the latter two steps.

Cast the SV, up the hierarchy, to the primitive type of the source, as described in .

If SV is an instance of xs:string or xs:untypedAtomic, check its value against the pattern facet of TT, and raise a dynamic error if the check fails.

Cast the value to the primitive type of TT, as described in .

If TT is derived from xs:NOTATION, assume for the purposes of this rule that casting to xs:NOTATION succeeds.

Cast the value down to the TT, as described in
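As a non-normative illustration of the three steps, consider casting an xs:double to xs:short. The Python sketch below assumes the rule, defined elsewhere in this specification, that casting xs:double to xs:integer discards the fractional part and fails for NaN and infinities; ValueError stands in for the dynamic error.

```python
import math


def cast_double_to_short(sv: float) -> int:
    # Step 1: cast up to the primitive type of the source.
    # xs:double is already primitive, so the value is unchanged.
    primitive_value = sv

    # Step 2: cast to the primitive type of the target (xs:integer in the
    # extended sense). The fractional part is discarded.
    if math.isnan(primitive_value) or math.isinf(primitive_value):
        raise ValueError("NaN and infinities cannot be cast to xs:integer")
    as_integer = math.trunc(primitive_value)

    # Step 3: cast down to the target type, checking its facets.
    if not -32768 <= as_integer <= 32767:
        raise ValueError("value is outside the value space of xs:short")
    return as_integer
```

Errors arise only in the second and third steps, as stated above.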

Casting to union types

If the target type of a cast expression (or a constructor function) is a type with variety union, the supplied value must be one of the following:

A value of type xs:string or xs:untypedAtomic. This case follows the general rules for casting from strings, and has already been described in .

If the union type has a pattern facet, the pattern is tested against the supplied value after whitespace normalization, using the whiteSpace normalization rules of the member datatype against which validation succeeds.

A value that is an instance of one of the atomic types in the transitive membership of the union type, and of the union type itself. This case has already been described in

This situation only applies when the value is an instance of the union type, which means it will never apply when the union is derived by facet-based restriction from another union type.

A value that is castable to one or more of the atomic types in the transitive membership of the union type (in the sense that the castable as operator returns true).

In this case the supplied value is cast to each atomic type in the transitive membership of the union type in turn (in the order in which the member types appear in the declaration) until one of these casts is successful; if none of them is successful, a dynamic error occurs . If the union type has constraining facets then the resulting value must satisfy these facets, otherwise a dynamic error occurs .

If the union type has a pattern facet, the pattern is tested against the canonical representation of the result value.

Only the atomic types in the transitive membership of the union type are considered. The union type may have list types in its transitive membership, but (unless the supplied value is of type xs:string or xs:untypedAtomic, in which case the rules in apply), any list types in the membership are effectively ignored.

If more than one of these conditions applies, then the casting is done according to the rules for the first condition that applies.

If none of these conditions applies, the cast fails with a dynamic error .

Example: consider a type U whose member types are xs:integer and xs:date.

The expression "123" cast as U returns the xs:integer value 123.

The expression current-date() cast as U returns the current date as an instance of xs:date.

The expression 23.1 cast as U returns the xs:integer value 23.

Example: consider a type V whose member types are xs:short and xs:negativeInteger.

The expression "-123" cast as V returns the xs:short value -123.

The expression "-100000" cast as V returns the xs:negativeInteger value -100000.

The expression 93.7 cast as V returns the xs:short value 93.

The expression "93.7" cast as V raises a dynamic error on the grounds that the string "93.7" is not in the lexical space of the union type.

Example: consider a type W that is derived from the above type V by restriction, with a pattern facet of -?\d\d.

The expression "12" cast as W returns the xs:short value 12.

The expression "123" cast as W raises a dynamic error on the grounds that the string "123" does not match the pattern facet.
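The examples for types V and W can be reproduced with a non-normative Python sketch. It models only these two types: strings are validated against the lexical space, other numeric values are cast to each member type in declaration order, and an optional pattern facet is applied. The function name, the use of Python regular expressions for XSD patterns, and ValueError as the dynamic error are all assumptions of the sketch.

```python
import math
import re
from typing import Optional, Tuple, Union

# Member types of V in declaration order, each with a value-space test.
V_MEMBERS = [
    ("xs:short", lambda n: -32768 <= n <= 32767),
    ("xs:negativeInteger", lambda n: n < 0),
]


def cast_to_union(value: Union[str, int, float],
                  pattern: Optional[str] = None) -> Tuple[str, int]:
    from_string = isinstance(value, str)
    if from_string:
        text = " ".join(value.split())  # whitespace normalization
        if pattern is not None and not re.fullmatch(pattern, text):
            raise ValueError("pattern facet not satisfied")
        if not re.fullmatch(r"[+-]?[0-9]+", text):
            raise ValueError("not in the lexical space of the union type")
        number = int(text)
    else:
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError("value cannot be cast to any member type")
        number = math.trunc(value)  # numeric cast to an integer type
    # The first member type that accepts the value determines the result type.
    for name, in_value_space in V_MEMBERS:
        if in_value_space(number):
            # For non-string input the pattern applies to the canonical result.
            if (pattern is not None and not from_string
                    and not re.fullmatch(pattern, str(number))):
                raise ValueError("pattern facet not satisfied")
            return name, number
    raise ValueError("no member type accepts the value")
```

Calling cast_to_union with no pattern models V; passing the pattern -?\d\d models W.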

Casting to list types

If the target type of a cast expression (or a constructor function) is a type with variety list, the supplied value must be of type xs:string or xs:untypedAtomic. The rules follow the general principle for all casts from xs:string outlined in .

If the supplied value is not of type xs:string or xs:untypedAtomic, a type error is raised .

The semantics of the operation are consistent with validation: that is, the effect of casting a string S to a list type L is the same as constructing an element or attribute node whose string value is S, validating it using L as the governing type, and atomizing the resulting node. The result will always be either failure, or a sequence of zero or more atomic values each of which is an instance of the item type of L (or if the item type of L is a union type, an instance of one of the atomic types in its transitive membership).

If the item type of the list type is namespace-sensitive, then the namespace bindings in the static context will be used to resolve any namespace prefix, in the same way as when the target type is xs:QName.

If the list type has a pattern facet, the pattern must match the supplied value after collapsing whitespace (an operation equivalent to the use of the fn:normalize-space function).

For example, the expression "A B C D" cast as xs:NMTOKENS produces a sequence of four xs:NMTOKEN values, ("A", "B", "C", "D").

For example, given a user-defined type my:coordinates defined as a list of xs:integer with the facet <xs:length value="2"/>, the expression my:coordinates("2 -1") will return a sequence of two xs:integer values (2, -1), while the expression my:coordinates("1 2 3") will result in a dynamic error because the length of the list does not conform to the length facet. The expression my:coordinates("1.0 3.0") will also fail because the strings 1.0 and 3.0 are not in the lexical space of xs:integer.
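The my:coordinates example can be sketched in Python as follows. This is a non-normative model of validation against a list type: the string is split on whitespace, each token is checked against the lexical space of xs:integer, and the length facet is then applied. ValueError stands in for the dynamic error.

```python
import re
from typing import List


def cast_as_my_coordinates(text: str) -> List[int]:
    # Splitting on whitespace also collapses leading, trailing and repeated spaces.
    tokens = text.split()
    # Every token must be a valid lexical form for the item type xs:integer.
    for token in tokens:
        if not re.fullmatch(r"[+-]?[0-9]+", token):
            raise ValueError(f"'{token}' is not in the lexical space of xs:integer")
    # The length facet constrains the number of items in the list.
    if len(tokens) != 2:
        raise ValueError("list does not conform to the length facet")
    return [int(token) for token in tokens]
```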

References

Normative references

Character Model for the World Wide Web 1.0: Fundamentals, Martin J. Dürst, François Yergeau, et al., Editors. World Wide Web Consortium, 15 February 2005. This version is http://www.w3.org/TR/2005/REC-charmod-20050215/. The latest version is available at https://www.w3.org/TR/charmod/.

HTML: Living Standard. WHATWG, 18 November 2022.

DOM: Living Standard. WHATWG, 26 October 2022.

The tz timezone database, available at http://www.iana.org/time-zones. It is which version of the database is used.

IEEE. IEEE Standard for Floating-Point Arithmetic.

IEEE. IEEE Ethernet Standard.

ISO (International Organization for Standardization). Codes for the representation of names of countries and their subdivisions - Part 1: Country codes. ISO 3166-1:2013.

ISO (International Organization for Standardization). Representations of dates and times. Third edition, 2004-12-01. ISO 8601:2004(E). Available from: http://www.iso.org/.

ISO (International Organization for Standardization). ISO/IEC 10967-1:2012, Information technology—Language Independent Arithmetic—Part 1: Integer and floating point arithmetic [Geneva]: International Organization for Standardization, 2012. Available from: http://www.iso.org/.

ISO (International Organization for Standardization). Information and documentation — Codes for the representation of names of scripts. ISO 15924:2004, January 2004.

Unicode Consortium. Codes for the representation of names of scripts — Alphabetical list of four-letter script codes. See . Retrieved February 2013; continually updated.

Legacy extended IRIs for XML resource identification. Henry S. Thompson, Richard Tobin, and Norman Walsh (eds), World Wide Web Consortium. 3 November 2008. Available at http://www.w3.org/TR/leiri/.

IETF. RFC 1321: The MD5 Message-Digest Algorithm. Available at: http://www.ietf.org/rfc/rfc1321.txt.

IETF. RFC 2376: XML Media Types. Available at: http://www.ietf.org/rfc/rfc2376.txt.

IETF. RFC 3986: Uniform Resource Identifiers (URI): Generic Syntax. Available at: http://www.ietf.org/rfc/rfc3986.txt.

IETF. RFC 3987: Internationalized Resource Identifiers (IRIs). Available at: http://www.ietf.org/rfc/rfc3987.txt.

IETF. RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files. Available at: http://www.ietf.org/rfc/rfc4180.txt.

IETF. RFC 6151: Updated Security Considerations for the MD5 Message-Digest and the HMAC-MD5 Algorithms. Available at: http://www.ietf.org/rfc/rfc6151.txt.

IETF. RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format. Available at: http://www.rfc-editor.org/rfc/rfc7159.txt.

H. Thompson and C. Lilley. XML Media Types. IETF RFC 7303. See http://www.ietf.org/rfc/rfc7303.txt.

National Institute of Standards and Technology. Secure Hash Standard (SHS). FIPS PUB 180-4. August 2015. See http://dx.doi.org/10.6028/NIST.FIPS.180-4.

Unicode Standard Annex #15: Unicode Normalization Forms. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 9.0.0, dated 2016-02-24. As with , the version to be used is . Available at: http://www.unicode.org/reports/tr15/.

Unicode Standard Annex #29: Unicode Text Segmentation. Ed. Josh Hadley, Unicode Consortium. The current version is 15.1.0, dated 2023-08-16. As with , the version to be used is . Available at: http://www.unicode.org/reports/tr29/.

The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is , but implementations are recommended to use the latest Unicode version; currently, Version 9.0.0.

Unicode Technical Standard #10: Unicode Collation Algorithm. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 9.0.0, dated 2016-05-18. As with , the version to be used is . Available at: .

Unicode Technical Standard #35: Unicode Locale Data Markup Language. Ed Mark Davis et al, Unicode Consortium. The current version is 29, dated 2016-03-15. As with , the version to be used is . Available at: .

CITATION: T.B.D.

CITATION: T.B.D.

CITATION: T.B.D.

XML Schema Part 2: Datatypes Second Edition, Oct. 28 2004. Available at: http://www.w3.org/TR/xmlschema-2/.

Invisible XML Specification, Steven Pemberton, editor. World Wide Web Consortium, 20 June 2020. This version is https://invisiblexml.org/1.0/. The latest version is available at https://invisiblexml.org/current/.

Non-normative references

Edward M. Reingold and Nachum Dershowitz. Calendrical Calculations Millennium edition (2nd Edition). Cambridge University Press, ISBN 0 521 77752 6.

CLDR - Unicode Common Locale Data Repository. Available at: http://cldr.unicode.org.

Character Model for the World Wide Web 1.0: Normalization, Last Call Working Draft. Available at: http://www.w3.org/TR/2004/WD-charmod-norm-20040225/.

EXPath: Collaboratively Defining Open Standards for Portable XPath Extensions. http://expath.org/.

EXQuery: Collaboratively Defining Open Standards for Portable XQuery Extensions. http://exquery.org/.

EXSLT: A Community Initiative to Provide Extensions to XSLT. https://exslt.github.io.

FunctX Functions. http://www.functx.com/.

HTML 4.01 Recommendation, 24 December 1999. Available at: http://www.w3.org/TR/REC-html40/.

ICU - International Components for Unicode. Available at http://site.icu-project.org.

The Open Group Base Specifications Issue 7 (IEEE Std 1003.1-2008). Available at: http://pubs.opengroup.org/onlinepubs/9699919799/.

IETF. RFC 822: Standard for the Format of ARPA Internet Text Messages. Available at: http://www.ietf.org/rfc/rfc822.txt.

IETF. RFC 850: Standard for Interchange of USENET Messages. Available at: http://www.ietf.org/rfc/rfc850.txt.

IETF. RFC 1036: Standard for Interchange of USENET Messages. Available at: http://www.ietf.org/rfc/rfc1036.txt.

IETF. RFC 1123: Requirements for Internet Hosts -- Application and Support. Available at: http://www.ietf.org/rfc/rfc1123.txt.

IETF. RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1. Available at: http://www.ietf.org/rfc/rfc2616.txt.

IETF. RFC 3339: Date and Time on the Internet: Timestamps. Available at: http://www.ietf.org/rfc/rfc3339.txt.

Unicode Technical Standard #18: Unicode Regular Expressions. Ed. Mark Davis and Andy Heninger, Unicode Consortium. The current version is 17, dated 2013-11-19. Available at: http://www.unicode.org/reports/tr18/.

World Wide Web Consortium Working Group Note. Working With Timezones, October 13, 2005. Available at: http://www.w3.org/TR/2005/NOTE-timezone-20051013/.

Error summary

The error text provided with these errors is non-normative.

Error code used by fn:error when no other error code is provided.

Raised when fn:apply is called and the arity of the supplied function is not the same as the number of members in the supplied array.

This error is raised whenever an attempt is made to divide by zero.

This error is raised whenever numeric operations result in an overflow or underflow.

This error is raised when an integer used to select a member of an array is outside the range of values for that array.

This error is raised when the $length argument to array:subarray is negative.

Raised when casting to xs:decimal if the supplied value exceeds the implementation-defined limits for the datatype.

Raised by fn:resolve-QName and fn:QName when a supplied value does not have the lexical form of a QName or URI respectively; and when casting to decimal, if the supplied value is NaN or Infinity.

Raised when casting to xs:integer if the supplied value exceeds the implementation-defined limits for the datatype.

Raised when multiplying or dividing a duration by a number, if the number supplied is NaN.

Raised when casting a string to xs:decimal if the string has more digits of precision than the implementation can represent (the implementation also has the option of rounding).

Raised by fn:codepoints-to-string if the input contains an integer that is not the codepoint of a permitted character.

Raised by any function that uses a collation if the requested collation is not recognized.

Raised by fn:normalize-unicode if the requested normalization form is not supported by the implementation.

Raised by functions such as fn:contains if the requested collation does not operate on a character-by-character basis.

Raised by fn:char if the supplied character name is not recognized, or if it represents a codepoint that is not a permitted character.

Raised when parsing CSV input if a syntax error in the input CSV is found.

Raised when parsing CSV input if the field-separator, record-separator, or quote-character option is set to an invalid value.

Raised when parsing CSV input if the same delimiter character is assigned to more than one role.

Raised by the function from the get entry of csv-columns-record, if its $key argument is an xs:string and is not one of the known column names.

Raised by fn:id, fn:idref, and fn:element-with-id if the node that identifies the tree to be searched is a node in a tree whose root is not a document node.

Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that either the supplied URI cannot be dereferenced to obtain a resource, or the resource that is returned is not parseable as XML.

Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that it is not possible to return a result that is guaranteed deterministic.

Raised by fn:collection and fn:uri-collection if the argument is not a valid xs:anyURI.

Raised (optionally) by fn:doc if the argument is not a valid xs:anyURI.

Raised by fn:parse-xml if the supplied string is not a well-formed and namespace-well-formed XML document; or if DTD validation is requested and the document is not valid against its DTD.

Raised when fn:serialize is called and the processor does not support serialization, in cases where the host language makes serialization an optional feature.

Raised by fn:parse-html if the supplied string is not a well-formed HTML document.

Raised by fn:parse-html if a key passed to $options, or its value, is not supported by the implementation.

This error is raised if the decimal format name supplied to fn:format-number is not a valid QName, or if the prefix in the QName is undeclared, or if there is no decimal format in the static context with a matching name.

This error is raised if a decimal format value supplied to fn:format-number is not valid for the associated property, or if the properties of the decimal format resulting from a supplied map do not have distinct values.

This error is raised if the picture string supplied to fn:format-number or fn:format-integer has invalid syntax.

Raised when casting to date/time datatypes, or performing arithmetic with date/time values, if arithmetic overflow or underflow occurs.

Raised when casting to duration datatypes, or performing arithmetic with duration values, if arithmetic overflow or underflow occurs.

Raised by adjust-date-to-timezone and related functions if the supplied timezone is invalid.

This error is raised if the picture string or calendar supplied to fn:format-date, fn:format-time, or fn:format-dateTime has invalid syntax.

This error is raised if the picture string supplied to fn:format-date selects a component that is not present in a date, or if the picture string supplied to fn:format-time selects a component that is not present in a time.

Raised by fn:hash if the effective value of the supplied algorithm is not one of the values supported by the implementation.

Raised by functions such as fn:json-doc, fn:parse-json or fn:json-to-xml if the string supplied as input does not conform to the JSON grammar (optionally with implementation-defined extensions).

Raised by functions such as map:merge, fn:json-doc, fn:parse-json or fn:json-to-xml if the input contains duplicate keys, when the chosen policy is to reject duplicates.

Raised by fn:json-to-xml if validation is requested when the processor does not support schema validation or typed nodes.

Raised by functions such as map:merge, fn:parse-json, and fn:xml-to-json if the $options map contains an invalid entry.

Raised by fn:xml-to-json if the XML input does not conform to the rules for the XML representation of JSON.

Raised by fn:xml-to-json if the XML input uses the attribute escaped="true" or escaped-key="true", and the corresponding string or key contains an invalid JSON escape sequence.

Raised by fn:resolve-QName and analogous functions if a supplied QName has a prefix that has no binding to a namespace.

Raised by fn:resolve-uri if no base URI is available for resolving a relative URI.

Raised by fn:load-xquery-module if the supplied module URI is zero-length.

Raised by fn:load-xquery-module if no module can be found with the supplied module URI.

Raised by fn:load-xquery-module if a static error (including a statically detected type error) is encountered when processing the library module.

Raised by fn:load-xquery-module if a value is supplied for the initial context item or for an external variable, and the value does not conform to the required type declared in the dynamically loaded module.

Raised by fn:load-xquery-module if no XQuery processor is available supporting the requested XQuery version (or if none is available at all).

A general-purpose error raised when casting, if a cast between two datatypes is allowed in principle, but the supplied value cannot be converted: for example when attempting to cast the string "nine" to an integer.

Raised when either argument to fn:resolve-uri is not a valid URI/IRI.

Raised by fn:zero-or-one if the supplied value contains more than one item.

Raised by fn:one-or-more if the supplied value is an empty sequence.

Raised by fn:exactly-one if the supplied value is not a singleton sequence.
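
The three cardinality checks above can be sketched in Python as follows. This is only an illustration of the conditions under which each error arises; the class CardinalityError and the Python function names are hypothetical and stand in for the error codes of this specification.

```python
class CardinalityError(ValueError):
    """Hypothetical stand-in for the dynamic errors described above."""


def zero_or_one(items):
    """Return the items unchanged if there is at most one."""
    seq = list(items)
    if len(seq) > 1:
        raise CardinalityError("more than one item supplied")
    return seq


def one_or_more(items):
    """Return the items unchanged if there is at least one."""
    seq = list(items)
    if not seq:
        raise CardinalityError("empty sequence supplied")
    return seq


def exactly_one(items):
    """Return the items unchanged if there is exactly one."""
    seq = list(items)
    if len(seq) != 1:
        raise CardinalityError("sequence is not a singleton")
    return seq
```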

Raised by functions such as fn:max, fn:min, fn:avg, fn:sum if the supplied sequence contains values inappropriate to this function.

Raised by fn:dateTime if the two arguments both have timezones and the timezones are different.

A catch-all error for fn:resolve-uri, recognizing that the implementation can choose between a variety of algorithms and that some of these may fail for a variety of reasons.

Raised when the input to fn:parse-ietf-date does not match the prescribed grammar, or when it represents an invalid date/time such as 31 February.

Raised when the radix supplied to fn:parse-integer is not in the range 2 to 36.

Raised when the digits in the string supplied to fn:parse-integer are not in the range appropriate to the chosen radix.
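
The two fn:parse-integer conditions above (radix out of range, and digits not valid for the radix) might be checked along the following lines. This Python sketch does not attempt to reproduce the full lexical rules of fn:parse-integer; the function name parse_integer, the digit alphabet 0-9 followed by a-z, and the use of ValueError are assumptions for illustration.

```python
import string

# Assumed digit alphabet: 0-9 then a-z, giving 36 symbols in total.
DIGITS = string.digits + string.ascii_lowercase


def parse_integer(value, radix=10):
    """Convert a string of digits in the given radix to an int."""
    if not 2 <= radix <= 36:
        raise ValueError(f"radix {radix} is not in the range 2 to 36")
    text = value.strip().lower()
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not text:
        raise ValueError("no digits supplied")
    total = 0
    for ch in text:
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= radix:
            raise ValueError(f"digit {ch!r} is not valid in radix {radix}")
        total = total * radix + digit
    return sign * total
```

For example, parse_integer("ff", 16) yields 255, while parse_integer("12", 2) fails because the digit 2 is outside the range for binary.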

Raised if an option in an option map is not described in the specification, is not supported by the implementation, and has a name that is in no namespace.

Raised by regular expression functions such as fn:matches and fn:replace if the regular expression flags contain a character other than i, m, q, s, or x.
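
A processor might validate the flags string before compiling the regular expression. The following Python sketch simply reports any character outside the set listed above; the names ALLOWED_FLAGS and invalid_flags are hypothetical.

```python
# The flag characters listed in the error description above.
ALLOWED_FLAGS = frozenset("imqsx")


def invalid_flags(flags):
    """Return the characters of `flags` that are not permitted flags."""
    return [ch for ch in flags if ch not in ALLOWED_FLAGS]
```

An empty result means the flags are acceptable; a non-empty result corresponds to the error condition.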

Raised by regular expression functions such as fn:matches and fn:replace if the regular expression is syntactically invalid.

Raised by functions such as fn:replace and fn:tokenize if the supplied regular expression is capable of matching a zero-length string.
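
One simple way to test whether a regular expression can match a zero-length string is to try matching it against the empty string. The sketch below uses Python's re module, whose dialect differs from the regular expression syntax of this specification, so it should be read only as an illustration of the idea; the function name can_match_empty is hypothetical.

```python
import re


def can_match_empty(pattern):
    """Return True if the Python regex `pattern` matches the empty string."""
    return re.fullmatch(pattern, "") is not None
```

With this check, a pattern such as a* is flagged because it matches the empty string, while a+ is not.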

Raised by fn:replace to report errors in the replacement string.

Raised by fn:replace if both the $replacement and $action arguments are supplied.

Raised by fn:data, or by implicit atomization, if applied to a node with no typed value, the main example being an element validated against a complex type that defines it to have element-only content.

Raised by fn:data, or by implicit atomization, if the sequence to be atomized contains a function item other than an array.

Raised by fn:string, or by implicit string conversion, if the input sequence contains a function item.

Raised by fn:unparsed-text or fn:unparsed-text-lines if the $href argument contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.

Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode characters using the specified encoding, or if the resulting characters are not permitted characters.

Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is absent and the processor cannot infer the encoding using external information and the encoding is not UTF-8.

A dynamic error is raised if the authority component of a URI contains an open square bracket but no corresponding close square bracket.

A dynamic error is raised if no XSLT processor suitable for evaluating a call on fn:transform is available.

A dynamic error is raised if the parameters supplied to fn:transform are invalid, for example if two mutually exclusive parameters are supplied. If a suitable XSLT error code is available (for example in the case where the requested initial-template does not exist in the stylesheet), that error code should be used in preference.

A dynamic error is raised if an XSLT transformation invoked using fn:transform fails with a static or dynamic error. The XSLT error code is used if available; this error code provides a fallback when no XSLT error code is returned, for example because the processor is an XSLT 1.0 processor.

A dynamic error is raised if the fn:transform function is invoked when XSLT transformation (or a specific transformation option) has been disabled for security or other reasons.

A dynamic error is raised if the result of the fn:transform function contains characters available only in XML 1.1 and the calling processor cannot handle such characters.

Schemas

Two functions in this specification, fn:analyze-string and fn:json-to-xml, produce results in the form of an XDM node tree that must conform to a specified schema, defined in this appendix. In both cases the elements in the result are in the namespace http://www.w3.org/2005/xpath-functions, which is therefore the target namespace of the relevant schema.

A processor may have built-in knowledge of this schema, or it may read it from external files. Any attempt to supply a modified form of this schema will have unpredictable consequences. Modification here includes not only actual changes to the text of a schema document, but also actions such as using xs:redefine or xs:override, adding members to substitution groups, or defining derived types. Processors are not required to detect and reject such modifications. When validating against this schema, it is recommended that processors should ignore or reject any xsi:schemaLocation or xsi:type attributes in the instance being validated.

The schema for this namespace is organized as three schema documents. The first is a simple umbrella document that includes the other two. A copy can be found at xpath-functions.xsd:

Schema for the result of fn:analyze-string

This schema describes the output of the function fn:analyze-string.

The schema is reproduced below, and can also be found in analyze-string.xsd:

Schema for the result of fn:json-to-xml

This schema describes the output of the function fn:json-to-xml, and the input to the function fn:xml-to-json.

The schema is reproduced below, and can also be found in schema-for-json.xsd:

Schema for the result of fn:csv-to-xml

This schema describes the output of the function fn:csv-to-xml.

The schema is reproduced below, and can also be found in schema-for-csv.xsd:

Glossary

Other Functions

This Appendix describes some sources of functions that fall outside the scope of the function library defined in this specification. It includes both function specifications and function implementations. Inclusion of a function in this appendix does not constitute any kind of recommendation or endorsement; neither is omission from this appendix to be construed negatively. This Appendix does not attempt to give any information about licensing arrangements for these function specifications or implementations.

XPath Functions Defined in Other W3C Recommendations

A number of W3C Recommendations make use of XPath, and in some cases such Recommendations define additional functions to be made available when XPath is used in a specific host language.

Functions Defined in XSLT

The various versions of XSLT have all included additional functions intended to be available only when XPath is used within XSLT, and not in other host language environments. Some of these functions were originally defined in XSLT, and subsequently migrated into the core function library defined in this specification.

Generally, the reason that functions have been defined in XSLT rather than in the core library has been that they required additional static or dynamic context information.

XSLT-defined functions share the core namespace http://www.w3.org/2005/xpath-functions (but in XPath 1.0 and XSLT 1.0, no namespace was defined for these functions).

The conformance rules for XSLT 4.0 require implementations to support either XPath 3.0 or XPath 3.1. Some of the new functions in XPath 3.1, however, must be supported by all XSLT 4.0 implementations whether or not they implement other parts of XPath 3.1.

The following table lists all functions that have been defined in XSLT 1.0, 2.0, or 3.0, and summarizes their status.

Function name                     Availability
fn:accumulator-after              XSLT 3.0 only
fn:accumulator-before             XSLT 3.0 only
fn:available-system-properties    XSLT 3.0 only
fn:collation-key                  Common to XSLT 3.0 and XPath 3.1
fn:copy-of                        XSLT 3.0 only
fn:current                        XSLT 1.0, 2.0, and 3.0
fn:current-group                  XSLT 2.0 and 3.0
fn:current-grouping-key           XSLT 2.0 and 3.0
fn:current-merge-group            XSLT 3.0 only
fn:current-merge-key              XSLT 3.0 only
fn:current-output-uri             XSLT 3.0 only
fn:document                       XSLT 1.0, 2.0, and 3.0
fn:element-available              XSLT 1.0, 2.0, and 3.0
fn:format-date                    XSLT 2.0; migrated to XPath 3.0 and 3.1
fn:format-dateTime                XSLT 2.0; migrated to XPath 3.0 and 3.1
fn:format-number                  XSLT 1.0 and 2.0; migrated to XPath 3.0 and 3.1
fn:format-time                    XSLT 2.0; migrated to XPath 3.0 and 3.1
fn:function-available             XSLT 1.0, 2.0, and 3.0
fn:generate-id                    XSLT 1.0 and 2.0; migrated to XPath 3.0 and 3.1
fn:json-to-xml                    Common to XSLT 3.0 and XPath 3.1
fn:key                            XSLT 1.0, 2.0, and 3.0
fn:regex-group                    XSLT 2.0 and 3.0
fn:snapshot                       XSLT 3.0 only
fn:stream-available               XSLT 3.0 only
fn:system-property                XSLT 1.0, 2.0, and 3.0
fn:type-available                 XSLT 2.0 and 3.0
fn:unparsed-entity-public-id      XSLT 2.0 and 3.0
fn:unparsed-entity-uri            XSLT 1.0, 2.0, and 3.0
fn:unparsed-text                  XSLT 2.0; migrated to XPath 3.0 and 3.1
fn:xml-to-json                    Common to XSLT 3.0 and XPath 3.1
map:contains                      Common to XSLT 3.0 and XPath 3.1
map:entry                         Common to XSLT 3.0 and XPath 3.1
map:find                          Common to XSLT 3.0 and XPath 3.1
map:for-each                      Common to XSLT 3.0 and XPath 3.1
map:get                           Common to XSLT 3.0 and XPath 3.1
map:keys                          Common to XSLT 3.0 and XPath 3.1
map:merge                         Common to XSLT 3.0 and XPath 3.1
map:put                           Common to XSLT 3.0 and XPath 3.1
map:remove                        Common to XSLT 3.0 and XPath 3.1
map:size                          Common to XSLT 3.0 and XPath 3.1

Functions Defined in XForms

XForms 1.1 is based on XPath 1.0. It adds the following functions to the set defined in XPath 1.0, using the same namespace:

boolean-from-string, is-card-number, avg, min, max, count-non-empty, index, power, random, compare, if, property, digest, hmac, local-date, local-dateTime, now, days-from-date, days-to-date, seconds-from-dateTime, seconds-to-dateTime, adjust-dateTime-to-timezone, seconds, months, instance, current, id, context, choose, event.

XForms 2.0 was first published as a W3C Working Draft, and subsequently as a W3C Community Group specification. These draft specifications do not include any additional functions beyond those in the core XPath specification.

Function Defined in XQuery Update 1.0

The XQuery Update 1.0 specification defines one additional function in the core namespace http://www.w3.org/2005/xpath-functions, namely fn:put. This function can be used to write a document to external storage. It is thus unusual in that it has side-effects; the XQuery Update 1.0 specification defines semantics for updating expressions including this function.

Although XQuery Update 1.0 is defined as an extension of XQuery 1.0, a number of implementors have adapted it, in a fairly intuitive way, to work with later versions of XQuery. At the time of this publication, later versions of the XQuery Update specification remain at Working Draft status.

Functions Defined by Community Groups

A number of community groups, with varying levels of formal organization, have defined specifications for additional function libraries to augment the core functions defined in this specification. Many of the resulting function specifications have implementations available for popular XPath, XQuery, and XSLT processors, though the level of support is highly variable.

The first such group was EXSLT. This activity was primarily concerned with augmenting the capability of XSLT 1.0, and many of its specifications were overtaken by core functions that became available in XPath 2.0. EXSLT defined a number of function modules covering:

Dates and Times

Dynamic XPath Evaluation

Common (containing most notably the widely used node-set function)

Math (max, min, abs, and trigonometric functions)

Random Number Generation

Regular Expressions

Sets (operations on sets of nodes including set intersection and difference)

String Manipulation (tokenize, replace, join and split, etc.)

Specifications from the EXSLT group can be found at .

A renewed attempt to define additional function libraries using XPath 2.0 as its baseline formed under the name EXPath. Again, the specifications are in various states of maturity and stability, and implementation across popular processors is patchy. At the time of this publication the function libraries that exist in stable published form include:

Binary (functions for manipulating binary data)

File Handling (reading and writing files)

Geospatial (handling of geographic data)

HTTP Client (sending HTTP requests)

ZIP Facility (reading and creating ZIP files or similar archives)

The EXPath community has also been engaged in other related projects, such as defining packaging standards for distribution of XSLT/XQuery components, and tools for unit testing. Its specifications can be found at .

A third activity has operated under the name EXQuery, which as the name suggests has focused on extensions to XQuery. EXQuery has published a single specification, RestXQ, which is primarily a system of function annotations allowing XQuery functions to act as endpoints for RESTful services. It also includes some simple functions to assist with the creation of such services. The RestXQ specification can be found at .

The FunctX Library

Many useful functions can be written in XSLT or XQuery, and in this case the function implementations themselves can be portable across different XSLT and XQuery processors. This section describes one such library.

FunctX is an open-source library of general-purpose functions, supplied in the form of XQuery 1.0 and XSLT 2.0 implementations. It contains over a hundred functions. Typical examples of these functions are:

Test whether a string is all-whitespace

Trim leading and trailing whitespace

Test whether all the values in a sequence are distinct

Capitalize the first character of a string

Change the namespace of all elements in a tree

Get the number of days in a given month

Get the first or last day in a given month

Get the date of the preceding or following day

Ask whether an element has element-only, mixed, or simple content

Find the position of a node in a sequence

Count words in a string

The FunctX library can be found at .

Checklist of implementation-defined features

Changes since version 3.1

New Functions

A number of new functions have been defined:

fn:all-different

fn:all-equal

fn:atomic-equal

fn:build-uri

fn:chain

fn:char

fn:characters

fn:contains-subsequence

fn:csv-to-xml

fn:csv-to-arrays

fn:decode-from-uri

fn:do-until

fn:duplicate-values

fn:ends-with-subsequence

fn:every

fn:expanded-QName

fn:foot

fn:function-annotations

fn:graphemes

fn:hash

fn:highest

fn:identity

fn:in-scope-namespaces

fn:index-where

fn:intersperse

fn:invisible-xml

fn:is-NaN

fn:items-at

fn:lowest

fn:message

fn:op

fn:parse-csv

fn:parse-html

fn:parse-integer

fn:parse-QName

fn:parse-uri

fn:partition

fn:replicate

fn:scan-left

fn:scan-right

fn:slice

fn:some

fn:sort-with

fn:stack-trace

fn:starts-with-subsequence

fn:subsequence-where

fn:transitive-closure

fn:trunk

fn:void

fn:while-do

fn:xdm-to-json

array:build

array:empty

array:foot

array:index-of

array:index-where

array:members

array:of-members

array:replace

array:slice

array:split

array:trunk

array:values

map:build

map:empty

map:entries

map:filter

map:keys-where

map:of-pairs

map:pair

map:pairs

map:replace

map:values

Changes to Existing Functions

The keywords used for parameter names have been changed. Previously these names were of no significance, but in 4.0 they can be used with keyword := value argument syntax in function calls.

The fn:deep-equal function has an options argument giving detailed control over how two values are compared.

The fn:compare function has been enhanced to accept types other than strings.

The fn:format-integer function can produce output in non-decimal radices, for example binary and hexadecimal.

The fn:json-doc function accepts additional options.

The fn:remove function allows several items to be removed from a sequence in a single call.

The fn:replace function has an additional optional argument allowing the replacement string to be computed from the matched input string.

The third argument of fn:format-number can now be supplied as an xs:QName instead of as a string that can be converted to a QName. Using an xs:QName, especially in the (rare) cases when the value is supplied dynamically, avoids the need to maintain the static namespace context at execution time. In addition, an extra argument has been added to fn:format-number to allow the decimal format to be supplied explicitly.

The function fn:xml-to-json accepts an additional option: number-formatter allows the user to control the formatting of numeric values, for example by preventing the use of exponential notation for large integers.

In many functions including fn:substring, fn:subsequence, fn:unparsed-text, fn:unparsed-text-available, fn:unparsed-text-lines, array:subarray, fn:resolve-uri, fn:error, and fn:trace, arguments that can be omitted can now also be set to an empty sequence; the effect of supplying an empty sequence is equivalent to the effect of not supplying the argument.

Changes to Casts and Constructor Functions

The keyword for the argument has changed from arg to value.

The argument is now optional, and defaults to the context value (which is atomized if necessary). This change aligns constructor functions such as xs:string, xs:boolean, and xs:numeric with fn:string, fn:boolean, and fn:number.

Miscellaneous Changes

The semantics of the HTML case-insensitive collation "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive" are now defined normatively in this specification rather than by reference to the living HTML5 specification (which has changed since 3.1); and the rules now make ordering explicit rather than leaving it implementation-defined.

An option in an option map is now rejected if it is not described in the specification, is not supported by the implementation, and has a name that is in no namespace.

Editorial Changes

These changes are not highlighted in the change-marked version of the specification.

The operator mapping table has been simplified so all the value comparison operators are now defined in terms of two functions (for each data type): op:XX-equal, and op:XX-less-than. The entries for op:XX-greater-than have therefore been removed.

The names of arguments appearing in function signatures have been changed. This is to reflect the introduction of keyword arguments in XPath 4.0; the names chosen for arguments are now more consistent across the function library.

Where appropriate, the phrase "the value of $x" has been replaced by the simpler $x. No change in meaning is intended.

For functions that take a variable number of arguments, wherever possible the specification now gives a single function signature indicating default values for arguments that may be omitted, rather than multiple signatures.

The formal specifications of array functions have been rewritten to use two new primitives: array:members which converts an array to a sequence of value records, and array:of-members which does the inverse. This has enabled many of the functions to be specified more concisely, and with less duplication between similar functions for sequences and arrays.

The appendix containing illustrative user-written functions has been dropped; many of these functions are no longer needed.

Compatibility with Previous Versions

This section summarizes the extent to which this specification is compatible with previous versions.

Version 4.0 of this function library is fully backwards compatible with version 3.1, except as noted below:

In fn:deep-equal, and in other functions such as fn:distinct-values that refer to fn:deep-equal, the rules for comparing values of different numeric types (for example, xs:double and xs:decimal) have changed. In previous versions of the specification, xs:decimal values were converted to xs:double, leading to a possible loss of precision. This could make comparisons non-transitive, leading to problems when grouping, and potentially (depending on the sort algorithm) with sorting. The problem has been fixed by requiring comparisons to be performed based on the exact mathematical value without any loss of precision.

This means, for example, that deep-equal(0.2, 0.2e0) is now false, whereas in previous versions it was true. The two values are not mathematically equal, because the exact decimal equivalent of the xs:double value written as 0.2e0 is 0.200000000000000011102230246251565404236316680908203125.

The corresponding change has not been made to the = and eq operators, because it was found to be too disruptive. For example, if the context node is the element <e price="10.0" discount="0.2"/>, there is an expectation that the expression @price - @discount = 9.8 should return true. But (assuming untyped data), the result of the subtraction is an xs:double whose precise value is 9.800000000000000710542735760100185871124267578125, so comparing the two values as decimals would return false.
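
The arithmetic behind these examples can be checked with a short Python sketch. Python's float is an IEEE 754 double and decimal.Decimal can hold its exact value, so the pair serves here as a stand-in for xs:double and xs:decimal; the function names are hypothetical and the code only illustrates the two comparison strategies, not any processor's implementation.

```python
from decimal import Decimal


def equal_as_doubles(dec, dbl):
    """3.1-style comparison: convert the decimal to a double first."""
    return float(dec) == dbl


def equal_exactly(dec, dbl):
    """4.0-style comparison: compare exact mathematical values."""
    return dec == Decimal(dbl)  # Decimal(float) is an exact conversion


# deep-equal(0.2, 0.2e0): decimal 0.2 against the double nearest to 0.2
exact_point_two = Decimal(0.2)
old_result = equal_as_doubles(Decimal("0.2"), 0.2)   # True
new_result = equal_exactly(Decimal("0.2"), 0.2)      # False

# @price - @discount = 9.8 with untyped (double) arithmetic
difference = 10.0 - 0.2
exact_difference = Decimal(difference)
eq_as_doubles = equal_as_doubles(Decimal("9.8"), difference)  # True
eq_as_decimals = equal_exactly(Decimal("9.8"), difference)    # False
```

Here exact_point_two and exact_difference hold the long decimal expansions quoted in the text, which is why the exact comparison reports the values as unequal.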

In version 4.0, omitting the $value of fn:error has the same effect as setting it to an empty sequence. In 3.1, the effects could be different (the effect of omitting the argument was implementation-defined).

In version 3.1, the fn:deep-equal function did not merge adjacent text nodes after stripping comments and processing instructions, so the elements <a>abc<!--note-->def</a> and <a>abcdef</a> were considered non-equal. In version 4.0, the text nodes are now merged prior to comparison, so these two elements compare equal.
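
The effect of merging adjacent text nodes can be seen with Python's xml.dom.minidom, used here purely as an illustration. The sample element <a>abc<!--note-->def</a>, whose text is split by a comment, and the function name text_chunks are assumptions for this sketch; it does not implement fn:deep-equal.

```python
from xml.dom import Node, minidom


def text_chunks(xml_text, merge):
    """Strip comments and PIs from the root element, then list its text nodes.

    When `merge` is true, adjacent text nodes are joined first.
    """
    root = minidom.parseString(xml_text).documentElement
    for child in list(root.childNodes):
        if child.nodeType in (Node.COMMENT_NODE,
                              Node.PROCESSING_INSTRUCTION_NODE):
            root.removeChild(child)
    if merge:
        root.normalize()  # joins adjacent text nodes
    return [n.data for n in root.childNodes if n.nodeType == Node.TEXT_NODE]


split = text_chunks("<a>abc<!--note-->def</a>", merge=False)
merged = text_chunks("<a>abc<!--note-->def</a>", merge=True)
plain = text_chunks("<a>abcdef</a>", merge=True)
```

Without merging, the first element yields two text nodes and so differs from the second; with merging, both yield the single text node abcdef.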

In version 4.0, the function signature of fn:namespace-uri-for-prefix constrains the first argument to be either an xs:NCName or a zero-length string (the new coercion rules mean that any string in the form of an xs:NCName is acceptable). If a string is supplied that does not meet these requirements, a type error will be raised. In version 3.1, this was not an error: it came under the rule that when no namespace binding existed for the supplied prefix, the function would return an empty sequence.

Furthermore, because the expected type of this parameter is no longer xs:string, the special coercion rules for xs:string parameters in XPath 1.0 compatibility mode no longer apply. For example, supplying xs:duration('PT1H') as the first argument will now raise a type error, rather than looking for a namespace binding for the prefix PT1H.

Version 4.0 makes it clear that the casting of a value other than xs:string or xs:untypedAtomic to a list type (whether using a cast expression or a constructor function) is a type error. Previously this was defined as an error, but the kind of error and the error code were left unspecified. Accordingly, the function signatures of the constructor functions for built-in list types have been changed to use an argument type of xs:string?.

In version 3.1, end-of-line characters were retained unchanged by fn:unparsed-text. In version 4.0, they are normalized in the same way as in XML (see ).

The way that fn:min and fn:max compare numeric values of different types has changed. The most noticeable effect is that when these functions are applied to a sequence of xs:integer or xs:decimal values, the result is an xs:integer or xs:decimal, rather than the result of converting this to an xs:double.
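
A small Python sketch suggests why returning the original integer matters. Python's int and float stand in for xs:integer and xs:double; the variable names are hypothetical, and the sketch only contrasts the exact maximum with the same maximum after conversion to a double.

```python
# Two integers that differ by one but lie beyond the range in which
# an IEEE 754 double can represent every integer exactly.
values = [2**53, 2**53 + 1]

exact_max = max(values)          # stays an integer
via_double = float(exact_max)    # the same maximum converted to a double
```

Here exact_max is 9007199254740993, whereas via_double is rounded to 9007199254740992.0, so the conversion loses the distinction between the two input values.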

The type of the third argument of fn:format-number has changed from xs:string to (xs:string | xs:QName). Because the expected type of this parameter is no longer xs:string, the special coercion rules for xs:string parameters no longer apply. For example, it is no longer possible to supply an instance of xs:anyURI or (when XPath 1.0 compatibility mode is in force) an instance of xs:boolean or xs:duration.

For compatibility issues regarding earlier versions, see the 3.1 version of this specification.