XPath

Introduction to XPath in eXist-db and <oXygen/>

Getting started with XPath and eXide

Open http://newtfire.org:8338/exist/apps/eXide/index.html (or
eXide within eXist-db on your own laptop, if you’ve installed it), click
“New XQuery”, and erase all content in the editing window. You’ll type your
XPath in the editing window and run it with the “Eval” button.

Learning XPath (and other languages, including XSLT, XQuery, and Schematron)
means learning the …

Vocabulary (e.g., the division operator in XPath is div,
not /)

Syntax (e.g., in XPath conditional expressions,
the if test must be parenthesized and an else
is required: if (condition) then 1 else ())

Function library (e.g., string-length() and count() are functions, but there is no
length() or len() or size())

All XPath expressions return a sequence. Sequences may
contain nodes (elements, attributes, etc.), atomic
values (strings, numbers, etc.), or both. A sequence of one item is
nonetheless a sequence, as is an empty sequence. Nested
sequences are automatically flattened.

Type a number and hit Eval. This is a one-item sequence that
consists of a single atomic value. Try integers and decimal numbers.
Try wrapping the number in parentheses.

Type a string (inside single or double quotes) and hit Eval.
This is a one-item sequence that consists of a single atomic value.
Try omitting the quotation marks. Try using curly quotation marks.
Try wrapping the string in parentheses.

Type empty parentheses and hit Eval. This is an empty
sequence.

Type multiple items of different types (numbers, strings),
separated by commas. Try wrapping them in parentheses. Try wrapping
them in multiple parentheses. Try removing the commas. This is a
multi-item sequence.

Try to type a nested sequence, e.g., (1, 2, (3, 4)), and hit Eval.
What result do you expect? What do you get?

Simple XPath expressions

Review: strings and numbers (atomic values) are XPath expressions

"Hi, Mom!" (Strings are enclosed in single or
double quotation marks, straight, not curly)

1 (Numbers are not enclosed in quotation
marks)

1.0 (What should this return? Think about the difference between
lexical space and value space.)

Arithmetic expressions are XPath expressions

1 + 1

Practice: +, -, *,
div, idiv, mod
(/ is not division)

XPath library functions (with no arguments) are XPath expressions

current-date()

current-time()

current-dateTime()

XPath library functions (with arguments) are XPath expressions

upper-case('dhsi') (How many arguments, and of what
type?)

concat('Curly', 'Larry', 'Moe') (How many arguments, and
of what type?)

count(('Curly', 'Larry', 'Moe')) (Why two sets of
parentheses? Hint: How many arguments, and of what type?)

Function signature and cardinality:
count($items as item()*) as xs:integer

Nested XPath library functions and operations are XPath expressions. Read
them from the inside out.

max((1 + 2, 10 div 5, 6 * 0.2)) (Remember those two sets
of parentheses?)

translate(upper-case('Hi, Mom!'), 'AEIOU', 'xxxxx') (How
is this different from
upper-case(translate('Hi, Mom!', 'AEIOU', 'xxxxx'))?)

format-dateTime(current-dateTime(), '[h].[m01] [Pn] on [FNn], [D1o] [MNn]')

Nested functions are hard to read. Use the arrow
operator (=>) instead:

upper-case('Hi, Mom!') =>
translate('AEIOU', 'xxxxx')

current-dateTime() =>
format-dateTime('[h].[m01] [Pn] on [FNn], [D1o] [MNn]')

XPath expressions may span multiple lines (try it with the examples above);
that is, a newline and a space have the same meaning.

XPath in <oXygen/>

Launch <oXygen/> Editor, hit Ctrl+u (Windows) or Cmd+u
(MacOS), copy and paste the string
http://newtfire.org:8338/exist/apps/shakespeare/data/ham.xml,
and hit OK. (Backup copy at
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/ham.xml.)
This is a copy of Hamlet with TEI markup.

Set the dropdown in the upper left to XPath 3.1. (This widget is
called the XPath Toolbar.) Enter some XPath expressions (from above, such
as 1 + 1). The toolbar is limited to one line; hit Enter to run the
expression. The XPath Toolbar works only if you have an XML document open in
<oXygen/>, even if you aren’t using the document in your XPath
expression.

Go to Window → Show View → XPath/XQuery Builder. Set the dropdown in the
upper left to XPath 3.1. Enter some XPath expressions. These may span
multiple lines; hit Enter for a new line. To run, hit Ctrl+Enter
(Windows) or Cmd+Enter (MacOS), or click the red right-pointing
triangle.

XPath path expressions

An XPath path expression is a sequence of steps, each of
which proceeds from one node (called the context node) to a
sequence of zero (!) or more others. It returns the results in document
order (order of start tags). (Details at Kay 1227)

Sample XPath path expression: /TEI/text/body/div: start at
the document node, then navigate to a sequence of all its
<TEI> children. For each of those, navigate to all of
their <text> children, then to their
<body> children, and then to their
<div> children.

XPath steps are separated by single slashes (/).

An XPath expression that begins with a slash (/) starts at
the document node; this is an absolute path. Any
other XPath expression starts at the current context; this is a
relative path.

It is not an error to ask for something that doesn’t exist; it just
returns an empty sequence.

With Hamlet open and selected, go to the XPath Toolbar or
XPath Builder and try the following examples. Click on some of the results
in the lower panel:

/TEI/teiHeader/fileDesc/titleStmt/title (returns 1
<title> element)

/TEI/text/body/div (returns 5
<div> elements)

/TEI/teiHeader/fileDesc/titleStmt/info (returns no
results; this is not an error)

/TEI/teiHeader/fileDesc/title Stmt/title (raises
an error; a space is not allowed inside an element name, so title Stmt is not a valid step)

XPath path steps

Path steps move along axes: child::, parent::,
descendant::, ancestor::,
preceding-sibling::, following-sibling::, etc.
See:
http://dh.obdurodon.org/introduction-xpath.xhtml#xpath_axes

Axes are specified with a double colon, e.g., descendant::div
matches all <div> descendants of the current context
node. There are two common shortcut notations:

The default is the child axis, so
/TEI/teiHeader is synonymous with
/child::TEI/child::teiHeader. Use the
shorthand.

// is shorthand for /descendant-or-self::node()/, so /TEI//div
finds all of the <div> elements that are
descendants of the <TEI> root element, that is,
anywhere in the document. The document node has a descendant axis,
too: //div. Be careful with this one!

Each path step returns a sequence of zero or more context nodes for the
next path step. Only the final path step is permitted to return something
other than a node. Why?

The end of a path expression may return nodes or atomic values:

//body/div/count(descendant::sp) navigates from
the document node to all of the acts in the play and then returns a
count of the speeches in each act.

What’s wrong with //body/div/count(//sp)?
The leading double slash resets the current context to
the document node, and selects all <sp>
elements in the entire document, instead of just those in each individual
act.

* matches any element:

/TEI/teiHeader/* matches all child elements of the
<teiHeader>.

.. matches the parent node of the current context node. That
is, it’s shorthand for parent::node().

//stage/.. matches the parent nodes of all
<stage> elements.

Your turn:

Find the acts (<div> children of
<body>) in Hamlet
//body/div

Find the stage directions (<stage>) in
Hamlet
//stage

Find the <stage> children of
<div> elements (but not other
<stage> elements) in Hamlet
//div/stage

Find the parents of the stage directions in Hamlet
//stage/.. or //stage/parent::*

Find the <div> parents of the stage
directions in Hamlet, but not other parents
//stage/parent::div

Exploring document structures and data with XPath

XPath functions for strings

concat()

concat('Curly', 'Larry', 'Moe')

concat('Curly is #', 1)

Or use the concatenation operator: 'Curly is #' ||
1

What’s wrong with concat(//speaker)? concat() requires two or more
arguments, each of which must be a single atomic (or atomizable) item (or empty),
but here it receives only one argument, and //speaker is a sequence of many elements.

string-join()

string-join(('Curly', 'Larry', 'Moe'), ', ')

string-join(//speaker, ', ')

string-join(//speaker) Why does this work when
concat(//speaker) didn’t? The first
argument to string-join() is a sequence. Each
argument to concat() must be a single atomic or
atomizable item.

string-length()

string-length('Curly, Larry, and Moe')

lower-case(), upper-case()

lower-case('Curly, Larry, and Moe')

normalize-space()

normalize-space(' Curly, Larry, Moe ')

substring-before(), substring-after()

substring-before('Larry', 'r') What if there’s
more than one?

substring-after('Larry', 'r') What if there’s more
than one?

substring()

substring('Curly', 1, 2) XPath starts counting
with 1 (not 0).

contains() Foreshadowing: This returns a Boolean
(true or false) value. How might this be useful?

contains('Ophelia', 'ph')

//speaker/contains(., 'ph') (the dot refers to the
current context item)

See also contains-token(), which matches only whole space-separated tokens:
searching for the token "Rosencrantz" would match "Rosencrantz" but
not "Rosencrantzenfeld".

starts-with(), ends-with()

starts-with('Ophelia', 'Op')

XPath functions for numbers and for sequences of numbers

ceiling(), floor()

ceiling(3.141592653)

round()

round(3.141592653, 4)

format-integer(), format-number()

format-integer(154, 'w')

format-integer(154, 'I')

format-number(1, '#.000')

max(), min(),
sum(), avg()

max((1, 2, 3)), etc.

What happens when these are applied to strings? To a sequence
that mixes strings and numbers?

Find the length in character count of each <speaker>
//speaker/string-length() (why doesn’t
string-length(//speaker) work?)

Find the length of the longest speaker name
max(//speaker/string-length())

XPath functions for sequences

distinct-values()

distinct-values(/TEI//speaker)

count()

count(('Curly', 'Larry', 'Moe', 'Curly'))

count(distinct-values(('Curly', 'Larry', 'Moe', 'Curly')))

distinct-values(count(('Curly', 'Larry', 'Moe', 'Curly')))

sort()

sort(//speaker)

sort(//speaker, (), function($item) {string-length($item)})

Your turn:

How many <speaker> elements are there in
Hamlet?
count(//speaker)

How many distinct speaker names (distinct values of <speaker>) are there in
Hamlet?
count(distinct-values(//speaker))

How many acts are there in Hamlet?
count(//body/div)

How many scenes are there in Hamlet?
count(//div/div)

What does count(//div) tell you about
Hamlet, and why is it unhelpful? It counts
<div> elements of different types
together: acts, scenes, cast list.

Looking Stuff Up: XPath function signatures and cardinality

The function signature is the number and type of arguments it
accepts or requires, and the number and type of items it returns.

Type error: string-length(1.2345)

Cardinality error: string-length(/TEI//speaker)

Why is count(/TEI//speaker) okay, while count('Curly', 'Larry', 'Moe')
is broken? The count()
function is receiving three arguments, but it is designed to receive
only one argument. To give it one argument, it needs a set of inner
parentheses: count(('Curly', 'Larry', 'Moe'))

The error message is your friend. Read it.

Resources and references:
https://ebeshero.github.io/UpTransformation/References.html

XPath predicates

Predicates, in square brackets after a path step,
filter the results.

Numerical predicates

//body/div[3] matches the third
<div> child of each <body>
element (same as //body/div[position() eq 3])

//body/div[last()] matches the last
<div> child of each <body>
element

Predicates with node tests

//stage[parent::div] is equivalent to
//div/stage

Predicates with functions and operators

//sp[speaker eq 'Ophelia']

//sp[contains-token(speaker, 'Rosencrantz')]

//lg[@type eq 'couplet']

XPath and XQuery

From XPath to XQuery

Working with sequences

Three ways to apply a function to a sequence

Explicit for

for $speaker in /TEI//speaker return
string-length($speaker)

Implicit for

/TEI//speaker/string-length()

Simple map (!)

/TEI//speaker ! string-length(.)

Difference between simple map (!) and
arrow (=>):

('Curly', 'Larry', 'Moe') => count() (the arrow passes the whole sequence to count() once, returning 3)

('Curly', 'Larry', 'Moe') ! count(.) (the simple map applies count() to each item separately, returning 1, 1, 1)

Read and evaluate XML projects with XPath

Let’s open Hamlet again in <oXygen/>. Launch
<oXygen/> Editor, hit Ctrl+u (Windows) or Cmd+u (MacOS),
copy and paste the string
http://newtfire.org:8338/exist/apps/shakespeare/data/ham.xml,
and hit OK. (Backup copy at
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/ham.xml.)
This is a copy of Hamlet with TEI markup.

How many speeches (<sp>) does Ophelia have?
count(//sp[speaker eq 'Ophelia'])

How many speeches does Ophelia have in Act 2?
count(//body/div[2]//sp[speaker eq 'Ophelia'])

What types of elements can have stage directions
(<stage>) as children? (Hint: use the name()
function.)
distinct-values(//stage/../name())

How many speeches don’t contain any metrical line child elements
(<l>)? (Hint: use the not()
function.)
count(//sp[not(l)])

Building on your answer to the last question, who are the speakers of
those speeches?
distinct-values(//sp[not(l)]/speaker)

Building on your answers to the last two questions, what kinds of elements
do they contain instead?
distinct-values(//sp[not(l)]/*/name())

What is Hamlet’s first spoken line (<l>)?
(//sp[speaker eq 'Hamlet']/l)[1]

What is the last stage direction in the entire document?
(//stage)[last()]

How many speeches have more than 8 line children?
count(//sp[count(l) gt 8])

Building on your answer to the preceding question, how many line children
does each of those speeches have?
//sp[count(l) gt 8]/count(l)

Building on your answers to the preceding two questions, who are the
speakers of speeches that have more than 8 line children?
distinct-values(//sp[count(l) gt 8]/speaker)

How long (in characters) is the longest speech?
max(//sp/string-length()) (or, better:
max(//sp/string-length(normalize-space())))

Building on your answer to the last question, who is the speaker of the
longest speech?
//sp[string-length() eq max(//sp/string-length())]/speaker
(or, better:
//sp[string-length(normalize-space()) eq max(//sp/string-length(normalize-space()))]/speaker)

Housekeeping: documents, collections, and namespaces

Open our web server installation of eXist-db at
http://exist.newtfire.org/exist/apps/eXide/index.html. In
the eXide window, click on the New XQuery tab. This brings up a
window with xquery version "3.1"; at the top.

Access a document with doc()

doc('/db/apps/shakespeare/data/ham.xml')

Access a collection of documents with collection()

collection('/db/apps/shakespeare/data/')

Namespace declaration

declare namespace tei="http://www.tei-c.org/ns/1.0";

<stage> elements in Hamlet:
doc('/db/apps/shakespeare/data/ham.xml')//tei:stage

Find all the stage directions in the entire Shakespeare
collection
collection('/db/apps/shakespeare/data/')//tei:stage

The seven types of nodes

Document (document-node())

Element (element())

Attribute (attribute())

collection('/db/apps/shakespeare/data/')//tei:sp/@who

collection('/db/apps/shakespeare/data/')//tei:sp/@who/string()

Text (text(); not a function; not to be confused with
string())

doc('/db/mitford/literary/Charles1.xml')//tei:stage
(Mary Russell Mitford’s Charles the First)

What does
doc('/db/mitford/literary/Charles1.xml')//tei:stage/string()
return? The string values of the stage directions, that is,
the stage directions with all markup stripped.

What does
doc('/db/mitford/literary/Charles1.xml')//tei:stage/text()
return? Only the text() nodes that are direct children of each stage
direction, so text inside any nested elements is left out.

Rarely used: comment (comment()), processing instruction
(processing-instruction())

Deprecated: namespace (namespace-node(); the namespace axis is deprecated)

Break

Scavenger hunt 1

Work with the Digital Mitford Site Index posted in eXist at
/db/mitford/si.xml or the official version at its external
location: https://digitalmitford.org/si.xml
Can you find out the following?

Look at the <div> elements in the site
index. What attribute on this element can tell you how the document
is organized? Write an XPath that isolates these attribute values.
doc('https://digitalmitford.org/si.xml')//tei:div/@type/string()

Look at the element children of the <div>
elements (you can do this without knowing what all the elements
are). What do you think is the purpose of the @sortKey
attributes? What XPath expression would show you those values?
doc('https://digitalmitford.org/si.xml')//tei:div/*/@sortKey ! string()

Wildcard node testing

Work with the Digital Mitford Site Index posted in eXist at
/db/mitford/si.xml or the official version at its external
location: https://digitalmitford.org/si.xml
Can you find out the following?

The @xml:id for the play Charles the
First in the site index is "CharlesI_MRMplay". References
to the play throughout the site index are made with various
attributes whose values begin with a hashmark #, formatted like
this: "#CharlesI_MRMplay". Knowing this, can you locate
all the individual entries in any of the site index lists that
contain references of any kind to the play?
doc('https://digitalmitford.org/si.xml')//tei:div/*/*[descendant::*/@*="#CharlesI_MRMplay"]

How can you find out how many of these there are, using a function?
doc('https://digitalmitford.org/si.xml')//tei:div/*/*[descendant::*/@*="#CharlesI_MRMplay"]
=> count()

Regex in XPath

contains() vs. matches()

doc('/db/mitford/literary/Charles1.xml')//tei:l[contains(., 'murder')]

doc('/db/mitford/literary/Charles1.xml')//tei:l[contains(., 'unrighteousness')]

doc('/db/mitford/literary/Charles1.xml')//tei:l[matches(., '[a-z]{15,}', 'i')]

doc('/db/mitford/literary/Charles1.xml')//tei:*/text()[matches(., '\d{4}')]

doc('/db/mitford/literary/Charles1.xml')//tei:*/text()[matches(., '(^|\D)\d{4}($|\D)')]
Why is the number of results
smaller than for the previous expression?

xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $a := doc('/db/mitford/literary/Charles1.xml')//tei:*/text()[matches(., '(^|\D)\d{4}($|\D)')]
let $b := doc('/db/mitford/literary/Charles1.xml')//tei:*/text()[matches(., '\d{4}')]
return $b except $a (: returns items in $b that are not in $a :)

translate() vs. replace()

Try this expression,
doc('/db/mitford/literary/Charles1.xml')//tei:castList,
and notice the pseudo-markup in the cast list.
translate() to the rescue!
doc('/db/mitford/literary/Charles1.xml')//tei:castList//tei:roleDesc/translate(., '()', '')

The next examples work with the @xml:id attributes
on the <l> elements. How can you get a look at
the @xml:id attributes first?
doc('/db/mitford/literary/Charles1.xml')//tei:l/@xml:id/string()

Change the format of the @xml:id attributes on the
<l> elements with replace():
doc('/db/mitford/literary/Charles1.xml')//tei:l/replace(@xml:id, 'Chas(_\w+_)', 'C1$1')

substring-before() and substring-after() vs. tokenize()

Return only the document location (e.g., "ded", "pro", "act")
and line number information in the @xml:id attributes:
doc('/db/mitford/literary/Charles1.xml')//tei:l/substring-after(@xml:id, 'Chas_')

Working with the expression we just wrote, how would you apply
substring-before() to return only the document
location ("ded", "pro", "act"), and trim off the line number
information? Two ways: old-fashioned:
doc('/db/mitford/literary/Charles1.xml')//tei:l/substring-before(substring-after(@xml:id, 'Chas_'), '_')
and more legible with the simple map operator:
doc('/db/mitford/literary/Charles1.xml')//tei:l/substring-after(@xml:id, 'Chas_') ! substring-before(., '_')
Why can’t we use the arrow operator (=>) here?

Introducing variables

Global variables and syntax, how to return their values in eXist-db

In eXist-db, keep the TEI namespace declaration line, and copy the
following global variables:
declare variable $Chas as document-node() := doc('/db/mitford/literary/Charles1.xml');
declare variable $ChasPlay as element() := $Chas/*;

Return the value of the variables one by one, by typing each of
their names on the next line: $Chas and
$ChasPlay. Notice the difference in the data type
declaration and in the results. Other common type declarations after
as include xs:string and
xs:integer.

Introducing FLWOR

FLWOR keywords: for, let, where,
order by, return

The simplest FLWOR: let (or for) followed by
return

XQuery flow control

Introducing FLWOR, continued.

Retrieve a sequence of whole elements:
let $places := $Chas//tei:placeName
return $places

How would you return only their text contents?
return $places/string(), but notice the white
space issues. Repair these with return
$places/normalize-space()

Scavenger hunt 2: in XQuery this time.

Work in eXist-db in the same file we started before the break and delete
only the return line. Let’s keep adding to it. Use variables
and FLWOR statements to define and retrieve the following:

Define a global variable pointing to the Digital Mitford site
index document, posted in eXist at /db/mitford/si.xml
or the official version at its external
location: https://digitalmitford.org/si.xml
Hint: Declarations need to come first.
declare variable $si as document-node() := doc('/db/mitford/si.xml');
or
declare variable $si as document-node() := doc('https://digitalmitford.org/si.xml');
The new global variable must be added before the first let
statement.

Write a variable (either global or in let form)
that locates all of the <place> elements in the
site index document. Use the $si variable you just
defined for the site index document in your expression.
let $siPlaces as element()+ := $si//tei:place
or, as a global variable above the
first let statement and after the variable defining
the si.xml document:
declare variable $siPlaces as element()+ := $si//tei:place;

For housekeeping purposes, rename the variable
$places (that we defined earlier to retrieve
$Chas//tei:placeName): Call it
$Chasplaces.

Define a new variable to retrieve the values of the
@ref attributes on those $Chasplaces. Don’t
forget the string() to return the attribute value:
let $ChasPlaceRefs := $Chasplaces/@ref/string()

How would you rewrite the last XPath scavenger hunt solution as
a let statement in this XQuery? (Find references to
"CharlesI_MRMplay" in the site index):
let $siChasRefs := $si//tei:div/*/*[descendant::*/@*="#CharlesI_MRMplay"]

XPath for expressions; sequence and range variables
(<oXygen/>)

In the <oXygen/> XPath Builder View, try this code:
for $i in ("Curly", "Larry", "Moe") return concat($i, " is a Stooge!")

Can we write it as a simple map (with !)?
("Curly", "Larry", "Moe") ! concat(., " is a Stooge!")

Open the Digital Mitford site index URL (https://digitalmitford.org/si.xml)
in <oXygen/>. Try finding out the
following in the <oXygen/> XPath Builder:

Find each person we have listed as born in Scotland in the site
index. Notice that place names are sometimes stored inside the
<birth> elements.
for $i in //person[contains(birth, "Scotland")] return $i
This should return 28 <person>
entries.

Now, modify that example to return the @xml:id
(or anything else you want to find out about the person elements).
For the @xml:id:
for $i in //person[contains(birth, "Scotland")] return $i/@xml:id
Notice that we don’t need the
string() function after the
@xml:id in the <oXygen/> XPath Builder
view, because the <oXygen/> results view displays attribute
values directly, and eXide does not.

FLWOR statements in XQuery: how for works: Part 1

for in XQuery and iterative returns: for $i in
$YourSequenceVariable.

Look up the places coded in Charles
the First for their entries in the Digital Mitford site index.

Get the unique (distinct) values of @ref
attributes on placeName elements.
let $distChPRs := distinct-values($ChasPlaceRefs)

Next, loop through each of these distinct values:
for $i in $distChPRs

How will we find the site index entry that matches up with each
member of our sequence of place references in Charles the
First? Each site index entry holds an
@xml:id, and each placeName
element has a @ref attribute whose value is
formatted with a leading # followed by the
@xml:id value.

Write the variable that finds the site index entry whose
@xml:id matches the value of the range variable in
our for expression.
let $siCPrs := $si//tei:place[@xml:id = substring-after($i, '#')]

Break

FLWOR statements in XQuery: how for works: Part 2

Sort your sequence: two ways:

Apply the XPath sort() function to the variable
that defines the sequence (above the for loop):
let $distChPRs := $ChasPlaceRefs => distinct-values() => sort()

Or, within the FLWOR, with the XQuery order by
clause: order by $siCPrs/@xml:id followed by nothing
(default: ascending alphabetical order), or a keyword:
ascending or descending.

To order in
reverse alphabetical order by the @xml:id in the site
index entry?
order by $siCPrs/@xml:id descending

Do either of these methods really deliver alphabetical order?
In no human understanding of alphabetical order does
Zebra come before aardvark. This sorting
reflects Unicode codepoint order, in which uppercase A to Z come before lowercase a to z.

Number the results with $pos

Set the $pos variable in the for
statement: for $i at $pos in $YourSequenceVariable, but
caution: order by happens after $pos is set. So what if we want sorted, numbered output?
Use the sort() function on the sequence before the for loop.
Try a return like return concat($pos, ': ', $siCPrs/@xml:id)

Add where in a FLWOR expression to filter the returns

Notice the blank results: A number of entries are not yet in
the site index. We can filter by selecting only those where the
variable $siCPrs exists: where $siCPrs

Or use where to return only results in the site
index whose string value contains "France":
where $siCPrs[contains(., 'France')]

Which is more efficient in XQuery: a predicate or where?

Text returns: combining strings into one result: concat() and
string-join()

Putting it all together: writing FLWORs to make new files

HTML returns: how to use curly braces to layer and activate XQuery in an
HTML file. HTML table output:
https://ebeshero.github.io/UpTransformation/Chas1_FrenchPlaces.html
XQuery to make the HTML, in the newtfire eXist-db:
/db/DHSI-Queries/Chas-PersNameGraph-SVG.xql, or on
GitHub:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xquery/DHSI-Queries/Chas-SI-HTMLTable.xql
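As a minimal, self-contained sketch of the curly-brace technique (not the workshop's Chas-SI-HTMLTable.xql), the query below uses a hypothetical in-memory list of place names in place of the Charles the First and site index documents, so it should run in eXide or any XQuery 3.1 processor without database access. Literal HTML is written directly, and each pair of curly braces switches back into XQuery: here, a FLWOR that builds one numbered table row per place with for … at $pos.

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the place references
   retrieved from Charles the First and the site index. :)
declare variable $places as xs:string+ := ("Calais", "Paris", "Versailles");

<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Places (sample)</title></head>
  <body>
    <table>
      <tr><th>No.</th><th>Place</th></tr>
      {
        (: Inside the braces we are back in XQuery:
           one <tr> per place, numbered by its position. :)
        for $place at $pos in $places
        return <tr><td>{$pos}</td><td>{$place}</td></tr>
      }
    </table>
  </body>
</html>
```

Anything outside the braces is copied literally: writing $place in a table cell without braces would put the characters "$place" into the output rather than the place name.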
SVG returns: a bar graph from XQuery. SVG bar graph output:
http://newtfire.org:8338/exist/rest/db/DHSI-Queries/Chas-PersNameGraph-SVG.xql
(may require permission) or
https://ebeshero.github.io/UpTransformation/Chas-PersNameGraph.svg
XQuery to make the SVG, in the newtfire eXist-db:
/db/DHSI-Queries/Chas-PersNameGraph-SVG.xql, or on
GitHub:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xquery/DHSI-Queries/Chas-PersNameGraph-SVG.xql

XPath and XSLT

Introduction to XPath in XSLT

Preparation for writing XSLT in <oXygen/>

Settings: XSLT debugger and Saxon parser

Selecting files to run and save

Open <oXygen/> and open the following URL:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/ozymandias.xml

Open this starter XSLT file, too:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/xsltStarter1.xsl
Save this file locally on your computer.

XSLT overview in <oXygen/>

XSLT (eXtensible Stylesheet Language Transformations) is a
programming language expressed as an XML document, where
programming instructions are represented by elements in the XSLT
namespace.

XSLT is a declarative programming language. It is not written to be
executed line by line. Template elements (or template
rules) do the work, and they can be written in any order.

Basic structure: <xsl:stylesheet> is the root element,
with <xsl:template> children that do the
processing.

Housekeeping: up to three namespaces

Namespace for XSLT elements: the xsl: prefix (bound to
http://www.w3.org/1999/XSL/Transform) distinguishes the XSLT
elements.

Namespace for input: if the input is in a namespace, set the
@xpath-default-namespace attribute on the
<xsl:stylesheet>. For example:
<xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0"> says
that input will be in the TEI namespace unless specified otherwise.

Namespace for output: set the default namespace using the
@xmlns attribute. For example, <xsl:stylesheet
xpath-default-namespace="http://www.tei-c.org/ns/1.0"
xmlns="http://www.w3.org/1999/xhtml"> means that input is in
the TEI namespace and output will be HTML (that is, in the HTML
namespace).

Housekeeping: <xsl:output>

Configure the @method, @html-version,
@omit-xml-declaration, @include-content-type,
and @indent attributes on the <xsl:output>
element: <xsl:output method="xhtml" html-version="5"
omit-xml-declaration="no" include-content-type="no"
indent="yes"/>

XSLT and templates, part 1

Templates match patterns: <xsl:template match="???">:
The @match attribute is an XPath pattern that
specifies what the template processes. XPath patterns are not the same as
XPath expressions because they don’t navigate or find; they just match. For
example, <xsl:template match="p"> will match and process
all <p> elements. There is no need to write
<xsl:template match="//p">, because
@match values don’t have to find <p>
elements; they just have to … well … match them.

In each example below, look at the @match value: What should
the XPath pattern be matching in the source XML document? And how is this
XPath different from the way we write XPath expressions (which have to find,
and not just match, elements) in the XPath Toolbar?

<xsl:template match="div/head">
Matches any <head> child of any
<div> at any level of the XML hierarchy.
In the XPath Toolbar, we have to start the expression with two
leading forward slashes (//div/head) to indicate we
are looking down the tree from the document
node.

<xsl:template match="div[count(descendant::p) gt 1]">
Matches any <div> element that contains
more than one <p> descendant. In the XPath
Toolbar, we must add // to the beginning.

Inside a template rule, an <xsl:apply-templates/>
element specifies what to process at that location.

An <xsl:apply-templates/> element with no
@select attribute means process all my child nodes
here. What if you want to process only some children, or some
non-children?

<xsl:apply-templates/> with a @select
attribute specifies what to process. The value of @select is an
XPath expression (not the shorter XPath pattern)
because it has to find the things to process. The path starts from the
current context, that is, from the single item you are processing at the
moment. Examples:

<xsl:template match="body/div"><xsl:apply-templates/></xsl:template>

<xsl:template match="body/div"><xsl:apply-templates select="div[1]"/></xsl:template>

Complete basic Ozymandias transformation:

Input:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/ozymandias.xml

Required HTML elements: <html>,
<head>, <title>,
<body>, <h1>,
<h2>, <p>,
<cite> (for publication venue),
<div> (for poem), <br/>
(NB: empty element, after all lines except the last)

Break

Identity transformation for making changes to an XML file

Why perform an identity transformation?

How to perform an identity transformation: <xsl:mode on-no-match="shallow-copy"/>

Change the structure and add line numbers to the Ozymandias XML
file. Open the URL of our simple identity transformation starter:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/ID-TransformSimple-Starter.xsl

Use attribute value templates to add numbers to
the new <code> elements.

Optional activity: Combining a collection of files into a single XML file. See
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/coll-IDTransform.xsl

Optional exercise: Repair our Pacific Voyage file: Open this file URL in <oXygen/>:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/ID-TransformTEI-Starter.xsl
Develop this XSLT file following this exercise:
http://dh.newtfire.org/XSLTExercise1.html

Comparing XSLT and XQuery

Invoking namespaces

Sequential processing

Pull vs. push processing

Preparing XSLT to output HTML from TEI XML

The output we want:
https://ebeshero.github.io/UpTransformation/dickinson16.htmlOpen this file URL in <oXygen>:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/TEI-HTML-Starter.xsl<xsl:stylesheet> and
<xsl:output>Template matching on the document node to output HTML Structure of an HTML document: <head> and
<body>XSLT ActivityTEI XML to HTML transformationOpen the url of the Emily Dickinson Fascicle 16 file:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/dickinsonColl.xml,
and study the document.Open this starter XSLT file url in <oXygen/>:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/TEI-HTML-Starter.xsl
Push processing: <xsl:apply-templates>
Pruning the tree: when to use the @select attribute
When to use <xsl:value-of>
Break
XSLT activity: Making a linked table of contents
Continue working with the XSLT we are writing on the Emily Dickinson file.
Modal XSLT: Processing the same nodes in multiple ways
The output we want:
https://ebeshero.github.io/UpTransformation/dickinson16-with-toc.html
Modal XSLT to create the table of contents:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/xslt/DickinsonModal-TransformHTML.xsl
How internal links work
XPath and Schematron
Using Schematron to constrain your markup
Schematron overview
Schematron is constraint based; Relax NG, XML Schema, and DTD are grammar based
Sample constraint-based tasks involve multiple elements:
Are start pages (<start>) no larger than end pages (<end>)?
Are birth dates no later than death dates?
Does a list (e.g., of students in a course) contain duplicates?
Do pointers to persons really point to persons (and not places)?
Schematron structure: <pattern> → <rule> → <assert> or <report>
Looking at Schematron
Document analysis of our XML:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/pages.xml
<start> shouldn’t be greater than <end>
<issue> is optional, but we could omit it by mistake
<initial> should usually be one letter
Apostrophes and quotation marks should usually be curly (“, ”, ‘, ’), not straight (', ")
What Relax NG can constrain:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/pages.rnc
<volume>, <issue>, <year>, <start>, and <end> must be positive integers
<year> must be exactly four digits
<issue> is optional
No empty elements
Schematron to the rescue:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/pages.sch
Anatomy of a Schematron rule
Validating start and end pages
Validating apostrophes and quotation marks (text, not markup)
Associating Schematron with XML
Schematron error reporting
Schematron has the best error messages
Enhance Schematron reporting with <sch:value-of>:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/pages_value-of.sch
Enhance Schematron maintenance with <sch:let>:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/pages_variables.sch
Generate warnings as well as errors with @role:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/pages_warnings.sch
XPath functions practice: Leipzig glossing rules, part 1
Document analysis:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/leipzig.xml
Target output:
http://htmlpreview.github.io/?https://github.com/ebeshero/UpTransformation/blob/master/data/leipzig.html
Validation challenge: the spaces and hyphens need to be aligned
Best practice
Test the XPath separately first
Develop and test incrementally
Schematron validation
Housekeeping: create the Schematron skeleton in <oXygen/>, save it, link it to XML
Two ways of counting spaces and hyphens
translate(): string-length('one two three') - string-length(translate('one two three', ' ', ''))
tokenize(): count(tokenize('one two three', ' ')) or tokenize('Curly Larry Moe', '\s+') => count(); these count words, which is one more than the number of spaces
Break
XPath functions practice: Leipzig glossing rules, part 2
Comparing three things
Three-way test not available in XPath: $a eq $b eq $c and $a lt $b lt $c are syntax errors
What is available
Composite expression: $a eq $b and $b eq $c
Compare to average value: ($a, $b, $c) != avg(($a, $b, $c)), which is true when any value differs from the average, i.e., when the three are not all equal
Count distinct values: count(distinct-values(($a, $b, $c))) eq 1 or distinct-values(($a, $b, $c)) => count() eq 1
Whitespace normalization
Require it in the XML with Relax NG: xsd:string { pattern = "(\S+ )*\S+" }
Require it in the XML with Schematron: test='. eq normalize-space(.)'
Manage it with Schematron inside the tier-comparison test: normalize-space(.) instead of just .
Solutions
Simple:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/leipzig-basic.sch
Enhanced:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/leipzig.sch
Finding which word has a hyphen misalignment:
http://htmlpreview.github.io/?https://github.com/ebeshero/UpTransformation/blob/master/data/leipzig-enhanced.html
The Three Stooges go to Schematron Summer Camp
The Edge Case Saloon
“QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.”
More edge cases at:
https://www.sempf.net/post/On-Testing1
Best Stooge Ever contest results:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/stooges.xml
Hands-on validation tasks
All stooges must have percentages (no empty <stooge> elements)
Percentages total 100
Individual votes range from 0 through 100, inclusive
There are exactly three stooges!
No duplicate stooges!
Solution (no peeking!):
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/stooges.sch
One more way of counting spaces and hyphens
Explode the string: string-to-codepoints(), codepoints-to-string()
for $c in string-to-codepoints('one two three') return codepoints-to-string($c)
Find the index values of the spaces: index-of()
index-of(('a', 'b', 'c', 'b', 'a'), 'a') returns (1, 5)
Count them:
count(index-of(for $c in string-to-codepoints('one two three') return codepoints-to-string($c), ' '))
Make it legible:
string-to-codepoints('one two three') ! codepoints-to-string(.) => index-of(' ') => count()
Schematron and external files
ID/IDREF validation
Files
Instance:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/letter_id-idref.xml
Relax NG:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/letter_id-idref.rnc
Transformed:
http://htmlpreview.github.io/?https://github.com/ebeshero/UpTransformation/blob/master/data/letter_id-idref.html
Details
Datatypes: xsd:ID, xsd:IDREF, xsd:IDREFS
Value must be unique within the document
Lexical space: NCName (begins with a letter or underscore; may contain letters, digits, underscores, hyphens, periods) (simplified)
@xml:id is not of type xsd:ID unless your schema says it is
You don’t have to call it @xml:id, but you should
Validates by exact string matching
Limitations
Validates only within the same file (but XInclude can help)
No subcategory support (e.g., you can’t require a person IDREF to match only a person ID)
Cannot require mixed content to be non-empty
Desiderata
Validation against external (remote) files
Subcategory support
Require (selected) mixed content to be non-empty
General comparison and value comparison
Value comparison
Operators: eq, ne, lt, gt, le, ge
Compares one thing to one thing
Example: count(distinct-values(('Curly', 'Larry', 'Moe'))) eq 1
General comparison
Operators: =, !=, <, >, <=, >= (inside XML, angle brackets may have to be spelled &lt; and &gt;)
Compares sequences of any length
Example: 'Curly' = ('Curly', 'Larry', 'Moe')
What does 'Curly' != ('Curly', 'Larry', 'Moe') return? What should we have written instead?
not('Curly' = ('Curly', 'Larry', 'Moe'))
Example: substring(@ref, 2) = $ancillary//person/@xml:id
Schematron validation
Instance:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/letter_schematron.xml
Relax NG:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/letter_schematron.rnc
Schematron:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/letter_schematron.sch
External reference file:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/letter_schematron_ancillary.xml
Exploring Digital Mitford
Project site: https://digitalmitford.org
Site index
Workshop repo on GitHub:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/si.xml
Mitford project site:
https://digitalmitford.org/si.xml
Outline:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/si-outline.xml
Break
Hamilton 1823-04-09 letter
Letter XML:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/1823-04-09-Hamilton.xml
Read online:
https://digitalmitford.org/getLetterText.php?uri=1823-04-09-Hamilton.xml
Schematron starter:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/mitford.sch
Tasks
Save a local copy of the Schematron
Associate the letter with the local copy
Test validation of the <editor> element
Add and test rules for other element types
Webb 1819-05-16 letter
Letter XML:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/1819-05-16_MWebb.xml
Read online:
https://digitalmitford.org/getLetterText.php?uri=1819-05-16_MWebb.xml
Schematron starter:
https://raw.githubusercontent.com/ebeshero/UpTransformation/master/data/mitford-back.sch
New items for the site index are in the <back>
Some @ref values in the back have also already been added to the site index; report pointers to them as errors
Some @ref values in the back still have to be added to the site index; report them as info
If an element that should have a @ref doesn’t, report an error
Taking stock
Putting it all to work
Hands-on activity with participant data TBA
Watch this space!
Retrospective
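Worked example for “<start> shouldn’t be greater than <end>” (Looking at Schematron): a minimal sketch, runnable in eXide, of the comparison a start/end rule needs. The <pages>/<article> structure is a hypothetical stand-in built with XQuery element constructors, not the actual markup of pages.xml. It shows why the test should convert to numbers: without a schema-aware processor, element content is untyped, and XPath 3.1 compares two untyped values as strings. The expected result is true false false true.

```xquery
(: Hypothetical sample data built with XQuery element constructors;
   the real pages.xml may be structured differently. :)
let $doc :=
  <pages>
    <article><start>5</start><end>12</end></article>
    <article><start>30</start><end>7</end></article>
  </pages>
for $article in $doc/article
return (
  (: Both operands are untyped, so they are compared as strings:
     '5' > '12' is true and '30' > '7' is false. :)
  $article/start > $article/end,
  (: Casting to integers gives the numeric comparison we want. :)
  xs:integer($article/start) > xs:integer($article/end)
)
```

Only the cast version gets both articles right. In a Schematron assertion, number(start) le number(end) does the same job without needing an xs namespace declaration.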
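Worked example for “Validating apostrophes and quotation marks (text, not markup)”: a sketch of an XPath test that flags straight quotes in text, using hypothetical sample strings. In a double-quoted string literal, a doubled "" stands for one " character, so the regular expression below is the character class ['"]. The expected result is false true false true.

```xquery
(: Hypothetical text samples. In a double-quoted string literal, ""
   stands for a single " character, so the pattern is the character
   class ['"]: a straight apostrophe or a straight double quotation mark. :)
for $text in ("Mitford’s letter", "Mitford's letter", '“Our Village”', '"Our Village"')
return matches($text, "['""]")
```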
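Worked example for “Two ways of counting spaces and hyphens” (Leipzig glossing rules, part 1): the translate() and tokenize() approaches applied side by side to one hypothetical tier, 'ab-c de-f-g h', so you can check in eXide that they agree. The expected result is 2 2 3 3 (two spaces, three hyphens). The tokenize() versions subtract 1 because there is one more token than there are separators; that assumes the string is not empty and does not begin or end with a separator.

```xquery
(: One hypothetical tier: two spaces (three words) and three hyphens. :)
let $tier := 'ab-c de-f-g h'
return (
  (: translate(): delete the character and measure how much shorter the string gets :)
  string-length($tier) - string-length(translate($tier, ' ', '')),
  (: tokenize(): there is one more token than there are separators :)
  count(tokenize($tier, ' ')) - 1,
  string-length($tier) - string-length(translate($tier, '-', '')),
  count(tokenize($tier, '-')) - 1
)
```

To check alignment, a Schematron test would compare these counts for two tiers rather than report them.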
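Worked example for “Comparing three things” (Leipzig glossing rules, part 2): the three available alternatives evaluated side by side with hypothetical values $a = 3, $b = 3, $c = 4, so each should return false; change $c to 3 and each should return true. Because != is a general comparison, ($a, $b, $c) != avg(($a, $b, $c)) is true when any value differs from the average, which is why it is wrapped in not() to serve as an "all equal" test.

```xquery
(: Hypothetical counts, e.g., hyphens in three tiers of one example.
   Chaining such as $a eq $b eq $c is a syntax error in XPath. :)
let $a := 3, $b := 3, $c := 4
return (
  (: composite expression: two value comparisons joined with "and" :)
  $a eq $b and $b eq $c,
  (: all equal exactly when there is only one distinct value :)
  count(distinct-values(($a, $b, $c))) eq 1,
  (: all equal exactly when no value differs from the average :)
  not(($a, $b, $c) != avg(($a, $b, $c)))
)
```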
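Worked example for “Whitespace normalization”: the Schematron test . eq normalize-space(.) and the Relax NG pattern (\S+ )*\S+ applied to the same hypothetical strings. They are not quite equivalent: the normalize-space() test accepts an empty string, while the pattern requires at least one non-space character. Relax NG (XSD) patterns always match the whole value, but fn:matches() looks for a match anywhere in the string, so the pattern needs ^ and $ here. The expected result is true true false false false false false false true false (one pair per string).

```xquery
(: Hypothetical samples: well-formed, leading space, doubled space,
   trailing space, and the empty string. :)
for $s in ('one two', ' one two', 'one  two', 'one two ', '')
return (
  (: the Schematron test :)
  $s eq normalize-space($s),
  (: the Relax NG pattern; ^ and $ are needed because matches()
     searches for the pattern anywhere in the string :)
  matches($s, '^(\S+ )*\S+$')
)
```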
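Worked example for “General comparison and value comparison”: one sequence tested with both kinds of operator. It uses the hypothetical outsider 'Shemp' so that it does not answer the 'Curly' question in that section. The expected result is false true true true.

```xquery
(: 'Shemp' is not in the sequence, which makes the behavior easy to see. :)
let $stooges := ('Curly', 'Larry', 'Moe')
return (
  (: general comparison: true if ANY item is equal; here false :)
  'Shemp' = $stooges,
  (: general comparison: true if ANY item is unequal; here true :)
  'Shemp' != $stooges,
  (: the reliable "not a member" test; here true :)
  not('Shemp' = $stooges),
  (: value comparison: one number against one number; here true.
     'Shemp' eq $stooges would instead raise a type error (XPTY0004),
     because eq does not accept a sequence of more than one item. :)
  count(distinct-values($stooges)) eq 3
)
```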
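Worked example for the pointer check substring(@ref, 2) = $ancillary//person/@xml:id (Schematron and external files): a sketch that finds pointers with no matching ID in a separate reference document. Both documents are hypothetical stand-ins built inline so the query runs in eXide; in a Schematron file, $ancillary would more likely be bound with doc() in an <sch:let> pointing at the external reference file. Because = is a general comparison, each pointer is compared against every @xml:id at once. The expected result is "#person_9".

```xquery
(: Hypothetical stand-ins for the external reference file and a letter;
   the element names and IDs here are invented for illustration. :)
let $ancillary :=
  <listPerson>
    <person xml:id="person_1"/>
    <person xml:id="person_2"/>
  </listPerson>
let $letter :=
  <p>Letter to <persName ref="#person_2">B</persName>, mentioning
     <persName ref="#person_9">C</persName>.</p>
(: Strip the leading "#" and keep only pointers that match no @xml:id :)
return $letter//persName[not(substring(@ref, 2) = $ancillary//person/@xml:id)]/@ref ! string(.)
```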