{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": null, "outputs": [], "source": [ "#r \"nuget: FSharp.Data,6.4.0\"\n", "\n", "open System.IO // TextWriter, used by the formatter registration below\n", "\n", "Formatter.SetPreferredMimeTypesFor(typeof\u003cobj\u003e, \"text/plain\")\n", "Formatter.Register(fun (x: obj) (writer: TextWriter) -\u003e fprintfn writer \"%120A\" x)\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "[![Binder](../img/badge-binder.svg)](https://mybinder.org/v2/gh/fsprojects.github.io/FSharp.Data/main?filepath=library/XmlProvider.ipynb)\u0026emsp;\n", "[![Script](../img/badge-script.svg)](https://fsprojects.github.io/FSharp.Data//library/XmlProvider.fsx)\u0026emsp;\n", "[![Notebook](../img/badge-notebook.svg)](https://fsprojects.github.io/FSharp.Data//library/XmlProvider.ipynb)\n", "\n", "# XML Type Provider\n", "\n", "This article demonstrates how to use the XML Type Provider to access XML documents\n", "in a statically typed way. We first look at how the structure is inferred and then\n", "demonstrate the provider by parsing an RSS feed.\n", "\n", "The XML Type Provider provides statically typed access to XML documents.\n", "It takes a sample document as an input (or a document containing a root XML node with\n", "multiple child nodes that are used as samples). The generated type can then be used\n", "to read files with the same structure.\n", "\n", "If the loaded file does not match the structure of the sample, a runtime error may occur\n", "(but only when explicitly accessing an element incompatible with the original sample, for example if it is no longer present).\n", "\n", "Starting from version 3.0.0 there is also the option of using a schema (XSD) instead of\n", "relying on samples.\n", "\n", "## Introducing the provider\n", "\n", "The type provider is located in the `FSharp.Data.dll` assembly. 
Assuming the assembly\n", "is located in the `../../bin` directory, we can load it in F# Interactive as follows\n", "(note that we also need a reference to `System.Xml.Linq`, because the provider uses the\n", "`XDocument` type internally):\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 2, "outputs": [], "source": [ "#r \"System.Xml.Linq.dll\"\n", "\n", "open FSharp.Data\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "### Inferring type from sample\n", "\n", "The `XmlProvider\u003c...\u003e` takes one static parameter of type `string`. The parameter can\n", "be **either** a sample XML string **or** a sample file (relative to the current folder or online\n", "accessible via `http` or `https`). Ambiguity between the two forms is unlikely in practice.\n", "\n", "The following sample generates a type that can read simple XML documents with a root node\n", "containing two attributes:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 3, "outputs": [], "source": [ "type Author = XmlProvider\u003c\"\"\"\u003cauthor name=\"Paul Feyerabend\" born=\"1924\" /\u003e\"\"\"\u003e\n", "let sample = Author.Parse(\"\"\"\u003cauthor name=\"Karl Popper\" born=\"1902\" /\u003e\"\"\")\n", "\n", "printfn \"%s (%d)\" sample.Name sample.Born\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The type provider generates a type `Author` that has properties corresponding to the\n", "attributes of the root element of the XML document. The types of the properties are\n", "inferred based on the values in the sample document. 
In this case, the `Name` property\n", "has a type `string` and `Born` is `int`.\n", "\n", "XML is a quite flexible format, so we could represent the same document differently.\n", "Instead of using attributes, we could use nested nodes (`\u003cname\u003e` and `\u003cborn\u003e` nested\n", "under `\u003cauthor\u003e`) that directly contain the values:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 4, "outputs": [ { "data": { "text/plain": ["Paul Feyerabend (1924)", "", "type AuthorAlt = XmlProvider\u003c...\u003e", "", "val doc: string =", "", " \"\u003cauthor\u003e\u003cname\u003ePaul Feyerabend\u003c/name\u003e\u003cborn\u003e1924\u003c/born\u003e\u003c/author\u003e\"", "", "val sampleAlt: XmlProvider\u003c...\u003e.Author =", "", " \u003cauthor\u003e", "", " \u003cname\u003ePaul Feyerabend\u003c/name\u003e", "", " \u003cborn\u003e1924\u003c/born\u003e", "", "\u003c/author\u003e", "", "val it: unit = ()"] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }], "source": [ "type AuthorAlt = XmlProvider\u003c\"\u003cauthor\u003e\u003cname\u003eKarl Popper\u003c/name\u003e\u003cborn\u003e1902\u003c/born\u003e\u003c/author\u003e\"\u003e\n", "let doc = \"\u003cauthor\u003e\u003cname\u003ePaul Feyerabend\u003c/name\u003e\u003cborn\u003e1924\u003c/born\u003e\u003c/author\u003e\"\n", "let sampleAlt = AuthorAlt.Parse(doc)\n", "\n", "printfn \"%s (%d)\" sampleAlt.Name sampleAlt.Born\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The generated type provides exactly the same API for reading documents following this\n", "convention (Note that you cannot use `AuthorAlt` to parse samples that use the\n", "first style - the implementation of the types differs, they just provide the same public API.)\n", "\n", "The provider turns a node into a simply typed property only when the node contains just\n", "a primitive value 
and has no children or attributes.\n", "\n", "### Types for more complex structure\n", "\n", "Now let\u0027s look at a number of examples that have more interesting structure. First of\n", "all, what if a node contains some value, but also has some attributes?\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 5, "outputs": [ { "data": { "text/plain": ["Thomas Kuhn (full=false)", "", "type Detailed = XmlProvider\u003c...\u003e", "", "val info: XmlProvider\u003c...\u003e.Author =", "", " \u003cauthor\u003e", "", " \u003cname full=\"false\"\u003eThomas Kuhn\u003c/name\u003e", "", "\u003c/author\u003e", "", "val it: unit = ()"] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }], "source": [ "type Detailed = XmlProvider\u003c\"\"\"\u003cauthor\u003e\u003cname full=\"true\"\u003eKarl Popper\u003c/name\u003e\u003c/author\u003e\"\"\"\u003e\n", "\n", "let info =\n", " Detailed.Parse(\"\"\"\u003cauthor\u003e\u003cname full=\"false\"\u003eThomas Kuhn\u003c/name\u003e\u003c/author\u003e\"\"\")\n", "\n", "printfn \"%s (full=%b)\" info.Name.Value info.Name.Full\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "If the node cannot be represented as a simple type (like `string`) then the provider\n", "builds a new type with multiple properties. Here, it generates a property `Full`\n", "(based on the name of the attribute) and infers its type to be boolean. Then it\n", "adds a property with a (special) name `Value` that returns the content of the element.\n", "\n", "### Types for multiple simple elements\n", "\n", "Another interesting case is when there are multiple nodes that contain just a\n", "primitive value. 
The following example shows what happens when the root node\n", "contains multiple `\u003cvalue\u003e` nodes (note that if we leave out the parameter to the\n", "`Parse` method, the same text used for the schema will be used as the runtime value).\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 6, "outputs": [], "source": [ "type Test = XmlProvider\u003c\"\u003croot\u003e\u003cvalue\u003e1\u003c/value\u003e\u003cvalue\u003e3\u003c/value\u003e\u003c/root\u003e\"\u003e\n", "\n", "for v in Test.GetSample().Values do\n", " printfn \"%d\" v\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The type provider generates a property `Values` that returns an array with the\n", "values - as the `\u003cvalue\u003e` nodes do not contain any attributes or children, they\n", "are turned into `int` values and so the `Values` property returns just `int[]`!\n", "\n", "## Type inference hints / inline schemas\n", "\n", "Starting with version 4.2.10 of this package, it\u0027s possible to enable basic type annotations\n", "directly in the sample used by the provider, to complete or to override type inference.\n", "(Only basic types are supported. 
See the reference documentation of the provider for the full list)\n", "\n", "This feature is disabled by default and has to be explicitly enabled with the `InferenceMode`\n", "static parameter.\n", "\n", "Let\u0027s consider an example where this can be useful:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 7, "outputs": [ { "data": { "text/plain": ["type AmbiguousEntity = XmlProvider\u003c...\u003e", "", "val code: float = 123.0", "", "val length: decimal = 42M"] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }], "source": [ "type AmbiguousEntity =\n", " XmlProvider\u003cSample=\"\"\"\n", " \u003cEntity Code=\"000\" Length=\"0\"/\u003e\n", " \u003cEntity Code=\"123\" Length=\"42\"/\u003e\n", " \u003cEntity Code=\"4E5\" Length=\"1.83\"/\u003e\n", " \"\"\", SampleIsList=true\u003e\n", "\n", "let code = (AmbiguousEntity.GetSamples()[1]).Code\n", "let length = (AmbiguousEntity.GetSamples()[1]).Length\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous example, `Code` is inferred as a `float`,\n", "even though it looks more like it should be a `string`.\n", "(`4E5` is interpreted as an exponential float notation instead of a string)\n", "\n", "Now let\u0027s enable inline schemas:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 8, "outputs": [ { "data": { "text/plain": ["type AmbiguousEntity2 = XmlProvider\u003c...\u003e", "", "val code2: string = \"123\"", "", "val length2: float\u003cUnitSystems.SI.UnitNames.metre\u003e = 42.0"] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }], "source": [ "open FSharp.Data.Runtime.StructuralInference\n", "\n", "type AmbiguousEntity2 =\n", " XmlProvider\u003cSample=\"\"\"\n", " \u003cEntity 
Code=\"typeof{string}\" Length=\"typeof{float{metre}}\"/\u003e\n", " \u003cEntity Code=\"123\" Length=\"42\"/\u003e\n", " \u003cEntity Code=\"4E5\" Length=\"1.83\"/\u003e\n", " \"\"\", SampleIsList=true, InferenceMode=InferenceMode.ValuesAndInlineSchemasOverrides\u003e\n", "\n", "let code2 = (AmbiguousEntity2.GetSamples()[1]).Code\n", "let length2 = (AmbiguousEntity2.GetSamples()[1]).Length\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "With the `ValuesAndInlineSchemasOverrides` inference mode, the `typeof{string}` inline schema\n", "takes priority over the type inferred from other values.\n", "`Code` is now a `string`, as we wanted it to be!\n", "\n", "An alternative way to obtain the same result would have been to replace all the `Code` values\n", "in the samples with unambiguous string values. (But this can be very cumbersome, especially with big samples.)\n", "\n", "If we had used the `ValuesAndInlineSchemasHints` inference mode instead, our inline schema\n", "would have had the same precedence as the types inferred from other values, and `Code`\n", "would have been inferred as a choice between either a number or a string,\n", "exactly as if we had added another sample with an unambiguous string value for `Code`.\n", "\n", "### Units of measure\n", "\n", "Inline schemas also enable support for units of measure.\n", "\n", "In the previous example, the `Length` property is now inferred as a `float`\n", "with the `metre` unit of measure (from the default SI units).\n", "\n", "Warning: units of measure are discarded when merged with types without a unit or with a different unit.\n", "As mentioned previously, with the `ValuesAndInlineSchemasHints` inference mode,\n", "inline schema types are merged with other inferred types with the same precedence.\n", "Since types inferred from values never have units, types inferred from inline schemas will lose their\n", "unit if the sample contains other values.\n", "\n", "## Processing philosophers\n", "\n", 
"In this section we look at an example that demonstrates how the type provider works\n", "on a simple document that lists authors that write about a specific topic. The\n", "sample document [`data/Writers.xml`](../data/Writers.xml) looks as follows:\n", "\n", " [lang=xml]\n", " \u003cauthors topic=\"Philosophy of Science\"\u003e\n", " \u003cauthor name=\"Paul Feyerabend\" born=\"1924\" /\u003e\n", " \u003cauthor name=\"Thomas Kuhn\" /\u003e\n", " \u003c/authors\u003e\n", "\n", "At runtime, we use the generated type provider to parse the following string\n", "(which has the same structure as the sample document with the exception that\n", "one of the `author` nodes also contains a `died` attribute):\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 9, "outputs": [], "source": [ "let authors =\n", " \"\"\"\n", " \u003cauthors topic=\"Philosophy of Mathematics\"\u003e\n", " \u003cauthor name=\"Bertrand Russell\" /\u003e\n", " \u003cauthor name=\"Ludwig Wittgenstein\" born=\"1889\" /\u003e\n", " \u003cauthor name=\"Alfred North Whitehead\" died=\"1947\" /\u003e\n", " \u003c/authors\u003e \"\"\"\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "When initializing the `XmlProvider`, we can pass it a file name or a web URL.\n", "The `Load` and `AsyncLoad` methods allows reading the data from a file or from a web resource. 
The\n", "`Parse` method takes the data as a string, so we can now print the information as follows:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 10, "outputs": [ { "data": { "text/plain": ["Philosophy of Mathematics", "", " - Bertrand Russell", "", " - Ludwig Wittgenstein (1889)", "", " - Alfred North Whitehead", "", "[\u003cLiteral\u003e]", "", "val ResolutionFolder: string = \"D:\\a\\FSharp.Data\\FSharp.Data\\docs\\library\"", "", "type Authors = XmlProvider\u003c...\u003e", "", "val topic: XmlProvider\u003c...\u003e.Authors =", "", " \u003cauthors topic=\"Philosophy of Mathematics\"\u003e", "", " \u003cauthor name=\"Bertrand Russell\" /\u003e", "", " \u003cauthor name=\"Ludwig Wittgenstein\" born=\"1889\" /\u003e", "", " \u003cauthor name=\"Alfred North Whitehead\" died=\"1947\" /\u003e", "", " \u003c/authors\u003e", "", "val it: unit = ()"] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }], "source": [ "[\u003cLiteral\u003e]\n", "let ResolutionFolder = __SOURCE_DIRECTORY__\n", "\n", "type Authors = XmlProvider\u003c\"../data/Writers.xml\", ResolutionFolder=ResolutionFolder\u003e\n", "let topic = Authors.Parse(authors)\n", "\n", "printfn \"%s\" topic.Topic\n", "\n", "for author in topic.Authors do\n", " printf \" - %s\" author.Name\n", " author.Born |\u003e Option.iter (printf \" (%d)\")\n", " printfn \"\"\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The value `topic` has a property `Topic` (of type `string`) which returns the value\n", "of the attribute with the same name. It also has a property `Authors` that returns\n", "an array with all the authors. 
The `born` attribute is missing for some authors,\n", "so the `Born` property becomes `option\u003cint\u003e` and we need to print it using `Option.iter`.\n", "\n", "The `died` attribute was not present in the sample used for the inference, so we\n", "cannot obtain it in a statically typed way (although it can still be obtained\n", "dynamically using `author.XElement.Attribute(XName.Get(\"died\"))`).\n", "\n", "## Global inference mode\n", "\n", "In the examples shown earlier, an element was never (recursively) contained in an\n", "element of the same name (for example `\u003cauthor\u003e` never contained another `\u003cauthor\u003e`).\n", "However, when we work with documents such as XHTML files, this can often be the case.\n", "Consider, for example, the following sample (a simplified version of\n", "[`data/HtmlBody.xml`](../data/HtmlBody.xml)):\n", "\n", "    [lang=xml]\n", "    \u003cdiv id=\"root\"\u003e\n", "      \u003cspan\u003eMain text\u003c/span\u003e\n", "      \u003cdiv id=\"first\"\u003e\n", "        \u003cdiv\u003eSecond text\u003c/div\u003e\n", "      \u003c/div\u003e\n", "    \u003c/div\u003e\n", "\n", "Here, a `\u003cdiv\u003e` element can contain other `\u003cdiv\u003e` elements and it is quite clear that\n", "they should all have the same type - we want to be able to write a recursive function\n", "that processes `\u003cdiv\u003e` elements. To make this possible, you need to set an optional\n", "parameter `Global` to `true`:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 11, "outputs": [], "source": [ "type Html = XmlProvider\u003c\"../data/HtmlBody.xml\", Global=true, ResolutionFolder=ResolutionFolder\u003e\n", "let html = Html.GetSample()\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "When the `Global` parameter is `true`, the type provider **unifies** all elements of the\n", "same name. 
This means that all `\u003cdiv\u003e` elements have the same type (with a union\n", "of all attributes and all possible child nodes that appear in the sample document).\n", "\n", "The type is located under a type `Html`, so we can write a `printDiv` function\n", "that takes `Html.Div` and acts as follows:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 12, "outputs": [ { "data": { "text/plain": ["Main text", "", "First text", "", "Another text", "", "Second text", "", "val printDiv: div: XmlProvider\u003c...\u003e.Div -\u003e unit", "", "val it: unit = ()"] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }], "source": [ "/// Prints the content of a \u003cdiv\u003e element\n", "let rec printDiv (div: Html.Div) =\n", "    div.Spans |\u003e Seq.iter (printfn \"%s\")\n", "    div.Divs |\u003e Seq.iter printDiv\n", "\n", "    // A leaf \u003cdiv\u003e has no nested elements, so print its inner text (if any)\n", "    if div.Spans.Length = 0 \u0026\u0026 div.Divs.Length = 0 then\n", "        div.Value |\u003e Option.iter (printfn \"%s\")\n", "\n", "// Print the root \u003cdiv\u003e element with all children\n", "printDiv html\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The function first prints all text included as `\u003cspan\u003e` (the element never has any\n", "attributes in our sample, so it is inferred as `string`), then it recursively prints\n", "the content of all `\u003cdiv\u003e` elements. 
If the element does not contain nested elements,\n", "then we print the `Value` (inner text).\n", "\n", "## Loading Directly from a File or URL\n", "\n", "In many cases we might want to define the schema using a local sample file, but then directly\n", "load the data from disk or from a URL either synchronously (with `Load`) or asynchronously\n", "(with `AsyncLoad`).\n", "\n", "This example uses the US Census data set from `https://api.census.gov/data.xml`, a sample of\n", "which is stored in `../data/Census.xml`. This sample is greatly reduced from the live data, so\n", "that it contains only the elements and attributes relevant to us:\n", "\n", "    [lang=xml]\n", "    \u003ccensus-api\n", "        xmlns=\"http://thedataweb.rm.census.gov/api/discovery/\"\n", "        xmlns:dcat=\"http://www.w3.org/ns/dcat#\"\n", "        xmlns:dct=\"http://purl.org/dc/terms/\"\u003e\n", "      \u003cdct:dataset\u003e\n", "        \u003cdct:title\u003e2006-2010 American Community Survey 5-Year Estimates\u003c/dct:title\u003e\n", "        \u003cdcat:distribution\n", "            dcat:accessURL=\"https://api.census.gov/data/2010/acs5\"\u003e\n", "        \u003c/dcat:distribution\u003e\n", "      \u003c/dct:dataset\u003e\n", "      \u003cdct:dataset\u003e\n", "        \u003cdct:title\u003e2006-2010 American Community Survey 5-Year Estimates\u003c/dct:title\u003e\n", "        \u003cdcat:distribution\n", "            dcat:accessURL=\"https://api.census.gov/data/2010/acs5\"\u003e\n", "        \u003c/dcat:distribution\u003e\n", "      \u003c/dct:dataset\u003e\n", "    \u003c/census-api\u003e\n", "\n", "When doing this for your scenario, be careful to ensure that enough data is given for the provider\n", "to infer the schema correctly. 
For example, the first level `\u003cdct:dataset\u003e` element must be included at\n", "least twice for the provider to infer the `Datasets` array rather than a single `Dataset` object.\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 13, "outputs": [ { "data": { "text/plain": ["type Census = XmlProvider\u003c...\u003e", "", "val data: XmlProvider\u003c...\u003e.CensusApi =", "", " \u003ccensus-api xmlns=\"http://thedataweb.rm.census.gov/api/discovery/\" xmlns:dcat=\"http://www.w3.org/ns/dcat#\" xmlns:dct=\"http://purl.org/dc/terms/\" xmlns:pod=\"https://project-open-data.cio.gov/v1.1/schema/\" xmlns:foaf=\"http://xmlns.com/foaf/0.1/\" xmlns:org=\"http://www.w3.org/ns/org#\" xmlns:vcard=\"http://www.w3.org/2006/vcard/ns#\"\u003e", "", " \u003cdct:dataset vintage=\"1994\" geographyLink=\"http://api.census.gov/data/1994/cps/basic/jun/geography.xml\" variablesLink=\"http://api.census.gov/data/1994/cps/basic/jun/variables...", "", "val apiLinks: (string * string) array =", "", " [|(\"Jun 1994 Current Population Survey: Basic Monthly\",", "", " \"http://api.census.gov/data/1994/cps/basic/jun\");", "", " (\"1986 County Business Patterns: Business Patterns\",", "", " \"http://api.census.gov/data/1986/cbp\");", "", " (\"1994 County Business Patterns - Zip Code Business Patterns: T\"+[17 chars],", "", " \"http://api.census.gov/data/1994/zbp\");", "", " (\"1987 County Business Patterns: Business Patterns\",", "", " \"http://api.census.gov/data/1987/cbp\");", "", " (\"1995 County Business Patterns: Business Patterns\",", "", " \"http://api.census.gov/data/1995/cbp\");", "", " (\"1988 County Business Patterns: Business Patterns\",", "", " \"http://api.census.gov/data/1988/cbp\");", "", " (\"1989 County Business Patterns: Business Patterns\",", "", " \"http://api.census.gov/data/1989/cbp\");", "", " (\"1995 County Business Patterns - Zip Code Business 
Patterns: T\"+[17 chars],", "", " \"http://api.census.gov/data/1995/zbp\");", "", " (\"Mar 1994 Current Population Survey: Basic Monthly\",", "", " \"http://api.census.gov/data/1994/cps/basic/mar\");", "", " (\"1990 County Business Patterns: Business Patterns\",", "", " \"http://api.census.gov/data/1990/cbp\")|]"] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }], "source": [ "type Census = XmlProvider\u003c\"../data/Census.xml\", ResolutionFolder=ResolutionFolder\u003e\n", "\n", "let data = Census.Load(\"https://api.census.gov/data.xml\")\n", "\n", "let apiLinks =\n", " data.Datasets\n", " |\u003e Array.map (fun ds -\u003e ds.Title, ds.Distribution.AccessUrl)\n", " |\u003e Array.truncate 10\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "This US Census data is an interesting dataset with this top level API returning hundreds of other\n", "datasets each with their own API. Here we use the Census data to get a list of titles and URLs for\n", "the lower level APIs.\n", "\n", "## Bringing in Some Async Action\n", "\n", "Let\u0027s go one step further and assume here a slightly contrived but certainly plausible example where\n", "we cache the Census URLs and refresh once in a while. 
Perhaps we want to load this in the background\n", "and then post each link over (for example) a message queue.\n", "\n", "This is where `AsyncLoad` comes into play:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 14, "outputs": [ { "data": { "text/plain": ["val enqueue: title: string * apiUrl: string -\u003e unit", "", "val cacheJanitor: unit -\u003e Async\u003cunit\u003e"] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }], "source": [ "let enqueue (title, apiUrl) =\n", "    // do the real message enqueueing here instead of printing\n", "    printfn \"%s -\u003e %s\" title apiUrl\n", "\n", "// helper task which gets scheduled on some background thread somewhere...\n", "let cacheJanitor () =\n", "    async {\n", "        let! reloadData = Census.AsyncLoad(\"https://api.census.gov/data.xml\")\n", "\n", "        reloadData.Datasets\n", "        |\u003e Array.map (fun ds -\u003e ds.Title, ds.Distribution.AccessUrl)\n", "        |\u003e Array.iter enqueue\n", "    }\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading RSS feeds\n", "\n", "To conclude this introduction with a more interesting example, let\u0027s look at how to parse an\n", "RSS feed. As discussed earlier, we can use relative paths or web addresses when calling\n", "the type provider:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 15, "outputs": [], "source": [ "type Rss = XmlProvider\u003c\"https://tomasp.net/rss.xml\"\u003e\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "This code builds a type `Rss` that represents RSS feeds (with the features that are used\n", "on `https://tomasp.net`). 
The type `Rss` provides static methods `Parse`, `Load` and `AsyncLoad`\n", "to construct it - here, we just want to reuse the same URI of the schema, so we\n", "use the `GetSample` static method:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 16, "outputs": [], "source": [ "let blog = Rss.GetSample()\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "Printing the title of the RSS feed together with a list of recent posts is now quite\n", "easy - you can simply type `blog` followed by `.` and see what the autocompletion\n", "offers. The code looks like this:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 17, "outputs": [ { "data": { "text/plain": ["Tomas Petricek - Languages and tools, open-source, philosophy of science and F# coding", "", " - What can routers at Centre Pompidou teach us about software evolution? (http://tomasp.net/blog/2023/pompidou/)", "", " - Where programs live? Vague spaces and software systems (http://tomasp.net/blog/2023/vague-spaces/)", "", " - The Timeless Way of Programming (http://tomasp.net/blog/2022/timeless-way/)", "", " - No-code, no thought? Substrates for simple programming for all (http://tomasp.net/blog/2022/no-code-substrates/)", "", " - Pop-up from Hell: On the growing opacity of web programs (http://tomasp.net/blog/2021/popup-from-hell/)", "", " - Software designers, not engineers: An interview from alternative universe (http://tomasp.net/blog/2021/software-designers/)", "", " - Is deep learning a new kind of programming? 
Operationalistic look at programming (http://tomasp.net/blog/2020/learning-and-programming/)", "", " - Creating interactive You Draw bar chart with Compost (http://tomasp.net/blog/2020/youdraw-compost-visualization/)", "", " - Data exploration calculus: Capturing the essence of exploratory data scripting (http://tomasp.net/blog/2020/data-exploration-calculus/)", "", " - On architecture, urban planning and software construction (http://tomasp.net/blog/2020/cities-and-programming/)", "", " - What to teach as the first programming language and why (http://tomasp.net/blog/2019/first-language/)", "", " - What should a Software Engineering course look like? (http://tomasp.net/blog/2019/software-engineering/)", "", " - Write your own Excel in 100 lines of F# (http://tomasp.net/blog/2018/write-your-own-excel/)", "", " - Programming as interaction: A new perspective for programming language research (http://tomasp.net/blog/2018/programming-interaction/)", "", " - Would aliens understand lambda calculus? 
(http://tomasp.net/blog/2018/alien-lambda-calculus/)", "", " - The design side of programming language design (http://tomasp.net/blog/2017/design-side-of-pl/)", "", " - Getting started with The Gamma just got easier (http://tomasp.net/blog/2017/thegamma-getting-started/)", "", " - Papers we Scrutinize: How to critically read papers (http://tomasp.net/blog/2017/papers-we-scrutinize/)", "", " - The mythology of programming language ideas (http://tomasp.net/blog/2017/programming-mythology/)", "", " - Towards open and transparent data-driven storytelling: Notes from my Alan Turing Institute talk (http://tomasp.net/blog/2017/thegamma-talk/)", "", "val it: unit = ()"] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }], "source": [ "// Title is a property returning string\n", "printfn \"%s\" blog.Channel.Title\n", "\n", "// Get all item nodes and print title with link\n", "for item in blog.Channel.Items do\n", " printfn \" - %s (%s)\" item.Title item.Link\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "## Transforming XML\n", "\n", "In this example we will now also create XML in addition to consuming it.\n", "Consider the problem of flattening a data set. 
Let\u0027s say you have XML data that looks like this:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 18, "outputs": [], "source": [ "[\u003cLiteral\u003e]\n", "let customersXmlSample =\n", "    \"\"\"\n", "    \u003cCustomers\u003e\n", "      \u003cCustomer name=\"ACME\"\u003e\n", "        \u003cOrder Number=\"A012345\"\u003e\n", "          \u003cOrderLine Item=\"widget\" Quantity=\"1\"/\u003e\n", "        \u003c/Order\u003e\n", "        \u003cOrder Number=\"A012346\"\u003e\n", "          \u003cOrderLine Item=\"trinket\" Quantity=\"2\"/\u003e\n", "        \u003c/Order\u003e\n", "      \u003c/Customer\u003e\n", "      \u003cCustomer name=\"Southwind\"\u003e\n", "        \u003cOrder Number=\"A012347\"\u003e\n", "          \u003cOrderLine Item=\"skyhook\" Quantity=\"3\"/\u003e\n", "          \u003cOrderLine Item=\"gizmo\" Quantity=\"4\"/\u003e\n", "        \u003c/Order\u003e\n", "      \u003c/Customer\u003e\n", "    \u003c/Customers\u003e\"\"\"\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "and you want to transform it into something like this:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 19, "outputs": [], "source": [ "[\u003cLiteral\u003e]\n", "let orderLinesXmlSample =\n", "    \"\"\"\n", "    \u003cOrderLines\u003e\n", "      \u003cOrderLine Customer=\"ACME\" Order=\"A012345\" Item=\"widget\" Quantity=\"1\"/\u003e\n", "      \u003cOrderLine Customer=\"ACME\" Order=\"A012346\" Item=\"trinket\" Quantity=\"2\"/\u003e\n", "      \u003cOrderLine Customer=\"Southwind\" Order=\"A012347\" Item=\"skyhook\" Quantity=\"3\"/\u003e\n", "      \u003cOrderLine Customer=\"Southwind\" Order=\"A012347\" Item=\"gizmo\" Quantity=\"4\"/\u003e\n", "    \u003c/OrderLines\u003e\"\"\"\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "We\u0027ll create types from both the input and output samples and use the constructors on the types generated by 
the XmlProvider:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 20, "outputs": [ { "data": { "text/plain": ["type InputXml = XmlProvider\u003c...\u003e", "", "type OutputXml = XmlProvider\u003c...\u003e", "", "val orderLines: XmlProvider\u003c...\u003e.OrderLines =", "", " \u003cOrderLines\u003e", "", " \u003cOrderLine Customer=\"ACME\" Order=\"A012345\" Item=\"widget\" Quantity=\"1\" /\u003e", "", " \u003cOrderLine Customer=\"ACME\" Order=\"A012346\" Item=\"trinket\" Quantity=\"2\" /\u003e", "", " \u003cOrderLine Customer=\"Southwind\" Order=\"A012347\" Item=\"skyhook\" Quantity=\"3\" /\u003e", "", " \u003cOrderLine Customer=\"Southwind\" Order=\"A012347\" Item=\"gizmo\" Quantity=\"4\" /\u003e", "", "\u003c/OrderLines\u003e"] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }], "source": [ "type InputXml = XmlProvider\u003ccustomersXmlSample\u003e\n", "type OutputXml = XmlProvider\u003corderLinesXmlSample\u003e\n", "\n", "let orderLines =\n", "    OutputXml.OrderLines\n", "        [| for customer in InputXml.GetSample().Customers do\n", "               for order in customer.Orders do\n", "                   for line in order.OrderLines do\n", "                       yield OutputXml.OrderLine(customer.Name, order.Number, line.Item, line.Quantity) |]\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a schema (XSD)\n", "\n", "The `Schema` parameter can be used (instead of `Sample`) to specify an XML schema.\n", "The value of the parameter can be either the name of a schema file or plain text,\n", "as in the following example:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 21, "outputs": [], "source": [ "type Person =\n", " XmlProvider\u003cSchema=\"\"\"\n", " \u003cxs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n", " 
elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\"\u003e\n", " \u003cxs:element name=\"person\"\u003e\n", " \u003cxs:complexType\u003e\n", " \u003cxs:sequence\u003e\n", " \u003cxs:element name=\"surname\" type=\"xs:string\"/\u003e\n", " \u003cxs:element name=\"birthDate\" type=\"xs:date\"/\u003e\n", " \u003c/xs:sequence\u003e\n", " \u003c/xs:complexType\u003e\n", " \u003c/xs:element\u003e\n", " \u003c/xs:schema\u003e\"\"\"\u003e\n", "\n", "let turing =\n", "    Person.Parse\n", "        \"\"\"\n", "        \u003cperson\u003e\n", "            \u003csurname\u003eTuring\u003c/surname\u003e\n", "            \u003cbirthDate\u003e1912-06-23\u003c/birthDate\u003e\n", "        \u003c/person\u003e\n", "        \"\"\"\n", "\n", "printfn \"%s was born in %d\" turing.Surname turing.BirthDate.Year\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The properties of the provided type are derived from the schema instead of being inferred from samples.\n", "\n", "Usually a schema is not specified as plain text but stored in a file like\n", "[`data/po.xsd`](../data/po.xsd), and the URI of that file is set in the `Schema` parameter:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 22, "outputs": [], "source": [ "type PurchaseOrder = XmlProvider\u003cSchema=\"../data/po.xsd\"\u003e\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "When the file includes other schema files, the `ResolutionFolder` parameter can help locate them.\n", "The URI may also refer to online resources:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 23, "outputs": [], "source": [ "type RssXsd = XmlProvider\u003cSchema=\"https://www.w3schools.com/xml/note.xsd\"\u003e\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The schema is expected to define a root element (a global 
element with complex type).\n", "In case of multiple root elements:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 24, "outputs": [], "source": [ "type TwoRoots =\n", " XmlProvider\u003cSchema=\"\"\"\n", " \u003cxs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n", " elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\"\u003e\n", " \u003cxs:element name=\"root1\"\u003e\n", " \u003cxs:complexType\u003e\n", " \u003cxs:attribute name=\"foo\" type=\"xs:string\" use=\"required\" /\u003e\n", " \u003cxs:attribute name=\"fow\" type=\"xs:int\" /\u003e\n", " \u003c/xs:complexType\u003e\n", " \u003c/xs:element\u003e\n", " \u003cxs:element name=\"root2\"\u003e\n", " \u003cxs:complexType\u003e\n", " \u003cxs:attribute name=\"bar\" type=\"xs:string\" use=\"required\" /\u003e\n", " \u003cxs:attribute name=\"baz\" type=\"xs:date\" use=\"required\" /\u003e\n", " \u003c/xs:complexType\u003e\n", " \u003c/xs:element\u003e\n", " \u003c/xs:schema\u003e\n", "\"\"\"\u003e\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "the provided type has an optional property for each alternative:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 25, "outputs": [ { "data": { "text/plain": ["Foo = aa and Fow = Some 2", "", "Bar = aa and Baz = 12/22/2017 12:00:00 AM", "", "val e1: XmlProvider\u003c...\u003e.Choice = \u003croot1 foo=\"aa\" fow=\"2\" /\u003e", "", "val e2: XmlProvider\u003c...\u003e.Choice = \u003croot2 bar=\"aa\" baz=\"2017-12-22\" /\u003e", "", "val it: unit = ()"] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }], "source": [ "let e1 = TwoRoots.Parse \"\u003croot1 foo=\u0027aa\u0027 fow=\u00272\u0027 /\u003e\"\n", "\n", "match e1.Root1, e1.Root2 with\n", "| Some x, None -\u003e 
printfn \"Foo = %s and Fow = %A\" x.Foo x.Fow\n", "| _ -\u003e failwith \"Unexpected\"\n", "\n", "let e2 = TwoRoots.Parse \"\u003croot2 bar=\u0027aa\u0027 baz=\u00272017-12-22\u0027 /\u003e\"\n", "\n", "match e2.Root1, e2.Root2 with\n", "| None, Some x -\u003e printfn \"Bar = %s and Baz = %O\" x.Bar x.Baz\n", "| _ -\u003e failwith \"Unexpected\"\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "### Common XSD constructs: sequence and choice\n", "\n", "A `sequence` is the most common way of structuring elements in a schema.\n", "The following XSD defines `foo` as a sequence made of one or more\n", "`bar` elements followed by a single `baz` element.\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 26, "outputs": [], "source": [ "type FooSequence =\n", " XmlProvider\u003cSchema=\"\"\"\n", " \u003cxs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n", " elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\"\u003e\n", " \u003cxs:element name=\"foo\"\u003e\n", " \u003cxs:complexType\u003e\n", " \u003cxs:sequence\u003e\n", " \u003cxs:element name=\"bar\" type=\"xs:int\" maxOccurs=\"unbounded\" /\u003e\n", " \u003cxs:element name=\"baz\" type=\"xs:date\" minOccurs=\"1\" /\u003e\n", " \u003c/xs:sequence\u003e\n", " \u003c/xs:complexType\u003e\n", " \u003c/xs:element\u003e\n", " \u003c/xs:schema\u003e\"\"\"\u003e\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "Here a valid XML element is parsed as an instance of the provided type, with two properties corresponding to the `bar` and `baz` elements; the former is an array so that it can hold multiple elements:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 27, "outputs": [], "source": [ "let fooSequence =\n", " 
FooSequence.Parse\n", "        \"\"\"\n", "\u003cfoo\u003e\n", " \u003cbar\u003e42\u003c/bar\u003e\n", " \u003cbar\u003e43\u003c/bar\u003e\n", " \u003cbaz\u003e1957-08-13\u003c/baz\u003e\n", "\u003c/foo\u003e\"\"\"\n", "\n", "printfn \"%d\" fooSequence.Bars.[0] // 42\n", "printfn \"%d\" fooSequence.Bars.[1] // 43\n", "printfn \"%d\" fooSequence.Baz.Year // 1957\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of a sequence we may have a `choice`:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 28, "outputs": [], "source": [ "type FooChoice =\n", " XmlProvider\u003cSchema=\"\"\"\n", " \u003cxs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n", " elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\"\u003e\n", " \u003cxs:element name=\"foo\"\u003e\n", " \u003cxs:complexType\u003e\n", " \u003cxs:choice\u003e\n", " \u003cxs:element name=\"bar\" type=\"xs:int\" maxOccurs=\"unbounded\" /\u003e\n", " \u003cxs:element name=\"baz\" type=\"xs:date\" minOccurs=\"1\" /\u003e\n", " \u003c/xs:choice\u003e\n", " \u003c/xs:complexType\u003e\n", " \u003c/xs:element\u003e\n", " \u003c/xs:schema\u003e\"\"\"\u003e\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "Although a choice is akin to a union type in F#, the provided type still has\n", "properties for `bar` and `baz` directly available on the `foo` object; in fact\n", "the properties representing alternatives in a choice are simply made optional\n", "(notice that for arrays this is not even necessary because an array can be empty).\n", "This decision is due to technical limitations (discriminated unions are not supported\n", "in type providers), but it is also preferred because it improves discoverability:\n", "IntelliSense can show both alternatives. 
This costs some precision (the types do not enforce that exactly one alternative is present), but precision is not the main goal here.\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 29, "outputs": [ { "data": { "text/plain": ["0 items", "", "1957", "", "val fooChoice: XmlProvider\u003c...\u003e.Foo = \u003cfoo\u003e", "", " \u003cbaz\u003e1957-08-13\u003c/baz\u003e", "", "\u003c/foo\u003e", "", "val it: unit = ()"] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" }], "source": [ "let fooChoice =\n", "    FooChoice.Parse\n", "        \"\"\"\n", "\u003cfoo\u003e\n", " \u003cbaz\u003e1957-08-13\u003c/baz\u003e\n", "\u003c/foo\u003e\"\"\"\n", "\n", "printfn \"%d items\" fooChoice.Bars.Length // 0 items\n", "\n", "match fooChoice.Baz with\n", "| Some date -\u003e printfn \"%d\" date.Year // 1957\n", "| None -\u003e ()\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "Another XSD construct for modeling the content of an element is `all`. It is used less often and\n", "behaves like a sequence in which the order of elements does not matter. The corresponding provided type\n", "is essentially the same as for a sequence.\n", "\n", "### Advanced schema constructs\n", "\n", "XML Schema provides various extensibility mechanisms. 
The following example\n", "is a terse summary mixing substitution groups with abstract recursive definitions.\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 30, "outputs": [ { "data": { "text/plain": ["p1", "", "p2", "", "p3", "", "type Prop = XmlProvider\u003c...\u003e", "", "val formula: XmlProvider\u003c...\u003e.And =", "", " \u003cAnd\u003e", "", " \u003cProp\u003ep1\u003c/Prop\u003e", "", " \u003cAnd\u003e", "", " \u003cProp\u003ep2\u003c/Prop\u003e", "", " \u003cProp\u003ep3\u003c/Prop\u003e", "", " \u003c/And\u003e", "", " \u003c/And\u003e", "", "val it: unit = ()"] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" }], "source": [ "type Prop =\n", " XmlProvider\u003cSchema=\"\"\"\n", " \u003cxs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n", " elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\"\u003e\n", " \u003cxs:element name=\"Formula\" abstract=\"true\"/\u003e\n", " \u003cxs:element name=\"Prop\" type=\"xs:string\" substitutionGroup=\"Formula\"/\u003e\n", " \u003cxs:element name=\"And\" substitutionGroup=\"Formula\"\u003e\n", " \u003cxs:complexType\u003e\n", " \u003cxs:sequence\u003e\n", " \u003cxs:element ref=\"Formula\" minOccurs=\"2\" maxOccurs=\"2\"/\u003e\n", " \u003c/xs:sequence\u003e\n", " \u003c/xs:complexType\u003e\n", " \u003c/xs:element\u003e\n", " \u003c/xs:schema\u003e\"\"\"\u003e\n", "\n", "let formula =\n", "    Prop.Parse\n", "        \"\"\"\n", "        \u003cAnd\u003e\n", "            \u003cProp\u003ep1\u003c/Prop\u003e\n", "            \u003cAnd\u003e\n", "                \u003cProp\u003ep2\u003c/Prop\u003e\n", "                \u003cProp\u003ep3\u003c/Prop\u003e\n", "            \u003c/And\u003e\n", "        \u003c/And\u003e\n", "        \"\"\"\n", "\n", "printfn \"%s\" formula.Props.[0] // p1\n", "printfn \"%s\" formula.Ands.[0].Props.[0] // p2\n", "printfn \"%s\" formula.Ands.[0].Props.[1] // p3\n" ] } , { "cell_type": "markdown", "metadata": {}, 
"source": [ "Substitution groups are like choices, and the type provider produces an optional\n", "property for each alternative (or an array, as with `Props` and `Ands` above, when the element may occur more than once).\n", "\n", "### Validation\n", "\n", "The `GetSchema` method on the generated type returns an instance\n", "of `System.Xml.Schema.XmlSchemaSet` that can be used to validate documents:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 31, "outputs": [], "source": [ "open System.Xml.Schema\n", "let schema = Person.GetSchema()\n", "turing.XElement.Document.Validate(schema, validationEventHandler = null)\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The `Validate` method accepts a callback to handle validation issues;\n", "passing `null` will turn validation errors into exceptions.\n", "There are overloads to allow other effects (for example setting default values\n", "by enabling the population of the XML tree with the post-schema-validation infoset;\n", "for details see the [documentation](https://docs.microsoft.com/en-us/dotnet/api/system.xml.schema.extensions.validate?view=netframework-4.7.2)).\n", "\n", "### Remarks on using a schema\n", "\n", "The XML Type Provider supports most XSD features.\n", "However, the [XML Schema](https://www.w3.org/XML/Schema) specification is rich and complex, and it also provides a\n", "fair degree of [openness](http://docstore.mik.ua/orelly/xml/schema/ch13_02.htm)\n", "which may be [difficult to handle](https://link.springer.com/chapter/10.1007/978-3-540-76786-2_6) in\n", "data binding tools. In FSharp.Data, when providing typed views on elements becomes too challenging\n", "(take for example [wildcards](https://www.w3.org/TR/xmlschema11-1/#Wildcards)), the underlying `XElement`\n", "is still available.\n", "\n", "An important design decision is to focus on elements and not on complex types; while the latter\n", "may be valuable in schema design, our goal is simply to 
obtain an easy and safe way to access XML data.\n", "In other words, the provided types are not intended for domain modeling (it\u0027s one of the very few cases\n", "where optional properties are preferred to sum types).\n", "Hence, we do not provide types corresponding to complex types in a schema but only corresponding\n", "to elements (of course the underlying complex types still affect the shape of the provided types\n", "but this happens only implicitly).\n", "Focusing on element shapes lets us generate a type that should be essentially the same as one\n", "inferred from a significant set of valid samples. This allows a smooth transition (replacing `Sample` with `Schema`)\n", "when a schema becomes available.\n", "\n", "Note that inline schemas (values of the form `typeof{...}`) are not supported inside XSD documents.\n", "\n", "## Related articles\n", "\n", "* [Using JSON provider in a library](JsonProvider.html#jsonlib) also applies to the XML type provider\n", "\n", "* API Reference: [XmlProvider](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-xmlprovider.html) type provider\n", "\n", "* API Reference: [XElementExtensions](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-xelementextensions.html)\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": ".NET (F#)", "language": "F#", "name": ".net-fsharp" }, "language_info": { "file_extension": ".fs", "mimetype": "text/x-fsharp", "name": "polyglot-notebook", "pygments_lexer": "fsharp" }, "polyglot_notebook": { "kernelInfo": { "defaultKernelName": "fsharp", "items": [ { "aliases": [], "languageName": "fsharp", "name": "fsharp" } ] } } }, "nbformat": 4, "nbformat_minor": 2 }