{ "metadata": { "name": "", "signature": "sha256:a277b6c11610b889ab4f776e2aab64455ca4b6742d077510ddcfba117d8ee5df" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Treepace - Tree Pattern Replace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the Treepace tutorial. First, we import the library:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from treepace import *" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Data structures" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Nodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The basic unit of every tree is a node." ] }, { "cell_type": "code", "collapsed": false, "input": [ "Node(\"label\")" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Treepace, any object (not only a string) can serve as a node's label." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from glob import glob\n", "from IPython.display import display\n", "with open(glob('*.ipynb')[0], 'rb') as file_handle:\n", "    display(Node(file_handle))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "' @3227db0>" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Trees" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A node can have children, which can in turn have children of their own:"
] }, { "cell_type": "code", "collapsed": false, "input": [ "root = Node('root',\n", "            [Node('c1'), Node('c2',\n", "                              [Node('subchild')])])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A tree is defined by a reference to its root node." ] }, { "cell_type": "code", "collapsed": false, "input": [ "Tree(root)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A tree can be loaded from and saved to various formats, such as tab-indented text, parenthesized text, or XML." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print(Tree.load('root (element1 (sub-element) element2)').save(IndentedText))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "root\n", "    element1\n", "        sub-element\n", "    element2\n", "\n" ] } ], "prompt_number": 6 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Subtrees" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A subtree is a connected part of the tree consisting of selected nodes of the main tree (highlighted in blue)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "Subtree([root, root.children[1]])" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will see later, search methods return `Match` objects. Each match consists of groups (subtrees), where group 0 represents the whole match \u2013 just like in a regex. In this tutorial, group 0 is highlighted in green."
] }, { "cell_type": "code", "collapsed": false, "input": [ "c2 = root.children[1]\n", "Match([Subtree([c2, c2.parent]),\n", "       Subtree([c2])\n", "      ])" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Searching" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To search for a pattern anywhere in the tree, use the `search()` method. The result is a list of matches." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "One-node patterns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most basic pattern is a dot, which matches any single node." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree = Tree.load('a (b c)')\n", "tree.search('.')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "[, , ]" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A text literal matches nodes whose string representation equals the given literal." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('a')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "[]" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A pattern can contain arbitrary Python code, enclosed in square brackets. The expression is evaluated for each relevant node (accessible in the expression via the variable `node`), and the node matches if the result equals `True`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('[node.value != \"c\"]')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "[, ]" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "An underscore is a shortcut for `node.value`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('[_.upper() == \"C\"]')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "[]" ] } ], "prompt_number": 12 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Relations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiple node patterns can be connected using relations. In the following example, we search for a node 'a' that has a child 'b'. The whole subtree is returned \u2013 not only the final component." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('a < b')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "[]" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other available relations are: immediately following sibling (`,`), any sibling (`&`) and parent (`>`)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('a < b, c')[0]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('a < c & b')[0]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 'parent' relationship is implicitly followed by a 'match any node' pattern. This is useful for forming queries like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Tree.load('a (b (c) d (e))').search('a < b , d')[0]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "" ] } ], "prompt_number": 16 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To mark a part of the match as a group, enclose it in curly braces. Groups are numbered from 1 and can be nested." ] }, { "cell_type": "code", "collapsed": false, "input": [ "tree.search('{a < {b}, {c}}')[0]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Saved groups can be back-referenced with `$n`."
] }, { "cell_type": "code", "collapsed": false, "input": [ "Tree.load('m (n (o) m (n))').search('{m < n}, $1')[0]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "More complicated relationships between the nodes in a match can be expressed using back-references in a predicate." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nums = Tree(Node(1, [Node(-1), Node(0.5)]))\n", "match = nums.search('{[_ != 2]} < [abs(_) == $1]')\n", "match[0].group(0)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "" ] } ], "prompt_number": 19 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Other search methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To require that the match begins exactly at the root node, use the `match()` method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "Tree.load('node (node (node))').match('node < node')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "[]" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the match must cover all nodes of the tree, use the `fullmatch()` method. This is useful for validation." ] }, { "cell_type": "code", "collapsed": false, "input": [ "fruits = Tree.load('fruits (apple pear apple)')\n", "display(fruits)\n", "if fruits.fullmatch('fruits < apple & pear'):\n", "    print('The stock contains at least one apple and pear, but no other fruit.')\n", "else:\n", "    print('The condition is not met.')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "The stock contains at least one apple and pear, but no other fruit.\n" ] } ], "prompt_number": 21 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Replacing" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Basic replacing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `replace()` method substitutes all matches of the pattern with the given replacement. Although it is not necessary, we will first search for the pattern (for illustration):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "shop = Tree.load('shop (item (bread) item (water) item (roll) item (water))')\n", "pattern = '{item} < water'\n", "display(shop.search(pattern))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [
"" ], "metadata": {}, "output_type": "display_data", "text": [ "[,\n", " ]" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The actual replacement is simple:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "shop.replace(pattern, '$1 < juice')\n", "display(shop)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 23 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Transformation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A transformation consists of one or more rules of the form `pattern -> replacement`. Each rule is repeated while it finds a match. In addition, the whole list of rules is repeated while at least one rule finds a match. To illustrate this behavior, the following transformation is performed:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "subject = Tree.load('a (b)')\n", "print('Original:')\n", "display(subject)\n", "\n", "subject.transform('''x -> y\n", " a -> x''')\n", "print('Transformed:')\n", "display(subject)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Original:\n" ] }, { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Transformed:\n" ] }, { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A more useful transformation follows. Here is a sample XML document:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "text = '''
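<article>\n", "    <heading>An example</heading>\n", "    <content>\n", "        <calc>\n", "            <plus>\n", "                <elem>3</elem>\n", "                <elem>4</elem>\n", "            </plus>\n", "        </calc>\n", "    </content>\n", "</article>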
'''\n", "doc = Tree.load(text, XmlText)\n", "doc" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 25, "text": [ "" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will replace a semantic document representation with its visual HTML form and solve a mathematical expression." ] }, { "cell_type": "code", "collapsed": false, "input": [ "doc.transform('''\n", "article -> html < body\n", "heading -> h1\n", "content -> p\n", "calc < plus < elem<{.}>, elem<{.}> -> [text(num($1) + num($2))]\n", "''')\n", "display(doc)\n", "print(doc.save(XmlText))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "<html>\n", "    <body>\n", "        <h1>An example</h1>\n", "        <p>7</p>\n", "    </body>\n", "</html>\n", "\n" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This concludes the tutorial. You can install the library by running\n", "\n", "    py -m pip install treepace\n", "\n", "on Windows or\n", "\n", "    pip install treepace\n", "\n", "on Linux." ] } ], "metadata": {} } ] }