{
"metadata": {
"name": "",
"signature": "sha256:a277b6c11610b889ab4f776e2aab64455ca4b6742d077510ddcfba117d8ee5df"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Treepace - Tree Pattern Replace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Welcome to the Treepace tutorial. First, we import the library:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from treepace import *"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Data structures"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Nodes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic unit of all trees is a node."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Node(\"label\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
""
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Treepace, any object (not only a string) can serve as a node's label. In the following example, the label is an open file object:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from glob import glob\n",
"from IPython.display import display\n",
"with open(glob('*.ipynb')[0], 'rb') as file_handle:\n",
"    display(Node(file_handle))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
}
],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Trees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A node can have children, which can in turn have children of their own:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"root = Node('root',\n",
" [Node('c1'), Node('c2',\n",
" [Node('subchild')])])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
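{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough mental model, a node can be pictured as a label, an ordered list of children and a link back to its parent. The hypothetical sketch below builds the same tree in plain Python; the class `SimpleNode` and its `preorder()` method are invented for illustration and are not part of Treepace:\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"class SimpleNode:\n",
"    '''A label, an ordered list of children and a link to the parent.'''\n",
"\n",
"    def __init__(self, value, children=()):\n",
"        self.value = value\n",
"        self.parent = None\n",
"        self.children = list(children)\n",
"        for child in self.children:\n",
"            child.parent = self  # each child keeps a reference to its parent\n",
"\n",
"    def preorder(self):\n",
"        '''Yield this node and all its descendants, parents before children.'''\n",
"        yield self\n",
"        for child in self.children:\n",
"            yield from child.preorder()\n",
"\n",
"\n",
"root = SimpleNode('root', [SimpleNode('c1'),\n",
"                           SimpleNode('c2', [SimpleNode('subchild')])])\n",
"print([node.value for node in root.preorder()])  # ['root', 'c1', 'c2', 'subchild']\n",
"print(root.children[1].parent.value)             # root\n",
"```\n",
"\n",
"Keeping the parent link makes upward navigation cheap; Treepace nodes likewise have a `parent` attribute, which is used later in this tutorial."
]
},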
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A tree is defined by a reference to its root node."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Tree(root)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
""
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A tree can be loaded from and saved to various formats, such as tab-indented text, parenthesized text or XML."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(Tree.load('root (element1 (sub-element) element2)').save(IndentedText))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"root\n",
"\telement1\n",
"\t\tsub-element\n",
"\telement2\n",
"\n"
]
}
],
"prompt_number": 6
},
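{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the parenthesized format, the children of a node are listed in parentheses right after it. A hypothetical, simplified parser (not Treepace's own code; `parse` and `to_indented` are invented names) shows how such a string maps to nested `(label, children)` tuples and back to tab-indented text. It assumes that labels contain no spaces or parentheses:\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"def parse(text):\n",
"    '''Parse parenthesized text such as 'a (b c)' into (label, children) tuples.'''\n",
"    tokens = text.replace('(', ' ( ').replace(')', ' ) ').split()\n",
"    pos = 0\n",
"\n",
"    def parse_siblings():\n",
"        nonlocal pos\n",
"        nodes = []\n",
"        while pos < len(tokens) and tokens[pos] != ')':\n",
"            if tokens[pos] == '(':\n",
"                # '(' opens the child list of the most recently read node\n",
"                pos += 1\n",
"                nodes[-1][1].extend(parse_siblings())\n",
"                pos += 1  # skip the matching ')'\n",
"            else:\n",
"                nodes.append((tokens[pos], []))\n",
"                pos += 1\n",
"        return nodes\n",
"\n",
"    return parse_siblings()\n",
"\n",
"\n",
"def to_indented(nodes, depth=0):\n",
"    '''Return one line per node, indented with one tab per tree level.'''\n",
"    lines = []\n",
"    for label, children in nodes:\n",
"        lines.append('\\t' * depth + label)\n",
"        lines.extend(to_indented(children, depth + 1))\n",
"    return lines\n",
"\n",
"\n",
"tree = parse('root (element1 (sub-element) element2)')\n",
"for line in to_indented(tree):\n",
"    print(line)\n",
"```\n",
"\n",
"The recursion mirrors the nesting: each '(' starts a recursive call that collects the children, and the matching ')' ends it."
]
},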
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Subtrees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A subtree is a connected part of a tree, consisting of selected nodes of the main tree (highlighted in blue)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Subtree([root, root.children[1]])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
""
]
}
],
"prompt_number": 7
},
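{
"cell_type": "markdown",
"metadata": {},
"source": [
"A set of nodes is connected, and hence forms a subtree, exactly when only one of its nodes has a parent outside the set; that node is the subtree's root. A hypothetical check based on this observation (the name `is_connected` and the parent dictionary are invented for illustration and are not Treepace's API):\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"def is_connected(nodes, parent):\n",
"    '''Return True if the node names form one connected subtree.\n",
"\n",
"    parent maps each node name to its parent's name (None for the root).\n",
"    '''\n",
"    nodes = set(nodes)\n",
"    tops = [n for n in nodes if parent[n] not in nodes]\n",
"    return len(tops) == 1\n",
"\n",
"\n",
"# The tree root (c1 c2 (subchild)) from above, described by parent links\n",
"parent = {'root': None, 'c1': 'root', 'c2': 'root', 'subchild': 'c2'}\n",
"print(is_connected({'root', 'c2'}, parent))      # True: the subtree shown above\n",
"print(is_connected({'c1', 'subchild'}, parent))  # False: two separate pieces\n",
"```\n",
"\n",
"This works because every node whose parent is also in the set contributes exactly one edge, and a forest with n nodes is connected only if it has n - 1 edges."
]
},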
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we will see later, searching methods return `Match` objects. Each match consists of groups (subtrees), where group 0 represents the whole match, just like in a regular expression. In this tutorial, group 0 is highlighted in green."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"c2 = root.children[1]\n",
"Match([Subtree([c2, c2.parent]),\n",
" Subtree([c2])\n",
" ])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
""
]
}
],
"prompt_number": 8
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Searching"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To search for a pattern anywhere in the tree, use the `search()` method. The result is a list of matches."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Single-node patterns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most basic pattern is a dot, which matches any single node."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree = Tree.load('a (b c)')\n",
"tree.search('.')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"[, , ]"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A text literal matches the nodes whose string representation is equal to the given literal."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('a')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"[]"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A pattern can contain an arbitrary Python expression enclosed in square brackets. The expression is evaluated for each candidate node (available in the expression as the variable `node`), and the node matches if the result equals `True`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('[node.value != \"c\"]')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"[, ]"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An underscore is a shortcut for `node.value`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('[_.upper() == \"C\"]')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"[]"
]
}
],
"prompt_number": 12
},
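{
"cell_type": "markdown",
"metadata": {},
"source": [
"All three kinds of single-node patterns boil down to a test applied to every node of the tree. A hypothetical plain-Python sketch (the helpers `walk` and `search` and the `(label, children)` tuple layout are assumptions made for illustration, not Treepace's API):\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"def walk(node):\n",
"    '''Yield every (label, children) node in preorder.'''\n",
"    yield node\n",
"    for child in node[1]:\n",
"        yield from walk(child)\n",
"\n",
"\n",
"def search(tree, test):\n",
"    '''Return the labels of all nodes for which test(label) is true.'''\n",
"    return [label for label, _ in walk(tree) if test(label)]\n",
"\n",
"\n",
"tree = ('a', [('b', []), ('c', [])])  # the tree 'a (b c)'\n",
"\n",
"print(search(tree, lambda value: True))                  # like '.': ['a', 'b', 'c']\n",
"print(search(tree, lambda value: str(value) == 'a'))     # like 'a': ['a']\n",
"print(search(tree, lambda value: value != 'c'))          # like [_ != 'c']: ['a', 'b']\n",
"print(search(tree, lambda value: value.upper() == 'C'))  # like [_.upper() == 'C']: ['c']\n",
"```"
]
},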
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Relations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multiple node patterns can be connected using relations. In the following example, we search for a node `a` that has a child `b`. The whole matched subtree is returned, not only its last node."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('a < b')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"[]"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other available relations are: immediately following sibling (`,`), any sibling (`&`) and parent (`>`)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('a < b, c')[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
""
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('a < c & b')[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
""
]
}
],
"prompt_number": 15
},
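{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each relation can be read as a step from the most recently matched node to the candidates for the next one. The following hypothetical sketch (the function `step` and the dictionaries are invented, and node labels are assumed to be unique) lists the candidates for each relation symbol:\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"children = {'a': ['b', 'c'], 'b': [], 'c': []}  # the tree 'a (b c)'\n",
"parent = {'a': None, 'b': 'a', 'c': 'a'}\n",
"\n",
"\n",
"def step(node, relation):\n",
"    '''Return the nodes reachable from node through one relation symbol.'''\n",
"    if relation == '<':  # child\n",
"        return children[node]\n",
"    if relation == '>':  # parent\n",
"        return [] if parent[node] is None else [parent[node]]\n",
"    row = [node] if parent[node] is None else children[parent[node]]\n",
"    index = row.index(node)\n",
"    if relation == ',':  # immediately following sibling\n",
"        return row[index + 1:index + 2]\n",
"    if relation == '&':  # any other sibling, before or after\n",
"        return row[:index] + row[index + 1:]\n",
"    raise ValueError('unknown relation: ' + relation)\n",
"\n",
"\n",
"print(step('a', '<'))  # ['b', 'c']\n",
"print(step('b', ','))  # ['c']\n",
"print(step('c', ','))  # []\n",
"print(step('c', '&'))  # ['b']\n",
"print(step('b', '>'))  # ['a']\n",
"```\n",
"\n",
"Chaining the steps reproduces `a < b, c`: start at `a`, step to its child `b`, then to the sibling that immediately follows `b`, which is `c`."
]
},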
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parent relation (`>`) is implicitly followed by a 'match any node' pattern. This is useful for queries like the following one:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Tree.load('a (b (c) d (e))').search('a < b <c>, d')[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
""
]
}
],
"prompt_number": 16
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Groups"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To mark a part of the match as a group, enclose it in curly braces. The groups are numbered from 1 and can be nested."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tree.search('{a < {b}, {c}}')[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
""
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Saved groups can be back-referenced with `$n`, where `n` is the group number."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Tree.load('m (n (o) m (n))').search('{m < n}, $1')[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
""
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More complicated relationships between the nodes of a match can be expressed by using back-references inside a predicate."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nums = Tree(Node(1, [Node(-1), Node(0.5)]))\n",
"matches = nums.search('{[_ != 2]} < [abs(_) == $1]')\n",
"matches[0].group(0)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
""
]
}
],
"prompt_number": 19
},
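{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read aloud, the pattern asks for a node whose value is not 2 (saved as group 1) with a child whose absolute value equals group 1. The hypothetical sketch below checks every parent-child pair of the same numbers by hand; it assumes that `$1` inside the predicate stands for the value of the node saved in group 1, and the helper `edges` is invented for illustration:\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"def edges(node):\n",
"    '''Yield (parent_value, child_value) for every parent-child pair.'''\n",
"    value, children = node\n",
"    for child in children:\n",
"        yield value, child[0]\n",
"        yield from edges(child)\n",
"\n",
"\n",
"nums = (1, [(-1, []), (0.5, [])])  # the tree 1 (-1 0.5)\n",
"\n",
"# {[_ != 2]} < [abs(_) == $1], with $1 read as the value of group 1\n",
"found = [(group1, child) for group1, child in edges(nums)\n",
"         if group1 != 2 and abs(child) == group1]\n",
"print(found)  # [(1, -1)]\n",
"```"
]
},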
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Other searching methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To assert that the match must begin exactly at the root node, use the `match()` method."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Tree.load('node (node (node))').match('node < node')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"[]"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the match must cover all nodes of the tree, call the `fullmatch()` method instead. This is useful for validation."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"fruits = Tree.load('fruits (apple pear apple)')\n",
"display(fruits)\n",
"if fruits.fullmatch('fruits < apple & pear'):\n",
"    print('The stock contains at least one apple and pear, but no other fruit.')\n",
"else:\n",
"    print('The condition is not met.')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The stock contains at least one apple and pear, but no other fruit.\n"
]
}
],
"prompt_number": 21
},
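{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three methods follow the same naming as the functions in Python's standard `re` module, which makes a handy analogy for strings (only an analogy; it says nothing about how Treepace is implemented): `re.search()` finds a match anywhere, `re.match()` anchors it at the beginning and `re.fullmatch()` requires it to span the entire input.\n",
"\n",
"```python\n",
"# Analogy with Python's re module on strings; not Treepace code.\n",
"import re\n",
"\n",
"results = {}\n",
"for text in ['node', 'nodes', 'a node']:\n",
"    results[text] = (\n",
"        re.search('node', text) is not None,     # anywhere in the text\n",
"        re.match('node', text) is not None,      # must start at the beginning\n",
"        re.fullmatch('node', text) is not None,  # must cover the whole text\n",
"    )\n",
"    print(repr(text), results[text])\n",
"```"
]
},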
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Replacing"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Basic replacing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `replace()` method substitutes every match of the pattern with the given replacement. Although it is not necessary, we first search for the pattern to illustrate what will be replaced:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"shop = Tree.load('shop (item (bread) item (water) item (roll) item (water))')\n",
"pattern = '{item} < water'\n",
"display(shop.search(pattern))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
"[,\n",
" ]"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The actual replacement is simple:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"shop.replace(pattern, '$1 < juice')\n",
"display(shop)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
}
],
"prompt_number": 23
},
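{
"cell_type": "markdown",
"metadata": {},
"source": [
"In terms of plain data, this replacement keeps each `item` node (group 1) and puts a new `juice` node in place of its matched `water` child. A hypothetical sketch on `(label, children)` tuples (`replace_water` is an invented name, and it assumes that `water` is a leaf and the only child of its `item`):\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"def replace_water(node):\n",
"    '''Return a copy of the tree in which every item (water) becomes item (juice).'''\n",
"    label, children = node\n",
"    if label == 'item' and children == [('water', [])]:\n",
"        return ('item', [('juice', [])])  # keep group 1, replace the water child\n",
"    return (label, [replace_water(child) for child in children])\n",
"\n",
"\n",
"shop = ('shop', [('item', [('bread', [])]), ('item', [('water', [])]),\n",
"                 ('item', [('roll', [])]), ('item', [('water', [])])])\n",
"new_shop = replace_water(shop)\n",
"print([item[1][0][0] for item in new_shop[1]])  # ['bread', 'juice', 'roll', 'juice']\n",
"```\n",
"\n",
"Unlike `shop.replace()`, this sketch builds a new tree and leaves the original tuples untouched."
]
},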
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Transformation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A transformation consists of one or more rules of the form `pattern -> replacement`. Each rule is applied repeatedly until it no longer finds a match. In addition, the whole list of rules is repeated as long as at least one rule finds a match. To illustrate this behavior, the following transformation is performed:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject = Tree.load('a (b)')\n",
"print('Original:')\n",
"display(subject)\n",
"\n",
"subject.transform('''x -> y\n",
" a -> x''')\n",
"print('Transformed:')\n",
"display(subject)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Original:\n"
]
},
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Transformed:\n"
]
},
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
}
],
"prompt_number": 24
},
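{
"cell_type": "markdown",
"metadata": {},
"source": [
"The repetition behaves like a loop that runs until nothing changes any more (a fixpoint). The hypothetical sketch below reproduces the example on a flat list of labels; `transform` is an invented helper, and its 'rules' only rename labels, which is far simpler than Treepace's tree patterns:\n",
"\n",
"```python\n",
"# Hypothetical sketch, not Treepace's implementation.\n",
"def transform(labels, rules):\n",
"    '''Apply (old, new) renaming rules until none of them changes anything.'''\n",
"    labels = list(labels)\n",
"    changed_any = True\n",
"    while changed_any:            # repeat the whole list of rules...\n",
"        changed_any = False\n",
"        for old, new in rules:\n",
"            while old in labels:  # ...and each rule while it still matches\n",
"                labels[labels.index(old)] = new\n",
"                changed_any = True\n",
"    return labels\n",
"\n",
"\n",
"print(transform(['a', 'b'], [('x', 'y'), ('a', 'x')]))  # ['y', 'b']\n",
"```\n",
"\n",
"Although `x -> y` comes first, `a` still ends up as `y`: the second pass of the outer loop picks up the `x` produced in the first pass. Because the loop stops only when no rule matches, a rule such as `('a', 'a')` would make this sketch run forever."
]
},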
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more useful transformation follows. Here is a sample XML document:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
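"text = '''<article>\n",
"  <heading>An example</heading>\n",
"  <content>\n",
"    <calc>\n",
"      <plus>\n",
"        <elem>3</elem>\n",
"        <elem>4</elem>\n",
"      </plus>\n",
"    </calc>\n",
"  </content>\n",
"</article>'''\n",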
"doc = Tree.load(text, XmlText)\n",
"doc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
""
]
}
],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will convert the semantic document representation into its visual HTML form and evaluate the arithmetic expression."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"doc.transform('''\n",
"article -> html < body\n",
"heading -> h1\n",
"content -> p\n",
"calc < plus < elem<{.}>, elem<{.}> -> [text(num($1) + num($2))]\n",
"''')\n",
"display(doc)\n",
"print(doc.save(XmlText))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "display_data",
"text": [
""
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"<html>\n",
"  <body>\n",
"    <h1>An example</h1>\n",
"    <p>7</p>\n",
"  </body>\n",
"</html>\n",
"\n"
]
}
],
"prompt_number": 26
},
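{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is roughly the same transformation done by hand with the standard library's `xml.etree.ElementTree`. It is a sketch that hard-codes the structure of the sample document (whitespace between tags is left out for brevity) and is not how Treepace works internally:\n",
"\n",
"```python\n",
"# Hand-written comparison using the standard library; not Treepace code.\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"source = ('<article><heading>An example</heading>'\n",
"          '<content><calc><plus><elem>3</elem><elem>4</elem></plus></calc>'\n",
"          '</content></article>')\n",
"article = ET.fromstring(source)\n",
"\n",
"# calc < plus < elem, elem  ->  the sum becomes the text of calc's parent\n",
"for parent in list(article.iter()):\n",
"    for calc in parent.findall('calc'):\n",
"        first, second = calc.find('plus').findall('elem')\n",
"        parent.text = str(int(first.text) + int(second.text))\n",
"        parent.remove(calc)\n",
"\n",
"# heading -> h1, content -> p\n",
"renames = {'heading': 'h1', 'content': 'p'}\n",
"for element in article.iter():\n",
"    element.tag = renames.get(element.tag, element.tag)\n",
"\n",
"# article -> html < body: move the article's children under html/body\n",
"html = ET.Element('html')\n",
"body = ET.SubElement(html, 'body')\n",
"body.extend(list(article))\n",
"\n",
"print(ET.tostring(html, encoding='unicode'))\n",
"```\n",
"\n",
"Here every rule needs its own traversal code, whereas the Treepace rules above state only the patterns and their replacements."
]
},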
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This concludes the tutorial. You can install the library by running\n",
"\n",
" py -m pip install treepace\n",
"\n",
"on Windows or\n",
"\n",
" pip install treepace\n",
"\n",
"on Linux."
]
}
],
"metadata": {}
}
]
}