{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Trees\n",
"http://openbookproject.net/thinkcs/python/english3e/trees.html\n",
"- like linked lists, trees are made up of nodes\n",
"\n",
"## Binary Tree\n",
" - is a commonly used tree in which each node contains a reference to atmost two other nodes (possibly None)\n",
"- these references are referred to as the left and right subtrees\n",
"- like the node of linked list, each node also contains data/cargo\n",
"- like linked lists, trees are recursive data structures that are defined recursively:\n",
" 1. the empty tree, represented by None, or\n",
" 2. a node that contains a data and two tree references (left and right subtree)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building trees\n",
"- similar to building linked-list"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"class Tree:\n",
" def __init__(self, data, left=None, right=None):\n",
" self.cargo = data\n",
" self.left = left\n",
" self.right = right\n",
" \n",
" def __str__(self):\n",
" return \"{}\".format(self.cargo)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### bottom-up way to build-trees\n",
"- first create children and link them to the parent"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"left = Tree(2)\n",
"right = Tree(3)\n",
"tree = Tree(1, left, right)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"tree1 = Tree(10, Tree(20), Tree(30))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n"
]
}
],
"source": [
"print(tree)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n"
]
}
],
"source": [
"print(tree1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## traversing tree\n",
"- natural way to traverse a tree is recursively!"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def findSum(tree):\n",
" if not tree:\n",
" return 0\n",
" return tree.cargo + findSum(tree.left) + findSum(tree.right)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findSum(tree)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"60"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findSum(tree1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expression trees\n",
"- trees are natural way to represent the structure of an expression unambigigously.\n",
"- infix expression 1 + 2 * 3 is ambigious unless we know the order of operation that * happens before +\n",
"- we can use tree to represent the same expression\n",
" - operands are leaf nodes\n",
" - operator nodes contain references to their operands; operators are binary (two operands)\n",
" \n",
"\n",
"\n",
"- applications:\n",
" - translate expressions to postfix, prefix, and infix\n",
" - compilers use expression trees to parse, optimize, and translate programs\n",
" \n",
"- three ways to traverse trees: pre-order, in-order and post-order"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"expression = Tree('+', Tree(1), Tree('*', Tree(2), Tree(3)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### pre-order tree traversal\n",
"- contents of the root appear before the contents of the children\n",
"- recursive algorithm:\n",
" - visit the node\n",
" - visit left subtree\n",
" - visit right subtree"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def preorder(tree):\n",
" if not tree:\n",
" return\n",
" print(tree.cargo, end=' ')\n",
" preorder(tree.left)\n",
" preorder(tree.right)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+ 1 * 2 3 "
]
}
],
"source": [
"preorder(expression)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### in-order tree traversal\n",
"- contents of the tree appear in order\n",
"- recursive algorithm:\n",
" - visit left subtree\n",
" - visit node\n",
" - visit right subtree"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"def inorder(tree):\n",
" if not tree:\n",
" return\n",
" inorder(tree.left)\n",
" print(tree.cargo, end=' ')\n",
" inorder(tree.right)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 + 2 * 3 "
]
}
],
"source": [
"inorder(expression)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"def inorderIndented(tree, level=0):\n",
" if not tree:\n",
" return\n",
" inorderIndented(tree.right, level+1)\n",
" print(' '*level + str(tree.cargo))\n",
" inorderIndented(tree.left, level+1)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3\n",
" *\n",
" 2\n",
"+\n",
" 1\n"
]
}
],
"source": [
"inorderIndented(expression)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### post-order traversal\n",
"- recursive algorithm:\n",
" 1. visit left subtree\n",
" 2. visit right subtree\n",
" 3. visit node"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"def postorder(tree):\n",
" if not tree:\n",
" return\n",
" postorder(tree.left)\n",
" postorder(tree.right)\n",
" print(tree.cargo, end=' ')"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 2 3 * + "
]
}
],
"source": [
"postorder(expression)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## building an expression tree\n",
"- parse an infix expression and build the corresponding expression tree\n",
"- e.g., (3 + 7) * 9 yields the following tree:\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. tokenize expression into python list? How (left as an exercise)\n",
" - (3 + 7) * 9 = [\"(\", 3, \"+\", 7, \")\", \"*\", 9, \"end\"]\n",
"2. \"end\" token is useful for preventing the parser from reading pas the end of the list"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"def get_token(token_list, expected):\n",
" if token_list[0] == expected:\n",
" del token_list[0]\n",
" return True\n",
" return False"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"# handles operands\n",
"def get_number(token_list):\n",
" x = token_list[0]\n",
" if not isinstance(x, int):\n",
" return None\n",
" del token_list[0]\n",
" return Tree(x, None, None) # leaf node"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9 "
]
}
],
"source": [
"token_list = [9, 11, 'end']\n",
"x = get_number(token_list)\n",
"postorder(x)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[11, 'end']\n"
]
}
],
"source": [
"print(token_list)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"def get_product(token_list):\n",
" a = get_number(token_list)\n",
" if get_token(token_list, '*'):\n",
" b = get_number(token_list)\n",
" return Tree('*', a, b)\n",
" return a"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"token_list = [9, '*', 11, 'end']\n",
"tree = get_product(token_list)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9 11 * "
]
}
],
"source": [
"postorder(tree)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9 "
]
}
],
"source": [
"token_list = [9, '+', 11, 'end']\n",
"tree = get_product(token_list)\n",
"postorder(tree)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"# adapt the function for compound product such as 3 * (5 * (7 * 9))\n",
"def get_product(token_list):\n",
" a = get_number(token_list)\n",
" if get_token(token_list, '*'):\n",
" b = get_product(token_list)\n",
" return Tree('*', a, b)\n",
" return a"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2 3 5 7 * * * "
]
}
],
"source": [
"token_list = [2, \"*\", 3, \"*\", 5 , \"*\", 7, \"end\"]\n",
"tree = get_product(token_list)\n",
"postorder(tree)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"# a sum can be a tree with + at the root, a product on the left, and a sum on the right. \n",
"# Or, a sum can be just a product.\n",
"def get_sum(token_list):\n",
" a = get_product(token_list)\n",
" if get_token(token_list, \"+\"):\n",
" b = get_sum(token_list)\n",
" return Tree(\"+\", a, b)\n",
" return a"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"token_list = [9, \"*\", 11, \"+\", 5, \"*\", 7, \"end\"]\n",
"tree = get_sum(token_list)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9 11 * 5 7 * + "
]
}
],
"source": [
"postorder(tree)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"# handle parenthesis\n",
"def get_number(token_list):\n",
" if get_token(token_list, \"(\"):\n",
" x = get_sum(token_list) # Get the subexpression\n",
" get_token(token_list, \")\") # Remove the closing parenthesis\n",
" return x\n",
" else:\n",
" x = token_list[0]\n",
" if not isinstance(x, int):\n",
" return None\n",
" del token_list[0]\n",
" return Tree(x, None, None)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9 11 5 + 7 * * "
]
}
],
"source": [
"# 9 * (11 + 5) * 7\n",
"token_list = [9, \"*\", \"(\", 11, \"+\", 5, \")\", \"*\", 7, \"end\"]\n",
"tree = get_sum(token_list)\n",
"postorder(tree)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## handling errors on malformed expressions"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"# handle parenthesis\n",
"def get_number(token_list):\n",
" if get_token(token_list, \"(\"):\n",
" x = get_sum(token_list) # Get the subexpression\n",
" if not get_token(token_list, \")\"): # Remove the closing parenthesis\n",
" raise ValueError('Missing close parenthesis!')\n",
" return x\n",
" else:\n",
" x = token_list[0]\n",
" if not isinstance(x, int):\n",
" return None\n",
" del token_list[0]\n",
" return Tree(x, None, None)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "Missing close parenthesis!",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mtoken_list\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m9\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"*\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"(\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m11\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"+\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"*\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"end\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mtree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mpostorder\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtree\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mget_sum\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# Or, a sum can be just a product.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"+\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mget_product\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_number\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'*'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mTree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'*'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mget_product\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_number\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'*'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mTree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'*'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m\u001b[0m in \u001b[0;36mget_number\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# Get the subexpression\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\")\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# Remove the closing parenthesis\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Missing close parenthesis!'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Missing close parenthesis!"
]
}
],
"source": [
"token_list = [9, \"*\", \"(\", 11, \"+\", 5, \"*\", 7, \"end\"]\n",
"tree = get_sum(token_list)\n",
"postorder(tree)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## exercises:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Modify inorder function so that it puts parentheses around every operator and pair of operands. Is the output correct and unambiguous? Are the parentheses always necessary?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Write a function that takes an expression string and returns a token list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Find other places in the expression tree functions where errors can occur and add appropriate raise statements. Test your code with improperly formed expressions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}