{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Trees\n", "http://openbookproject.net/thinkcs/python/english3e/trees.html\n", "- like linked lists, trees are made up of nodes\n", "\n", "## Binary Tree\n", " - is a commonly used tree in which each node contains a reference to atmost two other nodes (possibly None)\n", "- these references are referred to as the left and right subtrees\n", "- like the node of linked list, each node also contains data/cargo\n", "- like linked lists, trees are recursive data structures that are defined recursively:\n", " 1. the empty tree, represented by None, or\n", " 2. a node that contains a data and two tree references (left and right subtree)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Building trees\n", "- similar to building linked-list" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "class Tree:\n", " def __init__(self, data, left=None, right=None):\n", " self.cargo = data\n", " self.left = left\n", " self.right = right\n", " \n", " def __str__(self):\n", " return \"{}\".format(self.cargo)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### bottom-up way to build-trees\n", "- first create children and link them to the parent" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "left = Tree(2)\n", "right = Tree(3)\n", "tree = Tree(1, left, right)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "tree1 = Tree(10, Tree(20), Tree(30))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "print(tree)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "print(tree1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## traversing tree\n", "- natural way to traverse a tree is recursively!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def findSum(tree):\n", " if not tree:\n", " return 0\n", " return tree.cargo + findSum(tree.left) + findSum(tree.right)\n", " " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "findSum(tree)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "60" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "findSum(tree1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Expression trees\n", "- trees are natural way to represent the structure of an expression unambigigously.\n", "- infix expression 1 + 2 * 3 is ambigious unless we know the order of operation that * happens before +\n", "- we can use tree to represent the same expression\n", " - operands are leaf nodes\n", " - operator nodes contain references to their operands; operators are binary (two operands)\n", " \n", "\n", "\n", "- applications:\n", " - translate expressions to postfix, prefix, and infix\n", " - compilers use expression trees to parse, optimize, and translate programs\n", " \n", "- three ways to traverse trees: pre-order, in-order and post-order" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "expression = Tree('+', Tree(1), Tree('*', Tree(2), Tree(3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pre-order tree traversal\n", "- contents of the root appear before the contents of the children\n", "- recursive algorithm:\n", " - visit the node\n", " - visit left subtree\n", " - visit right subtree" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def preorder(tree):\n", " if not tree:\n", " return\n", " print(tree.cargo, end=' ')\n", " preorder(tree.left)\n", " preorder(tree.right)\n", " " ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+ 1 * 2 3 " ] } ], "source": [ "preorder(expression)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### in-order tree traversal\n", "- contents of the tree appear in order\n", "- recursive algorithm:\n", " - visit left subtree\n", " - visit node\n", " - visit right subtree" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "def inorder(tree):\n", " if not tree:\n", " return\n", " inorder(tree.left)\n", " print(tree.cargo, end=' ')\n", " inorder(tree.right)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 + 2 * 3 " ] } ], "source": [ "inorder(expression)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "def inorderIndented(tree, level=0):\n", " if not tree:\n", " return\n", " inorderIndented(tree.right, level+1)\n", " print(' '*level + str(tree.cargo))\n", " inorderIndented(tree.left, level+1)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3\n", " *\n", " 2\n", "+\n", " 1\n" ] } ], "source": [ "inorderIndented(expression)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### post-order traversal\n", "- recursive algorithm:\n", " 1. visit left subtree\n", " 2. visit right subtree\n", " 3. visit node" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "def postorder(tree):\n", " if not tree:\n", " return\n", " postorder(tree.left)\n", " postorder(tree.right)\n", " print(tree.cargo, end=' ')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 2 3 * + " ] } ], "source": [ "postorder(expression)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## building an expression tree\n", "- parse an infix expression and build the corresponding expression tree\n", "- e.g., (3 + 7) * 9 yields the following tree:\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. tokenize expression into python list? How (left as an exercise)\n", " - (3 + 7) * 9 = [\"(\", 3, \"+\", 7, \")\", \"*\", 9, \"end\"]\n", "2. \"end\" token is useful for preventing the parser from reading pas the end of the list" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "def get_token(token_list, expected):\n", " if token_list[0] == expected:\n", " del token_list[0]\n", " return True\n", " return False" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# handles operands\n", "def get_number(token_list):\n", " x = token_list[0]\n", " if not isinstance(x, int):\n", " return None\n", " del token_list[0]\n", " return Tree(x, None, None) # leaf node" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9 " ] } ], "source": [ "token_list = [9, 11, 'end']\n", "x = get_number(token_list)\n", "postorder(x)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[11, 'end']\n" ] } ], "source": [ "print(token_list)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "def get_product(token_list):\n", " a = get_number(token_list)\n", " if get_token(token_list, '*'):\n", " b = get_number(token_list)\n", " return Tree('*', a, b)\n", " return a" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "token_list = [9, '*', 11, 'end']\n", "tree = get_product(token_list)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9 11 * " ] } ], "source": [ "postorder(tree)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9 " ] } ], "source": [ "token_list = [9, '+', 11, 'end']\n", "tree = get_product(token_list)\n", "postorder(tree)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "# adapt the function for compound product such as 3 * (5 * (7 * 9))\n", "def get_product(token_list):\n", " a = get_number(token_list)\n", " if get_token(token_list, '*'):\n", " b = get_product(token_list)\n", " return Tree('*', a, b)\n", " return a" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 3 5 7 * * * " ] } ], "source": [ "token_list = [2, \"*\", 3, \"*\", 5 , \"*\", 7, \"end\"]\n", "tree = get_product(token_list)\n", "postorder(tree)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# a sum can be a tree with + at the root, a product on the left, and a sum on the right. \n", "# Or, a sum can be just a product.\n", "def get_sum(token_list):\n", " a = get_product(token_list)\n", " if get_token(token_list, \"+\"):\n", " b = get_sum(token_list)\n", " return Tree(\"+\", a, b)\n", " return a" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "token_list = [9, \"*\", 11, \"+\", 5, \"*\", 7, \"end\"]\n", "tree = get_sum(token_list)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9 11 * 5 7 * + " ] } ], "source": [ "postorder(tree)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# handle parenthesis\n", "def get_number(token_list):\n", " if get_token(token_list, \"(\"):\n", " x = get_sum(token_list) # Get the subexpression\n", " get_token(token_list, \")\") # Remove the closing parenthesis\n", " return x\n", " else:\n", " x = token_list[0]\n", " if not isinstance(x, int):\n", " return None\n", " del token_list[0]\n", " return Tree(x, None, None)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9 11 5 + 7 * * " ] } ], "source": [ "# 9 * (11 + 5) * 7\n", "token_list = [9, \"*\", \"(\", 11, \"+\", 5, \")\", \"*\", 7, \"end\"]\n", "tree = get_sum(token_list)\n", "postorder(tree)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## handling errors on malformed expressions" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "# handle parenthesis\n", "def get_number(token_list):\n", " if get_token(token_list, \"(\"):\n", " x = get_sum(token_list) # Get the subexpression\n", " if not get_token(token_list, \")\"): # Remove the closing parenthesis\n", " raise ValueError('Missing close parenthesis!')\n", " return x\n", " else:\n", " x = token_list[0]\n", " if not isinstance(x, int):\n", " return None\n", " del token_list[0]\n", " return Tree(x, None, None)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "Missing close parenthesis!", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mtoken_list\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m9\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"*\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"(\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m11\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"+\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"*\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"end\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mtree\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mpostorder\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtree\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mget_sum\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# Or, a sum can be just a product.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"+\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mget_product\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_number\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'*'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mTree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'*'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mget_product\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_number\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'*'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_product\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mTree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'*'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mget_number\u001b[0;34m(token_list)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_sum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# Get the subexpression\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mget_token\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtoken_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\")\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# Remove the closing parenthesis\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Missing close parenthesis!'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Missing close parenthesis!" ] } ], "source": [ "token_list = [9, \"*\", \"(\", 11, \"+\", 5, \"*\", 7, \"end\"]\n", "tree = get_sum(token_list)\n", "postorder(tree)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## exercises:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Modify inorder function so that it puts parentheses around every operator and pair of operands. Is the output correct and unambiguous? Are the parentheses always necessary?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Write a function that takes an expression string and returns a token list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Find other places in the expression tree functions where errors can occur and add appropriate raise statements. Test your code with improperly formed expressions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }