{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Nonbinary Trees: Classification of Chemical Isomers" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This example lets you visualize the hierarchy of nonbinary trees. An example is the classification of chemical isomers, which are compounds that have the same chemical formula, but different arrangements of atoms in space." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "*[Download this notebook from GitHub by right-clicking and choosing Save Link As...](https://raw.githubusercontent.com/bertiewooster/bertiewooster.github.io/main/_notebooks/2022-12-18-Visualizing-Nonbinary-Trees-Classification-of-Chemical-Isomers.ipynb)*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Using this code will produce an ASCII art tree, where\n", "- each parent item is on the level above its children\n", "- a vertical line connects a parent to each of its children\n", "- a horizontal line spans the width from the first to the last child for each parent\n", "\n", "All you have to provide is the hierarchy, that is the name of each item and which item is its parent.\n", "\n", "Here's the tree for classification of chemical isomers:\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\"isomerism" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Here's the [isomerism diagram from Wikipedia](https://commons.wikimedia.org/wiki/File:Isomerism.svg) that it reproduces:" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ " \"isomerism" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "*Attribution: Vladsinger, CC BY-SA 3.0 , via Wikimedia Commons*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is an extension of a previous blog post I wrote about [retrosynthetic decomposition in RDKit]({% post_url 2022-10-09-RDKit-find-and-highlight-the-maximum-common-substructure-between-molecules %}). This code extends the `NonBinTree` class with the method to allow " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Code to generate trees" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Utility functions used by NonBinTree class\n", "\n", "def n_by_1_empty_list(n: int) -> list[list]:\n", " \"\"\"\n", " Create an n-by-1 empty list\n", "\n", " :param n: The number of empty (sub-)lists to include in the parent list\n", " :returns: The n-by-1 empty list, for example n=3 produces [[], [], []]\n", " \"\"\"\n", " n_by_1_list = []\n", " for i in range(n):\n", " n_by_1_list += [[]]\n", "\n", " return n_by_1_list\n", "\n", "def r_by_c_empty_string_list(r, c) -> list[list[str]]:\n", " \"\"\"\n", " Create an r-by-c matrix (nested list where the length of each top-level list is the same) of empty strings\n", "\n", " :param r: The number of rows (top-level list)\n", " :param c: The number of columns (nested list)\n", " :returns: The r-by-c rectangular list of empty strings, for example r=2, c=3 produces [['', '', ''], ['', '', '']]\n", " \"\"\"\n", " matrix = []\n", " for row_index in range(r):\n", " one_row = [\"\"] * c\n", " matrix += [one_row]\n", " \n", " return matrix\n", "\n", "def interleave(a: list[list], b: list[list]) -> list[list]:\n", " \"\"\"\n", " Interleave (alternate the rows of) two matrices \n", " (nested lists where the length of each top-level list is the same), for example\n", " a[0]\n", " b[0]\n", " a[1]\n", " b[1]\n", " a[2]\n", " where all the rows of a are shown, and the rows of b until either a or b runs out of rows.\n", " A row is a top-level list element; a column is the collection of the nth items in each row.\n", "\n", " :param a: The controlling nested list\n", " :param b: The following nested list\n", " :returns: The interleaved nested list\n", " \"\"\"\n", " len_b = len(b)\n", " interleaved = []\n", " for row_index, row in enumerate(a):\n", " interleaved += [row]\n", " # If b hasn't run out of rows, insert corresponding row of b\n", " if row_index < len_b:\n", " interleaved += [b[row_index]]\n", " return interleaved\n", "\n", "def longest_in_col(matrix: list[list]) -> list[int]:\n", " \"\"\"\n", " Determine the length of the longest (for example, greatest number of characters)\n", " entry in each column of a matrix (nested list where the length of each top-level list is the same).\n", " A row is a top-level list element; a column is the collection of the nth items in each row.\n", "\n", " :param matrix: The matrix of items\n", " :returns: A list where each entry is the length of the longest item in that column.\n", " \"\"\"\n", " ncols = len(matrix[0])\n", " longest = [0] * ncols\n", " \n", " for col_index in range(ncols):\n", " max = 0\n", " for row in matrix:\n", " len_item = len(row[col_index])\n", " if len_item > max:\n", " max = len_item\n", " longest[col_index] = max\n", " return longest\n", "\n", "def concat_grids_horizontally(grid1:list[list[str]], grid2:list[list[str]]) -> list[list[str]]:\n", " \"\"\"Concatenate two nested lists horizontally, for example\n", " inputs [['a'],['b'],['c']] and [['d'], ['e'], ['f']] \n", " produce [['a', 'd'], ['b', 'e'], ['c', 'f']]\n", "\n", " :param grid1: The first grid\n", " :param grid2: The second grid\n", " :returns: The combined grid\n", " \"\"\"\n", " if grid1 == [[]]:\n", " combined = grid2\n", " elif grid2 == [[]]:\n", " combined = grid1\n", " else:\n", " combined = []\n", " for row_counter in range(len(grid1)):\n", " combined += [grid1[row_counter] + grid2[row_counter]]\n", " return combined\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class NonBinTree:\n", " \"\"\"\n", " Nonbinary tree class\n", " Note that this class is not designed to sort nodes as they are added to the tree;\n", " the assumption is that they should be ordered in the order added\n", " Adapted from https://stackoverflow.com/questions/60579330/non-binary-tree-data-structure-in-python#60579464\n", " \"\"\"\n", "\n", " def __init__(self, val:str):\n", " \"\"\"Create a NonBinTree instance\"\"\"\n", " self.val = val\n", " self.nodes = []\n", "\n", " def add_node(self, val:str):\n", " \"\"\"Add a node to the tree and return the new node\"\"\"\n", " self.nodes.append(NonBinTree(val))\n", " return self.nodes[-1]\n", "\n", " def __repr__(self) -> str:\n", " \"\"\"Print out the tree as a nested list\"\"\"\n", " return f\"NonBinTree({self.val}): {self.nodes}\"\n", "\n", " def get_ncols(self) -> int:\n", " \"\"\"Get the number of columns in the tree\"\"\"\n", " self.ncols = 0\n", " if len(self.nodes) > 0:\n", " # If there are nodes under this one, call get_ncols on them recursively\n", " for node in self.nodes:\n", " self.ncols += node.get_ncols()\n", " else:\n", " # If there are no nodes under this one, add 1 for this node\n", " self.ncols += 1\n", " return self.ncols\n", "\n", " def get_max_depth(self) -> int:\n", " \"\"\"Get the maximum depth of the tree\"\"\"\n", " max_depth = 0\n", " if len(self.nodes) > 0:\n", " for node in self.nodes:\n", " this_depth = node.get_max_depth()\n", " max_depth = max(this_depth + 1, max_depth)\n", " else:\n", " max_depth = max(1, max_depth)\n", " self.max_depth = max_depth\n", " return self.max_depth\n", "\n", " def get_grid(self) -> list[list[str]]:\n", " \"\"\"\n", " Get a two-dimensional grid where\n", " each row is a level in the fragment hierarchy, and\n", " the columns serve to arrange the fragments horizontally\n", " \"\"\"\n", " # Call methods to calculate self.ncols and self.max_depth\n", " self.get_ncols()\n", " self.get_max_depth()\n", "\n", " # Create top row: Node value, then the rest of columns are blank (empty strings)\n", " self.grid = [[self.val] + [\"\"] * (self.ncols - 1)]\n", "\n", " n_nodes = len(self.nodes)\n", "\n", " if n_nodes > 0:\n", " nodes_grid = [[]]\n", "\n", " # Iterate through the child nodes\n", " for node_counter, node in enumerate(self.nodes):\n", "\n", " # Recursively call this function to get the grid for children\n", " node_grid = node.get_grid()\n", "\n", " # Add spacer rows if needed\n", " node_grid_rows = len(node_grid)\n", " rows_padding = self.max_depth - node_grid_rows - 1\n", " for padding in range(rows_padding):\n", " node_grid += [[\"\"] * len(node_grid[0])]\n", "\n", " nodes_grid = concat_grids_horizontally(nodes_grid, node_grid)\n", "\n", " self.grid += nodes_grid\n", "\n", " return self.grid\n", "\n", " def get_tree(self) -> list[str]:\n", " \"\"\"\n", " Get a visual hierarchy tree where\n", " each odd-numbered row (counting with one-based indexing) shows items, and\n", " each even-numbered row shows the relationships between parent and child\n", " \"\"\"\n", " # Call method to create the grid\n", " self.get_grid()\n", "\n", " # Gather properties of the grid (object hierarchy tree diagram)\n", " n_rows = len(self.grid)\n", " n_cols = len(self.grid[0])\n", " max_col_index = n_cols - 1\n", " longest_for_col = longest_in_col(self.grid)\n", "\n", " # Aesthetic parameter: Gutter width between columns, so columns don't visually run together\n", " gutter = 3\n", "\n", " # Initialize edges, the lines connecting \n", " markers = r_by_c_empty_string_list(n_rows - 1, n_cols)\n", " \n", " # Iterate from rows to from #2 (index = 1) to last, \n", " # because there is nothing above the top row\n", " for row_index, row in enumerate(self.grid):\n", " if row_index > 0:\n", " for col_index, col in enumerate(row):\n", " if len(col) > 0:\n", " markers[row_index - 1][col_index] = \"|\"\n", " # For all but rightmost column, add connecting horizontal line to the right if needed\n", " if col_index < max_col_index:\n", " # Set up for while loop to determine which (if any) is the next item to the right\n", " right = \"\"\n", " col_to_check = col_index\n", "\n", " # Iterate to the right until find an item, or run out of columns\n", " while (len(right) == 0) and (col_to_check < max_col_index):\n", " right = self.grid[row_index][col_to_check + 1]\n", " upper_right = self.grid[row_index - 1][col_to_check + 1]\n", " \n", " # Increment col_to_check in preparation for next iteration of while loop\n", " col_to_check += 1\n", " # If there is an item to the right, and none above that, draw a horizontal line using hyphens\n", " if (len(right) > 0) and (len(upper_right) == 0):\n", " n_hyphens = longest_for_col[col_index] + gutter - len(markers[row_index - 1][col_index])\n", " markers[row_index - 1][col_index] += \"-\" * n_hyphens\n", "\n", " # Combine the grid (objects) and markers (horizontal and vertical lines) \n", " # by interleaving them (alternating one row of each)\n", " interleaved = interleave(self.grid, markers)\n", "\n", " # Format the tree for printing by setting the column widths\n", " f_string = ''\n", " for col_index, col in enumerate(interleaved[0]):\n", " f_string += \"{row[\" + str(col_index) + \"]:<\" + str(longest_for_col[col_index] + gutter) + \"}\"\n", "\n", " # Create the return value as a list of formatted strings, one formatted string per row\n", " tree_plot_list = []\n", " for row in interleaved:\n", " this_row = f_string.format(row=row)\n", " tree_plot_list += [this_row]\n", "\n", " return tree_plot_list\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Set up the hierarchy and generate the tree diagram" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Chemical isomer classification example" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Here's how you set up the hierarchy. It's just one line of code for each item, specifying its name and parent:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Create chemical isomers nonbinary tree hierarchy\n", "isomers = NonBinTree(\"isomers\")\n", "constitutional = isomers.add_node(\"constitutional\")\n", "stereoisomers = isomers.add_node(\"stereoisomers\")\n", "diastereomers = stereoisomers.add_node(\"diastereomers\")\n", "cis_trans = diastereomers.add_node(\"cis/trans\")\n", "conformers = diastereomers.add_node(\"conformers\")\n", "rotamers = conformers.add_node(\"rotamers\")\n", "enantiomers = stereoisomers.add_node(\"enantiomers\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Here's how you generate and display the formatted tree diagram:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "isomers \n", "|----------------| \n", "constitutional stereoisomers \n", " |----------------------------| \n", " diastereomers enantiomers \n", " |---------------| \n", " cis/trans conformers \n", " | \n", " rotamers \n" ] } ], "source": [ "# Generate and display the formatted tree diagram\n", "tree = isomers.get_tree()\n", "for row in tree:\n", " print(row)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You can also output the unformatted hierarchy grid, which you could use to manipulate the tree in other ways:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['isomers', '', '', ''],\n", " ['constitutional', 'stereoisomers', '', ''],\n", " ['', 'diastereomers', '', 'enantiomers'],\n", " ['', 'cis/trans', 'conformers', ''],\n", " ['', '', 'rotamers', '']]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate unformatted hierarchy grid\n", "isomers.get_grid()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Geometry example: Quadrilateral family tree" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Here's another example, from the field of geometry. The following diagram reproduces the [quadrilateral family tree](https://www.slideserve.com/margaux/quadrilateral) from this link." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "quadrilateral \n", "|-------------------------------------------------| \n", "trapezoid kite \n", "|---------------------| \n", "isosceles trapezoid parallelogram \n", " |---------------| \n", " rhombus rectangle \n", " | | \n", " square square \n" ] } ], "source": [ "# Create geometry quadrilateral example nonbinary tree\n", "quadrilateral = NonBinTree(\"quadrilateral\")\n", "trapezoid = quadrilateral.add_node(\"trapezoid\")\n", "isosceles_trapezoid = trapezoid.add_node(\"isosceles trapezoid\")\n", "parallelogram = trapezoid.add_node(\"parallelogram\")\n", "rhombus = parallelogram.add_node(\"rhombus\")\n", "rhombus.add_node(\"square\")\n", "rectangle = parallelogram.add_node(\"rectangle\")\n", "rectangle.add_node(\"square\")\n", "kite = quadrilateral.add_node(\"kite\")\n", "\n", "# Generate and display formatted tree diagram\n", "tree_quadrilateral = quadrilateral.get_tree()\n", "for row in tree_quadrilateral:\n", " print(row)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.4", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:45:10) [Clang 12.0.1 ]" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "f8b9e48fa26c0cee807577a4309d1f208b4f08c0094fff1e9a87a3043a77ce60" } } }, "nbformat": 4, "nbformat_minor": 2 }