{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# From last time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Picking up from part 1, we have the following two examples:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "A = 'ABCBDAB'\n", "B = 'BDCABA'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two longest common subsequences:\n", "\n", "1. B, C, B, A\n", "\n", "2. B, D, A, B" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "occidental = \"occidental\"\n", "superdelicately = \"superdelicately\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One longest common subsequence (LCS) is: \"deal\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some code from part 1:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from functools import lru_cache" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def pretty_print(arr):\n", " '''A hacky function that makes every string in an array equal length, then prints\n", " it row by row.'''\n", " longest_length = len(arr[:-1][:-1])\n", " nicer = []\n", " for i, row in enumerate(arr):\n", " new_row = []\n", " for j, cell in enumerate(row):\n", " cell_length = len(cell)\n", " padding_needed = longest_length - cell_length\n", " new_cell = cell + (' ' * padding_needed)\n", " new_row.append(new_cell)\n", " nicer.append(new_row)\n", " \n", " [print(r) for r in nicer]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def lcs4(A, B, print_table=False):\n", " # Create a lookup table. This is a 2D array with the letters of A as columns\n", " # and the letters of B as rows. It includes a buffer row and column filled with empty strings.\n", " columns = len(A) + 1\n", " rows = len(B) + 1\n", " lookup = [['' for n in range(columns)] for n in range(rows)]\n", " \n", " # Since the lookup table is initialized with '' \n", " # the base case (len(A) == 0 or len(B) == 0) is \"baked-in\".\n", " \n", " # What's left is to go row by row, column by column and \n", " # apply the iterative version of the recursive cases in lcs().\n", " for row_id in range(1, rows):\n", " for column_id in range(1, columns):\n", " letter_in_A = A[column_id-1]\n", " letter_in_B = B[row_id-1]\n", " if letter_in_A == letter_in_B:\n", " # \"Recursive case 1\"\n", " lookup[row_id][column_id] = lookup[row_id - 1][column_id - 1] + letter_in_A\n", " else:\n", " # \"Recursive case 2\"\n", " option1 = lookup[row_id][column_id-1]\n", " option2 = lookup[row_id-1][column_id]\n", " if len(option1) > len(option2):\n", " lookup[row_id][column_id] = option1\n", " else:\n", " lookup[row_id][column_id] = option2\n", " \n", " \n", " if print_table:\n", " pretty_print(lookup)\n", " \n", " return lookup[rows-1][columns-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A connection between LCS and LIS?\n", "\n", "There is an interesting connection between longest common subsequence (LCS) and longest increasing subsequence (LIS). \n", "\n", ">The longest increasing subsequence of a list of numbers is equal to the longest common subsequence of that list of numbers and its sorted counterpart. \n", "\n", "Source: \"Course15.pdf\" by [Jean-Lou De Carufel](http://cglab.ca/~jdecaruf/CSI3105.html) (probably taken from Dasgupta et. al.)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 2, 1, 4]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nums = [3, 2, 1, 4]\n", "nums" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Copy and sort nums\n", "sorted_nums = nums[:]\n", "sorted_nums.sort()\n", "sorted_nums" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Convert both to strings\n", "nums_str = [str(n) for n in nums]\n", "sorted_nums_str = [str(n) for n in sorted_nums]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'14'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the longest common subsequence of S and S_sorted.\n", "lcs4(nums_str, sorted_nums_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[1, 4] is a longest **increasing** subsequence of `nums`.\n", "\n", "That said, this doesn't seem to work for lists with duplicate elements." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "dupe_nums = [2, 2]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'22'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dupe_nums_str = ''.join([str(n) for n in dupe_nums])\n", "dupe_nums_str" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'22'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lcs4(dupe_nums_str, dupe_nums_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whether $2 \\rightarrow 2$ is \"increasing\" depends on how you define it. It is not strictly increasing and will fail to pass certain tests in [this leetcode question](https://leetcode.com/problems/longest-increasing-subsequence/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extending LCS to numbers\n", "\n", "Thinking about it, my current implementation of longest common subsequence only works with lists of numbers where no number is greater than 9 (only single digits). With stringified number lists, there is no way to tell \"10\" from \"1\", \"0\".\n", "\n", "The solution is simple: use lists instead of strings." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def lcs5(A, B):\n", " # Create a lookup table. This is a 2D array with the letters of A as columns\n", " # and the letters of B as rows. It includes a buffer row and column filled with empty strings.\n", " columns = len(A) + 1\n", " rows = len(B) + 1\n", " lookup = [[[] for n in range(columns)] for n in range(rows)]\n", " \n", " # Since the lookup table is initialized with '' \n", " # the base case (len(A) == 0 or len(B) == 0) is \"baked-in\".\n", " \n", " # What's left is to go row by row, column by column and \n", " # apply the iterative version of the recursive cases in lcs().\n", " for row_id in range(1, rows):\n", " for column_id in range(1, columns):\n", " letter_in_A = A[column_id-1]\n", " letter_in_B = B[row_id-1]\n", " if letter_in_A == letter_in_B:\n", " # \"Recursive case 1\"\n", " lookup[row_id][column_id] = lookup[row_id - 1][column_id - 1] + [letter_in_A]\n", " else:\n", " # \"Recursive case 2\"\n", " option1 = lookup[row_id][column_id-1]\n", " option2 = lookup[row_id-1][column_id]\n", " if len(option1) > len(option2):\n", " lookup[row_id][column_id] = option1\n", " else:\n", " lookup[row_id][column_id] = option2\n", " \n", " \n", " return lookup[rows-1][columns-1]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "numbers_a = ['10', '22', '5', '11', '10']\n", "numbers_b = ['100', '22', '3', '11', '14']" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['22', '11']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lcs5(numbers_a, numbers_b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, letters still work:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['o', 'c', 'c', 'i', 'd', 'e', 'n', 't', 'a', 'l']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "occidental_letters = list(occidental)\n", "occidental_letters" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['s', 'u', 'p', 'e', 'r', 'd', 'e', 'l', 'i', 'c', 'a', 't', 'e', 'l', 'y']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "superdelicate_letters = list(superdelicately)\n", "superdelicate_letters" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['d', 'e', 'a', 'l']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lcs5(occidental_letters, superdelicate_letters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This combined with some input processing was enough to solve [this HackerRank question](https://www.hackerrank.com/challenges/dynamic-programming-classics-the-longest-common-subsequence/problem)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }