{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Examples of Radix sort\n", "\n", "Both least-significant-digit and most-significant-digit versions. Examples use DNA alphabet." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Least significant digit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start with a function that conducts a single pass." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "def radix_pass(strs, ordr, depth):\n", " \"\"\" Given a collection of same-length strings and depth, return a\n", " permutation that stably sorts the strings according to character\n", " at that depth \"\"\"\n", " buckets = defaultdict(list)\n", " for i in ordr:\n", " buckets[strs[i][depth]].append(i)\n", " return [x for sublist in [buckets[c] for c in '$ACGTn'] for x in sublist]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "strs = ['A', 'A', 'C', 'G', 'G', 'G', 'n']\n", "radix_pass(strs, range(len(strs)), 0)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 2, 4, 1, 3, 5, 6]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "strs = ['A', 'G', 'A', 'G', 'C', 'G', 'n']\n", "radix_pass(strs, range(len(strs)), 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Chain two `radix_pass` calls together to get overall sorted order." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 3, 4, 8, 0, 1, 5, 6, 7]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# First call\n", "strs = ['AG', 'CG', 'AA', 'GA', 'TC', 'GT', 'Tn', 'nn', 'nC']\n", "lsd1 = radix_pass(strs, range(len(strs)), 1)\n", "lsd1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 0, 1, 3, 5, 4, 6, 8, 7]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Second call, using result from first\n", "radix_pass(strs, lsd1, 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To completely sort the strings, `radix_lsd` does all the passes." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def radix_lsd(strs):\n", " \"\"\" Least-significant-digit (LSD) radix sort on collection of\n", " same-length strings. Returns a permutation that puts the list\n", " in stable-sorted order. \"\"\"\n", " assert len(strs) > 0\n", " ordr = range(len(strs))\n", " for i in range(len(strs[0])-1, -1, -1):\n", " ordr = radix_pass(strs, ordr, i)\n", " return ordr" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 0, 1, 3, 5, 4, 6, 8, 7]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "strs = ['AG', 'CG', 'AA', 'GA', 'TC', 'GT', 'Tn', 'nn', 'nC']\n", "radix_lsd(strs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Most significant digit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MSD radix sort with a single recursive function." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def radix_msd_helper(strs, ordr, depth):\n", " \"\"\" Most-significant-digit (MSD) radix sort on collection of\n", " same-length strings. Returns a permutation that puts the list\n", " in stable-sorted order. \"\"\"\n", " if len(ordr) <= 1 or depth >= len(strs[0]):\n", " return ordr # bases cases: 1 elt in list, or we've exhausted characters\n", " buckets = defaultdict(list)\n", " for i in ordr:\n", " buckets[strs[i][depth]].append(i)\n", " return [x for sublist in [radix_msd_helper(strs, buckets[c], depth+1) for c in '$ACGTn'] for x in sublist]\n", "\n", "def radix_msd(strs):\n", " return radix_msd_helper(strs, range(len(strs)), 0)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 0, 1, 3, 5, 4, 6, 8, 7]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "strs = ['AG', 'CG', 'AA', 'GA', 'TC', 'GT', 'Tn', 'nn', 'nC']\n", "radix_msd(strs)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[4, 2, 3, 1, 0]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "strs = ['GATTACA', 'GATTAAA', 'GAATACA', 'GATAACA', 'AATTAAA']\n", "assert radix_msd(strs) == radix_lsd(strs)\n", "radix_msd(strs)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }