{
 "metadata": {
  "name": "",
  "signature": "sha256:013d27f914b48119267bd7bc6674346e01673f59bbf4884e6ee057f2d75ed8b3"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Yes, it's really named after Monty Python"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython import display"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#load lessons learned from a life wasted"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "display.YouTubeVideo('csyL9EC0S0c', start=24*60 + 47)  # start playback at 24m47s"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%load python_tour.md"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Turtle Graphics"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "I think substack's turtle graphics was JavaScript.\n",
      "\n",
      "Mine was Python. Something about its elegance appeals to the mathematician inside me; it's much less about engineering.\n",
      "\n",
      "As with marijuana laws, the Dutch are leading the way in readable programming."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Philosophy"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import this"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So you might want to use Python if you're trying to do something that is mostly about munging text, or most things that aren't about making event-driven websites."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#So let's talk about whitespace.\n",
      "Python follows the __off-side rule__: indentation, not braces, marks where a block starts and ends.\n",
      "\n",
      "Who can explain what that is in football?\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def is_even(a):\n",
      "    if a % 2 == 0:\n",
      "        print('Even!')\n",
      "        return True\n",
      "    print('Odd!')\n",
      "    return False"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#What is Python good at?\n",
      "\n",
      "+ multiple assignment\n",
      "+ list comprehension\n",
      "+ iterating over the dictionary\n",
      "+ dictionary comprehension\n",
      "\n",
      "\n",
      "##one way to do it!\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#multiple assignment\n",
      "\n",
      "a, b, c = 'spam', 'eggs', 'parrot'\n",
      "print(a, b, c)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Iteration Idiom"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# In a C-like language you would loop over a list by index:\n",
      "#\n",
      "#     for (i=0; i < mylist_length; i++) {\n",
      "#        do_something(mylist[i]);\n",
      "#     }\n",
      "\n",
      "# Stand-ins so the loops below can run; any list and function would do.\n",
      "mylist = ['spam', 'eggs', 'parrot']\n",
      "mylist_length = len(mylist)\n",
      "\n",
      "def do_something(element):\n",
      "    print(element)\n",
      "\n",
      "# The direct equivalent in Python would be this:\n",
      "i = 0\n",
      "while i < mylist_length:\n",
      "    do_something(mylist[i])\n",
      "    i += 1\n",
      "\n",
      "# That works, but it is not considered Pythonic: it's not an idiom the language encourages.\n",
      "# We could improve it. A typical way to generate every index of a list is the built-in range() function:\n",
      "for i in range(mylist_length):\n",
      "    do_something(mylist[i])\n",
      "\n",
      "# This is not Pythonic either. Here is the Pythonic way, encouraged by the language itself:\n",
      "for element in mylist:\n",
      "    do_something(element)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "numbers = [1,2,3,4]\n",
      "for number in numbers:\n",
      "    print(number)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#Hacker School"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "rice_crispies = {3:'Crackle', 5:'Pop'}\n",
      "rice_crispies.items()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "rice_crispies = {3:'Crackle', 5:'Pop'}\n",
      "for i in range(101):\n",
      "    print(i)\n",
      "    # items() yields each (key, value) pair, so no second lookup is needed\n",
      "    for flake, sound in rice_crispies.items():\n",
      "        if i % flake == 0:\n",
      "            print(sound)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ahundred = range(101)\n",
      "new_list = [i*2 for i in ahundred if i % 2 == 0]\n",
      "print(new_list)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#slicing"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "new_list[0:10]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "There's an old programming proverb which goes something like this:\n",
      "\n",
      "    Show me your algorithm,\n",
      "    and I will remain puzzled,\n",
      "    but show me your data structure,\n",
      "    and I will be enlightened.\n",
      "\n",
      "This is a statement about software and coding, but first and foremost it is about human cognition. The way my brain works is to first visualize the data and then imagine what the algorithm does to it."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "tel = {'jack': 4098, 'sape': 4139}\n",
      "tel['guido']  # raises KeyError: 'guido' is not a key in tel"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "I think dictionaries are handled particularly nicely in the Python tutorial:\n",
      "https://docs.python.org/3/tutorial/datastructures.html#dictionaries"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "words = ['spam', 'spam', 'eggs', 'spam']\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from collections import defaultdict\n",
      "word_counter = defaultdict(int)\n",
      "\n",
      "words = ['spam', 'spam', 'eggs', 'spam', 'parrot']\n",
      "\n",
      "for word in words:\n",
      "    word_counter[word] += 1\n",
      "\n",
      "print(word_counter)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now I'll show you why you might write your algorithms in Python, or at least why Peter Norvig of Google does.\n",
      "\n",
      "#Spell checker"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!wget http://norvig.com/spell-correct.html\n",
      "# big.txt is the training text that the cells below read\n",
      "!wget http://norvig.com/big.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#90% of the google spelling corrector in 21 lines of Python"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n20 big.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import re, collections\n",
      "\n",
      "def words(text): \n",
      "    return re.findall('[a-z]+', text.lower()) \n",
      "\n",
      "def train(features):\n",
      "    model = collections.defaultdict(int)\n",
      "    for f in features:\n",
      "        model[f] += 1\n",
      "    return model\n",
      "\n",
      "NWORDS = train(words(file('big.txt').read()))\n",
      "\n",
      "alphabet = 'abcdefghijklmnopqrstuvwxyz'\n",
      "\n",
      "def edits(word):\n",
      "    splits   = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n",
      "    deletes    = [a + b[1:] for a, b in splits if b]\n",
      "    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]\n",
      "    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]\n",
      "    inserts    = [a + c + b     for a, b in splits for c in alphabet]\n",
      "    return set(deletes + transposes + replaces + inserts)\n",
      "\n",
      "def known_edits(word):\n",
      "    edits_of_edits = set()\n",
      "    for e1 in edits(word):\n",
      "        for e2 in edits(e1):\n",
      "            if e2 in NWORDS:\n",
      "                edits_of_edits.add(e2)\n",
      "    return edits_of_edits\n",
      "   \n",
      "    #norvigs way\n",
      "    #return set(e2 for e1 in edits(word) for e2 in edits(e1) if e2 in NWORDS)\n",
      "\n",
      "def known(words): \n",
      "    return set(w for w in words if w in NWORDS)\n",
      "\n",
      "def correct(word):\n",
      "    candidates = known([word]) or known(edits(word)) or known_edits(word) or [word]\n",
      "    return max(candidates, key=NWORDS.get)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "correct('spam')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "'spasm'"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "word = 'spam'\n",
      "splits   = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n",
      "deletes    = [a + b[1:] for a, b in splits if b]\n",
      "transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]\n",
      "replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]\n",
      "inserts    = [a + c + b     for a, b in splits for c in alphabet]\n",
      "print(splits)\n",
      "print(deletes)\n",
      "print(transposes)\n",
      "print(replaces)\n",
      "print(inserts)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Homework\n",
      "\n",
      "+ Download [this book](https://www.gutenberg.org/ebooks/468.txt.utf-8) and run the spell checker's `correct` function on every word in it, then save the result to a new file.\n",
      "\n",
      "Follow these steps:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!wget https://www.gutenberg.org/cache/epub/468/pg468.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "manon_string = open('pg468.txt','r').read()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "manon_words = manon_string.split()[1000:2000]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# a list comprehension applies `correct` to every word in the list manon_words\n",
      "transform_words = [correct(word) for word in manon_words]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "transform_string = ' '.join(transform_words)\n",
      "# the with-block closes the file, and so flushes the write, automatically\n",
      "with open('transformed_manon.txt', 'w') as outfile:\n",
      "    outfile.write(transform_string)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 25
    }
   ],
   "metadata": {}
  }
 ]
}