{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "from pycryptics.grammar.clue_parse import generate_clues\n", "from pycryptics.solve_clue import Constraints\n", "from pycryptics.grammar.clue_tree import ClueUnsolvableError" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Loading synonyms from file...\n", "...done." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Loading ngrams from file...\n", "...done." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Loading indicators from file..." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "...done.\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solving a single clue\n", "Let's demonstrate what we can do with a simple cryptic clue. Here's our clue, broken up into a list of words:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "phrases = 'spin broken shingle'.split()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also have constraints on the final answer, namely that it will have 7 letters and the answer won't be any of the words which directly appear in the clue. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "con = Constraints(phrases=phrases,\n", " lengths=(7,),\n", " pattern='',\n", " known_answer=None)\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can generate all possible clue interpretations of those phrases:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "clues = generate_clues(con)\n", "clues.sort() # sort the clues list so that we can reliably find \n", "# specific clues for this demo. You won't need to do this." ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, here's one interpretation:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (d spin) (clue_arg (first broken)) (clue_arg (first shingle)))\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And another:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[10]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (d spin) (clue_arg (ana (ana_arg (lit broken)) (ana_ shingle))))\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are all the possible ways 'spin broken shingle' could be interpreted:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for c in clues:\n", " print c" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (d spin) (clue_arg (first broken)) (clue_arg (first shingle)))\n", "(top (d spin) (clue_arg (first broken)) (clue_arg (lit shingle)))\n", "(top (d spin) (clue_arg (first broken)) (clue_arg (syn shingle)))\n", "(top (d spin) (clue_arg (lit broken)) (clue_arg (first shingle)))\n", "(top (d spin) (clue_arg (lit broken)) (clue_arg (lit shingle)))\n", "(top (d spin) (clue_arg (lit broken)) (clue_arg (syn shingle)))\n", "(top (d spin) (clue_arg (syn broken)) (clue_arg (first shingle)))\n", "(top (d spin) (clue_arg (syn broken)) (clue_arg (lit shingle)))\n", "(top (d spin) (clue_arg (syn broken)) (clue_arg (syn shingle)))\n", "(top (d spin) (clue_arg (ana (ana_ broken) (ana_arg (lit shingle)))))\n", "(top (d spin) (clue_arg (ana (ana_arg (lit broken)) (ana_ shingle))))\n", "(top (clue_arg (sub (sub_ spin) (sub_arg (lit broken)))) (d shingle))\n", "(top (clue_arg (sub (sub_ spin) (sub_arg (syn broken)))) (d shingle))\n", "(top (clue_arg (first spin)) (clue_arg (first broken)) (d shingle))\n", "(top (clue_arg (first spin)) (clue_arg (lit broken)) (d shingle))\n", "(top (clue_arg (first spin)) (clue_arg (syn broken)) (d shingle))\n", "(top (clue_arg (ana (ana_ spin) (ana_arg (lit broken)))) (d shingle))\n", "(top (clue_arg (ana (ana_arg (lit spin)) (ana_ broken))) (d shingle))\n", "(top (clue_arg (lit spin)) (clue_arg (first broken)) (d shingle))\n", "(top (clue_arg (lit spin)) (clue_arg (lit broken)) (d shingle))\n", "(top (clue_arg (lit spin)) (clue_arg (syn broken)) (d shingle))\n", "(top (clue_arg (syn spin)) (clue_arg (first broken)) (d shingle))\n", "(top (clue_arg (syn spin)) (clue_arg (lit broken)) (d shingle))\n", "(top (clue_arg (syn spin)) (clue_arg (syn broken)) (d shingle))\n", "(top (clue_arg (sub_init (sub_init_ spin) (sub_arg (lit broken)))) (d shingle))\n", "(top (clue_arg (sub_init (sub_init_ spin) (sub_arg (syn broken)))) (d shingle))\n", "(top (clue_arg (sub_final (sub_final_ spin) (sub_arg (lit broken)))) (d shingle))\n", "(top (clue_arg (sub_final (sub_final_ spin) (sub_arg (syn broken)))) (d shingle))\n", "(top (clue_arg (rev (rev_ spin) (rev_arg (lit broken)))) (d shingle))\n", "(top (clue_arg (rev (rev_ spin) (rev_arg (syn broken)))) (d shingle))\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try solving one of these clues:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", " print clues[0].answers\n", "except ClueUnsolvableError as e:\n", " print e" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "This clue has no solutions under the given constraints: Constraints(phrases=['spin', 'broken', 'shingle'], lengths=(7,), pattern='', known_answer=None)\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That clue had no answers which matched our constraints. This raises an exception so that when we're solving a clue, any of its subparts being unsolvable will trigger the exception and let us skip out of solving the other subparts.\n", "\n", "Let's look at the subparts of a clue. Each clue is a tree, and we can access the children of any node in the tree using the [] notation." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (clue_arg (rev (rev_ spin) (rev_arg (syn broken)))) (d shingle))\n" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1][0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(clue_arg (rev (rev_ spin) (rev_arg (syn broken))))\n" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1][0][0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(rev (rev_ spin) (rev_arg (syn broken)))\n" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1][0][0].answers" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "{'emat': ['', 'tame'], 'detsub': ['', 'busted'], 'deyortsed': ['', 'destroyed'], 'tespu': ['', 'upset'], 'degluvid': ['', 'divulged'], 'tpeknu': ['', 'unkept'], 'detcarfni': ['', 'infracted'], 'luftif': ['', 'fitful'], 'depoleved': ['', 'developed'], 'detomed': ['', 'demoted'], 'desuap': ['', 'paused'], 'degamad': ['', 'damaged'], 'nevig': ['', 'given'], 'dettod': ['', 'dotted'], 'deliaf': ['', 'failed'], 'wol': ['', 'low'], 'dehguor': ['', 'roughed'], 'deid': ['', 'died'], 'depmub': ['', 'bumped'], 'demat': ['', 'tamed'], 'detaloiv': ['', 'violated'], 'delaever': ['', 'revealed'], 'dehsad': ['', 'dashed'], 'depparwnu': ['', 'unwrapped'], 'dedneffo': ['', 'offended'], 'detageler': ['', 'relegated'], 'depmad': ['', 'damped'], 'tilps': ['', 'split'], 'deggur': ['', 'rugged'], 'desufnoc': ['', 'confused'], 'dehcaerb': ['', 'breached'], 'tcefrepmi': ['', 'imperfect'], 'dehsurc': ['', 'crushed'], 'denepmad': ['', 'dampened'], 'detarapes': ['', 'separated'], 'detrap': ['', 'parted'], 'dehsams': ['', 'smashed'], 'desolcsid': ['', 'disclosed'], 'despalloc': ['', 'collapsed'], 'enog': ['', 'gone'], 'detlah': ['', 'halted'], 'htoomsnu': ['', 'unsmooth'], 'derutcarf': ['', 'fractured'], 'deniur': ['', 'ruined'], 'denekaew': ['', 'weakened'], 'derednuof': ['', 'foundered'], 'deppots': ['', 'stopped'], 'denetfos': ['', 'softened'], 'deriapmi': ['', 'impaired'], 'delbmuh': ['', 'humbled'], 'nrow': ['', 'worn'], 'detpure': ['', 'erupted'], 'tsrub': ['', 'burst']}\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.answers` property returns the clue's possible answers or raises a `ClueUnsolvableException` if there are no answers. The first time a clue's `.answers` is requested, it automatically calls ClueTree.solve() and caches that answer so that subsequent requests for `.answers` will be instananeous.\n", "\n", "The format of a clue's answers is a dictionary in which each key is a possible answer and the corresponding value is the answers to the children of that clue which yielded the given answer. For example, the first answer given above is 'emat', with corresponding child answers of '' and 'tame'. The ordering of these sub-answers corresponds to the ordering in the `(rev (rev_ spin) (rev_arg (syn broken)))`, so the `rev_` reversal marker produced an empty string ('') and the `syn` synonym operator acting on 'broken' produced 'tame'. Reversing 'tame' gave the answer 'emit'. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also ask any clue how it got a particular answer:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1][0][0].derivation('emat')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(rev (rev_ \"spin\") (syn \"broken\" -> TAME) -> EMAT)\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If it's not clear what this means, we can also ask for a human-readible derivation:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1][0][0].long_derivation('emat')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Take a synonym of 'broken' to get TAME.\n", "'spin' means to reverse 'tame' to get EMAT.\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can ask for the derivation or long derivation of any of this sub-clue's answers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[-1][0][0].long_derivation('tsrub')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Take a synonym of 'broken' to get BURST.\n", "'spin' means to reverse 'burst' to get TSRUB.\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's find a clue that actually produces a final answer. How about this one:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[9]\n", "print \"\\nAnswers:\", clues[9].answers\n", "print clues[9].long_derivation('english')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (d spin) (clue_arg (ana (ana_ broken) (ana_arg (lit shingle)))))\n", "\n", "Answers: {'english': ['', 'english']}\n", "\n", "'spin' is the definition.\n", "'broken' means to anagram 'shingle' to get ENGLISH.\n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, it would seem like generating all of these different clue interpretations and then solving them separately would be terribly inefficient. However, the way our parser is constructed ensures that identical sub-clues actually refer to the same python object, so solving one sub-clue solves all of its instances contained in other clues. We can check this easily:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (d spin) (clue_arg (first broken)) (clue_arg (first shingle)))\n" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(top (d spin) (clue_arg (first broken)) (clue_arg (lit shingle)))\n" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[0][1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(clue_arg (first broken))\n" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "print clues[1][1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(clue_arg (first broken))\n" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "clues[0][1] is clues[1][1] # 'is' tests whether two variables refer to the same python object" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "True" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scoring Answers\n", "Now let's try actually searching for the *best* answer. For this, we'll need the full `CrypticClueSolver`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from pycryptics.solve_clue import CrypticClueSolver" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "solver = CrypticClueSolver()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "annotated_answers = solver.solve_constraints(con)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "for a in annotated_answers:\n", " print a" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['english', 1, '(top (d \"spin\") (ana (ana_ \"broken\") (lit \"shingle\") -> ENGLISH))']\n", "['violate', 0, '(top (sub_init (sub_init_ \"spin\") (syn \"broken\" -> VIOLATED) -> VIOLATE) (d \"shingle\"))']\n", "['reached', 0, '(top (sub_final (sub_final_ \"spin\") (syn \"broken\" -> BREACHED) -> REACHED) (d \"shingle\"))']\n", "['divulge', 0, '(top (sub_init (sub_init_ \"spin\") (syn \"broken\" -> DIVULGED) -> DIVULGE) (d \"shingle\"))']\n", "['confuse', 0, '(top (sub_init (sub_init_ \"spin\") (syn \"broken\" -> CONFUSED) -> CONFUSE) (d \"shingle\"))']\n" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `annotated_answers` contains a list of possible solutions, including the actual answer, its confidence score (between 0 and 1), and its short derivation, and it's automatically sorted to put the best answers first. We can also check the long derivation of any answer:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print annotated_answers[0].long_derivation()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "'spin' is the definition.\n", "'broken' means to anagram 'shingle' to get ENGLISH.\n", "ENGLISH matches 'spin' with confidence score 100%.\n" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining Phrases\n", "Now let's try a slightly harder clue:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "phrases = 'couch unfinished until now'.split()\n", "con = Constraints(phrases=phrases,\n", " lengths=(4,),\n", " pattern='',\n", " known_answer=None)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's just try solving this the same way as before:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ann = solver.solve_constraints(con)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "print ann[0].long_derivation()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "'couch' is the definition.\n", "Take a synonym of 'until' to get BEFORE.\n", "'unfinished' means to take an initial substring of 'before' to get BE.\n", "Take a synonym of 'now' to get AD.\n", "Combine 'BE' and 'AD' to get BEAD.\n", "BEAD matches 'couch' with confidence score 56%.\n" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is not a very good answer, since it only matches with 56% confidence. In fact, the correct answer is SOFA, which we get by taking a synonym of \"until now\" to get \"SO FAR\" and then removing the last letter to get SOFA. So what went wrong? Well, the solver treated \"until\" and \"now\" as separate phrases, so it looked for synonyms of both of them, but not for the phrase \"until now\". To make sure we get the right combinations of phrases, the solver can actually search over all possible combinations of phrases to find the best answer:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ann = solver.solve_all_phrasings(con)\n", "print ann[0].long_derivation()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['couch', 'unfinished', 'until', 'now']\n", "['couch', 'unfinished', 'until_now']\n", "['couch', 'unfinished_until', 'now']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "['couch', 'unfinished_until_now']\n", "['couch_unfinished', 'until', 'now']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "['couch_unfinished', 'until_now']\n", "['couch_unfinished_until', 'now']\n", "\n", "'couch' is the definition.\n", "Take a synonym of 'until_now' to get SO_FAR.\n", "'unfinished' means to take an initial substring of 'so_far' to get SOFA.\n", "SOFA matches 'couch' with confidence score 100%.\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The lists which are printed above show all the combinations of phrases that the solver tried. We can see that it got the correct answer by combining \"until now\" into a single phrase, which we've shown by joining those words with an underscore." ] } ], "metadata": {} } ] }