{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Translating English Sentences into Propositional Logic Statements\n",
    "\n",
    "In a Logic course, one exercise is to turn an English sentence like this:\n",
    "\n",
    "> *Sieglinde will survive, and either her son will gain the Ring and Wotan’s plan will be fulfilled or else Valhalla will be destroyed.*\n",
    "\n",
    "into a formal Propositional Logic statement:\n",
    "\n",
    "    P ⋀ ((Q ⋀ R) ⋁ S)\n",
    "    \n",
    "along with definitions of the propositions:\n",
    "\n",
    "    P: Sieglinde will survive\n",
    "    Q: Sieglinde’s son will gain the Ring\n",
    "    R: Wotan’s plan will be fulfilled\n",
    "    S: Valhalla will be destroyed\n",
    "\n",
    "For some sentences, it takes detailed knowledge to get a good translation. The following two sentences are ambiguous, with different preferred interpretations, and translating them correctly requires knowledge of eating habits:\n",
    "\n",
    "    I will eat salad or I will eat bread and I will eat butter.     P ⋁ (Q ⋀ R)\n",
    "    I will eat salad or I will eat soup  and I will eat ice cream. (P ⋁ Q) ⋀ R\n",
    "\n",
    "But for many sentences, the translation process is automatic, with no special knowledge required.  I will develop a program to handle these easy sentences. The program is based on a series of translation rules of the form:\n",
    "\n",
    "    Rule('{P} ⇒ {Q}', 'if {P} then {Q}', 'if {P}, {Q}')\n",
    "    \n",
    "which means that the logic translation will have the form `'P ⇒ Q'` whenever the English sentence has either the form `'if P then Q'` or `'if P, Q'`, where `P` and `Q` can match any non-empty sequence of characters. Whatever matches `P` and `Q` will be recursively processed by the rules. The rules are tried in order (top to bottom, and left to right within a rule), and the first pattern that matches in that order is accepted, no matter what, so be sure you order your rules carefully. One guideline I have adhered to is to put all the rules that start with a keyword (like `'if'` or `'neither'`) before the rules that start with a variable (like `'{P}'`); that way you avoid accidentally having a keyword swallowed up inside a `'{P}'`.\n",
    "\n",
    "Consider the example sentence `\"If loving you is wrong, I don't want to be right.\"` This should match the pattern \n",
    "`'if {P}, {Q}'` with the variable `P` equal to `\"loving you is wrong\"`. But I don't want the variable `Q` to be \n",
    "`\"I don't want to be right\"`; rather, I want the translation to be `～Q`, with `Q` equal to `\"I do want to be right\"`. So in addition to having a list of `Rule`s to handle the `'if {P}, {Q}'` patterns, I will also have a list of `negations` to handle `\"don't\"` and the like.\n",
    "\n",
    "Here is the code to process `Rule` definitions (using [regular expressions](https://docs.python.org/3.5/library/re.html), which can sometimes be confusing)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def Rule(output, *patterns):\n",
    "    \"A rule that produces `output` if the entire input matches any one of the `patterns`.\" \n",
    "    return (output, [name_group(pat) + '$' for pat in patterns])\n",
    "\n",
    "def name_group(pat):\n",
    "    \"Replace '{Q}' with '(?P<Q>.+?)', which means 'match 1 or more characters, and call it Q'\"\n",
    "    return re.sub('{(.)}', r'(?P<\\1>.+?)', pat)\n",
    "            \n",
    "def word(w):\n",
    "    \"Return a regex that matches w as a complete word (not letters inside a word).\"\n",
    "    return r'\\b' + w + r'\\b' # '\\b' matches at word boundary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what a rule looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(('{P} ⇒ {Q}',\n",
       "  ['if (?P<P>.+?) then (?P<Q>.+?)$', 'if (?P<P>.+?), (?P<Q>.+?)$']),)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Rule('{P} ⇒ {Q}', 'if {P} then {Q}', 'if {P}, {Q}'),"
   ]
  },
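  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how such a pattern behaves, here is a minimal sketch that applies the second pattern from the output above directly with `re.match`. The example sentences are illustrative assumptions, and the `re.I` flag mirrors the one used by `match_rule` later in the notebook. Because each group is non-greedy, `P` stops at the first `', '`, while the final `$` forces `Q` to extend to the end of the sentence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Pattern copied from the Rule output above; matched case-insensitively, as in match_rule.\n",
    "pattern = 'if (?P<P>.+?), (?P<Q>.+?)$'\n",
    "\n",
    "# Illustrative sentences (assumptions, not part of the test set below).\n",
    "for sentence in ['If you build it, he will come',\n",
    "                 'If it rains, we stay in, and we read']:\n",
    "    match = re.match(pattern, sentence, flags=re.I)\n",
    "    print(match.groupdict())\n",
    "\n",
    "# Prints:\n",
    "# {'P': 'you build it', 'Q': 'he will come'}\n",
    "# {'P': 'it rains', 'Q': 'we stay in, and we read'}"
   ]
  },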
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now the actual rules. If your sentence is not translated correctly, you can attempt to augment these rules to handle your sentence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "rules = [\n",
    "    Rule('{P} ⇒ {Q}',         'if {P} then {Q}', 'if {P}, {Q}'),\n",
    "    Rule('{P} ⋁ {Q}',          'either {P} or else {Q}', 'either {P} or {Q}'),\n",
    "    Rule('{P} ⋀ {Q}',          'both {P} and {Q}'),\n",
    "    Rule('～{P} ⋀ ～{Q}',       'neither {P} nor {Q}'),\n",
    "    Rule('～{A}{P} ⋀ ～{A}{Q}', '{A} neither {P} nor {Q}'), # The Kaiser neither ...\n",
    "    Rule('～{Q} ⇒ {P}',        '{P} unless {Q}'),\n",
    "    Rule('{P} ⇒ {Q}',          '{Q} provided that {P}', '{Q} whenever {P}', \n",
    "                               '{P} implies {Q}', '{P} therefore {Q}', \n",
    "                               '{Q}, if {P}', '{Q} if {P}', '{P} only if {Q}'),\n",
    "    Rule('{P} ⋀ {Q}',          '{P} and {Q}', '{P} but {Q}'),\n",
    "    Rule('{P} ⋁ {Q}',          '{P} or else {Q}', '{P} or {Q}'),\n",
    "    ]\n",
    "\n",
    "negations = [\n",
    "    (word(\"not\"), \"\"),\n",
    "    (word(\"cannot\"), \"can\"),\n",
    "    (word(\"can't\"), \"can\"),\n",
    "    (word(\"won't\"), \"will\"),\n",
    "    (word(\"ain't\"), \"is\"),\n",
    "    (\"n't\", \"\"), # matches as part of a word: didn't, couldn't, etc.\n",
    "    ]"
   ]
  },
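  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The guideline about putting keyword rules before variable rules can be illustrated with a small, self-contained sketch. It does not use the `rules` list above: `first_match` is a hypothetical helper, and the two patterns are hand-written in the same form that `name_group` produces. With the keyword pattern first, `Either` is consumed by the keyword; with the order reversed, it is swallowed into `P`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def first_match(sentence, patterns):\n",
    "    \"Hypothetical helper: return the named groups of the first pattern that matches.\"\n",
    "    for pat in patterns:\n",
    "        m = re.match(pat + '$', sentence, flags=re.I)\n",
    "        if m:\n",
    "            return m.groupdict()\n",
    "\n",
    "keyword_first  = ['either (?P<P>.+?) or (?P<Q>.+?)', '(?P<P>.+?) or (?P<Q>.+?)']\n",
    "variable_first = list(reversed(keyword_first))  # same patterns, tried in the wrong order\n",
    "example = 'Either Fricka relents or Brunnhilde has her way'\n",
    "\n",
    "print(first_match(example, keyword_first))\n",
    "print(first_match(example, variable_first))\n",
    "\n",
    "# Prints:\n",
    "# {'P': 'Fricka relents', 'Q': 'Brunnhilde has her way'}\n",
    "# {'P': 'Either Fricka relents', 'Q': 'Brunnhilde has her way'}"
   ]
  },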
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the mechanism to process these rules. The key function is `match_rule`, which matches an English sentence against a rule. The function returns two values (or `None` if the rule does not match): a string representing the translation of the English sentence into logic, and `defs`, a dictionary of `{Variable: \"value\"}` pairs. If `match_rule` finds that the rule matches, it recursively calls `match_rules` to match each of the subgroups of the regular expression (the `P` and `Q` in `if {P}, {Q}`). If no rule matches, `match_rules` treats the sentence as an atomic proposition.\n",
    "The function `match_literal` handles negations, and is where the `defs` dictionary actually gets updated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def match_rules(sentence, rules, defs):\n",
    "    \"\"\"Match sentence against all the rules, accepting the first match; or else make it an atom.\n",
    "    Return two values: the Logic translation and a dict of {P: 'english'} definitions.\"\"\"\n",
    "    sentence = clean(sentence)\n",
    "    for rule in rules:\n",
    "        result = match_rule(sentence, rule, defs)\n",
    "        if result: \n",
    "            return result\n",
    "    return match_literal(sentence, negations, defs)\n",
    "        \n",
    "def match_rule(sentence, rule, defs):\n",
    "    \"Match rule, returning the logic translation and the dict of definitions if the match succeeds.\"\n",
    "    output, patterns = rule\n",
    "    for pat in patterns:\n",
    "        match = re.match(pat, sentence, flags=re.I)\n",
    "        if match:\n",
    "            groups = match.groupdict()\n",
    "            for P in sorted(groups): # Recursively apply rules to each of the matching groups\n",
    "                groups[P] = match_rules(groups[P], rules, defs)[0]\n",
    "            return '(' + output.format(**groups) + ')', defs\n",
    "        \n",
    "def match_literal(sentence, negations, defs):\n",
    "    \"No rule matched; sentence is an atom. Add new proposition to defs. Handle negation.\"\n",
    "    polarity = ''\n",
    "    for (neg, pos) in negations:\n",
    "        (sentence, n) = re.subn(neg, pos, sentence, flags=re.I)\n",
    "        polarity += n * '～'\n",
    "    sentence = clean(sentence)\n",
    "    P = proposition_name(sentence, defs)\n",
    "    defs[P] = sentence\n",
    "    return polarity + P, defs\n",
    "    \n",
    "def proposition_name(sentence, defs, names='PQRSTUVWXYZBCDEFGHJKLMN'):\n",
    "    \"Return the old name for this sentence, if used before, or a new, unused name.\"\n",
    "    inverted = {defs[P]: P for P in defs}\n",
    "    if sentence in inverted:\n",
    "        return inverted[sentence]                      # Find previously-used name\n",
    "    else:\n",
    "        return next(P for P in names if P not in defs) # Use a new unused name\n",
    "    \n",
    "def clean(text): \n",
    "    \"Remove redundant whitespace; handle curly apostrophe and trailing comma/period.\"\n",
    "    return ' '.join(text.split()).replace(\"’\", \"'\").rstrip('.').rstrip(',')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('(P ⇒ ～Q)', {'P': 'loving you is wrong', 'Q': 'I do want to be right'})"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "match_rule(\"If loving you is wrong, I don't want to be right\",\n",
    "           Rule('{P} ⇒ {Q}', 'if {P}, {Q}'),\n",
    "           {})"
   ]
  },
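  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `～` in front of `Q` above comes from the `re.subn` loop in `match_literal`, which counts how many negation words it removes. Here is a minimal sketch of just that step, using a reduced copy of the `negations` list; `sample_negations` and `strip_negations` are hypothetical names introduced only for this illustration. Two removed negations would yield `～～`, as in the `I shouldn't not go` test sentence below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# A reduced copy of the negations list, for illustration only.\n",
    "sample_negations = [(r'\\bnot\\b', ''), (r\"\\bwon't\\b\", 'will'), (\"n't\", '')]\n",
    "\n",
    "def strip_negations(sentence):\n",
    "    \"Hypothetical helper: return (number of negations removed, positive sentence).\"\n",
    "    count = 0\n",
    "    for neg, pos in sample_negations:\n",
    "        sentence, n = re.subn(neg, pos, sentence, flags=re.I)  # n = replacements made\n",
    "        count += n\n",
    "    return count, ' '.join(sentence.split())  # collapse the whitespace left behind\n",
    "\n",
    "print(strip_negations(\"I shouldn't not go\"))\n",
    "print(strip_negations(\"he won't\"))\n",
    "\n",
    "# Prints:\n",
    "# (2, 'I should go')\n",
    "# (1, 'he will')"
   ]
  },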
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are some more test sentences and a top-level function to handle them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "English: Polkadots and Moonbeams. \n",
      "\n",
      "Logic: (P ⋀ Q)\n",
      "P: Polkadots\n",
      "Q: Moonbeams\n",
      "\n",
      "English: If you liked it then you shoulda put a ring on it. \n",
      "\n",
      "Logic: (P ⇒ Q)\n",
      "P: you liked it\n",
      "Q: you shoulda put a ring on it\n",
      "\n",
      "English: If you build it, he will come. \n",
      "\n",
      "Logic: (P ⇒ Q)\n",
      "P: you build it\n",
      "Q: he will come\n",
      "\n",
      "English: It don't mean a thing, if it ain't got that swing. \n",
      "\n",
      "Logic: (～P ⇒ ～Q)\n",
      "P: it is got that swing\n",
      "Q: It do mean a thing\n",
      "\n",
      "English: If loving you is wrong, I don't want to be right. \n",
      "\n",
      "Logic: (P ⇒ ～Q)\n",
      "P: loving you is wrong\n",
      "Q: I do want to be right\n",
      "\n",
      "English: Should I stay or should I go. \n",
      "\n",
      "Logic: (P ⋁ Q)\n",
      "P: Should I stay\n",
      "Q: should I go\n",
      "\n",
      "English: I shouldn't go and I shouldn't not go. \n",
      "\n",
      "Logic: (～P ⋀ ～～P)\n",
      "P: I should go\n",
      "\n",
      "English: If I fell in love with you, would you promise to be true and help me\n",
      "understand. \n",
      "\n",
      "Logic: (P ⇒ (Q ⋀ R))\n",
      "P: I fell in love with you\n",
      "Q: would you promise to be true\n",
      "R: help me understand\n",
      "\n",
      "English: I could while away the hours conferrin' with the flowers, consulting\n",
      "with the rain and my head I'd be a scratchin' while my thoughts are busy\n",
      "hatchin' if I only had a brain. \n",
      "\n",
      "Logic: (P ⇒ (Q ⋀ R))\n",
      "P: I only had a brain\n",
      "Q: I could while away the hours conferrin' with the flowers, consulting with the rain\n",
      "R: my head I'd be a scratchin' while my thoughts are busy hatchin'\n",
      "\n",
      "English: There's a federal tax, and a state tax, and a city tax, and a street\n",
      "tax, and a sewer tax. \n",
      "\n",
      "Logic: (P ⋀ (Q ⋀ (R ⋀ (S ⋀ T))))\n",
      "P: There's a federal tax\n",
      "Q: a state tax\n",
      "R: a city tax\n",
      "S: a street tax\n",
      "T: a sewer tax\n",
      "\n",
      "English: A ham sandwich is better than nothing and nothing is better than\n",
      "eternal happiness therefore a ham sandwich is better than eternal happiness. \n",
      "\n",
      "Logic: ((P ⋀ Q) ⇒ R)\n",
      "P: A ham sandwich is better than nothing\n",
      "Q: nothing is better than eternal happiness\n",
      "R: a ham sandwich is better than eternal happiness\n",
      "\n",
      "English: If I were a carpenter and you were a lady, would you marry me anyway?\n",
      "and would you have my baby. \n",
      "\n",
      "Logic: ((P ⋀ Q) ⇒ (R ⋀ S))\n",
      "P: I were a carpenter\n",
      "Q: you were a lady\n",
      "R: would you marry me anyway?\n",
      "S: would you have my baby\n",
      "\n",
      "English: Either Danny didn't come to the party or Virgil didn't come to the\n",
      "party. \n",
      "\n",
      "Logic: (～P ⋁ ～Q)\n",
      "P: Danny did come to the party\n",
      "Q: Virgil did come to the party\n",
      "\n",
      "English: Either Wotan will triumph and Valhalla will be saved or else he won't\n",
      "and Alberic will have the final word. \n",
      "\n",
      "Logic: ((P ⋀ Q) ⋁ (～R ⋀ S))\n",
      "P: Wotan will triumph\n",
      "Q: Valhalla will be saved\n",
      "R: he will\n",
      "S: Alberic will have the final word\n",
      "\n",
      "English: Sieglinde will survive, and either her son will gain the Ring and\n",
      "Wotan's plan will be fulfilled or else Valhalla will be destroyed. \n",
      "\n",
      "Logic: (P ⋀ ((Q ⋀ R) ⋁ S))\n",
      "P: Sieglinde will survive\n",
      "Q: her son will gain the Ring\n",
      "R: Wotan's plan will be fulfilled\n",
      "S: Valhalla will be destroyed\n",
      "\n",
      "English: Wotan will intervene and cause Siegmund's death unless either Fricka\n",
      "relents or Brunnhilde has her way. \n",
      "\n",
      "Logic: (～(R ⋁ S) ⇒ (P ⋀ Q))\n",
      "P: Wotan will intervene\n",
      "Q: cause Siegmund's death\n",
      "R: Fricka relents\n",
      "S: Brunnhilde has her way\n",
      "\n",
      "English: Figaro and Susanna will wed provided that either Antonio or Figaro pays\n",
      "and Bartolo is satisfied or else Marcellina's contract is voided and the\n",
      "Countess does not act rashly. \n",
      "\n",
      "Logic: ((((P ⋁ Q) ⋀ R) ⋁ (S ⋀ ～T)) ⇒ (U ⋀ V))\n",
      "P: Antonio\n",
      "Q: Figaro pays\n",
      "R: Bartolo is satisfied\n",
      "S: Marcellina's contract is voided\n",
      "T: the Countess does act rashly\n",
      "U: Figaro\n",
      "V: Susanna will wed\n",
      "\n",
      "English: If the Kaiser neither prevents Bismarck from resigning nor supports the\n",
      "Liberals, then the military will be in control and either Moltke's plan will be\n",
      "executed or else the people will revolt and the Reich will not survive. \n",
      "\n",
      "Logic: ((～PQ ⋀ ～PR) ⇒ (S ⋀ (T ⋁ (U ⋀ ～V))))\n",
      "P: the Kaiser\n",
      "Q: prevents Bismarck from resigning\n",
      "R: supports the Liberals\n",
      "S: the military will be in control\n",
      "T: Moltke's plan will be executed\n",
      "U: the people will revolt\n",
      "V: the Reich will survive\n"
     ]
    }
    ],
   "source": [
    "sentences = '''\n",
    "Polkadots and Moonbeams.\n",
    "If you liked it then you shoulda put a ring on it.\n",
    "If you build it, he will come.\n",
    "It don't mean a thing, if it ain't got that swing.\n",
    "If loving you is wrong, I don't want to be right.\n",
    "Should I stay or should I go.\n",
    "I shouldn't go and I shouldn't not go.\n",
    "If I fell in love with you,\n",
    "  would you promise to be true\n",
    "  and help me understand.\n",
    "I could while away the hours\n",
    "  conferrin' with the flowers,\n",
    "  consulting with the rain\n",
    "  and my head I'd be a scratchin'\n",
    "  while my thoughts are busy hatchin'\n",
    "  if I only had a brain.\n",
    "There's a federal tax, and a state tax, and a city tax, and a street tax, and a sewer tax.\n",
    "A ham sandwich is better than nothing \n",
    "  and nothing is better than eternal happiness\n",
    "  therefore a ham sandwich is better than eternal happiness.\n",
    "If I were a carpenter\n",
    "  and you were a lady,\n",
    "  would you marry me anyway?\n",
    "  and would you have my baby.\n",
    "Either Danny didn't come to the party or Virgil didn't come to the party.\n",
    "Either Wotan will triumph and Valhalla will be saved or else he won't and Alberic will have \n",
    "  the final word.\n",
    "Sieglinde will survive, and either her son will gain the Ring and Wotan’s plan \n",
    "  will be fulfilled or else Valhalla will be destroyed.\n",
    "Wotan will intervene and cause Siegmund's death unless either Fricka relents \n",
    "  or Brunnhilde has her way.\n",
    "Figaro and Susanna will wed provided that either Antonio or Figaro pays and Bartolo is satisfied \n",
    "  or else Marcellina’s contract is voided and the Countess does not act rashly.\n",
    "If the Kaiser neither prevents Bismarck from resigning nor supports the Liberals, \n",
    "  then the military will be in control and either Moltke's plan will be executed \n",
    "  or else the people will revolt and the Reich will not survive'''.split('.')\n",
    "\n",
    "import textwrap\n",
    "\n",
    "def logic(sentences, width=80): \n",
    "    \"Match the rules against each of the sentences, and print each result.\"\n",
    "    for s in map(clean, sentences):\n",
    "        # Start each sentence with a fresh, empty dict of definitions\n",
    "        translation, defs = match_rules(s, rules, {})\n",
    "        print('\\n' + textwrap.fill('English: ' + s + '.', width), '\\n\\nLogic:', translation)\n",
    "        for P in sorted(defs):\n",
    "            print('{}: {}'.format(P, defs[P]))\n",
    "\n",
    "logic(sentences)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That looks pretty good, but far from perfect.  Here are some errors:\n",
    "\n",
    "* `Should I stay` *etc.*:<br>questions are not propositional statements.\n",
    "\n",
    "* `If I were a carpenter`:<br>doesn't handle modal logic.\n",
    "\n",
    "* `nothing is better`:<br>doesn't handle quantifiers.\n",
    "\n",
    "* `Either Wotan will triumph and Valhalla will be saved or else he won't`:<br>gets `'he will'` as one of the propositions, but it would be better if that referred back to `'Wotan will triumph'`.\n",
    "\n",
    "* `Wotan will intervene and cause Siegmund's death`:<br>gets `\"cause Siegmund's death\"` as a proposition, but better would be `\"Wotan will cause Siegmund's death\"`.\n",
    "\n",
    "* `Figaro and Susanna will wed`:<br>gets `\"Figaro\"` and `\"Susanna will wed\"` as two separate propositions; this should really be one proposition. \n",
    "\n",
    "* `either Antonio or Figaro pays`:<br>gets `\"Antonio\"` as a proposition, but it should be `\"Antonio pays\"`.\n",
    "\n",
    "* `If the Kaiser neither prevents`:<br>uses the somewhat bogus propositions `PQ` and `PR`. This should be done in a cleaner way. The problem is the same as the previous problem with Antonio: I don't have a good way to attach the subject of a sentence to each part of a compound verb phrase.\n",
    "\n",
    "I'm sure more test sentences would reveal many more types of errors.\n",
    "\n",
    "There's also [a version](proplogic.py) of this program that is in Python 2 and uses only ASCII characters; if you have a Mac or Linux system you can download it as [`proplogic.py`](proplogic.py) and run it with the command `python proplogic.py`. Or you can run it [online](https://www.pythonanywhere.com/user/pnorvig/files/home/pnorvig/proplogic.py?edit)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}