{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align: right\" align=\"right\"><i>Peter Norvig, 3 Jan 2020</i></div>\n",
    "\n",
    "# Spelling Bee Puzzle\n",
    "\n",
    "The [3 Jan. 2020 edition of the **538 Riddler**](https://fivethirtyeight.com/features/can-you-solve-the-vexing-vexillology/) concerns the popular NYTimes  [**Spelling Bee**](https://www.nytimes.com/puzzles/spelling-bee) puzzle:\n",
    "\n",
    "> In this game, seven letters are arranged in a **honeycomb** lattice, with one letter in the center. Here’s the lattice from Dec. 24, 2019:\n",
    "> \n",
    "> <img src=\"https://fivethirtyeight.com/wp-content/uploads/2020/01/Screen-Shot-2019-12-24-at-5.46.55-PM.png\" width=\"150\">\n",
    "> \n",
    "> The goal is to identify as many words as possible that meet the following criteria:\n",
    "> 1. The word must be at least four letters long.\n",
    "> 2. The word must include the central letter.\n",
    "> 3. The word cannot include any letter beyond the seven given letters.\n",
    ">\n",
    ">Note that letters can be repeated. For example,  GAME and AMALGAM are both acceptable words. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, etc. Words that use all seven letters in the honeycomb are known as **pangrams** and earn 7 bonus points (in addition to the points for the length of the word). So in the above example, MEGAPLEX is worth 8 + 7 = 15 points.\n",
    ">\n",
    "> ***Which seven-letter honeycomb results in the highest possible score?*** To be a valid choice of seven letters, no letter can be repeated, it must not contain the letter S (that would be too easy) and there must be at least one pangram.\n",
    ">\n",
    "> For consistency, please use [this word list](https://norvig.com/ngrams/enable1.txt) to check your score.\n",
    "\n",
    "\n",
    "Since the referenced word list came from [***my*** web site](https://norvig.com/ngrams), I felt  compelled to solve this puzzle.  (Note the word list is a standard public domain Scrabble® dictionary that I happen to host a copy of; I didn't curate it, Mendel Cooper and Alan Beale did.) \n",
    "\n",
    "I'll show you how I address the problem. First some imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "from dataclasses import dataclass\n",
    "from itertools   import combinations\n",
    "from typing      import List, Set, Dict, Tuple, Iterable "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Letters, Words, Lettersets, and Pangrams\n",
    "\n",
    "Let's start by defining the most basic terms:\n",
    "\n",
    "- **Valid letter**: the valid letters are uppercase 'A' to 'Z', but not 'S'.\n",
    "- **Word**: A string of letters.\n",
    "- **Word list**: a list of valid words.\n",
    "- **Valid word**: a word of at least 4 valid letters and not more than 7 distinct letters.\n",
    "- **Letterset**: the set of distinct letters in a word; e.g. letterset('BOOBOO') = 'BO'.\n",
    "- **Pangram**: a valid word with exactly 7 distinct letters.\n",
    "- **Pangram letterset**: the letterset for a pangram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "valid_letters = set('ABCDEFGHIJKLMNOPQR' + 'TUVWXYZ')\n",
    "Letter    = str # A string of one letter\n",
    "Word      = str # A string of multiple letters\n",
    "Letterset = str # A sorted string of distinct letters\n",
    "\n",
    "def word_list(text: str) -> List[Word]: \n",
    "    \"\"\"All the valid words in a text.\"\"\"\n",
    "    return [w for w in text.upper().split() if is_valid(w)]\n",
    "\n",
    "def is_valid(word) -> bool: \n",
    "    return len(word) >= 4 and valid_letters.issuperset(word) and len(set(word)) <= 7\n",
    "\n",
    "def letterset(word) -> Letterset:\n",
    "    \"\"\"The set of distinct letters in a word, represented as a sorted str.\"\"\"\n",
    "    return ''.join(sorted(set(word)))\n",
    "\n",
    "def is_pangram(word) -> bool: return len(set(word)) == 7\n",
    "\n",
    "def pangram_lettersets(wordlist) -> Set[Letterset]:\n",
    "    \"\"\"All lettersets from the pangram words in wordlist.\"\"\"\n",
    "    return {letterset(w) for w in wordlist if is_pangram(w)}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why did I  represent a `Letterset` as a sorted string of distinct letters, and not a `set`? Because:\n",
    "- A `set` can't be the key of a dict.\n",
    "- A `frozenset` could be a key, and would be a reasonable choice for `Letterset`, but it:\n",
    "  - Takes up more memory than a `str`.\n",
    "  - Is harder to read when debugging: `frozenset({'A', 'G', 'L', 'M'})` versus `'AGLM'`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a mini word list to experiment with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['AMALGAM', 'CACCIATORE', 'EROTICA', 'GAME', 'GLAM', 'MEGAPLEX']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mini = word_list('amalgam amalgamation cacciatore erotica em game gem gems glam megaplex')\n",
    "mini"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `em` and `gem` are too short, `gems` has an `s`, and `amalgamation` has 8 distinct letters. We're left with six valid words out of the ten candidate words. The pangrams and their lettersets are as follows; there are three pangrams but only two pangram lettersets because CACCIATORE and EROTICA have the same letterset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'CACCIATORE': 'ACEIORT', 'EROTICA': 'ACEIORT', 'MEGAPLEX': 'AEGLMPX'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "assert len(pangram_lettersets(mini)) == 2\n",
    "\n",
    "{w: letterset(w) for w in mini if is_pangram(w)}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Honeycombs and Scoring\n",
    "\n",
    "- A **honeycomb** lattice consists of two attributes:\n",
    " - A letterset of seven distinct letters\n",
    " - A single distinguished center letter\n",
    "- The **word score** is 1 point for a 4-letter word, or the word length for longer words, plus 7 bonus points for a pangram.\n",
    "- The **total score** for a honeycomb is the sum of the word scores for the words that the honeycomb **can make**. \n",
    "- A honeycomb **can make** a word if the word contains the honeycomb's center, and every letter in the word is in the honeycomb. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "@dataclass(frozen=True, order=True)\n",
    "class Honeycomb:\n",
    "    \"\"\"A Honeycomb lattice, with 7 letters, 1 of which is the center.\"\"\"\n",
    "    letters: Letterset \n",
    "    center:  Letter\n",
    "        \n",
    "def word_score(word) -> int: \n",
    "    \"\"\"The points for this word, including bonus for pangram.\"\"\"\n",
    "    return 1 if len(word) == 4 else (len(word) + 7 * is_pangram(word))\n",
    "\n",
    "def total_score(honeycomb, wordlist) -> int:\n",
    "    \"\"\"The total score for this honeycomb.\"\"\"\n",
    "    return sum(word_score(w) for w in wordlist if can_make(honeycomb, w))\n",
    "\n",
    "def can_make(honeycomb, word) -> bool:\n",
    "    \"\"\"Can the honeycomb make this word?\"\"\"\n",
    "    return honeycomb.center in word and all(L in honeycomb.letters for L in word)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the honeycomb from the diagram at the top of the notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Honeycomb(letters='AEGLMPX', center='G')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hc = Honeycomb(letterset('LAPGEMX'), 'G')\n",
    "hc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The word scores, makeable words, and total score  for this honeycomb on the `mini` word list are as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AMALGAM': 7,\n",
       " 'CACCIATORE': 17,\n",
       " 'EROTICA': 14,\n",
       " 'GAME': 1,\n",
       " 'GLAM': 1,\n",
       " 'MEGAPLEX': 15}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{w: word_score(w) for w in mini}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AMALGAM', 'GAME', 'GLAM', 'MEGAPLEX'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{w for w in mini if can_make(hc, w)}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "24"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total_score(hc, mini) # 7 + 1 + 1 + 15"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finding the Top-Scoring Honeycomb\n",
    "\n",
    "A simple strategy for finding the top (highest total score) honeycomb is:\n",
    " - Compile a list of all valid candidate honeycombs.\n",
    " - For each honeycomb, compute the total score.\n",
    " - Return a (score, honeycomb) tuple for a honeycomb with the maximum score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def top_honeycomb(wordlist) -> Tuple[int, Honeycomb]: \n",
    "    \"\"\"Find a (score, honeycomb) tuple with a highest-scoring honeycomb.\"\"\"\n",
    "    return max((total_score(h, wordlist), h) \n",
    "               for h in candidate_honeycombs(wordlist))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are the possible candidate honeycombs? We could try all letters in all slots, but that's a **lot** of honeycombs:\n",
    "- The center can be any valid letter (25 choices, because 'S' is not allowed).\n",
    "- The outside can be any six of the remaining 24 letters.\n",
    "- All together, that's 25 × (24 choose 6) = 3,364,900 candidate honeycombs.\n",
    "\n",
    "Fortunately, we can use the constraint that  **there must be at least one pangram** in a valid honeycomb.  So the letters of any valid honeycomb must ***be*** the letterset of some pangram (and the center can be any one of the seven letters):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def candidate_honeycombs(wordlist) -> List[Honeycomb]:\n",
    "    \"\"\"Valid honeycombs have pangram letters, with any center.\"\"\"\n",
    "    return [Honeycomb(letters, center) \n",
    "            for letters in pangram_lettersets(wordlist)\n",
    "            for center in letters]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Honeycomb(letters='AEGLMPX', center='A'),\n",
       " Honeycomb(letters='AEGLMPX', center='E'),\n",
       " Honeycomb(letters='AEGLMPX', center='G'),\n",
       " Honeycomb(letters='AEGLMPX', center='L'),\n",
       " Honeycomb(letters='AEGLMPX', center='M'),\n",
       " Honeycomb(letters='AEGLMPX', center='P'),\n",
       " Honeycomb(letters='AEGLMPX', center='X'),\n",
       " Honeycomb(letters='ACEIORT', center='A'),\n",
       " Honeycomb(letters='ACEIORT', center='C'),\n",
       " Honeycomb(letters='ACEIORT', center='E'),\n",
       " Honeycomb(letters='ACEIORT', center='I'),\n",
       " Honeycomb(letters='ACEIORT', center='O'),\n",
       " Honeycomb(letters='ACEIORT', center='R'),\n",
       " Honeycomb(letters='ACEIORT', center='T')]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "candidate_honeycombs(mini) # 7 candidates for each of the 2 pangram lettersets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we're ready to find the highest-scoring honeycomb with respect to the `mini` word list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(31, Honeycomb(letters='ACEIORT', center='T'))"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_honeycomb(mini)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The program appears to work. But that's just the mini word list. \n",
    "\n",
    "# Big Word List\n",
    "\n",
    "Here's the big word list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "aa\n",
      "aah\n",
      "aahed\n",
      "aahing\n",
      "aahs\n",
      "aal\n",
      "aalii\n",
      "aaliis\n",
      "aals\n",
      "aardvark\n",
      "  172820 enable1.txt\n"
     ]
    }
   ],
   "source": [
    "! [ -e  enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt\n",
    "! head  enable1.txt\n",
    "! wc -w enable1.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "file = 'enable1.txt'\n",
    "big  = word_list(open(file).read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are some statistics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "172,820 total words\n",
      " 44,585 valid Spelling Bee words\n",
      " 14,741 pangram words\n",
      "  7,986 distinct pangram lettersets\n",
      " 55,902 candidate honeycombs\n"
     ]
    }
   ],
   "source": [
    "print(f\"\"\"172,820 total words\n",
    "{len(big):7,d} valid Spelling Bee words\n",
    "{sum(map(is_pangram, big)):7,d} pangram words\n",
    "{len(pangram_lettersets(big)):7,d} distinct pangram lettersets\n",
    "{len(candidate_honeycombs(big)):7,d} candidate honeycombs\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How long will it take to run `top_honeycomb(big)`? Most of the computation time is in `total_score`, which is called once for each of the 55,902 candidate honeycombs, so let's estimate the total time by first checking how long it takes to compute the total score of a single honeycomb:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8.58 ms ± 28.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit total_score(hc, big)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Roughly 9 milliseconds for one honeycomb. For all 55,902 valid honeycombs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "503.11799999999994"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    ".009 * 55902"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "About 500 seconds, which is under 9 minutes. I could run `top_honeycomb(big)`, get a coffee, come back, and declare victory. \n",
    "\n",
    "But I think that a puzzle like this deserves a more elegant solution.  \n",
    "\n",
    "# Faster Scoring: Points Table\n",
    "\n",
    "Here's an idea to make `total_score` faster by doing some precomputation:\n",
    "\n",
    "1. Do the following computation only once:\n",
    "  - Compute the `letterset` and `word_score` for each word in the word list. \n",
    "  - Make a table of `{letterset: sum_of_word_scores}` giving the total score for each letterset. \n",
    "  - I call this a **points table**.\n",
    "2. For each honeycomb, do the following:\n",
    "  - Consider every **letter subset** of the honeycomb's 7 letters that includes the center letter.\n",
    "  - Sum the points table entries for each of these letter subsets.\n",
    "\n",
    "The resulting algorithm, `fast_total_score`, iterates over just 2<sup>6</sup> – 1 = 63 letter subsets; much fewer than 44,585 valid words. The function `top_honeycomb2` creates the points table and calls `fast_total_score`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "def top_honeycomb2(wordlist) -> Tuple[int, Honeycomb]: \n",
    "    \"\"\"Find a (score, honeycomb) tuple with a highest-scoring honeycomb.\"\"\"\n",
    "    table = PointsTable(wordlist)\n",
    "    return max((fast_total_score(h, table), h) \n",
    "               for h in candidate_honeycombs(wordlist))\n",
    "\n",
    "class PointsTable(dict):\n",
    "    \"\"\"A table of {letterset: points} from words.\"\"\"\n",
    "    def __init__(self, wordlist):\n",
    "        for w in wordlist:\n",
    "            self[letterset(w)] += word_score(w)\n",
    "    def __missing__(self, key): return 0\n",
    "\n",
    "def letter_subsets(honeycomb) -> List[Letterset]:\n",
    "    \"\"\"The 63 subsets of the letters in the honeycomb, each including the center letter.\"\"\"\n",
    "    return [letters \n",
    "            for n in range(2, 8) \n",
    "            for letters in map(''.join, combinations(honeycomb.letters, n))\n",
    "            if honeycomb.center in letters]\n",
    "\n",
    "def fast_total_score(honeycomb, points_table) -> int:\n",
    "    \"\"\"The total score for this honeycomb, using a points table.\"\"\"\n",
    "    return sum(points_table[s] for s in letter_subsets(honeycomb))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the points table for the mini word list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'AGLM': 8, 'ACEIORT': 31, 'AEGM': 1, 'AEGLMPX': 15}"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table = PointsTable(mini)\n",
    "table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The letterset `'AGLM'` gets 8 points (7 for AMALGAM and 1 for GLAM).  `'ACEIORT'` gets 31 points (17 for CACCIATORE and 14 for EROTICA). `'AEGM'` gets 1 for GAME and `'AEGLMPX'` gets 15 for MEGAPLEX. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the honeycomb `hc` again, and its 63 letter subsets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Honeycomb(letters='AEGLMPX', center='G')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['AG', 'EG', 'GL', 'GM', 'GP', 'GX', 'AEG', 'AGL', 'AGM', 'AGP', 'AGX', 'EGL', 'EGM', 'EGP', 'EGX', 'GLM', 'GLP', 'GLX', 'GMP', 'GMX', 'GPX', 'AEGL', 'AEGM', 'AEGP', 'AEGX', 'AGLM', 'AGLP', 'AGLX', 'AGMP', 'AGMX', 'AGPX', 'EGLM', 'EGLP', 'EGLX', 'EGMP', 'EGMX', 'EGPX', 'GLMP', 'GLMX', 'GLPX', 'GMPX', 'AEGLM', 'AEGLP', 'AEGLX', 'AEGMP', 'AEGMX', 'AEGPX', 'AGLMP', 'AGLMX', 'AGLPX', 'AGMPX', 'EGLMP', 'EGLMX', 'EGLPX', 'EGMPX', 'GLMPX', 'AEGLMP', 'AEGLMX', 'AEGLPX', 'AEGMPX', 'AGLMPX', 'EGLMPX', 'AEGLMPX']\n"
     ]
    }
   ],
   "source": [
    "print(letter_subsets(hc))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The total from `fast_total_score` is the sum of its letter subsets (only 3 of which are in `PointsTable(mini)`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert fast_total_score(hc, table) == 24 == table['AGLM'] + table['AEGM'] + table['AEGLMPX']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finding the Top-Scoring Honeycomb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now solve the puzzle on the big word list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.59 s, sys: 3.69 ms, total: 1.59 s\n",
      "Wall time: 1.59 s\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(3898, Honeycomb(letters='AEGINRT', center='R'))"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time top_honeycomb2(big)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Wow! 3898 is a high score!** And the whole computation took **less than 2 seconds**!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scoring Fewer Honeycombs: Branch and Bound\n",
    "\n",
    "A run time of less than 2 seconds to find the top honeycomb is pretty good! Can we do even better?\n",
    "\n",
    "The program would run faster if we scored fewer honeycombs. But if we want to be guaranteed of finding the top-scoring honeycomb, how can we skip any? Consider the pangram **JUKEBOX**. With the unusual letters  **J**, **K**,  and **X**, it scores poorly, regardless of the choice of center:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{Honeycomb(letters='BEJKOUX', center='J'): 26,\n",
       " Honeycomb(letters='BEJKOUX', center='U'): 32,\n",
       " Honeycomb(letters='BEJKOUX', center='K'): 26,\n",
       " Honeycomb(letters='BEJKOUX', center='E'): 37,\n",
       " Honeycomb(letters='BEJKOUX', center='B'): 49,\n",
       " Honeycomb(letters='BEJKOUX', center='O'): 39,\n",
       " Honeycomb(letters='BEJKOUX', center='X'): 15}"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "jk = [Honeycomb(letterset('JUKEBOX'), C) for C in 'JUKEBOX']\n",
    "\n",
    "{h: total_score(h, big) for h in jk}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We might be able to dismiss **JUKEBOX** in one call to `fast_total_score`, rather than seven, with this approach:\n",
    "- Keep track of the top score found so far.\n",
    "- For each pangram letterset, ask \"if we weren't required to use the center letter, what would this letterset score?\"\n",
    "- Check if that score (which is an **upper bound** of the score using any one center letter) is higher than the top score so far.\n",
    "  - If yes, then try the pangram letterset with all seven centers; \n",
    "  - If not then dismiss it without trying any centers.\n",
    "- This is called a [**branch and bound**](https://en.wikipedia.org/wiki/Branch_and_bound) algorithm: prune a  **branch** of 7 honeycombs if an upper **bound** can't beat the top score.\n",
    "\n",
    "*Note*: To represent a honeycomb with no center, I can just use `Honeycomb(p, '')`. This works because of a quirk of Python:  `letter_subsets` checks if `honeycomb.center in letters`; normally in Python the expression `e in s` means \"*is* `e` *an element of the collection* `s`\", but when `s` is a string it means \"*is* `e` *a substring of* `s`\", and the empty string is a substring of every string. \n",
    "\n",
    "I can rewrite `top_honeycomb` as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "def top_honeycomb3(words) -> Tuple[int, Honeycomb]: \n",
    "    \"\"\"Find a (score, honeycomb) tuple with a highest-scoring honeycomb.\"\"\"\n",
    "    table = PointsTable(words)\n",
    "    top_score, top_honeycomb = -1, None\n",
    "    pangrams = [s for s in table if len(s) == 7]\n",
    "    for p in pangrams:\n",
    "        if fast_total_score(Honeycomb(p, ''), table) > top_score:\n",
    "            for center in p:\n",
    "                honeycomb = Honeycomb(p, center)\n",
    "                score = fast_total_score(honeycomb, table)\n",
    "                if score > top_score:\n",
    "                    top_score, top_honeycomb = score, honeycomb\n",
    "    return top_score, top_honeycomb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 350 ms, sys: 1.78 ms, total: 352 ms\n",
      "Wall time: 351 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(3898, Honeycomb(letters='AEGINRT', center='R'))"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time top_honeycomb3(big)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Awesome! We get the correct answer, and it runs four times faster than `top_honeycomb2`.\n",
    "\n",
    "# How many honeycombs does `top_honeycomb3` examine? \n",
    "\n",
    "We can use `functools.lru_cache` to make `Honeycomb` keep track:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CacheInfo(hits=0, misses=8084, maxsize=None, currsize=8084)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import functools\n",
    "Honeycomb = functools.lru_cache(None)(Honeycomb)\n",
    "top_honeycomb3(big)\n",
    "Honeycomb.cache_info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`top_honeycomb3`  examined 8,084 honeycombs; a 6.9× reduction from the 55,902 examined by `top_honeycomb2`. Since there are 7,986 pangram lettersets, that means we had to look at all 7 centers for only (8084-7986)/7 = 14 of them.\n",
    "\n",
    "How much faster is `fast_total_score` than `total_score` (which takes about 9 milliseconds per honeycomb)?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "26.4 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
     ]
    }
   ],
   "source": [
    "table = PointsTable(big)\n",
    "\n",
    "%timeit fast_total_score(hc, table)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that `fast_total_score` is about 300 times faster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fancy Report\n",
    "\n",
    "I'd like to see the actual words that each honeycomb can make, and I'm curious about how the words are divided up by letterset. Here's a function to provide such a report. This report turned out to be more complicated than I anticipated. I guess it is difficult to create a practical extraction and reporting tool. I feel you, [Larry Wall](http://www.wall.org/~larry/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "from textwrap import fill\n",
    "\n",
    "def report(honeycomb=None, words=big):\n",
    "    \"\"\"Print stats, words, and word scores for the given honeycomb (or the top\n",
    "    honeycomb if no honeycomb is given) over the given word list.\"\"\"\n",
    "    bins    = group_by(words, key=letterset)\n",
    "    if honeycomb is None:\n",
    "        adj = \"Top \"\n",
    "        score, honeycomb = top_honeycomb3(words)\n",
    "    else:\n",
    "        adj = \"\"\n",
    "        score = total_score(honeycomb, words)\n",
    "    subsets = letter_subsets(honeycomb)\n",
    "    nwords  = sum(len(bins[s]) for s in subsets)\n",
    "    print(f'{adj}{honeycomb}: {Ns(nwords, \"word\")} {Ns(score, \"point\")}',\n",
    "          f'from a {len(words):,d} word list:\\n')\n",
    "    for s in sorted(subsets, key=lambda s: (-len(s), s)):\n",
    "        if bins[s]:\n",
    "            pts = sum(word_score(w) for w in bins[s])\n",
    "            wcount = Ns(len(bins[s]), \"pangram\" if is_pangram(s) else \"word\")\n",
    "            intro = f'{wcount:>8} {Ns(pts, \"point\"):>9} {{{s}}}: '\n",
    "            groups= group_by(bins[s], key=word_score)\n",
    "            strs  = [' '.join(w for w in sorted(groups[s])) + f' ({s})'\n",
    "                     for s in reversed(sorted(groups))]\n",
    "            print(fill(intro + ', '.join(strs), width=114, subsequent_indent=' '*4))\n",
    "            \n",
    "def Ns(n: int, noun: str) -> str:\n",
    "    \"\"\"A string with the int `n` followed by the plural or singular of noun:\n",
    "    Ns(3, 'bear') => '3 bears'; Ns(1, 'world') => '1 world'\"\"\"  \n",
    "    return f\"{n:,d} {noun}{' ' if n == 1 else 's'}\"\n",
    "\n",
    "def group_by(items, key) -> dict:\n",
    "    \"Group items into bins of a dict by key: {key(item): [item, ...], ...}.\"\n",
    "    bins = defaultdict(list)\n",
    "    for item in items:\n",
    "        bins[key(item)].append(item)\n",
    "    return bins"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are reports for the mini and full word lists:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Honeycomb(letters='AEGLMPX', center='G'): 4 words 24 points from a 6 word list:\n",
      "\n",
      "1 pangram  15 points {AEGLMPX}: MEGAPLEX (15)\n",
      " 1 word   1 point  {AEGM}: GAME (1)\n",
      " 2 words  8 points {AGLM}: AMALGAM (7), GLAM (1)\n"
     ]
    }
   ],
   "source": [
    "report(hc, mini)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top Honeycomb(letters='ACEIORT', center='A'): 2 words 31 points from a 6 word list:\n",
      "\n",
      "2 pangrams 31 points {ACEIORT}: CACCIATORE (17), EROTICA (14)\n"
     ]
    }
   ],
   "source": [
    "report(words=mini)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top Honeycomb(letters='AEGINRT', center='R'): 537 words 3,898 points from a 44,585 word list:\n",
      "\n",
      "50 pangrams 832 points {AEGINRT}: REAGGREGATING REINTEGRATING (20), ENTERTAINING INTENERATING REGENERATING\n",
      "    REINITIATING (19), AGGREGATING GRATINEEING INTEGRATING ITINERATING REATTAINING REINTEGRATE REITERATING\n",
      "    RETARGETING (18), ENTRAINING ENTREATING GARNIERITE GENERATING GREATENING INGRATIATE INTERREGNA INTREATING\n",
      "    REGRANTING RETRAINING RETREATING (17), ARGENTINE ARGENTITE GARTERING INTEGRATE INTERGANG ITERATING NATTERING\n",
      "    RATTENING REGRATING RETAGGING RETAINING RETEARING TANGERINE TARGETING TATTERING (16), AERATING GNATTIER\n",
      "    GRATINEE INTERAGE TREATING (15), GRANITE GRATINE INGRATE TANGIER TEARING (14)\n",
      "35 words 270 points {AEGINR}: REARRANGING (11), ENGRAINING GANGRENING REENGAGING (10), GARNERING GREGARINE\n",
      "    REEARNING REGAINING REGEARING (9), AGREEING ANEARING ANGERING ARGININE ENRAGING GRAINIER REGAINER (8), AGINNER\n",
      "    ANERGIA ANGRIER EARNING EARRING ENGRAIN GEARING GRAINER GRANNIE NAGGIER NEARING RANGIER REARING REGINAE (7),\n",
      "    EARING GAINER REAGIN REGAIN REGINA (6)\n",
      " 5 words 34 points {AEGIRT}: AIGRETTE IRRIGATE (8), AIGRET GAITER TRIAGE (6)\n",
      "13 words 94 points {AEGNRT}: REGENERATE (10), GENERATE TEENAGER (8), GRANTEE GRANTER GREATEN NEGATER REAGENT\n",
      "    REGNANT REGRANT TANAGER (7), ARGENT GARNET (6)\n",
      "30 words 232 points {AEINRT}: ENTERTAINER (11), INTENERATE REINITIATE (10), ENTERTAIN ENTRAINER ITINERANT\n",
      "    ITINERATE (9), ATTAINER INERRANT INERTIAE REATTAIN RETAINER RETIRANT TRIENNIA (8), ARENITE ENTRAIN INERTIA\n",
      "    INTREAT ITERANT NATTIER NITRATE RETINAE RETRAIN TERRAIN TERTIAN TRAINEE TRAINER (7), RATINE RETAIN RETINA (6)\n",
      "21 words 167 points {AGINRT}: INGRATIATING (12), IRRIGATING IRRITATING (10), INTRIGANT NARRATING NITRATING\n",
      "    TITRATING (9), ATTIRING GRANTING TRAINING TRIAGING (8), AIRTING GRANITA GRATING RANTING RATTING TARRING\n",
      "    TARTING (7), GRATIN RATING TARING (6)\n",
      "26 words 218 points {EGINRT}: REINTERRING (11), REENTERING REGREETING REGRETTING REIGNITING TRIGGERING (10),\n",
      "    GETTERING INTERNING INTERRING RETINTING TEETERING TENTERING TITTERING (9), ENTERING GREETING REIGNITE RETIRING\n",
      "    (8), GITTERN IGNITER INTEGER RENTING RETTING RINGENT TIERING TREEING (7), ENGIRT (6)\n",
      "18 words 120 points {AEGNR}: GREENGAGE REARRANGE (9), ARRANGER GANGRENE REENGAGE (8), ARRANGE ENGAGER GRANGER (7),\n",
      "    ENRAGE GANGER GARNER GENERA GRANGE NAGGER RANGER (6), ANGER RANGE REGNA (5)\n",
      "19 words 123 points {AEGRT}: REAGGREGATE (11), AGGREGATE (9), RETARGET (8), ETAGERE GREATER REGATTA REGRATE (7),\n",
      "    ERGATE GARGET GARRET GARTER GRATER TAGGER TARGET (6), GRATE GREAT RETAG TARGE TERGA (5)\n",
      " 3 words 19 points {AEINR}: RAINIER (7), INANER NARINE (6)\n",
      "20 words 135 points {AEIRT}: REITERATE (9), IRRITATE RETIARII TERRARIA (8), ARIETTA ARIETTE ATTRITE ITERATE\n",
      "    RATTIER TARRIER TATTIER TEARIER TITRATE (7), ARTIER ATTIRE IRATER RATITE (6), IRATE RETIA TERAI (5)\n",
      "19 words 132 points {AENRT}: RETREATANT (10), REENTRANT (9), ANTEATER NARRATER RATTENER (8), ENTRANT ENTREAT\n",
      "    NARRATE RATTEEN TERNATE TERRANE (7), ENTERA ERRANT NATTER NEATER RANTER RATTEN TANNER (6), ANTRE (5)\n",
      "19 words 138 points {AGINR}: ARRAIGNING INGRAINING (10), ARRANGING (9), AGRARIAN GARAGING GNARRING GRAINING (8),\n",
      "    ANGARIA ARRAIGN GARRING INGRAIN RAGGING RAINING RANGING (7), AIRING RAGING RARING (6), GARNI GRAIN (5)\n",
      " 1 word   5 points {AGIRT}: TRAGI (5)\n",
      " 1 word   5 points {AGNRT}: GRANT (5)\n",
      " 9 words 64 points {AINRT}: TRINITARIAN (11), ANTIARIN IRRITANT (8), ANTIAIR INTRANT TITRANT (7), ANTIAR (6),\n",
      "    RIANT TRAIN (5)\n",
      "24 words 186 points {EGINR}: REENGINEERING (13), ENGINEERING (11), REENGINEER REGREENING (10), GINGERING RENIGGING\n",
      "    RERIGGING (9), ENGINEER GREENIER GREENING REIGNING RENEGING (8), GINNIER GREEING GREENIE GRINNER REINING (7),\n",
      "    ERRING GINGER GINNER NIGGER RINGER (6), REIGN RENIG (5)\n",
      " 4 words 27 points {EGIRT}: GRITTIER (8), TERGITE TRIGGER (7), TIGER (5)\n",
      " 2 words 12 points {EGNRT}: GERENT REGENT (6)\n",
      "29 words 190 points {EINRT}: INTERNEE INTERTIE RENITENT RETINENE RETINITE (8), INTERNE NETTIER NITERIE NITRITE\n",
      "    NITTIER REINTER RENTIER TEENIER TENTIER TERRINE TINNIER (7), ENTIRE INTERN RETINE RETINT TINIER TINNER TINTER\n",
      "    TRIENE (6), INERT INTER NITER NITRE TRINE (5)\n",
      " 6 words 43 points {GINRT}: GRITTING TRIGGING (8), GIRTING RINGGIT TRINING (7), TIRING (6)\n",
      "17 words 84 points {AEGR}: ARREARAGE (9), EAGERER (7), GAGGER GARAGE RAGGEE REGEAR REGGAE (6), AGGER AGREE EAGER\n",
      "    EAGRE EGGAR GAGER RAGEE (5), AGER GEAR RAGE (1)\n",
      " 4 words 22 points {AEIR}: AERIER AIRIER (6), AERIE AIRER (5)\n",
      " 9 words 40 points {AENR}: EARNER NEARER REEARN (6), ANEAR ARENA RANEE RERAN (5), EARN NEAR (1)\n",
      "24 words 127 points {AERT}: RETREATER (9), TARTRATE (8), RETREAT TREATER (7), AERATE ERRATA RATTER RETEAR TARTER\n",
      "    TATTER TEARER TERRAE (6), ARETE EATER RATER REATA TARRE TATER TERRA TETRA TREAT (5), RATE TARE TEAR (1)\n",
      " 2 words  6 points {AGIR}: AGRIA (5), RAGI (1)\n",
      " 5 words 13 points {AGNR}: GNARR GRANA (5), GNAR GRAN RANG (1)\n",
      " 3 words 13 points {AGRT}: RAGTAG TAGRAG (6), GRAT (1)\n",
      " 4 words  8 points {AINR}: NAIRA (5), AIRN RAIN RANI (1)\n",
      " 5 words 21 points {AIRT}: ATRIA RIATA TIARA TRAIT (5), AIRT (1)\n",
      "10 words 50 points {ANRT}: TANTARA TARTANA (7), ARRANT RATTAN TANTRA TARTAN (6), ANTRA RATAN (5), RANT TARN (1)\n",
      " 3 words 17 points {EGIR}: GREIGE RIGGER (6), RERIG (5)\n",
      " 6 words 37 points {EGNR}: GREENER REGREEN RENEGER (7), RENEGE (6), GENRE GREEN (5)\n",
      " 7 words 45 points {EGRT}: REGRETTER (9), GREETER REGREET (7), GETTER REGRET (6), EGRET GREET (5)\n",
      " 4 words 17 points {EINR}: RENNIN (6), INNER RENIN (5), REIN (1)\n",
      "17 words 87 points {EIRT}: TITTERER (8), RETIREE RETIRER TERRIER (7), RETIRE RITTER TERRIT TITTER TRITER (6),\n",
      "    RETIE TITER TITRE TRIER TRITE (5), RITE TIER TIRE (1)\n",
      "19 words 104 points {ENRT}: ENTERER REENTER TERREEN TERRENE (7), ENTREE ETERNE NETTER RENNET RENTER RETENE TEENER\n",
      "    TENNER TENTER (6), ENTER RENTE TERNE TREEN (5), RENT TERN (1)\n",
      " 9 words 44 points {GINR}: GRINNING (8), GIRNING RIGGING RINGING RINNING (7), IRING (5), GIRN GRIN RING (1)\n",
      " 3 words  3 points {GIRT}: GIRT GRIT TRIG (1)\n",
      " 7 words 25 points {AER}: ARREAR REARER (6), AREAE RARER (5), AREA RARE REAR (1)\n",
      " 2 words  2 points {AGR}: AGAR RAGA (1)\n",
      " 2 words  2 points {AIR}: ARIA RAIA (1)\n",
      " 5 words 24 points {ART}: RATATAT (7), TARTAR (6), ATTAR TATAR (5), TART (1)\n",
      " 4 words 15 points {EGR}: GREEGREE (8), EGGER (5), EGER GREE (1)\n",
      " 2 words 11 points {EIR}: EERIER (6), EERIE (5)\n",
      " 1 word   1 point  {ENR}: ERNE (1)\n",
      " 7 words 27 points {ERT}: TEETER TERETE TERRET TETTER (6), RETE TREE TRET (1)\n",
      " 2 words  7 points {GIR}: GRIGRI (6), GRIG (1)\n"
     ]
    }
   ],
   "source": [
    "report()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 'S' Words\n",
    "\n",
    "What if we allowed honeycombs to have an 'S' in them? I'll make a new word list that doesn't exclude the 'S'-words, and report on it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top Honeycomb(letters='AEINRST', center='E'): 1,179 words 8,681 points from a 98,141 word list:\n",
      "\n",
      "86 pangrams 1,381 points {AEINRST}: ENTERTAINERS INTERSTRAINS STRAITNESSES (19), INTENERATES INTERSTATES\n",
      "    INTERSTRAIN IRATENESSES ITINERARIES REINITIATES RESTRAINERS TRISTEARINS (18), ANTISTRESS ARTINESSES ENTERTAINS\n",
      "    ENTRAINERS ENTREATIES ERRANTRIES INTERSTATE INTRASTATE ITINERANTS ITINERATES REINSTATES RESISTANTS RESTRAINER\n",
      "    RESTRAINTS SANITARIES STANNARIES STRAITNESS TANISTRIES TEARSTAINS TENANTRIES TRANSIENTS TRISTEARIN (17),\n",
      "    ARSENITES ATTAINERS INSTANTER IRATENESS REATTAINS REINSTATE RESINATES RESISTANT RESTRAINS RESTRAINT RETAINERS\n",
      "    RETIRANTS SEATRAINS STEARINES STRAINERS STRAITENS TANNERIES TEARSTAIN TERNARIES TRANSIENT (16), ANTISERA\n",
      "    ARENITES ARSENITE ARTINESS ENTRAINS INERTIAS INTREATS NITRATES RAINIEST RATANIES RESINATE RESTRAIN RETRAINS\n",
      "    RETSINAS SEATRAIN STAINERS STEARINE STEARINS STRAINER STRAITEN TERRAINS TERTIANS TRAINEES TRAINERS (15),\n",
      "    ANESTRI ANTSIER NASTIER RATINES RETAINS RETINAS RETSINA STAINER STEARIN (14)\n",
      "16 words 124 points {AEINRS}: AIRINESSES (10), ANSERINES INSNARERS SIRENIANS (9), AIRINESS ANSERINE INSNARER\n",
      "    INSNARES SIRENIAN (8), ARSINES INSANER INSNARE SENARII SIERRAN (7), ARISEN ARSINE (6)\n",
      "30 words 232 points {AEINRT}: ENTERTAINER (11), INTENERATE REINITIATE (10), ENTERTAIN ENTRAINER ITINERANT\n",
      "    ITINERATE (9), ATTAINER INERRANT INERTIAE REATTAIN RETAINER RETIRANT TRIENNIA (8), ARENITE ENTRAIN INERTIA\n",
      "    INTREAT ITERANT NATTIER NITRATE RETINAE RETRAIN TERRAIN TERTIAN TRAINEE TRAINER (7), RATINE RETAIN RETINA (6)\n",
      "80 words 713 points {AEINST}: INSATIATENESSES INSTANTANEITIES (15), INSATIATENESS INSTANTNESSES (13), ASSASSINATES\n",
      "    INNATENESSES INSTANTIATES (12), ASININITIES ASSASSINATE INSTANTIATE INSTANTNESS NASTINESSES NATTINESSES\n",
      "    TASTINESSES TATTINESSES (11), INNATENESS INSANITIES INTESTATES SENTENTIAE TITANESSES (10), ANISETTES ANTISENSE\n",
      "    ANTISTATE ASTATINES INANITIES INITIATES INSATIATE INSENSATE INTESTATE NASTINESS NATTINESS SANITATES SANITISES\n",
      "    SENTENTIA STANNITES TAENIASES TAENIASIS TASTINESS TATTINESS TETANISES TITANATES TITANITES (9), ANISETTE\n",
      "    ANTSIEST ASTATINE ENTASIAS ETESIANS INSANEST INSTATES ISATINES NASTIEST NATTIEST SANITATE SANITIES SANITISE\n",
      "    SATINETS SESTINAS STANINES STANNITE TENIASES TENIASIS TETANIES TETANISE TITANESS (8), ENTASIA ENTASIS ETESIAN\n",
      "    INANEST INSTATE ISATINE NASTIES SATINET SESTINA STANINE TAENIAS TANSIES TISANES (7), TENIAS TINEAS TISANE (6)\n",
      "60 words 473 points {AEIRST}: TRAITRESSES (11), ARTISTRIES REITERATES TERTIARIES (10), ARTERITIS ASSISTERS\n",
      "    IRRITATES SATIRISES SESTERTIA STARRIEST STRAITEST TRAITRESS TREATISES (9), ARIETTAS ARIETTES ARISTATE ARTERIES\n",
      "    ARTISTES ARTSIEST ASSISTER ASTERIAS ATRESIAS EATERIES ITERATES RARITIES RATTIEST SATIRISE SERIATES STARRIER\n",
      "    STRAITER STRIATES TARRIERS TARRIEST TARSIERS TEARIEST TITRATES TREATIES TREATISE TRISTATE (8), AERIEST AIRIEST\n",
      "    ARISTAE ARTIEST ARTISTE ARTSIER ASTERIA ATRESIA ATTIRES IRATEST RATITES SATIRES SERIATE STRIATE TARRIES\n",
      "    TARSIER TASTIER (7), AIREST SATIRE STRIAE TERAIS (6)\n",
      "40 words 336 points {AENRST}: EARNESTNESSES (13), EARNESTNESS RETREATANTS (11), ARRESTANTS EASTERNERS REENTRANTS\n",
      "    TARANTASES TARTNESSES (10), ANTEATERS ARRESTANT ARSENATES ASSENTERS EASTERNER NARRATERS RATTENERS SARSENETS\n",
      "    SERENATAS (9), ARSENATE ASSENTER EARNESTS ENTRANTS ENTREATS NARRATES RATTEENS SARSENET SERENATA SERENATE\n",
      "    TARTNESS TERRANES (8), EARNEST EASTERN ERRANTS NATTERS NEAREST RANTERS RATTENS TANNERS (7), ANTRES ASTERN\n",
      "    STERNA (6)\n",
      "70 words 582 points {EINRST}: SINISTERNESSES (14), ENTIRENESSES SINISTERNESS (12), ENTERITISES INERTNESSES\n",
      "    TRITENESSES (11), ENTIRENESS ENTIRETIES ETERNITIES INTERNISTS SERENITIES (10), ENTERITIS ETERNISES INERTNESS\n",
      "    INSERTERS INSETTERS INSISTERS INTERESTS INTERNEES INTERNIST INTERTIES REENTRIES REINSERTS RETINENES RETINITES\n",
      "    RETINITIS STERNITES TEENTSIER TRINITIES TRITENESS (9), ETERNISE INSERTER INSETTER INSISTER INTENSER INTEREST\n",
      "    INTERNES NITERIES NITRITES REINSERT REINTERS RENTIERS SENTRIES SINISTER STERNITE STINTERS TEENSIER TERRINES\n",
      "    TRIENTES (8), ENTIRES ENTRIES ESTRINS INSERTS INTERNS RETINES RETINTS SINTERS STINTER TINNERS TINTERS TRIENES\n",
      "    (7), ESTRIN INERTS INSERT INTERS NITERS NITRES SINTER TRIENS TRINES (6)\n",
      " 3 words  19 points {AEINR}: RAINIER (7), INANER NARINE (6)\n",
      "17 words 129 points {AEINS}: INSANENESSES (12), INANENESSES (11), EASINESSES INSANENESS (10), INANENESS (9),\n",
      "    EASINESS (8), ASININE NANNIES SANSEIS SIENNAS (7), ANISES INANES INSANE SANIES SANSEI SIENNA (6), ANISE (5)\n",
      "10 words  64 points {AEINT}: INITIATE TITANATE TITANITE (8), TAENIAE (7), INNATE TAENIA TENIAE (6), ENTIA TENIA\n",
      "    TINEA (5)\n",
      "17 words 106 points {AEIRS}: RERAISES (8), ARRISES RAISERS RERAISE SASSIER SIERRAS (7), AERIES AIRERS ARISES\n",
      "    EASIER RAISER RAISES SERAIS SIERRA (6), ARISE RAISE SERAI (5)\n",
      "20 words 135 points {AEIRT}: REITERATE (9), IRRITATE RETIARII TERRARIA (8), ARIETTA ARIETTE ATTRITE ITERATE\n",
      "    RATTIER TARRIER TATTIER TEARIER TITRATE (7), ARTIER ATTIRE IRATER RATITE (6), IRATE RETIA TERAI (5)\n",
      "15 words 112 points {AEIST}: SATIETIES STEATITES (9), SASSIEST SATIATES STEATITE TASTIEST TATTIEST (8), EASIEST\n",
      "    ETATIST SATIATE SIESTAS TASSIES TATTIES (7), SIESTA TASSIE (6)\n",
      "25 words 172 points {AENRS}: NEARNESSES RARENESSES (10), ENSNARERS (9), ENSNARER ENSNARES NEARNESS RARENESS\n",
      "    RENNASES (8), EARNERS ENSNARE REEARNS RENNASE SARSENS SNARERS (7), ANEARS ARENAS RANEES SARSEN SNARER SNARES\n",
      "    (6), EARNS NARES NEARS SANER SNARE (5)\n",
      "19 words 132 points {AENRT}: RETREATANT (10), REENTRANT (9), ANTEATER NARRATER RATTENER (8), ENTRANT ENTREAT\n",
      "    NARRATE RATTEEN TERNATE TERRANE (7), ENTERA ERRANT NATTER NEATER RANTER RATTEN TANNER (6), ANTRE (5)\n",
      "32 words 217 points {AENST}: NEATNESSES (10), SETENANTS (9), ANATASES ANTENNAS NEATNESS SENSATES SETENANT TANNATES\n",
      "    (8), ANATASE ANNATES ASSENTS ENTASES NEATENS NEATEST SATEENS SENATES SENSATE TANNEST TENANTS (7), ANENST\n",
      "    ANSATE ASSENT ENATES SANEST SATEEN SENATE STANES (6), ANTES ETNAS NATES NEATS STANE (5)\n",
      "85 words 604 points {AERST}: RETREATERS (10), ARRESTEES ARRESTERS ASSERTERS ATTESTERS ESTERASES REARRESTS\n",
      "    REASSERTS STEARATES TARTRATES (9), ARRESTEE ARRESTER ASSERTER ATTESTER ESTERASE ESTREATS REARREST REASSERT\n",
      "    RESTARTS RESTATES RETASTES RETREATS SERRATES STARTERS STEARATE STRASSES STRETTAS TERRASES TESSERAE TREATERS\n",
      "    (8), AERATES ARRESTS ASSERTS EASTERS ERRATAS ESTREAT RASTERS RATTERS RESEATS RESTART RESTATE RETASTE RETEARS\n",
      "    SEAREST SEATERS SERRATE STARERS STARETS STARTER STATERS STRETTA TARTEST TASTERS TATTERS TEARERS TEASERS\n",
      "    TESSERA TRASSES (7), ARETES ARREST ASSERT ASTERS EASTER EATERS RAREST RASTER RATERS REATAS RESEAT SEATER\n",
      "    STARER STARES STATER TARRES TASTER TATERS TEASER TERRAS TETRAS TREATS (6), ASTER RATES STARE TARES TEARS (5)\n",
      "29 words 184 points {EINRS}: EERINESSES (10), EERINESS ESERINES (8), ESERINE RENNINS RERISEN RINSERS SEINERS\n",
      "    SEREINS SERINES SINNERS (7), INNERS NEREIS RENINS RESINS RINSER RINSES SEINER SEREIN SERINE SERINS SINNER\n",
      "    SIRENS (6), REINS RESIN RINSE RISEN SERIN SIREN (5)\n",
      "29 words 190 points {EINRT}: INTERNEE INTERTIE RENITENT RETINENE RETINITE (8), INTERNE NETTIER NITERIE NITRITE\n",
      "    NITTIER REINTER RENTIER TEENIER TENTIER TERRINE TINNIER (7), ENTIRE INTERN RETINE RETINT TINIER TINNER TINTER\n",
      "    TRIENE (6), INERT INTER NITER NITRE TRINE (5)\n",
      "58 words 469 points {EINST}: INTENSENESSES (13), INTENTNESSES (12), INTENSENESS INTENSITIES TESTINESSES\n",
      "    TINNINESSES (11), INSENTIENT INTENTNESS INTESTINES SENSITISES TEENTSIEST TININESSES (10), EINSTEINS INSISTENT\n",
      "    INTENSEST INTESTINE NINETEENS SENSITISE SENTIENTS TEENSIEST TENSITIES TESTINESS TINNINESS (9), EINSTEIN\n",
      "    ENTITIES NETTIEST NINETIES NITTIEST SENTIENT SESTINES SIENITES TEENIEST TENNISES TENNISTS TENTIEST TININESS\n",
      "    TINNIEST (8), INTENSE INTENTS INTINES SENNITS SESTINE SIENITE TENNIES TENNIST TINIEST (7), INSETS SENITI\n",
      "    SENNIT SITTEN STEINS TENNIS (6), INSET NEIST NITES SENTI STEIN TINES (5)\n",
      "38 words 262 points {EIRST}: RESISTERS TITTERERS TRESSIEST (9), IRITISES RESISTER RETIREES RETIRERS STIRRERS\n",
      "    TERRIERS TRESSIER (8), EERIEST RESISTS RESITES RETIRES RETRIES RITTERS SISTERS SITTERS STIRRER STRETTI TERRIES\n",
      "    TERRITS TESTIER TITTERS TRITEST (7), RESIST RESITE RETIES SISTER SITTER TITERS TITRES TRIERS TRISTE (6), RITES\n",
      "    TIERS TIRES TRIES (5)\n",
      "35 words 246 points {ENRST}: STERNNESSES TERSENESSES (11), STERNNESS TERSENESS (9), ENTERERS REENTERS SERENEST\n",
      "    STERNEST TERREENS TERRENES (8), ENTREES NESTERS NETTERS RENESTS RENNETS RENTERS RESENTS RETENES STERNER\n",
      "    TEENERS TENNERS TENTERS (7), ENTERS NESTER RENEST RENTES RESENT STERNS TENSER TERNES TREENS (6), NERTS RENTS\n",
      "    STERN TERNS (5)\n",
      " 2 words  11 points {AEIN}: NANNIE (6), INANE (5)\n",
      " 4 words  22 points {AEIR}: AERIER AIRIER (6), AERIE AIRER (5)\n",
      " 2 words  13 points {AEIS}: SASSIES (7), EASIES (6)\n",
      " 1 word    6 points {AEIT}: TATTIE (6)\n",
      " 9 words  40 points {AENR}: EARNER NEARER REEARN (6), ANEAR ARENA RANEE RERAN (5), EARN NEAR (1)\n",
      " 9 words  46 points {AENS}: SANENESSES (10), SANENESS (8), SENNAS (6), ANSAE SANES SENNA SENSA (5), ANES SANE (1)\n",
      "13 words  63 points {AENT}: ANTENNAE (8), ANTENNA TANNATE (7), ATTENT NEATEN TENANT (6), ANENT ANTAE EATEN ENATE\n",
      "    (5), ANTE ETNA NEAT (1)\n",
      "26 words 121 points {AERS}: REASSESSES (10), REASSESS (8), ARREARS ERASERS REARERS (7), ERASER ERASES RASERS\n",
      "    SAREES SEARER (6), AREAS ARSES ERASE RARES RASER RASES REARS SAREE SEARS (5), ARES ARSE EARS ERAS RASE SEAR\n",
      "    SERA (1)\n",
      "24 words 127 points {AERT}: RETREATER (9), TARTRATE (8), RETREAT TREATER (7), AERATE ERRATA RATTER RETEAR TARTER\n",
      "    TATTER TEARER TERRAE (6), ARETE EATER RATER REATA TARRE TATER TERRA TETRA TREAT (5), RATE TARE TEAR (1)\n",
      "35 words 164 points {AEST}: TESTATES (8), ATTESTS ESTATES TASSETS TESTATE (7), ASSETS ATTEST ESTATE STASES STATES\n",
      "    TASSES TASSET TASTES TEASES TESTAE (6), ASSET EASTS SATES SEATS SETAE STATE TASSE TASTE TATES TEASE TEATS\n",
      "    TESTA (5), ATES EAST EATS ETAS SATE SEAT SETA TEAS (1)\n",
      " 4 words  17 points {EINR}: RENNIN (6), INNER RENIN (5), REIN (1)\n",
      "10 words  53 points {EINS}: NINNIES SEISINS (7), NISEIS SEINES SEISIN (6), NINES NISEI SEINE SINES (5), SINE (1)\n",
      " 6 words  28 points {EINT}: NINETEEN (8), INTENT INTINE TENTIE (6), NITE TINE (1)\n",
      "20 words 101 points {EIRS}: RERISES SEISERS SERRIES SIRREES SISSIER (7), IRISES RERISE RISERS SEISER SERIES SIREES\n",
      "    SIRREE (6), RISER RISES SIREE SIRES (5), IRES REIS RISE SIRE (1)\n",
      "17 words  87 points {EIRT}: TITTERER (8), RETIREE RETIRER TERRIER (7), RETIRE RITTER TERRIT TITTER TRITER (6),\n",
      "    RETIE TITER TITRE TRIER TRITE (5), RITE TIER TIRE (1)\n",
      " 8 words  41 points {EIST}: SISSIEST TESTIEST (8), TITTIES (7), TESTIS (6), SITES STIES (5), SITE TIES (1)\n",
      "12 words  80 points {ENRS}: SERENENESSES (12), SERENENESS (10), SNEERERS (8), SERENER SERENES SNEERER (7), RESEEN\n",
      "    SERENE SNEERS (6), ERNES SNEER (5), ERNS (1)\n",
      "19 words 104 points {ENRT}: ENTERER REENTER TERREEN TERRENE (7), ENTREE ETERNE NETTER RENNET RENTER RETENE TEENER\n",
      "    TENNER TENTER (6), ENTER RENTE TERNE TREEN (5), RENT TERN (1)\n",
      "18 words  94 points {ENST}: TENSENESSES (11), TENSENESS (9), ENTENTES (8), SENNETS TENSEST (7), SENNET TENETS\n",
      "    TENSES (6), NESTS NETTS SENTE TEENS TENSE TENTS (5), NEST NETS SENT TENS (1)\n",
      "44 words 266 points {ERST}: RESTRESSES (10), RESETTERS (9), RESETTER RESTRESS STEERERS STRESSES (8), RESTERS\n",
      "    RETESTS SETTERS STEERER STREETS STRETTE TEETERS TERRETS TERSEST TESTERS TETTERS TRESSES (7), ESTERS REESTS\n",
      "    RESETS RESTER RETEST SEREST SETTER STEERS STERES STREET STRESS TERSER TESTER (6), ESTER REEST RESET RESTS\n",
      "    STEER STERE TERSE TREES TRESS TRETS (5), ERST REST RETS (1)\n",
      " 7 words  25 points {AER}: ARREAR REARER (6), AREAE RARER (5), AREA RARE REAR (1)\n",
      " 8 words  33 points {AES}: ASSESSES (8), ASSESS SASSES (6), ASSES EASES (5), ASEA EASE SEAS (1)\n",
      " 2 words   2 points {AET}: TATE TEAT (1)\n",
      " 1 word    1 point  {EIN}: NINE (1)\n",
      " 2 words  11 points {EIR}: EERIER (6), EERIE (5)\n",
      " 7 words  35 points {EIS}: SISSIES (7), ISSEIS SEISES (6), ISSEI SEISE SISES (5), SEIS (1)\n",
      " 1 word    6 points {EIT}: TITTIE (6)\n",
      " 1 word    1 point  {ENR}: ERNE (1)\n",
      " 6 words  20 points {ENS}: NESSES SENSES (6), SENSE (5), NESS SEEN SENE (1)\n",
      " 5 words  15 points {ENT}: ENTENTE (7), TENET (5), NETT TEEN TENT (1)\n",
      "13 words  52 points {ERS}: SEERESSES (9), SEERESS (7), RESEES (6), ERSES RESEE SEERS SERER SERES (5), ERRS REES\n",
      "    SEER SERE SERS (1)\n",
      " 7 words  27 points {ERT}: TEETER TERETE TERRET TETTER (6), RETE TREE TRET (1)\n",
      "18 words  79 points {EST}: SESTETS SETTEES TESTEES TSETSES (7), SESTET SETTEE TESTEE TESTES TSETSE (6), SETTS\n",
      "    STETS TESTS (5), SETS SETT STET TEES TEST TETS (1)\n",
      " 1 word    1 point  {EN}: NENE (1)\n",
      " 3 words   7 points {ES}: ESSES (5), ESES SEES (1)\n"
     ]
    }
   ],
   "source": [
    "valid_letters.add('S') # Make 'S' a legal letter\n",
    "\n",
    "big_s = word_list(open(file).read())\n",
    "\n",
    "report(words=big_s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Allowing 'S' more than doubles the length of the word list and the score of the top honeycomb!\n",
    "\n",
    "Here are the highest-scoring honeycombs (with and without an S) with their stats and a pangram to remember them by:\n",
    "\n",
    "<img src=\"http://norvig.com/honeycombs.png\" width=\"350\">\n",
    "<center>\n",
    "   537 words &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1,179 words \n",
    "    <br>50 pangrams  &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 86 pangrams\n",
    "    <br>3,898 points &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 8,681 points\n",
    "    <br> &nbsp;  &nbsp; RETAINING &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ENTERTAINERS\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summary\n",
    "\n",
    "This notebook showed how to find the highest-scoring honeycomb, starting from a brute-force baseline approach (which we didn't actually run) and modifying the approach with three key improvements:\n",
    "\n",
    "1. **Brute Force Enumeration**: Compute total score for every honeycomb; return the highest-scoring.\n",
    "2. **Pangram Lettersets**: Compute total score for  honeycombs that are pangram lettersets (with all 7 centers).\n",
    "3. **Points Table**: Precompute score for each letterset; for each candidate honeycomb, sum  63 letter subset scores.\n",
    "4. **Branch and Bound**: Try all 7 centers only for lettersets that score better than the top score so far.\n",
    "\n",
    "The key ideas paid off in efficiency improvements:\n",
    "\n",
    "\n",
    "|Approach|Honeycombs|Reduction|`total_score` Time|Speedup|Overall  Time|Overall Speedup|\n",
    "|--------|----------|--------|----|---|---|---|\n",
    "|1. **Brute Force Enumeration**|3,364,900|——|9000 microseconds|——|8.5 hours (est.)|——|\n",
    "|2. **Pangram Lettersets**|55,902|60×|9000 microseconds|——|500 sec (est.)|60×|\n",
    "|3. **Points Table**|55,902|——|26 microseconds|300×|1.6 seconds|20,000×|\n",
    "|4. **Branch and Bound**|8,084 |7×|26 microseconds|——|0.35 seconds|90,000×|\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}