{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Peter Norvig
January 2020
\n", "\n", "# Boggle\n", "\n", "![Boggle Board](https://www.puzzle-words.com/art/og/puzzle-words.png)\n", "\n", "The word game **Boggle** ([rules here](https://howdoyouplayit.com/boggle-rules-play-boggle/)) is a fun game to play with friends, but not so fun to play against a computer. It is all too simple for a computer to find *all* the valid words in a Boggle board (such as the one shown above) in less than a second. We'll see how to do that as a straightforward programming exercise. Then we'll consider a more challenging problem:\n", "\n", "# Inverse Boggle\n", "\n", "The goal of Inverse Boggle is to find an arrangement of letters on a board that scores the most points. It is called Inverse Boggle because, instead of \"give me a board and I'll find the words\" it is \"give me the word list and I'll find a board with lots of words.\" \n", "\n", "It is not feasible to try all possible boards ($26^{5 \\times 5} \\approx 10^{35}$ possible $5 \\times 5$ boards), so we will use **hill-climbing**, a heuristic search technique that does not guarantee finding an optimal solution.\n", "\n", "# The Boggle Board\n", "\n", "I'll represent a board as a dict where the keys are `(x, y)` coordinates of squares, and the values are individual letters (except that `QU` occupies a single square). A board will also hold:\n", "- `board.squares`: a list of squares in row-major order.\n", "- `board.neighbors`: a dict of `{square: neighbors}`. \n", "- `board.format`: a string format method to print the board nicely using `__repr__`.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import random\n", "import itertools\n", "\n", "class Board(dict):\n", " \"\"\"A board is a dict of {(x, y): L}.\"\"\"\n", " def __init__(self, letters=None):\n", " letters = [('QU' if L == 'Q' else L) for L in letters if 'A' <= L <= 'Z']\n", " n = int(len(letters) ** 0.5)\n", " self.squares = [(x, y) for y in range(n) for x in range(n)]\n", " self.neighbors = {s: neighbors(s, n) for s in self.squares}\n", " self.format = ('\\n'.join(['{:2s}' * n] * n)).format\n", " self.update(zip(self.squares, letters))\n", " \n", " def __repr__(self): \n", " return self.format(*(self[s].capitalize() for s in self.squares))\n", " \n", "def neighbors(square, n) -> list:\n", " \"\"\"All the squares that neighbor a square.\"\"\"\n", " (x, y) = square\n", " return [(x+dx, y+dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)\n", " if 0 <= x+dx < n and 0 <= y+dy < n and not (dx == dy == 0)]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "P U Z Z L \n", "W O R D E \n", "B O G G L \n", "S E A R C \n", "F I N D H " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "board = Board('PUZZL WORDE BOGGL SEARC FINDH')\n", "board" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The word list and scoring\n", "\n", "We'll read in a word list, convert to uppercase, and keep all the words that are three letters or more. The variable `WORDS` holds the set of valid words. The function `total_score` computes the total score of a set of words. It does not take a board as an argument, and so it does not check if the words can actually be made on the board. It does check that the words are in the `WORDS` list of valid words." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 172820 enable1.txt\r\n" ] } ], "source": [ "! [ -e enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt\n", "! wc -w enable1.txt" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "172724" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def total_score(found, scores=[0, 0, 0, 1, 1, 2, 3, 5] + [11] * 99) -> int:\n", " \"\"\"The total score for the words found, according to the rules.\"\"\"\n", " return sum([scores[len(w)] for w in found & WORDS])\n", "\n", "WORDS = {w for w in open('enable1.txt').read().upper().split() if len(w) >= 3}\n", "len(WORDS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding all the words in a board\n", "\n", "The strategy for finding words:\n", "- A word could start at any of the squares on the board, so consider each one.\n", "- At each square, form a **path** consisting of just that square (for example, the top left square forms the path `[(0, 0)]`) and a **prefix string** consisting of the letters visited in the path; for this one-square path on `board` (the one with `PUZZLE` in it), the prefix string would be `'P'`. On a 5 by 5 board we would have 25 initial paths of length one.\n", "- Now continue each path:\n", " - If the prefix string is a word, record it: add it to the set `found`.\n", " - If the prefix string is a prefix of some word in the dictionary, continue the path by visiting each neighboring square that has not already been visited in the path. For example, in `board`, the path `[(0, 0)]` with prefix `'P'` would be continued to three neighbors:\n", " - `[(0, 0), (1, 0)]`, `'PU'`: here `'PU'` is a prefix of a word so continue this path.\n", " - `[(0, 0), (0, 1)]`, `'PW'`: here `'PW'` is not a prefix of any word so do **not** continue this path.\n", " - `[(0, 0), (1, 1)]`, `'PO'`: here `'PO'` is a prefix of a word so continue this path.\n", " - One continuation of `'PO'` is as follows:\n", " - `[(0, 0), (1, 1), (0, 1)]`, `'POW'`: here `'POW'` is both a word and a prefix, so record it and continue.\n", "- We can precompute the set of all `PREFIXES` of all words in the word list." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def word_prefixes(words) -> set:\n", " \"The set of all non-empty, non-full-word prefixes of each word in a word list.\"\n", " return {word[:i] for word in words for i in range(1, len(word))}\n", "\n", "PREFIXES = word_prefixes(WORDS) # Precompute this once.\n", "\n", "def find_words(board, words=WORDS, prefixes=PREFIXES):\n", " \"\"\"Find all words in a Boggle board by recursively continuing paths that are prefixes of words.\"\"\"\n", " found = set() # Accumulate the words found in the set `found`\n", " def continue_path(path, prefix):\n", " if prefix in words:\n", " found.add(prefix)\n", " if prefix in prefixes:\n", " for s in board.neighbors[path[-1]]:\n", " if s not in path:\n", " continue_path(path + [s], prefix + board[s])\n", " # Body of find_words: Start paths from each of the squares on the board\n", " for s in board.squares:\n", " continue_path([s], board[s])\n", " return found" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many words can we find? And how long does it take?" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 4.39 ms, sys: 43 µs, total: 4.43 ms\n", "Wall time: 4.5 ms\n" ] }, { "data": { "text/plain": [ "252" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time len(find_words(board))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can we see the words and summarize them? " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "252 words found:\n", "AGE AGED AGES AGGRO AGGROS AGO AIN AIS AND ANE ANES ANI ANIS ANISE ARC ARCH ARGLE ARGLED\n", "BEAD BEAGLE BEAN BEAR BEARD BEG BEGAN BEGGAR BEGGED BEGROAN BEN BEND BOA BOAR BOARD BOG\n", "BOGAN BOGGED BOGGLE BOGGLED BOO BOOR BOOS BOP BORDEL BOS BOURG BOW CRAG CRAGGED CRANE\n", "CRANES DAG DAGGLE DAGGLED DAGO DAGOES DAGOS DAIS DARN DEGAGE DEL DRAG DRAGGED DRAGGLE\n", "DRAGGLED DRAIN DROOP DROP EAGLE EAR EARL EARN EDGE EDGES EFS EGAD EGG EGGAR EGGED EGO EGOS\n", "ELD END ENDARCH ENRAGE ENRAGED EOSIN FEAR FEN FENAGLE FENAGLED FEND FIAR FIE FIEND FIN\n", "FINAGLE FINAGLED FIND FINE FINES GAD GAE GAEN GAES GAG GAGE GAGED GAGES GAIN GAN GANE\n", "GANEF GANEFS GAR GARGLE GARGLED GARNI GEAR GED GEL GELD GEN GLED GOA GOAD GOB GOBO GOBOES\n", "GOBOS GOBS GOES GOO GOOP GOOS GOOSE GOR GORGE GORGED GOURD GOURDE GRAD GRAIN GRAN GRAND\n", "GROAN GROG GROUP GROW IFS INARCH LED LEDGE LEDGES LEG LEZ NAE NAG NAGGED NAIF NAIFS NAOS\n", "NARC NARD NEAR NEB NEBS NEIF NEIFS OAR OBE OBES OBOE OBOES OES ORGAN ORGANISE ORZO OSE OUR\n", "POOR POROSE POUR POW PUR PURGE PURGED PURGES PUZZLE PUZZLED RAD RAG RAGE RAGED RAGES\n", "RAGGED RAGGLE RAIN RAISE RAN RAND RANI RANIS ROAD ROAN ROAR ROB ROBE ROBES ROBS ROE ROES\n", "ROOSE ROSE ROSIN ROUP ROW SEA SEAR SEARCH SEG SEGGAR SEGO SEI SEIF SEN SEND SIN SINE SOAR\n", "SOB SOGGED SORD SORGO SOW UPO URD URGE URGED URGES WOAD WOE WOES WOG WOO WOOS WOP WORD WOS\n", "WURZEL ZED ZOO ZOOS\n", "Score of 449 for 252 words on board:\n", "P U Z Z L \n", "W O R D E \n", "B O G G L \n", "S E A R C \n", "F I N D H \n" ] } ], "source": [ "import textwrap\n", "\n", "def report(board, verbose=True):\n", " found = find_words(board, WORDS, PREFIXES)\n", " if verbose:\n", " print(f'{len(found)} words found:') \n", " print(textwrap.fill(' '.join(sorted(found)), width=90))\n", " print(f'Score of {total_score(found)} for {len(found)} words on board:\\n{board}')\n", " \n", "report(board)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test that the `Q` works, and that a 4x4 board works:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "125 words found:\n", "ANE ANES ANEW ANSWER ANSWERS AWE AWES AWN AWNS AWRY ENOSIS ENS EON EONS ERS ESES ESS ESSES\n", "ION IONS ITS NAE NAW NEIST NENE NEON NEONS NESS NESSES NEST NESTS NEW NEWNESS NEWS NOES\n", "NOESIS NOISE NOISES NOS NOSE NOSES NOSIER OES ONE ONENESS ONERY ONES ONS OSE OSES OSIER\n", "OSIERS QUA QUEAN QUEANS QUEST QUESTION QUESTIONER QUESTIONERS QUESTIONS QUESTS REI REIS\n", "RENEST RENESTS RES RESIST REST RESTS REWAN REWAX SEA SEI SEIS SEISE SEISES SEN SENE SENSE\n", "SENSES SER SERS SESSION SEW SEWAN SEWANS SEWN SEWS SIS SISES SIT SITS SNAW SNAWS SON SONE\n", "SONES SONS SOS STIES SWAN SWANS TIE TIER TIERS TIES TIS WAE WAENESS WAES WAN WANE WANES\n", "WANS WAX WAXY WEN WENS WEST WESTS WREN WRENS WREST WRESTS WRY\n", "Score of 245 for 125 words on board:\n", "QuE S T \n", "A N S I \n", "X W E O \n", "Y R S N \n" ] } ], "source": [ "report(Board('QEST ANSI XWEO YRSN'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Inverse Boggle\n", "\n", "I'll tackle the Inverse Boggle problem with a **hill-climbing** approach:\n", "- Start with some board.\n", "- Make a random change to some letter on the board.\n", "- Evaluate the score of the new board; if it is the best score so far, keep it.\n", "- If not, revert the change.\n", "- Repeat for a set number of iterations" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def boggle_hill_climbing(board, words=WORDS, prefixes=PREFIXES, repeat=500):\n", " \"\"\"Solve Inverse Boggle by hill-climbing: find a high-scoring board by\n", " starting with one and changing it.\"\"\"\n", " board = Board(board[s] for s in board.squares) # Copy board, so we don't mutate original\n", " best_score = total_score(find_words(board, words, prefixes))\n", " for _ in range(repeat):\n", " s, old_letter = mutate_boggle(board)\n", " new_score = total_score(find_words(board, words, prefixes))\n", " if new_score >= best_score:\n", " best_score = new_score\n", " else:\n", " board[s] = old_letter # Change back\n", " return board\n", "\n", "# The distribution of letters in the 16-cube version of the game\n", "LETTERS = 3 * 'AABCDEEEGHIILMNOOPRSTUY' + 'AADEFFIJKKLLNNQRSSTTUVVWWXZ'\n", "\n", "def mutate_boggle(board, letters=LETTERS):\n", " \"\"\"Make a random change to one letter in the board\"\"\"\n", " s = random.choice(board.squares)\n", " old_letter = board[s]\n", " board[s] = random.choice(letters)\n", " return s, old_letter\n", "\n", "def random_board(n=5, letters=LETTERS) -> Board:\n", " \"\"\"Return a random Boggle board.\"\"\"\n", " return Board(random.sample(letters, n * n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate a random board and see if we can improve it:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "177 words found:\n", "ABIOTIC ABO ALE ALOOF ALT ALTO ATT ATTIC AUK BABE BABOO BABOOL BABU BAKE BAL BALE BALK BAT\n", "BATT BAUBEE BEE BEEP BIO BIOTIC BIPOD BOA BOAT BOD BOLA BOLE BOLT BOO BOOT BOP BOT BOTA\n", "BOTT BOTTLE BOY BUB BUBAL BUBALE BUBO BUOY CITOLA CITOLE DIOL DIT DITA DITTO DOPY DOT\n", "DOTTLE EAT EAU ELK FIB FOB FOOL FOOT FOOTBOY FOOTLE FOP HEP HIP HOB HOBO HOE HOOD HOOF\n", "HOOP HOOT HOP HOPE HOY HYP HYPO IODIC IOTA KAB KAE KALE KAT KEA KELOID KLOOF KUE LAB LAKE\n", "LAT LATI LEA LEAK LEK LEKU LOB LOBO LOO LOOF LOOP LOOPY LOOT LOT LOTA LOTI LOTIC LOTTO\n", "LOUP LOUPE OAK OAT OBI OBOE OBOL OBOLE ODIC OFT OLE OLEA OOH OOT OOTID OPE OTIC OTTO OUPH\n", "OUPHE PEE PEH PHI PIBAL POD POH POI POOD POOF POOH POOL POOP POT POTBOY POTTLE POTTO PUB\n", "PUKE TAB TABOO TABU TAE TAEL TAKE TALE TALK TAO TAU TAUPE TIC TIT TITLE TOD TOIT TOLA TOLE\n", "TOO TOOL TOOT TOOTLE TOP TOPI TOT TOTAL TOUPEE UKE UPO YIP YOB YOU\n", "Score of 234 for 177 words on board:\n", "C I T L E \n", "D T O A K \n", "F O B U B \n", "P I O P E \n", "I Y H E U \n" ] } ], "source": [ "board2 = random_board(5)\n", "report(board2)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Score of 5782 for 1577 words on board:\n", "V I S E N \n", "D E T T S \n", "R A L A R \n", "S I M E C \n", "E D N S H \n" ] } ], "source": [ "report(boggle_hill_climbing(board2), False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Impressive! We got roughly a ten-fold improvement in score after 500 repetitions.\n", "We can do the same with our original `board`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "252 words found:\n", "AGE AGED AGES AGGRO AGGROS AGO AIN AIS AND ANE ANES ANI ANIS ANISE ARC ARCH ARGLE ARGLED\n", "BEAD BEAGLE BEAN BEAR BEARD BEG BEGAN BEGGAR BEGGED BEGROAN BEN BEND BOA BOAR BOARD BOG\n", "BOGAN BOGGED BOGGLE BOGGLED BOO BOOR BOOS BOP BORDEL BOS BOURG BOW CRAG CRAGGED CRANE\n", "CRANES DAG DAGGLE DAGGLED DAGO DAGOES DAGOS DAIS DARN DEGAGE DEL DRAG DRAGGED DRAGGLE\n", "DRAGGLED DRAIN DROOP DROP EAGLE EAR EARL EARN EDGE EDGES EFS EGAD EGG EGGAR EGGED EGO EGOS\n", "ELD END ENDARCH ENRAGE ENRAGED EOSIN FEAR FEN FENAGLE FENAGLED FEND FIAR FIE FIEND FIN\n", "FINAGLE FINAGLED FIND FINE FINES GAD GAE GAEN GAES GAG GAGE GAGED GAGES GAIN GAN GANE\n", "GANEF GANEFS GAR GARGLE GARGLED GARNI GEAR GED GEL GELD GEN GLED GOA GOAD GOB GOBO GOBOES\n", "GOBOS GOBS GOES GOO GOOP GOOS GOOSE GOR GORGE GORGED GOURD GOURDE GRAD GRAIN GRAN GRAND\n", "GROAN GROG GROUP GROW IFS INARCH LED LEDGE LEDGES LEG LEZ NAE NAG NAGGED NAIF NAIFS NAOS\n", "NARC NARD NEAR NEB NEBS NEIF NEIFS OAR OBE OBES OBOE OBOES OES ORGAN ORGANISE ORZO OSE OUR\n", "POOR POROSE POUR POW PUR PURGE PURGED PURGES PUZZLE PUZZLED RAD RAG RAGE RAGED RAGES\n", "RAGGED RAGGLE RAIN RAISE RAN RAND RANI RANIS ROAD ROAN ROAR ROB ROBE ROBES ROBS ROE ROES\n", "ROOSE ROSE ROSIN ROUP ROW SEA SEAR SEARCH SEG SEGGAR SEGO SEI SEIF SEN SEND SIN SINE SOAR\n", "SOB SOGGED SORD SORGO SOW UPO URD URGE URGED URGES WOAD WOE WOES WOG WOO WOOS WOP WORD WOS\n", "WURZEL ZED ZOO ZOOS\n", "Score of 449 for 252 words on board:\n", "P U Z Z L \n", "W O R D E \n", "B O G G L \n", "S E A R C \n", "F I N D H \n" ] } ], "source": [ "report(board)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Score of 5690 for 1511 words on board:\n", "L P C A P \n", "N A R O M \n", "D E T I L \n", "S E S E N \n", "R T N R G \n" ] } ], "source": [ "report(boggle_hill_climbing(board), False) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, roughly a ten-fold improvement. \n", "\n", "Now, let's start from a very high-scoring board, identified by Justin Boyan in [his Ph.D. thesis](https://www.ri.cmu.edu/publications/learning-evaluation-functions-for-global-optimization/), and see if we can improve it:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Score of 10112 for 2290 words on board:\n", "R S T C S \n", "D E I A E \n", "G N L R P \n", "E A T E S \n", "M S S I D \n" ] } ], "source": [ "boyan = Board('RSTCS DEIAE GNLRP EATES MSSID')\n", "report(boyan, False)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Score of 10112 for 2290 words on board:\n", "R S T C S \n", "D E I A E \n", "G N L R P \n", "E A T E S \n", "M S S I D \n" ] } ], "source": [ "report(boggle_hill_climbing(boyan), False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sadly, we were not able to make an improvement in 500 repetitions. But that certainly is no guarantee that `boyan` is the best possible board. Here are some things to try to find a better board; maybe you can implement some of them, or try some ideas of your own:\n", "\n", "- **Genetic algorithms**: We used **mutation** of a single board, but we could also consider **crossover** where we keep a pool of boards and take the first half of one board and combine it with the second half of another.\n", "- **Swaps**: We changed one letter at a time. But maybe there is no change of one letter that will improve a board, but there is a change involving two squares, either swapping them or mutating both of them.\n", "- **Incremental score calculation**: We modified just one square and then tried to find all the words from scratch. Would it be faster to keep track of which squares contributed to which words, and only re-do the calculations for the one changed square? This would probably involve infixes of words rather than prefixes. Perhaps by keeping track of what each square contributes, we can make a better choice of which square to mutate.\n", "- **Random restarts**: When is it best to continue searching from the current board, versus starting over from a new board?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }