{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Peter Norvig
Feb 2022
Update Dec 2022
\n", "\n", "# Winning Wordle\n", "\n", "Years ago when I did a [notebook to solve **Jotto**](Jotto.ipynb), I never expected that a similar word game, [**Wordle**](https://www.nytimes.com/games/wordle/index.html), would become so popular. Congratulations to [Josh Wardle](https://en.wikipedia.org/wiki/Josh_Wardle) for making this happen. I added Wordle to my old [Jotto notebook](Jotto.ipynb#Wordle), and in this notebook, I answer three additional questions about Wordle. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (1) If I win in two guesses, am I good or lucky?\n", "\n", "I see people brag that they won Wordle in two guesses. Does this really attest to their acumen? Or is it analagous to [Little Jack Horner](https://en.wikipedia.org/wiki/Little_Jack_Horner), who pulled out a plum and said *What a good boy am I!*, oblivious to the fact that his prosperity had more to do with the plethora of plums than with his prowess. \n", "\n", "What is your chance of winning on your second guess? It depends on your first guess, the reply you get from the first guess, and on how many other possible words have the same reply. For example, if your guess is `HELLO`, and the reply is `.GGGG` (a miss followed by 4 green squares), then the only possible target word is `CELLO` (note that `JELLO`™ is a proper noun, and thus is not in the word list). Less obviously, if the reply to `HELLO` is `.YGY.` (a miss, a yellow E, a green L, another yellow L, and another miss), then the only possible target word is `ALLEY`. So in either case, you are guaranteed to win on your second guess, because there is only one possible word remaining (assuming you can recognize the sole possible word). On the other hand, if the reply is `.....` (all misses), then there are 406 possible target words, and the chance of guessing right on the second guess is very low. Below we work out all the details and find that:\n", "\n", "**Answer: mostly lucky.** The first guess `BRUTE` has the most guaranteed two-guess wins: 40 out of the 2,309 words in the [word list](wordle-small.txt) (about 2%). The first guess `FILET` gives the most expected wins, 56.7, assuming you have average luck in guessing with the second guess.\n", "\n", "# (2) What is a guaranteed-winning strategy that I can easily memorize?\n", "\n", "[Christos Papadimitriou](https://www.engineering.columbia.edu/faculty/christos-papadimitriou) had the idea of using a radically simple strategy that takes very little thought or memorization: always choose the same first 4 guesses (regardless of the replies), and with the last two guesses, guess *any* word that is consistent with the replies so far. Christos came up with 4 words that win 99% of the time. I was able to refine that and find 4 guesses that *always* lead to a win (assuming you can recognize the consistent guesses).\n", "\n", "**Answer: Guess the four word set `{HANDY SWIFT GLOVE CRUMP}` first; then guess *any* word consistent with the replies.**\n", "\n", "The four-preset-words strategy described above is guaranteed to win in 6 or fewer guesses, but it averages about 5.\n", "\n", "# (3) What is a strategy that minimizes the number of guesses?\n", "\n", "If I'm not satisfied to win in 5 or 6 guesses, what strategies can I use (especially simple ones)?\n", "\n", "**Answers**:\n", "- Following the [optimal game tree](https://www.poirrier.ca/notes/wordle-optimal/) wins **100%** of the time with an average of **3.4** guesses; but that's not simple (see [visualizations](https://laurentlessard.com/solving-wordle/)).\n", "- As shown in my [other notebook](Jotto.ipynb), building a game tree greedily rather than optimally takes only seconds rather than hours of compute time, and wins **100%** with an average of **3.4** guesses. But it is still not a simple strategy.\n", "- Guessing the three-word set `{BLIND SHAME CRYPT}` first wins **99.8%** of the time with **4.2** average guesses.\n", "- Guessing the two-word set `{RETCH SNAIL}` first wins **99.4%** with **3.8** average guesses.\n", "- Guessing the one-word set `{RAISE}` first wins **98%** with **3.9** average guesses.\n", "- Guessing any consistent word at any time wins **98%** with **4.0** average guesses.\n", "\n", "## Implementation Details: Imports and Words\n", "\n", "I'll start with some basics: imports, and reading in the word list, `words`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from typing import List, Tuple, Dict, Counter, Iterable\n", "from collections import defaultdict\n", "from pathlib import Path\n", "from functools import lru_cache\n", "import pandas as pd\n", "import random \n", "\n", "Word = str # A type: a word is a string of five letters\n", "\n", "! [ -e wordle-small.txt ] || curl -O https://norvig.com/ngrams/wordle-small.txt\n", "\n", "words = Path('wordle-small.txt').read_text().split() # 2,309 target words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation Details: Replies\n", "\n", "Below are functions to compute the `reply_for` a single guess and the `replies_for` a sequence of guesses:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "Reply = str # A reply is five characters, e.g. '.Y..G'\n", "Green, Yellow, Miss = 'G', 'Y', '.' # Components of a Reply\n", "\n", "@lru_cache(None)\n", "def reply_for(guess: Word, target: Word) -> Reply: \n", " \"The five-character reply for this guess on this target in Wordle.\"\n", " # (1) Start by having each reply be either Green or Miss ...\n", " reply = [(Green if guess[i] == target[i] else Miss) for i in range(5)]\n", " # (2) Then change the replies that should be yellow\n", " counts = Counter(target[i] for i in range(5) if guess[i] != target[i])\n", " for i in range(5):\n", " if reply[i] == Miss and counts[guess[i]] > 0:\n", " counts[guess[i]] -= 1\n", " reply[i] = Yellow\n", " return ''.join(reply)\n", "\n", "def replies_for(guesses, target) -> Tuple[Reply]: \n", " \"\"\"A tuple of replies for a sequence of guesses.\"\"\"\n", " return tuple(reply_for(guess, target) for guess in guesses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, if the target word is `'LILAC'`, here is the reply for the guess `'HELLO'`: " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'..GY.'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reply_for('HELLO', 'LILAC')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the replies for the two-guess sequence `['HELLO', 'DOLLY']`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('..GY.', '..GY.')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "replies_for(['HELLO', 'DOLLY'], 'LILAC')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation Details: Partitions\n", "\n", "We say that a sequence of guesses **partitions** a word list into **bins**:\n", "- `partitions(guesses, targets)` returns a dict where each key is a `replies_for(guesses, t)` for some target word `t`, and the corresponding value is the list of words that give the same replies for those guesses.\n", "- `bins(guesses, targets)` just gives the bins, without the replies." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "Bin = List[Word] # Type for a Bin of words\n", "Wordset = Tuple[Word, ...] # Type for a tuple of guess words\n", "Partition = Dict[Wordset, Bin] # Type for a Partition\n", "\n", "def partition(guesses, targets) -> Partition:\n", " \"\"\"Partition `targets` by replies to `guesses`: {(reply, ...): [word, ...]}\"\"\"\n", " dic = defaultdict(list)\n", " for target in targets:\n", " if target not in guesses:\n", " replies = replies_for(guesses, target)\n", " dic[replies].append(target)\n", " return dic\n", "\n", "def bins(guesses, targets) -> Iterable[Bin]:\n", " \"\"\"Partition `targets`, and return the bins without the replies.\"\"\"\n", " return partition(guesses, targets).values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see this in action, I'll define `few` as a list of a few words (9 to be exact):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "few = 'HELLO WORLD DOLLY CELLO ALLEY HEAVY HEART ALLAY LILAC'.split()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is `['HELLO', 'DOLLY']` partitioning the few words:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(list,\n", " {('...GY', 'YG.G.'): ['WORLD'],\n", " ('.GGGG', '.YGG.'): ['CELLO'],\n", " ('.YGY.', '..GYG'): ['ALLEY'],\n", " ('GG...', '....G'): ['HEAVY'],\n", " ('GG...', '.....'): ['HEART'],\n", " ('..GY.', '..GYG'): ['ALLAY'],\n", " ('..GY.', '..GY.'): ['LILAC']})" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "partition(['HELLO', 'DOLLY'], few)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the bins have only one word in them, which means that after guessing `'HELLO'` and `'DOLLY'` we could always win the game in one more guess. On the other hand, the two guess sequence `['HELLO', 'WORLD']` leaves one bin with size 2, and thus we could *not* always win on the third guess; we'd have to be lucky in choosing between `'ALLAY'` or `'LILAC'` for our third guess." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(list,\n", " {('..GGY', '.G.GY'): ['DOLLY'],\n", " ('.GGGG', '.Y.G.'): ['CELLO'],\n", " ('.YGY.', '...Y.'): ['ALLEY'],\n", " ('GG...', '.....'): ['HEAVY'],\n", " ('GG...', '..Y..'): ['HEART'],\n", " ('..GY.', '...Y.'): ['ALLAY', 'LILAC']})" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "partition(['HELLO', 'WORLD'], few)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (1) If I win in two guesses, am I good or lucky?\n", "\n", "I will make a table of all possible first-guess words, where each row of the table contains:\n", "- The guess word.\n", "- The number of *guaranteed* second-guess wins (i.e., the number of partition bins of length 1).\n", "- The number of *expected* second-guess wins (i.e. the sum of the reciprocals of the partition bin sizes: if the first guess results in a bin with *n* words, there is a 1/*n* chance of guessing right on the second guess).\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def guess_row(guess) -> Tuple[Word, int, float, int]:\n", " \"\"\"A tuple of a (guess word, nuber of guaranteed wins, expected wins, maximum bin size).\"\"\"\n", " B = bins([guess], words)\n", " return (guess, sum(len(bin) == 1 for bin in B), sum(1 / len(bin) for bin in B))\n", "\n", "df = pd.DataFrame(map(guess_row, words), columns=['Guess', 'Wins', 'E(Wins)'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll sort the table by most guaranteed wins, and see that `'BRUTE'` devlivers the most guaranteed wins, 40:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GuessWinsE(Wins)
291BRUTE4055.018860
354CHANT3952.746261
1223METRO3854.233278
1865SPILT3850.831156
988HORDE3750.137364
............
2299WRYLY614.640533
1513QUEUE59.939117
1509QUEER411.173628
1272MUMMY49.767194
1508QUEEN411.218357
\n", "

2309 rows × 3 columns

\n", "
" ], "text/plain": [ " Guess Wins E(Wins)\n", "291 BRUTE 40 55.018860\n", "354 CHANT 39 52.746261\n", "1223 METRO 38 54.233278\n", "1865 SPILT 38 50.831156\n", "988 HORDE 37 50.137364\n", "... ... ... ...\n", "2299 WRYLY 6 14.640533\n", "1513 QUEUE 5 9.939117\n", "1509 QUEER 4 11.173628\n", "1272 MUMMY 4 9.767194\n", "1508 QUEEN 4 11.218357\n", "\n", "[2309 rows x 3 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values('Wins', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next I'll sort by expected wins, and see that `'FILET'` is best on this metric:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GuessWinsE(Wins)
733FILET3656.659162
1372PARSE3556.173751
553DINER3755.975379
291BRUTE4055.018860
1223METRO3854.233278
............
1048JAZZY812.385749
1508QUEEN411.218357
1509QUEER411.173628
1513QUEUE59.939117
1272MUMMY49.767194
\n", "

2309 rows × 3 columns

\n", "
" ], "text/plain": [ " Guess Wins E(Wins)\n", "733 FILET 36 56.659162\n", "1372 PARSE 35 56.173751\n", "553 DINER 37 55.975379\n", "291 BRUTE 40 55.018860\n", "1223 METRO 38 54.233278\n", "... ... ... ...\n", "1048 JAZZY 8 12.385749\n", "1508 QUEEN 4 11.218357\n", "1509 QUEER 4 11.173628\n", "1513 QUEUE 5 9.939117\n", "1272 MUMMY 4 9.767194\n", "\n", "[2309 rows x 3 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values('E(Wins)', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Still, 56.6 wins out of 2,309 targets is less than 2.5%, so if you win on the second guess, you're **very lucky**. \n", "\n", "(Note that `'MUMMY'` is the worst first guess on both metrics.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (2) What is a guaranteed-winning strategy that I can easily memorize?\n", "\n", "[Christos Papadimitriou](https://www.engineering.columbia.edu/faculty/christos-papadimitriou) came up with a fixed set of 4 words, `{ARISE CLOMP THUNK BAWDY}`, that allow you to win almost all of the time if you use them as your first 4 guesses, and then guess any consistent word on your 5th and 6th guesses. The strategy would be much harder if we had to rack our brains to think of *all* the possible consistent words on the fifth and sixth guesses; it is critical to the simplicity of the strategy that as soon as you think of one consistent word you can guess it and always be guaranteed to win. The function `always_wins` verifies this property:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def always_wins(guesses, more=2, words=words) -> bool:\n", " \"\"\"After the sequence of guesses, are we guaranteed to always win in `more` consistent guesses?\n", " We are if every bin created by `guesses` has `more` words or less, or if guessing any word in \n", " the bin leads to an `always_win` with one fewer guess.\"\"\"\n", " return all(len(bin) <= more or \n", " more >= 1 and all(always_wins([guess], more - 1, bin) for guess in bin)\n", " for bin in bins(guesses, words))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "always_wins(('ARISE', 'CLOMP', 'THUNK', 'BAWDY')) # Christos's 4 words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sadly, Christos's guess set does not always win. At the top of the notebook, I showed a guess set that does always win:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "always_wins(('HANDY', 'SWIFT', 'GLOVE', 'CRUMP'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is my approach for finding winning guess sets:\n", "1. The function `disjoint_guess_set` returns a collection of guess words with distinct letters (by depth-first exhaustive search).\n", "2. The function `random_disjoint_guess_sets` returns a list of `N` disjoint guess_sets. \n", "3. Only consider \"good words\": words with no repeated letters, and none of the rarest letters, `JQXZ`.\n", "4. `random_disjoint_guess_sets` shuffles the good words after each call to `disjoint_guess_set` to yield `N` different guess sets.\n", "5. Once we have a list" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "letters = ''.join(a for a, _ in Counter(''.join(words)).most_common()) # letters ordered by frequency\n", "letters22 = set(letters[:22]) # 22 most frequent letters, omitting `JQXZ`.\n", "\n", "def disjoint_guess_set(W, letters:set, good_words) -> Wordset:\n", " \"\"\"Tuple of `W` words made of `letters`, with no repeated letters.\"\"\"\n", " if W == 0:\n", " return ()\n", " for word in good_words:\n", " if letters.issuperset(word):\n", " others = disjoint_guess_set(W - 1, letters - set(word), good_words)\n", " if others is not None:\n", " return (word, *others)\n", " return None\n", "\n", "def random_disjoint_guess_sets(N, W=4, letters=letters22, words=words) -> Iterable[Wordset]:\n", " \"\"\"`N` random disjoint `W`-word guess sets made out of distinct `letters`.\"\"\"\n", " good_words = [w for w in words if len(set(w)) == 5 and letters.issuperset(w)]\n", " for _ in range(N):\n", " yield disjoint_guess_set(W, letters, good_words)\n", " random.shuffle(good_words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('ABHOR', 'CLEFT', 'DUMPY', 'SWING'),\n", " ('STONE', 'PUDGY', 'WHARF', 'CLIMB'),\n", " ('HAREM', 'BLOWN', 'PUDGY', 'STICK'),\n", " ('OVARY', 'THUMB', 'FLECK', 'SWING'),\n", " ('GODLY', 'CRUMB', 'SHIFT', 'KNAVE')]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(random_disjoint_guess_sets(5, 4, letters22)) # 5 different 4-word guess sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally, we can find the winners within the guess_sets:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 59s, sys: 1.87 s, total: 2min 1s\n", "Wall time: 2min 14s\n" ] }, { "data": { "text/plain": [ "[('CRUST', 'VEGAN', 'HOWDY', 'BLIMP'),\n", " ('WELSH', 'COMFY', 'GRUNT', 'VAPID'),\n", " ('SLURP', 'MIGHT', 'COVEN', 'BAWDY'),\n", " ('STAID', 'CRUMB', 'WOVEN', 'GLYPH'),\n", " ('DUTCH', 'BALMY', 'WOVEN', 'SPRIG'),\n", " ('CLANG', 'WISPY', 'THUMB', 'DROVE')]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time winners = list(filter(always_wins, random_disjoint_guess_sets(1000, 4, letters22)))\n", "\n", "winners" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "**It works!** There are lots of guess sets that win every time. (It looks like about 1% of the random disjoint guess sets always win.)\n", "\n", "There are some important **caveats**; the strategy only works under the following assumptions:\n", "- You always will be familiar with the word that is the solution (you won't complain that [BLOKE](https://nypost.com/2022/02/24/americans-are-outraged-over-too-british-wordle-answer/) is too British or [HOMER](https://www.dailyrecord.co.uk/lifestyle/dictionary-word-year-homer-wordle-28509250) is too American).\n", "- You can always come up with a guess word that is consistent with the replies so far.\n", "- You won't guess one of the uncommon words that are legal guesses in Wordle, but are not possible answers.\n", "- You are satisfied with winning in 5 or 6 guesses. If you aspire to win in 2, 3, or 4 guesses, this strategy is not for you.\n", "\n", "## (2b) How many guesses will it take?\n", "\n", "It is great that a simple strategy is guaranteed to win, but it will probably take 5 or 6 guesses. I'd like to quantify exactly how many guesses, on average. To do that, I'll start by defining the following:\n", "- `Frequency` is an alias for `Counter`, but the values might be fractions, not integers. For example, in the case where I need one guess 1/4 of the time and 2 guesses 3/4 of that time, I can represent that as `Frequency({1: 0.25, 2: 0.75})`.\n", "- `scores(guesses)` gives a frequency counter of `{score: number_of_times_we_get_that_score}`, summed over all possible target words, and averaged over all possible guess words in a bin. Sometimes the number of times will be a fraction, because we are averaging over guesses within a bin.\n", "- `average([freq, freq, ...])` gives the average of the frequency counters. " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "Frequency = Counter # Type to hold {item: frequency} mapping; frequency need not be an integer\n", "\n", "def scores(guesses, so_far=0, targets=words) -> Frequency:\n", " \"\"\"A frequency counter of all possible scores from playing these guesses first,\n", " then playing any consistent guess (and averaging over the possible consistent guesses).\"\"\"\n", " result = Frequency(range(so_far + 1, so_far + 1 + len(guesses))) # Initial guesses might be right\n", " so_far += len(guesses)\n", " for bin in bins(guesses, targets):\n", " result += (Frequency([so_far + 1]) if len(bin) == 1 else \n", " average(scores([guess], so_far, bin) for guess in bin))\n", " return result\n", " \n", "def average(frequencies) -> Frequency:\n", " \"\"\"The mean of k Frequency counters.\"\"\"\n", " frequencies = list(frequencies)\n", " total = sum(frequencies, start=Frequency())\n", " k = len(frequencies)\n", " return {i: total[i] / k for i in total}\n", "\n", "assert average([Frequency({1: 0.25, 2: 0.75}), Frequency({1: 0.75, 2: 0.25, 3: 1})]) == {1: 0.5, 2: 0.5, 3: 0.5}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, `report` will print a report on how well an initial guess set scores:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def report(guesses, show_bins=3):\n", " \"\"\"Print a report on these guesses: do they win or not, and why?\"\"\"\n", " def fmt(words): return \"{\" + \" \".join(words) + \"}\" # sequence -> str\n", " freq = scores(guesses)\n", " freq2 = {k: round(v) if v > 1 else round(v, 3) for k, v in freq.items()} \n", " p = sum(freq[s] for s in range(1, 7)) / len(words)\n", " print(f'\\n{fmt(guesses)} wins {p:.2%} of the time')\n", " print(f'mean score: {mean_score(freq):.2f}; max score: {max(freq)}')\n", " print(f'score frequencies: {freq2}')\n", " print(f'bin sizes: {dict(Counter(sorted(map(len, bins(guesses, words)))))}')\n", " for bin in sorted(bins(guesses, words), key=len, reverse=True):\n", " if len(bin) >= show_bins:\n", " bad_words = [w for w in bin if not always_wins([w], 6 - len(guesses) - 1, bin)]\n", " if bad_words:\n", " print(f'bin {fmt(bin)} can lose with: {fmt(bad_words)}')\n", " \n", "def mean_score(freq: Frequency) -> float:\n", " \"\"\"Given a frequency counter, compute the mean of the keys weighted by the values.\"\"\"\n", " return sum(s * freq[s] for s in freq) / sum(freq.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the `report` on Christos's guess set, and on my winners:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "{ARISE CLOMP THUNK BAWDY} wins 99.76% of the time\n", "mean score: 5.06; max score: 7\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2159, 6: 140, 7: 8}\n", "bin sizes: {1: 2032, 2: 107, 3: 19, 4: 1}\n", "bin {OTTER OVERT RETRO VOTER} can lose with: {RETRO}\n", "bin {EAGER GAZER RARER} can lose with: {RARER}\n", "bin {ESTER RESET STEER} can lose with: {RESET}\n", "bin {FIXER GIVER RIVER} can lose with: {FIXER}\n", "bin {FOCAL LOCAL VOCAL} can lose with: {FOCAL LOCAL VOCAL}\n", "bin {FOLLY GOLLY JOLLY} can lose with: {FOLLY GOLLY JOLLY}\n", "bin {GAUNT JAUNT VAUNT} can lose with: {GAUNT JAUNT VAUNT}\n", "bin {OFFER ROGER ROVER} can lose with: {OFFER}\n", "bin {PIPER RIPER VIPER} can lose with: {PIPER RIPER VIPER}\n", "bin {STAGE STATE STAVE} can lose with: {STAGE STATE STAVE}\n", "bin {WAFER WAGER WAVER} can lose with: {WAFER WAGER WAVER}\n" ] } ], "source": [ "report(('ARISE', 'CLOMP', 'THUNK', 'BAWDY')) # Christos's 4 words" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "{HANDY SWIFT GLOVE CRUMP} wins 100.00% of the time\n", "mean score: 5.03; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2222, 6: 83}\n", "bin sizes: {1: 2141, 2: 79, 3: 2}\n", "\n", "{CRUST VEGAN HOWDY BLIMP} wins 100.00% of the time\n", "mean score: 5.04; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2209, 6: 96}\n", "bin sizes: {1: 2124, 2: 77, 3: 5, 4: 3}\n", "\n", "{WELSH COMFY GRUNT VAPID} wins 100.00% of the time\n", "mean score: 5.04; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2200, 6: 105}\n", "bin sizes: {1: 2102, 2: 92, 3: 5, 4: 1}\n", "\n", "{SLURP MIGHT COVEN BAWDY} wins 100.00% of the time\n", "mean score: 5.03; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2222, 6: 83}\n", "bin sizes: {1: 2147, 2: 69, 3: 4, 4: 2}\n", "\n", "{STAID CRUMB WOVEN GLYPH} wins 100.00% of the time\n", "mean score: 5.04; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2210, 6: 95}\n", "bin sizes: {1: 2122, 2: 82, 3: 5, 4: 1}\n", "\n", "{DUTCH BALMY WOVEN SPRIG} wins 100.00% of the time\n", "mean score: 5.04; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2202, 6: 103}\n", "bin sizes: {1: 2107, 2: 88, 3: 6, 4: 1}\n", "\n", "{CLANG WISPY THUMB DROVE} wins 100.00% of the time\n", "mean score: 5.04; max score: 6\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1, 5: 2205, 6: 100}\n", "bin sizes: {1: 2108, 2: 94, 3: 3}\n" ] } ], "source": [ "for winner in [('HANDY', 'SWIFT', 'GLOVE', 'CRUMP')] + winners:\n", " report(winner)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (3) What is a strategy that minimizes the number of guesses?\n", "\n", "With 4 preset guesses, we're destined to win in 5 guesses most of the time, or sometimes 6. What if we aspire to win in 4 guesses? Or 3? We could try a smaller preset guess set, and find one with a low mean score. This function will help:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def good_guess_set(N, W=4, letters=letters22) -> Wordset:\n", " \"\"\"Generate `N` `W`-word guess sets, and see which one has the lowest mean score.\"\"\" \n", " return min(random_disjoint_guess_sets(N, W, letters), \n", " key=lambda guesses: mean_score(scores(guesses)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'll search for a good guess set with 3 words, and then with 2 words:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "{BLIND SHAME CRYPT} wins 99.83% of the time\n", "mean score: 4.19; max score: 9\n", "score frequencies: {1: 1, 2: 1, 3: 1, 4: 1892, 5: 380, 6: 30, 7: 4, 8: 0.086, 9: 0.004}\n", "bin sizes: {1: 1607, 2: 213, 3: 40, 4: 20, 5: 5, 6: 4, 7: 2, 10: 1}\n", "bin {FEVER FEWER JOKER OFFER QUEER REFER ROGER ROVER ROWER WOOER} can lose with: {FEVER FEWER JOKER OFFER QUEER REFER ROGER WOOER}\n", "bin {FOLLY FULLY GOLLY GULLY JOLLY LOWLY WOOLY} can lose with: {LOWLY WOOLY}\n", "bin {AFTER EATER EXTRA TAKER TERRA WATER} can lose with: {TERRA}\n", "bin {EAGER GAZER RARER WAFER WAGER WAVER} can lose with: {EAGER GAZER RARER WAFER WAVER}\n", "bin {OTTER OUTER RETRO TOWER UTTER VOTER} can lose with: {RETRO}\n", "bin {AWOKE GAFFE GAUGE GAUZE VAGUE} can lose with: {AWOKE}\n", "bin {SKATE STAGE STAKE STATE STAVE} can lose with: {STAGE STAKE STATE STAVE}\n", "bin {DODGE FUDGE JUDGE WEDGE} can lose with: {DODGE WEDGE}\n", "bin {GAUNT JAUNT TAUNT VAUNT} can lose with: {GAUNT JAUNT TAUNT VAUNT}\n" ] } ], "source": [ "report(good_guess_set(200, 3, set(letters[:18])))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "{RETCH SNAIL} wins 99.38% of the time\n", "mean score: 3.79; max score: 10\n", "score frequencies: {1: 1, 2: 1, 3: 953, 4: 966, 5: 310, 6: 64, 7: 12, 8: 2, 9: 0.089, 10: 0.002}\n", "bin sizes: {1: 535, 2: 170, 3: 92, 4: 44, 5: 34, 6: 22, 7: 11, 8: 10, 9: 5, 10: 6, 11: 3, 12: 3, 13: 3, 14: 2, 15: 3, 16: 1, 17: 1, 18: 1, 19: 1, 21: 1, 23: 1, 25: 1, 28: 1, 29: 1, 39: 1}\n", "bin {BOXER BREED BROKE BUYER DROVE DRYER EMBER ERODE ERROR EVERY FORGE FOYER FREED FREER FROZE GORGE GREED GROPE GROVE JOKER MOVER MOWER ODDER OFFER OMBRE ORDER POKER POWER PROBE PROVE PRUDE PUREE PURER PURGE QUEER QUERY UDDER UPPER WOOER} can lose with: {BOXER BREED BROKE BUYER DROVE DRYER EMBER ERODE ERROR EVERY FORGE FOYER FREED FREER FROZE GORGE GREED GROPE GROVE JOKER MOVER MOWER ODDER OFFER OMBRE ORDER POKER POWER PROBE PROVE PRUDE PUREE PURER PURGE QUEER QUERY UDDER UPPER WOOER}\n", "bin {BOBBY BOOBY BOOZY BUDDY BUGGY BUXOM DODGY DOWDY DUMMY DUMPY FOGGY FUZZY GOODY GOOFY GUMBO GUMMY GUPPY JUMBO JUMPY MOODY MUDDY MUMMY POPPY PUDGY PUFFY PUPPY PYGMY WOODY WOOZY} can lose with: {BOBBY BOOBY BOOZY BUDDY BUGGY BUXOM DOWDY DUMMY FOGGY FUZZY GUMMY JUMBO JUMPY MUDDY MUMMY POPPY PUFFY PUPPY PYGMY WOODY WOOZY}\n", "bin {BLOOD BLOOM BLUFF BULKY BULLY DOLLY DULLY FLOOD FLUFF FOLLY FULLY GLOOM GODLY GOLLY GULLY JOLLY LOBBY LOOPY LOWLY LUMPY MOLDY ODDLY PLUMB PLUMP POLYP PULPY WOOLY WOULD} can lose with: {BLOOD BLOOM BLUFF BULKY BULLY FLUFF GLOOM JOLLY LOBBY LOOPY LOWLY LUMPY MOLDY ODDLY PLUMB PLUMP POLYP PULPY WOOLY}\n", "bin {ADORE AGREE AMBER ARGUE AZURE BAKER BARGE BREAD BREAK DREAD DREAM EAGER FREAK GAMER GAYER GAZER MAKER OPERA PAPER PARER PAYER WAFER WAGER WAVER WREAK} can lose with: {ADORE AGREE AMBER ARGUE AZURE BAKER BARGE BREAD BREAK DREAD DREAM EAGER FREAK GAMER GAYER GAZER MAKER OPERA PAPER PARER PAYER WAFER WAGER WAVER WREAK}\n", "bin {BRIBE BRIDE BRIEF DIRGE DIVER DRIED DRIER DRIVE FIBER FIERY FIXER FRIED GIVER GRIEF GRIME GRIPE PIPER PRIDE PRIED PRIME PRIZE VIPER WIDER} can lose with: {BRIBE BRIEF DIVER DRIED DRIER DRIVE FIBER FIERY FIXER FRIED GIVER GRIEF GRIME PIPER PRIME PRIZE VIPER WIDER}\n", "bin {BROOD BROOK BROOM DOWRY DROOP FJORD FORGO FORUM FUROR FURRY GOURD GROOM GROUP GRUFF JUROR MURKY PROOF PROUD PROXY WORDY WORRY} can lose with: {DOWRY GRUFF JUROR WORRY}\n", "bin {ARBOR ARDOR ARMOR AROMA ARRAY ARROW AUGUR BORAX BROAD FAVOR FORAY KARMA MAJOR MARRY MAYOR PARKA PARRY UMBRA VAPOR} can lose with: {BORAX FORAY PARKA}\n", "bin {BAGGY BAWDY BAYOU DADDY DOGMA GAMMA GAUDY GAWKY JAZZY KAPPA KAYAK MADAM MAGMA MAMBO MAMMA MAMMY PADDY VODKA} can lose with: {BAGGY BAYOU DADDY GAMMA GAWKY JAZZY KAPPA KAYAK MAGMA MAMBO MAMMA MAMMY PADDY}\n", "bin {AGLOW ALBUM ALLAY ALLOW ALLOY ALOOF ALOUD AMPLY APPLY BADLY BALMY BYLAW DALLY GAYLY MADLY POLKA} can lose with: {ALOOF APPLY}\n", "bin {ABLED ALGAE ALLEY AMBLE AMPLE APPLE BLEAK EAGLE FABLE GLEAM LADLE MAPLE PLEAD VALUE VALVE} can lose with: {ALGAE BLEAK VALUE VALVE}\n", "bin {AGONY AMONG BANJO DANDY FANNY FAUNA GONAD MANGA MANGO MANGY NANNY NOMAD PAGAN WAGON WOMAN} can lose with: {BANJO MANGA PAGAN}\n", "bin {BOOZE BUDGE DODGE DOPEY EMBED EPOXY EVOKE FUDGE FUGUE GOOEY GOUGE JUDGE MODEM QUEUE VOGUE} can lose with: {EPOXY}\n", "bin {ABBEY ABODE ABOVE ADOBE AWOKE BADGE GAFFE GAUGE GAUZE MAUVE MAYBE OMEGA PAYEE VAGUE} can lose with: {ABODE ABOVE ADOBE AWOKE}\n", "bin {BERRY DEFER DEMUR DERBY FEMUR FERRY FEVER FEWER JERKY MERGE MERRY PERKY VERGE VERVE} can lose with: {DEMUR FEMUR FEVER FEWER MERGE VERGE VERVE}\n", "bin {BILLY BLIMP BUILD DILLY DIMLY FILLY FILMY GUILD IGLOO IMPLY LIMBO MILKY WILLY} can lose with: {BLIMP GUILD IGLOO IMPLY MILKY}\n", "bin {BINGO DINGO DINGY DOING DYING FUNGI GOING KINKY NINNY OWING PINKY VYING WINDY} can lose with: {BINGO FUNGI}\n", "bin {BROWN DONOR DROWN DRUNK FROND FROWN GROWN MORON MOURN PRONG WRONG WRUNG} can lose with: {MORON MOURN}\n", "bin {CORER COVER COWER CREDO CREED CREEK CREEP CREME CREPE CRUDE CURVE CYBER} can lose with: {CREED CREEK CREEP CREME CREPE CRUDE CURVE}\n", "bin {AUNTY DAUNT GAUNT JAUNT JUNTA TANGO TANGY TAUNT TAWNY TONGA VAUNT} can lose with: {AUNTY DAUNT GAUNT JAUNT JUNTA TAUNT TAWNY VAUNT}\n", "bin {BONGO BOUND BUNNY DOWNY FOUND FUNKY FUNNY MOUND POUND WOUND YOUNG} can lose with: {BONGO BOUND BUNNY DOWNY FOUND FUNKY FUNNY MOUND POUND WOUND YOUNG}\n", "bin {ADULT ALLOT ALOFT BLOAT FAULT FLOAT GLOAT TALLY VAULT WALTZ} can lose with: {VAULT WALTZ}\n", "bin {AWARE BRAKE BRAVE DRAKE DRAPE FRAME GRADE GRAPE GRAVE GRAZE} can lose with: {AWARE BRAKE BRAVE DRAKE DRAPE FRAME GRAZE}\n", "bin {BIDDY DIZZY FIZZY GIDDY IDIOM JIFFY OPIUM PIGGY WIDOW WIMPY} can lose with: {JIFFY OPIUM WIMPY}\n", "bin {BONEY DOZEN EBONY MONEY NUDGE OZONE QUEEN WOKEN WOMEN WOVEN} can lose with: {BONEY EBONY NUDGE OZONE QUEEN}\n", "bin {RODEO ROGER ROGUE ROUGE ROVER ROWER RUDER RUPEE} can lose with: {RUPEE}\n", "bin {SKUNK SOUND SPOON SPUNK SUNNY SWOON SWUNG SYNOD} can lose with: {SUNNY}\n", "bin {BOWEL DOWEL DWELL EXPEL MODEL QUELL VOWEL} can lose with: {QUELL}\n", "bin {BATCH CATCH HATCH MATCH PATCH WATCH} can lose with: {BATCH CATCH HATCH MATCH PATCH WATCH}\n", "bin {ASSET BASTE PASTE TASTE WASTE} can lose with: {ASSET}\n", "bin {BATTY DATUM FATTY PATTY TATTY} can lose with: {DATUM}\n", "bin {BUNCH CONCH HUNCH MUNCH PUNCH} can lose with: {CONCH}\n", "bin {SAPPY SASSY SAVOY SAVVY SQUAD} can lose with: {SQUAD}\n", "bin {SAUTE STEAD STEAK STEAM SWEAT} can lose with: {SAUTE}\n", "bin {SHADE SHAKE SHAME SHAPE SHAVE} can lose with: {SHADE SHAKE SHAME SHAPE SHAVE}\n" ] } ], "source": [ "report(good_guess_set(20, 2, set(letters[:14])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I hope this gives you some ideas for Wordle strategies you can use, and/or new computational ideas to explore." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }