{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Day 2 - counting letters\n",
    "\n",
    "- [Day 2](https://adventofcode.com/2018/day/2)\n",
    "\n",
    "This is trivially solved by counting letters in a given id using [`collections.Counter()`](https://docs.python.org/3/library/collections.html#collections.Counter), then noting if there are any 2s or 3s in the counts.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import aocd\n",
    "\n",
    "data = aocd.get_data(day=2, year=2018)\n",
    "boxids = [id_.strip() for id_ in data.splitlines()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "\n",
    "def checksum(boxids):\n",
    "    twos, threes = 0, 0\n",
    "    for id_ in boxids:\n",
    "        counts = set(Counter(id_).values())\n",
    "        twos += int(2 in counts)\n",
    "        threes += int(3 in counts)\n",
    "    return twos * threes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Part 1: 6225\n"
     ]
    }
   ],
   "source": [
    "print(\"Part 1:\", checksum(boxids))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2 - Efficiency through tries\n",
    "\n",
    "The most efficient path to a solution is to use a [trie](https://en.wikipedia.org/wiki/Trie), a tree structure where nodes are letters of a word. This is often used to do efficient prefix testing (does a given prefix exist in a set of words?), but here you can use it to quickly prune the number of possible ids to search given a prefix.\n",
    "\n",
    "- The simplest trie in Python is just a set of nested dictionaries; letter -> dict\n",
    "- Each level of the trie tells you what other letters have been used so far\n",
    "- As you test each letter of a given id, you navigate deeper into the trie and have\n",
    "  the matching level. If the current letter is not in the trie, already, there can't\n",
    "  be any matching ids with the same prefix, so you can stop searching further. Just\n",
    "  insert the remainder of the id into the trie and continue with the next.\n",
    "\n",
    "For the given example, the full trie would be:\n",
    "\n",
    "```\n",
    "├── a\n",
    "│   ├── b\n",
    "│   │   └── c\n",
    "│   │       └── d\n",
    "│   │           └── e\n",
    "│   └── x\n",
    "│       └── c\n",
    "│           └── y\n",
    "│               └── e\n",
    "├── f\n",
    "│   └── g\n",
    "│       ├── h\n",
    "│       │   └── i\n",
    "│       │       └── j\n",
    "│       └── u\n",
    "│           └── i\n",
    "│               └── j\n",
    "├── k\n",
    "│   └── l\n",
    "│       └── m\n",
    "│           └── n\n",
    "│               └── o\n",
    "├── p\n",
    "│   └── q\n",
    "│       └── r\n",
    "│           └── s\n",
    "│               └── t\n",
    "└── w\n",
    "    └── v\n",
    "        └── x\n",
    "            └── y\n",
    "                └── z\n",
    "```\n",
    "\n",
    "but you never have to go that far. You start with `abcde` and the trie is quickly updated to\n",
    "\n",
    "```\n",
    "└── a\n",
    "    └── b\n",
    "        └── c\n",
    "            └── d\n",
    "                └── e\n",
    "```\n",
    "\n",
    "after finding that there are no letters in the top level of the trie and `a` not being present in the trie (so you insert the whole id, continue).\n",
    "\n",
    "You then process `fghij`; you find the `a` at the top level, but there is no `aghij` in the trie (traverse to `a`, the map has no `g`, end of test), so you insert `fghij` into the trie:\n",
    "\n",
    "```\n",
    "├── a\n",
    "│   └── b\n",
    "│       └── c\n",
    "│           └── d\n",
    "│               └── e\n",
    "└── f\n",
    "    └── g\n",
    "        └── h\n",
    "            └── i\n",
    "                └── j\n",
    "```\n",
    "\n",
    "`klmno` is treated the same, there is no `almno` and no `flmno` in the trie (the two letters at the top of the trie that could replace `k`, neither of which have an `l` entry in their subtree), so you insert the word at the top. `pqrst` is treated the same way; 3 tests for `aq`, `fq` and `kq` all fail. You now have:\n",
    "\n",
    "```\n",
    "├── a\n",
    "│   └── b\n",
    "│       └── c\n",
    "│           └── d\n",
    "│               └── e\n",
    "├── f\n",
    "│   └── g\n",
    "│       └── h\n",
    "│           └── i\n",
    "│               └── j\n",
    "├── k\n",
    "│   └── l\n",
    "│       └── m\n",
    "│           └── n\n",
    "│               └── o\n",
    "└── p\n",
    "    └── q\n",
    "        └── r\n",
    "            └── s\n",
    "                └── t\n",
    "```\n",
    "\n",
    "Now comes `fguij`. Testing with `f` in that id, you find are no `ag`, `kg`, and `pg` prefixes, but `f` does exist, so it is worth progressing to the `guij` substring of the id and the `f` subtree (`g` -> `h` -> `i` -> `j`) of the trie only. There are no alternative letters, but `g` does exist in the subtree, so we continue on, with the subtree (`h` -> `i` -> `j`), and substring `uij`.\n",
    "\n",
    "We test the only key in the trie here, `h`, and find that `hij` exists in this subtree. We have a match! `fg` is the prefix so far, `ij` the postfix we tested, so the answer is `fgij`.\n",
    "\n",
    "So for each id we test, we only need to check a very small subset of letters seen soo far.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def intrie(s, trie):\n",
    "    for c in s:\n",
    "        trie = trie.get(c)\n",
    "        if trie is None:\n",
    "            return False\n",
    "    return True\n",
    "\n",
    "\n",
    "def matching_ids(boxids):\n",
    "    trie = {}\n",
    "    for id_ in boxids:\n",
    "        current = trie\n",
    "        for i, char in enumerate(id_):\n",
    "            for tchar in current:\n",
    "                if tchar != char:\n",
    "                    # check if there is a full postfix match\n",
    "                    if intrie(id_[i + 1 :], current[tchar]):\n",
    "                        return f\"{id_[:i]}{id_[i + 1 :]}\"\n",
    "            if char not in current:\n",
    "                # insert remainder as a new trie entry\n",
    "                for c in id_[i:]:\n",
    "                    current = current.setdefault(c, {})\n",
    "                break\n",
    "            else:\n",
    "                current = current[char]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "testids = \"\"\"\\\n",
    "abcde\n",
    "fghij\n",
    "klmno\n",
    "pqrst\n",
    "fguij\n",
    "axcye\n",
    "wvxyz\"\"\".splitlines()\n",
    "assert matching_ids(testids) == \"fgij\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Part 2: revtaubfniyhsgxdoajwkqilp\n"
     ]
    }
   ],
   "source": [
    "print(\"Part 2:\", matching_ids(boxids))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}