{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d8b8d45e-273c-4f6e-ad5f-961420eaf484",
   "metadata": {},
   "source": [
    "*   [Overview](overview.ipynb)\n",
    "*   [Depth First and Breath First Search](puzzle.ipynb)\n",
    "*   [SAT Solvers](einstein.ipynb)\n",
    "*   [Pattern Recognition and Reinforcement Learning](#) <--- (You Are Here)\n",
    "    *   [Overview](#overview)\n",
    "    *   [Rock Paper Scissors](#rock-paper-scissors)\n",
    "    *   [Our Approach](#our-approach)\n",
    "    *   [Helpful Python Skills](#helpful-python-skills)\n",
    "    *   [The Code](#the-code)\n",
    "    *   [Critiques and Improvements](#critiques-and-improvements)\n",
    "    *   [Where you can go from here](#where-you-can-go-from-here)\n",
    "*   [SMT and Model Checkers](dining.ipynb)\n",
    "*   [AlphaZero](alpha_zero.ipynb)\n",
    "*   [The Future](philosophy.ipynb)\n",
    "\n",
    "* * *"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3545b7e9-1446-45a1-b9ab-b2d64cb60856",
   "metadata": {},
   "source": [
    "Pattern Recognition and Reinforcement Learning[¶](#pattern-recognition-and-reinforcement-learning \"Permalink to this headline\")\n",
    "==============================================================================================================================="
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2da99b7-c4dc-4eab-96f9-41d1d3008c5c",
   "metadata": {},
   "source": [
    "Overview[¶](#overview \"Permalink to this headline\")\n",
    "---------------------------------------------------\n",
    "\n",
    "**Problem:** Getting son to eat broccoli\n",
    "\n",
    "**Strategies**:\n",
    "\n",
    "*   Acclimatization: Serve once a week until he gets used to it\n",
    "*   Better than alternative: It’s better than spinach\n",
    "*   Enticements: We’ll watch your favorite movie after\n",
    "*   Trickery: It’s candy\n",
    "*   Testimonial: Your best friend likes it\n",
    "*   Wait it out: We’re not leaving this table until …\n",
    "*   Piecemeal: Just one little bite\n",
    "*   Appeal to Reason: Great source of vitamins K and C\n",
    "\n",
    "**Reinforcement Learning:**\n",
    "\n",
    "Randomly try strategies. If they work, choose them more often."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9c1cdec-b634-4eef-b561-eb9a0a8519dd",
   "metadata": {},
   "source": [
    "Rock Paper Scissors[¶](#rock-paper-scissors \"Permalink to this headline\")\n",
    "-------------------------------------------------------------------------\n",
    "\n",
    "This is an iterated adversarial zero-sum two-person game of perfect information.\n",
    "\n",
    "![_images/300px-Rock-paper-scissors.svg.png](_images/300px-Rock-paper-scissors.svg.png)\n",
    "\n",
    "The scoring logic is easily encoded in a Python dictionary:\n",
    "\n",
    "```python\n",
    "# scorer['RS'] -> 1 meaning Rock cuts Scissors is +1 for the first player.\n",
    "scorer = dict(SP=1, PR=1, RS=1, PS=-1, RP=-1, SR=-1, SS=0, PP=0, RR=0)\n",
    "```\n",
    "\n",
    "Now, it’s time to thinking about winning (not losing):\n",
    "\n",
    "*   Playing randomly assures that we lose no more than one-third of the time, regardless of our opponent’s strategy.\n",
    "*   If our opponent is completely predictable, we can always win.\n",
    "*   In the real world, humans have a hard time simulating perfect randomness or who “have a plan” to beat us:\n",
    "    *   A mild preference for _paper_\n",
    "    *   Propensity to select _rock_ after they’ve played _scissors_\n",
    "    *   If we play _paper_, they often select _scissors_ on the next round\n",
    "    *   They tend to copy our last play\n",
    "*   In other words, there may be patterns that we can detect and exploit to win more than a third of the time."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c89222c-2476-4202-b94e-de1375517735",
   "metadata": {},
   "source": [
    "Our Approach[¶](#our-approach \"Permalink to this headline\")\n",
    "-----------------------------------------------------------\n",
    "\n",
    "*   Generate multiple competing pattern recognition strategies\n",
    "    \n",
    "*   Choose between the strategies [multi-arm bandit](https://en.wikipedia.org/wiki/Multi-armed_bandit) approach to [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning).\n",
    "    \n",
    "*   Core logic:\n",
    "\n",
    "```python\n",
    "for trial in range(trials):\n",
    "  # choose our move\n",
    "  # get opponent's move\n",
    "  # determine the winner\n",
    "  # update move history and strategy weights\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a64882a-7f8d-4bfb-90da-0be23ca169fa",
   "metadata": {},
   "source": [
    "Helpful Python Skills[¶](#helpful-python-skills \"Permalink to this headline\")\n",
    "-----------------------------------------------------------------------------"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac1f0614-2dc7-4155-936b-99e7f2e83876",
   "metadata": {},
   "source": [
    "`itertools.chain()` links to multiple iterables into one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78d3a006-4ec9-4d30-8c3d-eb9c58b96ed8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import chain\n",
    "\n",
    "list(chain('RPRPS', 'RPS'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb323727-6d45-4c24-a7c8-66e173cdba8a",
   "metadata": {},
   "source": [
    "`itertools.cycle()` repeats a sequence over and over again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96156e76-d3c7-49d9-be97-bbfd6b33fff9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import cycle, islice\n",
    "\n",
    "list(islice(cycle('RPS'), 9))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38acac01-d6a4-41aa-8d68-b60a5d61aa92",
   "metadata": {},
   "source": [
    "`collections.Counter()` tallies frequencies of the inputs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1db629c0-cdaf-4be8-8de8-e0e9526414ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "Counter('RPRPSRPSR')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71351aae-9c90-4610-8f02-734cfd44a41c",
   "metadata": {},
   "source": [
    "`Counter.most_common(n)` picks the _n_ most common counts:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33a07db9-e622-45ca-abd4-62af0d681d22",
   "metadata": {},
   "outputs": [],
   "source": [
    "Counter('RPRPSRPSR').most_common(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bdbc483-091f-4f0d-8912-5f2ceac11e0e",
   "metadata": {},
   "source": [
    "`zip(*somedict.items())` transposes items into a separate keys and values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f190b669-9f7b-4cd8-9953-1e5ff1609c1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "d = dict(R=4, P=3, S=2)\n",
    "keys, values = zip(*d.items())\n",
    "keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a276a88-9c8b-410f-ae90-42fbe0edbf0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "values"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c75a3fb8-a644-4523-ab6b-2078068d14f5",
   "metadata": {},
   "source": [
    "`zip(a, a[1:])` groups input into overlapping digraphs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d40c378-6a33-47bf-b797-dfae69794a88",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = 'abcde'\n",
    "list(zip(a, a[1:]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f141ede-2137-46d4-bd68-d6e7eec1c5c4",
   "metadata": {},
   "source": [
    "`random.choices(population, weights)[0]` makes a weighted choice from a population:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7a6d59c-04e7-4097-84cd-834fed513296",
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import choices\n",
    "choices(['R', 'P', 'S'], [6, 3, 1], k=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb48545e-febd-498f-a108-66ec66a1da54",
   "metadata": {},
   "outputs": [],
   "source": [
    "Counter(_)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c5c7bc0-4f94-4a1d-a806-0e095c37fdd1",
   "metadata": {},
   "source": [
    "The Code[¶](#the-code \"Permalink to this headline\")\n",
    "---------------------------------------------------"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4cbb8189-75b6-4320-b60b-0c0793c3e077",
   "metadata": {},
   "outputs": [],
   "source": [
    "''' Generic learning algorithm for adversarial two-person\n",
    "    zero-sum games of perfect information.\n",
    "\n",
    "    General approach:  Make a list of strategies based\n",
    "    on pattern recognition. Use the multi-arm bandit\n",
    "    approach to learning which strategies win the most.\n",
    "'''\n",
    "\n",
    "from collections import Counter\n",
    "from random import choices, choice\n",
    "from itertools import chain, cycle\n",
    "from pprint import pprint\n",
    "\n",
    "# Game Definition ################################\n",
    "\n",
    "# https://en.wikipedia.org/wiki/Rock%E2%80%93paper%E2%80%93scissors\n",
    "# scorer['RS'] -> 1 meaning Rock cuts Scissors is +1 for the first player.\n",
    "scorer = dict(SP=1, PR=1, RS=1, PS=-1, RP=-1, SR=-1, SS=0, PP=0, RR=0)\n",
    "\n",
    "# Scissors cuts Paper; Paper covers Rock; Rock crushes Scissors\n",
    "ideal_response = {'P': 'S', 'R': 'P', 'S': 'R'}\n",
    "options = ['R', 'P', 'S']\n",
    "\n",
    "# Strategy Utilities #############################\n",
    "\n",
    "def select_proportional(events, baseline=()):\n",
    "    rel_freq = Counter(chain(baseline, events))\n",
    "    population, weights = zip(*rel_freq.items())\n",
    "    return choices(population, weights)[0]\n",
    "\n",
    "def select_maximum(events, baseline=()):\n",
    "    rel_freq = Counter(chain(baseline, events))\n",
    "    return rel_freq.most_common(1)[0][0]\n",
    "\n",
    "# Strategies #####################################\n",
    "\n",
    "def random_reply(p1hist, p2hist):\n",
    "    return choice(options)\n",
    "\n",
    "def single_event_proportional(p1hist, p2hist):\n",
    "    \"\"\" When opponent plays R two-thirds of the time,\n",
    "        respond with P two-thirds of the time.'\n",
    "    \"\"\"\n",
    "    prediction = select_proportional(p2hist, options)\n",
    "    return ideal_response[prediction]\n",
    "\n",
    "def single_event_greedy(p1hist, p2hist):\n",
    "    \"\"\" When opponent plays R more than P or S,\n",
    "        always respond with P.'\n",
    "    \"\"\"\n",
    "    prediction = select_maximum(p2hist, options)\n",
    "    return ideal_response[prediction]\n",
    "\n",
    "def digraph_event_proportional(p1hist, p2hist):\n",
    "    \"\"\" When opponent's most recent play is S\n",
    "        and they usually play R two-thirds of the time\n",
    "        after an S, respond with P two-thirds of the time.\n",
    "    \"\"\"\n",
    "    recent_play = p2hist[-1:]\n",
    "    digraphs = zip(p2hist, p2hist[1:])\n",
    "    followers = [b for a, b in digraphs if a == recent_play]\n",
    "    prediction = select_proportional(followers, options)\n",
    "    return ideal_response[prediction]\n",
    "\n",
    "def digraph_event_greedy(p1hist, p2hist):\n",
    "    \"\"\" When opponent's most recent play is S\n",
    "        and they usually play R two-thirds of the time\n",
    "        after an S, respond with P all of the time.\n",
    "    \"\"\"\n",
    "    recent_play = p2hist[-1:]\n",
    "    digraphs = zip(p2hist, p2hist[1:])\n",
    "    followers = [b for a, b in digraphs if a == recent_play]\n",
    "    prediction = select_maximum(followers, options)\n",
    "    return ideal_response[prediction]\n",
    "\n",
    "strategies = [random_reply, single_event_proportional, single_event_greedy,\n",
    "              digraph_event_proportional, digraph_event_greedy]\n",
    "\n",
    "def play_and_learn(opposition, strategies=strategies,\n",
    "                   trials=1000, verbose=False):\n",
    "    strategy_range = range(len(strategies))\n",
    "    weights = [1] * len(strategies)\n",
    "    p1hist = []\n",
    "    p2hist = []\n",
    "    cum_score = 0\n",
    "    for trial in range(trials):\n",
    "        # choose our move\n",
    "        our_moves = [strategy(p1hist, p2hist) for strategy in strategies]\n",
    "        i = choices(strategy_range, weights)[0]\n",
    "        our_move = our_moves[i]\n",
    "\n",
    "        # get opponent's move\n",
    "        opponent_move = opposition(p2hist, p1hist)\n",
    "\n",
    "        # determine the winner\n",
    "        score = scorer[our_move + opponent_move]\n",
    "        if verbose:\n",
    "            print(f'{our_move} ~ {opponent_move} = {score:+d}'\n",
    "                  f'\\t\\t{strategies[i].__name__}')\n",
    "        cum_score += score\n",
    "\n",
    "        # update move history and strategy weights\n",
    "        p1hist.append(our_move)\n",
    "        p2hist.append(opponent_move)\n",
    "        for i, our_move in enumerate(our_moves):\n",
    "            if scorer[our_move + opponent_move] == 1:\n",
    "                weights[i] += 1\n",
    "\n",
    "    print(f'---- vs. {opposition.__name__} ----')            \n",
    "    print('Total score:', cum_score)\n",
    "    pprint(sorted([(weight, strategy.__name__) for weight, strategy in zip(weights, strategies)]))\n",
    "\n",
    "if __name__ == '__main__':\n",
    "\n",
    "    def human(p1hist, p2hist):\n",
    "        return input(f'Choose one of {options!r}: ')\n",
    "\n",
    "    def fixed_ratio(p1hist, p2hist):\n",
    "        return choices(options, (1, 2, 3))[0]\n",
    "\n",
    "    def cycling(series):\n",
    "        iterator = cycle(series)\n",
    "        def cycle_opponent(p1hist, p2hist):\n",
    "            return next(iterator)\n",
    "        return cycle_opponent\n",
    "\n",
    "    play_and_learn(opposition=fixed_ratio)\n",
    "    play_and_learn(opposition=random_reply)\n",
    "    play_and_learn(opposition=cycling('RPRSS'))\n",
    "    play_and_learn(opposition=human, trials=10, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fff8156e-8903-4576-af0f-78b65f212efe",
   "metadata": {},
   "source": [
    "Critiques and Improvements[¶](#critiques-and-improvements \"Permalink to this headline\")\n",
    "---------------------------------------------------------------------------------------\n",
    "\n",
    "*   When humans are losing, they generally change they strategy, so we should “age” our weights to adapt to the change.\n",
    "*   The _random\\_reply_ strategy wins one-third of the time so it keeps getting selected when other strategies are doing better.\n",
    "*   We need to keep _random\\_reply_ in the mix in case our strategies get figured-out and gamed. Randomness is your best defense against a smarter adversary.\n",
    "*   The multi-arm bandit approach (choosing the strategy proportionally to wins and losses) results in very slow learning.\n",
    "*   Humans get bored with this game quickly, so it is hard get people to test it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed7cb3de-4c5b-4f62-a99e-fdeb81c0a51b",
   "metadata": {},
   "source": [
    "Where you can go from here[¶](#where-you-can-go-from-here \"Permalink to this headline\")\n",
    "---------------------------------------------------------------------------------------\n",
    "\n",
    "*   None of the code logic is hard-wired to the basic game.\n",
    "*   It is easy to add new strategies.\n",
    "*   Or to evaluate various published [how-to-win strategies](https://www.telegraph.co.uk/men/thinking-man/11051704/How-to-always-win-at-rock-paper-scissors.html).\n",
    "*   Changing the game definition logic allows for more complex games such as rock-paper-scissors-lizard-spock:\n",
    "\n",
    "![_images/1200px-Rock_Paper_Scissors_Lizard_Spock_en.svg.png](_images/1200px-Rock_Paper_Scissors_Lizard_Spock_en.svg.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec309a01-4af8-46d1-a08b-4d5966ca140f",
   "metadata": {},
   "source": [
    "| [<<< Previous](einstein.ipynb \"SAT Solvers\") | [Next >>>](dining.ipynb \"SMT and Model Checkers\") |\n",
    "|                    :-:                       |                        :-:                        |\n",
    "\n",
    "* * *\n",
    "\n",
    "© Copyright 2019, Raymond Hettinger"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}