{ "cells": [ { "cell_type": "markdown", "id": "d8b8d45e-273c-4f6e-ad5f-961420eaf484", "metadata": {}, "source": [ "* [Overview](overview.ipynb)\n", "* [Depth First and Breath First Search](puzzle.ipynb)\n", "* [SAT Solvers](einstein.ipynb)\n", "* [Pattern Recognition and Reinforcement Learning](#) <--- (You Are Here)\n", " * [Overview](#overview)\n", " * [Rock Paper Scissors](#rock-paper-scissors)\n", " * [Our Approach](#our-approach)\n", " * [Helpful Python Skills](#helpful-python-skills)\n", " * [The Code](#the-code)\n", " * [Critiques and Improvements](#critiques-and-improvements)\n", " * [Where you can go from here](#where-you-can-go-from-here)\n", "* [SMT and Model Checkers](dining.ipynb)\n", "* [AlphaZero](alpha_zero.ipynb)\n", "* [The Future](philosophy.ipynb)\n", "\n", "* * *" ] }, { "cell_type": "markdown", "id": "3545b7e9-1446-45a1-b9ab-b2d64cb60856", "metadata": {}, "source": [ "Pattern Recognition and Reinforcement Learning[¶](#pattern-recognition-and-reinforcement-learning \"Permalink to this headline\")\n", "===============================================================================================================================" ] }, { "cell_type": "markdown", "id": "b2da99b7-c4dc-4eab-96f9-41d1d3008c5c", "metadata": {}, "source": [ "Overview[¶](#overview \"Permalink to this headline\")\n", "---------------------------------------------------\n", "\n", "**Problem:** Getting son to eat broccoli\n", "\n", "**Strategies**:\n", "\n", "* Acclimatization: Serve once a week until he gets used to it\n", "* Better than alternative: It’s better than spinach\n", "* Enticements: We’ll watch your favorite movie after\n", "* Trickery: It’s candy\n", "* Testimonial: Your best friend likes it\n", "* Wait it out: We’re not leaving this table until …\n", "* Piecemeal: Just one little bite\n", "* Appeal to Reason: Great source of vitamins K and C\n", "\n", "**Reinforcement Learning:**\n", "\n", "Randomly try strategies. If they work, choose them more often." ] }, { "cell_type": "markdown", "id": "b9c1cdec-b634-4eef-b561-eb9a0a8519dd", "metadata": {}, "source": [ "Rock Paper Scissors[¶](#rock-paper-scissors \"Permalink to this headline\")\n", "-------------------------------------------------------------------------\n", "\n", "This is an iterated adversarial zero-sum two-person game of perfect information.\n", "\n", "![_images/300px-Rock-paper-scissors.svg.png](_images/300px-Rock-paper-scissors.svg.png)\n", "\n", "The scoring logic is easily encoded in a Python dictionary:\n", "\n", "```python\n", "# scorer['RS'] -> 1 meaning Rock cuts Scissors is +1 for the first player.\n", "scorer = dict(SP=1, PR=1, RS=1, PS=-1, RP=-1, SR=-1, SS=0, PP=0, RR=0)\n", "```\n", "\n", "Now, it’s time to thinking about winning (not losing):\n", "\n", "* Playing randomly assures that we lose no more than one-third of the time, regardless of our opponent’s strategy.\n", "* If our opponent is completely predictable, we can always win.\n", "* In the real world, humans have a hard time simulating perfect randomness or who “have a plan” to beat us:\n", " * A mild preference for _paper_\n", " * Propensity to select _rock_ after they’ve played _scissors_\n", " * If we play _paper_, they often select _scissors_ on the next round\n", " * They tend to copy our last play\n", "* In other words, there may be patterns that we can detect and exploit to win more than a third of the time." ] }, { "cell_type": "markdown", "id": "3c89222c-2476-4202-b94e-de1375517735", "metadata": {}, "source": [ "Our Approach[¶](#our-approach \"Permalink to this headline\")\n", "-----------------------------------------------------------\n", "\n", "* Generate multiple competing pattern recognition strategies\n", " \n", "* Choose between the strategies [multi-arm bandit](https://en.wikipedia.org/wiki/Multi-armed_bandit) approach to [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning).\n", " \n", "* Core logic:\n", "\n", "```python\n", "for trial in range(trials):\n", " # choose our move\n", " # get opponent's move\n", " # determine the winner\n", " # update move history and strategy weights\n", "```" ] }, { "cell_type": "markdown", "id": "0a64882a-7f8d-4bfb-90da-0be23ca169fa", "metadata": {}, "source": [ "Helpful Python Skills[¶](#helpful-python-skills \"Permalink to this headline\")\n", "-----------------------------------------------------------------------------" ] }, { "cell_type": "markdown", "id": "ac1f0614-2dc7-4155-936b-99e7f2e83876", "metadata": {}, "source": [ "`itertools.chain()` links to multiple iterables into one:" ] }, { "cell_type": "code", "execution_count": null, "id": "78d3a006-4ec9-4d30-8c3d-eb9c58b96ed8", "metadata": {}, "outputs": [], "source": [ "from itertools import chain\n", "\n", "list(chain('RPRPS', 'RPS'))" ] }, { "cell_type": "markdown", "id": "cb323727-6d45-4c24-a7c8-66e173cdba8a", "metadata": {}, "source": [ "`itertools.cycle()` repeats a sequence over and over again:" ] }, { "cell_type": "code", "execution_count": null, "id": "96156e76-d3c7-49d9-be97-bbfd6b33fff9", "metadata": {}, "outputs": [], "source": [ "from itertools import cycle, islice\n", "\n", "list(islice(cycle('RPS'), 9))" ] }, { "cell_type": "markdown", "id": "38acac01-d6a4-41aa-8d68-b60a5d61aa92", "metadata": {}, "source": [ "`collections.Counter()` tallies frequencies of the inputs:" ] }, { "cell_type": "code", "execution_count": null, "id": "1db629c0-cdaf-4be8-8de8-e0e9526414ff", "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "\n", "Counter('RPRPSRPSR')" ] }, { "cell_type": "markdown", "id": "71351aae-9c90-4610-8f02-734cfd44a41c", "metadata": {}, "source": [ "`Counter.most_common(n)` picks the _n_ most common counts:" ] }, { "cell_type": "code", "execution_count": null, "id": "33a07db9-e622-45ca-abd4-62af0d681d22", "metadata": {}, "outputs": [], "source": [ "Counter('RPRPSRPSR').most_common(2)" ] }, { "cell_type": "markdown", "id": "3bdbc483-091f-4f0d-8912-5f2ceac11e0e", "metadata": {}, "source": [ "`zip(*somedict.items())` transposes items into a separate keys and values:" ] }, { "cell_type": "code", "execution_count": null, "id": "f190b669-9f7b-4cd8-9953-1e5ff1609c1e", "metadata": {}, "outputs": [], "source": [ "d = dict(R=4, P=3, S=2)\n", "keys, values = zip(*d.items())\n", "keys" ] }, { "cell_type": "code", "execution_count": null, "id": "8a276a88-9c8b-410f-ae90-42fbe0edbf0c", "metadata": {}, "outputs": [], "source": [ "values" ] }, { "cell_type": "markdown", "id": "c75a3fb8-a644-4523-ab6b-2078068d14f5", "metadata": {}, "source": [ "`zip(a, a[1:])` groups input into overlapping digraphs:" ] }, { "cell_type": "code", "execution_count": null, "id": "8d40c378-6a33-47bf-b797-dfae69794a88", "metadata": {}, "outputs": [], "source": [ "a = 'abcde'\n", "list(zip(a, a[1:]))" ] }, { "cell_type": "markdown", "id": "9f141ede-2137-46d4-bd68-d6e7eec1c5c4", "metadata": {}, "source": [ "`random.choices(population, weights)[0]` makes a weighted choice from a population:" ] }, { "cell_type": "code", "execution_count": null, "id": "a7a6d59c-04e7-4097-84cd-834fed513296", "metadata": {}, "outputs": [], "source": [ "from random import choices\n", "choices(['R', 'P', 'S'], [6, 3, 1], k=10)" ] }, { "cell_type": "code", "execution_count": null, "id": "eb48545e-febd-498f-a108-66ec66a1da54", "metadata": {}, "outputs": [], "source": [ "Counter(_)" ] }, { "cell_type": "markdown", "id": "8c5c7bc0-4f94-4a1d-a806-0e095c37fdd1", "metadata": {}, "source": [ "The Code[¶](#the-code \"Permalink to this headline\")\n", "---------------------------------------------------" ] }, { "cell_type": "code", "execution_count": null, "id": "4cbb8189-75b6-4320-b60b-0c0793c3e077", "metadata": {}, "outputs": [], "source": [ "''' Generic learning algorithm for adversarial two-person\n", " zero-sum games of perfect information.\n", "\n", " General approach: Make a list of strategies based\n", " on pattern recognition. Use the multi-arm bandit\n", " approach to learning which strategies win the most.\n", "'''\n", "\n", "from collections import Counter\n", "from random import choices, choice\n", "from itertools import chain, cycle\n", "from pprint import pprint\n", "\n", "# Game Definition ################################\n", "\n", "# https://en.wikipedia.org/wiki/Rock%E2%80%93paper%E2%80%93scissors\n", "# scorer['RS'] -> 1 meaning Rock cuts Scissors is +1 for the first player.\n", "scorer = dict(SP=1, PR=1, RS=1, PS=-1, RP=-1, SR=-1, SS=0, PP=0, RR=0)\n", "\n", "# Scissors cuts Paper; Paper covers Rock; Rock crushes Scissors\n", "ideal_response = {'P': 'S', 'R': 'P', 'S': 'R'}\n", "options = ['R', 'P', 'S']\n", "\n", "# Strategy Utilities #############################\n", "\n", "def select_proportional(events, baseline=()):\n", " rel_freq = Counter(chain(baseline, events))\n", " population, weights = zip(*rel_freq.items())\n", " return choices(population, weights)[0]\n", "\n", "def select_maximum(events, baseline=()):\n", " rel_freq = Counter(chain(baseline, events))\n", " return rel_freq.most_common(1)[0][0]\n", "\n", "# Strategies #####################################\n", "\n", "def random_reply(p1hist, p2hist):\n", " return choice(options)\n", "\n", "def single_event_proportional(p1hist, p2hist):\n", " \"\"\" When opponent plays R two-thirds of the time,\n", " respond with P two-thirds of the time.'\n", " \"\"\"\n", " prediction = select_proportional(p2hist, options)\n", " return ideal_response[prediction]\n", "\n", "def single_event_greedy(p1hist, p2hist):\n", " \"\"\" When opponent plays R more than P or S,\n", " always respond with P.'\n", " \"\"\"\n", " prediction = select_maximum(p2hist, options)\n", " return ideal_response[prediction]\n", "\n", "def digraph_event_proportional(p1hist, p2hist):\n", " \"\"\" When opponent's most recent play is S\n", " and they usually play R two-thirds of the time\n", " after an S, respond with P two-thirds of the time.\n", " \"\"\"\n", " recent_play = p2hist[-1:]\n", " digraphs = zip(p2hist, p2hist[1:])\n", " followers = [b for a, b in digraphs if a == recent_play]\n", " prediction = select_proportional(followers, options)\n", " return ideal_response[prediction]\n", "\n", "def digraph_event_greedy(p1hist, p2hist):\n", " \"\"\" When opponent's most recent play is S\n", " and they usually play R two-thirds of the time\n", " after an S, respond with P all of the time.\n", " \"\"\"\n", " recent_play = p2hist[-1:]\n", " digraphs = zip(p2hist, p2hist[1:])\n", " followers = [b for a, b in digraphs if a == recent_play]\n", " prediction = select_maximum(followers, options)\n", " return ideal_response[prediction]\n", "\n", "strategies = [random_reply, single_event_proportional, single_event_greedy,\n", " digraph_event_proportional, digraph_event_greedy]\n", "\n", "def play_and_learn(opposition, strategies=strategies,\n", " trials=1000, verbose=False):\n", " strategy_range = range(len(strategies))\n", " weights = [1] * len(strategies)\n", " p1hist = []\n", " p2hist = []\n", " cum_score = 0\n", " for trial in range(trials):\n", " # choose our move\n", " our_moves = [strategy(p1hist, p2hist) for strategy in strategies]\n", " i = choices(strategy_range, weights)[0]\n", " our_move = our_moves[i]\n", "\n", " # get opponent's move\n", " opponent_move = opposition(p2hist, p1hist)\n", "\n", " # determine the winner\n", " score = scorer[our_move + opponent_move]\n", " if verbose:\n", " print(f'{our_move} ~ {opponent_move} = {score:+d}'\n", " f'\\t\\t{strategies[i].__name__}')\n", " cum_score += score\n", "\n", " # update move history and strategy weights\n", " p1hist.append(our_move)\n", " p2hist.append(opponent_move)\n", " for i, our_move in enumerate(our_moves):\n", " if scorer[our_move + opponent_move] == 1:\n", " weights[i] += 1\n", "\n", " print(f'---- vs. {opposition.__name__} ----') \n", " print('Total score:', cum_score)\n", " pprint(sorted([(weight, strategy.__name__) for weight, strategy in zip(weights, strategies)]))\n", "\n", "if __name__ == '__main__':\n", "\n", " def human(p1hist, p2hist):\n", " return input(f'Choose one of {options!r}: ')\n", "\n", " def fixed_ratio(p1hist, p2hist):\n", " return choices(options, (1, 2, 3))[0]\n", "\n", " def cycling(series):\n", " iterator = cycle(series)\n", " def cycle_opponent(p1hist, p2hist):\n", " return next(iterator)\n", " return cycle_opponent\n", "\n", " play_and_learn(opposition=fixed_ratio)\n", " play_and_learn(opposition=random_reply)\n", " play_and_learn(opposition=cycling('RPRSS'))\n", " play_and_learn(opposition=human, trials=10, verbose=True)" ] }, { "cell_type": "markdown", "id": "fff8156e-8903-4576-af0f-78b65f212efe", "metadata": {}, "source": [ "Critiques and Improvements[¶](#critiques-and-improvements \"Permalink to this headline\")\n", "---------------------------------------------------------------------------------------\n", "\n", "* When humans are losing, they generally change they strategy, so we should “age” our weights to adapt to the change.\n", "* The _random\\_reply_ strategy wins one-third of the time so it keeps getting selected when other strategies are doing better.\n", "* We need to keep _random\\_reply_ in the mix in case our strategies get figured-out and gamed. Randomness is your best defense against a smarter adversary.\n", "* The multi-arm bandit approach (choosing the strategy proportionally to wins and losses) results in very slow learning.\n", "* Humans get bored with this game quickly, so it is hard get people to test it." ] }, { "cell_type": "markdown", "id": "ed7cb3de-4c5b-4f62-a99e-fdeb81c0a51b", "metadata": {}, "source": [ "Where you can go from here[¶](#where-you-can-go-from-here \"Permalink to this headline\")\n", "---------------------------------------------------------------------------------------\n", "\n", "* None of the code logic is hard-wired to the basic game.\n", "* It is easy to add new strategies.\n", "* Or to evaluate various published [how-to-win strategies](https://www.telegraph.co.uk/men/thinking-man/11051704/How-to-always-win-at-rock-paper-scissors.html).\n", "* Changing the game definition logic allows for more complex games such as rock-paper-scissors-lizard-spock:\n", "\n", "![_images/1200px-Rock_Paper_Scissors_Lizard_Spock_en.svg.png](_images/1200px-Rock_Paper_Scissors_Lizard_Spock_en.svg.png)" ] }, { "cell_type": "markdown", "id": "ec309a01-4af8-46d1-a08b-4d5966ca140f", "metadata": {}, "source": [ "| [<<< Previous](einstein.ipynb \"SAT Solvers\") | [Next >>>](dining.ipynb \"SMT and Model Checkers\") |\n", "| :-: | :-: |\n", "\n", "* * *\n", "\n", "© Copyright 2019, Raymond Hettinger" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 5 }