{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "\n", "
Author: Denis Mironov (@dmironov)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Insights of Monty Hall paradox with Plotly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plotly is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools (https://plot.ly). Plotly provides online graphing, analytics, and statistics tools as well as graphing libraries for several programming languages, including Python. Using the Python library, we present here a Plotly tutorial in the form of a visual analysis of the well-known Monty Hall paradox (https://en.wikipedia.org/wiki/Monty_Hall_problem), which is based on the following game:\n", "\n", ">Suppose you are on a game show, and you are given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say №1, and the host, who knows what is behind the doors, opens another door, say №3, which has a goat. The host then says to you, \"Do you want to pick door №2?\" Is it to your advantage to switch your choice?" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "It can be shown analytically that switching from the initial door choice wins with probability 2/3 (about 66.7%), while keeping the initial choice wins with probability 1/3 (about 33.3%). Likewise, if the final choice is decided solely by tossing an unbiased coin, we win in 1/2 of the cases. Here we demonstrate the results of these three strategies graphically by using Plotly." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This Jupyter Notebook is organized as follows:\n", "\n", "1. Simulation of Monty Hall Paradox\n", "2. Importing and discussing libraries used for illustrations\n", "3. Graphical analysis\n", "    - Line, Box, Violin and Distribution plots\n", "    - Contour and 3D Clustering plots\n", "4. Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 
Simulation of Monty Hall Paradox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to generate a dataset in accordance with the game rules outlined above. The dataset contains the win percentage of each of the three possible strategies a player can follow after the host opens one of the other doors:\n", "\n", "1. The initial door choice is changed\n", "2. The initial door choice remains unchanged\n", "3. Random choice (the initial door choice is either changed or not, based on the outcome of an unbiased coin toss)\n", "\n", "To model the game with these three strategies, we present a slightly modified version of a solution we obtained while working on Workshop Case Study №037 of https://www.superdatascience.com.\n", "\n", "We create 6 functions:\n", "\n", "**1. newGame()**\n", " - Creates a new game: one of the three possible arrangements of {car, goat, goat} behind {door1, door2, door3} is chosen at random, so the car is equally likely to be behind any of the doors\n", "\n", "**2. guestChoice()**\n", " - Models the initial door choice of a player, made at random\n", "\n", "**3. openOneDoor(game, chosen_door)**\n", " - The host opens one door that is neither the one chosen by the player nor the one with the car behind it. The opened door depends on what is behind the doors (`game`) and on the initial choice of the player (`chosen_door`); if the player has chosen the car, the host opens one of the two goat doors at random\n", "\n", "**4. guestChange(game, chosen_door, change)**\n", " - Returns the final door of the player: the other closed door if `change` is `'Y'`, otherwise `chosen_door`\n", "\n", "**5. checkResult(game, chosen_door, change)**\n", " - Returns whether the player wins or loses a round for the given decision to change or not\n", "\n", "**6. result(n=400, step=1)**\n", " - Plays one game for each number of rounds in range(1, n+1, step) and collects the win percentage of each of the three strategies in every game. With step=1, this means that we model 1 round and record the result, then model 2 consecutive rounds and record the result, and so on up to n consecutive rounds. Note that in this notation a game number equals the number of rounds in the game: for instance, game №50 consists of 50 consecutive rounds." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# random number generator\n", "from numpy import random\n", "\n", "\n", "# Generate a new game with 3 doors: there is a car behind one of them\n", "# and a goat behind each of the other two.\n", "def newGame():\n", "    position = random.randint(1, 4)  # numpy's randint excludes the upper bound: 1, 2 or 3\n", "    if position == 1:\n", "        game_dict = {\"door1\": \"car\", \"door2\": \"goat\", \"door3\": \"goat\"}\n", "        return game_dict\n", "    elif position == 2:\n", "        game_dict = {\"door1\": \"goat\", \"door2\": \"car\", \"door3\": \"goat\"}\n", "        return game_dict\n", "    else:\n", "        game_dict = {\"door1\": \"goat\", \"door2\": \"goat\", \"door3\": \"car\"}\n", "        return game_dict\n", "\n", "\n", "# Function guestChoice randomly models the initial door choice of a player\n", "def guestChoice():\n", "    door_choice = random.randint(1, 4)\n", "    if door_choice == 1:\n", "        return \"door1\"\n", "    elif door_choice == 2:\n", "        return \"door2\"\n", "    else:\n", "        return \"door3\"\n", "\n", "\n", "# Function openOneDoor simulates the host opening a door in the round.\n", "def openOneDoor(game, chosen_door):\n", "    all_doors = [\"door1\", \"door2\", \"door3\"]\n", "    element_open = []\n", "    options = game.copy()\n", "\n", "    # The player has chosen a goat: the host opens the only other goat door\n", "    if game[chosen_door] != \"car\":\n", "        for element in all_doors:\n", "            if element != chosen_door and game[element] != \"car\":\n", "                element_open.append(element)\n", "        options[element_open[0]] = \"open\"\n", "        return options\n", "\n", "    # The player has chosen the car: the host opens one of the two goat doors at random\n", "    elif game[chosen_door] == \"car\":\n", "        for element in all_doors:\n", "            if element != chosen_door:\n", "                element_open.append(element)\n",
"        number = random.randint(1, 3)  # 1 or 2\n", "        if number == 1:\n", "            options[element_open[0]] = \"open\"\n", "            return options\n", "        else:\n", "            options[element_open[1]] = \"open\"\n", "            return options\n", "\n", "\n", "# Function guestChange returns the final door of a player depending on whether\n", "# he/she changes the initial decision (change=\"Y\") or not\n", "def guestChange(game, chosen_door, change):\n", "    game_with_open = openOneDoor(game, chosen_door)\n", "\n", "    if change == \"Y\":\n", "        # switch to the only door that is neither opened nor initially chosen\n", "        for element in game_with_open:\n", "            if game_with_open[element] != \"open\" and element != chosen_door:\n", "                return element\n", "\n", "    else:\n", "        return chosen_door\n", "\n", "\n", "# Function checkResult checks the result of one round of a game.\n", "def checkResult(game, chosen_door, change):\n", "    final_choice = guestChange(game, chosen_door, change)\n", "    result = game[final_choice]\n", "\n", "    if result == \"car\":\n", "        return \"WIN\"\n", "    else:\n", "        return \"LOSE\"\n", "\n", "\n", "# Function result calculates win% when our strategy is either\n", "# to change our initial door choice or not to change. 
We also simulate the\n", "# random process: the decision whether to change the door or not is based solely on\n", "# the outcome of an unbiased coin toss.\n", "def result(n=400, step=1):\n", "    list_N = []\n", "    win_change = []\n", "    win_nochange = []\n", "    win_rand = []\n", "\n", "    for N in range(1, n + 1, step):\n", "        count_y = 0\n", "        count_n = 0\n", "        count_rand = 0\n", "\n", "        # play N rounds; all strategies are evaluated on the same game and initial choice\n", "        for i in range(N):\n", "            game = newGame()\n", "            chosen_door = guestChoice()\n", "            if checkResult(game, chosen_door, change=\"Y\") == \"WIN\":\n", "                count_y += 1\n", "            if checkResult(game, chosen_door, change=\"N\") == \"WIN\":\n", "                count_n += 1\n", "            if (\n", "                checkResult(\n", "                    game, chosen_door, change=\"Y\" if random.randint(2) == 0 else \"N\"\n", "                )\n", "                == \"WIN\"\n", "            ):\n", "                count_rand += 1\n", "\n", "        list_N.append(N)\n", "        win_change.append(count_y / N * 100)\n", "        win_nochange.append(count_n / N * 100)\n", "        win_rand.append(count_rand / N * 100)\n", "\n", "    return list_N, win_change, win_nochange, win_rand" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list_N, win_change, win_nochange, win_rand = result(\n", "    n=2000, step=1\n", ")  # modelling of 2000 games; the i-th game has i rounds" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "- **list_N**: a list with the № of each game\n", "- **win_change**: win% if the initial door choice has been changed\n", "- **win_nochange**: win% if the initial door choice has been kept unchanged\n", "- **win_rand**: win% if the decision whether to change the initial door choice has been made by an unbiased coin toss\n", "\n", "Each position in **win_change**, **win_nochange** and **win_rand** corresponds to the same position in **list_N**. 
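" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the number of rounds grows, these simulated percentages should approach exact values that can be computed without any randomness. As a cross-check, here is a minimal sketch, separate from the simulation code above, that enumerates every equally likely combination of car position and initial choice, splits the probability evenly when the host has two goat doors to choose from, and sums exact probabilities with Python's `fractions.Fraction`. The function name `exact_win_probabilities` and the door labels 1, 2, 3 are hypothetical choices made for this sketch; it assumes only the standard library.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def exact_win_probabilities():
    # Exact win probabilities (switch, stay, coin) for one round of the game.
    switch = stay = coin = Fraction(0)
    # The car position and the initial choice are independent and uniform.
    for car, pick in product(DOORS, DOORS):
        p_setup = Fraction(1, 9)
        # The host opens a door that is neither picked nor hiding the car;
        # if two such doors exist, each is opened with equal probability.
        host_options = [d for d in DOORS if d != pick and d != car]
        for opened in host_options:
            p = p_setup / len(host_options)
            # The only closed door the player can switch to.
            other = next(d for d in DOORS if d not in (pick, opened))
            win_switch = int(other == car)
            win_stay = int(pick == car)
            switch += p * win_switch
            stay += p * win_stay
            # A fair coin chooses between switching and staying.
            coin += p * Fraction(win_switch + win_stay, 2)
    return switch, stay, coin


switch, stay, coin = exact_win_probabilities()
print(switch, stay, coin)  # 2/3 1/3 1/2
print([round(float(v) * 100, 1) for v in (switch, stay, coin)])  # [66.7, 33.3, 50.0]
```

So **win_change**, **win_nochange** and **win_rand** should tend to about 66.7%, 33.3% and 50%, respectively. Note also that staying and switching are complementary in every single round, since the host never opens the car door; as `result` evaluates both strategies on the same game and initial choice, each pair of values in **win_change** and **win_nochange** adds up to 100% (up to floating-point rounding).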
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Total number of games: %s \" % list_N[-1])\n", "print(\"Max number of rounds with changing initial door choice: %s\" % len(win_change))\n", "print(\n", " \"Max number of rounds without changing initial door choice: %s \" % len(win_nochange)\n", ")\n", "print(\n", " \"Max number of rounds with random choice after one door is open: %s\" % len(win_rand)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that in our modelling i-th game has i rounds. The latter means that, for instance, in game №5 we play the game 5 consecutive times, and calculate win% based on the outcomes in these 5 rounds." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Importing and discussing libraries used for illustrations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only library we have been needed so far is a random number generator. Now, when we have simulated the game and collected the results, we are ready to start performing graphical analysis, but before that, evidently, we need to import some libraries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we are interested in learning Plotly, so let's check its version first." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly\n", "\n", "print(\n", " \"Below figures are generated with {} {}\".format(\n", " plotly.__version__, \"plotly version\"\n", " )\n", ")\n", "# author's output: Below figures are generated with 3.3.0 plotly version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will see below, there is a JSON object under every plotly visualization. It is possible to operate with such an object similar to dictionary data structure by changing the values of keywords within the object. 
So, we load the module that creates plotting (graph) objects:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# creates plotting objects (traces, layouts, figures)\n", "import plotly.graph_objs as go" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We want the illustrations to be displayed in this Jupyter Notebook. For this purpose, we import `download_plotlyjs`, `init_notebook_mode` and `iplot` from `plotly.offline`. `init_notebook_mode` makes plotly.js available in the notebook: with `connected=True` the library is loaded from a CDN (an internet connection is required), while with `connected=False` the whole library is embedded into the notebook, so that figures can be rendered offline. In both cases the figures are displayed inside the notebook. `iplot` draws a figure inline. `download_plotlyjs` is not actually called below." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from plotly.offline import download_plotlyjs, init_notebook_mode, iplot\n", "\n", "init_notebook_mode(connected=True)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Some chart types, for instance a distribution plot, are not basic plotly.js traces: they are assembled in Python from several simpler traces by `plotly.figure_factory`, which we import separately." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.figure_factory as ff" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We also import some well-known packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "import pandas as pd\n", "\n", "warnings.simplefilter(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. 
Graphical analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.1 Line, Box, Violin and Distribution plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we present the **plotResult** function, which is responsible for plotting line, box, violin and distribution illustrations." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plotResult(\n", "    change,\n", "    nochange,\n", "    rand,\n", "    game=None,\n", "    output=\"fig\",\n", "    plot=\"line\",\n", "    title=\"\",\n", "    jitter=0,\n", "    pointpos=0,\n", "    width=700,\n", "    height=300,\n", "    bin_size=0.2,\n", "    show_rug=False,\n", "):\n", "\n", "    \"\"\"\n", "    Creates line, box, violin or distribution plot by using Plotly library\n", "    \n", "    game: list or None\n", "        - numbers of games (№ game); if a list is given, the data are sliced up to game[-1]\n", "    \n", "    change: list\n", "        - percentage of wins if initial door choice is changed\n", "    \n", "    nochange: list\n", "        - percentage of wins if initial door choice is unchanged\n", "    \n", "    rand: list\n", "        - percentage of wins if, after the host opens a door in accordance with the game rules,\n", "          the final choice is made purely at random\n", "    \n", "    output: string\n", "        - 'fig' means to give an output in form of illustration\n", "        - 'json' means to give an output in form of json format\n", "    \n", "    plot: string\n", "        - 'line' means to plot a line\n", "        - 'dist' means to plot a distribution\n", "        - 'violin' means to plot a violin\n", "        - 'box' means to plot a box\n", "    \n", "    title: string\n", "        - title of a graph\n", "    \n", "    jitter: number\n", "        - a number between 0 and 1 (scatter of points while generating box or violin plot)\n", "    \n", "    pointpos: number\n", "        - a number between -2 and 2 (position of points in relation to box or violin plot)\n", "    \n", "    width: number\n", "        - width of a figure (not available when plot = 'dist' or plot = 'violin')\n", "    \n", "    height: number\n", "        - height of a figure (not available when plot = 'dist' or plot = 'violin')\n", "    \n", "    bin_size: number\n", "        - size of bins while creating a distribution plot\n", "    \n", "    show_rug: bool\n", "        - whether a distribution graph should show a rug or not\n", "    \n", "    \"\"\"\n", "\n",
"    if isinstance(game, list):\n", "        # Slicing lists up to game[-1]\n", "        change = change[: game[-1]]\n", "        nochange = nochange[: game[-1]]\n", "        rand = rand[: game[-1]]\n", "        games = game\n", "    else:\n", "        # No list of games given: № game runs over 1, 2, ..., len(change)\n", "        games = list(range(1, len(change) + 1))\n", "\n", "    # Two main conditions: whether plot != 'violin' and plot != 'dist'\n", "    # this is due to different ways of creating plots\n", "    if (plot != \"violin\") & (plot != \"dist\"):\n", "\n", "        # In this block we choose whether to plot line or box illustrations\n", "        # Here is line option\n", "        if plot == \"line\":\n", "            trace_change = go.Scatter(\n", "                x=games, y=change, name=\"Change\", line=dict(color=\"#33CFA5\")\n", "            )\n", "            trace_no_change = go.Scatter(\n", "                x=games, y=nochange, name=\"No Change\", line=dict(color=\"#F06A6A\")\n", "            )\n", "            trace_guessing = go.Scatter(\n", "                x=games, y=rand, name=\"Random\", line=dict(color=\"gray\")\n", "            )\n", "            xx = dict(title=\"№ Game\", ticklen=5, zeroline=False, gridwidth=2)\n", "\n", "        # Here is box option\n", "        elif plot == \"box\":\n", "            trace_change = go.Box(\n", "                y=change,\n", "                jitter=jitter,\n", "                boxpoints=\"all\",\n", "                pointpos=pointpos,\n", "                name=\"Change\",\n", "                line=dict(color=\"#33CFA5\"),\n", "            )\n", "            trace_no_change = go.Box(\n", "                y=nochange,\n", "                jitter=jitter,\n", "                boxpoints=\"all\",\n", "                pointpos=pointpos,\n", "                name=\"No Change\",\n", "                line=dict(color=\"#F06A6A\"),\n", "            )\n", "            trace_guessing = go.Box(\n", "                y=rand,\n", "                jitter=jitter,\n", "                boxpoints=\"all\",\n", "                pointpos=pointpos,\n", "                name=\"Random\",\n", "                line=dict(color=\"gray\"),\n", "            )\n", "            xx = None\n", "\n",
"        # Collecting data in a list form.\n", "        # trace_change - data related to percentage of wins when initial door choice is changed\n", "        # trace_no_change - data related to percentage of wins when initial door choice is unchanged\n", "        # trace_guessing - data related to percentage of wins when final door choice is based on unbiased coin outcome\n", "        data = [trace_change, trace_guessing, trace_no_change]\n", "\n", "        # So-called ornament of our figure\n", "        layout = go.Layout(\n", "            title=title,\n", "            autosize=False,\n", "            width=width,\n", "            height=height,\n", "            xaxis=xx,\n", "            yaxis=dict(title=\"% Win\", ticklen=5, gridwidth=2),\n", "        )\n", "\n", "        # Saving figure as fig variable which is returned by the function as iplot(fig) if output = 'fig'\n", "        fig = go.Figure(data=data, layout=layout)\n", "\n", "    # Block of code related to plotting violin type of chart\n", "    elif plot == \"violin\":\n", "        win_dist = pd.DataFrame(\n", "            data={\"Change\": change, \"Random\": rand, \"No Change\": nochange}\n", "        )\n", "        color = [\"#33CFA5\", \"gray\", \"#F06A6A\"]\n", "        data = []\n", "        for i in range(3):\n", "            trace = {\n", "                \"type\": \"violin\",\n", "                \"y\": win_dist.iloc[:, i],\n", "                \"name\": win_dist.columns[i],\n", "                \"box\": {\"visible\": True},\n", "                \"points\": \"all\",\n", "                \"jitter\": jitter,\n", "                \"pointpos\": pointpos,\n", "                \"meanline\": {\"visible\": True},\n", "                \"line\": {\"color\": color[i]},\n", "            }\n", "            data.append(trace)\n", "\n", "        fig = {\n", "            \"data\": data,\n", "            \"layout\": {\"title\": \"\", \"yaxis\": {\"title\": \"% Win\", \"zeroline\": False}},\n", "        }\n", "\n",
"    # Block of code related to plotting a distribution\n", "    elif plot == \"dist\":\n", "        data = [change, nochange, rand]\n", "        group_labels = [\"Change\", \"No Change\", \"Random\"]\n", "        colors = [\n", "            \"#33CFA5\",\n", "            \"#F06A6A\",\n", "            \"gray\",\n", "        ]\n", "        xx = None\n", "\n", "        fig = ff.create_distplot(\n", "            data, group_labels, colors=colors, bin_size=bin_size, show_rug=show_rug\n", "        )\n", "\n", "    # if output == 'fig' return graph by using iplot from plotly library\n", "    if output == \"fig\":\n", "        return iplot(fig)\n", "\n", "    # if output == 'json' return information about plotly object\n", "    elif output == \"json\":\n", "        return fig" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**go.Scatter**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's discuss the part of **plotResult** which is implemented under the `if plot == 'line'` condition:\n", "\n", "    if plot == 'line':\n", "        trace_change = go.Scatter(x=games,\n", "                                  y=change,\n", "                                  name='Change',\n", "                                  line=dict(color='#33CFA5'))\n", "        trace_no_change = go.Scatter(x=games,\n", "                                     y=nochange,\n", "                                     name='No Change',\n", "                                     line=dict(color='#F06A6A'))\n", "        trace_guessing = go.Scatter(x=games,\n", "                                    y=rand,\n", "                                    name='Random',\n", "                                    line=dict(color='gray'))\n", "        xx = dict(title='№ Game',\n", "                  ticklen=5,\n", "                  zeroline=False,\n", "                  gridwidth=2)\n", "\n", "    data = [trace_change, trace_guessing, trace_no_change]\n", "\n", "    layout = go.Layout(title=title,\n", "                       autosize=False,\n", "                       width=width,\n", "                       height=height,\n", "                       xaxis=xx,\n", "                       yaxis=dict(title='% Win',\n", "                                  ticklen=5,\n", "                                  gridwidth=2))\n", "\n", "    fig = go.Figure(data=data, layout=layout)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The comments of the function have been deliberately removed from this snippet." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In this code snippet, we create three similar objects with `go.Scatter`. They share the same `x` (№ of game) but differ in the following arguments:\n", "\n", "- **y**: values on the vertical axis\n", "- **name**: the name shown in the legend of the output figure\n", "- **line**: the line style; here only the color of the curve is set\n", "\n", "Each `go.Scatter` object is saved in a variable whose name, by convention, starts with `trace`. These variables are then collected in a list, conventionally named `data`, since it holds all the data of the figure. 
If we do not need any ornaments in our figure, for instance, a title or width and height specifications, we are ready to produce a figure by passing the `data` list to the `go.Figure` object.\n", "\n", "As one can see above, we have decided to style our figure a bit more. For this, we use a `go.Layout` object with the following arguments:\n", "\n", "- **title**: the title of our illustration\n", "- **width**: the width of our figure (see the note below)\n", "- **height**: the height of our figure (see the note below)\n", "- **xaxis**: settings of the horizontal axis (uses the `xx` variable defined within the `if` condition)\n", "- **yaxis**: settings of the vertical axis\n", "\n", "Note: **width** and **height** control the figure size only together with `autosize=False`." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**go.Box**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "`go.Box` is a graph object that takes arguments similar to those of `go.Scatter`, plus several new ones:\n", "\n", "- **jitter**: spreads the points sideways to make them more visible (from 0 to 1)\n", "- **boxpoints**: which data points to show (`'all'` shows every point)\n", "- **pointpos**: position of the points in relation to the box (from -2 to 2)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Violin**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "    elif plot == 'violin':\n", "        win_dist = pd.DataFrame(data={\n", "            'Change': change,\n", "            'Random': rand,\n", "            'No Change': nochange\n", "        })\n", "        color = ['#33CFA5', 'gray', '#F06A6A']\n", "        data = []\n", "        for i in range(3):\n", "            trace = {\n", "                \"type\": 'violin',\n", "                \"y\": win_dist.iloc[:, i],\n", "                \"name\": win_dist.columns[i],\n", "                \"box\": {\"visible\": True},\n", "                \"points\": 'all',\n", "                \"jitter\": jitter,\n", "                \"pointpos\": pointpos,\n", "                \"meanline\": {\"visible\": True},\n", "                \"line\": {\"color\": color[i]}\n", "            }\n", "            data.append(trace)\n", "\n", "        fig = {\n", "            \"data\": data,\n", "            \"layout\": {\n", "                \"title\": \"\",\n", "                \"yaxis\": {\"title\": \"% Win\", \"zeroline\": False}\n", "            }\n", "        }" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here you see the code block responsible for violin plotting. The procedure is similar to the one we have already covered, though slightly different:\n", "\n", "1. We create a dataframe whose columns represent our data (win% with changing the initial door choice, without changing it, and with a random choice after one of the doors is opened)\n", "2. We create a list of colors, which are later mapped to the `line` keyword within `trace`\n", "3. We loop over the columns of the dataframe and append each `trace` to the `data` list. Variable `trace` contains the following parameters of the future figure:\n", "\n", "    - **type**: the type of our figure ('violin' in our case)\n", "    - **y**: data on the vertical axis\n", "    - **name**: the name shown in the figure\n", "    - **box**: whether to draw a box plot inside the violin\n", "    - **points**: which data points to show\n", "    - **jitter**: spreads the points to make them more visible (from 0 to 1)\n", "    - **pointpos**: position of the points in relation to the violin (from -2 to 2)\n", "    - **meanline**: whether to draw a line at the mean value\n", "    - **line**: here it sets the color of the violin\n", "4. Previously, we used `go.Figure`, a graph object that can be serialized to JSON, and assigned it to the `fig` variable; this time we build an equivalent plain dictionary by hand."
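] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the last point concrete, here is a minimal sketch of such a hand-made figure that needs only the standard library. The helper `violin_figure` is hypothetical (it is not part of Plotly or of the functions above): it builds the same kind of `data`/`layout` dictionary as the violin branch of **plotResult**, but from a plain mapping of strategy names to lists of win percentages instead of a dataframe. The numbers below are toy values for illustration, not simulation results.

```python
import json


def violin_figure(columns, colors, jitter=0.3, pointpos=-1.5, title=''):
    # columns: dict mapping a strategy name to a list of win percentages
    # colors: one color per strategy, in the same order as columns
    data = []
    for (name, values), color in zip(columns.items(), colors):
        data.append({
            'type': 'violin',
            'y': list(values),
            'name': name,
            'box': {'visible': True},
            'points': 'all',
            'jitter': jitter,
            'pointpos': pointpos,
            'meanline': {'visible': True},
            'line': {'color': color},
        })
    return {
        'data': data,
        'layout': {'title': title, 'yaxis': {'title': '% Win', 'zeroline': False}},
    }


# Toy numbers for illustration only.
columns = {'Change': [60.0, 70.0], 'Random': [50.0, 45.0], 'No Change': [40.0, 30.0]}
fig = violin_figure(columns, ['#33CFA5', 'gray', '#F06A6A'])
print([trace['name'] for trace in fig['data']])  # ['Change', 'Random', 'No Change']

# The whole figure is plain data, so it can be serialized to JSON and back.
serialized = json.dumps(fig)
print(json.loads(serialized) == fig)  # True
```

Since the result is an ordinary dictionary, it can be inspected or modified with plain dictionary operations before it is passed to `iplot`, which accepts dictionary figures, just as the violin branch of **plotResult** relies on."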
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Distribution**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "    elif plot == 'dist':\n", "        data = [change, nochange, rand]\n", "        group_labels = ['Change', 'No Change', 'Random']\n", "        colors = ['#33CFA5', '#F06A6A', 'gray']\n", "        xx = None\n", "\n", "        fig = ff.create_distplot(data,\n", "                                 group_labels,\n", "                                 colors=colors,\n", "                                 bin_size=bin_size,\n", "                                 show_rug=show_rug)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This code block creates a distribution plot. The procedure differs from the one used for scatter, box or violin plots:\n", "\n", "1. We create a list whose elements are lists. In our case, these lists contain the percentages of wins obtained with the different strategies\n", "2. Next, we define `group_labels` that will be shown in the figure. The order of the labels corresponds to that of the `data` list\n", "3. Then, we assign a color to each list of our data, in the same order as in `data`\n", "4. Parameter `bin_size` defines the size of the histogram bins, while `show_rug` adds a rug plot (a tick mark for every observation) below the distribution\n", "5. 
Having created `data`, `group_labels`, `colors`, `bin_size` and `show_rug`, we pass them as input parameters to `ff.create_distplot` from `plotly.figure_factory`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now see our **plotResult** function in action." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "games = list_N.copy()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotResult(win_change, win_nochange, win_rand, games, plot=\"line\", output=\"fig\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotResult(win_change, win_nochange, win_rand, games, plot=\"box\", output=\"fig\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotResult(\n", "    win_change,\n", "    win_nochange,\n", "    win_rand,\n", "    games,\n", "    plot=\"box\",\n", "    output=\"fig\",\n", "    pointpos=-1.5,\n", "    jitter=0.3,\n", ")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotResult(\n", "    win_change,\n", "    win_nochange,\n", "    win_rand,\n", "    games,\n", "    plot=\"violin\",\n", "    output=\"fig\",\n", "    pointpos=-1.5,\n", "    jitter=0.3,\n", ")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# our implementation does not allow controlling the size of the distribution plot\n", "plotResult(win_change, win_nochange, win_rand, games, plot=\"dist\", output=\"fig\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Note that if you hover the mouse cursor over a figure generated by Plotly, the data values are displayed, so we can instantly read off the exact values of, say, outliers without any additional data manipulation. The legend located in the top right corner of a Plotly figure is also clickable, so you may choose which data you would like to see. 
By clicking once on a label, you hide the data it is connected with; they are displayed again only when you click that label one more time." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "From the figures above, one can see that the most profitable strategy in our game is to change the initial door choice. When we play a sufficient number of consecutive rounds, this strategy wins in about 66.7% of the cases, while the win percentage tends to 33.3% if we do not change our initial choice. We have also demonstrated that if our final choice between the two remaining doors is made by tossing an unbiased coin, then there is a 50% chance to win. The latter result is trivial, but it is nice to see it in our figures as well." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the JSON object underlying a Plotly visualization to get a flavor of how it works." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotResult(win_change, win_nochange, win_rand, games, plot=\"line\", output=\"json\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "As one can see, the visualization is described by a JSON-like object. It can be modified much like a dictionary, by changing the values of its keys. For instance, note that the layout has a height equal to 300. Let's change it to 700, which can be done as follows." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "json_demonstration = plotResult(\n", "    win_change, win_nochange, win_rand, games, plot=\"line\", output=\"json\"\n", ")\n", "json_demonstration[\"layout\"][\"height\"] = 700\n", "iplot(json_demonstration)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This way of modifying a Plotly plot is convenient and time-saving, especially when a plot has many more parameters than presented in our tutorial." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, we consider a contour plot, which shows the distributions of the results of two strategies as histograms together with how these results depend on each other. After that, a 3D clustering figure shows the cluster formed by the results of all three strategies." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2 Contour and 3D Clustering plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Contour plot**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We start by building the **contour_plot** function." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def contour_plot(\n", "    x,\n", "    y,\n", "    game=None,\n", "    colorscale=\"Jet\",\n", "    name_x=\"X\",\n", "    name_y=\"Y\",\n", "    height=600,\n", "    width=600,\n", "    bargap=0,\n", "):\n", "\n", "    \"\"\"\n", "    \n", "    Creates a contour plot by using Plotly library\n", "    \n", "    x: list\n", "        - win percentages of the strategy shown on the horizontal axis\n", "    \n", "    y: list\n", "        - win percentages of the strategy shown on the vertical axis\n", "    \n", "    game: list\n", "        - contains number of rounds per game. The same number corresponds to № of game. \n", "        For instance, number 5 in the list means that this is game number 5 with length of 5 consecutive rounds.\n", "        If not used, then x and y control the game rounds of interest\n", "    \n", "    colorscale: str\n", "        - name of the Plotly colorscale of the contours (shown as a colorbar on the right of a figure)\n", "    \n", "    name_x: str\n", "        - name of a strategy on horizontal axis. It is 'X' by default\n", "    \n", "    name_y: str\n", "        - name of a strategy on vertical axis. It is 'Y' by default\n", "    \n", "    height: number\n", "        - controls the height of figure\n", "    \n", "    width: number\n", "        - controls the width of figure\n", "    \n", "    bargap: int or float\n", "        - gaps between bars. 
Parameter takes values from 0 to 1\n", "    \n", "    \"\"\"\n", "\n", "    # Keep only games №1, ..., game[-1] if a list of games is given\n", "    if isinstance(game, list):\n", "        x = x[: game[-1]]\n", "        y = y[: game[-1]]\n", "\n", "    data = [\n", "        # 2D density contours of the (x, y) pairs\n", "        go.Histogram2dContour(\n", "            x=x, y=y, colorscale=colorscale, reversescale=True, xaxis=\"x\", yaxis=\"y\"\n", "        ),\n", "        # the (x, y) pairs themselves as small semi-transparent markers\n", "        go.Scatter(\n", "            x=x,\n", "            y=y,\n", "            name=\"(\" + name_x + \",\" + name_y + \")\",\n", "            xaxis=\"x\",\n", "            yaxis=\"y\",\n", "            mode=\"markers\",\n", "            marker=dict(color=\"rgba(0,0,0,0.3)\", size=3),\n", "        ),\n", "        # marginal histogram of y, drawn on the right (second x-axis)\n", "        go.Histogram(y=y, name=name_y, xaxis=\"x2\", marker=dict(color=\"rgba(0,0,0,1)\")),\n", "        # marginal histogram of x, drawn on top (second y-axis)\n", "        go.Histogram(x=x, name=name_x, yaxis=\"y2\", marker=dict(color=\"rgba(0,0,0,1)\")),\n", "    ]\n", "\n", "    # The main plot takes 85% of each axis; the marginal histograms take the rest\n", "    layout = go.Layout(\n", "        autosize=False,\n", "        xaxis=dict(zeroline=False, domain=[0, 0.85], showgrid=False),\n", "        yaxis=dict(zeroline=False, domain=[0, 0.85], showgrid=False),\n", "        xaxis2=dict(zeroline=False, domain=[0.85, 1], showgrid=False),\n", "        yaxis2=dict(zeroline=False, domain=[0.85, 1], showgrid=False),\n", "        height=height,\n", "        width=width,\n", "        bargap=bargap,\n", "        hovermode=\"closest\",\n", "        showlegend=False,\n", "    )\n", "\n", "    fig = go.Figure(data=data, layout=layout)\n", "    return iplot(fig)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "As we did previously while creating **plotResult**, within **contour_plot** we combine `go.Histogram2dContour`, `go.Scatter` and two `go.Histogram` objects into a list and assign it to the `data` variable. This `data` list is then passed to `go.Figure`, which in turn is used as an argument of `iplot` to show the figure. \n", "\n", "We also use `go.Layout` here to specify the parameters of our plot. In particular, we need to specify the domains of the axes used by `go.Histogram2dContour`, `go.Scatter` and the two `go.Histogram` objects in order to obtain a clean figure without overlapping parts. 
In addition, in `go.Layout` we set `zeroline=False` so that the zero lines of the horizontal and vertical axes are not drawn. They would be redundant here, but you may set the option to `True` to see the effect. The same applies to `showgrid`, which controls whether a grid is shown. \n", "\n", "At the end of `go.Layout`, we specify several more parameters: **height**, **width**, **bargap**, **hovermode** and **showlegend**:\n", "\n", "- **height**: height of the figure (input parameter of **contour_plot**)\n", "- **width**: width of the figure (input parameter)\n", "- **bargap**: gap between adjacent bars of the histograms. With `bargap=0` the bars touch, as in a classic histogram; increasing **bargap** makes the histograms look more like bar charts. **bargap** takes values between 0 and 1 (input parameter)\n", "- **hovermode**: determines the mode of hover interactions; `\"closest\"` shows information about the point nearest to the cursor\n", "- **showlegend**: determines whether a legend is shown\n", "\n", "Now that we understand better how **contour_plot** works, let's use it to produce several illustrations related to our Monty Hall problem." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Copy the list of game numbers so that list_N itself is left unchanged\n", "games = list_N.copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "contour_plot(win_change, win_nochange, games[:200], name_x=\"Change\", name_y=\"No change\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "contour_plot(win_change, win_rand, games[:200], name_x=\"Change\", name_y=\"Random\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "contour_plot(win_nochange, win_rand, games[:200], name_x=\"No change\", name_y=\"Random\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do not forget that Plotly figures are interactive. 
You may zoom in and out, autoscale, use the box select tool to focus on a region of interest, and use several other tools for exploring the figure. Hovering over a point shows its coordinates, which is a convenient way to inspect a particular point and see how it fits into the whole picture." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Such contour plots show where the data points of the two strategies passed to **contour_plot** are concentrated. Each point corresponds to one game: its horizontal coordinate is the win percentage of one strategy and its vertical coordinate is the win percentage of the other strategy, so both values must come from the same game № to be mapped correctly onto the plane. We can also judge whether the strategies are correlated and locate the center of their cluster, which is marked in dark blue when `colorscale='Jet'` is used (with `reversescale=True`, the densest region gets the darkest blue)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The clustering can also be shown in a single 3D figure, which is what the next and last function of our tutorial does." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3D point clustering**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To conclude our visual investigation, let's plot a 3D point clustering figure. This is done with the **clustering** function presented below. It uses methods similar to those described above, so we will not cover it in detail. The function shows how the win percentages of the three strategies form a 3D cluster centered at about\n", "\n", "\n", "
$(\\text{random}, \\text{no change}, \\text{change}) \\approx (50, 33, 66)$
\n", "\n", " \n", "when a fairly large number of games is modelled, say, 2000. Recall once more that the game number also equals the number of rounds in the game: game №50 consists of 50 consecutive rounds, and the percentage of wins over these 50 rounds is the value assigned to game №50." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def clustering(game, x, y, z):\n", "\n", "    \"\"\"\n", "    Creates a 3D clustering figure by using the Plotly library\n", "\n", "    game: list\n", "        - contains the number of rounds per game. The same number is also the № of the game.\n", "        For instance, number 5 in the list means that this is game number 5, consisting of 5 consecutive rounds\n", "    x: list\n", "        - win percentages of the strategy in which a random choice is made when two doors are left\n", "    y: list\n", "        - win percentages of the strategy in which the initial door choice remains unchanged\n", "    z: list\n", "        - win percentages of the strategy in which the initial door choice is changed\n", "    \"\"\"\n", "\n", "    # Keep only games 1..game[-1]\n", "    x = x[: game[-1]]\n", "    y = y[: game[-1]]\n", "    z = z[: game[-1]]\n", "\n", "    df = pd.DataFrame(data={\"Random\": x, \"No change\": y, \"Change\": z})\n", "\n", "    # 3D scatter: one point per game\n", "    scatter = dict(\n", "        mode=\"markers\",\n", "        name=\"A\",\n", "        type=\"scatter3d\",\n", "        x=df[\"Random\"],\n", "        y=df[\"No change\"],\n", "        z=df[\"Change\"],\n", "        marker=dict(size=2, color=\"rgb(23, 190, 207)\"),\n", "    )\n", "\n", "    # Semi-transparent mesh around the points; alphahull > 0 selects the alpha-shape algorithm\n", "    clusters = dict(\n", "        alphahull=7,\n", "        opacity=0.1,\n", "        type=\"mesh3d\",\n", "        x=df[\"Random\"],\n", "        y=df[\"No change\"],\n", "        z=df[\"Change\"],\n", "    )\n", "\n", "    layout = dict(\n", "        title=\"3D clustering\",\n", "        scene=dict(\n", "            xaxis=dict(title=\"Random (x)\"),\n", "            yaxis=dict(title=\"No change (y)\"),\n", "            zaxis=dict(title=\"Change (z)\"),\n", "        ),\n", "    )\n", "\n", "    fig = dict(data=[scatter, clusters], layout=layout)\n", "    return iplot(fig)" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "game = list_N.copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clustering(game, win_rand, win_nochange, win_change)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This 3D clustering figure shows how the win percentages of all three strategies relate to each other across games." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we have learned about the Plotly library by visualizing the probability concepts behind the Monty Hall paradox. The charts we have covered are:\n", "\n", "- Line\n", "- Box\n", "- Violin\n", "- Distribution\n", "- Contour\n", "- 3D Clustering\n", "\n", "Plotly is a very powerful library, and many more kinds of illustrations can be made with it. You can find more examples here: https://plot.ly/python/. \n", "\n", "I hope that this tutorial was both interesting and useful for you. Thank you for your time and consideration." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }