{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Basics of randomness and simulation\n", "> This chapter gives you the tools required to run a simulation. We'll start with a review of random variables and probability distributions. We will then learn how to run a simulation by first looking at a simulation workflow and then recreating it in the context of a game of dice. Finally, we will learn how to use simulations for making decisions. This is the Summary of lecture \"Statistical Simulation in Python\", via datacamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Statistics, Modeling]\n", "- image: " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "plt.rcParams['figure.figsize'] = (10, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction to random variables\n", "- Continous Random Variables\n", " - Infinitely many possible values (e.g., Height, Weights)\n", "- Discrete Random Variables\n", " - Finite set of possible values (e.g., Outcomes of a six-sided die)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Poisson random variable\n", "The `numpy.random` module also has a number of useful probability distributions for both discrete and continuous random variables. In this exercise, you will learn how to draw samples from a probability distribution.\n", "\n", "In particular, you will draw samples from a very important discrete probability distribution, the Poisson distribution, which is typically used for modeling the average rate at which events occur.\n", "\n", "Following the exercise, you should be able to apply these steps to any of the probability distributions found in `numpy.random`. In addition, you will also see how the sample mean changes as we draw more samples from a distribution." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "|Lambda - sample mean| with 3 samples is 0.33333333333333304 and with 100 samples is 0.11000000000000032. \n" ] } ], "source": [ "# Initialize seed and parameters\n", "np.random.seed(123)\n", "lam, size_1, size_2 = 5, 3, 100\n", "\n", "# Draw samples & calculate absolute difference between lambda and sample mean\n", "samples_1 = np.random.poisson(lam, size_1)\n", "samples_2 = np.random.poisson(lam, size_2)\n", "answer_1 = abs(lam - np.mean(samples_1))\n", "answer_2 = abs(lam - np.mean(samples_2))\n", "\n", "print(\"|Lambda - sample mean| with {} samples is {} and with {} samples is {}. \".format(size_1, \n", " answer_1, \n", " size_2, \n", " answer_2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Shuffling a deck of cards\n", "Often times we are interested in randomizing the order of a set of items. Consider a game of cards where you first shuffle the deck of cards or a game of scrabble where the letters are first mixed in a bag. As the final exercise of this section, you will learn another useful function - `np.random.shuffle()`. This function allows you to randomly shuffle a sequence in place. At the end of this exercise, you will know how to shuffle a deck of cards or any sequence of items." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#hide\n", "deck_of_cards = [('Heart', 0),\n", " ('Heart', 1),\n", " ('Heart', 2),\n", " ('Heart', 3),\n", " ('Heart', 4),\n", " ('Heart', 5),\n", " ('Heart', 6),\n", " ('Heart', 7),\n", " ('Heart', 8),\n", " ('Heart', 9),\n", " ('Heart', 10),\n", " ('Heart', 11),\n", " ('Heart', 12),\n", " ('Club', 0),\n", " ('Club', 1),\n", " ('Club', 2),\n", " ('Club', 3),\n", " ('Club', 4),\n", " ('Club', 5),\n", " ('Club', 6),\n", " ('Club', 7),\n", " ('Club', 8),\n", " ('Club', 9),\n", " ('Club', 10),\n", " ('Club', 11),\n", " ('Club', 12),\n", " ('Spade', 0),\n", " ('Spade', 1),\n", " ('Spade', 2),\n", " ('Spade', 3),\n", " ('Spade', 4),\n", " ('Spade', 5),\n", " ('Spade', 6),\n", " ('Spade', 7),\n", " ('Spade', 8),\n", " ('Spade', 9),\n", " ('Spade', 10),\n", " ('Spade', 11),\n", " ('Spade', 12),\n", " ('Diamond', 0),\n", " ('Diamond', 1),\n", " ('Diamond', 2),\n", " ('Diamond', 3),\n", " ('Diamond', 4),\n", " ('Diamond', 5),\n", " ('Diamond', 6),\n", " ('Diamond', 7),\n", " ('Diamond', 8),\n", " ('Diamond', 9),\n", " ('Diamond', 10),\n", " ('Diamond', 11),\n", " ('Diamond', 12)]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('Spade', 2), ('Heart', 9), ('Diamond', 3)]\n" ] } ], "source": [ "# Shuffle the deck\n", "np.random.shuffle(deck_of_cards)\n", "\n", "# Print out the top three cars\n", "card_choices_after_shuffle = deck_of_cards[:3]\n", "print(card_choices_after_shuffle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulation basics\n", "- Simulations\n", " - Framework for modeling real-world events\n", " - Characterized by repeated random sampling\n", " - Gives us an approximate solution\n", " - Can help solve complex problems\n", "- Simulation steps\n", " 1. Define possible outcomes for random variables\n", " 2. Assign probabilities\n", " 3. Define relationships between random variables\n", " 4. Get multiple outcomes by repeated random sampling\n", " 5. Analyze sample outcomes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Throwing a fair die\n", "Once you grasp the basics of designing a simulation, you can apply it to any system or process. Next, we will learn how each step is implemented using some basic examples.\n", "\n", "As we have learned, simulation involves repeated random sampling. The first step then is to get one random sample. Once we have that, all we do is repeat the process multiple times. This exercise will focus on understanding how we get one random sample. We will study this in the context of throwing a fair six-sided die.\n", "\n", "By the end of this exercise, you will be familiar with how to implement the first two steps of running a simulation - defining a random variable and assigning probabilities." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Outcome of the throw: 5\n" ] } ], "source": [ "np.random.seed(123)\n", "die, probabilities, throws = [1, 2, 3, 4, 5, 6], [1/6, 1/6, 1/6, 1/6, 1/6, 1/6], 1\n", "\n", "# Use np.random.choice to throw the die once and record the outcome\n", "outcome = np.random.choice(die, size=throws, p=probabilities)\n", "print(\"Outcome of the throw: {}\".format(outcome[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Throwing two fair dice\n", "We now know how to implement the first two steps of a simulation. Now let's implement the next step - defining the relationship between random variables.\n", "\n", "Often times, our simulation will involve not just one, but multiple random variables. Consider a game where throw you two dice and win if each die shows the same number. Here we have two random variables - the two dice - and a relationship between each of them - we win if they show the same number, lose if they don't. In reality, the relationship between random variables can be much more complex, especially when simulating things like weather patterns.\n", "\n", "By the end of this exercise, you will be familiar with how to implement the third step of running a simulation - defining relationships between random variables." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The dice show 5 and 5. You win!\n" ] } ], "source": [ "np.random.seed(223)\n", "# Initialize number of dice, simulate & record outcome\n", "die, probabilities, num_dice = [1,2,3,4,5,6], [1/6, 1/6, 1/6, 1/6, 1/6, 1/6], 2\n", "outcomes = np.random.choice(die, size=num_dice, p=probabilities) \n", "\n", "# Win if the two dice show the same number\n", "if outcomes[0] == outcomes[1]: \n", " answer = 'win' \n", "else:\n", " answer = 'lose'\n", "\n", "print(\"The dice show {} and {}. You {}!\".format(outcomes[0], outcomes[1], answer))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulating the dice game\n", "We now know how to implement the first three steps of a simulation. Now let's consider the next step - repeated random sampling.\n", "\n", "Simulating an outcome once doesn't tell us much about how often we can expect to see that outcome. In the case of the dice game from the previous exercise, it's great that we won once. But suppose we want to see how many times we can expect to win if we played this game multiple times, we need to repeat the random sampling process many times. Repeating the process of random sampling is helpful to understand and visualize inherent uncertainty and deciding next steps.\n", "\n", "Following this exercise, you will be familiar with implementing the fourth step of running a simulation - sampling repeatedly and generating outcomes.\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In 100 games, you win 25 times\n" ] } ], "source": [ "np.random.seed(223)\n", "# Initialize model parameters & simulate dice throw\n", "die, probabilities, num_dice = [1,2,3,4,5,6], [1/6, 1/6, 1/6, 1/6, 1/6, 1/6], 2\n", "sims, wins = 100, 0\n", "\n", "for i in range(sims):\n", " outcomes = np.random.choice(die, num_dice, p=probabilities) \n", " # Increment `wins` by 1 if the dice show same number\n", " if outcomes[0] == outcomes[1]: \n", " wins = wins + 1 \n", "\n", "print(\"In {} games, you win {} times\".format(sims, wins))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using simulation for decision-making\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulating one lottery drawing\n", "In the last three exercises of this chapter, we will be bringing together everything you've learned so far. We will run a complete simulation, take a decision based on our observed outcomes, and learn to modify inputs to the simulation model.\n", "\n", "We will use simulations to figure out whether or not we want to buy a lottery ticket. Suppose you have the opportunity to buy a lottery ticket which gives you a shot at a grand prize of \\\\$ 1 Million. Since there are 1000 tickets in total, your probability of winning is 1 in 1000. Each ticket costs \\\\$ 10. Let's use our understanding of basic simulations to first simulate one drawing of the lottery." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Outcome of one drawing of the lottery is [-10]\n" ] } ], "source": [ "np.random.seed(123)\n", "# Pre-defined constant variables\n", "lottery_ticket_cost, num_tickets, grand_prize = 10, 1000, 1000000\n", "\n", "# Probability of winning\n", "chance_of_winning = 1 / num_tickets\n", "\n", "# Simulate a single drawing of the lottery\n", "gains = [-lottery_ticket_cost, grand_prize-lottery_ticket_cost]\n", "probability = [1 - chance_of_winning, chance_of_winning]\n", "outcome = np.random.choice(a=gains, size=1, p=probability, replace=True)\n", "\n", "print(\"Outcome of one drawing of the lottery is {}\".format(outcome))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Should we buy?\n", "In the last exercise, we simulated the random drawing of the lottery ticket once. In this exercise, we complete the simulation process by repeating the process multiple times.\n", "\n", "Repeating the process gives us multiple outcomes. We can think of this as multiple universes where the same lottery drawing occurred. We can then determine the average winnings across all these universes. If the average winnings are greater than what we pay for the ticket then it makes sense to buy it, otherwise, we might not want to buy the ticket.\n", "\n", "This is typically how simulations are used for evaluating business investments. After completing this exercise, you will have the basic tools required to use simulations for decision-making." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average payoff from 2000 simulations = 1990.0\n" ] } ], "source": [ "np.random.seed(123)\n", "# Initialize size and simulate outcome\n", "lottery_ticket_cost, num_tickets, grand_prize = 10, 1000, 1000000\n", "chance_of_winning = 1/num_tickets\n", "size = 2000\n", "payoffs = [-lottery_ticket_cost, grand_prize - lottery_ticket_cost]\n", "probs = [1 - chance_of_winning, chance_of_winning]\n", "\n", "outcomes = np.random.choice(a=payoffs, size=size, p=probs, replace=True)\n", "\n", "# Mean of outcomes.\n", "answer = np.mean(outcomes)\n", "print(\"Average payoff from {} simulations = {}\".format(size, answer))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating a break-even lottery price\n", "Simulations allow us to ask more nuanced questions that might not necessarily have an easy analytical solution. Rather than solving a complex mathematical formula, we directly get multiple sample outcomes. We can run experiments by modifying inputs and studying how those changes impact the system. For example, once we have a moderately reasonable model of global weather patterns, we could evaluate the impact of increased greenhouse gas emissions.\n", "\n", "In the lottery example, we might want to know how expensive the ticket needs to be for it to not make sense to buy it. To understand this, we need to modify the ticket cost to see when the expected payoff is negative." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The highest price at which it makes sense to buy the ticket is 9\n" ] } ], "source": [ "np.random.seed(333)\n", "# Initialize simulations and cost of ticket\n", "sims, lottery_ticket_cost = 3000, 0\n", "\n", "# Use a while loop to increment `lottery_ticket_cost` till average value of outcomes falls below zero\n", "while 1:\n", " outcomes = np.random.choice([-lottery_ticket_cost, grand_prize-lottery_ticket_cost],\n", " size=sims, p=[1-chance_of_winning, chance_of_winning], replace=True)\n", " if outcomes.mean() < 0:\n", " break\n", " else:\n", " lottery_ticket_cost += 1\n", "answer = lottery_ticket_cost - 1\n", "\n", "print(\"The highest price at which it makes sense to buy the ticket is {}\".format(answer))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }