{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Introduction to Probabilistic Programming in FSharp\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "A combination of pursuing a Master's in Data Science combined with an awesome trip to [Open FSharp](https://www.openfsharp.org/) in San Francisco convinced me to contribute to this year's FSharp Advent Calendar; my contribution from last year can be found [here](https://medium.com/@mukund.sharma92/the-lord-of-the-rings-an-f-approach-an-introduction-of-the-blogposts-5d6ad7624fee). One of the workshops presented at Open FSharp entitled __\"Interactive Computing with F# Jupyter\"__ by Colin Gravill covered making use of IFSharp Notebooks, the F# analog of Python's Jupyter notebooks, for computational experiments; this workshop truly blew my mind by highlighting the amazing capability of utilizing the power of F# in a \"Notebook-esque\" environment. \n", "\n", "Probabilistic Programming has always been a topic that has caused me to break out in metaphorical hives. Therefore, I decided to pursue the selfish goal of trying to demystify this topic as much as possible by writing a blog post. I hope you enjoy it as much as I researching and developing this notebook!\n", "\n", "### What is Probabilistic Programming?\n", "\n", "__Probabilistic Programming__ is a software methodology for constructing probabilistic models to be used to make probabilistic inferences. \n", "\n", "Simple and succint domain driven modelling techniques in conjunction with the lucid mathematical expressibility of F# makes the language and the concomitant ecosystem a perfect environment to start developing probabilistic models. The two experiments I cover in this blog post that elucidate on these qualities and try to give a taste of Probabilistic Programming are the __Monty Hall Problem__ and __Approximate Bayesian Computation__." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Monty Hall Problem\n", "\n", " \n", "\n", "[The Monty Hall Problem](https://en.wikipedia.org/wiki/Monty_Hall_problem) involves a contestant at a game show choosing one of three doors in an effort to choose the door that hides a prize. Once the player chooses a door, the game show host opens a door that doesn't have a prize behind it and gives the contestant a choice to switch or stay with the chosen door. \n", "\n", "Intuitively, staying with the selected door would make sense since switching wouldn't make a difference to the end result. However, after the host opens a door without the prize behind it, there is more information available to the contestant to make a decision on and hence, this 50/50 chance induced by switching improves the overall favors of winning. \n", "\n", "This problem has stumped even the most astute mathematicians such as [Paul Erdős](https://en.wikipedia.org/wiki/Paul_Erd%C5%91s) and so, seemed like a perfect problem to solve to demonstrate the power of domain driven modelling and constructing probabilistic models in F#. In this section, I plan to develop the Monty Hall game domain and then take a probabilistic approach of proving that switching doors after the reveal is more favorable than staying with the selected door." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preliminaries\n", "\n", "Our first step is to implement some building blocks that will be used while developing the domain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Probability\n", "\n", "Probability is defined as the __likelihood__ of a given event's occurrence that can be modelled as a double with valid values between 0 and 1 (both inclusive)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "type Probability = private Probability of double\n", "\n", "module Probability =\n", " let create ( probability : double ) : Probability option =\n", " if probability >= 0.0 && probability <= 1.0 then Some ( Probability probability )\n", " else None\n", "\n", " let getProbability ( Probability probability ) = probability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Example" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// Creating\n", "let prob = Probability.create ( 1.0 / 2.0 )\n", "\n", "// Getting the value\n", "match prob with\n", " | Some p -> Probability.getProbability p\n", " | None -> nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Probability Distribution\n", "\n", "A Probability Distribution can be modelled as a sequence of pairs of event occurences and probabilities." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "type Distribution<'T> = seq< 'T * Probability >" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Uniform Distribution\n", "\n", "For the Monty Hall problem we will be making use of the __Uniform Distribution__ as from the perspective of the contestant, the prize could be behind any of the doors available to be opened. The Uniform Distribution assumes an equal and constant probability of all events." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "module UniformDistribution = \n", "\n", " let private uniformRandomNumberGenerator = System.Random()\n", "\n", " let create ( events : seq< int > ) : Distribution = \n", " seq {\n", " let countOfSequence = Seq.length events\n", " let distributionSequence =\n", " events\n", " |> Seq.map( fun e -> e, Probability.create ( 1.0 / double( countOfSequence )))\n", " |> Seq.filter( fun ( e, p ) -> p.IsSome )\n", " |> Seq.map( fun (e, p) -> e, p.Value )\n", " yield! distributionSequence\n", " }\n", "\n", " let drawOneFromDistribution ( uniformDistribution : Distribution ) : int * Probability = \n", " let countOfDistribution = \n", " Seq.length uniformDistribution\n", " let idx = uniformRandomNumberGenerator.Next( 0, countOfDistribution )\n", " uniformDistribution\n", " |> Seq.item idx\n", " \n", " let drawOneFromEvents ( events : seq< int > ) : int * Probability = \n", " let distribution = create events\n", " drawOneFromDistribution distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### Example: Dice Rolls" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1, Probability 0.1666666667); (2, Probability 0.1666666667);\n", " (3, Probability 0.1666666667); (4, Probability 0.1666666667);\n", " (5, Probability 0.1666666667); (6, Probability 0.1666666667)]\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "let diceRollDistribution = UniformDistribution.create [ 1..6 ]\n", "printfn \"%A\" ( Seq.toList diceRollDistribution )" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, Probability 0.1666666667)\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "printfn \"%A\" ( UniformDistribution.drawOneFromDistribution diceRollDistribution )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Game Domain\n", "\n", "The game domain consists of 3 main elements:\n", "\n", "1. __Game Outcome__: The contestant can either __Win__ or __Lose__.\n", "2. __Doors__: The game consists of 3 doors. To represent the initial state of the doors, we add in an \"NA\" door representing our __\"null\"__ state.\n", "3. __Game State__: The Game State represents the state of the game at any instance of the game consisting of:\n", " * The Door Chosen by the Host as the _Winning_ door\n", " * The Door _Chosen_ by the Contestant\n", " * The Door Chosen by the Host to _Open_ after the contestant has chosen the door." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "type Outcome = Win | Lose\n", "type Door = A | B | C | NA\n", "type GameState = { winningDoor : Door; chosenDoor : Door; openedDoor : Door }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Game States\n", "\n", " \n", "\n", "The game states are the following:\n", "\n", "1. Start of the game\n", "2. The host hides the prize behind a random door\n", "3. The contestant chooses a door believed to hide the prize\n", "4. The host opens one of the doors that doesn't have the prize and isn't the door chosen by the contestant.\n", "5. The contestant chooses one of two strategies:\n", " - Switch doors\n", " - Stay with the currently chosen door\n", "6. Prize is revealed and the outcome of the game is made known." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "let doors = [ A; B; C ]\n", "let startGameState : GameState = \n", " { winningDoor = NA; chosenDoor = NA; openedDoor = NA }" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{winningDoor = B;\n", " chosenDoor = NA;\n", " openedDoor = NA;}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let hidePrize ( state : GameState ) : GameState =\n", " let winningDoorIdx = fst ( UniformDistribution.drawOneFromEvents ( [ 1..3 ] )) - 1\n", " { state with winningDoor = doors.[ winningDoorIdx ] }\n", " \n", "hidePrize startGameState" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The start of the game and hidding of the prize is naturally composed into the __initializeGame__." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{winningDoor = C;\n", " chosenDoor = NA;\n", " openedDoor = NA;}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let initializeGame : GameState =\n", " hidePrize startGameState\n", " \n", "initializeGame" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{winningDoor = C;\n", " chosenDoor = C;\n", " openedDoor = NA;}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let chooseDoor ( state : GameState ) ( door : Door ) : GameState =\n", " { state with chosenDoor = door }\n", " \n", "let chooseRandomDoor ( state : GameState ) : GameState =\n", " let chosenDoorIdx = fst ( UniformDistribution.drawOneFromEvents ( [ 1..3 ] )) - 1\n", " { state with chosenDoor = doors.[ chosenDoorIdx ] }\n", "\n", "let chosenRandomDoor = chooseRandomDoor initializeGame\n", "chosenRandomDoor" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{winningDoor = C;\n", " chosenDoor = C;\n", " openedDoor = A;}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let openDoor ( state : GameState ) : GameState =\n", " // Choose the Non-Winning Door that hasn't been chosen by the contestant.\n", " let doorToOpen = \n", " doors \n", " |> List.except [ state.winningDoor; state.chosenDoor ]\n", " |> List.item 0\n", " { state with openedDoor = doorToOpen }\n", " \n", "let postOpenDoor = openDoor ( chosenRandomDoor )\n", "postOpenDoor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define a strategy once the door is opened by host. The two strategies are switching or staying with the chosen door." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "On Switch: Lose\n", "On Stay: Win\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "type Strategy = GameState -> Outcome\n", "\n", "let switch ( state : GameState ) : Outcome =\n", " let doorToSwitchTo = \n", " doors\n", " |> List.except [ state.chosenDoor; state.openedDoor ]\n", " |> List.item 0\n", " if doorToSwitchTo = state.winningDoor then Win\n", " else Lose\n", " \n", "let stay ( state : GameState ) : Outcome = \n", " if state.chosenDoor = state.winningDoor then Win\n", " else Lose\n", "\n", "printfn \"On Switch: %A\" ( switch postOpenDoor )\n", "printfn \"On Stay: %A\" ( stay postOpenDoor )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running Simulations\n", "\n", "Now that we have the domain model and all the game states and strategies well defined, we continue by running 10,000 simulations of the contestant strategy of switching doors or staying with the current door." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Win" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "let simulationCount = 10000\n", "\n", "let simulateMontyHall ( strategy : Strategy ) : Outcome = \n", " let game = \n", " initializeGame \n", " |> chooseRandomDoor\n", " |> openDoor\n", " strategy( game )\n", "\n", "simulateMontyHall switch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the Results\n", "\n", "Now that we have all the pieces in place to run the simulation, we proceed to pull in the __XPlot__ library to help us plot our results via Paket. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\r\n", "