{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.8.5 64-bit ('bigquery': conda)", "display_name": "Python 3.8.5 64-bit ('bigquery': conda)", "metadata": { "interpreter": { "hash": "8e6f8fd53d913fe50345f9e659ed342277121f637d2311273da0eef260503de3" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Computational checking of the girl/boy probability problem" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Here I replay the classical [two child problem](https://en.wikipedia.org/wiki/Boy_or_Girl_paradox) and code it up in Python.\n", "\n", "In a nutshell, the solution changes depending on whether and how we differentiate between the children (for example, whether we talk about the 'older' or 'younger' child or just refer to them as 'either').\n", "\n", "I also chart the running estimates during the simulation to see how the probabilities converge to their values." 
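, "\n", "\n", "As a quick analytic check on what the simulation should converge to, here is a short derivation. It assumes each child is independently a boy or a girl with probability 1/2 (the same assumption `random_kid` makes) and writes each family as (younger, older), so the four equally likely outcomes are BB, BG, GB and GG.\n", "\n", "$$P(\\text{both girls} \\mid \\text{older is a girl}) = \\frac{P(GG)}{P(BG) + P(GG)} = \\frac{1/4}{1/2} = \\frac{1}{2}$$\n", "\n", "$$P(\\text{both girls} \\mid \\text{at least one girl}) = \\frac{P(GG)}{P(BG) + P(GB) + P(GG)} = \\frac{1/4}{3/4} = \\frac{1}{3}$$\n", "\n", "Under these assumptions, the simulated `P(Both|Older)` and `P(Both|Either)` columns should approach 1/2 and 1/3 respectively."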
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import enum, random" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class Kid(enum.Enum):\n", " Boy = 0\n", " Girl = 1" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def random_kid() -> Kid:\n", " return random.choice([Kid.Boy, Kid.Girl])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "random.seed(42)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "both_girls = 0\n", "older_girl = 0\n", "either_girl = 0\n", "\n", "results = []\n", "\n", "for _ in range(1000):\n", " younger = random_kid()\n", " older = random_kid()\n", "\n", " if older == Kid.Girl:\n", " older_girl += 1\n", " \n", " if older == Kid.Girl and younger == Kid.Girl:\n", " both_girls += 1\n", " \n", " if older == Kid.Girl or younger == Kid.Girl:\n", " either_girl += 1\n", "\n", " try:\n", " p_both_older = both_girls / older_girl\n", " except ZeroDivisionError:\n", " p_both_older = 0\n", " \n", " try: \n", " p_both_either = both_girls / either_girl\n", " except ZeroDivisionError:\n", " p_both_either = 0\n", "\n", " results.append([younger.name, older.name, both_girls, older_girl, either_girl, p_both_either, p_both_older])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [], "source": [ "import altair as alt\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df_results = pd.DataFrame(results, columns=['younger', 'older', 'both girls', 'older girl', 'either girl', 'P(Both|Either)', 'P(Both|Older)']).reset_index()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " index younger older 
both girls older girl either girl P(Both|Either) \\\n0 0 Boy Boy 0 0 0 0.000000 \n1 1 Girl Boy 0 0 1 0.000000 \n2 2 Boy Boy 0 0 1 0.000000 \n3 3 Boy Boy 0 0 1 0.000000 \n4 4 Girl Boy 0 0 2 0.000000 \n.. ... ... ... ... ... ... ... \n995 995 Girl Boy 263 513 763 0.344692 \n996 996 Girl Girl 264 514 764 0.345550 \n997 997 Girl Girl 265 515 765 0.346405 \n998 998 Girl Boy 265 515 766 0.345953 \n999 999 Girl Girl 266 516 767 0.346806 \n\n P(Both|Older) \n0 0.000000 \n1 0.000000 \n2 0.000000 \n3 0.000000 \n4 0.000000 \n.. ... \n995 0.512671 \n996 0.513619 \n997 0.514563 \n998 0.514563 \n999 0.515504 \n\n[1000 rows x 8 columns]", "text/html": "
 | index | younger | older | both girls | older girl | either girl | P(Both\|Either) | P(Both\|Older) |
---|---|---|---|---|---|---|---|---|
0 | 0 | Boy | Boy | 0 | 0 | 0 | 0.000000 | 0.000000 |
1 | 1 | Girl | Boy | 0 | 0 | 1 | 0.000000 | 0.000000 |
2 | 2 | Boy | Boy | 0 | 0 | 1 | 0.000000 | 0.000000 |
3 | 3 | Boy | Boy | 0 | 0 | 1 | 0.000000 | 0.000000 |
4 | 4 | Girl | Boy | 0 | 0 | 2 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
995 | 995 | Girl | Boy | 263 | 513 | 763 | 0.344692 | 0.512671 |
996 | 996 | Girl | Girl | 264 | 514 | 764 | 0.345550 | 0.513619 |
997 | 997 | Girl | Girl | 265 | 515 | 765 | 0.346405 | 0.514563 |
998 | 998 | Girl | Boy | 265 | 515 | 766 | 0.345953 | 0.514563 |
999 | 999 | Girl | Girl | 266 | 516 | 767 | 0.346806 | 0.515504 |

1000 rows × 8 columns

 | index | variable | value | type |
---|---|---|---|---|
0 | 0 | younger | Boy | NaN |
1 | 1 | younger | Girl | NaN |
2 | 2 | younger | Boy | NaN |
3 | 3 | younger | Boy | NaN |
4 | 4 | younger | Girl | NaN |
... | ... | ... | ... | ... |
6995 | 995 | P(Both\|Older) | 0.512671 | probability |
6996 | 996 | P(Both\|Older) | 0.513619 | probability |
6997 | 997 | P(Both\|Older) | 0.514563 | probability |
6998 | 998 | P(Both\|Older) | 0.514563 | probability |
6999 | 999 | P(Both\|Older) | 0.515504 | probability |

7000 rows × 4 columns
\n