{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "s1",
     "content",
     "l1"
    ]
   },
   "source": [
    "# Joint Distributions\n",
    "\n",
    "Consider two discrete random variables X and Y. The function given by\n",
    "f(x, y) = P(X = x, Y = y) for each pair of values (x, y) within the\n",
    "ranges of X and Y is called the joint probability distribution of X and Y.\n",
    "\n",
    "By the definition of conditional probability, the joint probability mass function of X and Y can be factored as:\n",
    "\n",
    "${\\begin{aligned}\\mathrm {P} (X=x\\ \\mathrm {and} \\ Y=y)=\\mathrm {P} (Y=y\\mid X=x)\\cdot \\mathrm {P} (X=x)=\\mathrm {P} (X=x\\mid Y=y)\\cdot \\mathrm {P} (Y=y)\\end{aligned}}$\n",
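    "\n",
    "A minimal sketch of this factorization for the biased-coin example below, assuming the two tosses are independent and a head has probability 0.6. P(X = x) comes from the first toss, and P(Y = y | X = x) is the probability that the second toss adds exactly y - x heads. The helper names `p_x` and `p_y_given_x` are illustrative only.\n",
    "\n",
    "```python\n",
    "p_h = 0.6          # probability of a head on a single toss\n",
    "p_t = 1 - p_h      # probability of a tail\n",
    "\n",
    "def p_x(x):\n",
    "    # P(X = x): X is the number of heads on the first toss (0 or 1)\n",
    "    return p_h if x == 1 else p_t\n",
    "\n",
    "def p_y_given_x(y, x):\n",
    "    # P(Y = y | X = x): the second toss must add exactly y - x heads\n",
    "    extra = y - x\n",
    "    if extra == 1:\n",
    "        return p_h\n",
    "    if extra == 0:\n",
    "        return p_t\n",
    "    return 0.0     # impossible combination, e.g. X = 1 and Y = 0\n",
    "\n",
    "# Product rule: P(X = x, Y = y) = P(Y = y | X = x) * P(X = x)\n",
    "joint = {(x, y): p_y_given_x(y, x) * p_x(x) for x in (0, 1) for y in (0, 1, 2)}\n",
    "\n",
    "for (x, y), p in sorted(joint.items()):\n",
    "    print(f'P(X={x}, Y={y}) = {p:.2f}')\n",
    "```\n",
    "\n",
    "Combinations that the tosses cannot produce (such as X = 1, Y = 0) get probability 0, and the six values sum to 1.\n",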
    "\n",
    "\n",
    "### Example\n",
    "\n",
    "A coin is tossed twice. Let X denote the number of heads on the first toss and Y the total number of heads on the 2 tosses. \n",
    "Assume that the coin is biased and a head has a 60% chance of occurring:\n",
    "\n",
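    "* X = number of heads on the first toss (0 or 1)\n",
    "* Y = total number of heads in the two tosses (0, 1 or 2)\n",
    "\n",
    "Compute the joint probability table and assign each value P(X = x, Y = y) to the variable `p_xy` (for example, `p_12` = P(X = 1, Y = 2))."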
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "tags": [
     "s1",
     "ce",
     "l1"
    ]
   },
   "outputs": [],
   "source": [
    "# Assign P(X = x, Y = y) to the variables p_xy below (x = heads on toss 1, y = total heads)\n",
    "p_h = 0.6        # probability of a head\n",
    "p_t = 1 - p_h    # probability of a tail\n",
    "p_12 = 0\n",
    "p_11 = 0\n",
    "p_01 = 0\n",
    "p_00 = 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "tags": [
     "s1",
     "l1",
     "hint"
    ]
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "tags": [
     "s1",
     "l1",
     "ans"
    ]
   },
   "outputs": [],
   "source": [
    "p_12 = p_h * p_h\n",
    "p_11 = p_h * p_t\n",
    "p_01 = p_t * p_h\n",
    "p_00 = p_t * p_t\n",
    "\n",
    "print(\"p_12 %.2f, p_11 %.2f, p_01 %.2f, p_00 %.2f\" % (p_12, p_11, p_01, p_00))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "tags": [
     "s1",
     "hid",
     "l1"
    ]
   },
   "outputs": [],
   "source": [
    "ref_tmp_var = False\n",
    "\n",
    "try:\n",
    "    if (abs(p_12 - 0.36)<0.1) and (abs(p_11 - 0.24) < 0.1) and (abs(p_01 - 0.24) < 0.1) and (abs(p_00 - .16) < 0.1): \n",
    "        ref_assert_var = True\n",
    "        ref_tmp_var = True\n",
    "    else:\n",
    "        ref_assert_var = False\n",
    "        print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "except Exception:\n",
    "    print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "\n",
    "assert ref_tmp_var"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "l2",
     "content",
     "s2"
    ]
   },
   "source": [
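    "The probability of each outcome of the two tosses is the product of the single-toss probabilities:\n",
    "\n",
    "\\begin{array}{ l | c | c | c }\n",
    "     \\hline\n",
    "     \\text{Outcome} & P(\\text{1st toss}) & P(\\text{2nd toss}) & \\text{Joint } P \\\\ \n",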
    "     \\hline\n",
    "     HH & 0.6 & 0.6 & 0.36 \\\\ \n",
    "     \\hline\n",
    "     HT & 0.6 & 0.4 & 0.24 \\\\ \n",
    "     \\hline\n",
    "     TH & 0.4 & 0.6 & 0.24 \\\\ \n",
    "     \\hline\n",
    "     TT & 0.4 & 0.4 & 0.16 \\\\ \n",
    "   \\hline\n",
    "\\end{array}\n",
    "\n",
    "\n",
    "\n",
    "Mapping each outcome to its values of X and Y gives the joint probability distribution:\n",
    "\n",
    "\\begin{array}{ l | c | c | c }\n",
    "     \\hline\n",
    "     \\text{Outcome} & X & Y & P(X, Y) \\\\ \n",
    "     \\hline\n",
    "     HH & 1 & 2 & 0.36 \\\\ \n",
    "     \\hline\n",
    "     HT & 1 & 1 & 0.24 \\\\ \n",
    "     \\hline\n",
    "     TH & 0 & 1 & 0.24 \\\\ \n",
    "     \\hline\n",
    "     TT & 0 & 0 & 0.16 \\\\ \n",
    "   \\hline\n",
    "\\end{array}\n",
    "\n",
    "We can now organize the above as a table with one row per value of Y and one column per value of X:\n",
    "\n",
    "\\begin{array}{ l | c | r }\n",
    "     \\hline\n",
    "     Y \\backslash X & 0 & 1 \\\\ \n",
    "     \\hline\n",
    "     0 & 0.16 & 0 \\\\ \n",
    "     \\hline\n",
    "     1 & 0.24 & 0.24 \\\\ \n",
    "     \\hline\n",
    "     2 & 0 & 0.36 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
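    "\n",
    "The step from outcomes to this table can be sketched in a few lines of Python: enumerate the four outcomes, compute x and y for each, and add the outcome's probability to the matching (y, x) cell. This is a minimal sketch assuming independent tosses with P(H) = 0.6; the names `p` and `table` are illustrative.\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "p = {'H': 0.6, 'T': 0.4}   # single-toss probabilities of the biased coin\n",
    "\n",
    "table = {}                 # maps (y, x) to the joint probability P(X = x, Y = y)\n",
    "for first, second in product('HT', repeat=2):\n",
    "    x = 1 if first == 'H' else 0        # heads on the first toss\n",
    "    y = (first + second).count('H')     # total heads in both tosses\n",
    "    table[(y, x)] = table.get((y, x), 0.0) + p[first] * p[second]\n",
    "\n",
    "# One printed row per value of Y, with columns for X = 0 and X = 1\n",
    "for y in range(3):\n",
    "    print(y, [round(table.get((y, x), 0.0), 2) for x in range(2)])\n",
    "```\n",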
    "\n",
    "\n",
    "## Marginal Distribution\n",
    "\n",
    "For two random variables X and Y whose joint distribution is known, the marginal distribution of X is the probability distribution of X on its own, regardless of the value of Y.\n",
    "\n",
    "For discrete random variables, the marginal distribution of X is obtained by summing the joint probabilities over all values of Y, $f(x) = \\sum_{y} P(X = x, Y = y)$, and likewise $f(y) = \\sum_{x} P(X = x, Y = y)$.\n",
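    "\n",
    "In code, each marginal is a sum along one axis of the joint table. A minimal sketch using NumPy (an assumption; plain Python sums work equally well), with the coin-toss table above stored as a 2-D array whose rows are Y = 0, 1, 2 and whose columns are X = 0, 1:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Joint table P(X = x, Y = y): rows are Y = 0, 1, 2; columns are X = 0, 1\n",
    "joint = np.array([[0.16, 0.00],\n",
    "                  [0.24, 0.24],\n",
    "                  [0.00, 0.36]])\n",
    "\n",
    "f_x = joint.sum(axis=0)   # sum over Y (down each column) gives f(x)\n",
    "f_y = joint.sum(axis=1)   # sum over X (across each row) gives f(y)\n",
    "\n",
    "print('f(x):', f_x)\n",
    "print('f(y):', f_y)\n",
    "```\n",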
    "\n",
    "Let us consider the above joint distribution again:\n",
    "\n",
    "\\begin{array}{ l | c | r }\n",
    "     \\hline\n",
    "     Y \\backslash X & 0 & 1 \\\\ \n",
    "     \\hline\n",
    "     0 & 0.16 & 0 \\\\ \n",
    "     \\hline\n",
    "     1 & 0.24 & 0.24 \\\\ \n",
    "     \\hline\n",
    "     2 & 0 & 0.36 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
    "\n",
    "\n",
    "### Example\n",
    "\n",
    "* Compute the marginal distributions f(x) and f(y), and assign them as lists to the variables fX and fY (ordered by increasing value of x and y)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "tags": [
     "l2",
     "ce",
     "s2"
    ]
   },
   "outputs": [],
   "source": [
    "#Exercise\n",
    "fX = []\n",
    "fY = []"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "l2",
     "s2",
     "hint"
    ]
   },
   "source": [
    "Sum each column of the table (over Y) to get f(x), and each row (over X) to get f(y)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": [
     "l2",
     "s2",
     "ans"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fX:  [0.4, 0.6]\n",
      "fY:  [0.16, 0.48, 0.36]\n"
     ]
    }
   ],
   "source": [
    "fX = [0.4, 0.6]\n",
    "fY = [0.16, 0.48, 0.36]\n",
    "\n",
    "print(\"fX: \", fX)\n",
    "print(\"fY: \", fY)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": [
     "l2",
     "hid",
     "s2"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "continue\n"
     ]
    }
   ],
   "source": [
    "ref_tmp_var = False\n",
    "\n",
    "try:\n",
    "    if fX == [0.4, 0.6] and fY == [0.16, 0.48, 0.36]: \n",
    "        ref_assert_var = True\n",
    "        ref_tmp_var = True\n",
    "    else:\n",
    "        ref_assert_var = False\n",
    "        print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "except Exception:\n",
    "    print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "\n",
    "assert ref_tmp_var"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "l3",
     "s3",
     "content"
    ]
   },
   "source": [
    "For the above joint distribution, the marginal distributions are:\n",
    "\n",
    "Marginal Distribution of X:\n",
    "\n",
    "\\begin{array}{ l | c | r }\n",
    "     \\hline\n",
    "     x & 0 & 1 \\\\ \n",
    "     \\hline\n",
    "     f(x) & 0.4 & 0.6 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
    "\n",
    "\n",
    "Marginal Distribution of Y:\n",
    "\n",
    "\\begin{array}{ l | c | c | c }\n",
    "     \\hline\n",
    "     y & 0 & 1 & 2 \\\\ \n",
    "     \\hline\n",
    "     f(y) & 0.16 & 0.48 & 0.36 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
    "\n",
    "Reference: http://www.sci.csueastbay.edu/~btrumbo/Stat3401/Hand3401/JointDistnsCor.pdf\n",
    "\n",
    "\n",
    "### Corpus of words\n",
    "\n",
    "Let us consider a corpus (collection) of 100 words of text, described with the following notation:\n",
    "\n",
    "* c(w) = count of word w\n",
    "* P(w) = probability of word w, that is c(w)/100\n",
    "* X = word length\n",
    "* Y = number of vowels\n",
    "\n",
    "\n",
    "The words, with their counts, probabilities, lengths and vowel counts, are:\n",
    "\n",
    "\\begin{array}{ l | c | c | c | c }\n",
    "     \\hline\n",
    "     \\text{word} & c(w) & P(w) & X & Y  \\\\ \n",
    "     \\hline\n",
    "     \\text{the} & 30 & 0.30  & 3 & 1 \\\\ \n",
    "     \\hline\n",
    "     \\text{to} & 18 & 0.18  & 2 & 1 \\\\ \n",
    "     \\hline\n",
    "     \\text{will} & 16 & 0.16  & 4 & 1 \\\\ \n",
    "     \\hline\n",
    "     \\text{of} & 10 & 0.10  & 2 & 1 \\\\ \n",
    "     \\hline\n",
    "     \\text{hello} & 7 & 0.07  & 5 & 2 \\\\ \n",
    "     \\hline\n",
    "     \\text{in} & 6 & 0.06  & 2 & 1 \\\\ \n",
    "     \\hline\n",
    "     \\text{tools} & 4 & 0.04  & 5 & 2 \\\\ \n",
    "     \\hline\n",
    "     \\text{pose} & 3 & 0.03  & 4 & 2 \\\\ \n",
    "     \\hline\n",
    "     \\text{taste} & 3 & 0.03  & 5 & 2 \\\\ \n",
    "     \\hline\n",
    "     \\text{PGM} & 3 & 0.03  & 3 & 0 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
    "\n",
    "From the above table, it is evident that the word \"the\" occurs 30 times (count column) out of a total of 100 words. Hence the probability of the word \"the\" is 0.30 (30/100 = 0.30). The X column refers to the length of the word. In this case x=3. The Y column refers to the number of vowels. In this case y=1. Similarly for the word \"to\" the probability of occurrence is 0.18 (18/100 = 0.18). X and Y are 2 and 1 respectively.\n",
    "\n",
    "To obtain the joint probability distribution of X and Y, we consider every combination of X and Y that is observed. For example, the words of length 2 (X = 2) with exactly 1 vowel (Y = 1) are \"to\", \"of\" and \"in\". The joint probability P(X = 2, Y = 1) is the sum of their individual probabilities: 0.18 + 0.10 + 0.06 = 0.34. Repeating this for every combination of X and Y gives the joint probability distribution table.\n",
    "\n",
    "The joint probability distribution looks like this:\n",
    "\n",
    "\\begin{array}{ l | c | c | c | c }\n",
    "     \\hline\n",
    "     Y \\backslash X & 2 & 3 & 4 & 5 \\\\ \n",
    "     \\hline\n",
    "     0 & 0 & 0.03 & 0 & 0 \\\\ \n",
    "     \\hline\n",
    "     1 & 0.34 & 0.30 & 0.16 & 0 \\\\ \n",
    "     \\hline\n",
    "     2 & 0 & 0 & 0.03 & 0.14 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
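    "\n",
    "The grouping procedure described above can be sketched in Python: compute X and Y for each word, then add P(w) = c(w)/100 into the matching (X, Y) cell. This minimal sketch counts only the letters a, e, i, o and u as vowels, which matches the Y column of the word table.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "# Word counts from the 100-word corpus table above\n",
    "counts = {'the': 30, 'to': 18, 'will': 16, 'of': 10, 'hello': 7,\n",
    "          'in': 6, 'tools': 4, 'pose': 3, 'taste': 3, 'PGM': 3}\n",
    "total = sum(counts.values())   # 100 words in total\n",
    "\n",
    "def n_vowels(word):\n",
    "    # Count the letters a, e, i, o, u, ignoring case\n",
    "    return sum(ch in 'aeiou' for ch in word.lower())\n",
    "\n",
    "# Joint distribution: accumulate P(w) = c(w) / total in the cell (X, Y)\n",
    "joint = Counter()\n",
    "for word, c in counts.items():\n",
    "    joint[(len(word), n_vowels(word))] += c / total\n",
    "\n",
    "for (x, y), p in sorted(joint.items()):\n",
    "    print(f'X={x}, Y={y}: {p:.2f}')\n",
    "```\n",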
    "\n",
    "### Exercise\n",
    "\n",
    "Find the marginal distributions of X and Y from the above joint probability distribution.\n",
    "\n",
    "Assign them to the variables fX and fY respectively.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true,
    "tags": [
     "l3",
     "s3",
     "ce"
    ]
   },
   "outputs": [],
   "source": [
    "#Exercise\n",
    "fX = []\n",
    "fY = []"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "l3",
     "s3",
     "hint"
    ]
   },
   "source": [
    "Sum across each row of the table (over X) to get fY, and down each column (over Y) to get fX."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "tags": [
     "l3",
     "s3",
     "ans"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fX:  [0.34, 0.33, 0.19, 0.14]\n",
      "fY:  [0.03, 0.8, 0.17]\n"
     ]
    }
   ],
   "source": [
    "fX = [0.34, 0.33, 0.19, 0.14]\n",
    "fY = [0.03, 0.80, 0.17]\n",
    "\n",
    "print(\"fX: \", fX)\n",
    "print(\"fY: \", fY)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": [
     "l3",
     "s3",
     "hid"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "continue\n"
     ]
    }
   ],
   "source": [
    "ref_tmp_var = False\n",
    "\n",
    "try:\n",
    "    if fX == [0.34, 0.33, 0.19, 0.14] and fY == [0.03, 0.8, 0.17]: \n",
    "        ref_assert_var = True\n",
    "        ref_tmp_var = True\n",
    "    else:\n",
    "        ref_assert_var = False\n",
    "        print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "except Exception:\n",
    "    print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "\n",
    "assert ref_tmp_var"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "l4",
     "s4",
     "content"
    ]
   },
   "source": [
    "For the above joint distribution, the marginal distributions of X and Y are:\n",
    "\n",
    "Marginal Distribution of X:\n",
    "\n",
    "\\begin{array}{ l | c | c | c | c }\n",
    "     \\hline\n",
    "     x & 2 & 3 & 4 & 5 \\\\ \n",
    "     \\hline\n",
    "     f(x) & 0.34 & 0.33 & 0.19 & 0.14 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
    "\n",
    "\n",
    "Marginal Distribution of Y:\n",
    "\n",
    "\\begin{array}{ l | c | c | c }\n",
    "     \\hline\n",
    "     y & 0 & 1 & 2 \\\\ \n",
    "     \\hline\n",
    "     f(y) & 0.03 & 0.80 & 0.17 \\\\ \n",
    "     \\hline\n",
    "\\end{array}\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Fraud Modeling Example\n",
    "\n",
    "Consider a simple model of fraudulent transactions. The data contains the variables Sex (S), Age (A), Fraud (F) and Jewelry (J), together with the joint probability P(S, A, F, J) of each combination of their values:\n",
    "\n",
    "| S   | A   | F   | J   |       P        |\n",
    "|-----|-----|-----|-----|----------------|\n",
    "| S_0 | A_0 | F_0 | J_0 |         0.0025 |\n",
    "| S_0 | A_0 | F_0 | J_1 |         0.0100 |\n",
    "| S_0 | A_0 | F_1 | J_0 |         0.1069 |\n",
    "| ... | ... | ... | ... |          ...   |\n",
    "| S_1 | A_2 | F_1 | J_1 |         0.0079 |\n",
    "\n",
    "\n",
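    "Here F = No corresponds to the value F_1. For the rows with F = F_1, $P(s, a, f, j \\mid F = \\text{No}) = P(s, a, f, j) / P(F = \\text{No})$, where $P(F = \\text{No})$ is the sum of P over those rows; all other rows are dropped.\n",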
    "\n",
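    "Conditioning a joint table on an event amounts to keeping the rows where the event holds and dividing their probabilities by the event's total probability. A minimal sketch with pandas, applied to the coin-toss joint table from earlier rather than to the fraud data: conditioning on X = 1 (a head on the first toss) keeps two rows and divides each by P(X = 1) = 0.6. The DataFrame names `coin` and `cond` are illustrative.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Coin-toss joint table: one row per (X, Y) combination with nonzero probability\n",
    "coin = pd.DataFrame({'X': [0, 0, 1, 1],\n",
    "                     'Y': [0, 1, 1, 2],\n",
    "                     'P': [0.16, 0.24, 0.24, 0.36]})\n",
    "\n",
    "# Keep the rows where the condition holds; .copy() avoids SettingWithCopyWarning\n",
    "cond = coin[coin['X'] == 1].copy()\n",
    "\n",
    "# Divide by P(X = 1), the total probability of the kept rows\n",
    "cond['P'] = cond['P'] / cond['P'].sum()\n",
    "print(cond)\n",
    "```\n",
    "\n",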
    "* Compute P(S, A, F, J | F = No) and assign the resulting DataFrame to p_SAFJ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": [
     "l4",
     "s4",
     "ce"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>S</th>\n",
       "      <th>A</th>\n",
       "      <th>F</th>\n",
       "      <th>J</th>\n",
       "      <th>P</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_0</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.0025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_0</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.0100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.1069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.0056</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_1</td>\n",
       "      <td>F_0</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.0008</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       S      A      F      J       P\n",
       "0   S_0    A_0    F_0    J_0   0.0025\n",
       "1   S_0    A_0    F_0    J_1   0.0100\n",
       "2   S_0    A_0    F_1    J_0   0.1069\n",
       "3   S_0    A_0    F_1    J_1   0.0056\n",
       "4   S_0    A_1    F_0    J_0   0.0008"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "fraud_data = pd.read_csv('https://raw.githubusercontent.com/colaberry/data/master/Fraud/fraud_data.csv')\n",
    "fraud_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "s4",
     "l4",
     "hint"
    ]
   },
   "source": [
    "Filter the rows with `fraud_data['F'].str.contains('F_1')`, then divide P by the sum of P over the remaining rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "tags": [
     "s4",
     "l4",
     "ans"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>S</th>\n",
       "      <th>A</th>\n",
       "      <th>F</th>\n",
       "      <th>J</th>\n",
       "      <th>P</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.118778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.006222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_1</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.190000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_1</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.010000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_2</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.166222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>S_0</td>\n",
       "      <td>A_2</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.008778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>S_1</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.118778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>S_1</td>\n",
       "      <td>A_0</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.006222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>S_1</td>\n",
       "      <td>A_1</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.190000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>S_1</td>\n",
       "      <td>A_1</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.010000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>S_1</td>\n",
       "      <td>A_2</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_0</td>\n",
       "      <td>0.166222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>S_1</td>\n",
       "      <td>A_2</td>\n",
       "      <td>F_1</td>\n",
       "      <td>J_1</td>\n",
       "      <td>0.008778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        S      A      F      J         P\n",
       "2    S_0    A_0    F_1    J_0   0.118778\n",
       "3    S_0    A_0    F_1    J_1   0.006222\n",
       "6    S_0    A_1    F_1    J_0   0.190000\n",
       "7    S_0    A_1    F_1    J_1   0.010000\n",
       "10   S_0    A_2    F_1    J_0   0.166222\n",
       "11   S_0    A_2    F_1    J_1   0.008778\n",
       "14   S_1    A_0    F_1    J_0   0.118778\n",
       "15   S_1    A_0    F_1    J_1   0.006222\n",
       "18   S_1    A_1    F_1    J_0   0.190000\n",
       "19   S_1    A_1    F_1    J_1   0.010000\n",
       "22   S_1    A_2    F_1    J_0   0.166222\n",
       "23   S_1    A_2    F_1    J_1   0.008778"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
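    "# Keep only the rows with F = No (F_1); .copy() avoids SettingWithCopyWarning\n",
    "p_SAFJ = fraud_data[fraud_data['F'].str.contains('F_1')].copy()\n",
    "# Renormalize so that the conditional probabilities sum to 1\n",
    "p_SAFJ['P'] = p_SAFJ['P'] / p_SAFJ['P'].sum()\n",
    "p_SAFJ"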
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": [
     "s4",
     "hid",
     "l4"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "continue\n"
     ]
    }
   ],
   "source": [
    "ref_tmp_var = False\n",
    "\n",
    "try:\n",
    "    if abs(p_SAFJ['P'][2] - 0.118778) < 1e-4: \n",
    "        ref_assert_var = True\n",
    "        ref_tmp_var = True\n",
    "    else:\n",
    "        ref_assert_var = False\n",
    "        print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "except Exception:\n",
    "    print('Please follow the instructions given and use the same variables provided in the instructions.')\n",
    "\n",
    "assert ref_tmp_var"
   ]
  }
 ],
 "metadata": {
  "executed_sections": [],
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}