{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring the relationship between gender and policing\n",
    "> A Summary of lecture \"Analyzing Police Activity with pandas\", via datacamp\n",
    "\n",
    "- toc: true \n",
    "- badges: true\n",
    "- comments: true\n",
    "- author: Chanseok Kang\n",
    "- categories: [Python, Datacamp, Data_Science]\n",
    "- image: images/logo.png"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the pandas library as pd\n",
    "import pandas as pd\n",
    "\n",
    "# Read 'police.csv' into a DataFrame named ri\n",
    "ri = pd.read_csv('./dataset/police.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Do the genders commit different violations?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Examining traffic violations\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Speeding               48424\n",
      "Moving violation       16224\n",
      "Equipment              10922\n",
      "Other                   4410\n",
      "Registration/plates     3703\n",
      "Seat belt               2856\n",
      "Name: violation, dtype: int64\n",
      "Speeding               0.559563\n",
      "Moving violation       0.187476\n",
      "Equipment              0.126209\n",
      "Other                  0.050960\n",
      "Registration/plates    0.042790\n",
      "Seat belt              0.033002\n",
      "Name: violation, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Count the unique values in 'violation'\n",
    "print(ri['violation'].value_counts())\n",
    "\n",
    "# Express the counts as proportions\n",
    "print(ri['violation'].value_counts(normalize=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Comparing violations by gender"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Speeding               0.658114\n",
      "Moving violation       0.138218\n",
      "Equipment              0.105199\n",
      "Registration/plates    0.044418\n",
      "Other                  0.029738\n",
      "Seat belt              0.024312\n",
      "Name: violation, dtype: float64\n",
      "Speeding               0.522243\n",
      "Moving violation       0.206144\n",
      "Equipment              0.134158\n",
      "Other                  0.058985\n",
      "Registration/plates    0.042175\n",
      "Seat belt              0.036296\n",
      "Name: violation, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Create a DataFrame of female drivers\n",
    "female = ri[ri['driver_gender'] == 'F']\n",
    "\n",
    "# Create a DataFrame of male drivers\n",
    "male = ri[ri['driver_gender'] == 'M']\n",
    "\n",
    "# Compute the violations by female drivers (as proportions)\n",
    "print(female['violation'].value_counts(normalize=True))\n",
    "\n",
    "# Compute the violations by male drivers (as proportions)\n",
    "print(male['violation'].value_counts(normalize=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Does gender affect who gets a ticket for speeding?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Comparing speeding outcomes by gender"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Citation            0.952192\n",
      "Warning             0.040074\n",
      "Arrest Driver       0.005752\n",
      "N/D                 0.000959\n",
      "Arrest Passenger    0.000639\n",
      "No Action           0.000383\n",
      "Name: stop_outcome, dtype: float64\n",
      "Citation            0.944595\n",
      "Warning             0.036184\n",
      "Arrest Driver       0.015895\n",
      "Arrest Passenger    0.001281\n",
      "No Action           0.001068\n",
      "N/D                 0.000976\n",
      "Name: stop_outcome, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Create a DataFrame of female drivers stopped for speeding\n",
    "female_and_speeding = ri[(ri['driver_gender'] == 'F') & (ri['violation'] == 'Speeding')]\n",
    "\n",
    "# Create a DataFrame of male drivers stopped for speeding\n",
    "male_and_speeding = ri[(ri['driver_gender'] == 'M') & (ri['violation'] == 'Speeding')]\n",
    "\n",
    "# Compute the stop outcomes for female drivers (as proportions)\n",
    "print(female_and_speeding.stop_outcome.value_counts(normalize=True))\n",
    "\n",
    "# Compute the stop outcomes for male drivers (as proportions)\n",
    "print(male_and_speeding.stop_outcome.value_counts(normalize=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Does gender affect whose vehicle is searched?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculating the search rate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bool\n",
      "False    0.963953\n",
      "True     0.036047\n",
      "Name: search_conducted, dtype: float64\n",
      "0.03604713268876511\n"
     ]
    }
   ],
   "source": [
    "# Check the data type of 'search_conducted'\n",
    "print(ri['search_conducted'].dtypes)\n",
    "\n",
    "# Calculate the search rate by counting the values\n",
    "print(ri['search_conducted'].value_counts(normalize=True))\n",
    "\n",
    "# Calculate the search rate by taking the mean\n",
    "print(ri['search_conducted'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Comparing search rates by gender\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.019180617481282074\n",
      "0.04542557598546892\n",
      "driver_gender\n",
      "F    0.019181\n",
      "M    0.045426\n",
      "Name: search_conducted, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Calculating the search rate for female drivers\n",
    "print(ri[ri['driver_gender'] == 'F'].search_conducted.mean())\n",
    "\n",
    "# Calculating the search rate for male drivers\n",
    "print(ri[ri['driver_gender'] == 'M'].search_conducted.mean())\n",
    "\n",
    "# Calculate the search rate for both groups simultaneously\n",
    "print(ri.groupby('driver_gender').search_conducted.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding a second factor to the analysis\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "driver_gender  violation          \n",
      "F              Equipment              0.039984\n",
      "               Moving violation       0.039257\n",
      "               Other                  0.041018\n",
      "               Registration/plates    0.054924\n",
      "               Seat belt              0.017301\n",
      "               Speeding               0.008309\n",
      "M              Equipment              0.071496\n",
      "               Moving violation       0.061524\n",
      "               Other                  0.046191\n",
      "               Registration/plates    0.108802\n",
      "               Seat belt              0.035119\n",
      "               Speeding               0.027885\n",
      "Name: search_conducted, dtype: float64\n",
      "violation            driver_gender\n",
      "Equipment            F                0.039984\n",
      "                     M                0.071496\n",
      "Moving violation     F                0.039257\n",
      "                     M                0.061524\n",
      "Other                F                0.041018\n",
      "                     M                0.046191\n",
      "Registration/plates  F                0.054924\n",
      "                     M                0.108802\n",
      "Seat belt            F                0.017301\n",
      "                     M                0.035119\n",
      "Speeding             F                0.008309\n",
      "                     M                0.027885\n",
      "Name: search_conducted, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Calculate the search rate for each combination of gender and violation\n",
    "print(ri.groupby(['driver_gender', 'violation']).search_conducted.mean())\n",
    "\n",
    "# Reverse the ordering to group by violation before gender\n",
    "print(ri.groupby(['violation', 'driver_gender']).search_conducted.mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Does gender affect who is frisked during a search?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Counting protective frisks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During a vehicle search, the police officer may pat down the driver to check if they have a weapon. This is known as a \"protective frisk.\"\n",
    "\n",
    "In this exercise, you'll first check to see how many times \"Protective Frisk\" was the only search type. Then, you'll use a string method to locate all instances in which the driver was frisked."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Incident to Arrest                                          1290\n",
      "Probable Cause                                               924\n",
      "Inventory                                                    219\n",
      "Reasonable Suspicion                                         214\n",
      "Protective Frisk                                             164\n",
      "Incident to Arrest,Inventory                                 123\n",
      "Incident to Arrest,Probable Cause                            100\n",
      "Probable Cause,Reasonable Suspicion                           54\n",
      "Probable Cause,Protective Frisk                               35\n",
      "Incident to Arrest,Inventory,Probable Cause                   35\n",
      "Incident to Arrest,Protective Frisk                           33\n",
      "Inventory,Probable Cause                                      25\n",
      "Protective Frisk,Reasonable Suspicion                         19\n",
      "Incident to Arrest,Inventory,Protective Frisk                 18\n",
      "Incident to Arrest,Probable Cause,Protective Frisk            13\n",
      "Inventory,Protective Frisk                                    12\n",
      "Incident to Arrest,Reasonable Suspicion                        8\n",
      "Incident to Arrest,Probable Cause,Reasonable Suspicion         5\n",
      "Probable Cause,Protective Frisk,Reasonable Suspicion           5\n",
      "Incident to Arrest,Inventory,Reasonable Suspicion              4\n",
      "Inventory,Reasonable Suspicion                                 2\n",
      "Incident to Arrest,Protective Frisk,Reasonable Suspicion       2\n",
      "Inventory,Probable Cause,Protective Frisk                      1\n",
      "Inventory,Protective Frisk,Reasonable Suspicion                1\n",
      "Inventory,Probable Cause,Reasonable Suspicion                  1\n",
      "Name: search_type, dtype: int64\n",
      "bool\n",
      "303\n"
     ]
    }
   ],
   "source": [
    "# Count the 'search_type' values\n",
    "print(ri['search_type'].value_counts())\n",
    "\n",
    "# Check if 'search_type' contains the string 'Protective Frisk'\n",
    "ri['frisk'] = ri.search_type.str.contains('Protective Frisk', na=False)\n",
    "\n",
    "# Check the data type of 'frisk'\n",
    "print(ri['frisk'].dtypes)\n",
    "\n",
    "# Take the sum of frisk\n",
    "print(ri['frisk'].sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Comparing frisk rates by gender"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this exercise, you'll compare the rates at which female and male drivers are frisked during a search. Are males frisked more often than females, perhaps because police officers consider them to be higher risk?\n",
    "\n",
    "Before doing any calculations, it's important to filter the DataFrame to only include the relevant subset of data, namely stops in which a search was conducted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.09162382824312065\n",
      "driver_gender\n",
      "F    0.074561\n",
      "M    0.094353\n",
      "Name: frisk, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Create a DataFrame of stops in which a search was conducted\n",
    "searched = ri[ri.search_conducted == True]\n",
    "\n",
    "# Calculate the overall frisk rate by taking the mean of 'frisk'\n",
    "print(searched.frisk.mean())\n",
    "\n",
    "# Calculate the frisk rate for each gender\n",
    "print(searched.groupby('driver_gender').frisk.mean())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}