{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring the relationship between gender and policing\n", "> A Summary of lecture \"Analyzing Police Activity with pandas\", via datacamp\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Data_Science]\n", "- image: images/logo.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import the pandas library as pd\n", "import pandas as pd\n", "\n", "# Read 'police.csv' into a DataFrame named ri\n", "ri = pd.read_csv('./dataset/police.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do the genders commit different violations?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examining traffic violations\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Speeding 48424\n", "Moving violation 16224\n", "Equipment 10922\n", "Other 4410\n", "Registration/plates 3703\n", "Seat belt 2856\n", "Name: violation, dtype: int64\n", "Speeding 0.559563\n", "Moving violation 0.187476\n", "Equipment 0.126209\n", "Other 0.050960\n", "Registration/plates 0.042790\n", "Seat belt 0.033002\n", "Name: violation, dtype: float64\n" ] } ], "source": [ "# Count the unique values in 'violation'\n", "print(ri['violation'].value_counts())\n", "\n", "# Express the counts as proportions\n", "print(ri['violation'].value_counts(normalize=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing violations by gender" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Speeding 0.658114\n", "Moving violation 0.138218\n", "Equipment 0.105199\n", "Registration/plates 0.044418\n", "Other 0.029738\n", "Seat belt 0.024312\n", "Name: violation, dtype: float64\n", "Speeding 
0.522243\n", "Moving violation 0.206144\n", "Equipment 0.134158\n", "Other 0.058985\n", "Registration/plates 0.042175\n", "Seat belt 0.036296\n", "Name: violation, dtype: float64\n" ] } ], "source": [ "# Create a DataFrame of female drivers\n", "female = ri[ri['driver_gender'] == 'F']\n", "\n", "# Create a DataFrame of male drivers\n", "male = ri[ri['driver_gender'] == 'M']\n", "\n", "# Compute the violations by female drivers (as proportions)\n", "print(female['violation'].value_counts(normalize=True))\n", "\n", "# Compute the violations by male drivers (as proportions)\n", "print(male['violation'].value_counts(normalize=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Does gender affect who gets a ticket for speeding?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing speeding outcomes by gender" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Citation 0.952192\n", "Warning 0.040074\n", "Arrest Driver 0.005752\n", "N/D 0.000959\n", "Arrest Passenger 0.000639\n", "No Action 0.000383\n", "Name: stop_outcome, dtype: float64\n", "Citation 0.944595\n", "Warning 0.036184\n", "Arrest Driver 0.015895\n", "Arrest Passenger 0.001281\n", "No Action 0.001068\n", "N/D 0.000976\n", "Name: stop_outcome, dtype: float64\n" ] } ], "source": [ "# Create a DataFrame of female drivers stopped for speeding\n", "female_and_speeding = ri[(ri['driver_gender'] == 'F') & (ri['violation'] == 'Speeding')]\n", "\n", "# Create a DataFrame of male drivers stopped for speeding\n", "male_and_speeding = ri[(ri['driver_gender'] == 'M') & (ri['violation'] == 'Speeding')]\n", "\n", "# Compute the stop outcomes for female drivers (as proportions)\n", "print(female_and_speeding.stop_outcome.value_counts(normalize=True))\n", "\n", "# Compute the stop outcomes for male drivers (as proportions)\n", "print(male_and_speeding.stop_outcome.value_counts(normalize=True))" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Does gender affect whose vehicle is searched?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating the search rate" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bool\n", "False 0.963953\n", "True 0.036047\n", "Name: search_conducted, dtype: float64\n", "0.03604713268876511\n" ] } ], "source": [ "# Check the data type of 'search_conducted'\n", "print(ri['search_conducted'].dtypes)\n", "\n", "# Calculate the search rate by counting the values\n", "print(ri['search_conducted'].value_counts(normalize=True))\n", "\n", "# Calculate the search rate by taking the mean\n", "# (the mean of a boolean column is the proportion of True values)\n", "print(ri['search_conducted'].mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing search rates by gender\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.019180617481282074\n", "0.04542557598546892\n", "driver_gender\n", "F 0.019181\n", "M 0.045426\n", "Name: search_conducted, dtype: float64\n" ] } ], "source": [ "# Calculate the search rate for female drivers\n", "print(ri[ri['driver_gender'] == 'F'].search_conducted.mean())\n", "\n", "# Calculate the search rate for male drivers\n", "print(ri[ri['driver_gender'] == 'M'].search_conducted.mean())\n", "\n", "# Calculate the search rate for both groups simultaneously\n", "print(ri.groupby('driver_gender').search_conducted.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding a second factor to the analysis\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "driver_gender violation \n", "F Equipment 0.039984\n", " Moving violation 0.039257\n", " Other 0.041018\n", " Registration/plates 0.054924\n", " Seat belt 0.017301\n", " Speeding 0.008309\n", 
"M Equipment 0.071496\n", " Moving violation 0.061524\n", " Other 0.046191\n", " Registration/plates 0.108802\n", " Seat belt 0.035119\n", " Speeding 0.027885\n", "Name: search_conducted, dtype: float64\n", "violation driver_gender\n", "Equipment F 0.039984\n", " M 0.071496\n", "Moving violation F 0.039257\n", " M 0.061524\n", "Other F 0.041018\n", " M 0.046191\n", "Registration/plates F 0.054924\n", " M 0.108802\n", "Seat belt F 0.017301\n", " M 0.035119\n", "Speeding F 0.008309\n", " M 0.027885\n", "Name: search_conducted, dtype: float64\n" ] } ], "source": [ "# Calculate the search rate for each combination of gender and violation\n", "print(ri.groupby(['driver_gender', 'violation']).search_conducted.mean())\n", "\n", "# Reverse the ordering to group by violation before gender\n", "print(ri.groupby(['violation', 'driver_gender']).search_conducted.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Does gender affect who is frisked during a search?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Counting protective frisks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "During a vehicle search, the police officer may pat down the driver to check if they have a weapon. This is known as a \"protective frisk.\"\n", "\n", "In this exercise, you'll first check to see how many times \"Protective Frisk\" was the only search type. Then, you'll use a string method to locate all instances in which the driver was frisked." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Incident to Arrest 1290\n", "Probable Cause 924\n", "Inventory 219\n", "Reasonable Suspicion 214\n", "Protective Frisk 164\n", "Incident to Arrest,Inventory 123\n", "Incident to Arrest,Probable Cause 100\n", "Probable Cause,Reasonable Suspicion 54\n", "Probable Cause,Protective Frisk 35\n", "Incident to Arrest,Inventory,Probable Cause 35\n", "Incident to Arrest,Protective Frisk 33\n", "Inventory,Probable Cause 25\n", "Protective Frisk,Reasonable Suspicion 19\n", "Incident to Arrest,Inventory,Protective Frisk 18\n", "Incident to Arrest,Probable Cause,Protective Frisk 13\n", "Inventory,Protective Frisk 12\n", "Incident to Arrest,Reasonable Suspicion 8\n", "Incident to Arrest,Probable Cause,Reasonable Suspicion 5\n", "Probable Cause,Protective Frisk,Reasonable Suspicion 5\n", "Incident to Arrest,Inventory,Reasonable Suspicion 4\n", "Inventory,Reasonable Suspicion 2\n", "Incident to Arrest,Protective Frisk,Reasonable Suspicion 2\n", "Inventory,Probable Cause,Protective Frisk 1\n", "Inventory,Protective Frisk,Reasonable Suspicion 1\n", "Inventory,Probable Cause,Reasonable Suspicion 1\n", "Name: search_type, dtype: int64\n", "bool\n", "303\n" ] } ], "source": [ "# Count the 'search_type' values\n", "print(ri['search_type'].value_counts())\n", "\n", "# Check if 'search_type' contains the string 'Protective Frisk'\n", "ri['frisk'] = ri.search_type.str.contains('Protective Frisk', na=False)\n", "\n", "# Check the data type of 'frisk'\n", "print(ri['frisk'].dtypes)\n", "\n", "# Take the sum of frisk\n", "print(ri['frisk'].sum())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing frisk rates by gender" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, you'll compare the rates at which female and male drivers are frisked during a search. 
Are males frisked more often than females, perhaps because police officers consider them to be higher risk?\n", "\n", "Before doing any calculations, it's important to filter the DataFrame to include only the relevant subset of data, namely stops in which a search was conducted." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.09162382824312065\n", "driver_gender\n", "F 0.074561\n", "M 0.094353\n", "Name: frisk, dtype: float64\n" ] } ], "source": [ "# Create a DataFrame of stops in which a search was conducted\n", "# ('search_conducted' is already boolean, so it can be used directly as a mask)\n", "searched = ri[ri['search_conducted']]\n", "\n", "# Calculate the overall frisk rate by taking the mean of 'frisk'\n", "print(searched.frisk.mean())\n", "\n", "# Calculate the frisk rate for each gender\n", "print(searched.groupby('driver_gender').frisk.mean())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }