{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:01.666277Z", "iopub.status.busy": "2024-03-18T21:43:01.665749Z", "iopub.status.idle": "2024-03-18T21:43:01.765220Z", "shell.execute_reply": "2024-03-18T21:43:01.764890Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# a bit of setup just for lecture; ignore me\n", "import numpy as np\n", "\n", "np.random.seed(5)" ] }, { "cell_type": "markdown", "metadata": { "id": "SA9q8DQPL6oZ", "slideshow": { "slide_type": "slide" } }, "source": [ "# Class 1: Working with data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Questions?\n", "- How was the homework? 👍/👎" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Fundamental programming concepts" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "columbia-only" ] }, "source": [ "From the PB&J exercise last week, ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "### Objects\n", "\n", "- Jelly\n", "- Jar of peanut butter\n", "- Knife\n", "\n", "In code, they represent a \"thing\", which often has associated actions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "### Abstraction\n", "\n", "Verbs like:\n", "\n", "- \"Open\"\n", "- \"Spread\"\n", "\n", "In code, known as functions/methods, which hide the details of what's happening (the \"implementation\"). Similarly, \"the sandwich\" is an abstraction for an object, which contains other objects." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "### Pronouns/nicknames\n", "\n", "- \"It\"\n", "- \"PB\"\n", "\n", "In code, known as variables." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- **Variables:**\n", " - Statistics: [\"any characteristics, number, or quantity that can be measured or counted,\"](https://www.abs.gov.au/websitedbs/D3310114.nsf/home/statistical+language+-+what+are+variables) which we'll see as columns of tables\n", " - Programming: Buckets for values\n", "- **Functions:** Buckets for code\n", " - You'll learn more about these in the homework" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Variables and functions\n", "\n", "- Make it possible to reuse the respective thing elsewhere\n", "- [Can be named _almost_ anything you want](https://realpython.com/python-variables/#variable-names)\n", "- Should have names that are meaningful, for readability\n", "\n", "Also:\n", "\n", "- Function arguments are basically variables within that function, which get assigned to whatever is passed in" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Functions == methods\n", "- Arguments == parameters\n", "\n", "For simplicity, we'll use them interchangeably." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Packages\n", "\n", "- a.k.a. \"libraries\" or \"modules\"\n", "- Developers have create them to make code/functionality reusable and easily sharable\n", "- Software plugins that you `import`\n", "- Main packages we’ll use:\n", " - `pandas`\n", " - `plotly`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "### Operator precedence\n", "\n", "a.k.a. order of operations, like [PEMDAS](https://en.wikipedia.org/wiki/Order_of_operations#Mnemonics) from math" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "```python\n", "answer = \"No\"\n", "\n", "answer == \"Yes\" or \"yes\"\n", "```\n", "\n", "What will this evaluate to?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "[Evaluation order](https://docs.python.org/3/reference/expressions.html#evaluation-order) and [operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "```python\n", "answer = \"No\"\n", "\n", "\n", "result = answer == \"Yes\" or \"yes\"\n", "# ↓\n", "result = \"No\" == \"Yes\" or \"yes\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [ "columbia-only" ] }, "source": [ "```python\n", "# ↓\n", "result = ( \"No\" == \"Yes\") or \"yes\"\n", "# `==` has higher precedence than `or`\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [ "columbia-only" ] }, "source": [ "```python\n", "# ↓\n", "result = False or \"yes\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [ "columbia-only" ] }, "source": [ "```python\n", "# ↓\n", "result = \"yes\"\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:01.768076Z", "iopub.status.busy": "2024-03-18T21:43:01.767932Z", "iopub.status.idle": "2024-03-18T21:43:01.771687Z", "shell.execute_reply": "2024-03-18T21:43:01.771348Z" }, "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "outputs": [ { "data": { "text/plain": [ "'yes'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "answer = \"No\"\n", "\n", "result = answer == \"Yes\" or \"yes\"\n", "result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [ "columbia-only" ] }, "source": [ "**Takeaway:** Code is better when readable. Use parentheses so the reader doesn't have to think!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Working with CSVs in pure Python\n", "\n", "We will use Python's CSV [DictReader](https://docs.python.org/3/library/csv.html#csv.DictReader). We'll open the file, parse it as a CSV, then operate row by row." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example\n", "\n", "```python\n", "import csv\n", "\n", "\n", "# set up the file object\n", "with open(\"people.csv\") as csvfile:\n", " # set up the reader\n", " reader = csv.DictReader(csvfile)\n", " # loop through the rows\n", " for row in reader:\n", " # access the data in various columns\n", " first = row[\"first_name\"]\n", " last = row[\"last_name\"]\n", "\n", " print(f\"{first} {last}\")\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### [In-class exercise](https://python-public-policy.afeld.me/en/{{school_slug}}/lecture_1_exercise.html)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 311 requests\n", "\n", "Who's called 311 before?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "[NYC 311 homepage](https://portal.311.nyc.gov/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### [311 data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9)" ] }, { "cell_type": "markdown", "metadata": { "id": "CRAqTQ2rbXAA", "slideshow": { "slide_type": "slide" } }, "source": [ "## Today's goal\n", "\n", "- Which 311 complaints are most common?\n", "- Which agencies are responsible for handling them?" ] }, { "cell_type": "markdown", "metadata": { "id": "9rvnMzjSMK36", "slideshow": { "slide_type": "slide" } }, "source": [ "## Pandas\n", "\n", "- A Python package (bundled up code that you can reuse)\n", "- Very common for data science in Python\n", "- [A lot like R](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_r.html)\n", " - Both organize around \"data frames\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Start by importing necessary packages" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:01.792529Z", "iopub.status.busy": "2024-03-18T21:43:01.792361Z", "iopub.status.idle": "2024-03-18T21:43:02.006698Z", "shell.execute_reply": "2024-03-18T21:43:02.006369Z" }, "id": "Q4l2QosJLzP6" }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "id": "R1G04BmMMFJb", "slideshow": { "slide_type": "subslide" } }, "source": [ "### Read and save 311 Service Requests dataset as a pandas dataframe\n", "\n", "We're using a sample to make it easier/faster to work with. This will take a while (~30 seconds)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:02.008609Z", "iopub.status.busy": "2024-03-18T21:43:02.008468Z", "iopub.status.idle": "2024-03-18T21:43:06.797885Z", "shell.execute_reply": "2024-03-18T21:43:06.797553Z" }, "id": "iQgE8qFAMbiF" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/kr/nx0m1j811kz5vy8c87ffchzr0000gn/T/ipykernel_27038/4056869583.py:1: DtypeWarning: Columns (8,20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.\n", " requests = pd.read_csv(\n" ] } ], "source": [ "requests = pd.read_csv(\n", " \"https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ignore the `DtypeWarning` for now; we'll come back to it." ] }, { "cell_type": "markdown", "metadata": { "id": "MgQ0ykVeNFsF", "slideshow": { "slide_type": "slide" } }, "source": [ "## Preview the data contents" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:06.799693Z", "iopub.status.busy": "2024-03-18T21:43:06.799573Z", "iopub.status.idle": "2024-03-18T21:43:06.810635Z", "shell.execute_reply": "2024-03-18T21:43:06.810381Z" }, "id": "YdtFBy0Il7E7", "outputId": "7bb49293-60dd-4df6-e5df-1eec7f2343ce", "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident Address...Vehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentLatitudeLongitudeLocation
03988588908/01/2018 12:05:13 AM08/01/2018 12:05:13 AMDOTDepartment of TransportationStreet ConditionPotholeNaN112353143 SHORE PARKWAY...NaNNaNNaNNaNNaNNaNNaN40.585156-73.959119(40.585155533520144, -73.95911915841708)
13988647008/01/2018 12:06:05 AM08/01/2018 12:06:05 AMDOTDepartment of TransportationStreet ConditionPotholeNaN112353153 SHORE PARKWAY...NaNNaNNaNNaNNaNNaNNaN40.585218-73.958608(40.58521848090658, -73.95860788382927)
23989354308/01/2018 12:06:16 AM08/03/2018 02:03:55 PMHPDDepartment of Housing Preservation and Develop...HEAT/HOT WATERENTIRE BUILDINGRESIDENTIAL BUILDING11221729 LAFAYETTE AVENUE...NaNNaNNaNNaNNaNNaNNaN40.690733-73.943964(40.69073285353906, -73.943963521266)
33988623308/01/2018 12:06:29 AM08/01/2018 02:54:24 AMNYPDNew York City Police DepartmentNoise - ResidentialBanging/PoundingResidential Building/House1169382-01 BEACH CHANNEL DRIVE...NaNNaNNaNNaNNaNNaNNaN40.589931-73.808896(40.58993080750793, -73.80889570815852)
43988030908/01/2018 12:06:51 AM08/01/2018 04:54:26 AMNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House1121664 HERKIMER STREET...NaNNaNNaNNaNNaNNaNNaN40.679716-73.951234(40.67971590505359, -73.95123396494363)
\n", "

5 rows × 41 columns

\n", "
" ], "text/plain": [ " Unique Key Created Date Closed Date Agency \\\n", "0 39885889 08/01/2018 12:05:13 AM 08/01/2018 12:05:13 AM DOT \n", "1 39886470 08/01/2018 12:06:05 AM 08/01/2018 12:06:05 AM DOT \n", "2 39893543 08/01/2018 12:06:16 AM 08/03/2018 02:03:55 PM HPD \n", "3 39886233 08/01/2018 12:06:29 AM 08/01/2018 02:54:24 AM NYPD \n", "4 39880309 08/01/2018 12:06:51 AM 08/01/2018 04:54:26 AM NYPD \n", "\n", " Agency Name Complaint Type \\\n", "0 Department of Transportation Street Condition \n", "1 Department of Transportation Street Condition \n", "2 Department of Housing Preservation and Develop... HEAT/HOT WATER \n", "3 New York City Police Department Noise - Residential \n", "4 New York City Police Department Noise - Residential \n", "\n", " Descriptor Location Type Incident Zip \\\n", "0 Pothole NaN 11235 \n", "1 Pothole NaN 11235 \n", "2 ENTIRE BUILDING RESIDENTIAL BUILDING 11221 \n", "3 Banging/Pounding Residential Building/House 11693 \n", "4 Loud Music/Party Residential Building/House 11216 \n", "\n", " Incident Address ... Vehicle Type Taxi Company Borough \\\n", "0 3143 SHORE PARKWAY ... NaN NaN \n", "1 3153 SHORE PARKWAY ... NaN NaN \n", "2 729 LAFAYETTE AVENUE ... NaN NaN \n", "3 82-01 BEACH CHANNEL DRIVE ... NaN NaN \n", "4 64 HERKIMER STREET ... NaN NaN \n", "\n", " Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " Road Ramp Bridge Highway Segment Latitude Longitude \\\n", "0 NaN NaN 40.585156 -73.959119 \n", "1 NaN NaN 40.585218 -73.958608 \n", "2 NaN NaN 40.690733 -73.943964 \n", "3 NaN NaN 40.589931 -73.808896 \n", "4 NaN NaN 40.679716 -73.951234 \n", "\n", " Location \n", "0 (40.585155533520144, -73.95911915841708) \n", "1 (40.58521848090658, -73.95860788382927) \n", "2 (40.69073285353906, -73.943963521266) \n", "3 (40.58993080750793, -73.80889570815852) \n", "4 (40.67971590505359, -73.95123396494363) \n", "\n", "[5 rows x 41 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests.head() # defaults to providing the first 5 if you don't specify a number" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:06.812229Z", "iopub.status.busy": "2024-03-18T21:43:06.812114Z", "iopub.status.idle": "2024-03-18T21:43:06.819498Z", "shell.execute_reply": "2024-03-18T21:43:06.819232Z" }, "id": "WgDL8mn5MPTh", "outputId": "6fae2f9d-681d-454b-f80d-809df194e855", "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident Address...Vehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentLatitudeLongitudeLocation
4999904362224708/24/2019 01:33:43 AMNaNNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House11385.0578 FAIRVIEW AVENUE...NaNNaNNaNNaNNaNNaNNaN40.707576-73.907325(40.70757578135031, -73.90732527364065)
4999914362087708/24/2019 01:34:32 AMNaNNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House11201.0160 NAVY WALK...NaNNaNNaNNaNNaNNaNNaN40.693967-73.980210(40.6939671536727, -73.98020958205214)
4999924361923208/24/2019 01:38:44 AMNaNNYPDNew York City Police DepartmentNoise - Street/SidewalkLoud Music/PartyStreet/Sidewalk11238.0981 DEAN STREET...NaNNaNNaNNaNNaNNaNNaN40.678030-73.957620(40.67803039848778, -73.95762012074778)
4999934362661308/24/2019 01:43:57 AMNaNNYPDNew York City Police DepartmentNoise - Street/SidewalkLoud TalkingStreet/Sidewalk10023.0WEST 65 STREET...NaNNaNNaNNaNNaNNaNNaN40.775372-73.987710(40.7753720958196, -73.98770974232366)
4999944361975608/24/2019 01:44:27 AMNaNNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House11208.0211 NICHOLS AVENUE...NaNNaNNaNNaNNaNNaNNaN40.685495-73.868876(40.68549515215576, -73.8688762456483)
4999954362230208/24/2019 01:46:09 AMNaNNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House10009.0431 EAST 9 STREET...NaNNaNNaNNaNNaNNaNNaN40.727536-73.983295(40.72753608835362, -73.98329522742081)
4999964361970908/24/2019 01:49:49 AMNaNNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House10304.0191 BROAD STREET...NaNNaNNaNNaNNaNNaNNaN40.624157-74.081006(40.62415703282506, -74.08100614362155)
4999974362312408/24/2019 01:56:35 AMNaNNYPDNew York City Police DepartmentNoise - Street/SidewalkLoud Music/PartyStreet/Sidewalk10031.0534 WEST 153 STREET...NaNNaNNaNNaNNaNNaNNaN40.830718-73.945006(40.83071800761314, -73.94500557250639)
4999984362559508/24/2019 01:56:40 AMNaNNYPDNew York City Police DepartmentNoise - CommercialLoud Music/PartyClub/Bar/Restaurant10452.0EAST 170 STREET...NaNNaNNaNNaNNaNNaNNaN40.839882-73.916783(40.839882158779105, -73.91678321635897)
4999994362281708/24/2019 01:57:58 AMNaNNYPDNew York City Police DepartmentNoise - CommercialLoud Music/PartyStore/Commercial10033.0247 AUDUBON AVENUE...NaNNaNNaNNaNNaNNaNNaN40.846376-73.934048(40.84637632367179, -73.93404825809533)
\n", "

10 rows × 41 columns

\n", "
" ], "text/plain": [ " Unique Key Created Date Closed Date Agency \\\n", "499990 43622247 08/24/2019 01:33:43 AM NaN NYPD \n", "499991 43620877 08/24/2019 01:34:32 AM NaN NYPD \n", "499992 43619232 08/24/2019 01:38:44 AM NaN NYPD \n", "499993 43626613 08/24/2019 01:43:57 AM NaN NYPD \n", "499994 43619756 08/24/2019 01:44:27 AM NaN NYPD \n", "499995 43622302 08/24/2019 01:46:09 AM NaN NYPD \n", "499996 43619709 08/24/2019 01:49:49 AM NaN NYPD \n", "499997 43623124 08/24/2019 01:56:35 AM NaN NYPD \n", "499998 43625595 08/24/2019 01:56:40 AM NaN NYPD \n", "499999 43622817 08/24/2019 01:57:58 AM NaN NYPD \n", "\n", " Agency Name Complaint Type \\\n", "499990 New York City Police Department Noise - Residential \n", "499991 New York City Police Department Noise - Residential \n", "499992 New York City Police Department Noise - Street/Sidewalk \n", "499993 New York City Police Department Noise - Street/Sidewalk \n", "499994 New York City Police Department Noise - Residential \n", "499995 New York City Police Department Noise - Residential \n", "499996 New York City Police Department Noise - Residential \n", "499997 New York City Police Department Noise - Street/Sidewalk \n", "499998 New York City Police Department Noise - Commercial \n", "499999 New York City Police Department Noise - Commercial \n", "\n", " Descriptor Location Type Incident Zip \\\n", "499990 Loud Music/Party Residential Building/House 11385.0 \n", "499991 Loud Music/Party Residential Building/House 11201.0 \n", "499992 Loud Music/Party Street/Sidewalk 11238.0 \n", "499993 Loud Talking Street/Sidewalk 10023.0 \n", "499994 Loud Music/Party Residential Building/House 11208.0 \n", "499995 Loud Music/Party Residential Building/House 10009.0 \n", "499996 Loud Music/Party Residential Building/House 10304.0 \n", "499997 Loud Music/Party Street/Sidewalk 10031.0 \n", "499998 Loud Music/Party Club/Bar/Restaurant 10452.0 \n", "499999 Loud Music/Party Store/Commercial 10033.0 \n", "\n", " Incident Address ... Vehicle Type Taxi Company Borough \\\n", "499990 578 FAIRVIEW AVENUE ... NaN NaN \n", "499991 160 NAVY WALK ... NaN NaN \n", "499992 981 DEAN STREET ... NaN NaN \n", "499993 WEST 65 STREET ... NaN NaN \n", "499994 211 NICHOLS AVENUE ... NaN NaN \n", "499995 431 EAST 9 STREET ... NaN NaN \n", "499996 191 BROAD STREET ... NaN NaN \n", "499997 534 WEST 153 STREET ... NaN NaN \n", "499998 EAST 170 STREET ... NaN NaN \n", "499999 247 AUDUBON AVENUE ... NaN NaN \n", "\n", " Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction \\\n", "499990 NaN NaN NaN \n", "499991 NaN NaN NaN \n", "499992 NaN NaN NaN \n", "499993 NaN NaN NaN \n", "499994 NaN NaN NaN \n", "499995 NaN NaN NaN \n", "499996 NaN NaN NaN \n", "499997 NaN NaN NaN \n", "499998 NaN NaN NaN \n", "499999 NaN NaN NaN \n", "\n", " Road Ramp Bridge Highway Segment Latitude Longitude \\\n", "499990 NaN NaN 40.707576 -73.907325 \n", "499991 NaN NaN 40.693967 -73.980210 \n", "499992 NaN NaN 40.678030 -73.957620 \n", "499993 NaN NaN 40.775372 -73.987710 \n", "499994 NaN NaN 40.685495 -73.868876 \n", "499995 NaN NaN 40.727536 -73.983295 \n", "499996 NaN NaN 40.624157 -74.081006 \n", "499997 NaN NaN 40.830718 -73.945006 \n", "499998 NaN NaN 40.839882 -73.916783 \n", "499999 NaN NaN 40.846376 -73.934048 \n", "\n", " Location \n", "499990 (40.70757578135031, -73.90732527364065) \n", "499991 (40.6939671536727, -73.98020958205214) \n", "499992 (40.67803039848778, -73.95762012074778) \n", "499993 (40.7753720958196, -73.98770974232366) \n", "499994 (40.68549515215576, -73.8688762456483) \n", "499995 (40.72753608835362, -73.98329522742081) \n", "499996 (40.62415703282506, -74.08100614362155) \n", "499997 (40.83071800761314, -73.94500557250639) \n", "499998 (40.839882158779105, -73.91678321635897) \n", "499999 (40.84637632367179, -73.93404825809533) \n", "\n", "[10 rows x 41 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests.tail(10) # last 10 records in the dataframe" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:06.820998Z", "iopub.status.busy": "2024-03-18T21:43:06.820891Z", "iopub.status.idle": "2024-03-18T21:43:06.833115Z", "shell.execute_reply": "2024-03-18T21:43:06.832859Z" }, "id": "eH6-z-3vM0QI", "outputId": "46ab7edf-2b18-43f2-8d83-f90f0cc91be3", "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident Address...Vehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentLatitudeLongitudeLocation
3111504204461803/25/2019 10:16:38 PM03/25/2019 11:47:09 PMNYPDNew York City Police DepartmentNoise - ResidentialLoud TalkingResidential Building/House10453.01800 POPHAM AVENUE...NaNNaNNaNNaNNaNNaNNaN40.851389-73.917651(40.85138859491611, -73.91765079814148)
321424010778808/25/2018 10:24:40 AM08/25/2018 04:30:08 PMNYPDNew York City Police DepartmentBlocked DrivewayNo AccessStreet/Sidewalk11419.0133-05 107 AVENUE...NaNNaNNaNNaNNaNNaNNaN40.687721-73.811536(40.6877211401399, -73.8115361385404)
1208494069241710/29/2018 07:23:28 AM11/01/2018 09:52:41 PMHPDDepartment of Housing Preservation and Develop...HEAT/HOT WATERAPARTMENT ONLYRESIDENTIAL BUILDING10040660 FT WASHINGTON AVENUE...NaNNaNNaNNaNNaNNaNNaN40.856431-73.936092(40.856431325229096, -73.93609157070478)
4715824338639207/26/2019 11:17:21 PM07/27/2019 02:36:26 AMNYPDNew York City Police DepartmentNoise - ResidentialLoud Music/PartyResidential Building/House10303.0351 HARBOR ROAD...NaNNaNNaNNaNNaNNaNNaN40.627510-74.160446(40.62751009515846, -74.16044618563123)
4552134325378007/11/2019 03:23:28 PM07/12/2019 11:25:43 AMNYPDNew York City Police DepartmentIllegal ParkingDouble Parked Blocking TrafficStreet/Sidewalk11204.061 STREET...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "

5 rows × 41 columns

\n", "
" ], "text/plain": [ " Unique Key Created Date Closed Date Agency \\\n", "311150 42044618 03/25/2019 10:16:38 PM 03/25/2019 11:47:09 PM NYPD \n", "32142 40107788 08/25/2018 10:24:40 AM 08/25/2018 04:30:08 PM NYPD \n", "120849 40692417 10/29/2018 07:23:28 AM 11/01/2018 09:52:41 PM HPD \n", "471582 43386392 07/26/2019 11:17:21 PM 07/27/2019 02:36:26 AM NYPD \n", "455213 43253780 07/11/2019 03:23:28 PM 07/12/2019 11:25:43 AM NYPD \n", "\n", " Agency Name \\\n", "311150 New York City Police Department \n", "32142 New York City Police Department \n", "120849 Department of Housing Preservation and Develop... \n", "471582 New York City Police Department \n", "455213 New York City Police Department \n", "\n", " Complaint Type Descriptor \\\n", "311150 Noise - Residential Loud Talking \n", "32142 Blocked Driveway No Access \n", "120849 HEAT/HOT WATER APARTMENT ONLY \n", "471582 Noise - Residential Loud Music/Party \n", "455213 Illegal Parking Double Parked Blocking Traffic \n", "\n", " Location Type Incident Zip Incident Address \\\n", "311150 Residential Building/House 10453.0 1800 POPHAM AVENUE \n", "32142 Street/Sidewalk 11419.0 133-05 107 AVENUE \n", "120849 RESIDENTIAL BUILDING 10040 660 FT WASHINGTON AVENUE \n", "471582 Residential Building/House 10303.0 351 HARBOR ROAD \n", "455213 Street/Sidewalk 11204.0 61 STREET \n", "\n", " ... Vehicle Type Taxi Company Borough Taxi Pick Up Location \\\n", "311150 ... NaN NaN NaN \n", "32142 ... NaN NaN NaN \n", "120849 ... NaN NaN NaN \n", "471582 ... NaN NaN NaN \n", "455213 ... NaN NaN NaN \n", "\n", " Bridge Highway Name Bridge Highway Direction Road Ramp \\\n", "311150 NaN NaN NaN \n", "32142 NaN NaN NaN \n", "120849 NaN NaN NaN \n", "471582 NaN NaN NaN \n", "455213 NaN NaN NaN \n", "\n", " Bridge Highway Segment Latitude Longitude \\\n", "311150 NaN 40.851389 -73.917651 \n", "32142 NaN 40.687721 -73.811536 \n", "120849 NaN 40.856431 -73.936092 \n", "471582 NaN 40.627510 -74.160446 \n", "455213 NaN NaN NaN \n", "\n", " Location \n", "311150 (40.85138859491611, -73.91765079814148) \n", "32142 (40.6877211401399, -73.8115361385404) \n", "120849 (40.856431325229096, -73.93609157070478) \n", "471582 (40.62751009515846, -74.16044618563123) \n", "455213 NaN \n", "\n", "[5 rows x 41 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests.sample(5) # random sample of size determined by you" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Pandas data structures\n", "\n", "\n", "\n", "![Diagram showing a DataFrame, Series, labels, and indexes](extras/img/data_structures-1.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "f-YNgxGPNRgi", "slideshow": { "slide_type": "slide" } }, "source": [ "## How many records are in the dataset?" ] }, { "cell_type": "markdown", "metadata": { "id": "7DPo85wSNU6q", "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "### `info()` method" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:06.834842Z", "iopub.status.busy": "2024-03-18T21:43:06.834732Z", "iopub.status.idle": "2024-03-18T21:43:07.889619Z", "shell.execute_reply": "2024-03-18T21:43:07.889306Z" }, "id": "--ben4hfmTaB", "outputId": "62bae542-8fda-40c4-82f6-7a6410c2a90b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 500000 entries, 0 to 499999\n", "Data columns (total 41 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Unique Key 500000 non-null int64 \n", " 1 Created Date 500000 non-null object \n", " 2 Closed Date 476156 non-null object \n", " 3 Agency 500000 non-null object \n", " 4 Agency Name 500000 non-null object \n", " 5 Complaint Type 500000 non-null object \n", " 6 Descriptor 492534 non-null object \n", " 7 Location Type 392590 non-null object \n", " 8 Incident Zip 480411 non-null object \n", " 9 Incident Address 434544 non-null object \n", " 10 Street Name 434519 non-null object \n", " 11 Cross Street 1 300838 non-null object \n", " 12 Cross Street 2 299635 non-null object \n", " 13 Intersection Street 1 107383 non-null object \n", " 14 Intersection Street 2 107048 non-null object \n", " 15 Address Type 451018 non-null object \n", " 16 City 476649 non-null object \n", " 17 Landmark 32521 non-null object \n", " 18 Facility Type 134923 non-null object \n", " 19 Status 500000 non-null object \n", " 20 Due Date 171559 non-null object \n", " 21 Resolution Description 457389 non-null object \n", " 22 Resolution Action Updated Date 488801 non-null object \n", " 23 Community Board 500000 non-null object \n", " 24 BBL 407355 non-null float64\n", " 25 Borough 500000 non-null object \n", " 26 X Coordinate (State Plane) 470830 non-null float64\n", " 27 Y Coordinate (State Plane) 470830 non-null float64\n", " 28 Open Data Channel Type 500000 non-null object \n", " 29 Park Facility Name 499973 non-null object \n", " 30 Park Borough 500000 non-null object \n", " 31 Vehicle Type 37 non-null object \n", " 32 Taxi Company Borough 403 non-null object \n", " 33 Taxi Pick Up Location 4475 non-null object \n", " 34 Bridge Highway Name 697 non-null object \n", " 35 Bridge Highway Direction 766 non-null object \n", " 36 Road Ramp 759 non-null object \n", " 37 Bridge Highway Segment 1027 non-null object \n", " 38 Latitude 470830 non-null float64\n", " 39 Longitude 470830 non-null float64\n", " 40 Location 470830 non-null object \n", "dtypes: float64(5), int64(1), object(35)\n", "memory usage: 156.4+ MB\n" ] } ], "source": [ "requests.info()" ] }, { "cell_type": "markdown", "metadata": { "id": "jy54_jkFNYm3", "slideshow": { "slide_type": "slide" } }, "source": [ "## What are the distinct sets of values in columns that seem most useful?" ] }, { "cell_type": "markdown", "metadata": { "id": "UqfGEk09uHsN", "slideshow": { "slide_type": "subslide" } }, "source": [ "### [`unique()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.unique.html) method for getting list of unique values\n", "\n", "Let's look at the \"status\" column. What are the status options for these 311 complaints?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:07.891481Z", "iopub.status.busy": "2024-03-18T21:43:07.891353Z", "iopub.status.idle": "2024-03-18T21:43:07.905418Z", "shell.execute_reply": "2024-03-18T21:43:07.905137Z" }, "id": "cIVITVAQN3_c", "outputId": "d98eba00-5a05-4327-ea0a-eeb92d63a09d" }, "outputs": [ { "data": { "text/plain": [ "array(['Closed', 'In Progress', 'Pending', 'Started', 'Open', 'Assigned',\n", " 'Email Sent'], dtype=object)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests[\"Status\"].unique()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:07.906984Z", "iopub.status.busy": "2024-03-18T21:43:07.906885Z", "iopub.status.idle": "2024-03-18T21:43:07.923033Z", "shell.execute_reply": "2024-03-18T21:43:07.922733Z" }, "id": "W_tjsqNSp5vd", "outputId": "672b21e3-d0ca-45e3-ed2b-964187e8276b", "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array(['UNKNOWN', 'MOBILE', 'PHONE', 'ONLINE', 'OTHER'], dtype=object)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests[\"Open Data Channel Type\"].unique()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:07.924847Z", "iopub.status.busy": "2024-03-18T21:43:07.924736Z", "iopub.status.idle": "2024-03-18T21:43:07.939335Z", "shell.execute_reply": "2024-03-18T21:43:07.939051Z" }, "id": "uYrNPIqfqhz5", "outputId": "70677b3d-8f73-4a69-d6ed-86a7821197bf", "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array(['DOT', 'HPD', 'NYPD', 'DSNY', 'DEP', 'DOB', 'DPR', 'DOHMH', 'DHS',\n", " 'DCA', 'DOF', 'DFTA', 'TLC', 'HRA', 'EDC', 'DOE', 'DCAS', 'DOITT',\n", " 'ACS', 'DCP', 'NYCEM', 'FDNY', 'TAX', 'MOC', 'TAT', 'COIB', 'DVS'],\n", " dtype=object)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests[\"Agency\"].unique()" ] }, { "cell_type": "markdown", "metadata": { "id": "qeYA8-rMlpJa", "slideshow": { "slide_type": "slide" } }, "source": [ "## Excluding bad records from the dataframe" ] }, { "cell_type": "markdown", "metadata": { "id": "RgP7ehPsmozX", "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's look at the list of distinct complaint types." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:07.941031Z", "iopub.status.busy": "2024-03-18T21:43:07.940907Z", "iopub.status.idle": "2024-03-18T21:43:07.962391Z", "shell.execute_reply": "2024-03-18T21:43:07.962133Z" }, "id": "cU0-Bnjfb5ft", "outputId": "a9da5ac2-ca37-4897-cf03-199f783dc430" }, "outputs": [ { "data": { "text/plain": [ "array(['Street Condition', 'HEAT/HOT WATER', 'Noise - Residential',\n", " 'Illegal Parking', 'Request Large Bulky Item Collection', 'Noise',\n", " 'Noise - Street/Sidewalk', 'Electronics Waste Appointment',\n", " 'Blocked Driveway', 'Dirty Conditions', 'Curb Condition',\n", " 'Noise - Commercial', 'General Construction/Plumbing',\n", " 'Traffic Signal Condition', 'Street Light Condition', 'Lead',\n", " 'Street Sign - Damaged', 'Noise - Vehicle', 'New Tree Request',\n", " 'Sanitation Condition', 'Mosquitoes', 'WATER LEAK',\n", " 'UNSANITARY CONDITION', 'Root/Sewer/Sidewalk Condition',\n", " 'Dead/Dying Tree', 'Derelict Vehicles', 'Collection Truck Noise',\n", " 'Sewer', 'GENERAL', 'Overflowing Litter Baskets', 'Vacant Lot',\n", " 'Sidewalk Condition', 'PAINT/PLASTER', 'Building/Use',\n", " 'Street Sign - Dangling', 'Construction Safety Enforcement',\n", " 'PLUMBING', 'Derelict Vehicle', 'Homeless Person Assistance',\n", " 'ELECTRIC', 'Water System', 'Damaged Tree',\n", " 'Missed Collection (All Materials)', 'Animal Abuse',\n", " 'Borough Office', 'Elevator', 'Food Establishment',\n", " 'Maintenance or Facility', 'FLOORING/STAIRS', 'Consumer Complaint',\n", " 'Overgrown Tree/Branches', 'Graffiti', 'APPLIANCE',\n", " 'Other Enforcement', 'DOF Property - Payment Issue',\n", " 'Hazardous Materials', 'Water Conservation', 'Water Quality',\n", " 'Plumbing', 'DOF Property - Update Account', 'DOOR/WINDOW',\n", " 'Housing - Low Income Senior', 'BEST/Site Safety', 'SCRIE',\n", " 'Animal in a Park', 'DOF Parking - Payment Issue',\n", " 'For Hire Vehicle Complaint', 'Benefit Card Replacement',\n", " 'DOF Property - Reduction Issue',\n", " 'Special Projects Inspection Team (SPIT)', 'SAFETY',\n", " 'Taxi Complaint', 'Beach/Pool/Sauna Complaint',\n", " 'Homeless Encampment', 'DOF Property - RPIE Issue',\n", " 'Recycling Enforcement', 'DPR Internal', 'DRIE',\n", " 'Street Sign - Missing', 'Taxi Report',\n", " 'DOF Parking - Request Status', 'Air Quality',\n", " 'Broken Parking Meter', 'Mold', 'HPD Literature Request',\n", " 'Poison Ivy', 'Non-Emergency Police Matter',\n", " 'DOF Property - Request Copy', 'Housing Options',\n", " 'Illegal Tree Damage', 'Traffic', 'Noise - Park', 'Asbestos',\n", " 'DCA / DOH New License Application Request', 'Cooling Tower',\n", " 'Drinking', 'Indoor Sewage', 'Vending',\n", " 'Emergency Response Team (ERT)', 'Electrical',\n", " 'DOF Property - Owner Issue', 'Parking Card', 'Indoor Air Quality',\n", " 'Violation of Park Rules', 'Mobile Food Vendor',\n", " 'Highway Condition', 'ELEVATOR', 'Noise - Helicopter',\n", " 'Unsanitary Animal Pvt Property', 'Rodent', 'Standing Water',\n", " 'Food Poisoning', 'Litter Basket / Request', 'Day Care',\n", " 'Executive Inspections', 'DOF Parking - Tax Exemption',\n", " 'OUTSIDE BUILDING', 'DOF Parking - Request Copy',\n", " 'DOF Property - Property Value', 'School Maintenance', 'Question',\n", " 'Industrial Waste', 'Senior Center Complaint', 'Smoking',\n", " 'Quality of Life', 'Derelict Bicycle', 'Ferry Complaint',\n", " 'Sweeping/Missed', 'Ferry Inquiry', 'Sweeping/Inadequate',\n", " 'Found Property', 'LinkNYC', 'Sustainability Enforcement',\n", " 'Taxpayer Advocate Inquiry', 'Investigations and Discipline (IAD)',\n", " 'Noise - House of Worship', 'Scaffold Safety', 'Pet Shop',\n", " 'Bus Stop Shelter Complaint', 'Forms', 'Elder Abuse',\n", " 'Drug Activity', 'Unsanitary Pigeon Condition', 'Comments',\n", " 'Boilers', 'Building Marshals office',\n", " 'Home Delivered Meal - Missed Delivery',\n", " 'Bike/Roller/Skate Chronic', 'Posting Advertisement',\n", " 'Research Questions', 'Urinating in Public',\n", " 'Public Payphone Complaint', 'Unleashed Dog', 'Facades',\n", " 'Illegal Animal Sold', 'Cranes and Derricks',\n", " 'Bus Stop Shelter Placement', 'Panhandling', 'Bridge Condition',\n", " 'Illegal Animal Kept as Pet', 'OEM Literature Request',\n", " 'Advocate-Personal Exemptions', 'Abandoned Vehicle',\n", " \"Alzheimer's Care\", 'Taxi Compliment', 'Disorderly Youth',\n", " 'Public Assembly', 'Advocate-Prop Refunds/Credits',\n", " 'Unsanitary Condition', 'DOF Parking - DMV Clearance',\n", " 'Home Delivered Meal Complaint',\n", " 'Transportation Provider Complaint', 'Ferry Permit',\n", " 'For Hire Vehicle Report', 'Harboring Bees/Wasps',\n", " 'Advocate - Other', 'Home Care Provider Complaint',\n", " 'Miscellaneous Categories', 'Municipal Parking Facility', 'FATF',\n", " 'Special Natural Area District (SNAD)', 'Illegal Fireworks',\n", " 'Highway Sign - Damaged', 'Drinking Water', 'Calorie Labeling',\n", " 'Case Management Agency Complaint', 'Bike Rack Condition',\n", " 'Unsanitary Animal Facility', 'Plant', 'Public Toilet',\n", " 'Tattooing', 'Animal Facility - No Permit', 'Electronics Waste',\n", " 'General', 'Snow', 'Unlicensed Dog', 'Taxi Licensee Complaint',\n", " 'FHV Licensee Complaint', 'Overflowing Recycling Baskets',\n", " 'Highway Sign - Missing', 'Non-Residential Heat',\n", " 'Special Operations', 'Safety', 'Radioactive Material',\n", " 'Dispatched Taxi Complaint', 'Lifeguard', 'Window Guard',\n", " 'Select Message Type...', 'Bereavement Support Group',\n", " 'Sweeping/Missed-Inadequate', 'Squeegee', 'Tunnel Condition',\n", " 'Foam Ban Enforcement', 'Request Xmas Tree Collection',\n", " 'Advocate-Business Tax', 'Advocate - RPIE',\n", " 'Highway Sign - Dangling', 'X-Ray Machine/Equipment',\n", " 'Advocate-Co-opCondo Abatement', 'Forensic Engineering',\n", " 'Outside Building', 'Lost Property',\n", " '(select extractvalue(xmltyp...', \"Misc. Comments' and 7702=77...\",\n", " 'qfix4${695*589}lixaf', \"eval(compile('for x in rang...\",\n", " 'Misc. Comments\"|echo 2...', 'Misc. Comments\"|ping -...',\n", " \"Misc. Comments'|echo y9v6fc...\", 'file:///c:/windows/win.ini',\n", " \"Misc. Comments'|ping -c 21 ...\", '../../../../../../../../../...',\n", " '\\\\windows\\\\win.ini', '..././..././..././..././......',\n", " 'idexf3mrb7)(!(objectClass=*)', 'Misc. Comments BCC:a5glxzf...',\n", " '$(sleep 11)', '@(9313*3464)', \"{!xmlparser v='\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Complaint Typecount
0$(sleep 11)1
1(select extractvalue(xmltyp...1
2..././..././..././..././......1
3../../../../../../../../../...2
4../WEB-INF/web.xml1
.........
243eval(compile('for x in rang...1
244file:///c:/windows/win.ini1
245idexf3mrb7)(!(objectClass=*)1
246qfix4${695*589}lixaf1
247{!xmlparser v='<!DOCTYPE...1
\n", "

248 rows × 2 columns

\n", "" ], "text/plain": [ " Complaint Type count\n", "0 $(sleep 11) 1\n", "1 (select extractvalue(xmltyp... 1\n", "2 ..././..././..././..././...... 1\n", "3 ../../../../../../../../../... 2\n", "4 ../WEB-INF/web.xml 1\n", ".. ... ...\n", "243 eval(compile('for x in rang... 1\n", "244 file:///c:/windows/win.ini 1\n", "245 idexf3mrb7)(!(objectClass=*) 1\n", "246 qfix4${695*589}lixaf 1\n", "247 {!xmlparser v='= 3.**\n", "\n", "Remember: A single column from a pandas dataframe is called a series. It's essentially a list containing all the values in the column." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```python\n", "requests.groupby('Complaint Type').size().reset_index(name='size')\n", "```\n", "\n", "is equivalent to:\n", "\n", "```python\n", "requests.groupby('Complaint Type', as_index=False).size()\n", "```\n", "\n", "Without the `reset_index()` or `as_index=False`, the `Complaint Type` values are returned as the index. See also: [`value_counts()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.042269Z", "iopub.status.busy": "2024-03-18T21:43:08.042142Z", "iopub.status.idle": "2024-03-18T21:43:08.044993Z", "shell.execute_reply": "2024-03-18T21:43:08.044740Z" }, "id": "dztWQGD3uxbD", "outputId": "62507280-0ba5-4cb0-f809-d4b8d0c2fbf6", "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "6 APPLIANCE\n", "7 Abandoned Vehicle\n", "8 Advocate - Other\n", "11 Advocate-Co-opCondo Abatement\n", "12 Advocate-Personal Exemptions\n", " ... \n", "236 WATER LEAK\n", "237 Water Conservation\n", "238 Water Quality\n", "239 Water System\n", "240 Window Guard\n", "Name: Complaint Type, Length: 213, dtype: object" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "valid_complaint_types = counts[\"Complaint Type\"][counts[\"count\"] >= 3]\n", "valid_complaint_types" ] }, { "cell_type": "markdown", "metadata": { "id": "5AHFe3Nhp1gz", "slideshow": { "slide_type": "subslide" } }, "source": [ "Filter our `requests` dataframe to only keep the rows where the `Complaint Type` value is in the `valid_complaint_types` series we created in the previous step. Save the result in a new dataframe." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.046491Z", "iopub.status.busy": "2024-03-18T21:43:08.046388Z", "iopub.status.idle": "2024-03-18T21:43:08.169343Z", "shell.execute_reply": "2024-03-18T21:43:08.169044Z" }, "id": "PII26jb0g8Eg", "outputId": "8c839d5d-da47-4576-b1cb-86a147ec52e7" }, "outputs": [], "source": [ "df_cleaned = requests[requests[\"Complaint Type\"].isin(valid_complaint_types)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How can we make sure this worked? Let's check how many records there were originally in `requests` vs how many are in `df_cleaned`.\n", "\n", "Before:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.171151Z", "iopub.status.busy": "2024-03-18T21:43:08.171052Z", "iopub.status.idle": "2024-03-18T21:43:08.173266Z", "shell.execute_reply": "2024-03-18T21:43:08.173028Z" } }, "outputs": [ { "data": { "text/plain": [ "500000" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests[\"Unique Key\"].size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.174939Z", "iopub.status.busy": "2024-03-18T21:43:08.174811Z", "iopub.status.idle": "2024-03-18T21:43:08.177060Z", "shell.execute_reply": "2024-03-18T21:43:08.176785Z" } }, "outputs": [ { "data": { "text/plain": [ "499958" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cleaned[\"Unique Key\"].size" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can also print the set of complaint_type values from our cleaned dataframe to make sure they look correct." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.178630Z", "iopub.status.busy": "2024-03-18T21:43:08.178519Z", "iopub.status.idle": "2024-03-18T21:43:08.186941Z", "shell.execute_reply": "2024-03-18T21:43:08.186683Z" } }, "outputs": [ { "data": { "text/plain": [ "349480 Borough Office\n", "254578 PLUMBING\n", "198644 Request Large Bulky Item Collection\n", "319316 Street Light Condition\n", "29580 General Construction/Plumbing\n", "398404 Street Light Condition\n", "303870 HPD Literature Request\n", "86399 Sewer\n", "22631 Noise - Street/Sidewalk\n", "398130 Illegal Parking\n", "Name: Complaint Type, dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cleaned[\"Complaint Type\"].sample(10)" ] }, { "cell_type": "markdown", "metadata": { "id": "hNv98pVghJ1J", "slideshow": { "slide_type": "fragment" } }, "source": [ "Great, now those invalid records will be excluded from our analysis!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Another approach to excluding those invalid records would be to use [regular expressions (\"RegExes\")](https://www.w3schools.com/python/python_regex.asp) to find records with weird characters." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Filtering rows\n", "\n", "Slicing and dicing is done through [indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![DataFrame](extras/img/data_structures-2.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Boolean indexing\n", "\n", "![DataFrame and Series](extras/img/data_structures-3.jpg)\n", "\n", "_The table on the right is missing the `state` column - need to fix._" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### How it works" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "id": "SLUy01x4h_I3", "slideshow": { "slide_type": "slide" } }, "source": [ "## Done with clean up! Time for the actual analysis: \n", "### Which 311 complaints are most common and which agencies are responsible for handling them?" ] }, { "cell_type": "markdown", "metadata": { "id": "HkmnJPxfiJrY", "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Which complaints are the most common?" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.188911Z", "iopub.status.busy": "2024-03-18T21:43:08.188796Z", "iopub.status.idle": "2024-03-18T21:43:08.213778Z", "shell.execute_reply": "2024-03-18T21:43:08.213514Z" }, "id": "S90gXzXAiFoC", "outputId": "7fb263d5-a01c-42ca-ee21-267a0c0c18bf" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Complaint Typecount
0APPLIANCE2539
1Abandoned Vehicle1655
2Advocate - Other26
3Advocate-Co-opCondo Abatement4
4Advocate-Personal Exemptions24
.........
208WATER LEAK6641
209Water Conservation853
210Water Quality332
211Water System12949
212Window Guard6
\n", "

213 rows × 2 columns

\n", "
" ], "text/plain": [ " Complaint Type count\n", "0 APPLIANCE 2539\n", "1 Abandoned Vehicle 1655\n", "2 Advocate - Other 26\n", "3 Advocate-Co-opCondo Abatement 4\n", "4 Advocate-Personal Exemptions 24\n", ".. ... ...\n", "208 WATER LEAK 6641\n", "209 Water Conservation 853\n", "210 Water Quality 332\n", "211 Water System 12949\n", "212 Window Guard 6\n", "\n", "[213 rows x 2 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cleaned.groupby(\"Complaint Type\").size().reset_index(name=\"count\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.reset_index(name='count')` isn't necessary, but it's helpful to include because it allows us to name the new column that contains the count of rows." ] }, { "cell_type": "markdown", "metadata": { "id": "J8kKgIZdiOom", "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Which agencies are responsible for handling these complaint categories?" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.215610Z", "iopub.status.busy": "2024-03-18T21:43:08.215488Z", "iopub.status.idle": "2024-03-18T21:43:08.258566Z", "shell.execute_reply": "2024-03-18T21:43:08.258285Z" }, "id": "UVRyyXt8iUgF", "outputId": "b570ace6-1864-48b3-8ff7-17d5a8b1e024" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgencyComplaint Typecount
0ACSForms56
1COIBForms1
2DCAConsumer Complaint2892
3DCADCA / DOH New License Application Request186
4DCASComments13
............
216TLCLost Property472
217TLCTaxi Complaint2416
218TLCTaxi Compliment41
219TLCTaxi Licensee Complaint5
220TLCTaxi Report318
\n", "

221 rows × 3 columns

\n", "
" ], "text/plain": [ " Agency Complaint Type count\n", "0 ACS Forms 56\n", "1 COIB Forms 1\n", "2 DCA Consumer Complaint 2892\n", "3 DCA DCA / DOH New License Application Request 186\n", "4 DCAS Comments 13\n", ".. ... ... ...\n", "216 TLC Lost Property 472\n", "217 TLC Taxi Complaint 2416\n", "218 TLC Taxi Compliment 41\n", "219 TLC Taxi Licensee Complaint 5\n", "220 TLC Taxi Report 318\n", "\n", "[221 rows x 3 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cleaned.groupby([\"Agency\", \"Complaint Type\"]).size().reset_index(name=\"count\")" ] }, { "cell_type": "markdown", "metadata": { "id": "hwUfwRFIianT", "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Which agencies receive the most total 311 requests?" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.260176Z", "iopub.status.busy": "2024-03-18T21:43:08.260059Z", "iopub.status.idle": "2024-03-18T21:43:08.279343Z", "shell.execute_reply": "2024-03-18T21:43:08.279069Z" }, "id": "c8go2qe9iZiE", "outputId": "21a6f3b2-da1e-4dd3-c68f-ba1f03f2b66d" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Agencycount
0ACS56
1COIB1
2DCA3078
3DCAS149
4DCP4
5DEP36283
6DFTA1182
7DHS3670
8DOB26024
9DOE442
10DOF7223
11DOHMH12690
12DOITT115
13DOT52001
14DPR21086
15DSNY73227
16EDC303
17HPD105634
18HRA1649
19NYCEM25
20NYPD148884
21TAT1
22TAX37
23TLC6194
\n", "
" ], "text/plain": [ " Agency count\n", "0 ACS 56\n", "1 COIB 1\n", "2 DCA 3078\n", "3 DCAS 149\n", "4 DCP 4\n", "5 DEP 36283\n", "6 DFTA 1182\n", "7 DHS 3670\n", "8 DOB 26024\n", "9 DOE 442\n", "10 DOF 7223\n", "11 DOHMH 12690\n", "12 DOITT 115\n", "13 DOT 52001\n", "14 DPR 21086\n", "15 DSNY 73227\n", "16 EDC 303\n", "17 HPD 105634\n", "18 HRA 1649\n", "19 NYCEM 25\n", "20 NYPD 148884\n", "21 TAT 1\n", "22 TAX 37\n", "23 TLC 6194" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cleaned.groupby(\"Agency\").size().reset_index(name=\"count\")" ] }, { "cell_type": "markdown", "metadata": { "id": "S6ERw200iseN", "slideshow": { "slide_type": "subslide" } }, "source": [ "#### What is the most frequent request per agency?\n", "\n", "First, create a dataframe that contains the count of complaints per `Agency` per `Complaint Type`." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.281041Z", "iopub.status.busy": "2024-03-18T21:43:08.280931Z", "iopub.status.idle": "2024-03-18T21:43:08.324166Z", "shell.execute_reply": "2024-03-18T21:43:08.323887Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgencyComplaint Typecount
0ACSForms56
1COIBForms1
2DCAConsumer Complaint2892
3DCADCA / DOH New License Application Request186
4DCASComments13
5DCASQuestion136
6DCPResearch Questions4
7DEPAir Quality1457
8DEPAsbestos316
9DEPFATF23
10DEPHazardous Materials549
11DEPIndustrial Waste263
12DEPLead1825
13DEPNoise10937
14DEPPlant22
15DEPSewer6757
16DEPWater Conservation853
17DEPWater Quality332
18DEPWater System12949
19DFTAAlzheimer's Care25
\n", "
" ], "text/plain": [ " Agency Complaint Type count\n", "0 ACS Forms 56\n", "1 COIB Forms 1\n", "2 DCA Consumer Complaint 2892\n", "3 DCA DCA / DOH New License Application Request 186\n", "4 DCAS Comments 13\n", "5 DCAS Question 136\n", "6 DCP Research Questions 4\n", "7 DEP Air Quality 1457\n", "8 DEP Asbestos 316\n", "9 DEP FATF 23\n", "10 DEP Hazardous Materials 549\n", "11 DEP Industrial Waste 263\n", "12 DEP Lead 1825\n", "13 DEP Noise 10937\n", "14 DEP Plant 22\n", "15 DEP Sewer 6757\n", "16 DEP Water Conservation 853\n", "17 DEP Water Quality 332\n", "18 DEP Water System 12949\n", "19 DFTA Alzheimer's Care 25" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agency_counts = df_cleaned.groupby([\"Agency\", \"Complaint Type\"]).size().reset_index(name=\"count\")\n", "agency_counts.head(20)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Use `drop_duplicates()` to keep the row with the highest value per `Agency`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.325980Z", "iopub.status.busy": "2024-03-18T21:43:08.325857Z", "iopub.status.idle": "2024-03-18T21:43:08.330577Z", "shell.execute_reply": "2024-03-18T21:43:08.330304Z" }, "id": "_3o0pW8Xixt0", "outputId": "d6e82795-7abb-4c71-b712-58b26adb55dd", "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgencyComplaint Typecount
198NYPDNoise - Residential41311
169HPDHEAT/HOT WATER39095
194NYPDIllegal Parking34297
154DSNYRequest Large Bulky Item Collection30939
186NYPDBlocked Driveway25530
............
119DOTHighway Sign - Dangling3
94DOHMHRadioactive Material3
139DPRUnsanitary Condition1
207TATQuestion1
1COIBForms1
\n", "

221 rows × 3 columns

\n", "
" ], "text/plain": [ " Agency Complaint Type count\n", "198 NYPD Noise - Residential 41311\n", "169 HPD HEAT/HOT WATER 39095\n", "194 NYPD Illegal Parking 34297\n", "154 DSNY Request Large Bulky Item Collection 30939\n", "186 NYPD Blocked Driveway 25530\n", ".. ... ... ...\n", "119 DOT Highway Sign - Dangling 3\n", "94 DOHMH Radioactive Material 3\n", "139 DPR Unsanitary Condition 1\n", "207 TAT Question 1\n", "1 COIB Forms 1\n", "\n", "[221 rows x 3 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted_agency_counts = agency_counts.sort_values(\"count\", ascending=False)\n", "sorted_agency_counts" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.332354Z", "iopub.status.busy": "2024-03-18T21:43:08.332209Z", "iopub.status.idle": "2024-03-18T21:43:08.336822Z", "shell.execute_reply": "2024-03-18T21:43:08.336511Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgencyComplaint Typecount
198NYPDNoise - Residential41311
169HPDHEAT/HOT WATER39095
154DSNYRequest Large Bulky Item Collection30939
124DOTStreet Condition16895
18DEPWater System12949
42DOBGeneral Construction/Plumbing9537
95DOHMHRodent6253
132DPRDamaged Tree6065
29DHSHomeless Person Assistance3670
2DCAConsumer Complaint2892
212TLCFor Hire Vehicle Complaint2789
67DOFDOF Property - Reduction Issue1846
180HRABenefit Card Replacement1649
25DFTAHousing - Low Income Senior768
53DOESchool Maintenance442
161EDCNoise - Helicopter303
5DCASQuestion136
106DOITTPublic Payphone Complaint75
0ACSForms56
209TAXQuestion34
181NYCEMOEM Literature Request25
6DCPResearch Questions4
207TATQuestion1
1COIBForms1
\n", "
" ], "text/plain": [ " Agency Complaint Type count\n", "198 NYPD Noise - Residential 41311\n", "169 HPD HEAT/HOT WATER 39095\n", "154 DSNY Request Large Bulky Item Collection 30939\n", "124 DOT Street Condition 16895\n", "18 DEP Water System 12949\n", "42 DOB General Construction/Plumbing 9537\n", "95 DOHMH Rodent 6253\n", "132 DPR Damaged Tree 6065\n", "29 DHS Homeless Person Assistance 3670\n", "2 DCA Consumer Complaint 2892\n", "212 TLC For Hire Vehicle Complaint 2789\n", "67 DOF DOF Property - Reduction Issue 1846\n", "180 HRA Benefit Card Replacement 1649\n", "25 DFTA Housing - Low Income Senior 768\n", "53 DOE School Maintenance 442\n", "161 EDC Noise - Helicopter 303\n", "5 DCAS Question 136\n", "106 DOITT Public Payphone Complaint 75\n", "0 ACS Forms 56\n", "209 TAX Question 34\n", "181 NYCEM OEM Literature Request 25\n", "6 DCP Research Questions 4\n", "207 TAT Question 1\n", "1 COIB Forms 1" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_agency_counts = sorted_agency_counts.drop_duplicates(\"Agency\")\n", "top_agency_counts" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For nicer display, order alphabetically:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.338710Z", "iopub.status.busy": "2024-03-18T21:43:08.338578Z", "iopub.status.idle": "2024-03-18T21:43:08.342457Z", "shell.execute_reply": "2024-03-18T21:43:08.342170Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgencyComplaint Typecount
0ACSForms56
1COIBForms1
2DCAConsumer Complaint2892
5DCASQuestion136
6DCPResearch Questions4
18DEPWater System12949
25DFTAHousing - Low Income Senior768
29DHSHomeless Person Assistance3670
42DOBGeneral Construction/Plumbing9537
53DOESchool Maintenance442
67DOFDOF Property - Reduction Issue1846
95DOHMHRodent6253
106DOITTPublic Payphone Complaint75
124DOTStreet Condition16895
132DPRDamaged Tree6065
154DSNYRequest Large Bulky Item Collection30939
161EDCNoise - Helicopter303
169HPDHEAT/HOT WATER39095
180HRABenefit Card Replacement1649
181NYCEMOEM Literature Request25
198NYPDNoise - Residential41311
207TATQuestion1
209TAXQuestion34
212TLCFor Hire Vehicle Complaint2789
\n", "
" ], "text/plain": [ " Agency Complaint Type count\n", "0 ACS Forms 56\n", "1 COIB Forms 1\n", "2 DCA Consumer Complaint 2892\n", "5 DCAS Question 136\n", "6 DCP Research Questions 4\n", "18 DEP Water System 12949\n", "25 DFTA Housing - Low Income Senior 768\n", "29 DHS Homeless Person Assistance 3670\n", "42 DOB General Construction/Plumbing 9537\n", "53 DOE School Maintenance 442\n", "67 DOF DOF Property - Reduction Issue 1846\n", "95 DOHMH Rodent 6253\n", "106 DOITT Public Payphone Complaint 75\n", "124 DOT Street Condition 16895\n", "132 DPR Damaged Tree 6065\n", "154 DSNY Request Large Bulky Item Collection 30939\n", "161 EDC Noise - Helicopter 303\n", "169 HPD HEAT/HOT WATER 39095\n", "180 HRA Benefit Card Replacement 1649\n", "181 NYCEM OEM Literature Request 25\n", "198 NYPD Noise - Residential 41311\n", "207 TAT Question 1\n", "209 TAX Question 34\n", "212 TLC For Hire Vehicle Complaint 2789" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_agency_counts.sort_values(\"Agency\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Another way, only sorting it once:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-03-18T21:43:08.343961Z", "iopub.status.busy": "2024-03-18T21:43:08.343870Z", "iopub.status.idle": "2024-03-18T21:43:08.348591Z", "shell.execute_reply": "2024-03-18T21:43:08.348320Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgencyComplaint Typecount
0ACSForms56
1COIBForms1
2DCAConsumer Complaint2892
5DCASQuestion136
6DCPResearch Questions4
18DEPWater System12949
25DFTAHousing - Low Income Senior768
29DHSHomeless Person Assistance3670
42DOBGeneral Construction/Plumbing9537
53DOESchool Maintenance442
67DOFDOF Property - Reduction Issue1846
95DOHMHRodent6253
106DOITTPublic Payphone Complaint75
124DOTStreet Condition16895
132DPRDamaged Tree6065
154DSNYRequest Large Bulky Item Collection30939
161EDCNoise - Helicopter303
169HPDHEAT/HOT WATER39095
180HRABenefit Card Replacement1649
181NYCEMOEM Literature Request25
198NYPDNoise - Residential41311
207TATQuestion1
209TAXQuestion34
212TLCFor Hire Vehicle Complaint2789
\n", "
" ], "text/plain": [ " Agency Complaint Type count\n", "0 ACS Forms 56\n", "1 COIB Forms 1\n", "2 DCA Consumer Complaint 2892\n", "5 DCAS Question 136\n", "6 DCP Research Questions 4\n", "18 DEP Water System 12949\n", "25 DFTA Housing - Low Income Senior 768\n", "29 DHS Homeless Person Assistance 3670\n", "42 DOB General Construction/Plumbing 9537\n", "53 DOE School Maintenance 442\n", "67 DOF DOF Property - Reduction Issue 1846\n", "95 DOHMH Rodent 6253\n", "106 DOITT Public Payphone Complaint 75\n", "124 DOT Street Condition 16895\n", "132 DPR Damaged Tree 6065\n", "154 DSNY Request Large Bulky Item Collection 30939\n", "161 EDC Noise - Helicopter 303\n", "169 HPD HEAT/HOT WATER 39095\n", "180 HRA Benefit Card Replacement 1649\n", "181 NYCEM OEM Literature Request 25\n", "198 NYPD Noise - Residential 41311\n", "207 TAT Question 1\n", "209 TAX Question 34\n", "212 TLC For Hire Vehicle Complaint 2789" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agency_counts.sort_values([\"Agency\", \"count\"]).drop_duplicates(\"Agency\", keep=\"last\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Some pandas/Jupyter best practices\n", "\n", "- Make variable names descriptive\n", " - Ignore that all examples use `requests`\n", "- Only do one thing per line\n", " - Makes troubleshooting easier\n", "- Make notebooks [idempotent](https://en.wikipedia.org/wiki/Idempotence)\n", " - Makes your work reproducible\n", " - Use `Restart and run all` (⏩ button in toolbar)" ] }, { "cell_type": "markdown", "metadata": { "id": "ddj8VVZRixCn", "slideshow": { "slide_type": "slide" } }, "source": [ "## [Homework 1](https://python-public-policy.afeld.me/en/{{school_slug}}/hw_1.html)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python [conda env:python-public-policy] *", "language": "python", "name": "conda-env-python-public-policy-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }