{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "id": "SA9q8DQPL6oZ", "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Lecture 1: Working with data\n", "\n", "_Please sign attendance sheet; close devices_\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "- How was the homework? 👍/👎\n", "- Questions?\n", "- Reminder about the [between-class participation](https://python-public-policy.afeld.me/en/{{school_slug}}/syllabus.html#participation)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [ "columbia-only" ] }, "source": [ "## Additional programming concepts\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "### Functions\n", "\n", "- Functions == methods\n", "- Arguments == parameters\n", "\n", "For simplicity, we'll use them interchangeably.\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "### Packages\n", "\n", "- a.k.a. 
\"libraries\"\n", "- Developers have created them to make code/functionality reusable and easily sharable\n", "- Software plugins that you `import`\n", "- Main packages we’ll use:\n", " - `pandas`\n", " - `plotly`\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [ "columbia-only" ] }, "source": [ "> A module is a file containing Python definitions and statements.\n", "\n", "https://docs.python.org/3/tutorial/modules.html\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Challenge\n", "\n", "Complete the demos and exercise today with generative AI _only_.\n", "\n", "- Allowed\n", " - Prompts\n", " - Copy-pasting\n", "- Not allowed\n", " - Googling\n", " - Editing\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "I'll be using [Gemini, built into Colab](https://research.google.com/colaboratory/faq.html#how-to-use-ai-features); you can use a different tool if you prefer.\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [ "columbia-only" ] }, "source": [ "## Working with files in Python\n", "\n", "Let's say we have a CSV file. Print out all the rows.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "columbia-only" ] }, "outputs": [], "source": [ "# our code here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Working with CSVs in pure Python\n", "\n", "We will use Python's CSV [DictReader](https://docs.python.org/3/library/csv.html#csv.DictReader). 
We'll open the file, parse it as a CSV, then operate row by row.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# our code here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### [In-class exercise](https://python-public-policy.afeld.me/en/{{school_slug}}/lecture_1_exercise.html)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## 311 requests\n", "\n", "Who's called 311 before?\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "[NYC 311 homepage](https://portal.311.nyc.gov/)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### [311 data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "CRAqTQ2rbXAA", "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Today's goal\n", "\n", "- Which 311 complaints are most common?\n", "- Which agencies are responsible for handling them?\n" ] }, { "cell_type": "markdown", "metadata": { "id": "9rvnMzjSMK36", "slideshow": { "slide_type": "slide" } }, "source": [ "## Pandas\n", "\n", "- A Python package (bundled up code that you can reuse)\n", "- Very common for data science in Python\n", "- [A lot like R](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_r.html)\n", " - Both organize around \"data frames\"\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "R1G04BmMMFJb", "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Load data\n", "\n", "Pull data from:\n", "\n", 
"https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip\n", "\n", "We're using a sample to make it easier/faster to work with. This will take a while (~30 seconds).\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "editable": true, "id": "iQgE8qFAMbiF", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# our code here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "If you see a `DtypeWarning`, ignore it for now. We'll come back to it.\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Preview the data\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# our code here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Pandas data structures\n", "\n", "[![Diagram showing a DataFrame, Series, labels, and indexes](extras/img/data_structures-1.svg)](https://docs.google.com/drawings/d/17LRBIjyA5gvKw69xZA1Mrq66pBpPwti5YGbw6UtTPrk/edit)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "7DPo85wSNU6q", "slideshow": { "slide_type": "slide" }, "tags": [], "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "## DataFrame information\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "editable": true, "id": "--ben4hfmTaB", "outputId": "62bae542-8fda-40c4-82f6-7a6410c2a90b", "scrolled": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# our code here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Demo\n" ] }, { "cell_type": "markdown", "metadata": { 
"editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Analysis\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "#### Which complaints are most common?\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# code goes here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "#### What's the most frequent request per agency?\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# code goes here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "- [`groupby()`](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html#grouping) similar to [pivot tables](https://support.google.com/docs/answer/1272900) in spreadsheets\n", "- [`to_frame()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_frame.html)\n", "- [`reset_index()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reset_index.html)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "qeYA8-rMlpJa", "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Exclude bad records from the DataFrame\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "RgP7ehPsmozX", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Let's look at the complaint types.\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# code goes here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": 
"RrwqSmKSbiYC", "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "How should we go about cleaning those up?\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# code goes here" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Reflections?\n", "\n", "- What worked well?\n", "- What didn't work well?\n", "- Did this change how you're thinking about generative AI?\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## [Best practices](https://python-public-policy.afeld.me/en/{{school_slug}}/assignments.html#tips)\n" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "ddj8VVZRixCn", "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## [Homework 1](https://python-public-policy.afeld.me/en/{{school_slug}}/hw_1.html)\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 4 }