{ "cells": [ { "cell_type": "markdown", "source": [ "# Convert a pandas dataframe to geojson for web-mapping\n", "\n", "Author: Geoff Boeing\n", "\n", "Original: [pandas-to-geojson](https://github.com/gboeing/urban-data-science/blob/dc86c9c89b73f87f97301883d7456f1f814589f5/17-Leaflet-Web-Mapping/pandas-to-geojson.ipynb)" ], "metadata": {} }, { "cell_type": "code", "source": [ "import pandas as pd, requests, json" ], "outputs": [], "execution_count": 1, "metadata": { "collapsed": false, "execution": { "iopub.status.busy": "2020-07-17T02:12:46.974Z", "iopub.execute_input": "2020-07-17T02:12:46.977Z", "iopub.status.idle": "2020-07-17T02:12:47.175Z", "shell.execute_reply": "2020-07-17T02:12:47.207Z" } } }, { "cell_type": "markdown", "source": [ "First download data from the city of Berkeley's API. You can use Socrata's $limit parameter to specify how many rows to grab (otherwise the default is 1,000 rows of data): https://dev.socrata.com/docs/paging.html\n", "\n", "Example request: https://data.cityofberkeley.info/resource/k489-uv4i.json?$limit=5" ], "metadata": {} }, { "cell_type": "code", "source": [ "# API endpoint for city of Berkeley's 311 calls\n", "endpoint_url = 'https://data.cityofberkeley.info/resource/bscu-qpbu.json?$limit=20&$where=latitude%20%3C%3E%20%22%22'" ], "outputs": [], "execution_count": 2, "metadata": { "collapsed": true, "execution": { "iopub.status.busy": "2020-07-17T02:12:47.180Z", "iopub.execute_input": "2020-07-17T02:12:47.184Z", "iopub.status.idle": "2020-07-17T02:12:47.191Z", "shell.execute_reply": "2020-07-17T02:12:47.209Z" } } }, { "cell_type": "code", "source": [ "# fetch the URL and load the data\n", "response = requests.get(endpoint_url)\n", "data = response.json()" ], "outputs": [], "execution_count": 3, "metadata": { "collapsed": true, "execution": { "iopub.status.busy": "2020-07-17T02:12:47.197Z", "iopub.execute_input": "2020-07-17T02:12:47.201Z", "iopub.status.idle": "2020-07-17T02:12:47.625Z", "shell.execute_reply": "2020-07-17T02:12:47.631Z" } } }, { "cell_type": "markdown", "source": [ "Next, turn the json data into a dataframe and clean it up a bit: drop unnecessary columns and any rows that lack lat-long data. We want to make our json file as small as possible (prefer under 5 mb) so that it can be loaded over the Internet to anyone viewing your map, without taking forever to download a huge file." ], "metadata": {} }, { "cell_type": "code", "source": [ "# turn the json data into a dataframe and see how many rows and what columns we have\n", "df = pd.DataFrame(data)\n", "\n", "print('We have {} rows'.format(len(df)))\n", "str(df.columns.tolist())" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "We have 20 rows\n" ] }, { "output_type": "execute_result", "execution_count": 4, "data": { "text/plain": "\"['case_id', 'date_opened', 'case_status', 'date_closed', 'request_category', 'request_subcategory', 'request_detail', 'object_type', 'apn', 'street_address', 'city', 'state', 'neighborhood', 'latitude', 'longitude', 'location']\"" }, "metadata": {} } ], "execution_count": 4, "metadata": { "collapsed": false, "execution": { "iopub.status.busy": "2020-07-17T02:12:47.637Z", "iopub.execute_input": "2020-07-17T02:12:47.642Z", "iopub.status.idle": "2020-07-17T02:12:47.655Z", "shell.execute_reply": "2020-07-17T02:12:47.890Z" } } }, { "cell_type": "code", "source": [ "# convert lat-long to floats and change address from ALL CAPS to regular capitalization\n", "df['latitude'] = df['latitude'].astype(float)\n", "df['longitude'] = df['longitude'].astype(float)\n", "df['street_address'] = df['street_address'].str.title()" ], "outputs": [], "execution_count": 5, "metadata": { "collapsed": true, "execution": { "iopub.status.busy": "2020-07-17T02:12:47.661Z", "iopub.execute_input": "2020-07-17T02:12:47.665Z", "iopub.status.idle": "2020-07-17T02:12:47.671Z", "shell.execute_reply": "2020-07-17T02:12:47.893Z" } } }, { "cell_type": "code", "source": [ "# we don't need all those columns - only keep useful ones\n", "cols = ['request_detail', 'request_subcategory', 'latitude', 'longitude', 'street_address', 'case_status']\n", "df_subset = df[cols]" ], "outputs": [], "execution_count": 6, "metadata": { "collapsed": false, "execution": { "iopub.status.busy": "2020-07-17T02:12:47.677Z", "iopub.execute_input": "2020-07-17T02:12:47.681Z", "iopub.status.idle": "2020-07-17T02:12:47.688Z", "shell.execute_reply": "2020-07-17T02:12:47.896Z" } } }, { "cell_type": "code", "source": [ "# drop any rows that lack lat/long data\n", "df_geo = df_subset.dropna(subset=['latitude', 'longitude'], axis=0, inplace=False)\n", "\n", "print('We have {} geotagged rows'.format(len(df_geo)))\n", "df_geo.tail()" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "We have 20 geotagged rows\n" ] }, { "output_type": "execute_result", "execution_count": 7, "data": { "text/plain": " request_detail request_subcategory latitude longitude \\\n15 Commercial Reminder Commercial 37.880719 -122.268930 \n16 Commercial Reminder Commercial 37.851141 -122.271171 \n17 Recycling - City of Berkeley Request 37.878899 -122.297101 \n18 Roll Off Bin Request 37.853874 -122.291220 \n19 Encampment Complaint Inquiry 37.882101 -122.302594 \n\n street_address case_status \n15 1475 Shattuck Ave Closed \n16 3132 M L King Jr Way Closed \n17 1005 Camelia St Closed \n18 2840 Eighth St Closed \n19 1102 Sixth St Closed ", "text/html": "
\n | request_detail | \nrequest_subcategory | \nlatitude | \nlongitude | \nstreet_address | \ncase_status | \n
---|---|---|---|---|---|---|
15 | \nCommercial Reminder | \nCommercial | \n37.880719 | \n-122.268930 | \n1475 Shattuck Ave | \nClosed | \n
16 | \nCommercial Reminder | \nCommercial | \n37.851141 | \n-122.271171 | \n3132 M L King Jr Way | \nClosed | \n
17 | \nRecycling - City of Berkeley | \nRequest | \n37.878899 | \n-122.297101 | \n1005 Camelia St | \nClosed | \n
18 | \nRoll Off Bin | \nRequest | \n37.853874 | \n-122.291220 | \n2840 Eighth St | \nClosed | \n
19 | \nEncampment Complaint | \nInquiry | \n37.882101 | \n-122.302594 | \n1102 Sixth St | \nClosed | \n