{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Harvesting data from the web: APIs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### A first API\n", "\n", "[Chronicling America](http://chroniclingamerica.loc.gov/about/) is a joint project of the National Endowment for the Humanities and the Library of Congress.\n", "\n", "Search for articles that mention \"[lynching](http://chroniclingamerica.loc.gov/search/pages/results/?andtext=lynching)\"." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![](https://raw.githubusercontent.com/nealcaren/UiOBigData/master/notebooks/images/chron.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![](https://github.com/nealcaren/ScrapingData/raw/master/Notebooks/images/lynch_ca.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n",
"https://chroniclingamerica.loc.gov/search/pages/results/?andtext=murder&page=251&sort=relevance\n",
"
\n",
"\n",
"search_json['items']\n",
"
\n",
"\n",
"print(first_item['title'])\n",
"
\n",
"Conduct your own search of the API. Store the results in a dataframe.\n", "\n", "
\n",
"search_word = 'robbery'\n",
"\n",
"\n",
"base_url = 'http://chroniclingamerica.loc.gov/search/pages/results/'\n",
"parameters = '?andtext=' + search_word + '&format=json'\n",
"\n",
"r = requests.get(base_url + parameters)\n",
"search_json = r.json()\n",
"df = pd.DataFrame(search_json['items'])\n",
"\n",
"
\n",
"Conduct your own search of the API. Change the page size to 200. Store the results in a csv file.\n", "\n", "
\n",
"r = requests.get('https://api.fbi.gov/wanted/v1/list?pageSize=200&page=1&sort_on=modified')\n",
"mw_df = pd.DataFrame(r.json()['items'])\n",
"mw_df.to_csv('mw.csv')\n",
"\n",
"
\n",
"Set a page parameter to \"2\" to get the second page of results.\n", "
\n",
"base_url = 'https://api.fbi.gov/wanted/v1/list'\n",
"parameters = {'pageSize' : 20,\n",
" 'page' : 2}\n",
"\n",
"r = requests.get(base_url, \n",
" params = parameters)\n",
"pd.DataFrame(r.json()['items'])\n",
"
\n",
"Read about the API on the website. Then create a dataframe that contains that take place in a residence. Run some descriptive statistics on your data.\n", "
\n",
"base_url = 'https://data.cityofchicago.org/resource/ijzp-q8t2.json'\n",
"\n",
"parameters = {'location_description' : 'RESIDENCE',\n",
" }\n",
"\n",
"r = requests.get(base_url, \n",
" params = parameters)\n",
"df = pd.DataFrame(r.json())\n",
"\n",
"df['primary_type'].value_counts()\n",
"df['arrest'].value_counts()\n",
"pd.crosstab(df['primary_type'] ,df['arrest'], normalize='index')\n",
"
\n",
"