{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the BISON API\n",
"The USGS provides an API for accessing species observation data. https://bison.usgs.gov/doc/api.jsp\n",
"\n",
"This API is much better documented than the NWIS API, and we'll use it to dig a bit deeper into how the `requests` package can facilitate data access via APIs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* We'll begin by replicating the example API call they show on their web page:\n",
"\n",
"[https://bison.usgs.gov/api/search.json?species=Bison bison&type=scientific_name&start=0&count=1](https://bison.usgs.gov/api/search.json?species=Bison%20bison&type=scientific_name&start=0&count=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#First, import the wonderful requests module\n",
"import requests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Now, we'll deconstruct the example URL into the service URL and its parameters, saving the parameters as a dictionary. Note that we are providing just a few of the parameters available through the [API](https://bison.usgs.gov/doc/api.jsp#opensearch). We could add more search criteria, but for now we just want to grab the first 500 Bison records, so we set `count` to 500 instead of 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build the request from two components: the service URL and the request parameters\n",
"url = 'https://bison.usgs.gov/api/search.json'\n",
"params = {'species':'Bison bison',\n",
" 'type':'scientific_name',\n",
" 'start':'0',\n",
" 'count':'500'\n",
" }"
]
},
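{
"cell_type": "markdown",
"metadata": {},
"source": [
"> *Aside*: To see how a service URL and a parameter dictionary combine into a single request URL, here is a minimal sketch using the standard library's `urllib.parse.urlencode` function. The names `example_url`, `example_params`, `query_string`, and `full_url` are made up for this illustration. The `requests` package does its own encoding internally, so treat this as an approximation of what gets sent, not the exact request. Note that the space in `Bison bison` gets encoded (as `+` here, as `%20` in the example link above)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Standard-library sketch: encode a parameter dictionary as a URL query string\n",
"from urllib.parse import urlencode\n",
"\n",
"# Same service URL and parameters as the example call on the BISON page\n",
"example_url = 'https://bison.usgs.gov/api/search.json'\n",
"example_params = {'species': 'Bison bison',\n",
"                  'type': 'scientific_name',\n",
"                  'start': '0',\n",
"                  'count': '1'}\n",
"\n",
"# urlencode joins the key/value pairs with '&' and escapes the space in the species name\n",
"query_string = urlencode(example_params)\n",
"full_url = example_url + '?' + query_string\n",
"print(full_url)"
]
},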
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* With the components set as variables, we use the `requests.get()` function to send our request off to the server at the address provided, storing the server's response as a variable called `response`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Send the request to the server and store the response as a variable\n",
"response = requests.get(url, params=params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* This response object contains a number of properties and methods. Let's have a look at the response in raw text format."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#View the response in text format\n",
"response.text"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Yikes**, that's much less readable than the NWIS output!\n",
"\n",
"Well, that's because the response from the BISON server is in **JSON** format. JSON, short for *JavaScript Object Notation*, is a text format that stores information in `key`:`value` pairs, *much like a Python dictionary*. The response is still raw text, but we can convert it into a Python dictionary using Python's `json` module."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Import the module\n",
"import json\n",
"\n",
"#Convert the response \n",
"data = json.loads(response.text)\n",
"type(data)"
]
},
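{
"cell_type": "markdown",
"metadata": {},
"source": [
"> *Aside*: If the conversion step feels abstract, here is a small self-contained sketch of `json.loads()` at work. The `sample_text` string is a made-up, heavily shortened stand-in for a server response (it is not real BISON output), and the names `sample_text` and `sample` are only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# A hypothetical, heavily shortened stand-in for a JSON response (not real BISON output)\n",
"sample_text = '{\"total\": 2, \"data\": [{\"name\": \"Bison bison\"}, {\"name\": \"Bison bison\"}]}'\n",
"\n",
"# json.loads() parses the raw text into nested Python dictionaries and lists\n",
"sample = json.loads(sample_text)\n",
"\n",
"# The text was a string; the parsed result is a dictionary we can index by key\n",
"print(type(sample_text), '->', type(sample))\n",
"print(sample['data'][0]['name'])"
]
},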
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> *Note*: we could also convert the response to a dictionary using the `json()` method of the `response` object. The code below produces exactly the same result as the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Convert the response using the response object's own json() method\n",
"data = response.json()\n",
"type(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Ok, if it's a dictionary, what are its keys?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#List the keys in the returned JSON object\n",
"data.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* What are the values linked with the 'data' key?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"#Show the value associated with the `data` key\n",
"data['data']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Oh, it's a list of occurrences! Let's examine the first one..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Display the first \"data\" value\n",
"data['data'][0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* We see it's a dictionary too! Let's list the `decimalLatitude` item value..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#We can get the latitude of the record from its `decimalLatitude` key\n",
"data['data'][0]['decimalLatitude']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"► **So** we see the Bison observations are stored as a list of dictionaries, which is accessed through the `data` key in the results dictionary generated from the JSON response to our API request. (Phew!)\n",
"\n",
"* With a bit more code we can loop through all the data records and print out the lat and long coordinates..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Loop through each observation and print the lat and long values\n",
"for observation in data['data']:\n",
"    print(observation['decimalLatitude'], observation['decimalLongitude'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"► *If the above throws an error, can you debug it? HINT: the `geo` tag indicates whether coordinate info exists for the record...*"
]
},
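{
"cell_type": "markdown",
"metadata": {},
"source": [
"> *Aside*: Try your own fix first! If you get stuck, here is one possible approach, sketched on made-up records so it runs without the API. It uses the dictionary `get()` method, which returns `None` instead of raising a `KeyError` when a key is missing. The `sample_records` list and its coordinate values are hypothetical stand-ins for `data['data']`, and checking the `geo` tag from the hint would be an equally reasonable route."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical records standing in for data['data']; the second one lacks coordinates\n",
"sample_records = [\n",
"    {'name': 'Bison bison', 'decimalLatitude': 44.6, 'decimalLongitude': -110.5},\n",
"    {'name': 'Bison bison'},\n",
"]\n",
"\n",
"coordinates = []\n",
"for record in sample_records:\n",
"    # get() returns None when the key is missing instead of raising a KeyError\n",
"    lat = record.get('decimalLatitude')\n",
"    lon = record.get('decimalLongitude')\n",
"    # Skip records that have no coordinate information\n",
"    if lat is None or lon is None:\n",
"        continue\n",
"    coordinates.append((lat, lon))\n",
"\n",
"print(coordinates)"
]
},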
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [Another] Preview of 'Pandas' - that clever Python package with many uses!\n",
"Pandas can create a \"data frame\" from dictionary values. We'll talk about this soon, but it can be quite useful!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.DataFrame(data['data'])\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And Pandas allows us to do some nifty analyses, including subsetting records for a specific provider.\n",
"* First we'll get a list of unique providers found in the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Generate a list of providers\n",
"df.provider.unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Now, we'll subset the rows that belong to one of those providers..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.query(\"provider == 'Denver Museum of Nature & Science'\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise:\n",
"* Extract the first 500 red wolf (*\"Canis rufus\"*) records from the BISON API. \n",
"* Can you create a table listing the records collected by the `University of Kansas Biodiversity Institute`?\n",
"* *Challenge*: Can you create a table listing all the records collected in North Carolina?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}