{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the BISON API\n", "The USGS provides an API for accessing species observation data. https://bison.usgs.gov/doc/api.jsp\n", "\n", "This API is much better documented than the NWIS API, and we'll use it to dig a bit deeper into how the `requests` package can facilitate data access via APIs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We'll begin by replicating the example API call shown on the BISON web page:
\n", "[https://bison.usgs.gov/api/search.json?species=Bison bison&type=scientific_name&start=0&count=1](\n", "https://bison.usgs.gov/api/search.json?species=Bison%20bison&type=scientific_name&start=0&count=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#First, import the wonderful requests module\n", "import requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Now, we'll deconstruct the example URL into the service URL and parameters, saving the parameters as a dictionary. Note we are just providing a few of the parameters available through the [API](https://bison.usgs.gov/doc/api.jsp#opensearch). We could add more search criteria if we wanted, but for now we just want to grab the first 500 Bison records, so we set `count` to 500 rather than 1. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Construct the request as two components: the service URL and the request parameters\n", "url = 'http://bison.usgs.gov/api/search.json'\n", "params = {'species':'Bison bison',\n", "          'type':'scientific_name',\n", "          'start':'0',\n", "          'count':'500'\n", "         }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* With the components set as variables, we use the `requests.get()` function to send our request off to the server at the address provided, storing the server's response as a variable called `response`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Send the request to the server and store the response as a variable\n", "response = requests.get(url, params=params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* This response object contains a number of properties and methods. Let's have a look at the response in raw text format. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#View the response in text format\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Yikes**, that's much less readable than the NWIS output!\n", "\n", "Well, that's because the response from the BISON server is in **JSON** format. JSON, short for *JavaScript Object Notation*, is a text format that stores information in `key`:`value` pairs, *much like a Python dictionary*. Still, it's a raw text object, but one that we can convert into a Python dictionary using the response object's `json()` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Convert the response to a Python dictionary\n", "data = response.json()\n", "type(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Ok, if it's a dictionary, what are its keys? " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#List the keys in the returned JSON object\n", "data.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* What are the values linked with the 'data' key?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#Show the value associated with the `data` key\n", "data['data']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Oh, it's a list of occurrences! Let's examine the first one..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Display the first \"data\" value\n", "data['data'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We see it's a dictionary too! Let's list the `decimalLatitude` item value..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#We can get the latitude of the record from its `decimalLatitude` key\n", "data['data'][0]['decimalLatitude']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "► **So** we see the Bison observations are stored as a list of dictionaries, which we access through the `data` key of the results dictionary generated from the JSON response to our API request. (Phew!)\n", "\n", "* With a bit more code we can loop through all the data records and print out the lat and long coordinates..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Loop through each observation and print the lat and long values\n", "for observation in data['data']:\n", "    print(observation['decimalLatitude'], observation['decimalLongitude'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "► If the above throws an error, can you debug it? HINT: the `geo` tag indicates whether coordinate info exists for the record...\n", " \n", "
\n",
    "#Loop through each observation and print the lat and long values\n",
    "for observation in data['data']:\n",
    "    if observation['geo'] == 'Yes':\n",
    "        print(observation['decimalLatitude'], observation['decimalLongitude'])\n",
    "    
\n", "
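A more defensive variant of the loop can be sketched on made-up records. It assumes a record without coordinates may lack the `decimalLatitude` and `decimalLongitude` keys entirely, so it checks for the keys with `dict.get()` instead of relying on the `geo` flag. The names `sample_records` and `extract_coordinates` and all values below are hypothetical, invented for illustration, and are not BISON data.

```python
# Hypothetical records mimicking the shape of data['data'] (values are made up)
sample_records = [
    {'geo': 'Yes', 'decimalLatitude': 44.6, 'decimalLongitude': -110.5},
    {'geo': 'No'},
    {'geo': 'Yes', 'decimalLatitude': 36.0, 'decimalLongitude': -97.1},
]


def extract_coordinates(records):
    # Collect (lat, long) pairs, skipping records that lack either coordinate
    coordinates = []
    for observation in records:
        latitude = observation.get('decimalLatitude')
        longitude = observation.get('decimalLongitude')
        if latitude is not None and longitude is not None:
            coordinates.append((latitude, longitude))
    return coordinates


coords = extract_coordinates(sample_records)
print(coords)
```

To try this against the live response, pass `data['data']` in place of `sample_records`.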
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Loop thorough each observation and print the lat and long values\n", "for observation in data['data']:\n", " if(observation['geo'] == 'Yes'):\n", " print (observation['decimalLatitude'],observation['decimalLongitude'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Pandas to streamline the process...\n", "Pandas can create a dataframe directly from dictionary values. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.DataFrame(data['data'])\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now we can use our Panda's know-how to do some nifty analyses, including subsetting records for a specific provider.\n", "* First we'll get a list of unique providers found in the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Generate a list of providers\n", "df.provider.unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Now, we'll subset the rows that include that provider..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "df.query(\"provider == 'iNaturalist.org'\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise:\n", "* Extract the first 500 red wolf (*\"Canis rufus\"*) records from the BISON API. \n", "* Can you create a table listing the records collected by the `University of Kansas Biodiversity Institute`?\n", "* *Challenge*: Can you create a table listing all the records collected in North Carolina?" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" } }, "nbformat": 4, "nbformat_minor": 2 }