# Using REST APIs as data sources

* Data is everywhere and it is generated constantly
* The number of datasources is amazingly huge
* Datasets are huge and can be used in many ways

* We may do amazing things using data made available by third-party:
    - https://developer.walmartlabs.com/docs
    - https://developer.spotify.com/documentation/web-api/
    - https://earthquake.usgs.gov/fdsnws/event/1/
    
    
We will have a nice and brief overview about how to consume data from REST APIs, mainly focusing on **JSON**.


### What is an API?

**Application Programming Interface** defines the methods for one software program to interact with the other. 

In the case of this lecture, we are dealing with a REST API, which sends data over a network: one type of Web service.

When we want to receive data from an Web service, we need to make a `request` to this service. When the server receives this request, it sends a `response`.

![request.png](request.png)




### Requests

Knowing that, we will not have to learn about making requests in Python

We do it by importing the module requests

In [1]:
import requests

There are different types of requests. 

In our case we will use a `GET`, which is used to retrieve data. This is the type of request we use to collect data.

A response from the API contains 2 things (among others): 
* response code
* response data

To make a request, we use:

In [10]:
response = requests.get('http://www.nau.edu/')
type(response)

requests.models.Response

The `request.get(URL)` returns an object Response, which provides, among other things, the response code.

In [12]:
response.status_code

200

THe most common codes are:
* 200: Everything went okay, and the result has been returned (if any).
* 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 401: The server thinks you’re not authenticated. Many APIs require login ccredentials, so this happens when you don’t send the right credentials to access an API.
* 403: The resource you’re trying to access is forbidden: you don’t have the right permissions to see it.
* 404: The resource you tried to access wasn’t found on the server.
* 503: The server is not ready to handle the request.

More details about status codes list can be found [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status)

### What about getting the data?

First, read the documentation! Everytime you use an API, please read the documentation to understand how to use, the structure, etc.

We will use the [Open Notify API](http://api.open-notify.org/), which gives access to data about the international space station.

These APIs usually provide multiple endpoints, which are the ways we can interact with that service.

Let's try a request and see how it goes:

In [20]:
response = requests.get("http://api.open-notify.org/astros.json")
print(response.status_code)

200


Now we can see the data...

In [21]:
type(response.content)

bytes

In [24]:
response.text

'{"message": "success", "people": [{"name": "Sergey Ryzhikov", "craft": "ISS"}, {"name": "Kate Rubins", "craft": "ISS"}, {"name": "Sergey Kud-Sverchkov", "craft": "ISS"}], "number": 3}'

In [29]:
response.json()

{'message': 'success',
 'people': [{'name': 'Sergey Ryzhikov', 'craft': 'ISS'},
  {'name': 'Kate Rubins', 'craft': 'ISS'},
  {'name': 'Sergey Kud-Sverchkov', 'craft': 'ISS'}],
 'number': 3}

### Working with JSON 
JSON stands for JavaScript Object Notation. It is a way to encode data structures that ensures that they are easily readable. 

JSON output look like Python something with *dictionaries, lists, strings* and *integers*. And it is...

But, how to use it? Well, we used it in the last command.


In [30]:
import json

json has two main functions:

* `json.dumps()` — Takes in a Python object and converts (dumps) to a string.
* `json.loads()` — Takes a JSON string and converts (loads) to a Python object.

The `dumps()` is particularly useful as we can use it to format the json, making it easier to understand the output

In [32]:
json_response = response.json()
formatted_json = json.dumps(json_response, sort_keys=True, indent=3
                           )

print(formatted_json)

{
   "message": "success",
   "number": 3,
   "people": [
      {
         "craft": "ISS",
         "name": "Sergey Ryzhikov"
      },
      {
         "craft": "ISS",
         "name": "Kate Rubins"
      },
      {
         "craft": "ISS",
         "name": "Sergey Kud-Sverchkov"
      }
   ]
}


### REST API with Query Parameters

In some cases, it is possible to pass parameters to filter the output of the API. 

The http://api.open-notify.org/iss-pass.json endpoint tells the next times that the international space station will pass over a given location on the earth.

It requires parameters

In [33]:
response = requests.get("http://api.open-notify.org/iss-pass.json")
print("RESPONSE CODE:" + str(response.status_code))
print(response.json())


RESPONSE CODE:400
{'message': 'failure', 'reason': 'Latitude must be specified'}


Let's read the docs: 
* http://open-notify.org/Open-Notify-API/ISS-Pass-Times/

In [35]:
response = requests.get("http://api.open-notify.org/iss-pass.json?lat=35.1983&lon=111.6513")
print("RESPONSE CODE:" + str(response.status_code))
print(response.json())
#35.1983, 111.6513

RESPONSE CODE:200
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1603836506, 'latitude': 35.1983, 'longitude': 111.6513, 'passes': 5}, 'response': [{'duration': 390, 'risetime': 1603841584}, {'duration': 520, 'risetime': 1603847415}, {'duration': 648, 'risetime': 1603853197}, {'duration': 542, 'risetime': 1603859036}, {'duration': 479, 'risetime': 1603907545}]}


In [44]:
formatted_json = json.dumps(response.json(), sort_keys=False, indent=2)
#print(formatted_json)
print(response.json()["response"][0]["risetime"])

1603841584


#### Let’s deal with the pass times from our JSON object

Reading the docs (and looking at our JSON), we can see what we need to do

In [46]:
times = []

for item in response.json()['response']:
    times.append(item['risetime'])
    
print(times)

[1603841584, 1603847415, 1603853197, 1603859036, 1603907545]


In [47]:
from datetime import datetime

datetime.fromtimestamp(times[0]).strftime("%Y-%m-%d %I:%M:%S")

'2020-10-27 04:33:04'

In [48]:
response = requests.get("https://api.github.com/repos/rails/rails/pulls")
pulls = response.json()
print(json.dumps(pulls, indent=2))

[
  {
    "url": "https://api.github.com/repos/rails/rails/pulls/40467",
    "id": 511057455,
    "node_id": "MDExOlB1bGxSZXF1ZXN0NTExMDU3NDU1",
    "html_url": "https://github.com/rails/rails/pull/40467",
    "diff_url": "https://github.com/rails/rails/pull/40467.diff",
    "patch_url": "https://github.com/rails/rails/pull/40467.patch",
    "issue_url": "https://api.github.com/repos/rails/rails/issues/40467",
    "number": 40467,
    "state": "open",
    "locked": false,
    "title": "Test find_signed/! on Relation",
    "user": {
      "login": "bogdanvlviv",
      "id": 6443532,
      "node_id": "MDQ6VXNlcjY0NDM1MzI=",
      "avatar_url": "https://avatars0.githubusercontent.com/u/6443532?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/bogdanvlviv",
      "html_url": "https://github.com/bogdanvlviv",
      "followers_url": "https://api.github.com/users/bogdanvlviv/followers",
      "following_url": "https://api.github.com/users/bogdanvlviv/following{/other_u

In [54]:
pulls[0]

{'url': 'https://api.github.com/repos/rails/rails/pulls/40467',
 'id': 511057455,
 'node_id': 'MDExOlB1bGxSZXF1ZXN0NTExMDU3NDU1',
 'html_url': 'https://github.com/rails/rails/pull/40467',
 'diff_url': 'https://github.com/rails/rails/pull/40467.diff',
 'patch_url': 'https://github.com/rails/rails/pull/40467.patch',
 'issue_url': 'https://api.github.com/repos/rails/rails/issues/40467',
 'number': 40467,
 'state': 'open',
 'locked': False,
 'title': 'Test find_signed/! on Relation',
 'user': {'login': 'bogdanvlviv',
  'id': 6443532,
  'node_id': 'MDQ6VXNlcjY0NDM1MzI=',
  'avatar_url': 'https://avatars0.githubusercontent.com/u/6443532?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/bogdanvlviv',
  'html_url': 'https://github.com/bogdanvlviv',
  'followers_url': 'https://api.github.com/users/bogdanvlviv/followers',
  'following_url': 'https://api.github.com/users/bogdanvlviv/following{/other_user}',
  'gists_url': 'https://api.github.com/users/bogdanvlviv/gists{/gist_id}',

### Let's play: your turn

Look at this API:
* https://earthquake.usgs.gov/fdsnws/event/1/

I want you to 
1. use filters to get the earthquakes from the previous 60 days, with magnitude between 5.8 and 7.
2. print the place, date, and magnitude of each of them
3. find the highest magnitude
4. using the ISS API, show when the satelite will go through the place where the earthquake with the highest magnitude happened


In [55]:
import requests
response = requests.get("http://api.open-notify.org/iss-pass.json")
print("RESPONSE CODE:" + str(response.status_code))
print(response.json())

RESPONSE CODE:400
{'message': 'failure', 'reason': 'Latitude must be specified'}


In [57]:
import requests
import json
from datetime import datetime

response = requests.get("https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-08-27&maxmagnitude=7&minmagnitude=5.8")
json_response = response.json()
formatted_json = json.dumps(json_response, sort_keys=False, indent=2)

print(formatted_json)

{
  "type": "FeatureCollection",
  "metadata": {
    "generated": 1603840328000,
    "url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2020-08-27&maxmagnitude=7&minmagnitude=5.8",
    "title": "USGS Earthquakes",
    "status": 200,
    "api": "1.10.3",
    "count": 40
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "mag": 5.9,
        "place": "75 km NNE of Hihifo, Tonga",
        "time": 1603626457004,
        "updated": 1603816850040,
        "tz": null,
        "url": "https://earthquake.usgs.gov/earthquakes/eventpage/us6000ccyh",
        "detail": "https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us6000ccyh&format=geojson",
        "felt": 9,
        "cdi": 4.1,
        "mmi": 3.887,
        "alert": "green",
        "status": "reviewed",
        "tsunami": 0,
        "sig": 539,
        "net": "us",
        "code": "6000ccyh",
        "ids": ",pt20299001,us6000ccyh,at00qira3f,",
        "sources": ",pt,us,at

In [58]:
max_magnitude = 0
max_long = 0
max_lat = 0
for earthquake in json_response["features"]:
    magnitude = earthquake["properties"]["mag"]
    print("----")
    print("Place:  " + earthquake["properties"]["place"])    
    print("Time:  " + str(earthquake["properties"]["time"]))    
    print("Mag:  " + str(magnitude))
    if (magnitude > max_magnitude):
        max_magnitude = magnitude
        max_long = earthquake["geometry"]["coordinates"][0]
        max_lat = earthquake["geometry"]["coordinates"][1]

print ("\nMaximum magnitude: " + str(max_magnitude))
    

----
Place:  75 km NNE of Hihifo, Tonga
Time:  1603626457004
Mag:  5.9
----
Place:  south of the Fiji Islands
Time:  1603436672004
Mag:  6.1
----
Place:  West Chile Rise
Time:  1603417577475
Mag:  6
----
Place:  148 km WNW of Haveluloto, Tonga
Time:  1603355046992
Mag:  5.8
----
Place:  183 km ESE of Neiafu, Tonga
Time:  1603239753928
Mag:  5.9
----
Place:  117 km SSE of Sand Point, Alaska
Time:  1603143925888
Mag:  5.9
----
Place:  Easter Island region
Time:  1602335697344
Mag:  5.9
----
Place:  86 km ESE of Kimbe, Papua New Guinea
Time:  1602172729998
Mag:  5.8
----
Place:  38 km ENE of Kainantu, Papua New Guinea
Time:  1602142532224
Mag:  6.3
----
Place:  233 km E of Levuka, Fiji
Time:  1601979106688
Mag:  6
----
Place:  68 km SE of Sand Point, Alaska
Time:  1601963690520
Mag:  5.9
----
Place:  South Shetland Islands
Time:  1601633853188
Mag:  5.8
----
Place:  99 km W of Kandrian, Papua New Guinea
Time:  1601548488481
Mag:  6
----
Place:  39 km NE of Pangai, Tonga
Time:  16015148165

In [59]:
iss_response = requests.get("http://api.open-notify.org/iss-pass.json?lat="+str(max_lat)+"&lon="+str(max_lat))
time=iss_response.json()["response"][0]["risetime"]
print(datetime.fromtimestamp(time).strftime("%Y-%m-%d %I:%M:%S"))

2020-10-27 05:40:15
