JOIN
GROUP BY
MAX
.head()
.join()
.merge()
.hist()
.plot()
.mean()
.map()
.groupby()
.count()
.resample()
Speaking broadly:
An application programming interface (API) specifies how some software components should interact with each other.
More specifically:
A web API is a programmatic interface to a defined request-response message system, typically expressed in JSON or XML, which is exposed via the web—most commonly by means of an HTTP-based web server.
from Wikipedia
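The request-response idea in that definition can be made concrete with a minimal sketch that uses only Python's standard library. A toy local HTTP server, standing in for a real web service, answers `GET /user` with a JSON body. A client then sends the request and decodes the reply. The handler class, the `/user` path, and the payload are hypothetical and chosen for illustration only.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserHandler(BaseHTTPRequestHandler):
    """Toy web API: answers GET /user with a JSON document (hypothetical data)."""

    def do_GET(self):
        if self.path == "/user":
            body = json.dumps({"login": "example-user", "hireable": True}).encode("utf-8")
            self.send_response(200)
        else:
            body = json.dumps({"message": "Not Found"}).encode("utf-8")
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_json(url):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return response.status, json.loads(response.read())


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), UserHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    host, port = server.server_address
    status, user = fetch_json(f"http://{host}:{port}/user")
    print(status, user)  # 200 {'login': 'example-user', 'hireable': True}
finally:
    server.shutdown()
    server.server_close()
```

A real service differs in scale, not in shape: the client sends an HTTP request to a URL and receives a structured (here JSON) response.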
Web APIs allow people to interact with the structures of an application to retrieve, create, update, and delete data.
Best practice for web APIs is to follow RESTful principles.
Think of some web services you might like to get data from. Perhaps they have APIs?
REST = REpresentational State Transfer
REST vs. SQL
GET ( ~ SELECT)
POST ( ~ INSERT)
PUT ( ~ UPDATE)
DELETE ( ~ DELETE)
Resource | GET | PUT | POST | DELETE |
---|---|---|---|---|
Collection URI, such as http://example.com/resources | List the URIs and perhaps other details of the collection's members. | Replace the entire collection with another collection. | Create a new entry in the collection. The new entry's URI is assigned automatically and is usually returned by the operation. | Delete the entire collection. |
Element URI, such as http://example.com/resources/item17 | Retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type. | Replace the addressed member of the collection, or if it doesn't exist, create it. | Not generally used. Treat the addressed member as a collection in its own right and create a new entry in it. | Delete the addressed member of the collection. |
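The table can be sketched as a toy dispatcher that maps an HTTP method plus a URI onto SQL against an in-memory SQLite table. This makes the REST-to-SQL analogy concrete. It is a simplified illustration, not how any particular web framework works. The `handle` function, the `/resources` paths, and the status codes chosen are assumptions. Replacing the whole collection with PUT is deliberately left out.

```python
import sqlite3

# In-memory table standing in for the collection at /resources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT)")


def handle(method, path, body=None):
    """Hypothetical dispatcher: returns (status_code, payload) for a method and URI."""
    parts = path.strip("/").split("/")
    if parts == ["resources"]:  # collection URI
        if method == "GET":  # ~ SELECT: list member URIs
            rows = conn.execute("SELECT id FROM resources ORDER BY id").fetchall()
            return 200, [f"/resources/{row[0]}" for row in rows]
        if method == "POST":  # ~ INSERT: SQLite assigns the new id automatically
            cur = conn.execute("INSERT INTO resources (name) VALUES (?)", (body,))
            return 201, f"/resources/{cur.lastrowid}"
        if method == "DELETE":  # ~ DELETE the entire collection
            conn.execute("DELETE FROM resources")
            return 204, None
        return 405, None  # e.g. PUT on the collection is not handled in this sketch
    if len(parts) == 2 and parts[0] == "resources" and parts[1].isdigit():
        item_id = int(parts[1])  # element URI
        if method == "GET":  # ~ SELECT ... WHERE id = ?
            row = conn.execute(
                "SELECT name FROM resources WHERE id = ?", (item_id,)
            ).fetchone()
            return (200, row[0]) if row else (404, None)
        if method == "PUT":  # ~ UPDATE, or create the member if it does not exist
            conn.execute(
                "INSERT OR REPLACE INTO resources (id, name) VALUES (?, ?)",
                (item_id, body),
            )
            return 200, f"/resources/{item_id}"
        if method == "DELETE":  # ~ DELETE ... WHERE id = ?
            conn.execute("DELETE FROM resources WHERE id = ?", (item_id,))
            return 204, None
        return 405, None  # POST on an element: "not generally used"
    return 404, None  # unknown URI


print(handle("POST", "/resources", "first"))     # (201, '/resources/1')
print(handle("PUT", "/resources/17", "item17"))  # (200, '/resources/17')
print(handle("GET", "/resources"))               # (200, ['/resources/1', '/resources/17'])
```

Note the design split the table describes. POST lets the server choose the new URI. PUT targets a URI the client already knows.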
Using the requests library
First, we load our credentials, which we keep in a YAML file for safekeeping.
import yaml
# safe_load parses plain YAML without constructing arbitrary Python objects
with open('/Users/alessandro.gagliardi/api_cred.yml') as f:
    credentials = yaml.safe_load(f)
Then we pass those credentials into a GET request using the requests library. In this case, I am querying my own user data from GitHub:
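import requests
# HTTP Basic auth; GitHub no longer accepts account passwords for API calls,
# so credentials['PASS'] should hold a personal access token
r = requests.get('https://api.github.com/user',
                 auth=(credentials['USER'], credentials['PASS']))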
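Passing a `(user, password)` tuple as `auth` tells requests to use HTTP Basic authentication. As a sketch of what that puts on the wire, the standard library alone can build the same `Authorization` header value. The helper name `basic_auth_header` and the placeholder credentials are hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# Placeholder credentials, not real ones.
print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption: anyone who sees the header can decode it. This is one reason API calls like the one above should go over HTTPS.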
requests returns a Response object whose raw content we can read:
r.content
We can convert this JSON content into a Python dict object using the json library:
import json
user = json.loads(r.content)  # parse the JSON body into a dict
user
print(user.keys())
We can access values in this dict directly (such as my hireable status) and even render the URL of my avatar:
from IPython.display import Image
print("Hireable: {}".format(user.get('hireable')))
Image(url=user.get('avatar_url'))
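The same parsing steps can be tried offline with a made-up response body. It is shaped loosely like a GitHub user payload but contains no real data. The example shows why `.get()` is handy here. A JSON `null` becomes `None`, and `.get()` also returns `None` (or a chosen default) for a key that is absent, where indexing with `[]` would raise `KeyError`.

```python
import json

# Made-up response body (not real GitHub data).
raw = b'{"login": "octocat-example", "hireable": null, "public_repos": 8}'

user = json.loads(raw)      # bytes -> dict (json.loads accepts bytes on Python 3.6+)
print(sorted(user.keys()))  # ['hireable', 'login', 'public_repos']

print(user.get("hireable"))           # None  (JSON null)
print(user.get("avatar_url"))         # None  (key missing; user['avatar_url'] would raise KeyError)
print(user.get("avatar_url", "n/a"))  # n/a   (explicit default)
```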
Twitter has no fewer than 10 Python libraries. We'll be using Python Twitter Tools because it's the one used in Mining the Social Web.
import twitter
# OAuth(token, token_secret, consumer_key, consumer_secret)
auth = twitter.oauth.OAuth(credentials['ACCESS_TOKEN'],
                           credentials['ACCESS_TOKEN_SECRET'],
                           credentials['API_KEY'],
                           credentials['API_SECRET'])
twitter_api = twitter.Twitter(auth=auth)
print(twitter_api)
Using a library like this, it's easy to, for example, search for tweets mentioning #bigdata.
The results are transformed into a Python object (which in this case is a thin wrapper around a dict):
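Under the hood, a search like this is an HTTP request whose parameters travel in the URL's query string. The `#` in `#bigdata` must then be percent-encoded, because an unencoded `#` starts a URL fragment. A minimal sketch with the standard library follows. The endpoint URL is a hypothetical placeholder, and the exact request the library builds is not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

query = urlencode({"q": "#bigdata", "count": 5})
print(query)  # q=%23bigdata&count=5

# Hypothetical endpoint, used only to show where the query string goes.
url = "https://api.example.com/search?" + query
print(parse_qs(urlsplit(url).query))  # {'q': ['#bigdata'], 'count': ['5']}

# Without encoding, everything after '#' is parsed as a fragment, not as part of q.
print(urlsplit("https://api.example.com/search?q=#bigdata").fragment)  # bigdata
```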
bigdata = twitter_api.search.tweets(q='#bigdata', count=5)
type(bigdata)
for status in bigdata['statuses']:
    print(status.get('text'))
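A "thin wrapper around a dict" means an object that behaves like a dict for indexing and iteration while carrying a little extra state. The sketch below uses a hypothetical `WrappedDict` class. It is not Python Twitter Tools' actual class. The sample payload is made up but shaped like the `statuses` list used above, and it shows why `.get('text')` is a safe way to pull text out of each status.

```python
class WrappedDict(dict):
    """Hypothetical thin wrapper: a dict subclass that also carries response headers."""

    def __init__(self, data, headers=None):
        super().__init__(data)
        self.headers = headers or {}


# Made-up payload shaped like the search results above (not real tweets).
sample_results = WrappedDict(
    {"statuses": [{"text": "first #bigdata tweet"},
                  {"text": "second #bigdata tweet"},
                  {"id": 3}]},  # a status with no 'text' key
    headers={"content-type": "application/json"},
)

print(isinstance(sample_results, dict))  # True: all dict operations still work
texts = [status.get("text") for status in sample_results["statuses"]]
print(texts)  # ['first #bigdata tweet', 'second #bigdata tweet', None]
```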
NoSQL databases are a new trend in databases.
The name NoSQL refers to the lack of a relational structure between stored objects; data are semi-structured.
Most importantly, they attempt to minimize the need for JOIN operations, or to solve other data needs.
This is good for engineers but bad for data scientists.
Still, NoSQL databases have their uses.
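The JOIN point can be seen side by side in a small sketch. It stores the same made-up data two ways: relationally in SQLite, where reading a user together with their tweets needs a JOIN, and as a single embedded document (a plain dict here), where the tweets are already nested inside the user record. The table and field names are hypothetical.

```python
import sqlite3

# Relational layout: tweets live in their own table and reference users by id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), text TEXT);
    INSERT INTO users VALUES (1, 'ada');
    INSERT INTO tweets VALUES (10, 1, 'hello'), (11, 1, 'world');
""")
joined = conn.execute("""
    SELECT users.name, tweets.text
    FROM users JOIN tweets ON tweets.user_id = users.id
    ORDER BY tweets.id
""").fetchall()
print(joined)  # [('ada', 'hello'), ('ada', 'world')]

# Document layout: the tweets are embedded in the user record, so no JOIN is needed
# to read them back, at the cost of duplicating or nesting related data.
user_doc = {"_id": 1, "name": "ada",
            "tweets": [{"id": 10, "text": "hello"}, {"id": 11, "text": "world"}]}
embedded = [(user_doc["name"], t["text"]) for t in user_doc["tweets"]]
print(embedded)  # [('ada', 'hello'), ('ada', 'world')]
```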
Memcached:
Memcached is best used for storing application configuration settings and, essentially, caching those settings.
Cassandra:
Mongo:
When might you want to use a NoSQL database? When not?
MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.
A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
A MongoDB document.
The advantages of using documents are:
Notice how similar this looks to a Python dictionary.
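As a sketch, here is a hypothetical document (made-up values) written as a Python dict. It contains each kind of value described above: plain fields, an embedded document, an array, and an array of documents. The `get_path` helper is also hypothetical. It loosely imitates MongoDB's dot notation for reaching into nested fields and array positions, and it is not part of any MongoDB driver.

```python
import json

doc = {
    "_id": "user-001",
    "name": {"first": "Ada", "last": "Lovelace"},  # embedded document
    "languages": ["Python", "SQL"],                # array
    "posts": [                                     # array of documents
        {"title": "Intro to APIs", "tags": ["rest", "json"]},
        {"title": "NoSQL basics", "tags": ["mongodb"]},
    ],
}


def get_path(document, dotted):
    """Follow a dotted path such as 'name.first' or 'posts.1.title' (hypothetical helper)."""
    value = document
    for key in dotted.split("."):
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value


print(get_path(doc, "name.first"))     # Ada
print(get_path(doc, "posts.1.title"))  # NoSQL basics
print(json.loads(json.dumps(doc)) == doc)  # True: it round-trips through JSON unchanged
```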
DS_Lab04-API