# Using cartoframes to investigate the Citibike system

What does the Citibike station system look like right now? Citibike publishes an open feed of station statuses. Let's use `cartoframes` to process this data, send it to your CARTO account, and create some maps.

`cartoframes` lets you use CARTO from a Python environment, so you can do all of your analysis and mapping in, for example, a Jupyter notebook. It gives you access to CARTO's functionality for data analysis, storage, location services like routing and geocoding, and visualization. `cartoframes` works with data in Pandas dataframes. Pandas is a handy Python library for data analysis (https://pandas.pydata.org/).

Read the `cartoframes` docs here: http://cartoframes.readthedocs.io/en/latest/

You can view this notebook on `nbviewer` here: <https://nbviewer.jupyter.org/github/CartoDB/cartoframes/blob/master/examples/Citibike%20Example.ipynb>. However,
we recommend downloading the notebook, installing `cartoframes` and its dependencies, and running it on your own computer so you can more easily explore the functionality of `cartoframes`.

To get started, let's load the required packages, and set credentials.

In [14]:
import cartoframes

# For convenience, import Credentials, Layer, BaseMap, and styling directly
from cartoframes import Credentials
from cartoframes import Layer, BaseMap, styling

import pandas as pd
%matplotlib inline

In [None]:
USERNAME = 'michellemho'  # <-- replace with your username 
APIKEY = 'abcdefg'       # <-- your CARTO API key
creds = Credentials(username=USERNAME, 
                    key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)

Citibike system data can be found here: https://www.citibikenyc.com/system-data
We're going to use the real-time data, which is published in General Bikeshare Feed Specification (GBFS) format as a series of JSON files.
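
For orientation, a GBFS file wraps its records in a small envelope: a `last_updated` timestamp, a `ttl`, and a `data` object that holds the list of records. The sketch below parses a hand-written sample in that shape using only the standard library. The `station_id` values are placeholders and the sample is illustrative, not real feed output.

```python
import json

# Hand-written sample in the GBFS envelope shape; values are illustrative only.
sample = '''
{
  "last_updated": 1523628491,
  "ttl": 10,
  "data": {
    "stations": [
      {"station_id": "A", "name": "W 52 St & 11 Ave",
       "lat": 40.767272, "lon": -73.993929, "capacity": 39},
      {"station_id": "B", "name": "Franklin St & W Broadway",
       "lat": 40.719116, "lon": -74.006667, "capacity": 33}
    ]
  }
}
'''

feed = json.loads(sample)

# The envelope fields sit at the top level; the records sit under data.stations.
last_updated = feed["last_updated"]
stations = feed["data"]["stations"]
names = [station["name"] for station in stations]

print(last_updated, len(stations), names)
```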

In [17]:
# Use Pandas to read a JSON of Citibike stations and their statuses.
# The `data` column holds the list of records; .iloc[0] takes it by position.
stations_data = pd.read_json('https://gbfs.citibikenyc.com/gbfs/en/station_information.json')
stations = pd.DataFrame(stations_data['data'].iloc[0])
status_data = pd.read_json('https://gbfs.citibikenyc.com/gbfs/en/station_status.json')
status = pd.DataFrame(status_data['data'].iloc[0])

In [18]:
# Grab the last updated timestamps (taken by position with .iloc[0])
timestamp_stations = stations_data['last_updated'].iloc[0]
timestamp_status = status_data['last_updated'].iloc[0]
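
The `last_updated` values are POSIX timestamps (seconds since 1970-01-01 UTC). A minimal sketch for turning one into a readable time follows; `sample_timestamp` is a stand-in that reuses the timestamp appearing in the table name later in this notebook.

```python
from datetime import datetime, timezone

# Sample value; in the notebook, int(timestamp_stations) would be used instead.
sample_timestamp = 1523628491

# Interpret the value as seconds since the epoch, in UTC.
updated_at = datetime.fromtimestamp(sample_timestamp, tz=timezone.utc)
print(updated_at.isoformat())
```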

In [19]:
# Join the station and statuses together by 'station_id'
station_status = pd.merge(stations, status, how='left', on='station_id')
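
To see what `how='left'` does here, the sketch below runs the same kind of join on two tiny made-up frames. Every station row is kept, and a station with no matching status row gets `NaN` in the status columns. The frames and their values are hypothetical stand-ins for `stations` and `status`.

```python
import pandas as pd

# Tiny made-up frames standing in for `stations` and `status`.
stations = pd.DataFrame({
    "station_id": ["A", "B", "C"],
    "capacity": [39, 33, 27],
})
status = pd.DataFrame({
    "station_id": ["A", "B"],
    "num_bikes_available": [2, 22],
})

# how='left' keeps every row of `stations`; unmatched rows get NaN.
joined = pd.merge(stations, status, how="left", on="station_id")

# Stations that had no status record at all.
missing_status = joined[joined["num_bikes_available"].isna()]

print(joined)
print(missing_status["station_id"].tolist())
```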

In [81]:
# Preview the dataframe
station_status.head()

Unnamed: 0,capacity,eightd_has_key_dispenser,eightd_station_services,lat,lon,name,region_id,rental_methods,rental_url,short_name,...,eightd_has_available_keys,is_installed,is_renting,is_returning,last_reported,num_bikes_available,num_bikes_disabled,num_docks_available,num_docks_disabled,num_ebikes_available
0,39,False,,40.767272,-73.993929,W 52 St & 11 Ave,71.0,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,6926.01,...,False,1,1,1,1523628435,2,0,37,0,0
1,33,False,,40.719116,-74.006667,Franklin St & W Broadway,71.0,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5430.08,...,False,1,1,1,1523628157,22,2,9,0,0
2,27,False,,40.711174,-74.000165,St James Pl & Pearl St,71.0,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5167.06,...,False,1,1,1,1523626231,17,1,9,0,0
3,62,False,,40.683826,-73.976323,Atlantic Ave & Fort Greene Pl,71.0,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,4354.07,...,False,1,1,1,1523627612,42,1,19,0,0
4,19,False,,40.696089,-73.978034,Park Ave & St Edwards St,71.0,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,4700.06,...,False,1,1,1,1523627405,6,0,13,0,0


## `cc.write`

`CartoContext` has several methods for interacting with [CARTO](https://carto.com) in a Python environment. The first one we'll use is `cc.write`, which sends a Pandas dataframe to your CARTO account.

In [21]:
# Write station status data to CARTO, using string formatting to name the dataset with the timestamp
cc.write(station_status, 'cb_stations_status_{}'.format(timestamp_stations), lnglat=('lon','lat'), overwrite=True)



Table successfully written to CARTO: https://michellemho-carto.carto.com/dataset/cb_stations_status_1523628491
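
Because the timestamp is part of the table name, it can help to build the name once and reuse it rather than retyping it in later cells. A small sketch, with `sample_timestamp` standing in for `timestamp_stations`:

```python
# Stand-in for timestamp_stations from the cells above.
sample_timestamp = 1523628491

# Same string formatting as in the cc.write call, kept in one variable.
table_name = 'cb_stations_status_{}'.format(sample_timestamp)
print(table_name)
```

`table_name` could then be passed to `cc.write` and to `Layer(...)` in place of the hard-coded string.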


## `cc.map`

Now that we can inspect the data, we can map it to see how the values change over the geography. We can use the `cc.map` method for this purpose.

`cc.map` takes a `layers` argument which specifies the data layers that are to be visualized. The layer classes can be imported from `cartoframes`, as in the import cell at the top of this notebook.

There are different types of layers:

* `Layer` for visualizing CARTO tables
* `QueryLayer` for visualizing arbitrary queries against tables in your CARTO account
* `BaseMap` for specifying the base map to be used

Each of the layers has different styling options. `Layer` and `QueryLayer` take the same styling arguments, while `BaseMap` lets you choose a light or dark style and control label placement.

Maps can be interactive or static. Set the `interactive` argument to `True` or `False`. A static (non-interactive) map is embedded in the notebook as either a `matplotlib` axis or an `IPython.Image`; either way, the image travels with the notebook. Interactive maps are embedded as zoomable, pannable maps.

In [26]:
# Bring the data back as a map. Style by number of bikes available at each station
# Replace the name of the table with the correct timestamp!

cc.map(layers=[Layer('cb_stations_status_1523628491',
                      color={'column': 'num_bikes_available',
                             'scheme': styling.geyser(7, bin_method='quantiles')},
                      size=6),
               BaseMap(source='dark')],
       interactive=True)
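
The other layer types and the `interactive` flag can be combined in the same way. The sketch below is an assumption-laden illustration, not part of the original notebook: `empty_stations_sql` and `map_empty_stations` are hypothetical helper names, and `cc` is assumed to be an authenticated `CartoContext` like the one created earlier. It draws a `QueryLayer` as a static map on a light basemap, styled by `capacity`.

```python
def empty_stations_sql(table_name):
    """Build a SQL query selecting stations with no bikes available."""
    return (
        "SELECT * FROM {table} "
        "WHERE num_bikes_available = 0"
    ).format(table=table_name)


def map_empty_stations(cc, table_name, interactive=False):
    """Sketch: map the query result with a QueryLayer.

    `cc` is assumed to be an authenticated CartoContext.
    """
    # Imported lazily so the SQL helper works without cartoframes installed.
    from cartoframes import BaseMap, QueryLayer

    return cc.map(
        layers=[QueryLayer(empty_stations_sql(table_name), color='capacity'),
                BaseMap(source='light')],
        interactive=interactive,
    )


print(empty_stations_sql('cb_stations_status_1523628491'))
```

With a live connection, `map_empty_stations(cc, 'cb_stations_status_1523628491')` would be the call to try. It is not run here because it needs CARTO credentials.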

## `cc.query`

`CartoContext` has several methods for retrieving data from your CARTO account into a Pandas dataframe. In this example, we'll use `cc.query` to pass in a SQL query and return the results.

In [43]:
# Set up a SQL query to find all the empty Citibike stations.
# Note: this query does not use it, but cdb_isochrone is a function
# available through CARTO data services:
# https://carto.com/docs/carto-engine/dataservices-api/isoline-functions/

empty_query = '''
        SELECT *
        FROM cb_stations_status_1523628491
        WHERE num_bikes_available = 0
        '''

In [None]:
# Use the cartoframes query method to run the query, persist the result as a
# new table called empty_stations, and return the result as a dataframe
new_df = cc.query(empty_query, table_name="empty_stations")

In [57]:
new_df.head()

cartodb_id,capacity,eightd_active_station_services,eightd_has_available_keys,eightd_has_key_dispenser,eightd_station_services,is_installed,is_renting,is_returning,last_reported,lat,...,num_bikes_disabled,num_docks_available,num_docks_disabled,num_ebikes_available,region_id,rental_methods,rental_url,short_name,station_id,the_geom
6,19,,False,False,,1,1,1,1523623566,40.686768,...,0,19,0,0,71.0,"['KEY', 'CREDITCARD']",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,4452.03,120,0101000020E610000020D0FCDE647D52C054A5F302E857...
12,29,,False,False,,1,1,1,1523627512,40.720874,...,0,29,0,0,71.0,"['KEY', 'CREDITCARD']",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5476.03,150,0101000020E610000062516C60C67E52C05F460C96455C...
14,29,,False,False,,1,0,0,1523366145,40.71474,...,0,29,0,0,71.0,"['KEY', 'CREDITCARD']",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5288.09,152,0101000020E6100000ABF57632958052C0673F18997C5B...
21,30,,False,False,,1,1,1,1523628272,40.738177,...,1,29,0,0,71.0,"['KEY', 'CREDITCARD']",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,6004.07,174,0101000020E6100000AC1C9C808D7E52C07F164B917C5E...
31,31,,False,False,,1,0,0,1523543799,40.736197,...,0,31,0,0,71.0,"['KEY', 'CREDITCARD']",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5964.01,238,0101000020E6100000EBE9C0C58C8052C029F686B13B5E...


In [80]:
# map the empty stations, style by capacity
cc.map(layers=[Layer('empty_stations',
                    color='capacity'),
               BaseMap(source='dark')],
       interactive=True)