{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CARTOFrames in Action\n", "\n", "Data science workflows that leverage CARTO\n", "\n", "## Motivation\n", "\n", "A large number of data scientists rely on the de facto standards of analysis in a Jupyter notebook. We want to support that by creating a Python module that allows these users to develop analyses while seamlessly interacting with CARTO. We aim to feature:\n", "\n", "* Stunning cartography from CARTO maps\n", "* Seamless reading and writing to CARTO for arbitrary updates to a DataFrame\n", "* Interactions with the Data Observatory to enrich a user's analysis\n", "\n", "\n", "## Basics\n", "\n", "You'll need the following for this:\n", "\n", "1. Your CARTO username\n", "2. Your API key\n", "3. Your favorite table (I recommend duplicating it and using the copy because we will do some oprations on it)\n", "\n", "Paste these values in the quotes (`''`) below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import cartoframes\n", "\n", "username = '' # <-- insert your username here\n", "api_key = '' # <-- insert your API key here\n", "tablename = '' # <-- insert your tablename here\n", "\n", "cc = cartoframes.CartoContext('https://{}.carto.com/'.format(username),\n", " api_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read from your CARTO account" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": false }, "outputs": [], "source": [ "df = cc.read(tablename)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make updates to the DataFrame and Sync Changes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['favorite_cookie'] = 'pecan'\n", "df['favorite_cookie'][df.index % 2 == 0] = 'oatmeal'\n", "cc.write(df, tablename, overwrite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map it out" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.carto_map(color='favorite_cookie')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### cartoframe from query\n", "\n", "Query your CARTO account and create a table from the query. Finally, pull that new table into a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_buffer = cc.query(query='''\n", " SELECT ST_Buffer(the_geom::geography, 10000)::geometry as the_geom,\n", " cartodb_id, mag, depth, place\n", " FROM all_month_3\n", " LIMIT 100\n", " ''',\n", " tablename='buffered_earthquakes')\n", "df_buffer.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print(df_buffer.get_carto_datapage())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model workflow\n", "\n", "Let's recreate the workflow from , where the author explores [`dask`](http://dask.pydata.org/en/latest/) for splitting up the computations between multiple cores in a machine to complete tasks more quickly. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from dask import dataframe as dd\n", "import pandas as pd\n", "columns = [\"name\", \"amenity\", \"Longitude\", \"Latitude\"]\n", "data = dd.read_csv('POIWorld.csv', usecols=columns)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "with_name = data[data.name.notnull()]\n", "with_amenity = data[data.amenity.notnull()]\n", "\n", "is_starbucks = with_name.name.str.contains('[Ss]tarbucks')\n", "is_dunkin = with_name.name.str.contains('[Dd]unkin')\n", "\n", "starbucks = with_name[is_starbucks].compute()\n", "dunkin = with_name[is_dunkin].compute()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "starbucks['type'] = 'starbucks'\n", "dunkin['type'] = 'dunkin'\n", "coffee_places = pd.concat([starbucks, dunkin])\n", "coffee_places.head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write DataFrame to CARTO" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import cartoframes\n", "\n", "username = 'eschbacher'\n", "api_key = 'abcdefghijklmnopqrstuvwxyz'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# specify columns for lng/lat so carto will create a geometry\n", "cc.write(coffee_places,\n", " tablename='coffee_places',\n", " lnglat=('longitude', 'latitude'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's visualize this DataFrame\n", "\n", "Category map on Dunkin' Donuts vs. Starbucks (aka, color by 'type')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": false }, "outputs": [], "source": [ "from cartoframes import Layer\n", "cc.map(layers=Layer('coffee_places', color='type', size=5),\n", " zoom=9, lng=-71.0637, lat=36.4275,\n", " interactive=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fast Food" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "is_fastfood = with_amenity.amenity.str.contains('fast_food')\n", "fastfood = with_amenity[is_fastfood]\n", "fastfood.name.value_counts().head(12)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ff = fastfood.compute()\n", "ff.sync_carto(username=username,\n", " api_key=api_key,\n", " requested_tablename='fastfood_dask',\n", " lnglat_cols=('longitude', 'latitude'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Number of Fast Food places in this OSM dump" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "len(ff)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Recreating the map from the blog" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cc.map(layers=Layer('fastfood_dask', size=2, color='#FFF'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Going crazy with the Data Observatory\n", "\n", "This method relies in you having the `do_augment_table` function that John had you load into your account. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "This might be a bit slow given the number of rows we just counted." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# DO measures: total population (B01003001),\n", "# population for whom poverty status is determined (B17001001),\n", "# median household income (B19013001)\n", "\n", "data_obs_measures = [{'numer_id': 'us.census.acs.B01003001'},\n", "                     {'numer_id': 'us.census.acs.B17001001'},\n", "                     {'numer_id': 'us.census.acs.B19013001'}]\n", "coffee_augmented = cc.data_augment('coffee_places', data_obs_measures)\n", "coffee_augmented.head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }