{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CARTO Data Observatory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a basic template notebook to start exploring your new Dataset from CARTO's Data Observatory via the Python library [CARTOframes](https://carto.com/cartoframes).\n", "\n", "You can find more details about how to use CARTOframes in the [Quickstart guide](https://carto.com/developers/cartoframes/guides/Quickstart/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "Make sure that you have the latest version installed. Please, find more information in the [Installation guide](https://carto.com/developers/cartoframes/guides/Installation/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -U cartoframes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: a kernel restart is required after installing the library\n", "\n", "import cartoframes\n", "\n", "cartoframes.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Credentials" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to be able to use the Data Observatory via CARTOframes, you need to set your CARTO account credentials first." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cartoframes.auth import set_default_credentials\n", "\n", "username = 'YOUR_USERNAME'\n", "api_key = 'YOUR_API_KEY' # Master API key. Do not make this file public!\n", "\n", "set_default_credentials(username, api_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**⚠️ Note about credentials**\n", "\n", "For security reasons, we recommend storing your credentials in an external file preventing publishing them by accident within your notebook. You can get more information in the section *Setting your credentials* of the [Authentication guide](https://carto.com/developers/cartoframes/guides/Authentication/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dataset operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Metadata exploration\n", "\n", "In this section we will explore some basic information regarding the Dataset you have licensed. More information on how to explore metadata associated to a Dataset is available in the [Data discovery guide](https://carto.com/developers/cartoframes/guides/Data-discovery/).\n", "\n", "In order to access the Dataset and its associeted metadata, you need to provide the \"ID\" which is a unique identifier of that Dataset. The IDs of your Datasets are available from Your Subscriptions page in the CARTO Dashboard and via the Discovery methods in CARTOFrames." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cartoframes.data.observatory import Dataset\n", "\n", "dataset = Dataset.get('YOUR_ID')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieve some general metadata about the Dataset\n", "dataset.to_dict()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Explore the first 10 rows of the Dataset\n", "dataset.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Explore the last 10 rows of the Dataset\n", "dataset.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the geographical coverage of the data\n", "dataset.geom_coverage()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Access the list of variables in the dataset\n", "dataset.variables.to_dataframe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Summary of some variable stats\n", "dataset.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access the data\n", "\n", "Now that we have explored some basic information about the Dataset, we will proceed to download a sample of the Dataset into a dataframe so we can operate it in Python. \n", "\n", "Datasets can be downloaded in full or by applying a filter with a SQL query. More info on how to download the Dataset or portions of it is available in the [Data discovery guide](https://carto.com/developers/cartoframes/guides/Data-discovery/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter by SQL query\n", "query = \"SELECT * FROM $dataset$ LIMIT 50\"\n", "\n", "dataset_df = dataset.to_dataframe(sql_query=query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note about SQL filters**\n", "\n", "Our SQL filtering queries allow for any PostgreSQL and PostGIS operation, so you can filter the rows (by a WHERE condition) or the columns (using the SELECT). Some common examples are filtering the Dataset by bounding box or filtering by column value: \n", "\n", "```\n", "SELECT * FROM $dataset$ WHERE ST_IntersectsBox(geom, -74.044467,40.706128,-73.891345,40.837690)\n", "```\n", "\n", "```\n", "SELECT total_pop, geom FROM $dataset$\n", "```\n", "\n", "A good tool to get the bounding box details for a specific area is [bboxfinder.com](http://bboxfinder.com/#0.000000,0.000000,0.000000,0.000000)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First 10 rows of the Dataset sample\n", "dataset_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualization\n", "\n", "Now that we have downloaded some data into a dataframe we can leverage the visualization capabilities of CARTOframes to build an interactive map.\n", "\n", "More info about building visualizations with CARTOframes is available in the [Visualization guide](https://carto.com/developers/cartoframes/guides/Visualization/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cartoframes.viz import Layer\n", "\n", "Layer(dataset_df, geom_col='geom')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note about variables**\n", "\n", "CARTOframes allows you to make data-driven visualizations from your Dataset variables (columns) via the Style helpers. These functions provide out-of-the-box cartographic styles, legends, popups and widgets.\n", "\n", "Style helpers are also highly customizable to reach your desired visualization setting simple parameters. The helpers collection contains functions to visualize by color and size, and also by type: category, bins and continuous, depending on the type of the variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cartoframes.viz import color_bins_style\n", "\n", "Layer(dataset_df, color_bins_style('YOUR_VARIABLE_ID'), geom_col='geom')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Upload to CARTO account\n", "\n", "In order to operate with the data in CARTO Builder or to build a CARTOFrames visualization reading the data from a table in the Cloud instead of having it in your local environment (with its benefits in performance), you can load the dataframe as a table in your CARTO account.\n", "\n", "More info in the [Data Management guide](https://carto.com/developers/cartoframes/guides/Data-management/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cartoframes import to_carto\n", "\n", "to_carto(dataset_df, 'my_dataset', geom_col='geom')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Build a visualization reading the data from your CARTO account\n", "Layer('my_dataset')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enrichment\n", "\n", "Enrichment is the process of adding variables to a geometry, which we call the target (point, line, polygon…) from a spatial Dataset, which we call the source. CARTOFrames has a set of methods for you to augment your data with new variables from a Dataset in the Data Observatory.\n", "\n", "In this example, you will need to load a dataframe with the geometries that you want to enrich with a variable or a group of variables from the Dataset. You can detail the variables to get from the Dataset by passing the variable's ID. You can get the variables IDs with the metadata methods.\n", "\n", "More info in the [Data enrichment guide](https://carto.com/developers/cartoframes/guides/Data-enrichment/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cartoframes.data.observatory import Enrichment\n", "\n", "enriched_df = Enrichment().enrich_polygons(\n", " df, # Insert here the DataFrame to be enriched\n", " variables=['YOUR_VARIABLE_ID']\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save to file\n", "\n", "Finally, you can also export the data into a CSV file. More info in the [Data discovery guide](https://carto.com/developers/cartoframes/guides/Data-discovery/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter by SQL query\n", "query = \"SELECT * FROM $dataset$ LIMIT 50\"\n", "\n", "dataset.to_csv('my_dataset.csv', sql_query=query)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }