{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "bbW-dEmEIQ3T" }, "source": [ "## Revenue Prediction for Site Selection\n", "\n", "Whether it’s expansion, consolidation or performance monitoring, understanding revenue drivers is essential for **Site Planning** in many sectors such as Retail or Restaurant and Food Services.\n", "\n", "This notebook walks you through all the data collection and preparation steps required for building a revenue prediction model. The main steps followed are:\n", "1. Processing data. Geocoding\n", "2. Spatial analysis of client's data\n", "3. Enrichment\n", "\n", " 3.1 Calculate isochrones\n", "\n", " 3.2 Enrich isochrones\n", " \n", "4. Modeling hints\n", "\n", "\n", "We'll use [**CARTOframes**](https://carto.com/developers/cartoframes/) throughout the analysis. \n", "\n", "_**Note** this use case leverages premium datasets from CARTO's [Data Observatory](https://carto.com/spatial-data-catalog/)._\n", "\n", "\n", "### Use case description\n", "\n", "In order to show all the steps and functionality, we'll work with simulated sales data of Carrefour Express, a chain of small-sized supermarkets. \n", "\n", "Carreforu Express (CE) wants to reorganize (open/close) their stores in the city of Madrid (Spain). In order to define an optimal plan of openings and closures, they first need to understand why some stores are performing better (in terms of annual revenue) than others, and identify areas where they could have a high performance.\n", "\n", "They have provided us with the stores they have in the city of Madrid, together with the average annual sales of the last three years.\n", "\n", "_**Note** the annual sales are not Carrefour Express' actual data._" ] }, { "cell_type": "markdown", "metadata": { "id": "9W73O1xlIQ3V" }, "source": [ "### 0. Setup\n", "\n", "Import the packages we'll use.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "id": "iEkn5au6IQ3W", "outputId": "8007db82-42e3-477e-e780-b67adaccf449" }, "outputs": [], "source": [ "import geopandas as gpd\n", "import ipywidgets as widgets\n", "import numpy as np\n", "import pandas as pd\n", "import pyproj\n", "\n", "from cartoframes.auth import set_default_credentials\n", "from cartoframes.data.observatory import *\n", "from cartoframes.data.services import Geocoding, Isolines\n", "from cartoframes.viz import *\n", "from IPython.display import clear_output, display\n", "from scipy.spatial.distance import cdist\n", "from shapely import wkt\n", "from shapely.geometry import Point\n", "\n", "pd.set_option('display.max_columns', 100)" ] }, { "cell_type": "markdown", "metadata": { "id": "0l20OJn-LH4t" }, "source": [ "In order to be able to use the Data Observatory via CARTOframes, you need to set your CARTO account credentials first.\n", "\n", "Please, visit the [Authentication guide](https://carto.com/developers/cartoframes/guides/Authentication/) for further detail." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "t2jXq3Y_IQ3a" }, "outputs": [], "source": [ "set_default_credentials('creds.json')" ] }, { "cell_type": "markdown", "metadata": { "id": "ftXeGg8WIQ3c" }, "source": [ "### 1. Load data\n", "\n", "We'll start loading the CSV file containing all stores in Madrid with their address and annual revenue. They 57 stores in total." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "id": "HLo-3VYPb5W9", "outputId": "53116322-f26f-4827-d9ce-bac8af84db71" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namemainaddresslinepostcodeareaname2areaname1annual_sales
0CARREFOUR EXPRESS SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46
1CARREFOUR EXPRESS ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21
2CARREFOUR EXPRESS RAMIREZ DEL PRADOCALLE DE JUAN DE MARIANA 828045MADRIDCOMUNIDAD DE MADRID19.27
3CARREFOUR EXPRESS PASTORA IMPERIO. 1CALLE PASTORA IMPERIO 128036MADRIDCOMUNIDAD DE MADRID17.53
4CARREFOUR EXPRESS BÉJARCALLE DE BÉJAR 128028MADRIDCOMUNIDAD DE MADRID25.91
\n", "
" ], "text/plain": [ " name mainaddressline postcode \\\n", "0 CARREFOUR EXPRESS SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 \n", "1 CARREFOUR EXPRESS ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 \n", "2 CARREFOUR EXPRESS RAMIREZ DEL PRADO CALLE DE JUAN DE MARIANA 8 28045 \n", "3 CARREFOUR EXPRESS PASTORA IMPERIO. 1 CALLE PASTORA IMPERIO 1 28036 \n", "4 CARREFOUR EXPRESS BÉJAR CALLE DE BÉJAR 1 28028 \n", "\n", " areaname2 areaname1 annual_sales \n", "0 MADRID COMUNIDAD DE MADRID 22.46 \n", "1 MADRID COMUNIDAD DE MADRID 17.21 \n", "2 MADRID COMUNIDAD DE MADRID 19.27 \n", "3 MADRID COMUNIDAD DE MADRID 17.53 \n", "4 MADRID COMUNIDAD DE MADRID 25.91 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores = pd.read_csv('https://docs.google.com/spreadsheets/d/1RlOsWN3OBTS0Zhq2lbYvBrwXxSJjpCdWrOWHSqg2JVE/export?gid=0&format=csv')\n", "stores.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "ji-2pcI3IQ3f", "outputId": "054e2cc6-df23-4559-bfe1-e63b493634c2" }, "outputs": [ { "data": { "text/plain": [ "(57, 6)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "5fpQM9GIIQ3h" }, "source": [ "### 2. Processing data. Geocoding\n", "\n", "We have the address of each store, but we need their coordinates in order to perform spatial analysis. We'll use [CARTOframes geocoding functionality](https://carto.com/developers/cartoframes/guides/Data-Services/#geocoding) for this." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "QohCA9rIMa4g" }, "outputs": [], "source": [ "stores['complete_address'] = stores.apply(lambda row : f\"{row['mainaddressline']}, {row['postcode']}\",axis=1) " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "vxhkSu-SIQ3i" }, "outputs": [], "source": [ "gc = Geocoding()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "Zs5Tb2BjIQ3k", "outputId": "5a859852-ea31-4ee6-9aa9-a24f003fac62" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success! Data geocoded correctly\n" ] } ], "source": [ "gdf, metadata = gc.geocode(stores, street='complete_address', city='areaname2', \n", " state='areaname1', country={'value': 'Spain'}, )" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 141 }, "id": "GbvJTYRPIQ3m", "outputId": "d9a402ed-270c-48f6-9c3a-9792078ebc16" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
the_geomnamemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgc_status_relcarto_geocode_hash
0POINT (-3.70394 40.42763)CARREFOUR EXPRESS SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 280040.993979e14d3cd5995e7e5809430b429da4
1POINT (-3.78261 40.45914)CARREFOUR EXPRESS ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 280231.0091271c1c6ff4bc5f03c148b1d567b4a7
\n", "
" ], "text/plain": [ " the_geom name \\\n", "0 POINT (-3.70394 40.42763) CARREFOUR EXPRESS SAN BERNARDO 76 \n", "1 POINT (-3.78261 40.45914) CARREFOUR EXPRESS ARAVACA \n", "\n", " mainaddressline postcode areaname2 areaname1 \\\n", "0 CALLE DEL DIVINO PASTOR 76 28004 MADRID COMUNIDAD DE MADRID \n", "1 AVENIDA DE LA OSA MAYOR 62 28023 MADRID COMUNIDAD DE MADRID \n", "\n", " annual_sales complete_address gc_status_rel \\\n", "0 22.46 CALLE DEL DIVINO PASTOR 76, 28004 0.99 \n", "1 17.21 AVENIDA DE LA OSA MAYOR 62, 28023 1.00 \n", "\n", " carto_geocode_hash \n", "0 3979e14d3cd5995e7e5809430b429da4 \n", "1 91271c1c6ff4bc5f03c148b1d567b4a7 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf.head(2)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "_1cUt4ZYIQ3p", "outputId": "2970a24c-eddc-4683-950c-da6633ed1189" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgeometry
0CARREFOUR EXPRESS SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 28004POINT (-3.70394 40.42763)
1CARREFOUR EXPRESS ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 28023POINT (-3.78261 40.45914)
\n", "
" ], "text/plain": [ " name mainaddressline postcode \\\n", "0 CARREFOUR EXPRESS SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 \n", "1 CARREFOUR EXPRESS ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 \n", "\n", " areaname2 areaname1 annual_sales \\\n", "0 MADRID COMUNIDAD DE MADRID 22.46 \n", "1 MADRID COMUNIDAD DE MADRID 17.21 \n", "\n", " complete_address geometry \n", "0 CALLE DEL DIVINO PASTOR 76, 28004 POINT (-3.70394 40.42763) \n", "1 AVENIDA DE LA OSA MAYOR 62, 28023 POINT (-3.78261 40.45914) " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores = gdf[stores.columns.tolist() + ['the_geom']].rename(columns={'the_geom':'geometry'})\n", "stores.head(2)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 674 }, "id": "FtClVwelIQ3t", "outputId": "5b67b707-1f36-4a40-8c79-7dc890eb9dc8" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " None\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " Static map image\n", " \n", " \n", "
\n", "
\n", "
\n", " \n", " \n", "
\n", "
\n", "
\n", "\n", " \n", "\n", "
\n", "
\n", " :\n", "
\n", " \n", " \n", "
\n", "
\n", "\n", "
\n", " StackTrace\n", "
    \n", "
    \n", "
    \n", "\n", "\n", "\n", "\n", "\n", "\">\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Map(Layer(stores, \n", " popup_hover=popup_element('name'),\n", " geom_col='geometry'))" ] }, { "cell_type": "markdown", "metadata": { "id": "V3D_lRlhIQ3v" }, "source": [ "### 3. Spatial Data Analysis\n", "\n", "Once we have the stores geocoded, we'll analyze the spatial distribution of annual sdales." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "Xde0dur3IQ3w" }, "outputs": [], "source": [ "stores['name'] = stores['name'].str[18:]" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 674 }, "id": "7rn_OSmYIQ30", "outputId": "83f9f6f7-c444-4940-f371-eef3e1008518" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " None\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " Static map image\n", " \n", " \n", "
    \n", "
    \n", "
    \n", " \n", " \n", "
    \n", "
    \n", " \n", "\n", "
    \n", " \n", " \n", " \n", " \n", " \n", "
    \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    \n", " \n", " \n", "
    \n", "
    \n", "
    \n", "
    \n", " \n", "
    \n", "
    \n", "
    \n", "\n", " \n", "\n", "
    \n", "
    \n", " :\n", "
    \n", " \n", " \n", "
    \n", "
    \n", "\n", "
    \n", " StackTrace\n", "
      \n", "
      \n", "
      \n", "\n", "\n", "\n", "\n", "\n", "\">\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Map(Layer(stores, \n", " style=size_continuous_style('annual_sales'), \n", " popup_hover=[popup_element('name'), popup_element('annual_sales')],\n", " legends=size_continuous_legend('Annual Sales', 'Annual sales in million euros'),\n", " geom_col='geometry'))" ] }, { "cell_type": "markdown", "metadata": { "id": "gvYc-EHsIQ32" }, "source": [ "### 3. Enrichment\n", "\n", "Sales in a grocery store is mainly influenced by:\n", " - The characteristics of the population who live in the area around the store\n", " - Competitors\n", " - How busy the area around is (residential, touristic, work)\n", " - How many people move around the area\n", " \n", "In order to enrich our initial dataset with this information, we first need to define the area of influence (catchment area) of the different stores. Once we have the catchment area, we'll bring all the data related to that area from CARTO's [Data Observatory]((https://carto.com/spatial-data-catalog/))." ] }, { "cell_type": "markdown", "metadata": { "id": "35bCdSa0IQ32" }, "source": [ "#### 3.1 Isochrones\n", "\n", "Because of the characteristics of Carrefour Express' customers, we can define their catchment areas by acknowledging that:\n", "- Their customers usually make small to medium purchases\n", "- Their customers live close to the supermarket\n", "\n", "Therefore, we'll consider 5-minute-walking isochrones as their area of influence.\n", "\n", "Explore our [Guides](https://carto.com/developers/cartoframes/guides/Data-Services/#isolines) to learn more about isochornes." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "7DD3IyQfIQ33" }, "outputs": [], "source": [ "iso_service = Isolines()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "id": "IT3RhwX4IQ35", "outputId": "bbc327c0-823e-44ea-f808-749f40b1e882" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success! Isolines created correctly\n" ] } ], "source": [ "isochrones_gdf, _ = iso_service.isochrones(stores, [300, 600], mode='walk', exclusive=False)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 406 }, "id": "fNibdsOfIQ37", "outputId": "a16e2a6c-76c7-4064-e6ca-8074cea3f9c8" }, "outputs": [ { "data": { "text/html": [ "
      \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      source_iddata_rangethe_geom
      00300MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...
      10600MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...
      21300MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...
      31600MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...
      42300MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40...
      \n", "
      " ], "text/plain": [ " source_id data_range the_geom\n", "0 0 300 MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...\n", "1 0 600 MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...\n", "2 1 300 MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...\n", "3 1 600 MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...\n", "4 2 300 MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40..." ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isochrones_gdf.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 674 }, "id": "M_Dv0y9TIQ39", "outputId": "c109b16a-9bcd-4801-99a8-8519e7e049ef" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " None\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " Static map image\n", " \n", " \n", "
      \n", "
      \n", "
      \n", " \n", " \n", "
      \n", "
      \n", "
      \n", "\n", " \n", "\n", "
      \n", "
      \n", " :\n", "
      \n", " \n", " \n", "
      \n", "
      \n", "\n", "
      \n", " StackTrace\n", "
        \n", "
        \n", "
        \n", "\n", "\n", "\n", "\n", "\n", "\">\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Map(Layer(isochrones_gdf, geom_col='the_geom', style=basic_style(opacity=0.3)))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "nJrApCV-IQ3_" }, "outputs": [], "source": [ "stores['iso_5walk'] = isochrones_gdf.loc[isochrones_gdf['data_range'] == 300, 'the_geom'].values\n", "stores['iso_10walk'] = isochrones_gdf.loc[isochrones_gdf['data_range'] == 600, 'the_geom'].values" ] }, { "cell_type": "markdown", "metadata": { "id": "_lB51vZMIQ4D" }, "source": [ "#### 3.2 Enrichment\n", "\n", "Once we have the area of influence of every store, we can enrich our initial data." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "enrichment = Enrichment()" ] }, { "cell_type": "markdown", "metadata": { "id": "yKB4zGBLIQ4D" }, "source": [ "##### 3.2.1 POIs\n", "\n", "We'll start by enriching with POI data. We'll calculate for every store:\n", " - The **number of POIs** within the 5-minute-walk isochrone. This will give us a measurement of how commercially busy the area is.\n", " - The **number of competitors** within the 10-minute-walk isochrone. Note we're taking here 10 minutes because we are interested in knowing all competitors that people living within the 5-minute-walk isochrone can reach in a 5-minute walk.\n", " \n", "We will use [Pitney Bowes' Points Of Interest](https://carto.com/spatial-data-catalog/browser/dataset/pb_points_of_i_94bda91b/) premium dataset.\n", "\n", "Take a look at this template for more details on how to access and download a premium dataset.\n", "\n", "For more details on how to discover a dataset, please check this notebook or take a look at our [Guides](https://carto.com/developers/cartoframes/guides/Data-Observatory/#data-discovery)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "id": "YP7ZARCrIQ4K" }, "outputs": [], "source": [ "dataset = Dataset.get('pb_points_of_i_94bda91b')" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
        \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
        slugnamedescriptiondb_typeagg_methodcolumn_namevariable_group_iddataset_idid
        0geoid_17c119efgeoidUnique numeric identifier, concatenated with l...STRINGNonegeoidNonecarto-do.pitney_bowes.pointsofinterest_pointso...carto-do.pitney_bowes.pointsofinterest_pointso...
        1do_label_ecfa983bdo_labelPrimary / Registered name of the businessSTRINGNonedo_labelNonecarto-do.pitney_bowes.pointsofinterest_pointso...carto-do.pitney_bowes.pointsofinterest_pointso...
        2do_date_4f2cf0b3do_dateFirst day of the month for the deliveryDATENonedo_dateNonecarto-do.pitney_bowes.pointsofinterest_pointso...carto-do.pitney_bowes.pointsofinterest_pointso...
        3NAME_ed9f897NAMEPrimary / Registered name of the businessSTRINGNoneNAMENonecarto-do.pitney_bowes.pointsofinterest_pointso...carto-do.pitney_bowes.pointsofinterest_pointso...
        4BRANDNAME_a7ebfc28BrandPB standardized Brand Name used by the businessSTRINGNoneBRANDNAMENonecarto-do.pitney_bowes.pointsofinterest_pointso...carto-do.pitney_bowes.pointsofinterest_pointso...
        \n", "
        " ], "text/plain": [ " slug name \\\n", "0 geoid_17c119ef geoid \n", "1 do_label_ecfa983b do_label \n", "2 do_date_4f2cf0b3 do_date \n", "3 NAME_ed9f897 NAME \n", "4 BRANDNAME_a7ebfc28 Brand \n", "\n", " description db_type agg_method \\\n", "0 Unique numeric identifier, concatenated with l... STRING None \n", "1 Primary / Registered name of the business STRING None \n", "2 First day of the month for the delivery DATE None \n", "3 Primary / Registered name of the business STRING None \n", "4 PB standardized Brand Name used by the business STRING None \n", "\n", " column_name variable_group_id \\\n", "0 geoid None \n", "1 do_label None \n", "2 do_date None \n", "3 NAME None \n", "4 BRANDNAME None \n", "\n", " dataset_id \\\n", "0 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "1 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "2 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "3 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "4 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "\n", " id \n", "0 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "1 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "2 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "3 carto-do.pitney_bowes.pointsofinterest_pointso... \n", "4 carto-do.pitney_bowes.pointsofinterest_pointso... " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.variables.to_dataframe().head()" ] }, { "cell_type": "markdown", "metadata": { "id": "gdnJUVonIQ4b" }, "source": [ "###### Number of POIs\n", "\n", "We'll calculate the number of POIs within each store's catchment area. This gives us an idea of how busy the area is." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "ykYE3gS2IQ4c" }, "outputs": [], "source": [ "enriched_dataset_gdf = enrichment.enrich_polygons(\n", " stores,\n", " variables=['CLASS_517d6003'],\n", " aggregation='COUNT',\n", " geom_col='iso_5walk'\n", ")" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 281 }, "id": "ktvWv0D-IQ4d", "outputId": "2caca45a-2787-48cd-f6bd-7e8546a1b919" }, "outputs": [ { "data": { "text/html": [ "
        \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
        namemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgeometryiso_5walkiso_10walkCLASS
        0SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 28004POINT (-3.70394 40.42763)MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...21752
        1ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 28023POINT (-3.78261 40.45914)MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...6450
        2RAMIREZ DEL PRADOCALLE DE JUAN DE MARIANA 828045MADRIDCOMUNIDAD DE MADRID19.27CALLE DE JUAN DE MARIANA 8, 28045POINT (-3.68693 40.39849)MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40...MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40...2622
        3PASTORA IMPERIO. 1CALLE PASTORA IMPERIO 128036MADRIDCOMUNIDAD DE MADRID17.53CALLE PASTORA IMPERIO 1, 28036POINT (-3.67468 40.48099)MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40...MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40...2777
        4BÉJARCALLE DE BÉJAR 128028MADRIDCOMUNIDAD DE MADRID25.91CALLE DE BÉJAR 1, 28028POINT (-3.67456 40.43516)MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40...MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40...16508
        \n", "
        " ], "text/plain": [ " name mainaddressline postcode areaname2 \\\n", "0 SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 MADRID \n", "1 ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 MADRID \n", "2 RAMIREZ DEL PRADO CALLE DE JUAN DE MARIANA 8 28045 MADRID \n", "3 PASTORA IMPERIO. 1 CALLE PASTORA IMPERIO 1 28036 MADRID \n", "4 BÉJAR CALLE DE BÉJAR 1 28028 MADRID \n", "\n", " areaname1 annual_sales complete_address \\\n", "0 COMUNIDAD DE MADRID 22.46 CALLE DEL DIVINO PASTOR 76, 28004 \n", "1 COMUNIDAD DE MADRID 17.21 AVENIDA DE LA OSA MAYOR 62, 28023 \n", "2 COMUNIDAD DE MADRID 19.27 CALLE DE JUAN DE MARIANA 8, 28045 \n", "3 COMUNIDAD DE MADRID 17.53 CALLE PASTORA IMPERIO 1, 28036 \n", "4 COMUNIDAD DE MADRID 25.91 CALLE DE BÉJAR 1, 28028 \n", "\n", " geometry \\\n", "0 POINT (-3.70394 40.42763) \n", "1 POINT (-3.78261 40.45914) \n", "2 POINT (-3.68693 40.39849) \n", "3 POINT (-3.67468 40.48099) \n", "4 POINT (-3.67456 40.43516) \n", "\n", " iso_5walk \\\n", "0 MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... \n", "1 MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... \n", "2 MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... \n", "3 MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40... \n", "4 MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40... \n", "\n", " iso_10walk CLASS \n", "0 MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... 21752 \n", "1 MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... 6450 \n", "2 MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... 2622 \n", "3 MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40... 2777 \n", "4 MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40... 16508 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "enriched_dataset_gdf.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "QafF4MBsIQ4g" }, "outputs": [], "source": [ "stores['n_pois'] = enriched_dataset_gdf['CLASS'].values" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " None\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " Static map image\n", " \n", " \n", "
        \n", "
        \n", "
        \n", " \n", " \n", "
        \n", "
        \n", " \n", "\n", "
        \n", " \n", " \n", " \n", " \n", " \n", "
        \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
        \n", " \n", " \n", "
        \n", "
        \n", "
        \n", "
        \n", " \n", "
        \n", "
        \n", "
        \n", "\n", " \n", "\n", "
        \n", "
        \n", " :\n", "
        \n", " \n", " \n", "
        \n", "
        \n", "\n", "
        \n", " StackTrace\n", "
          \n", "
          \n", "
          \n", "\n", "\n", "\n", "\n", "\n", "\">\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Map(Layer(stores, geom_col='iso_5walk', style=color_bins_style('n_pois')))" ] }, { "cell_type": "markdown", "metadata": { "id": "hX4G5Ht0IQ4i" }, "source": [ "###### Number of competitors\n", "\n", "Next, we'll count the number of competitors within the 10 minute isochrone. This represents all the supermarkets that customers within our stores can find within a 5-minute-walk distance.\n", "\n", "_**Note** we apply a filter to only count grocery stores (competitors)._" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "id": "OmA2kpX6IQ4i" }, "outputs": [], "source": [ "enriched_dataset_gdf = enrichment.enrich_polygons(\n", " stores,\n", " variables=['CLASS_517d6003'],\n", " aggregation='COUNT',\n", " geom_col='iso_10walk',\n", " filters={'carto-do.pitney_bowes.pointsofinterest_pointsofinterest_esp_latlon_v1_monthly_v1.CLASS': \n", " \"= 'GROCERY STORES'\"}\n", ")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 187 }, "id": "h5qWkw2OIQ4k", "outputId": "29aae9dd-5b51-443d-adaf-823b886b3163" }, "outputs": [ { "data": { "text/html": [ "
          \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          namemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgeometryiso_5walkiso_10walkn_poisn_competitors
          0SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 28004POINT (-3.70394 40.42763)MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...217521241
          1ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 28023POINT (-3.78261 40.45914)MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...6450199
          2RAMIREZ DEL PRADOCALLE DE JUAN DE MARIANA 828045MADRIDCOMUNIDAD DE MADRID19.27CALLE DE JUAN DE MARIANA 8, 28045POINT (-3.68693 40.39849)MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40...MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40...2622108
          \n", "
          " ], "text/plain": [ " name mainaddressline postcode areaname2 \\\n", "0 SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 MADRID \n", "1 ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 MADRID \n", "2 RAMIREZ DEL PRADO CALLE DE JUAN DE MARIANA 8 28045 MADRID \n", "\n", " areaname1 annual_sales complete_address \\\n", "0 COMUNIDAD DE MADRID 22.46 CALLE DEL DIVINO PASTOR 76, 28004 \n", "1 COMUNIDAD DE MADRID 17.21 AVENIDA DE LA OSA MAYOR 62, 28023 \n", "2 COMUNIDAD DE MADRID 19.27 CALLE DE JUAN DE MARIANA 8, 28045 \n", "\n", " geometry \\\n", "0 POINT (-3.70394 40.42763) \n", "1 POINT (-3.78261 40.45914) \n", "2 POINT (-3.68693 40.39849) \n", "\n", " iso_5walk \\\n", "0 MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... \n", "1 MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... \n", "2 MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... \n", "\n", " iso_10walk n_pois n_competitors \n", "0 MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... 21752 1241 \n", "1 MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... 6450 199 \n", "2 MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... 2622 108 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores['n_competitors'] = enriched_dataset_gdf['CLASS'].values\n", "stores.head(3)" ] }, { "cell_type": "markdown", "metadata": { "id": "16IHBl1IIQ4q" }, "source": [ "##### 3.2.2 Sociodemographic and socioeconomic data" ] }, { "cell_type": "markdown", "metadata": { "id": "nTWpc04aK4KK" }, "source": [ "Now we'll enrich our dataframe with [Unica360 Sociodemographics](https://carto.com/spatial-data-catalog/browser/dataset/u360_sociodemogr_28e93b81/) premium dataset.\n", "\n", "For more details on how to discover a dataset, please check this notebook or take a look at our [Guides](https://carto.com/developers/cartoframes/guides/Data-Observatory/#data-discovery)." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "NO41d37VIQ4q" }, "outputs": [], "source": [ "dataset = Dataset.get('u360_sociodemogr_28e93b81')" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 354 }, "id": "fOZ_6ttGIQ4s", "outputId": "a635d6d2-9de1-4893-9534-888dd60ff7e4" }, "outputs": [ { "data": { "text/html": [ "
          \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          HOGP_Tid_xid_ygeoiddo_dateEURO_RICEXTR_NACINM_POBRn_fincasUE_28_NORn_viv_inmrenta_indP_ED_00_04_HP_ED_00_04_MP_ED_00_14_HP_ED_00_14_MP_ED_15_24_HP_ED_15_24_MP_ED_25_44_HP_ED_25_44_MP_ED_45_64_HP_ED_45_64_MP_ED_65_79_HP_ED_65_79_MP_ED_80_MAS_HP_ED_80_MAS_MP_EST_RESTO_Mrenta_hab_disprenta_hog_netarenta_tit_disprenta_tit_brutaP_EST_4_2_GRADO_MP_EST_5_3_GRADO_MC02_09_GASTO_M_ocio_MC02_06_GASTO_M_salud_MC02_12_GASTO_M_otros_MC02_03_GASTO_M_textil_MC02_10_GASTO_M_ensena_MC02_11_GASTO_M_horeca_MC02_02_GASTO_M_bebidas_MC02_05_GASTO_M_muebles_MC02_04_GASTO_M_vivienda_MC02_07_GASTO_M_transporte_MC02_01_GASTO_M_alimentacion_MC02_08_GASTO_M_comunicaciones_M
          024.29612764.370032-1977832977-19778#333522019-01-01 00:00:00+00:001.0292726.0852524.971238292.654434290.7170991.1909361.1774803.2998532.9403353.8557683.1800148.7206237.6629829.11523710.1097764.3418536.2312762.0152622.8970540.38923012202.75758520100.85756514462.46091517616.9812230.4991760.1115952221.969272800.4213052229.0702791787.518852221.5689512746.878966627.4792861390.0496656942.0661654156.1288644465.605857990.745515
          158.447519149.641189-1817333096-18173#331032019-01-01 00:00:00+00:004.51606529.09822922.820101322.607527500.9528742.4141082.2501926.6083735.3428837.7304539.15262123.43127022.46466221.90898122.2247517.11821510.8026374.8173598.0389850.98806211932.56693426709.81276919217.56929923409.2634360.0064910.0054472156.230515771.5559002779.2205101691.092649193.6232522609.000000560.7909671424.3728077148.1066604006.0549254364.704902934.180016
          226.60415573.959551-1811632687-18116#330852019-01-01 00:00:00+00:000.4584272.2921351.642697404.487540340.8136671.4516851.0314615.3101124.1640455.4247194.1640459.3595519.97078712.22471912.3775283.7056183.4764041.7573032.0247190.2911177583.81576622807.72568916410.03824319989.3598530.5971570.1117262559.120567680.1581671884.0000001836.091586423.4188383389.951794546.3129331423.7167496341.0693193706.0695304154.2314131016.996320
          333.04591492.881935-1721832661-17218#322972019-01-01 00:00:00+00:000.2562982.6538182.482952261.291380430.7940651.5364141.6645635.5422216.2216056.2819576.14228712.03434813.00800515.31738914.5532374.4410794.6688841.8021942.8687290.25394110387.57801622258.27652616014.71247019507.8064880.6107000.1353582426.496024654.8801892875.5041191770.786627280.5636352609.000000487.3818851363.9285088477.9115594339.1163124546.185871981.843483
          442.791565120.525826-1716632306-17166#324852019-01-01 00:00:00+00:003.51067742.34498638.866111340.507045430.7502592.7097322.2559958.1615227.4544147.0642806.44250117.22842618.94640119.49554517.4197626.4949816.9375951.9607442.9196540.4103229230.86423521030.35464215131.22915618431.6197280.5158510.0738282341.058219646.4555931946.3755961817.761132216.9907832713.951569531.6139871439.4184796568.5859943874.2520984230.469594999.165857
          537.239490106.338527-1716332387-17163#326132019-01-01 00:00:00+00:000.7485085.3955254.489339415.261968410.9345452.3273111.8372048.2435167.2991526.4345596.31676616.63679715.92445017.22933418.2930443.8072553.7028080.9423471.5085000.30109215206.52273126196.02180318847.89941122958.9619610.6305200.0683882455.607492661.7801952471.4105721978.238066464.3627183271.408753570.5809451538.3252157026.8625163642.6868004136.9909771097.559424
          668.027299186.194490-1716232685-17162#324832019-01-01 00:00:00+00:003.10818914.19523310.235228381.479630760.7924002.2744462.5237108.5994788.4817899.0118328.98009527.89193822.83085531.27187129.65882112.52774612.3871366.1302058.4227240.4167789131.44638722211.58072515981.11508819466.8809180.4844020.0988202244.234898740.4580902297.9392371711.685763141.4158021862.537500614.9297501266.8045576851.5816543714.8997534079.128157965.067165
          730.68729680.303223-1714432482-17144#323072019-01-01 00:00:00+00:000.6512568.2165497.610024273.388013400.9496621.4123461.2762255.1284294.9378964.3901723.82553710.48171910.69610612.81406913.4076704.1362376.0104811.6131992.8617090.58103010767.00137726619.78011419152.79127923330.3561760.2954560.1235142363.386639678.9239392879.0000001743.381029255.1104222144.928980525.4045361320.4764236952.6039623723.9780573545.675798965.785980
          833.39070085.328830-1713232486-17132#324822019-01-01 00:00:00+00:001.35764612.40315610.880929343.930721441.0463941.7145381.9900385.8423365.9152534.2953733.10350711.44528513.66982811.05946612.9554445.1297497.0336041.9321832.9468020.2429068990.67499129331.23766921103.67067825706.7571170.5280090.2290852409.842055715.7708091994.8528641793.226386336.8812112609.000000592.5121281375.2574466410.5539814015.7686414377.518211985.091840
          978.866892233.102278-1508632758-15086#337172019-01-01 00:00:00+00:004.77908953.60958646.411882242.951889720.7599317.5756296.26784821.26474316.36220315.12914812.27344343.32668435.64480230.17967930.7555708.87951911.7573332.9109794.6181770.2566799363.25649921301.46281915326.29005818669.2268870.6445140.0988082486.019486641.8567652879.0000001816.883883420.3585913391.000000540.7800051323.1987387017.3702484413.2829174442.802609977.844271
          \n", "
          " ], "text/plain": [ " HOG P_T id_x id_y geoid \\\n", "0 24.296127 64.370032 -19778 32977 -19778#33352 \n", "1 58.447519 149.641189 -18173 33096 -18173#33103 \n", "2 26.604155 73.959551 -18116 32687 -18116#33085 \n", "3 33.045914 92.881935 -17218 32661 -17218#32297 \n", "4 42.791565 120.525826 -17166 32306 -17166#32485 \n", "5 37.239490 106.338527 -17163 32387 -17163#32613 \n", "6 68.027299 186.194490 -17162 32685 -17162#32483 \n", "7 30.687296 80.303223 -17144 32482 -17144#32307 \n", "8 33.390700 85.328830 -17132 32486 -17132#32482 \n", "9 78.866892 233.102278 -15086 32758 -15086#33717 \n", "\n", " do_date EURO_RIC EXTR_NAC INM_POBR n_fincas \\\n", "0 2019-01-01 00:00:00+00:00 1.029272 6.085252 4.971238 29 \n", "1 2019-01-01 00:00:00+00:00 4.516065 29.098229 22.820101 32 \n", "2 2019-01-01 00:00:00+00:00 0.458427 2.292135 1.642697 40 \n", "3 2019-01-01 00:00:00+00:00 0.256298 2.653818 2.482952 26 \n", "4 2019-01-01 00:00:00+00:00 3.510677 42.344986 38.866111 34 \n", "5 2019-01-01 00:00:00+00:00 0.748508 5.395525 4.489339 41 \n", "6 2019-01-01 00:00:00+00:00 3.108189 14.195233 10.235228 38 \n", "7 2019-01-01 00:00:00+00:00 0.651256 8.216549 7.610024 27 \n", "8 2019-01-01 00:00:00+00:00 1.357646 12.403156 10.880929 34 \n", "9 2019-01-01 00:00:00+00:00 4.779089 53.609586 46.411882 24 \n", "\n", " UE_28_NOR n_viv_inm renta_ind P_ED_00_04_H P_ED_00_04_M P_ED_00_14_H \\\n", "0 2.654434 29 0.717099 1.190936 1.177480 3.299853 \n", "1 2.607527 50 0.952874 2.414108 2.250192 6.608373 \n", "2 4.487540 34 0.813667 1.451685 1.031461 5.310112 \n", "3 1.291380 43 0.794065 1.536414 1.664563 5.542221 \n", "4 0.507045 43 0.750259 2.709732 2.255995 8.161522 \n", "5 5.261968 41 0.934545 2.327311 1.837204 8.243516 \n", "6 1.479630 76 0.792400 2.274446 2.523710 8.599478 \n", "7 3.388013 40 0.949662 1.412346 1.276225 5.128429 \n", "8 3.930721 44 1.046394 1.714538 1.990038 5.842336 \n", "9 2.951889 72 0.759931 7.575629 6.267848 21.264743 \n", "\n", " P_ED_00_14_M P_ED_15_24_H P_ED_15_24_M P_ED_25_44_H P_ED_25_44_M \\\n", "0 2.940335 3.855768 3.180014 8.720623 7.662982 \n", "1 5.342883 7.730453 9.152621 23.431270 22.464662 \n", "2 4.164045 5.424719 4.164045 9.359551 9.970787 \n", "3 6.221605 6.281957 6.142287 12.034348 13.008005 \n", "4 7.454414 7.064280 6.442501 17.228426 18.946401 \n", "5 7.299152 6.434559 6.316766 16.636797 15.924450 \n", "6 8.481789 9.011832 8.980095 27.891938 22.830855 \n", "7 4.937896 4.390172 3.825537 10.481719 10.696106 \n", "8 5.915253 4.295373 3.103507 11.445285 13.669828 \n", "9 16.362203 15.129148 12.273443 43.326684 35.644802 \n", "\n", " P_ED_45_64_H P_ED_45_64_M P_ED_65_79_H P_ED_65_79_M P_ED_80_MAS_H \\\n", "0 9.115237 10.109776 4.341853 6.231276 2.015262 \n", "1 21.908981 22.224751 7.118215 10.802637 4.817359 \n", "2 12.224719 12.377528 3.705618 3.476404 1.757303 \n", "3 15.317389 14.553237 4.441079 4.668884 1.802194 \n", "4 19.495545 17.419762 6.494981 6.937595 1.960744 \n", "5 17.229334 18.293044 3.807255 3.702808 0.942347 \n", "6 31.271871 29.658821 12.527746 12.387136 6.130205 \n", "7 12.814069 13.407670 4.136237 6.010481 1.613199 \n", "8 11.059466 12.955444 5.129749 7.033604 1.932183 \n", "9 30.179679 30.755570 8.879519 11.757333 2.910979 \n", "\n", " P_ED_80_MAS_M P_EST_RESTO_M renta_hab_disp renta_hog_neta \\\n", "0 2.897054 0.389230 12202.757585 20100.857565 \n", "1 8.038985 0.988062 11932.566934 26709.812769 \n", "2 2.024719 0.291117 7583.815766 22807.725689 \n", "3 2.868729 0.253941 10387.578016 22258.276526 \n", "4 2.919654 0.410322 9230.864235 21030.354642 \n", "5 1.508500 0.301092 15206.522731 26196.021803 \n", "6 8.422724 0.416778 9131.446387 22211.580725 \n", "7 2.861709 0.581030 10767.001377 26619.780114 \n", "8 2.946802 0.242906 8990.674991 29331.237669 \n", "9 4.618177 0.256679 9363.256499 21301.462819 \n", "\n", " renta_tit_disp renta_tit_bruta P_EST_4_2_GRADO_M P_EST_5_3_GRADO_M \\\n", "0 14462.460915 17616.981223 0.499176 0.111595 \n", "1 19217.569299 23409.263436 0.006491 0.005447 \n", "2 16410.038243 19989.359853 0.597157 0.111726 \n", "3 16014.712470 19507.806488 0.610700 0.135358 \n", "4 15131.229156 18431.619728 0.515851 0.073828 \n", "5 18847.899411 22958.961961 0.630520 0.068388 \n", "6 15981.115088 19466.880918 0.484402 0.098820 \n", "7 19152.791279 23330.356176 0.295456 0.123514 \n", "8 21103.670678 25706.757117 0.528009 0.229085 \n", "9 15326.290058 18669.226887 0.644514 0.098808 \n", "\n", " C02_09_GASTO_M_ocio_M C02_06_GASTO_M_salud_M C02_12_GASTO_M_otros_M \\\n", "0 2221.969272 800.421305 2229.070279 \n", "1 2156.230515 771.555900 2779.220510 \n", "2 2559.120567 680.158167 1884.000000 \n", "3 2426.496024 654.880189 2875.504119 \n", "4 2341.058219 646.455593 1946.375596 \n", "5 2455.607492 661.780195 2471.410572 \n", "6 2244.234898 740.458090 2297.939237 \n", "7 2363.386639 678.923939 2879.000000 \n", "8 2409.842055 715.770809 1994.852864 \n", "9 2486.019486 641.856765 2879.000000 \n", "\n", " C02_03_GASTO_M_textil_M C02_10_GASTO_M_ensena_M C02_11_GASTO_M_horeca_M \\\n", "0 1787.518852 221.568951 2746.878966 \n", "1 1691.092649 193.623252 2609.000000 \n", "2 1836.091586 423.418838 3389.951794 \n", "3 1770.786627 280.563635 2609.000000 \n", "4 1817.761132 216.990783 2713.951569 \n", "5 1978.238066 464.362718 3271.408753 \n", "6 1711.685763 141.415802 1862.537500 \n", "7 1743.381029 255.110422 2144.928980 \n", "8 1793.226386 336.881211 2609.000000 \n", "9 1816.883883 420.358591 3391.000000 \n", "\n", " C02_02_GASTO_M_bebidas_M C02_05_GASTO_M_muebles_M \\\n", "0 627.479286 1390.049665 \n", "1 560.790967 1424.372807 \n", "2 546.312933 1423.716749 \n", "3 487.381885 1363.928508 \n", "4 531.613987 1439.418479 \n", "5 570.580945 1538.325215 \n", "6 614.929750 1266.804557 \n", "7 525.404536 1320.476423 \n", "8 592.512128 1375.257446 \n", "9 540.780005 1323.198738 \n", "\n", " C02_04_GASTO_M_vivienda_M C02_07_GASTO_M_transporte_M \\\n", "0 6942.066165 4156.128864 \n", "1 7148.106660 4006.054925 \n", "2 6341.069319 3706.069530 \n", "3 8477.911559 4339.116312 \n", "4 6568.585994 3874.252098 \n", "5 7026.862516 3642.686800 \n", "6 6851.581654 3714.899753 \n", "7 6952.603962 3723.978057 \n", "8 6410.553981 4015.768641 \n", "9 7017.370248 4413.282917 \n", "\n", " C02_01_GASTO_M_alimentacion_M C02_08_GASTO_M_comunicaciones_M \n", "0 4465.605857 990.745515 \n", "1 4364.704902 934.180016 \n", "2 4154.231413 1016.996320 \n", "3 4546.185871 981.843483 \n", "4 4230.469594 999.165857 \n", "5 4136.990977 1097.559424 \n", "6 4079.128157 965.067165 \n", "7 3545.675798 965.785980 \n", "8 4377.518211 985.091840 \n", "9 4442.802609 977.844271 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### Explore and identify the variables of interest\n", "\n", "We can get a detailed description of every variable." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 176 }, "id": "UGUv13f5IQ42", "outputId": "889ba767-ed3d-4c4a-800c-c20de1a6bcde" }, "outputs": [ { "data": { "text/plain": [ "{'slug': 'C02_01_GASTO_M__7ad08d93',\n", " 'name': 'C02_01_GASTO_M_alimentacion_M',\n", " 'description': 'Average household spend by the 12 COICOP categories',\n", " 'db_type': 'FLOAT',\n", " 'agg_method': 'AVG',\n", " 'column_name': 'C02_01_GASTO_M_alimentacion_M',\n", " 'variable_group_id': 'carto-do.unica360.demographics_sociodemographics_esp_grid100x100m_2019_yearly_2019.average_household_spend_by_product_category',\n", " 'dataset_id': 'carto-do.unica360.demographics_sociodemographics_esp_grid100x100m_2019_yearly_2019',\n", " 'id': 'carto-do.unica360.demographics_sociodemographics_esp_grid100x100m_2019_yearly_2019.C02_01_GASTO_M_alimentacion_M'}" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Variable.get('C02_01_GASTO_M__7ad08d93').to_dict()" ] }, { "cell_type": "markdown", "metadata": { "id": "PtDRHVU4LgC4" }, "source": [ "We decide to enrich our dataframe with the following variables." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "id": "pgbOiCnwIQ43" }, "outputs": [], "source": [ "vars_enrichment = ['P_T_9be2c6a7',\n", " 'P_ED_00_14_M_b66ee9e9', 'P_ED_00_14_H_c6041d66', 'P_ED_15_24_M_5261dc00', 'P_ED_15_24_H_220b288f',\n", " 'P_ED_25_44_M_46e29941', 'P_ED_25_44_H_36886dce', 'P_ED_45_64_M_8f3b64f0', 'P_ED_45_64_H_ff51907f',\n", " 'P_ED_65_79_M_a8c081ef', 'P_ED_65_79_H_d8aa7560', 'P_ED_80_MAS_M_c1c729f7', 'P_ED_80_MAS_H_b1addd78',\n", " 'renta_hab_disp_e4a8896c', 'C02_01_GASTO_M__7ad08d93']" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "QB9SVPQkIQ49" }, "outputs": [], "source": [ "enriched_dataset_gdf = enrichment.enrich_polygons(\n", " stores,\n", " variables=vars_enrichment,\n", " geom_col='iso_5walk'\n", ")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 467 }, "id": "UpZREiC_IQ4_", "outputId": "bc5dd419-6f38-4a0a-d346-848e842a2da6" }, "outputs": [ { "data": { "text/html": [ "
          \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          namemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgeometryiso_5walkiso_10walkn_poisn_competitorsp_ed_00_14_hp_ed_15_24_hp_ed_25_44_hp_ed_45_64_hp_ed_65_79_hc02_01_gasto_m_alimentacion_mrenta_hab_dispp_tp_ed_80_mas_hp_ed_00_14_mp_ed_15_24_mp_ed_25_44_mp_ed_45_64_mp_ed_65_79_mp_ed_80_mas_m
          0SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 28004POINT (-3.70394 40.42763)MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...217521241323.344538216.3933971759.7960461065.119112350.1381293664.56261619147.5185168098.325095128.170807329.781022291.5516341657.0044541096.365846513.270507367.389603
          1ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 28023POINT (-3.78261 40.45914)MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...6450199210.641714149.459618335.942818358.687113135.8509544390.01059819620.9537842805.70850674.379705213.150064152.993159411.966220436.088810196.180844130.367487
          2RAMIREZ DEL PRADOCALLE DE JUAN DE MARIANA 828045MADRIDCOMUNIDAD DE MADRID19.27CALLE DE JUAN DE MARIANA 8, 28045POINT (-3.68693 40.39849)MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40...MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40...2622108169.482196108.534454281.058539286.75359450.5992754048.95125219705.2192631830.63055210.075373139.002134110.586285275.670275312.00122862.22228624.644911
          3PASTORA IMPERIO. 1CALLE PASTORA IMPERIO 128036MADRIDCOMUNIDAD DE MADRID17.53CALLE PASTORA IMPERIO 1, 28036POINT (-3.67468 40.48099)MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40...MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40...277793156.911911131.700549285.061523289.686083154.9475904608.22145623047.0142562194.43312933.851479142.951123129.396225286.993301358.034212177.39981147.499323
          4BÉJARCALLE DE BÉJAR 128028MADRIDCOMUNIDAD DE MADRID25.91CALLE DE BÉJAR 1, 28028POINT (-3.67456 40.43516)MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40...MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40...165081034358.786956289.8563631243.738389840.314021444.0939993843.13797320587.5241147697.106579190.240250365.406908302.1779661410.3800311100.172199691.146655460.792842
          \n", "
          " ], "text/plain": [ " name mainaddressline postcode areaname2 \\\n", "0 SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 MADRID \n", "1 ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 MADRID \n", "2 RAMIREZ DEL PRADO CALLE DE JUAN DE MARIANA 8 28045 MADRID \n", "3 PASTORA IMPERIO. 1 CALLE PASTORA IMPERIO 1 28036 MADRID \n", "4 BÉJAR CALLE DE BÉJAR 1 28028 MADRID \n", "\n", " areaname1 annual_sales complete_address \\\n", "0 COMUNIDAD DE MADRID 22.46 CALLE DEL DIVINO PASTOR 76, 28004 \n", "1 COMUNIDAD DE MADRID 17.21 AVENIDA DE LA OSA MAYOR 62, 28023 \n", "2 COMUNIDAD DE MADRID 19.27 CALLE DE JUAN DE MARIANA 8, 28045 \n", "3 COMUNIDAD DE MADRID 17.53 CALLE PASTORA IMPERIO 1, 28036 \n", "4 COMUNIDAD DE MADRID 25.91 CALLE DE BÉJAR 1, 28028 \n", "\n", " geometry \\\n", "0 POINT (-3.70394 40.42763) \n", "1 POINT (-3.78261 40.45914) \n", "2 POINT (-3.68693 40.39849) \n", "3 POINT (-3.67468 40.48099) \n", "4 POINT (-3.67456 40.43516) \n", "\n", " iso_5walk \\\n", "0 MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... \n", "1 MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... \n", "2 MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... \n", "3 MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40... \n", "4 MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40... \n", "\n", " iso_10walk n_pois n_competitors \\\n", "0 MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... 21752 1241 \n", "1 MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... 6450 199 \n", "2 MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... 2622 108 \n", "3 MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40... 2777 93 \n", "4 MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40... 16508 1034 \n", "\n", " p_ed_00_14_h p_ed_15_24_h p_ed_25_44_h p_ed_45_64_h p_ed_65_79_h \\\n", "0 323.344538 216.393397 1759.796046 1065.119112 350.138129 \n", "1 210.641714 149.459618 335.942818 358.687113 135.850954 \n", "2 169.482196 108.534454 281.058539 286.753594 50.599275 \n", "3 156.911911 131.700549 285.061523 289.686083 154.947590 \n", "4 358.786956 289.856363 1243.738389 840.314021 444.093999 \n", "\n", " c02_01_gasto_m_alimentacion_m renta_hab_disp p_t p_ed_80_mas_h \\\n", "0 3664.562616 19147.518516 8098.325095 128.170807 \n", "1 4390.010598 19620.953784 2805.708506 74.379705 \n", "2 4048.951252 19705.219263 1830.630552 10.075373 \n", "3 4608.221456 23047.014256 2194.433129 33.851479 \n", "4 3843.137973 20587.524114 7697.106579 190.240250 \n", "\n", " p_ed_00_14_m p_ed_15_24_m p_ed_25_44_m p_ed_45_64_m p_ed_65_79_m \\\n", "0 329.781022 291.551634 1657.004454 1096.365846 513.270507 \n", "1 213.150064 152.993159 411.966220 436.088810 196.180844 \n", "2 139.002134 110.586285 275.670275 312.001228 62.222286 \n", "3 142.951123 129.396225 286.993301 358.034212 177.399811 \n", "4 365.406908 302.177966 1410.380031 1100.172199 691.146655 \n", "\n", " p_ed_80_mas_m \n", "0 367.389603 \n", "1 130.367487 \n", "2 24.644911 \n", "3 47.499323 \n", "4 460.792842 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores = enriched_dataset_gdf\n", "stores.crs = 'epsg:4326'\n", "stores.columns = map(str.lower, stores.columns)\n", "stores.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "NLNa_ll8IQ5B" }, "source": [ "##### 3.2.3 Spatial lag variables\n", "\n", "In this section, we'll calculate the following spatial lag variables:\n", " - _Distance Madrid city center (Puerta del Sol)_ \n", " \n", " In the city of Madrid, all touristic places are close to the Puerta del Sol site. This variable measures how close the store is to touristic places.\n", " - _Distance to the closest Carrefour Express_\n", " \n", "Other interesting spatial lag variables would be the average distance to the 3 closest competitors or the average revenue of the 2 closest Carrefour Express stores, just to mention some extra examples." ] }, { "cell_type": "markdown", "metadata": { "id": "rFEeIqytIQ5B" }, "source": [ "###### 3.2.3.1 Distance to Puerta del Sol" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "id": "6B3qLohFIQ5B" }, "outputs": [], "source": [ "madrid_city_center = Point(-3.703367, 40.416892)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "id": "x1YAju4VIQ5F" }, "outputs": [], "source": [ "proj_in = pyproj.Proj('epsg:4326')\n", "proj_out = pyproj.Proj('epsg:25830')\n", "project = pyproj.Transformer.from_proj(proj_in, proj_out).transform" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "id": "1_ntjVKYIQ5H" }, "outputs": [], "source": [ "stores['dist_cc'] = stores.set_geometry('geometry').to_crs('epsg:25830').distance(\n", " Point(project(madrid_city_center.y, madrid_city_center.x))).values" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 227 }, "id": "EA9gblI1IQ5J", "outputId": "50518a68-5df8-4e65-8b12-10e370c34bda" }, "outputs": [ { "data": { "text/html": [ "
          \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          namemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgeometryiso_5walkiso_10walkn_poisn_competitorsp_ed_00_14_hp_ed_15_24_hp_ed_25_44_hp_ed_45_64_hp_ed_65_79_hc02_01_gasto_m_alimentacion_mrenta_hab_dispp_tp_ed_80_mas_hp_ed_00_14_mp_ed_15_24_mp_ed_25_44_mp_ed_45_64_mp_ed_65_79_mp_ed_80_mas_mdist_cc
          0SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 28004POINT (-3.70394 40.42763)MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...217521241323.344538216.3933971759.7960461065.119112350.1381293664.56261619147.5185168098.325095128.170807329.781022291.5516341657.0044541096.365846513.270507367.3896031192.943298
          1ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 28023POINT (-3.78261 40.45914)MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...6450199210.641714149.459618335.942818358.687113135.8509544390.01059819620.9537842805.70850674.379705213.150064152.993159411.966220436.088810196.180844130.3674878195.490184
          \n", "
          " ], "text/plain": [ " name mainaddressline postcode areaname2 \\\n", "0 SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 MADRID \n", "1 ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 MADRID \n", "\n", " areaname1 annual_sales complete_address \\\n", "0 COMUNIDAD DE MADRID 22.46 CALLE DEL DIVINO PASTOR 76, 28004 \n", "1 COMUNIDAD DE MADRID 17.21 AVENIDA DE LA OSA MAYOR 62, 28023 \n", "\n", " geometry \\\n", "0 POINT (-3.70394 40.42763) \n", "1 POINT (-3.78261 40.45914) \n", "\n", " iso_5walk \\\n", "0 MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... \n", "1 MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... \n", "\n", " iso_10walk n_pois n_competitors \\\n", "0 MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... 21752 1241 \n", "1 MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... 6450 199 \n", "\n", " p_ed_00_14_h p_ed_15_24_h p_ed_25_44_h p_ed_45_64_h p_ed_65_79_h \\\n", "0 323.344538 216.393397 1759.796046 1065.119112 350.138129 \n", "1 210.641714 149.459618 335.942818 358.687113 135.850954 \n", "\n", " c02_01_gasto_m_alimentacion_m renta_hab_disp p_t p_ed_80_mas_h \\\n", "0 3664.562616 19147.518516 8098.325095 128.170807 \n", "1 4390.010598 19620.953784 2805.708506 74.379705 \n", "\n", " p_ed_00_14_m p_ed_15_24_m p_ed_25_44_m p_ed_45_64_m p_ed_65_79_m \\\n", "0 329.781022 291.551634 1657.004454 1096.365846 513.270507 \n", "1 213.150064 152.993159 411.966220 436.088810 196.180844 \n", "\n", " p_ed_80_mas_m dist_cc \n", "0 367.389603 1192.943298 \n", "1 130.367487 8195.490184 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores.head(2)" ] }, { "cell_type": "markdown", "metadata": { "id": "r0FBBGWyIQ5K" }, "source": [ "###### 3.2.3.2 Distance to closest Carrefour Express" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "dist_array = cdist(stores.set_geometry('geometry').to_crs('epsg:25830').geometry.apply(lambda point:[point.x, point.y]).tolist(),\n", " stores.set_geometry('geometry').to_crs('epsg:25830').geometry.apply(lambda point:[point.x, point.y]).tolist())" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "stores['distance_closest_ce'] = list(map(lambda dist_a:np.max(np.partition(dist_a, 2)[:2]), dist_array))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3 Visualize enrichment" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
          \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          namemainaddresslinepostcodeareaname2areaname1annual_salescomplete_addressgeometryiso_5walkiso_10walkn_poisn_competitorsp_ed_00_14_hp_ed_15_24_hp_ed_25_44_hp_ed_45_64_hp_ed_65_79_hc02_01_gasto_m_alimentacion_mrenta_hab_dispp_tp_ed_80_mas_hp_ed_00_14_mp_ed_15_24_mp_ed_25_44_mp_ed_45_64_mp_ed_65_79_mp_ed_80_mas_mdist_ccdistance_closest_ce
          0SAN BERNARDO 76CALLE DEL DIVINO PASTOR 7628004MADRIDCOMUNIDAD DE MADRID22.46CALLE DEL DIVINO PASTOR 76, 28004POINT (-3.70394 40.42763)MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40...MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40...217521241323.344538216.3933971759.7960461065.119112350.1381293664.56261619147.5185168098.325095128.170807329.781022291.5516341657.0044541096.365846513.270507367.3896031192.943298107.360057
          1ARAVACAAVENIDA DE LA OSA MAYOR 6228023MADRIDCOMUNIDAD DE MADRID17.21AVENIDA DE LA OSA MAYOR 62, 28023POINT (-3.78261 40.45914)MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40...MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40...6450199210.641714149.459618335.942818358.687113135.8509544390.01059819620.9537842805.70850674.379705213.150064152.993159411.966220436.088810196.180844130.3674878195.4901841570.310251
          2RAMIREZ DEL PRADOCALLE DE JUAN DE MARIANA 828045MADRIDCOMUNIDAD DE MADRID19.27CALLE DE JUAN DE MARIANA 8, 28045POINT (-3.68693 40.39849)MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40...MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40...2622108169.482196108.534454281.058539286.75359450.5992754048.95125219705.2192631830.63055210.075373139.002134110.586285275.670275312.00122862.22228624.6449112473.4204161228.590695
          3PASTORA IMPERIO. 1CALLE PASTORA IMPERIO 128036MADRIDCOMUNIDAD DE MADRID17.53CALLE PASTORA IMPERIO 1, 28036POINT (-3.67468 40.48099)MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40...MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40...277793156.911911131.700549285.061523289.686083154.9475904608.22145623047.0142562194.43312933.851479142.951123129.396225286.993301358.034212177.39981147.4993237519.491317692.176192
          4BÉJARCALLE DE BÉJAR 128028MADRIDCOMUNIDAD DE MADRID25.91CALLE DE BÉJAR 1, 28028POINT (-3.67456 40.43516)MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40...MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40...165081034358.786956289.8563631243.738389840.314021444.0939993843.13797320587.5241147697.106579190.240250365.406908302.1779661410.3800311100.172199691.146655460.7928423175.485661426.405602
          \n", "
          " ], "text/plain": [ " name mainaddressline postcode areaname2 \\\n", "0 SAN BERNARDO 76 CALLE DEL DIVINO PASTOR 76 28004 MADRID \n", "1 ARAVACA AVENIDA DE LA OSA MAYOR 62 28023 MADRID \n", "2 RAMIREZ DEL PRADO CALLE DE JUAN DE MARIANA 8 28045 MADRID \n", "3 PASTORA IMPERIO. 1 CALLE PASTORA IMPERIO 1 28036 MADRID \n", "4 BÉJAR CALLE DE BÉJAR 1 28028 MADRID \n", "\n", " areaname1 annual_sales complete_address \\\n", "0 COMUNIDAD DE MADRID 22.46 CALLE DEL DIVINO PASTOR 76, 28004 \n", "1 COMUNIDAD DE MADRID 17.21 AVENIDA DE LA OSA MAYOR 62, 28023 \n", "2 COMUNIDAD DE MADRID 19.27 CALLE DE JUAN DE MARIANA 8, 28045 \n", "3 COMUNIDAD DE MADRID 17.53 CALLE PASTORA IMPERIO 1, 28036 \n", "4 COMUNIDAD DE MADRID 25.91 CALLE DE BÉJAR 1, 28028 \n", "\n", " geometry \\\n", "0 POINT (-3.70394 40.42763) \n", "1 POINT (-3.78261 40.45914) \n", "2 POINT (-3.68693 40.39849) \n", "3 POINT (-3.67468 40.48099) \n", "4 POINT (-3.67456 40.43516) \n", "\n", " iso_5walk \\\n", "0 MULTIPOLYGON (((-3.70660 40.42780, -3.70651 40... \n", "1 MULTIPOLYGON (((-3.78539 40.45990, -3.78496 40... \n", "2 MULTIPOLYGON (((-3.68875 40.39965, -3.68849 40... \n", "3 MULTIPOLYGON (((-3.67553 40.48325, -3.67493 40... \n", "4 MULTIPOLYGON (((-3.67621 40.43724, -3.67561 40... \n", "\n", " iso_10walk n_pois n_competitors \\\n", "0 MULTIPOLYGON (((-3.70986 40.42900, -3.70926 40... 21752 1241 \n", "1 MULTIPOLYGON (((-3.78900 40.45938, -3.78874 40... 6450 199 \n", "2 MULTIPOLYGON (((-3.69149 40.39690, -3.69141 40... 2622 108 \n", "3 MULTIPOLYGON (((-3.67553 40.48462, -3.67510 40... 2777 93 \n", "4 MULTIPOLYGON (((-3.68033 40.43655, -3.67990 40... 16508 1034 \n", "\n", " p_ed_00_14_h p_ed_15_24_h p_ed_25_44_h p_ed_45_64_h p_ed_65_79_h \\\n", "0 323.344538 216.393397 1759.796046 1065.119112 350.138129 \n", "1 210.641714 149.459618 335.942818 358.687113 135.850954 \n", "2 169.482196 108.534454 281.058539 286.753594 50.599275 \n", "3 156.911911 131.700549 285.061523 289.686083 154.947590 \n", "4 358.786956 289.856363 1243.738389 840.314021 444.093999 \n", "\n", " c02_01_gasto_m_alimentacion_m renta_hab_disp p_t p_ed_80_mas_h \\\n", "0 3664.562616 19147.518516 8098.325095 128.170807 \n", "1 4390.010598 19620.953784 2805.708506 74.379705 \n", "2 4048.951252 19705.219263 1830.630552 10.075373 \n", "3 4608.221456 23047.014256 2194.433129 33.851479 \n", "4 3843.137973 20587.524114 7697.106579 190.240250 \n", "\n", " p_ed_00_14_m p_ed_15_24_m p_ed_25_44_m p_ed_45_64_m p_ed_65_79_m \\\n", "0 329.781022 291.551634 1657.004454 1096.365846 513.270507 \n", "1 213.150064 152.993159 411.966220 436.088810 196.180844 \n", "2 139.002134 110.586285 275.670275 312.001228 62.222286 \n", "3 142.951123 129.396225 286.993301 358.034212 177.399811 \n", "4 365.406908 302.177966 1410.380031 1100.172199 691.146655 \n", "\n", " p_ed_80_mas_m dist_cc distance_closest_ce \n", "0 367.389603 1192.943298 107.360057 \n", "1 130.367487 8195.490184 1570.310251 \n", "2 24.644911 2473.420416 1228.590695 \n", "3 47.499323 7519.491317 692.176192 \n", "4 460.792842 3175.485661 426.405602 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stores.head()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " None\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " Static map image\n", " \n", " \n", " \n", "\n", "\n", " \n", "
          \n", "
          \n", "
          \n", " \n", " \n", "
          \n", "
          \n", " \n", "\n", "
          \n", " \n", " \n", " \n", " \n", " \n", "
          \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          \n", " \n", " \n", "
          \n", "
          \n", "
          \n", "
          \n", " \n", "
          \n", "
          \n", "
          \n", "\n", " \n", "\n", "
          \n", "
          \n", " :\n", "
          \n", " \n", " \n", "
          \n", "
          \n", "\n", "
          \n", " StackTrace\n", "
            \n", "
            \n", "
            \n", "\n", "\n", "\n", "\n", "\n", "\">\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Map(Layer(stores,\n", " geom_col='iso_5walk',\n", " style=color_bins_style('n_competitors'),\n", " legends=color_bins_legend('# Competitors', 'competitos within 10-minute driving isochrone'),\n", " popup_hover=[popup_element('name', 'Name'),\n", " popup_element('n_pois', 'Number of POIs'),\n", " popup_element('n_competitors', 'Number of competitors'),\n", " popup_element('p_t', 'Population coverage'),\n", " popup_element('c02_01_gasto_m_alimentacion_m', 'Groceries spending'),\n", " popup_element('renta_hab_disp', 'income'),\n", " popup_element('distance_closest_ce', 'Distance to closest CE')],\n", " widgets=[histogram_widget('n_pois', 'Number of POIs', description='Select a range of values to filter', buckets=10),\n", " histogram_widget('n_competitors', 'Number of competitors', description='Select a range of values to filter', buckets=10),\n", " histogram_widget('dist_cc', 'Distance to city center', description='Select a range of values to filter', buckets=10),\n", " histogram_widget('distance_closest_ce', 'Distance to closest CE store', description='Select a range of values to filter', buckets=10)]))" ] }, { "cell_type": "markdown", "metadata": { "id": "MbiVz2U4IQ5V" }, "source": [ "### 4. Modeling\n", "\n", "Once we have calculated our store's cathcment areas and enriched them, the next steps required to build a predictive model would be:\n", " - Data Processing\n", " - Multivariate Analysis\n", " - Feature engineering\n", " - Build and assess the model\n", " - Project the model on a grid of the area of interest\n", "\n", "We suggest using [Regression-kriging](https://en.wikipedia.org/wiki/Regression-kriging) as prediction technique that combines a regression of the dependent variable on auxiliary variables with kriging of the regression residuals. \n", "\n", "If you are interested in learning more about Revenue Prediction for Site Selection and how Regression-kriging is applied, take a look at this [blog post](https://carto.com/blog/retail-revenue-prediction-data-science/) where we describe an end-to-end revenue prediction use case." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "[CARTO] Revenue prediction template_reduced.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 4 }