{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Data discovery\n", "\n", "### Introduction\n", "\n", "The Data Observatory is a spatial data repository that enables data scientists to augment their data and broaden their analysis. It offers a wide range of datasets from around the globe.\n", "\n", "This guide is intended for those who want to start augmenting their own data using CARTOframes and wish to explore CARTO's public Data Observatory catalog to find datasets that best fit their use cases and analyses.\n", "\n", "**Note: The catalog is public and you don't need a CARTO account to search for available datasets**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find demographic data for the US\n", "\n", "In this guide we walk through the Data Observatory catalog looking for demographics data in the US.\n", "\n", "The catalog is comprised of thousands of curated spatial datasets, so when searching for data the easiest way to find what you are looking for is to make use of a faceted search. 
A faceted (or hierarchical) search allows you to narrow down search results by applying multiple filters based on a faceted classification of the catalog datasets.\n", "\n", "Datasets are organized in three main hierarchies:\n", "\n", "- Country\n", "- Category\n", "- Geography (or spatial resolution)\n", "\n", "For our analysis we are looking for demographic datasets in the US with a spatial resolution at the block group level.\n", "\n", "We can start by discovering which geographies (or spatial resolutions) are available for demographic data in the US, by filtering the `catalog` by `country` and `category` and listing the available `geographies`.\n", "\n", "Let's start by exploring the available categories of data for the US:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cartoframes.data.observatory import Catalog\n", "Catalog().country('usa').categories" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the US, the Data Observatory provides six different categories of datasets. Let's discover the available spatial resolutions for the demographics category (which, at first sight, should contain the population data we need)."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cartoframes.data.observatory import Catalog\n", "geographies = Catalog().country('usa').category('demographics').geographies\n", "geographies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's filter the geographies down to those that contain information at the block group level. To do so, we convert the geographies to a pandas `DataFrame` and search for the string `blockgroup` in the `id` of each geography:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id \\\n", "2 carto-do.mbi.geography_usa_blockgroups_2019 \n", "6 carto-do-public-data.usa_carto.geography_usa_b... \n", "\n", " slug name \\\n", "2 mbi_blockgroups_1ab060a USA - Blockgroups \n", "6 usct_blockgroup_f45b6b49 Census Block Groups (2015) - shoreline clipped \n", "\n", " description country_id provider_id \\\n", "2 MBI Digital Boundaries for USA at Blockgroups ... usa mbi \n", "6 Shoreline clipped TIGER/Line boundaries. More ... usa usa_carto \n", "\n", " provider_name lang geom_coverage \\\n", "2 Michael Bauer International eng None \n", "6 CARTO shoreline-clipped USA Tiger geographies eng None \n", "\n", " geom_type update_frequency version is_public_data \n", "2 MULTIPOLYGON None 2019 False \n", "6 MULTIPOLYGON None 2015 True " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = geographies.to_dataframe()\n", "df[df['id'].str.contains('blockgroup', case=False, na=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The filter returns two block group geographies, from two providers: Michael Bauer International (MBI) and CARTO (shoreline-clipped TIGER/Line boundaries). For this example, we are going to look for demographic datasets for the MBI blockgroups geography `mbi_blockgroups_1ab060a`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets = Catalog().country('usa').category('demographics').geography('mbi_blockgroups_1ab060a').datasets\n", "datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's continue with the data discovery. We have 9 datasets in the US with demographics information at the level of MBI blockgroups:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id \\\n", "0 carto-do.mbi.demographics_householdsbytype_usa... \n", "1 carto-do.mbi.demographics_population_usa_block... \n", "2 carto-do.mbi.demographics_purchasingpower_usa_... \n", "3 carto-do.mbi.demographics_consumerspending_usa... \n", "4 carto-do.mbi.demographics_sociodemographics_us... \n", "5 carto-do.mbi.demographics_education_usa_blockg... \n", "6 carto-do.mbi.demographics_householdsbyincomequ... \n", "7 carto-do.mbi.demographics_retailspending_usa_b... \n", "8 carto-do.mbi.demographics_consumerprofiles_usa... \n", "\n", " slug \\\n", "0 mbi_households__45067b14 \n", "1 mbi_population_341ee33b \n", "2 mbi_purchasing__53ab279d \n", "3 mbi_consumer_sp_54c4abc3 \n", "4 mbi_sociodemogr_b5516832 \n", "5 mbi_education_20063878 \n", "6 mbi_households__c943a740 \n", "7 mbi_retail_spen_c31f0ba0 \n", "8 mbi_consumer_pr_68d1265a \n", "\n", " name \\\n", "0 Households By Type at Blockgroups (micro) leve... \n", "1 Population at Blockgroups (micro) level for USA \n", "2 Purchasing Power at Blockgroups (micro) level ... \n", "3 Consumer Spending at Blockgroups (micro) level... \n", "4 Sociodemographics at Blockgroups (micro) level... \n", "5 Education at Blockgroups (micro) level for USA \n", "6 Households By Income Quintiles at Blockgroups ... \n", "7 Retail Spending at Blockgroups (micro) level f... \n", "8 Consumer Profiles at Blockgroups (micro) level... \n", "\n", " description country_id \\\n", "0 Data is country-specific. usa \n", "1 Population figures are shown as projected aver... usa \n", "2 Purchasing Power describes the disposable inco... usa \n", "3 MBI Consumer Spending by product groups quanti... usa \n", "4 MBI Sociodemographics includes:\\n- Population\\... usa \n", "5 Data is country-specific. usa \n", "6 On the national level the number of households... usa \n", "7 Retail Spending relates to the proportion of P... usa \n", "8 The MB International Consumer Styles describe ... 
usa \n", "\n", " geography_id geography_name \\\n", "0 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "1 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "2 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "3 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "4 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "5 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "6 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "7 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "8 carto-do.mbi.geography_usa_blockgroups_2019 USA - Blockgroups \n", "\n", " geography_description category_id \\\n", "0 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "1 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "2 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "3 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "4 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "5 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "6 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "7 MBI Digital Boundaries for USA at Blockgroups ... demographics \n", "8 MBI Digital Boundaries for USA at Blockgroups ... 
demographics \n", "\n", " category_name provider_id provider_name \\\n", "0 Demographics mbi Michael Bauer International \n", "1 Demographics mbi Michael Bauer International \n", "2 Demographics mbi Michael Bauer International \n", "3 Demographics mbi Michael Bauer International \n", "4 Demographics mbi Michael Bauer International \n", "5 Demographics mbi Michael Bauer International \n", "6 Demographics mbi Michael Bauer International \n", "7 Demographics mbi Michael Bauer International \n", "8 Demographics mbi Michael Bauer International \n", "\n", " data_source_id lang temporal_aggregation \\\n", "0 households_by_type eng yearly \n", "1 population eng yearly \n", "2 purchasing_power eng yearly \n", "3 consumer_spending eng yearly \n", "4 sociodemographics eng yearly \n", "5 education eng yearly \n", "6 households_by_income_quintiles eng yearly \n", "7 retail_spending eng yearly \n", "8 consumer_profiles eng yearly \n", "\n", " time_coverage update_frequency version is_public_data \n", "0 [2019-01-01, 2020-01-01) None 2019 False \n", "1 [2019-01-01, 2020-01-01) None 2019 False \n", "2 [2019-01-01, 2020-01-01) None 2019 False \n", "3 [2019-01-01, 2020-01-01) None 2019 False \n", "4 [2019-01-01, 2020-01-01) None 2019 False \n", "5 [2019-01-01, 2020-01-01) None 2019 False \n", "6 [2019-01-01, 2020-01-01) None 2019 False \n", "7 [2019-01-01, 2020-01-01) None 2019 False \n", "8 [2019-01-01, 2020-01-01) None 2019 False " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets.to_dataframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "They cover different kinds of information: consumer spending, retail potential, consumer profiles, etc.\n", "\n", "At first sight, it looks like the dataset with `data_source_id: sociodemographic` might contain the population information we are looking for. 
Let's try to understand a little bit better what data this dataset contains by looking at its variables:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[ #'Population age 0-4 (2019A)',\n", " #'Population age 5-9 (2019A)',\n", " #'Population age 10-14 (2019A)',\n", " #'Population age 15-19 (2019A)',\n", " #'Population age 20-24 (2019A)',\n", " #'Population age 25-29 (2019A)',\n", " #'Population age 30-34 (2019A)',\n", " #'Population age 35-39 (2019A)',\n", " #'Population age 40-44 (2019A)',\n", " #'Population age 45-49 (2019A)',\n", " #'Population age 50-54 (2019A)',\n", " #'Population age 55-59 (2019A)',\n", " #'Population age 60-64 (2019A)',\n", " #'Population age 65-69 (2019A)',\n", " #'Population age 70-74 (2019A)',\n", " #'Population age 75-79 (2019A)',\n", " #'Population age 80-84 (2019A)',\n", " #'Population Age 15+ (2019A)',\n", " #'Population Age 25+ (2019A)',\n", " #'Population age 85+ (2019A)',\n", " #'Median Age (2019A)',\n", " #'Median Age (2024A)',\n", " #'Geographic Identifier',\n", " #'Housing units (2019A)',\n", " #'Occupied units owner (2019A)',\n", " #'Occupied units renter (2019A)',\n", " #'Housing units vacant (2019A)',\n", " #'Housing units (2024A)',\n", " #'Pop 25+ Associate degree (2019A)',\n", " #'Pop 25+ Bachelors degree (2019A)',\n", " #'Pop 25+ graduate or prof school degree (2019A)',\n", " #'Pop 25+ HS graduate (2019A)',\n", " #'Pop 25+ less than 9th grade (2019A)',\n", " #'Pop 25+ college no diploma (2019A)',\n", " #'Pop 25+ 9th-12th grade no diploma (2019A)',\n", " #'Households (2019A)',\n", " #'Average Household Size (2019A)',\n", " #'Family Households (2019A)',\n", " #'Median Age of Householder (2019A)',\n", " #'Households (2024A)',\n", " #'Families female no husband children (2019A)',\n", " #'Families male no wife w children (2019A)',\n", " #'Families married couple w children (2019A)',\n", " #'Household Income $100000-$124999 (2019A)',\n", " #'Household Income 
$10000-$14999 (2019A)',\n", " #'Household Income $125000-$149999 (2019A)',\n", " #'Household Income $150000-$199999 (2019A)',\n", " #'Household Income $15000-$19999 (2019A)',\n", " #'Household Income $20000-$24999 (2019A)',\n", " #'Household Income $25000-$29999 (2019A)',\n", " #'Household Income $30000-$34999 (2019A)',\n", " #'Household Income $35000-$39999 (2019A)',\n", " #'Household Income $40000-$44999 (2019A)',\n", " #'Household Income $45000-$49999 (2019A)',\n", " #'Household Income $50000-$59999 (2019A)',\n", " #'Household Income $60000-$74999 (2019A)',\n", " #'Household Income $75000-$99999 (2019A)',\n", " #'Household Income > $200000 (2019A)',\n", " #'Household Income < $10000 (2019A)',\n", " #'Median Household Income: Age < 25 (2019A)',\n", " #'Median Household Income: Age 25-34 (2019A)',\n", " #'Median Household Income: Age 35-44 (2019A)',\n", " #'Median Household Income: Age 45-54 (2019A)',\n", " #'Median Household Income: Age 55-64 (2019A)',\n", " #'Median Household Income: Age 65-74 (2019A)',\n", " #'Median Household Income: Age 75+ (2019A)',\n", " #'Population Hispanic (2019A)',\n", " #'Median Value of Owner Occupied Housing Units',\n", " #'UNITS IN STRUCTURE: 1 DETACHED',\n", " #'UNITS IN STRUCTURE: 20 OR MORE',\n", " #'Average household Income (2019A)',\n", " #'Median family income (2019A)',\n", " #'Median household income (2019A)',\n", " #'Per capita income (2019A)',\n", " #'Average household Income (2024A)',\n", " #'Median household income (2024A)',\n", " #'Per capita income (2024A)',\n", " #'Pop 16+ in Armed Forces (2019A)',\n", " #'Pop 16+ civilian employed (2019A)',\n", " #'Population In Labor Force (2019A)',\n", " #'Pop 16+ not in labor force (2019A)',\n", " #'Population Age 16+ (2019A)',\n", " #'Pop 16+ civilian unemployed (2019A)',\n", " #'LINGUISTICALLY ISOLATED HOUSEHOLDS (NON-ENGLISH SP...',\n", " #'SPANISH SPEAKING HOUSEHOLDS',\n", " #'Divorced (2019A)',\n", " #'Now Married (2019A)',\n", " #'Never Married (2019A)',\n", " #'Separated 
(2019A)',\n", " #'Widowed (2019A)',\n", " #'Population (2019A)',\n", " #'Population in Group Quarters (2019A)',\n", " #'Institutional Group Quarters Population (2019A)',\n", " #'Population (2024A)',\n", " #'Non Hispanic American Indian (2019A)',\n", " #'Non Hispanic Asian (2019A)',\n", " #'Non Hispanic Black (2019A)',\n", " #'Non Hispanic Hawaiian/Pacific Islander (2019A)',\n", " #'Non Hispanic Multiple Race (2019A)',\n", " #'Non Hispanic Other Race (2019A)',\n", " #'Non Hispanic White (2019A)',\n", " #'Median Cash Rent',\n", " #'Population female (2019A)',\n", " #'Population male (2019A)',\n", " #'Unemployment Rate (2019A)',\n", " #'Households: One Vehicle Available (2019A)',\n", " #'Households: Two or More Vehicles Available (2019A)',\n", " #'Households: No Vehicle Available (2019A)']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cartoframes.data.observatory import Dataset\n", "dataset = Dataset.get('ags_sociodemogr_e92b1637')\n", "variables = dataset.variables\n", "variables" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id slug \\\n", "0 carto-do.ags.demographics_sociodemographic_usa... AGECY0004_bf30e80a \n", "1 carto-do.ags.demographics_sociodemographic_usa... AGECY0509_c74a565c \n", "2 carto-do.ags.demographics_sociodemographic_usa... AGECY1014_1e97be2e \n", "3 carto-do.ags.demographics_sociodemographic_usa... AGECY1519_66ed0078 \n", "4 carto-do.ags.demographics_sociodemographic_usa... AGECY2024_270f4203 \n", ".. ... ... \n", "103 carto-do.ags.demographics_sociodemographic_usa... SEXCYMAL_ca14d4b8 \n", "104 carto-do.ags.demographics_sociodemographic_usa... UNECYRATE_b3dc32ba \n", "105 carto-do.ags.demographics_sociodemographic_usa... VPHCY1_53dc760f \n", "106 carto-do.ags.demographics_sociodemographic_usa... VPHCYGT1_a052056d \n", "107 carto-do.ags.demographics_sociodemographic_usa... VPHCYNONE_22cb7350 \n", "\n", " name description column_name \\\n", "0 AGECY0004 Population age 0-4 (2019A) AGECY0004 \n", "1 AGECY0509 Population age 5-9 (2019A) AGECY0509 \n", "2 AGECY1014 Population age 10-14 (2019A) AGECY1014 \n", "3 AGECY1519 Population age 15-19 (2019A) AGECY1519 \n", "4 AGECY2024 Population age 20-24 (2019A) AGECY2024 \n", ".. ... ... ... \n", "103 SEXCYMAL Population male (2019A) SEXCYMAL \n", "104 UNECYRATE Unemployment Rate (2019A) UNECYRATE \n", "105 VPHCY1 Households: One Vehicle Available (2019A) VPHCY1 \n", "106 VPHCYGT1 Households: Two or More Vehicles Available (20... VPHCYGT1 \n", "107 VPHCYNONE Households: No Vehicle Available (2019A) VPHCYNONE \n", "\n", " db_type dataset_id agg_method \\\n", "0 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "1 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "2 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "3 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "4 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", ".. ... ... ... \n", "103 INTEGER carto-do.ags.demographics_sociodemographic_usa... 
SUM \n", "104 FLOAT carto-do.ags.demographics_sociodemographic_usa... AVG \n", "105 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "106 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "107 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "\n", " variable_group_id starred \n", "0 None False \n", "1 None False \n", "2 None False \n", "3 None False \n", "4 None False \n", ".. ... ... \n", "103 None False \n", "104 None False \n", "105 None False \n", "106 None False \n", "107 None False \n", "\n", "[108 rows x 10 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cartoframes.data.observatory import Dataset\n", "vdf = variables.to_dataframe()\n", "vdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see there are several variables related to population, so this is the `Dataset` we are looking for." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | id | slug | name | description | column_name | db_type | dataset_id | agg_method | variable_group_id | starred |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | carto-do.ags.demographics_sociodemographic_usa... | AGECY0004_bf30e80a | AGECY0004 | Population age 0-4 (2019A) | AGECY0004 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 1 | carto-do.ags.demographics_sociodemographic_usa... | AGECY0509_c74a565c | AGECY0509 | Population age 5-9 (2019A) | AGECY0509 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 2 | carto-do.ags.demographics_sociodemographic_usa... | AGECY1014_1e97be2e | AGECY1014 | Population age 10-14 (2019A) | AGECY1014 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 3 | carto-do.ags.demographics_sociodemographic_usa... | AGECY1519_66ed0078 | AGECY1519 | Population age 15-19 (2019A) | AGECY1519 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 4 | carto-do.ags.demographics_sociodemographic_usa... | AGECY2024_270f4203 | AGECY2024 | Population age 20-24 (2019A) | AGECY2024 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 5 | carto-do.ags.demographics_sociodemographic_usa... | AGECY2529_5f75fc55 | AGECY2529 | Population age 25-29 (2019A) | AGECY2529 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 6 | carto-do.ags.demographics_sociodemographic_usa... | AGECY3034_86a81427 | AGECY3034 | Population age 30-34 (2019A) | AGECY3034 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 7 | carto-do.ags.demographics_sociodemographic_usa... | AGECY3539_fed2aa71 | AGECY3539 | Population age 35-39 (2019A) | AGECY3539 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 8 | carto-do.ags.demographics_sociodemographic_usa... | AGECY4044_543eba59 | AGECY4044 | Population age 40-44 (2019A) | AGECY4044 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 9 | carto-do.ags.demographics_sociodemographic_usa... | AGECY4549_2c44040f | AGECY4549 | Population age 45-49 (2019A) | AGECY4549 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 10 | carto-do.ags.demographics_sociodemographic_usa... | AGECY5054_f599ec7d | AGECY5054 | Population age 50-54 (2019A) | AGECY5054 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 11 | carto-do.ags.demographics_sociodemographic_usa... | AGECY5559_8de3522b | AGECY5559 | Population age 55-59 (2019A) | AGECY5559 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 12 | carto-do.ags.demographics_sociodemographic_usa... | AGECY6064_cc011050 | AGECY6064 | Population age 60-64 (2019A) | AGECY6064 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 13 | carto-do.ags.demographics_sociodemographic_usa... | AGECY6569_b47bae06 | AGECY6569 | Population age 65-69 (2019A) | AGECY6569 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 14 | carto-do.ags.demographics_sociodemographic_usa... | AGECY7074_6da64674 | AGECY7074 | Population age 70-74 (2019A) | AGECY7074 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 15 | carto-do.ags.demographics_sociodemographic_usa... | AGECY7579_15dcf822 | AGECY7579 | Population age 75-79 (2019A) | AGECY7579 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 16 | carto-do.ags.demographics_sociodemographic_usa... | AGECY8084_b25d4aed | AGECY8084 | Population age 80-84 (2019A) | AGECY8084 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 17 | carto-do.ags.demographics_sociodemographic_usa... | AGECYGT15_681a1204 | AGECYGT15 | Population Age 15+ (2019A) | AGECYGT15 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 18 | carto-do.ags.demographics_sociodemographic_usa... | AGECYGT25_433741c7 | AGECYGT25 | Population Age 25+ (2019A) | AGECYGT25 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 19 | carto-do.ags.demographics_sociodemographic_usa... | AGECYGT85_b9d8a94d | AGECYGT85 | Population age 85+ (2019A) | AGECYGT85 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 28 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYASSOC_fa1bcf13 | EDUCYASSOC | Pop 25+ Associate degree (2019A) | EDUCYASSOC | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 29 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYBACH_c2295f79 | EDUCYBACH | Pop 25+ Bachelors degree (2019A) | EDUCYBACH | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 30 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYGRAD_d0179ccb | EDUCYGRAD | Pop 25+ graduate or prof school degree (2019A) | EDUCYGRAD | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 31 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYHSCH_b236c803 | EDUCYHSCH | Pop 25+ HS graduate (2019A) | EDUCYHSCH | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 32 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYLTGR9_cbcfcc89 | EDUCYLTGR9 | Pop 25+ less than 9th grade (2019A) | EDUCYLTGR9 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 33 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYSCOLL_1e8c4828 | EDUCYSCOLL | Pop 25+ college no diploma (2019A) | EDUCYSCOLL | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 34 | carto-do.ags.demographics_sociodemographic_usa... | EDUCYSHSCH_5c444deb | EDUCYSHSCH | Pop 25+ 9th-12th grade no diploma (2019A) | EDUCYSHSCH | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 66 | carto-do.ags.demographics_sociodemographic_usa... | HISCYHISP_f3b3a31e | HISCYHISP | Population Hispanic (2019A) | HISCYHISP | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 77 | carto-do.ags.demographics_sociodemographic_usa... | LBFCYARM_8c06223a | LBFCYARM | Pop 16+ in Armed Forces (2019A) | LBFCYARM | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 78 | carto-do.ags.demographics_sociodemographic_usa... | LBFCYEMPL_c9c22a0 | LBFCYEMPL | Pop 16+ civilian employed (2019A) | LBFCYEMPL | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 79 | carto-do.ags.demographics_sociodemographic_usa... | LBFCYLBF_59ce7ab0 | LBFCYLBF | Population In Labor Force (2019A) | LBFCYLBF | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 80 | carto-do.ags.demographics_sociodemographic_usa... | LBFCYNLF_c4c98350 | LBFCYNLF | Pop 16+ not in labor force (2019A) | LBFCYNLF | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 81 | carto-do.ags.demographics_sociodemographic_usa... | LBFCYPOP16_53fa921c | LBFCYPOP16 | Population Age 16+ (2019A) | LBFCYPOP16 | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 82 | carto-do.ags.demographics_sociodemographic_usa... | LBFCYUNEM_1e711de4 | LBFCYUNEM | Pop 16+ civilian unemployed (2019A) | LBFCYUNEM | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 90 | carto-do.ags.demographics_sociodemographic_usa... | POPCY_f5800f44 | POPCY | Population (2019A) | POPCY | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 91 | carto-do.ags.demographics_sociodemographic_usa... | POPCYGRP_74c19673 | POPCYGRP | Population in Group Quarters (2019A) | POPCYGRP | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 92 | carto-do.ags.demographics_sociodemographic_usa... | POPCYGRPI_147af7a9 | POPCYGRPI | Institutional Group Quarters Population (2019A) | POPCYGRPI | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 93 | carto-do.ags.demographics_sociodemographic_usa... | POPPY_946f4ed6 | POPPY | Population (2024A) | POPPY | FLOAT | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 102 | carto-do.ags.demographics_sociodemographic_usa... | SEXCYFEM_d52acecb | SEXCYFEM | Population female (2019A) | SEXCYFEM | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
| 103 | carto-do.ags.demographics_sociodemographic_usa... | SEXCYMAL_ca14d4b8 | SEXCYMAL | Population male (2019A) | SEXCYMAL | INTEGER | carto-do.ags.demographics_sociodemographic_usa... | SUM | None | False |
\n", "
" ], "text/plain": [ " id slug \\\n", "0 carto-do.ags.demographics_sociodemographic_usa... AGECY0004_bf30e80a \n", "1 carto-do.ags.demographics_sociodemographic_usa... AGECY0509_c74a565c \n", "2 carto-do.ags.demographics_sociodemographic_usa... AGECY1014_1e97be2e \n", "3 carto-do.ags.demographics_sociodemographic_usa... AGECY1519_66ed0078 \n", "4 carto-do.ags.demographics_sociodemographic_usa... AGECY2024_270f4203 \n", "5 carto-do.ags.demographics_sociodemographic_usa... AGECY2529_5f75fc55 \n", "6 carto-do.ags.demographics_sociodemographic_usa... AGECY3034_86a81427 \n", "7 carto-do.ags.demographics_sociodemographic_usa... AGECY3539_fed2aa71 \n", "8 carto-do.ags.demographics_sociodemographic_usa... AGECY4044_543eba59 \n", "9 carto-do.ags.demographics_sociodemographic_usa... AGECY4549_2c44040f \n", "10 carto-do.ags.demographics_sociodemographic_usa... AGECY5054_f599ec7d \n", "11 carto-do.ags.demographics_sociodemographic_usa... AGECY5559_8de3522b \n", "12 carto-do.ags.demographics_sociodemographic_usa... AGECY6064_cc011050 \n", "13 carto-do.ags.demographics_sociodemographic_usa... AGECY6569_b47bae06 \n", "14 carto-do.ags.demographics_sociodemographic_usa... AGECY7074_6da64674 \n", "15 carto-do.ags.demographics_sociodemographic_usa... AGECY7579_15dcf822 \n", "16 carto-do.ags.demographics_sociodemographic_usa... AGECY8084_b25d4aed \n", "17 carto-do.ags.demographics_sociodemographic_usa... AGECYGT15_681a1204 \n", "18 carto-do.ags.demographics_sociodemographic_usa... AGECYGT25_433741c7 \n", "19 carto-do.ags.demographics_sociodemographic_usa... AGECYGT85_b9d8a94d \n", "28 carto-do.ags.demographics_sociodemographic_usa... EDUCYASSOC_fa1bcf13 \n", "29 carto-do.ags.demographics_sociodemographic_usa... EDUCYBACH_c2295f79 \n", "30 carto-do.ags.demographics_sociodemographic_usa... EDUCYGRAD_d0179ccb \n", "31 carto-do.ags.demographics_sociodemographic_usa... EDUCYHSCH_b236c803 \n", "32 carto-do.ags.demographics_sociodemographic_usa... 
EDUCYLTGR9_cbcfcc89 \n", "33 carto-do.ags.demographics_sociodemographic_usa... EDUCYSCOLL_1e8c4828 \n", "34 carto-do.ags.demographics_sociodemographic_usa... EDUCYSHSCH_5c444deb \n", "66 carto-do.ags.demographics_sociodemographic_usa... HISCYHISP_f3b3a31e \n", "77 carto-do.ags.demographics_sociodemographic_usa... LBFCYARM_8c06223a \n", "78 carto-do.ags.demographics_sociodemographic_usa... LBFCYEMPL_c9c22a0 \n", "79 carto-do.ags.demographics_sociodemographic_usa... LBFCYLBF_59ce7ab0 \n", "80 carto-do.ags.demographics_sociodemographic_usa... LBFCYNLF_c4c98350 \n", "81 carto-do.ags.demographics_sociodemographic_usa... LBFCYPOP16_53fa921c \n", "82 carto-do.ags.demographics_sociodemographic_usa... LBFCYUNEM_1e711de4 \n", "90 carto-do.ags.demographics_sociodemographic_usa... POPCY_f5800f44 \n", "91 carto-do.ags.demographics_sociodemographic_usa... POPCYGRP_74c19673 \n", "92 carto-do.ags.demographics_sociodemographic_usa... POPCYGRPI_147af7a9 \n", "93 carto-do.ags.demographics_sociodemographic_usa... POPPY_946f4ed6 \n", "102 carto-do.ags.demographics_sociodemographic_usa... SEXCYFEM_d52acecb \n", "103 carto-do.ags.demographics_sociodemographic_usa... 
SEXCYMAL_ca14d4b8 \n", "\n", " name description column_name \\\n", "0 AGECY0004 Population age 0-4 (2019A) AGECY0004 \n", "1 AGECY0509 Population age 5-9 (2019A) AGECY0509 \n", "2 AGECY1014 Population age 10-14 (2019A) AGECY1014 \n", "3 AGECY1519 Population age 15-19 (2019A) AGECY1519 \n", "4 AGECY2024 Population age 20-24 (2019A) AGECY2024 \n", "5 AGECY2529 Population age 25-29 (2019A) AGECY2529 \n", "6 AGECY3034 Population age 30-34 (2019A) AGECY3034 \n", "7 AGECY3539 Population age 35-39 (2019A) AGECY3539 \n", "8 AGECY4044 Population age 40-44 (2019A) AGECY4044 \n", "9 AGECY4549 Population age 45-49 (2019A) AGECY4549 \n", "10 AGECY5054 Population age 50-54 (2019A) AGECY5054 \n", "11 AGECY5559 Population age 55-59 (2019A) AGECY5559 \n", "12 AGECY6064 Population age 60-64 (2019A) AGECY6064 \n", "13 AGECY6569 Population age 65-69 (2019A) AGECY6569 \n", "14 AGECY7074 Population age 70-74 (2019A) AGECY7074 \n", "15 AGECY7579 Population age 75-79 (2019A) AGECY7579 \n", "16 AGECY8084 Population age 80-84 (2019A) AGECY8084 \n", "17 AGECYGT15 Population Age 15+ (2019A) AGECYGT15 \n", "18 AGECYGT25 Population Age 25+ (2019A) AGECYGT25 \n", "19 AGECYGT85 Population age 85+ (2019A) AGECYGT85 \n", "28 EDUCYASSOC Pop 25+ Associate degree (2019A) EDUCYASSOC \n", "29 EDUCYBACH Pop 25+ Bachelors degree (2019A) EDUCYBACH \n", "30 EDUCYGRAD Pop 25+ graduate or prof school degree (2019A) EDUCYGRAD \n", "31 EDUCYHSCH Pop 25+ HS graduate (2019A) EDUCYHSCH \n", "32 EDUCYLTGR9 Pop 25+ less than 9th grade (2019A) EDUCYLTGR9 \n", "33 EDUCYSCOLL Pop 25+ college no diploma (2019A) EDUCYSCOLL \n", "34 EDUCYSHSCH Pop 25+ 9th-12th grade no diploma (2019A) EDUCYSHSCH \n", "66 HISCYHISP Population Hispanic (2019A) HISCYHISP \n", "77 LBFCYARM Pop 16+ in Armed Forces (2019A) LBFCYARM \n", "78 LBFCYEMPL Pop 16+ civilian employed (2019A) LBFCYEMPL \n", "79 LBFCYLBF Population In Labor Force (2019A) LBFCYLBF \n", "80 LBFCYNLF Pop 16+ not in labor force (2019A) LBFCYNLF \n", "81 LBFCYPOP16 Population 
Age 16+ (2019A) LBFCYPOP16 \n", "82 LBFCYUNEM Pop 16+ civilian unemployed (2019A) LBFCYUNEM \n", "90 POPCY Population (2019A) POPCY \n", "91 POPCYGRP Population in Group Quarters (2019A) POPCYGRP \n", "92 POPCYGRPI Institutional Group Quarters Population (2019A) POPCYGRPI \n", "93 POPPY Population (2024A) POPPY \n", "102 SEXCYFEM Population female (2019A) SEXCYFEM \n", "103 SEXCYMAL Population male (2019A) SEXCYMAL \n", "\n", " db_type dataset_id agg_method \\\n", "0 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "1 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "2 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "3 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "4 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "5 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "6 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "7 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "8 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "9 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "10 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "11 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "12 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "13 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "14 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "15 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "16 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "17 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "18 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "19 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "28 INTEGER carto-do.ags.demographics_sociodemographic_usa... 
SUM \n", "29 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "30 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "31 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "32 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "33 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "34 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "66 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "77 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "78 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "79 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "80 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "81 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "82 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "90 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "91 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "92 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "93 FLOAT carto-do.ags.demographics_sociodemographic_usa... SUM \n", "102 INTEGER carto-do.ags.demographics_sociodemographic_usa... SUM \n", "103 INTEGER carto-do.ags.demographics_sociodemographic_usa... 
SUM \n", "\n", " variable_group_id starred \n", "0 None False \n", "1 None False \n", "2 None False \n", "3 None False \n", "4 None False \n", "5 None False \n", "6 None False \n", "7 None False \n", "8 None False \n", "9 None False \n", "10 None False \n", "11 None False \n", "12 None False \n", "13 None False \n", "14 None False \n", "15 None False \n", "16 None False \n", "17 None False \n", "18 None False \n", "19 None False \n", "28 None False \n", "29 None False \n", "30 None False \n", "31 None False \n", "32 None False \n", "33 None False \n", "34 None False \n", "66 None False \n", "77 None False \n", "78 None False \n", "79 None False \n", "80 None False \n", "81 None False \n", "82 None False \n", "90 None False \n", "91 None False \n", "92 None False \n", "93 None False \n", "102 None False \n", "103 None False " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vdf[vdf['description'].str.contains('pop', case=False, na=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dataset and variables metadata\n", "\n", "The Data Observatory catalog is not only a repository of curated spatial datasets; it also contains valuable information that helps you better understand the underlying data of every dataset, so you can make an informed decision about which data best fits your problem.\n", "\n", "Some of the augmented metadata you can find for each dataset in the catalog is:\n", "\n", "- `head` and `tail` methods to get a glimpse of the actual data. This helps you understand the available columns, data types, etc., so you can start modelling your problem right away.\n", "- `geom_coverage` to visualize on a map the geographical coverage of the data in the `Dataset`.\n", "- `counts`, `fields_by_type` and a full `describe` method with stats of the actual values in the dataset, such as average, stdev, quantiles, min, max and median for each of the variables of the dataset.\n", "\n", "You don't need a subscription to a dataset to query the augmented metadata; it is publicly available to anyone exploring the Data Observatory catalog.\n", "\n", "Let's review some of that information, starting with a glimpse of the first or last ten rows of the dataset's actual data:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from cartoframes.data.observatory import Dataset\n", "dataset = Dataset.get('ags_sociodemogr_e92b1637')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
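| | DWLCY | HHDCY | POPCY | VPHCY1 | AGECYMED | HHDCYFAM | HOOEXMED | HUSEXAPT | LBFCYARM | LBFCYLBF | ... | MARCYDIVOR | MARCYNEVER | MARCYWIDOW | RCHCYAMNHS | RCHCYASNHS | RCHCYBLNHS | RCHCYHANHS | RCHCYMUNHS | RCHCYOTNHS | RCHCYWHNHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 5 | 6 | 0 | 64.00 | 1 | 63749 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| 1 | 2 | 2 | 5 | 1 | 36.50 | 2 | 124999 | 0 | 0 | 2 | ... | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 2 |
| 2 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 21 | 11 | 22 | 4 | 64.00 | 6 | 74999 | 0 | 0 | 10 | ... | 4 | 13 | 2 | 0 | 0 | 22 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 959 | 0 | 18.91 | 0 | 0 | 0 | 0 | 378 | ... | 0 | 959 | 0 | 5 | 53 | 230 | 0 | 25 | 0 | 609 |
| 5 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |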
\n", "

10 rows × 101 columns

\n", "
" ], "text/plain": [ " DWLCY HHDCY POPCY VPHCY1 AGECYMED HHDCYFAM HOOEXMED HUSEXAPT \\\n", "0 5 5 6 0 64.00 1 63749 0 \n", "1 2 2 5 1 36.50 2 124999 0 \n", "2 0 0 0 0 0.00 0 0 0 \n", "3 21 11 22 4 64.00 6 74999 0 \n", "4 0 0 959 0 18.91 0 0 0 \n", "5 0 0 0 0 0.00 0 0 0 \n", "6 0 0 0 0 0.00 0 0 0 \n", "7 0 0 0 0 0.00 0 0 0 \n", "8 0 0 0 0 0.00 0 0 0 \n", "9 0 0 0 0 0.00 0 0 0 \n", "\n", " LBFCYARM LBFCYLBF ... MARCYDIVOR MARCYNEVER MARCYWIDOW RCHCYAMNHS \\\n", "0 0 0 ... 0 0 0 0 \n", "1 0 2 ... 0 1 0 0 \n", "2 0 0 ... 0 0 0 0 \n", "3 0 10 ... 4 13 2 0 \n", "4 0 378 ... 0 959 0 5 \n", "5 0 0 ... 0 0 0 0 \n", "6 0 0 ... 0 0 0 0 \n", "7 0 0 ... 0 0 0 0 \n", "8 0 0 ... 0 0 0 0 \n", "9 0 0 ... 0 0 0 0 \n", "\n", " RCHCYASNHS RCHCYBLNHS RCHCYHANHS RCHCYMUNHS RCHCYOTNHS RCHCYWHNHS \n", "0 0 0 0 0 0 6 \n", "1 0 3 0 0 0 2 \n", "2 0 0 0 0 0 0 \n", "3 0 22 0 0 0 0 \n", "4 53 230 0 25 0 609 \n", "5 0 0 0 0 0 0 \n", "6 0 0 0 0 0 0 \n", "7 0 0 0 0 0 0 \n", "8 0 0 0 0 0 0 \n", "9 0 0 0 0 0 0 \n", "\n", "[10 rows x 101 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can get the last ten rows with `dataset.tail()`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An overview of the geographical coverage of the dataset:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Static map image" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.geom_coverage()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some stats about the dataset:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rows 217182.0\n", "cells 22369746.0\n", "null_cells 0.0\n", "null_cells_percent 0.0\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.counts()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float 4\n", "string 1\n", "integer 96\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.fields_by_type()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
| | AGECY0004 | AGECY0509 | AGECY1014 | AGECY1519 | AGECY2024 | AGECY2529 | AGECY3034 | AGECY3539 | AGECY4044 | AGECY4549 | ... | RCHCYMUNHS | RCHCYOTNHS | RCHCYWHNHS | RNTEXMED | SEXCYFEM | SEXCYMAL | UNECYRATE | VPHCY1 | VPHCYGT1 | VPHCYNONE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| avg | 9.072047e+01 | 9.311367e+01 | 9.591034e+01 | 9.722016e+01 | 1.001196e+02 | 1.087202e+02 | 1.036462e+02 | 1.003712e+02 | 9.199482e+01 | 9.412861e+01 | ... | 3.505126e+01 | 3.673164 | 9.111044e+02 | 9.315027e+02 | 7.691157e+02 | 7.464722e+02 | 3.687263 | 1.922163e+02 | 3.509257e+02 | 5.008733e+01 |
| max | 5.007000e+03 | 5.274000e+03 | 5.225000e+03 | 7.607000e+03 | 1.489400e+04 | 5.746000e+03 | 4.936000e+03 | 5.451000e+03 | 5.052000e+03 | 4.596000e+03 | ... | 2.110000e+03 | 950.000000 | 3.681800e+04 | 3.999000e+03 | 3.255200e+04 | 3.104300e+04 | 100.000000 | 1.681400e+04 | 1.696200e+04 | 4.945000e+03 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | ... | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| sum | 1.970285e+07 | 2.022261e+07 | 2.083000e+07 | 2.111447e+07 | 2.174418e+07 | 2.361208e+07 | 2.251009e+07 | 2.179881e+07 | 1.997962e+07 | 2.044304e+07 | ... | 7.612502e+06 | 797745.000000 | 1.978755e+08 | 2.023056e+08 | 1.670381e+08 | 1.621203e+08 | 800807.220000 | 4.174592e+07 | 7.621475e+07 | 1.087807e+07 |
| range | 5.007000e+03 | 5.274000e+03 | 5.225000e+03 | 7.607000e+03 | 1.489400e+04 | 5.746000e+03 | 4.936000e+03 | 5.451000e+03 | 5.052000e+03 | 4.596000e+03 | ... | 2.110000e+03 | 950.000000 | 3.681800e+04 | 3.999000e+03 | 3.255200e+04 | 3.104300e+04 | 100.000000 | 1.681400e+04 | 1.696200e+04 | 4.945000e+03 |
| stdev | 7.802265e+01 | 8.034981e+01 | 8.116058e+01 | 1.107727e+02 | 1.230680e+02 | 9.159219e+01 | 8.815390e+01 | 8.482190e+01 | 7.528368e+01 | 7.152112e+01 | ... | 5.045176e+01 | 14.906111 | 7.440860e+02 | 4.772473e+02 | 5.222389e+02 | 5.242907e+02 | 3.774735 | 1.561162e+02 | 2.771389e+02 | 8.571871e+01 |
| q1 | 4.400000e+01 | 4.500000e+01 | 4.600000e+01 | 4.500000e+01 | 4.400000e+01 | 5.100000e+01 | 5.000000e+01 | 5.000000e+01 | 4.600000e+01 | 4.900000e+01 | ... | 1.100000e+01 | 0.000000 | 3.670000e+02 | 5.520000e+02 | 4.350000e+02 | 4.180000e+02 | 0.970000 | 8.800000e+01 | 1.700000e+02 | 5.000000e+00 |
| q3 | 8.400000e+01 | 8.600000e+01 | 8.900000e+01 | 8.700000e+01 | 8.600000e+01 | 1.000000e+02 | 9.600000e+01 | 9.300000e+01 | 8.600000e+01 | 8.900000e+01 | ... | 2.900000e+01 | 0.000000 | 9.250000e+02 | 9.250000e+02 | 7.400000e+02 | 7.130000e+02 | 3.460000 | 1.830000e+02 | 3.410000e+02 | 3.400000e+01 |
| median | 6.200000e+01 | 6.400000e+01 | 6.500000e+01 | 6.400000e+01 | 6.200000e+01 | 7.300000e+01 | 7.000000e+01 | 6.900000e+01 | 6.400000e+01 | 6.700000e+01 | ... | 1.900000e+01 | 0.000000 | 6.550000e+02 | 7.190000e+02 | 5.730000e+02 | 5.490000e+02 | 2.130000 | 1.310000e+02 | 2.520000e+02 | 1.700000e+01 |
| interquartile_range | 4.000000e+01 | 4.100000e+01 | 4.300000e+01 | 4.200000e+01 | 4.200000e+01 | 4.900000e+01 | 4.600000e+01 | 4.300000e+01 | 4.000000e+01 | 4.000000e+01 | ... | 1.800000e+01 | 0.000000 | 5.580000e+02 | 3.730000e+02 | 3.050000e+02 | 2.950000e+02 | 2.490000 | 9.500000e+01 | 1.710000e+02 | 2.900000e+01 |
    \n", "

    10 rows × 107 columns

    \n", "
    " ], "text/plain": [ " AGECY0004 AGECY0509 AGECY1014 AGECY1519 \\\n", "avg 9.072047e+01 9.311367e+01 9.591034e+01 9.722016e+01 \n", "max 5.007000e+03 5.274000e+03 5.225000e+03 7.607000e+03 \n", "min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 \n", "sum 1.970285e+07 2.022261e+07 2.083000e+07 2.111447e+07 \n", "range 5.007000e+03 5.274000e+03 5.225000e+03 7.607000e+03 \n", "stdev 7.802265e+01 8.034981e+01 8.116058e+01 1.107727e+02 \n", "q1 4.400000e+01 4.500000e+01 4.600000e+01 4.500000e+01 \n", "q3 8.400000e+01 8.600000e+01 8.900000e+01 8.700000e+01 \n", "median 6.200000e+01 6.400000e+01 6.500000e+01 6.400000e+01 \n", "interquartile_range 4.000000e+01 4.100000e+01 4.300000e+01 4.200000e+01 \n", "\n", " AGECY2024 AGECY2529 AGECY3034 AGECY3539 \\\n", "avg 1.001196e+02 1.087202e+02 1.036462e+02 1.003712e+02 \n", "max 1.489400e+04 5.746000e+03 4.936000e+03 5.451000e+03 \n", "min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 \n", "sum 2.174418e+07 2.361208e+07 2.251009e+07 2.179881e+07 \n", "range 1.489400e+04 5.746000e+03 4.936000e+03 5.451000e+03 \n", "stdev 1.230680e+02 9.159219e+01 8.815390e+01 8.482190e+01 \n", "q1 4.400000e+01 5.100000e+01 5.000000e+01 5.000000e+01 \n", "q3 8.600000e+01 1.000000e+02 9.600000e+01 9.300000e+01 \n", "median 6.200000e+01 7.300000e+01 7.000000e+01 6.900000e+01 \n", "interquartile_range 4.200000e+01 4.900000e+01 4.600000e+01 4.300000e+01 \n", "\n", " AGECY4044 AGECY4549 ... RCHCYMUNHS \\\n", "avg 9.199482e+01 9.412861e+01 ... 3.505126e+01 \n", "max 5.052000e+03 4.596000e+03 ... 2.110000e+03 \n", "min 0.000000e+00 0.000000e+00 ... 0.000000e+00 \n", "sum 1.997962e+07 2.044304e+07 ... 7.612502e+06 \n", "range 5.052000e+03 4.596000e+03 ... 2.110000e+03 \n", "stdev 7.528368e+01 7.152112e+01 ... 5.045176e+01 \n", "q1 4.600000e+01 4.900000e+01 ... 1.100000e+01 \n", "q3 8.600000e+01 8.900000e+01 ... 2.900000e+01 \n", "median 6.400000e+01 6.700000e+01 ... 1.900000e+01 \n", "interquartile_range 4.000000e+01 4.000000e+01 ... 
1.800000e+01 \n", "\n", " RCHCYOTNHS RCHCYWHNHS RNTEXMED SEXCYFEM \\\n", "avg 3.673164 9.111044e+02 9.315027e+02 7.691157e+02 \n", "max 950.000000 3.681800e+04 3.999000e+03 3.255200e+04 \n", "min 0.000000 0.000000e+00 0.000000e+00 0.000000e+00 \n", "sum 797745.000000 1.978755e+08 2.023056e+08 1.670381e+08 \n", "range 950.000000 3.681800e+04 3.999000e+03 3.255200e+04 \n", "stdev 14.906111 7.440860e+02 4.772473e+02 5.222389e+02 \n", "q1 0.000000 3.670000e+02 5.520000e+02 4.350000e+02 \n", "q3 0.000000 9.250000e+02 9.250000e+02 7.400000e+02 \n", "median 0.000000 6.550000e+02 7.190000e+02 5.730000e+02 \n", "interquartile_range 0.000000 5.580000e+02 3.730000e+02 3.050000e+02 \n", "\n", " SEXCYMAL UNECYRATE VPHCY1 VPHCYGT1 \\\n", "avg 7.464722e+02 3.687263 1.922163e+02 3.509257e+02 \n", "max 3.104300e+04 100.000000 1.681400e+04 1.696200e+04 \n", "min 0.000000e+00 0.000000 0.000000e+00 0.000000e+00 \n", "sum 1.621203e+08 800807.220000 4.174592e+07 7.621475e+07 \n", "range 3.104300e+04 100.000000 1.681400e+04 1.696200e+04 \n", "stdev 5.242907e+02 3.774735 1.561162e+02 2.771389e+02 \n", "q1 4.180000e+02 0.970000 8.800000e+01 1.700000e+02 \n", "q3 7.130000e+02 3.460000 1.830000e+02 3.410000e+02 \n", "median 5.490000e+02 2.130000 1.310000e+02 2.520000e+02 \n", "interquartile_range 2.950000e+02 2.490000 9.500000e+01 1.710000e+02 \n", "\n", " VPHCYNONE \n", "avg 5.008733e+01 \n", "max 4.945000e+03 \n", "min 0.000000e+00 \n", "sum 1.087807e+07 \n", "range 4.945000e+03 \n", "stdev 8.571871e+01 \n", "q1 5.000000e+00 \n", "q3 3.400000e+01 \n", "median 1.700000e+01 \n", "interquartile_range 2.900000e+01 \n", "\n", "[10 rows x 107 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Every `Dataset` instance in the catalog contains other useful metadata:\n", "\n", "- slug: A short ID\n", "- name and description: Free text attributes\n", "- country\n", "- 
geography: Every dataset is related to a Geography instance\n", "- category\n", "- provider\n", "- data source\n", "- lang\n", "- temporal aggregation\n", "- time coverage\n", "- update frequency\n", "- version\n", "- is_public_data: whether you need a license to use the dataset for enrichment purposes or not" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 'carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019',\n", " 'slug': 'ags_sociodemogr_e92b1637',\n", " 'name': 'Sociodemographic',\n", " 'description': 'Census and ACS sociodemographic data estimated for the current year and data projected to five years. Projected fields are general aggregates (total population, total households, median age, avg income etc.)',\n", " 'country_id': 'usa',\n", " 'geography_id': 'carto-do-public-data.usa_carto.geography_usa_blockgroup_2015',\n", " 'geography_name': 'Census Block Groups (2015) - shoreline clipped',\n", " 'geography_description': 'Shoreline clipped TIGER/Line boundaries. 
More info: https://carto.com/blog/tiger-shoreline-clip/',\n", " 'category_id': 'demographics',\n", " 'category_name': 'Demographics',\n", " 'provider_id': 'ags',\n", " 'provider_name': 'Applied Geographic Solutions',\n", " 'data_source_id': 'sociodemographic',\n", " 'lang': 'eng',\n", " 'temporal_aggregation': 'yearly',\n", " 'time_coverage': '[2019-01-01, 2020-01-01)',\n", " 'update_frequency': None,\n", " 'version': '2019',\n", " 'is_public_data': False}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is also some interesting metadata for each variable in the dataset:\n", "\n", "- id\n", "- slug: A short ID\n", "- name and description\n", "- column_name: Actual column name in the table that contains the data\n", "- db_type: SQL type in the database\n", "- dataset_id\n", "- agg_method: Aggregation method used\n", "- temporal aggregation and time coverage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables are the most important asset in the catalog. When exploring datasets in the Data Observatory catalog, it is important to understand clearly which variables are available to enrich your own data.\n", "\n", "For each `Variable` in each dataset, the Data Observatory provides (as it does with datasets) a set of methods and attributes to understand its underlying data.\n", "\n", "Some of them are:\n", "\n", "- `head` and `tail` methods to get a glimpse of the actual data and start modelling your problem right away.\n", "- `counts`, `quantiles` and a full `describe` method with stats of the actual values in the dataset, such as average, stdev, quantiles, min, max and median for each of the variables of the dataset.\n", "- a `histogram` plot with the distribution of the values of each variable." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's review some of that augmented metadata for the variables in the AGS population dataset." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " #'Population (2024A)'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cartoframes.data.observatory import Variable\n", "variable = Variable.get('POPPY_946f4ed6')\n", "variable" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 'carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019.POPPY',\n", " 'slug': 'POPPY_946f4ed6',\n", " 'name': 'POPPY',\n", " 'description': 'Population (2024A)',\n", " 'column_name': 'POPPY',\n", " 'db_type': 'FLOAT',\n", " 'dataset_id': 'carto-do.ags.demographics_sociodemographic_usa_blockgroup_2015_yearly_2019',\n", " 'agg_method': 'SUM',\n", " 'variable_group_id': None,\n", " 'starred': False}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "variable.to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are also some utility methods to understand the underlying data for each variable:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 8\n", "3 0\n", "4 0\n", "5 0\n", "6 4\n", "7 0\n", "8 2\n", "9 59\n", "dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "variable.head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "all 217182.000000\n", "null 0.000000\n", "zero 303.000000\n", "extreme 9380.000000\n", "distinct 6947.000000\n", "outliers 27571.000000\n", "null_percent 0.000000\n", "zero_percent 0.139514\n", "extreme_percent 0.043190\n", "distinct_percent 3.198700\n", 
"outliers_percent 0.126949\n", "dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "variable.counts()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "q1 867\n", "q3 1490\n", "median 1149\n", "interquartile_range 623\n", "dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "variable.quantiles()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "
    " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "variable.histogram()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "avg 1.564793e+03\n", "max 7.127400e+04\n", "min 0.000000e+00\n", "sum 3.398448e+08\n", "range 7.127400e+04\n", "stdev 1.098193e+03\n", "q1 8.670000e+02\n", "q3 1.490000e+03\n", "median 1.149000e+03\n", "interquartile_range 6.230000e+02\n", "dtype: float64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "variable.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Subscribe to a Dataset in the catalog\n", "\n", "Once you have explored the catalog and identified a dataset with the variables and the spatial resolution you need for your analysis, check its `is_public_data` attribute to find out whether you can use it directly from CARTOframes or first need to subscribe for a license.\n", "\n", "Subscriptions to datasets allow you to use them from CARTOframes to enrich your own data or to download them. 
See the enrichment guides for more information about this.\n", "\n", "Let's check the dataset and geography from our previous example:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "dataset = Dataset.get('ags_sociodemogr_e92b1637')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.is_public_data" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "from cartoframes.data.observatory import Geography\n", "geography = Geography.get(dataset.geography)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geography.is_public_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `dataset` is not public data, which means you need a subscription to use it to enrich your own data. The `geography` reports `True` in the output above, so it is public; a geography that is not public would need a subscription as well.\n", "\n", "**To subscribe to data in the Data Observatory catalog you need a CARTO account with access to Data Observatory**" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "from cartoframes.auth import set_default_credentials\n", "\n", "set_default_credentials('creds.json')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset.subscribe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geography.subscribe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Licenses to data in the Data Observatory grant you the right to use the subscribed data for a period of one year. 
Every dataset or geography you want to use to enrich your own data requires a valid license, unless it is public data.**\n", "\n", "You can check the current status of your subscriptions directly from the catalog." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Datasets: [, , , ]\n", "Geographies: [, ]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Catalog().subscriptions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "In this guide you've seen how to explore the Data Observatory catalog to identify variables of datasets that you can use to enrich your own data.\n", "\n", "You've learned how to:\n", "\n", "- Explore the catalog using nested hierarchical filters.\n", "- Describe the three main entities in the catalog: `Geography`, `Dataset` and their `Variables`.\n", "- Look at the data and stats taken from the actual repository to make a more informed decision about which variables to choose.\n", "- Subscribe to the chosen dataset to get a license that grants the right to enrich your own data.\n", "\n", "We also recommend checking out the resources below to learn more about the Data Observatory catalog:\n", "\n", "- The CARTOframes [enrichment guide](/developers/cartoframes/guides/Data-enrichment/)\n", "- [Our public website](https://carto.com/platform/location-data-streams/)\n", "- Your user dashboard: Under the data section\n", "- The CARTOframes catalog [API reference](/developers/cartoframes/reference/#heading-Data-Observatory)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, 
"nbformat_minor": 2 }