{ "cells": [ { "cell_type": "markdown", "id": "b18262ab-97a0-4073-a682-c999a05a1554", "metadata": {}, "source": [ "## Accessing US Census data with the Planetary Computer STAC API\n", "\n", "The [US Census](https://planetarycomputer.microsoft.com/dataset/us-census) collection provides information on population, demographics, and administrative boundaries at various levels of cartographic aggregation for the United States. It consists of many tabular datasets, one for each level of cartographic aggregation, each stored in [Apache Parquet](https://parquet.apache.org/) format. In this notebook, we'll use [geopandas](https://geopandas.org/) and dask-geopandas to read the files, which will preserve the `geometry` column with administrative boundaries." ] }, { "cell_type": "code", "execution_count": 1, "id": "7ebe8130-90a3-4cbf-ae0d-bce3b33826f6", "metadata": {}, "outputs": [], "source": [ "import geopandas\n", "import dask_geopandas\n", "import contextily as ctx\n", "import seaborn as sns\n", "import planetary_computer\n", "import pystac_client" ] }, { "cell_type": "markdown", "id": "b4668485-65db-412d-ae7f-36d7c3edd590", "metadata": {}, "source": [ "### Data access\n", "\n", "The datasets hosted by the Planetary Computer are available from [Azure Blob Storage](https://docs.microsoft.com/en-us/azure/storage/blobs/). We'll use [pystac-client](https://pystac-client.readthedocs.io/) to search the Planetary Computer's [STAC API](https://planetarycomputer.microsoft.com/api/stac/v1/docs) for the subset of the data that we care about, and then we'll load the data directly from Azure Blob Storage. We'll specify a `modifier` so that we can access the data stored in the Planetary Computer's private Blob Storage Containers. See [Reading from the STAC API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/) and [Using tokens for data access](https://planetarycomputer.microsoft.com/docs/concepts/sas/) for more." 
] }, { "cell_type": "code", "execution_count": 2, "id": "42c82313-9668-439a-ab2b-34c31e42a450", "metadata": {}, "outputs": [], "source": [ "catalog = pystac_client.Client.open(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n", " modifier=planetary_computer.sign_inplace,\n", ")" ] }, { "cell_type": "markdown", "id": "0626956f-7aef-4716-99e4-8f60cbc02982", "metadata": {}, "source": [ "Each item in the `us-census` collection represents a single table, aggregating the census data at some level (Congressional district, state, etc.)." ] }, { "cell_type": "code", "execution_count": 3, "id": "095d697e-6540-42cc-89cd-47aae3430d67", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['2020-census-blocks-population',\n", " '2020-census-blocks-geo',\n", " '2020-cb_2020_us_vtd_500k',\n", " '2020-cb_2020_us_unsd_500k',\n", " '2020-cb_2020_us_ttract_500k',\n", " '2020-cb_2020_us_tract_500k',\n", " '2020-cb_2020_us_tbg_500k',\n", " '2020-cb_2020_us_state_500k',\n", " '2020-cb_2020_us_sldu_500k',\n", " '2020-cb_2020_us_sldl_500k',\n", " '2020-cb_2020_us_scsd_500k',\n", " '2020-cb_2020_us_region_500k',\n", " '2020-cb_2020_us_place_500k',\n", " '2020-cb_2020_us_nectadiv_500k',\n", " '2020-cb_2020_us_necta_500k',\n", " '2020-cb_2020_us_nation_5m',\n", " '2020-cb_2020_us_metdiv_500k',\n", " '2020-cb_2020_us_elsd_500k',\n", " '2020-cb_2020_us_division_500k',\n", " '2020-cb_2020_us_csa_500k',\n", " '2020-cb_2020_us_cousub_500k',\n", " '2020-cb_2020_us_county_within_cd116_500k',\n", " '2020-cb_2020_us_county_500k',\n", " '2020-cb_2020_us_concity_500k',\n", " '2020-cb_2020_us_cnecta_500k',\n", " '2020-cb_2020_us_cd116_500k',\n", " '2020-cb_2020_us_cbsa_500k',\n", " '2020-cb_2020_us_bg_500k',\n", " '2020-cb_2020_us_aitsn_500k',\n", " '2020-cb_2020_us_aiannh_500k',\n", " '2020-cb_2020_72_subbarrio_500k',\n", " '2020-cb_2020_02_anrc_500k']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search = 
catalog.search(collections=[\"us-census\"])\n", "items = {item.id: item for item in search.items()}\n", "list(items)" ] }, { "cell_type": "markdown", "id": "5e8beadd-797b-4554-99cf-c7ca49b3baad", "metadata": {}, "source": [ "### Read Congressional districts\n", "\n", "The `2020-cb_2020_us_cd116_500k` dataset contains geometries for Congressional Districts for the 116th Congress." ] }, { "cell_type": "code", "execution_count": 4, "id": "44dae2b1-2a19-4c68-a78a-bba50efde3e2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item = items[\"2020-cb_2020_us_cd116_500k\"]\n", "item" ] }, { "cell_type": "markdown", "id": "8da60a4e-fe3c-451c-96c9-21f6439dc88a", "metadata": {}, "source": [ "Each of the items contains a single asset, `data`, that has all the URL to the Parquet dataset and all the information necessary to load it." ] }, { "cell_type": "code", "execution_count": 5, "id": "07e90dd1-a4ce-4a8c-a1b1-5500272d30c9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "
    \n", " \n", " \n", " \n", "
  • \n", " href\n", " \"abfs://us-census/2020/cb_2020_us_cd116_500k.parquet\"\n", "
  • \n", " \n", " \n", " \n", " \n", " \n", "
  • \n", " type\n", " \"application/x-parquet\"\n", "
  • \n", " \n", " \n", " \n", " \n", " \n", "
  • \n", " title\n", " \"Dataset root\"\n", "
  • \n", " \n", " \n", " \n", " \n", " \n", "
  • \n", " table:storage_options\n", "
      \n", " \n", " \n", " \n", "
    • \n", " account_name\n", " \"ai4edataeuwest\"\n", "
    • \n", " \n", " \n", " \n", " \n", " \n", "
    • \n", " credential\n", " \"st=2024-03-19T20%3A59%3A14Z&se=2024-03-27T20%3A59%3A15Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2024-03-20T20%3A59%3A13Z&ske=2024-03-27T20%3A59%3A13Z&sks=b&skv=2021-06-08&sig=uEVQQlDffvcVXVvPEGtXLJjSZw1uhhaOU61EAqjwOn4%3D\"\n", "
    • \n", " \n", " \n", " \n", "
    \n", "
  • \n", " \n", " \n", " \n", " \n", "
  • \n", " \n", " roles\n", " [] 1 items\n", " \n", " \n", "
      \n", " \n", " \n", " \n", "
    • \n", " 0\n", " \"data\"\n", "
    • \n", " \n", " \n", " \n", "
    \n", " \n", "
  • \n", " \n", " \n", "
\n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "asset = item.assets[\"data\"]\n", "asset" ] }, { "cell_type": "markdown", "id": "c871bb54-7b41-46f8-a8c6-7bd318e039ae", "metadata": {}, "source": [ "This is an [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) URL, which is used by libraries like pandas, geopandas, and Dask to work with files from remote storage like Azure Blob Storage. We already signed this asset to include a `credential`. If you use an unsigned asset you'll see a `ClientAuthenticationError` error when trying to open the dataset" ] }, { "cell_type": "code", "execution_count": 6, "id": "fe825dca-ce6a-49e4-ad2c-3f5842b5a607", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEFPCD116FPAFFGEOIDGEOIDNAMELSADLSADCDSESSNALANDAWATERgeometry
006425001600US06420642Congressional District 42C2116242475356344105315POLYGON ((-117.67629 33.88882, -117.65488 33.8...
13975001600US39073907Congressional District 7C21161001001639664562455MULTIPOLYGON (((-82.55933 40.78975, -82.55835 ...
24835001600US48034803Congressional District 3C2116124557401197890112POLYGON ((-96.84410 32.98891, -96.84403 32.992...
32825001600US28022802Congressional District 2C211640278711117951654563POLYGON ((-91.36371 31.78036, -91.35951 31.799...
442185001600US42184218Congressional District 18C211675765519519985421POLYGON ((-80.17834 40.33725, -80.17537 40.338...
\n", "
" ], "text/plain": [ " STATEFP CD116FP AFFGEOID GEOID NAMELSAD LSAD \\\n", "0 06 42 5001600US0642 0642 Congressional District 42 C2 \n", "1 39 7 5001600US3907 3907 Congressional District 7 C2 \n", "2 48 3 5001600US4803 4803 Congressional District 3 C2 \n", "3 28 2 5001600US2802 2802 Congressional District 2 C2 \n", "4 42 18 5001600US4218 4218 Congressional District 18 C2 \n", "\n", " CDSESSN ALAND AWATER \\\n", "0 116 2424753563 44105315 \n", "1 116 10010016396 64562455 \n", "2 116 1245574011 97890112 \n", "3 116 40278711117 951654563 \n", "4 116 757655195 19985421 \n", "\n", " geometry \n", "0 POLYGON ((-117.67629 33.88882, -117.65488 33.8... \n", "1 MULTIPOLYGON (((-82.55933 40.78975, -82.55835 ... \n", "2 POLYGON ((-96.84410 32.98891, -96.84403 32.992... \n", "3 POLYGON ((-91.36371 31.78036, -91.35951 31.799... \n", "4 POLYGON ((-80.17834 40.33725, -80.17537 40.338... " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = geopandas.read_parquet(\n", " asset.href,\n", " storage_options=asset.extra_fields[\"table:storage_options\"],\n", ")\n", "df.head()" ] }, { "cell_type": "markdown", "id": "f7052d86-d911-4b89-9828-0553b9be445e", "metadata": {}, "source": [ "We'll select a single district (Maryland's 2nd) and plot it." ] }, { "cell_type": "code", "execution_count": 7, "id": "bec86de8-5ad5-4455-b107-1c5904c1ddcb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = (\n", " df[df.GEOID == \"2402\"]\n", " .to_crs(epsg=3857)\n", " .plot(figsize=(10, 10), alpha=0.5, edgecolor=\"k\")\n", ")\n", "ax.set_title(\n", " \"2nd Congressional District: Maryland\",\n", " fontdict={\"fontsize\": \"20\", \"fontweight\": \"2\"},\n", ")\n", "ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "48dfa244-3671-4459-9de8-4d8b2891aca0", "metadata": { "tags": [] }, "source": [ "### Read Census Block data\n", "\n", "Census blocks are the smallest cartographic unit available from the Census Bureau. There are over 8 million census blocks." ] }, { "cell_type": "code", "execution_count": 8, "id": "a37187cb-361e-462e-a454-ae256beb74ad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Dask DataFrame Structure:
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
STATEFPCOUNTYFPTRACTCEBLOCKCEALANDAWATERINTPTLATINTPTLONgeometry
npartitions=56
010010201001000category[unknown]category[unknown]int64int64int64int64float64float64geometry
020130001001000...........................
..............................
780109701001000...........................
780309900000008...........................
\n", "
Dask Name: readparquetfsspec, 1 expression
" ], "text/plain": [ "Dask DataFrame Structure:\n", " STATEFP COUNTYFP TRACTCE BLOCKCE ALAND AWATER INTPTLAT INTPTLON geometry\n", "npartitions=56 \n", "010010201001000 category[unknown] category[unknown] int64 int64 int64 int64 float64 float64 geometry\n", "020130001001000 ... ... ... ... ... ... ... ... ...\n", "... ... ... ... ... ... ... ... ... ...\n", "780109701001000 ... ... ... ... ... ... ... ... ...\n", "780309900000008 ... ... ... ... ... ... ... ... ...\n", "Dask Name: readparquetfsspec, 1 expression\n", "Expr=ReadParquetFSSpec(ba7eb75)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = dask_geopandas.read_parquet(\n", " \"abfs://us-census/2020/census_blocks_geo.parquet\",\n", " storage_options=asset.extra_fields[\"table:storage_options\"],\n", " calculate_divisions=True,\n", ")\n", "geo" ] }, { "cell_type": "code", "execution_count": 9, "id": "943cbfef-8f2e-4756-a924-5519971d077a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Dask DataFrame Structure:
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
P0010001P0010002P0010003P0010004P0010005P0010006P0010007P0010008P0010009P0010010P0010011P0010012P0010013P0010014P0010015P0010016P0010017P0010018P0010019P0010020P0010021P0010022P0010023P0010024P0010025P0010026P0010027P0010028P0010029P0010030P0010031P0010032P0010033P0010034P0010035P0010036P0010037P0010038P0010039P0010040P0010041P0010042P0010043P0010044P0010045P0010046P0010047P0010048P0010049P0010050P0010051P0010052P0010053P0010054P0010055P0010056P0010057P0010058P0010059P0010060P0010061P0010062P0010063P0010064P0010065P0010066P0010067P0010068P0010069P0010070P0010071P0020001P0020002P0020003P0020004P0020005P0020006P0020007P0020008P0020009P0020010P0020011P0020012P0020013P0020014P0020015P0020016P0020017P0020018P0020019P0020020P0020021P0020022P0020023P0020024P0020025P0020026P0020027P0020028P0020029P0020030P0020031P0020032P0020033P0020034P0020035P0020036P0020037P0020038P0020039P0020040P0020041P0020042P0020043P0020044P0020045P0020046P0020047P0020048P0020049P0020050P0020051P0020052P0020053P0020054P0020055P0020056P0020057P0020058P0020059P0020060P0020061P0020062P0020063P0020064P0020065P0020066P0020067P0020068P0020069P0020070P0020071P0020072P0020073
npartitions=52
010010201001000int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64
020130001001000................................................................................................................................................................................................................................................................................................................................................................................................................................................
...................................................................................................................................................................................................................................................................................................................................................................................................................................................
720019563001000................................................................................................................................................................................................................................................................................................................................................................................................................................................
721537506022015................................................................................................................................................................................................................................................................................................................................................................................................................................................
\n", "
Dask Name: readparquetfsspec, 1 expression
" ], "text/plain": [ "Dask DataFrame Structure:\n", " P0010001 P0010002 P0010003 P0010004 P0010005 P0010006 P0010007 P0010008 P0010009 P0010010 P0010011 P0010012 P0010013 P0010014 P0010015 P0010016 P0010017 P0010018 P0010019 P0010020 P0010021 P0010022 P0010023 P0010024 P0010025 P0010026 P0010027 P0010028 P0010029 P0010030 P0010031 P0010032 P0010033 P0010034 P0010035 P0010036 P0010037 P0010038 P0010039 P0010040 P0010041 P0010042 P0010043 P0010044 P0010045 P0010046 P0010047 P0010048 P0010049 P0010050 P0010051 P0010052 P0010053 P0010054 P0010055 P0010056 P0010057 P0010058 P0010059 P0010060 P0010061 P0010062 P0010063 P0010064 P0010065 P0010066 P0010067 P0010068 P0010069 P0010070 P0010071 P0020001 P0020002 P0020003 P0020004 P0020005 P0020006 P0020007 P0020008 P0020009 P0020010 P0020011 P0020012 P0020013 P0020014 P0020015 P0020016 P0020017 P0020018 P0020019 P0020020 P0020021 P0020022 P0020023 P0020024 P0020025 P0020026 P0020027 P0020028 P0020029 P0020030 P0020031 P0020032 P0020033 P0020034 P0020035 P0020036 P0020037 P0020038 P0020039 P0020040 P0020041 P0020042 P0020043 P0020044 P0020045 P0020046 P0020047 P0020048 P0020049 P0020050 P0020051 P0020052 P0020053 P0020054 P0020055 P0020056 P0020057 P0020058 P0020059 P0020060 P0020061 P0020062 P0020063 P0020064 P0020065 P0020066 P0020067 P0020068 P0020069 P0020070 P0020071 P0020072 P0020073\n", "npartitions=52 \n", "010010201001000 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 
int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64 int64\n", "020130001001000 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "720019563001000 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
... ... ... ... ... ... ... ... ... ...\n", "721537506022015 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "Dask Name: readparquetfsspec, 1 expression\n", "Expr=ReadParquetFSSpec(a73ac8e)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import dask.dataframe\n", "\n", "pop = dask.dataframe.read_parquet(\n", " \"abfs://us-census/2020/census_blocks_population.parquet\",\n", " storage_options=asset.extra_fields[\"table:storage_options\"],\n", " calculate_divisions=True,\n", ")\n", "pop" ] }, { "cell_type": "code", "execution_count": 10, "id": "15a976a5-2d24-4db1-8906-feceb3710e1c", "metadata": {}, "outputs": [], "source": [ "ri = (\n", " geo.get_partition(39)\n", " .compute()\n", " .join(pop[[\"P0010001\"]].get_partition(39).compute(), how=\"inner\")\n", ")\n", "ri = ri[ri.P0010001 > 10]" ] }, { "cell_type": "code", "execution_count": 11, "id": "0718e302-2ad2-4dbb-a937-dfda589f6e1c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = ri.to_crs(epsg=3857).plot(figsize=(10, 10), alpha=0.5, edgecolor=\"k\")\n", "ax.set_title(\n", " \"Census Blocks with Population Greater than 150: Providence County, RI\",\n", " fontdict={\"fontsize\": \"20\", \"fontweight\": \"2\"},\n", ")\n", "ctx.add_basemap(ax, source=ctx.providers.Esri.NatGeoWorldMap)\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "c457f7bb-176c-45cb-aa11-cc3188cf0463", "metadata": {}, "source": [ "Let's filter out the blocks with 0 reported population and plot the distribution of people per census block." ] }, { "cell_type": "code", "execution_count": 12, "id": "a31440bb-f16b-4799-a02f-a1836002491d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.displot(ri.P0010001, log_scale=True);" ] }, { "cell_type": "markdown", "id": "f016b764-2ffc-431d-851a-24d8485171d0", "metadata": {}, "source": [ "Or we can plot the relationship between the population and the size of the census block." ] }, { "cell_type": "code", "execution_count": 13, "id": "0d637681-0777-47b9-ba60-bc7e8f9d23d3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "\n", "sns.jointplot(x=ri.ALAND, y=np.log(ri.P0010001), marker=\".\");" ] }, { "cell_type": "markdown", "id": "97c000b7-de12-43dc-9800-34a8a35f18d7", "metadata": {}, "source": [ "### Next Steps\n", "\n", "Now that you've seen an introduction to working with US Census data from the Planetary Computer, learn more with\n", "\n", "* The [US Census data tutorial](../../tutorials/census-data.ipynb), which includes examples for accessing data at each level of cartographic aggregation available\n", "* The [Reading tabular data quickstart](../../quickstarts/reading-tabular-data.ipynb), which introduces how to use tabular data with the Planetary Computer" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }