{ "cells": [ { "cell_type": "markdown", "id": "16f34a39-14ac-49ad-98da-abdb35ebfeae", "metadata": {}, "source": [ "## Project Eclipse data on the Planetary Computer\n", "\n", "The [Project Eclipse Network](https://planetarycomputer.microsoft.com/dataset/eclipse) is a low-cost air quality sensing network for cities and a research project led by the Urban Innovation Group at Microsoft Research.\n", "\n", "### Using the STAC API\n", "\n", "Project Eclipse data are distributed as a set of parquet files -- one per week. We can use the STAC API to search for files for a specific week." ] }, { "cell_type": "code", "execution_count": 1, "id": "c38a9046-d64e-48ff-b394-0d7f95f1cd73", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 1 item\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pystac_client\n", "import planetary_computer\n", "\n", "catalog = pystac_client.Client.open(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n", " modifier=planetary_computer.sign_inplace,\n", ")\n", "search = catalog.search(collections=[\"eclipse\"], datetime=\"2022-03-01\")\n", "items = search.item_collection()\n", "print(f\"Found {len(items)} item\")\n", "item = items[0]\n", "item" ] }, { "cell_type": "markdown", "id": "22288f03-799e-4ff9-a877-fdcfae987307", "metadata": {}, "source": [ "We'll load the parquet file with pandas." ] }, { "cell_type": "code", "execution_count": 2, "id": "1510984d-be1d-48a2-9c68-7514ecc12859", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>City</th><th>DeviceId</th><th>LocationName</th><th>Latitude</th><th>Longitude</th><th>ReadingDateTimeUTC</th><th>PM25</th><th>CalibratedPM25</th><th>CalibratedO3</th><th>CalibratedNO2</th><th>CO</th><th>Temperature</th><th>Humidity</th><th>BatteryLevel</th><th>PercentBattery</th><th>CellSignal</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Chicago</td><td>2002</td><td>State &amp; Garfield (SB)</td><td>41.794921</td><td>-87.625857</td><td>2022-02-27 00:04:04</td><td>9.126071</td><td>10.79</td><td>17.44</td><td>10.49</td><td>0.105193</td><td>-0.526352</td><td>59.703064</td><td>4.143906</td><td>91.634804</td><td>-83.0</td></tr>\n",
"    <tr><th>1</th><td>Chicago</td><td>2002</td><td>State &amp; Garfield (SB)</td><td>41.794921</td><td>-87.625857</td><td>2022-02-27 00:09:14</td><td>10.927937</td><td>11.83</td><td>15.63</td><td>8.46</td><td>0.114015</td><td>-0.649185</td><td>60.223389</td><td>4.142812</td><td>91.634804</td><td>-80.0</td></tr>\n",
"    <tr><th>2</th><td>Chicago</td><td>2002</td><td>State &amp; Garfield (SB)</td><td>41.794921</td><td>-87.625857</td><td>2022-02-27 00:14:24</td><td>10.395282</td><td>11.38</td><td>18.29</td><td>5.25</td><td>0.096386</td><td>-0.627823</td><td>60.884094</td><td>4.141094</td><td>91.634804</td><td>-82.0</td></tr>\n",
"    <tr><th>3</th><td>Chicago</td><td>2002</td><td>State &amp; Garfield (SB)</td><td>41.794921</td><td>-87.625857</td><td>2022-02-27 00:19:33</td><td>9.431242</td><td>10.85</td><td>15.11</td><td>8.53</td><td>0.119355</td><td>-0.809402</td><td>61.984253</td><td>4.142969</td><td>91.385475</td><td>-81.0</td></tr>\n",
"    <tr><th>4</th><td>Chicago</td><td>2002</td><td>State &amp; Garfield (SB)</td><td>41.794921</td><td>-87.625857</td><td>2022-02-27 00:24:44</td><td>9.648221</td><td>11.05</td><td>13.05</td><td>9.92</td><td>0.125682</td><td>-0.809402</td><td>62.377930</td><td>4.142344</td><td>91.385475</td><td>-82.0</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>187062</th><td>Chicago</td><td>2214</td><td>EPA COM ED Maintenance Bldg F</td><td>41.751200</td><td>-87.713490</td><td>2022-03-05 22:33:08</td><td>0.747107</td><td>8.14</td><td>14.76</td><td>13.74</td><td>0.285773</td><td>19.898682</td><td>53.863525</td><td>4.184062</td><td>89.785156</td><td>-110.0</td></tr>\n",
"    <tr><th>187063</th><td>Chicago</td><td>2214</td><td>EPA COM ED Maintenance Bldg F</td><td>41.751200</td><td>-87.713490</td><td>2022-03-05 22:53:22</td><td>0.733524</td><td>12.95</td><td>14.37</td><td>20.18</td><td>1.288297</td><td>20.149689</td><td>54.324341</td><td>4.187969</td><td>89.785156</td><td>-111.0</td></tr>\n",
"    <tr><th>187064</th><td>Chicago</td><td>2214</td><td>EPA COM ED Maintenance Bldg F</td><td>41.751200</td><td>-87.713490</td><td>2022-03-05 23:13:37</td><td>0.000000</td><td>12.95</td><td>15.69</td><td>17.94</td><td>2.921081</td><td>20.024185</td><td>53.430176</td><td>4.188281</td><td>89.785156</td><td>-111.0</td></tr>\n",
"    <tr><th>187065</th><td>Chicago</td><td>2214</td><td>EPA COM ED Maintenance Bldg F</td><td>41.751200</td><td>-87.713490</td><td>2022-03-05 23:33:57</td><td>3.208157</td><td>14.79</td><td>14.00</td><td>20.47</td><td>0.784464</td><td>20.149689</td><td>53.668213</td><td>4.183750</td><td>89.785156</td><td>-109.0</td></tr>\n",
"    <tr><th>187066</th><td>Chicago</td><td>2214</td><td>EPA COM ED Maintenance Bldg F</td><td>41.751200</td><td>-87.713490</td><td>2022-03-05 23:54:09</td><td>0.000000</td><td>8.09</td><td>13.44</td><td>16.40</td><td>0.331082</td><td>20.299225</td><td>53.356934</td><td>4.184219</td><td>89.683594</td><td>-110.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>187067 rows × 16 columns</p>\n",
"
" ], "text/plain": [ " City DeviceId LocationName Latitude \\\n", "0 Chicago 2002 State & Garfield (SB) 41.794921 \n", "1 Chicago 2002 State & Garfield (SB) 41.794921 \n", "2 Chicago 2002 State & Garfield (SB) 41.794921 \n", "3 Chicago 2002 State & Garfield (SB) 41.794921 \n", "4 Chicago 2002 State & Garfield (SB) 41.794921 \n", "... ... ... ... ... \n", "187062 Chicago 2214 EPA COM ED Maintenance Bldg F 41.751200 \n", "187063 Chicago 2214 EPA COM ED Maintenance Bldg F 41.751200 \n", "187064 Chicago 2214 EPA COM ED Maintenance Bldg F 41.751200 \n", "187065 Chicago 2214 EPA COM ED Maintenance Bldg F 41.751200 \n", "187066 Chicago 2214 EPA COM ED Maintenance Bldg F 41.751200 \n", "\n", " Longitude ReadingDateTimeUTC PM25 CalibratedPM25 \\\n", "0 -87.625857 2022-02-27 00:04:04 9.126071 10.79 \n", "1 -87.625857 2022-02-27 00:09:14 10.927937 11.83 \n", "2 -87.625857 2022-02-27 00:14:24 10.395282 11.38 \n", "3 -87.625857 2022-02-27 00:19:33 9.431242 10.85 \n", "4 -87.625857 2022-02-27 00:24:44 9.648221 11.05 \n", "... ... ... ... ... \n", "187062 -87.713490 2022-03-05 22:33:08 0.747107 8.14 \n", "187063 -87.713490 2022-03-05 22:53:22 0.733524 12.95 \n", "187064 -87.713490 2022-03-05 23:13:37 0.000000 12.95 \n", "187065 -87.713490 2022-03-05 23:33:57 3.208157 14.79 \n", "187066 -87.713490 2022-03-05 23:54:09 0.000000 8.09 \n", "\n", " CalibratedO3 CalibratedNO2 CO Temperature Humidity \\\n", "0 17.44 10.49 0.105193 -0.526352 59.703064 \n", "1 15.63 8.46 0.114015 -0.649185 60.223389 \n", "2 18.29 5.25 0.096386 -0.627823 60.884094 \n", "3 15.11 8.53 0.119355 -0.809402 61.984253 \n", "4 13.05 9.92 0.125682 -0.809402 62.377930 \n", "... ... ... ... ... ... 
\n", "187062 14.76 13.74 0.285773 19.898682 53.863525 \n", "187063 14.37 20.18 1.288297 20.149689 54.324341 \n", "187064 15.69 17.94 2.921081 20.024185 53.430176 \n", "187065 14.00 20.47 0.784464 20.149689 53.668213 \n", "187066 13.44 16.40 0.331082 20.299225 53.356934 \n", "\n", " BatteryLevel PercentBattery CellSignal \n", "0 4.143906 91.634804 -83.0 \n", "1 4.142812 91.634804 -80.0 \n", "2 4.141094 91.634804 -82.0 \n", "3 4.142969 91.385475 -81.0 \n", "4 4.142344 91.385475 -82.0 \n", "... ... ... ... \n", "187062 4.184062 89.785156 -110.0 \n", "187063 4.187969 89.785156 -111.0 \n", "187064 4.188281 89.785156 -111.0 \n", "187065 4.183750 89.785156 -109.0 \n", "187066 4.184219 89.683594 -110.0 \n", "\n", "[187067 rows x 16 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import geopandas\n", "import pandas as pd\n", "\n", "asset = item.assets[\"data\"]\n", "df = pd.read_parquet(\n", " asset.href, storage_options=asset.extra_fields[\"table:storage_options\"]\n", ")\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "id": "81503216-843c-4c6d-aa6b-335338156920", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "187067" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df[(df.Longitude > -89) & (df.Longitude < -86)]\n", "len(df)" ] }, { "cell_type": "code", "execution_count": 4, "id": "4e8e0745-e862-47d4-9cf3-bece9a1acd47", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 17.44\n", "1 15.63\n", "2 18.29\n", "3 15.11\n", "4 13.05\n", " ... 
\n", "187062 14.76\n", "187063 14.37\n", "187064 15.69\n", "187065 14.00\n", "187066 13.44\n", "Name: CalibratedO3, Length: 187067, dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.CalibratedO3" ] }, { "cell_type": "code", "execution_count": 5, "id": "1d5b5225-1466-4c2e-b37a-70d40959fcd5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ts = df.resample(\"h\", on=\"ReadingDateTimeUTC\")[\n", " [\"CalibratedPM25\", \"Humidity\", \"CalibratedO3\", \"CalibratedNO2\", \"CO\"]\n", "].mean()\n", "ts.plot(subplots=True, sharex=True, figsize=(12, 12));" ] }, { "cell_type": "markdown", "id": "d6b92a98-34ed-4864-ae11-32b9730314f6", "metadata": {}, "source": [ "The dataset contains many observations from each sensor. We can plot the location of each sensor with geopandas, by selecting just the first observation for that sensor." ] }, { "cell_type": "code", "execution_count": 6, "id": "7bb0568a-7f56-40c2-b943-2b11c5bdc385", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf = geopandas.GeoDataFrame(\n", " df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude), crs=\"epsg:4326\"\n", ")\n", "\n", "\n", "gdf[[\"LocationName\", \"geometry\"]].drop_duplicates(\n", " subset=\"LocationName\"\n", ").dropna().explore(marker_kwds=dict(radius=8))" ] }, { "cell_type": "markdown", "id": "ad5922c7-3d62-452f-b5d8-3480196d6959", "metadata": {}, "source": [ "Using a [named aggregation](https://pandas.pydata.org/docs/user_guide/groupby.html#named-aggregation), we can compute a summary per sensor and plot it on a map. Hover over a marker to see the average calibrated PM2.5 (`CalibratedPM25`) for that sensor." ] }, { "cell_type": "code", "execution_count": 7, "id": "0746de9e-08c6-4cfa-88d0-9e13a4cd5f25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "average_pm25 = geopandas.GeoDataFrame(\n", " gdf.groupby(\"LocationName\").agg(\n", " mean_pm25=(\"CalibratedPM25\", \"mean\"), geometry=(\"geometry\", \"first\")\n", " ),\n", " crs=\"epsg:4326\",\n", ")\n", "average_pm25.explore(\n", " marker_kwds=dict(radius=10),\n", ")" ] }, { "cell_type": "markdown", "id": "16d76ed4-b780-421b-9df4-d539923e3c24", "metadata": {}, "source": [ "### Reading the full dataset\n", "\n", "The STAC collection includes a `data` asset, which links to the root of the parquet dataset. This can be used to read all of the data across time. We'll use [Dask](https://docs.dask.org/) to read in the dataset." ] }, { "cell_type": "code", "execution_count": 8, "id": "e58172d6-3a61-476d-9447-b68357c619cb", "metadata": {}, "outputs": [], "source": [ "eclipse = catalog.get_collection(\"eclipse\")\n", "asset = planetary_computer.sign(eclipse.assets[\"data\"])" ] }, { "cell_type": "code", "execution_count": 9, "id": "14c6e5b0-1eee-49ae-9f76-1324f02f5b5c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div><strong>Dask DataFrame Structure:</strong></div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>City</th><th>DeviceId</th><th>LocationName</th><th>Latitude</th><th>Longitude</th><th>ReadingDateTimeUTC</th><th>PM25</th><th>CalibratedPM25</th><th>CalibratedO3</th><th>CalibratedNO2</th><th>CO</th><th>Temperature</th><th>Humidity</th><th>BatteryLevel</th><th>PercentBattery</th><th>CellSignal</th></tr>\n",
"    <tr><th>npartitions=89</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th></th><td>string</td><td>int32</td><td>string</td><td>float64</td><td>float64</td><td>datetime64[ns]</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td><td>float64</td></tr>\n",
"    <tr><th></th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th></th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th></th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<div>Dask Name: readparquetfsspec, 1 expression</div>
" ], "text/plain": [ "Dask DataFrame Structure:\n", " City DeviceId LocationName Latitude Longitude ReadingDateTimeUTC PM25 CalibratedPM25 CalibratedO3 CalibratedNO2 CO Temperature Humidity BatteryLevel PercentBattery CellSignal\n", "npartitions=89 \n", " string int32 string float64 float64 datetime64[ns] float64 float64 float64 float64 float64 float64 float64 float64 float64 float64\n", " ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", " ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", " ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "Dask Name: readparquetfsspec, 1 expression\n", "Expr=ReadParquetFSSpec(2a74956)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import adlfs\n", "import dask.dataframe as dd\n", "\n", "fs = adlfs.AzureBlobFileSystem(**asset.extra_fields[\"table:storage_options\"])\n", "files = [f\"az://{x}\" for x in fs.ls(asset.href)]\n", "\n", "ddf = dd.read_parquet(\n", " files, storage_options=asset.extra_fields[\"table:storage_options\"]\n", ")\n", "ddf" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }