{ "cells": [ { "cell_type": "markdown", "id": "0f09e1c5", "metadata": {}, "source": [ "# Speedtest Data from Ookla\n", "\n", "This example will use data collected from Ookla's Speed Test application and [shared publicly in the AWS Open Data Registry](https://registry.opendata.aws/speedtest-global-performance/). From the AWS page:\n", "\n", "> Global fixed broadband and mobile (cellular) network performance, allocated to zoom level 16 web mercator tiles (approximately 610.8 meters by 610.8 meters at the equator). Data is provided in both Shapefile format as well as Apache Parquet with geometries represented in Well Known Text (WKT) projected in EPSG:4326. Download speed, upload speed, and latency are collected via the Speedtest by Ookla applications for Android and iOS and averaged for each tile." ] }, { "cell_type": "markdown", "id": "7dd1dfef-3756-49d9-9480-9a4cdba22345", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "id": "d1678764", "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "import numpy as np\n", "import pandas as pd\n", "import shapely\n", "from palettable.colorbrewer.diverging import BrBG_10\n", "\n", "from lonboard import ScatterplotLayer, viz\n", "from lonboard.colormap import apply_continuous_cmap" ] }, { "cell_type": "markdown", "id": "c747d8b9-94b9-421a-967a-8350bf72de9a", "metadata": {}, "source": [ "The URL for a single data file for mobile network speeds in the first quarter of 2019." ] }, { "cell_type": "code", "execution_count": 2, "id": "34ac8eae", "metadata": {}, "outputs": [], "source": [ "url = \"https://ookla-open-data.s3.us-west-2.amazonaws.com/parquet/performance/type=mobile/year=2019/quarter=1/2019-01-01_performance_mobile_tiles.parquet\"" ] }, { "cell_type": "markdown", "id": "5991ef2c-5db0-4110-b6a1-b33fcbddad0d", "metadata": {}, "source": [ "We can fetch two columns from this data file directly from AWS. This `pd.read_parquet` command will perform a network request for the data file, so it may take a while on a slow network connection." ] }, { "cell_type": "code", "execution_count": 3, "id": "8e7bc021", "metadata": {}, "outputs": [], "source": [ "# avg_d_kbps is the average download speed for that data point in kilobits per second\n", "# tile is the WKT string representing a given zoom-16 Web Mercator tile\n", "columns = [\"avg_d_kbps\", \"tile\"]" ] }, { "cell_type": "code", "execution_count": 4, "id": "f0be9f56", "metadata": {}, "outputs": [], "source": [ "df = pd.read_parquet(url, columns=columns)" ] }, { "cell_type": "markdown", "id": "5852aa94-2d18-4a1b-b379-be19682d57eb", "metadata": {}, "source": [ "We can take a quick look at this data:" ] }, { "cell_type": "code", "execution_count": 5, "id": "4b27e9a4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
avg_d_kbpstile
05983POLYGON((-160.021362304688 70.6381267305321, -...
13748POLYGON((-160.043334960938 70.6344840663086, -...
23364POLYGON((-160.043334960938 70.6326624870732, -...
32381POLYGON((-160.037841796875 70.6344840663086, -...
43047POLYGON((-160.037841796875 70.6326624870732, -...
\n", "
" ], "text/plain": [ " avg_d_kbps tile\n", "0 5983 POLYGON((-160.021362304688 70.6381267305321, -...\n", "1 3748 POLYGON((-160.043334960938 70.6344840663086, -...\n", "2 3364 POLYGON((-160.043334960938 70.6326624870732, -...\n", "3 2381 POLYGON((-160.037841796875 70.6344840663086, -...\n", "4 3047 POLYGON((-160.037841796875 70.6326624870732, -..." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "3872dac0-2774-4695-a531-744683f18ee7", "metadata": {}, "source": [ "The `tile` column contains _strings_ representing geometries. We need to parse those strings into geometries. Then for simplicity we'll convert into their centroids." ] }, { "cell_type": "code", "execution_count": 6, "id": "93ceca1d", "metadata": {}, "outputs": [], "source": [ "tile_geometries = shapely.from_wkt(df[\"tile\"])\n", "tile_centroids = shapely.centroid(tile_geometries)" ] }, { "cell_type": "markdown", "id": "808db614-df1f-41a9-8074-a79e2db7e2e3", "metadata": {}, "source": [ "Now we can create a geopandas GeoDataFrame from the download speed and the shapely geometries." ] }, { "cell_type": "code", "execution_count": 7, "id": "e67cf569", "metadata": {}, "outputs": [], "source": [ "gdf = gpd.GeoDataFrame(df[[\"avg_d_kbps\"]], geometry=tile_centroids)" ] }, { "cell_type": "markdown", "id": "65436a4a-c498-4f40-ba79-1082062376bf", "metadata": {}, "source": [ "To ensure that this demo is snappy on most computers, we'll filter to a bounding box over Europe." ] }, { "cell_type": "code", "execution_count": 8, "id": "80326895-70ba-4f4b-a7b3-106b4bbd36d9", "metadata": {}, "outputs": [], "source": [ "gdf = gdf.cx[-11.83:25.5, 34.9:59]" ] }, { "cell_type": "markdown", "id": "3cc2215e-7706-4ab3-b674-3de4ca41899c", "metadata": {}, "source": [ "Even this filtered data frame still has 800,000 rows, so it's still a lot of data to explore:" ] }, { "cell_type": "code", "execution_count": 9, "id": "a34a6a27-0259-4da9-94c4-923466da05fb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
avg_d_kbpsgeometry
38342913570POINT (-2.94159 58.99673)
38343018108POINT (-3.29865 58.96276)
3834315569POINT (-3.29315 58.97125)
3834329349POINT (-3.29315 58.96842)
38343311216POINT (-3.23273 58.98541)
.........
1840929104723POINT (25.38666 35.09519)
184093062540POINT (25.44708 35.04124)
184093188068POINT (25.47455 35.04124)
184093838255POINT (25.38116 35.00075)
184093991750POINT (25.45807 34.99175)
\n", "

807221 rows × 2 columns

\n", "
" ], "text/plain": [ " avg_d_kbps geometry\n", "383429 13570 POINT (-2.94159 58.99673)\n", "383430 18108 POINT (-3.29865 58.96276)\n", "383431 5569 POINT (-3.29315 58.97125)\n", "383432 9349 POINT (-3.29315 58.96842)\n", "383433 11216 POINT (-3.23273 58.98541)\n", "... ... ...\n", "1840929 104723 POINT (25.38666 35.09519)\n", "1840930 62540 POINT (25.44708 35.04124)\n", "1840931 88068 POINT (25.47455 35.04124)\n", "1840938 38255 POINT (25.38116 35.00075)\n", "1840939 91750 POINT (25.45807 34.99175)\n", "\n", "[807221 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf" ] }, { "cell_type": "markdown", "id": "81a61ec4-2a09-40c0-aa92-7dca570bbd49", "metadata": {}, "source": [ "The simplest way to get data on the map is to use `viz`:" ] }, { "cell_type": "code", "execution_count": 10, "id": "326f5f4e-a8f7-425b-8e9f-6744ba65cf62", "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cd6f767934bd451bb8f89088039e7af5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "ScatterplotLayer(table=pyarrow.Table\n", "__index_level_0__: int64\n", "geometry: fixed_size_list[2]\n", " chi…" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_ = viz(gdf.geometry)\n", "map_" ] }, { "cell_type": "markdown", "id": "91af2a88-9363-4c00-8e54-f06a635ff991", "metadata": {}, "source": [ "This map object is a [`ScatterplotLayer`](https://developmentseed.org/lonboard/api/layers/scatterplot-layer/) type. You could have created the same map by using\n", "\n", "```py\n", "map_ = lonboard.ScatterplotLayer.from_geopandas(gdf)\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "id": "7d0f6f03-ca11-4842-8e53-4a619ad7d3dd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "lonboard.layer.ScatterplotLayer" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(map_)" ] }, { "cell_type": "markdown", "id": "6f4d89c3-282a-4beb-9f35-68eb9645e8c0", "metadata": {}, "source": [ "We can look at the [documentation for `ScatterplotLayer`](https://developmentseed.org/lonboard/api/layers/scatterplot-layer/) to see what other rendering options it allows. Let's set the fill color to something other than black:" ] }, { "cell_type": "code", "execution_count": 12, "id": "3912b241-577f-4ac3-b78f-2702e89d6010", "metadata": {}, "outputs": [], "source": [ "map_.get_fill_color = [0, 0, 200, 200]" ] }, { "cell_type": "markdown", "id": "b66a73b6-69c1-4661-a7ad-2dd3941c8753", "metadata": {}, "source": [ "Blue is pretty, but the map would be more informative if we colored each point by a relevant characteristic. In this case, we have the download speed associated with each location, so let's use that!" ] }, { "cell_type": "markdown", "id": "ce630455-3e19-47f1-bb69-81fdcd99b126", "metadata": {}, "source": [ "Here we compute a linear statistic for the download speed. Given a minimum bound of `5000` and a maximum bound of `50,000` the normalized speed is linearly scaled to between 0 and 1." ] }, { "cell_type": "code", "execution_count": 13, "id": "179071b3", "metadata": {}, "outputs": [], "source": [ "min_bound = 5000\n", "max_bound = 50000\n", "download_speed = gdf['avg_d_kbps']\n", "normalized_download_speed = (download_speed - min_bound) / (max_bound - min_bound)" ] }, { "cell_type": "markdown", "id": "7f678b8e-ad23-4ebd-842c-c5158e1ec741", "metadata": {}, "source": [ "`normalized_download_speed` is now linearly scaled based on the bounds provided above. Keep in mind that the **input range of the colormap is 0-1**. So any values that are below 0 will receive the left-most color in the colormap, while any values above 1 will receive the right-most color in the colormap." ] }, { "cell_type": "code", "execution_count": 14, "id": "a8df3963-2bc2-4f89-8a38-20e232a13932", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "383429 0.190444\n", "383430 0.291289\n", "383431 0.012644\n", "383432 0.096644\n", "383433 0.138133\n", " ... \n", "1840929 2.216067\n", "1840930 1.278667\n", "1840931 1.845956\n", "1840938 0.739000\n", "1840939 1.927778\n", "Name: avg_d_kbps, Length: 807221, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "normalized_download_speed" ] }, { "cell_type": "markdown", "id": "dc0e388f-eeed-4339-bd54-0491f79f45aa", "metadata": {}, "source": [ "We can use any colormap provided by the [`palettable`](https://github.com/jiffyclub/palettable) package. Let's inspect the `BrBG_10` diverging colormap below:" ] }, { "cell_type": "code", "execution_count": 15, "id": "9d5347e2-84c7-40bc-af45-c8638188709e", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgAAAABACAYAAABsv8+/AAAAE3RFWHRUaXRsZQBCckJHIGNvbG9ybWFwMTXIUAAAABl0RVh0RGVzY3JpcHRpb24AQnJCRyBjb2xvcm1hcLqHWMgAAAAwdEVYdEF1dGhvcgBNYXRwbG90bGliIHYzLjcuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZ7rJ3hAAAAAydEVYdFNvZnR3YXJlAE1hdHBsb3RsaWIgdjMuNy4zLCBodHRwczovL21hdHBsb3RsaWIub3JnlG9BNwAAAilJREFUeJzt1k1u2zAURlGSQffTxXT/O7GYgSUHejahBPGgwHfOROWPKCYBitv//f0zW2ut99Zaa21cPu//GOPd+/t5/O7949h/39iP8WP+GI/TfO9l3yj79vXn88bL9/riO/Ueva4v7tHLd/rHx/7+6+d4rI/z+uq9sn8s18u5i/PG4tyrexz3bsffr5f3HvP7vnEer/a3Xu6x2N/K+V/7X8+3i++u7jn3v/Ns9+fWxv6s8+f15/lvnvPj7/32fne3Oe/j/XmbWxkf69ti/2/XV9970/q2nZ5bGd+2uZg/9pdzZj1nlv3fO/9p3/zhPco5x88763xZr/NzMb967+n8i/eX56/us1hv+++97b+Hr/GxPi/G/+n7+779fysAIIkAAIBAAgAAAgkAAAgkAAAgkAAAgEACAAACCQAACCQAACCQAACAQAIAAAIJAAAIJAAAIJAAAIBAAgAAAgkAAAgkAAAgkAAAgEACAAACCQAACCQAACCQAACAQAIAAAIJAAAIJAAAIJAAAIBAAgAAAgkAAAgkAAAgkAAAgEACAAACCQAACCQAACCQAACAQAIAAAIJAAAIJAAAIJAAAIBAAgAAAgkAAAgkAAAgkAAAgEACAAACCQAACCQAACCQAACAQAIAAAIJAAAIJAAAIJAAAIBAAgAAAgkAAAgkAAAgkAAAgEACAAACCQAACCQAACCQAACAQAIAAAIJAAAIJAAAIJAAAIBAAgAAAn0CyK5V+uT9Ti4AAAAASUVORK5CYII=", "text/html": [ "
BrBG
\"BrBG
under
bad
over
" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "BrBG_10.mpl_colormap" ] }, { "cell_type": "markdown", "id": "8eb59fd6-3b7d-4db9-ad9b-a66e4e58152f", "metadata": {}, "source": [ "Now let's apply the colormap on `normalized_download_speed` using a helper provided by `lonboard`. We can set it on `layer.get_fill_color` to update the existing colors." ] }, { "cell_type": "code", "execution_count": 16, "id": "5a77f728-9cbe-4372-9bfd-d6dee4b93a01", "metadata": {}, "outputs": [], "source": [ "map_.get_fill_color = apply_continuous_cmap(normalized_download_speed, BrBG_10)" ] }, { "cell_type": "markdown", "id": "44611c75-f9e6-4f53-af7d-c640d641dc15", "metadata": {}, "source": [ "After running the above cell, you should see the map above update with a different color per point!" ] }, { "cell_type": "markdown", "id": "0bd168fe-53a1-4313-9c11-720edbfa5c6c", "metadata": {}, "source": [ "We can pass an array into any of the \"accessors\" supported by the layer (this is any attribute that starts with `get_*`).\n", "\n", "For demonstration purposes, let's also set `get_radius` to `normalized_download_speed`." ] }, { "cell_type": "code", "execution_count": 17, "id": "579233ef-e077-4c8f-a111-f33d44f30a0d", "metadata": {}, "outputs": [], "source": [ "# for now, cast to a numpy array until the layer is updated to support pandas series\n", "map_.get_radius = np.array(normalized_download_speed) * 200\n", "map_.radius_units = \"meters\"\n", "map_.radius_min_pixels = 0.5" ] }, { "cell_type": "markdown", "id": "06e2338e-8461-49f7-9884-475fffc64789", "metadata": {}, "source": [ "After running the above cell, you should see the map updated to have a different radius per point!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }