{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "A6iFUUQLNDlE" }, "source": [ "\n", " \n", " \n", " \n", "
\n", " Run in Google Colab\n", " \n", " View on GitHub\n", " \n", " View raw on GitHub\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "-DARAON8qoJR" }, "source": [ "# Module 12: Maps\n", "\n", "Let's draw some maps. 🗺🧐" ] }, { "cell_type": "markdown", "metadata": { "id": "d9E9W5lXqoJa" }, "source": [ "## A dotmap with Altair\n", "\n", "Let's start with Altair.\n", "\n", "When your dataset is large, it helps to enable the \"json\" data transformer in Altair. Instead of embedding the whole dataset in the chart specification, it saves the data to a temporary JSON file and has the chart refer to that file. This keeps the notebook small and makes the whole plotting process much more efficient. For more information, check out: https://altair-viz.github.io/user_guide/data_transformers.html" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 1171, "status": "ok", "timestamp": 1690232847801, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "2ycXU2IbqoJc", "outputId": "3f87e857-d4d1-43b4-cb40-a4099c2c2714" }, "outputs": [ { "data": { "text/plain": [ "DataTransformerRegistry.enable('json')" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import altair as alt\n", "\n", "# saving data into a file rather than embedding into the chart\n", "alt.data_transformers.enable('json')" ] }, { "cell_type": "markdown", "metadata": { "id": "Vz-TNR-IqoJg" }, "source": [ "We need a dataset with geographical coordinates. This `zipcodes` dataset contains the zip code and the location (latitude and longitude) of each zip code area." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 2397, "status": "ok", "timestamp": 1690232854409, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "qYOeI0eLqoJh", "outputId": "5547c589-e6fd-44f1-95d7-9cc955e209c2" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
zip_codelatitudelongitudecitystatecounty
00050140.922326-72.637078HoltsvilleNYSuffolk
10054440.922326-72.637078HoltsvilleNYSuffolk
20060118.165273-66.722583AdjuntasPRAdjuntas
30060218.393103-67.180953AguadaPRAguada
40060318.455913-67.145780AguadillaPRAguadilla
\n", "
" ], "text/plain": [ " zip_code latitude longitude city state county\n", "0 00501 40.922326 -72.637078 Holtsville NY Suffolk\n", "1 00544 40.922326 -72.637078 Holtsville NY Suffolk\n", "2 00601 18.165273 -66.722583 Adjuntas PR Adjuntas\n", "3 00602 18.393103 -67.180953 Aguada PR Aguada\n", "4 00603 18.455913 -67.145780 Aguadilla PR Aguadilla" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from vega_datasets import data\n", "\n", "zipcodes_url = data.zipcodes.url\n", "zipcodes = data.zipcodes()\n", "zipcodes.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 1517, "status": "ok", "timestamp": 1690232857459, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "EyzYgquoqoJj", "outputId": "388048a3-1be1-40dc-9f37-8d17cd398c12" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
zip_codelatitudelongitudecitystatecounty
00050140.922326-72.637078HoltsvilleNYSuffolk
10054440.922326-72.637078HoltsvilleNYSuffolk
20060118.165273-66.722583AdjuntasPRAdjuntas
30060218.393103-67.180953AguadaPRAguada
40060318.455913-67.145780AguadillaPRAguadilla
\n", "
" ], "text/plain": [ " zip_code latitude longitude city state county\n", "0 00501 40.922326 -72.637078 Holtsville NY Suffolk\n", "1 00544 40.922326 -72.637078 Holtsville NY Suffolk\n", "2 00601 18.165273 -66.722583 Adjuntas PR Adjuntas\n", "3 00602 18.393103 -67.180953 Aguada PR Aguada\n", "4 00603 18.455913 -67.145780 Aguadilla PR Aguadilla" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zipcodes = data.zipcodes(dtype={'zip_code': 'category'})\n", "zipcodes.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3, "status": "ok", "timestamp": 1690232857814, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "VbSCHJVbqoJk", "outputId": "7f2a736f-11ad-46d9-b016-d76c000d01be" }, "outputs": [ { "data": { "text/plain": [ "CategoricalDtype(categories=['00501', '00544', '00601', '00602', '00603', '00604',\n", " '00605', '00606', '00610', '00611',\n", " ...\n", " '99919', '99921', '99922', '99923', '99925', '99926',\n", " '99927', '99928', '99929', '99950'],\n", ", ordered=False, categories_dtype=object)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zipcodes.zip_code.dtype" ] }, { "cell_type": "markdown", "metadata": { "id": "JS8nSE1YqoJn" }, "source": [ "Btw, you'll have fewer issues if you pass a URL instead of a dataframe to `alt.Chart`." ] }, { "cell_type": "markdown", "metadata": { "id": "TPj8GsfMqoJo" }, "source": [ "### Let's draw it\n", "\n", "Now that we have the dataset loaded, let's start drawing some plots. Let's say you don't know anything about map projections. What would you try with geographical data? Probably the simplest way is to treat (longitude, latitude) as Cartesian coordinates and plot them directly." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 368 }, "executionInfo": { "elapsed": 306, "status": "ok", "timestamp": 1690232859294, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "ldFAB0p_qoJq", "outputId": "30fed316-e3f1-4002-90e1-26974811ccc3" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(zipcodes_url).mark_circle().encode(\n", " x='longitude:Q',\n", " y='latitude:Q',\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "V4UTA0L3qoJr" }, "source": [ "Actually, this is itself a map projection, called the [equirectangular projection](https://en.wikipedia.org/wiki/Equirectangular_projection). This projection (or almost a *non-projection*) is super straightforward and doesn't require any processing of the data. So it is often used to quickly explore geographical data. As you dig deeper, you still want to think about which map projection fits your needs best. Don't just use the equirectangular projection without any thought!\n", "\n", "Anyway, let's make it look slightly better by reducing the size of the circles and adjusting the aspect ratio. **Q: Can you adjust the width and height?**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 268 }, "executionInfo": { "elapsed": 307, "status": "ok", "timestamp": 1690232868969, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "CEMhFuGIqoJt", "outputId": "4eebb7f5-d423-47fb-a661-df0a300ec9f9" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "kd9NuhLlqoJu" }, "source": [ "But a much better way is to explicitly specify that they are longitude and latitude coordinates by using `longitude=` and `latitude=` rather than `x=` and `y=`. If you do that, Altair automatically adjusts the aspect ratio. **Q: Can you try it?**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 331 }, "executionInfo": { "elapsed": 308, "status": "ok", "timestamp": 1690232870219, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "qqXF2OQrqoJv", "outputId": "bd2f7d3d-fff2-4162-a037-848c464e40e6" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "QLhCkER6qoJx" }, "source": [ "Because the [American empire is far-reaching and complicated](https://www.youtube.com/watch?v=ASSOQDQvVLU), the information density of this map is very low (although interesting). Moreover, the US looks distorted because the default projection is not centered on the US.\n", "\n", "A common projection for visualizing US data is [AlbersUSA](https://bl.ocks.org/mbostock/5545680), which uses the [Albers (equal-area) projection](https://en.wikipedia.org/wiki/Albers_projection). This is a standard projection used by the United States Geological Survey and the United States Census Bureau. AlbersUSA is a composite projection that combines the US mainland, Alaska, and Hawaii.\n", "\n", "To use it, we call the `project` method and specify which variables are `longitude` and `latitude`.\n", "\n", "**Q: Use the `project` method to draw the map with the `albersUsa` projection.**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 433 }, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1690232871853, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "XqIx6087qoJx", "outputId": "371fdda2-849f-4f52-a365-c99c7a7fedbb" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "HgFp8NocqoJy" }, "source": [ "Now we're talking. 😎\n", "\n", "Let's visualize the large-scale zipcode patterns. We can use the fact that zipcodes are hierarchically organized. That is, the first digit captures the largest area divisions and the remaining digits identify progressively smaller geographical divisions.\n", "\n", "Altair provides several data transformations. One of them extracts a substring from a variable." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 433 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1690232874556, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "wjHx0a39qoJz", "outputId": "3c2c2fa8-1c46-4f1a-d70c-013766e59f2b" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from altair.expr import datum, substring\n", "\n", "alt.Chart(zipcodes_url).mark_circle(size=2).transform_calculate(\n", " 'first_digit', substring(datum.zip_code, 0, 1)\n", ").encode(\n", " longitude='longitude:Q',\n", " latitude='latitude:Q',\n", " color='first_digit:N',\n", ").project(\n", " type='albersUsa'\n", ").properties(\n", " width=700,\n", " height=400,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "QeHehou1qoJ0" }, "source": [ "For each row (`datum`), you take the `zip_code` variable, extract a substring (think of Python list slicing), and name the result `first_digit`. Now you can use this `first_digit` variable to color the circles. Also note that we declare `first_digit` as a *nominal* variable, not a quantitative one, to obtain a categorical colormap. But we can play with that, too.\n", "\n", "**Q: Why don't you extract the first two digits, name the result `two_digits`, and declare it as a quantitative variable? Any interesting patterns? 
What does it tell us about the history of the US?**" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 431 }, "executionInfo": { "elapsed": 368, "status": "ok", "timestamp": 1690232878782, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "X-FGT_ghqoJ1", "outputId": "46e6e02c-5cfc-448f-efe7-a5b887670cb9" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "ju0PurTgqoJ2" }, "source": [ "**Q: Also try declaring the first two digits as a categorical (nominal) variable.**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "executionInfo": { "elapsed": 343, "status": "ok", "timestamp": 1690232880830, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "IJD4cLCHrjhp", "outputId": "9fdc4eed-4016-44fe-c947-fcf3a35590ff" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
zip_codelatitudelongitudecitystatecounty
00050140.922326-72.637078HoltsvilleNYSuffolk
10054440.922326-72.637078HoltsvilleNYSuffolk
20060118.165273-66.722583AdjuntasPRAdjuntas
30060218.393103-67.180953AguadaPRAguada
40060318.455913-67.145780AguadillaPRAguadilla
.....................
420449992655.094325-131.566827MetlakatlaAKPrince Wales Ketchikan
420459992755.517921-132.003244Point BakerAKPrince Wales Ketchikan
420469992855.395359-131.675370Ward CoveAKKetchikan Gateway
420479992956.449893-132.364407WrangellAKWrangell Petersburg
420489995055.542007-131.432682KetchikanAKKetchikan Gateway
\n", "

42049 rows × 6 columns

\n", "
" ], "text/plain": [ " zip_code latitude longitude city state \\\n", "0 00501 40.922326 -72.637078 Holtsville NY \n", "1 00544 40.922326 -72.637078 Holtsville NY \n", "2 00601 18.165273 -66.722583 Adjuntas PR \n", "3 00602 18.393103 -67.180953 Aguada PR \n", "4 00603 18.455913 -67.145780 Aguadilla PR \n", "... ... ... ... ... ... \n", "42044 99926 55.094325 -131.566827 Metlakatla AK \n", "42045 99927 55.517921 -132.003244 Point Baker AK \n", "42046 99928 55.395359 -131.675370 Ward Cove AK \n", "42047 99929 56.449893 -132.364407 Wrangell AK \n", "42048 99950 55.542007 -131.432682 Ketchikan AK \n", "\n", " county \n", "0 Suffolk \n", "1 Suffolk \n", "2 Adjuntas \n", "3 Aguada \n", "4 Aguadilla \n", "... ... \n", "42044 Prince Wales Ketchikan \n", "42045 Prince Wales Ketchikan \n", "42046 Ketchikan Gateway \n", "42047 Wrangell Petersburg \n", "42048 Ketchikan Gateway \n", "\n", "[42049 rows x 6 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zipcodes" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 431 }, "executionInfo": { "elapsed": 677, "status": "ok", "timestamp": 1690232881911, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "1tiGYQJNqoJ2", "outputId": "c45b34f1-de72-40f0-e802-2d04991a89bf" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Implement\n", "\n", "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "SU4a0WzJqoJ3" }, "source": [ "Btw, you can always click \"view source\" or \"open in Vega Editor\" to look at the JSON object that **defines** this visualization. You can embed this JSON object in your webpage and easily put up an interactive visualization.\n", "\n", "**Q: Can you add a tooltip that displays the zipcode on mouse-over?**" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 433 }, "executionInfo": { "elapsed": 9, "status": "ok", "timestamp": 1690232882918, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "lOv0DjHrqoJ4", "outputId": "4876bab4-c471-4b25-e21f-b066b5b590c5" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1690232883207, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "mre6TyO5t_rZ", "outputId": "559d4e52-c99c-40e4-fce1-83de8464e8d6" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
zip_codelatitudelongitudecitystatecounty
00050140.922326-72.637078HoltsvilleNYSuffolk
10054440.922326-72.637078HoltsvilleNYSuffolk
20060118.165273-66.722583AdjuntasPRAdjuntas
30060218.393103-67.180953AguadaPRAguada
40060318.455913-67.145780AguadillaPRAguadilla
.....................
420449992655.094325-131.566827MetlakatlaAKPrince Wales Ketchikan
420459992755.517921-132.003244Point BakerAKPrince Wales Ketchikan
420469992855.395359-131.675370Ward CoveAKKetchikan Gateway
420479992956.449893-132.364407WrangellAKWrangell Petersburg
420489995055.542007-131.432682KetchikanAKKetchikan Gateway
\n", "

42049 rows × 6 columns

\n", "
" ], "text/plain": [ " zip_code latitude longitude city state \\\n", "0 00501 40.922326 -72.637078 Holtsville NY \n", "1 00544 40.922326 -72.637078 Holtsville NY \n", "2 00601 18.165273 -66.722583 Adjuntas PR \n", "3 00602 18.393103 -67.180953 Aguada PR \n", "4 00603 18.455913 -67.145780 Aguadilla PR \n", "... ... ... ... ... ... \n", "42044 99926 55.094325 -131.566827 Metlakatla AK \n", "42045 99927 55.517921 -132.003244 Point Baker AK \n", "42046 99928 55.395359 -131.675370 Ward Cove AK \n", "42047 99929 56.449893 -132.364407 Wrangell AK \n", "42048 99950 55.542007 -131.432682 Ketchikan AK \n", "\n", " county \n", "0 Suffolk \n", "1 Suffolk \n", "2 Adjuntas \n", "3 Aguada \n", "4 Aguadilla \n", "... ... \n", "42044 Prince Wales Ketchikan \n", "42045 Prince Wales Ketchikan \n", "42046 Ketchikan Gateway \n", "42047 Wrangell Petersburg \n", "42048 Ketchikan Gateway \n", "\n", "[42049 rows x 6 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zipcodes" ] }, { "cell_type": "markdown", "metadata": { "id": "Ilc45qeDqoJ5" }, "source": [ "## Choropleth\n", "\n", "Let's try a choropleth now. Vega datasets include US county and state boundary data (`us_10m`) and world country boundary data (`world_110m`). You can take a look at the boundaries on GitHub (GitHub renders TopoJSON files):\n", "\n", "- https://github.com/vega/vega-datasets/blob/main/data/us-10m.json\n", "- https://github.com/vega/vega-datasets/blob/main/data/world-110m.json\n", "\n", "If you click \"Raw\", you can see the actual file, which is hard to read.\n", "\n", "Essentially, each file is a large dictionary with the following keys." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 370, "status": "ok", "timestamp": 1690232886065, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "4omuucvZqoJ5", "outputId": "54d45d9b-7eb7-44bf-8a6b-fbc28c840771" }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['type', 'transform', 'objects', 'arcs'])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usmap = data.us_10m()\n", "usmap.keys()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 317, "status": "ok", "timestamp": 1690232886368, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "Ns1AOlGLqoJ6", "outputId": "e142821d-b9f7-4b0a-8c82-ddffb9d1e399" }, "outputs": [ { "data": { "text/plain": [ "'Topology'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usmap['type']" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 8, "status": "ok", "timestamp": 1690232887374, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "iiLUWi-eqoJ6", "outputId": "655d2324-e2d2-470a-f99b-83f13d291f68" }, "outputs": [ { "data": { "text/plain": [ "{'scale': [0.003589294092944858, 0.0005371535195261037],\n", " 'translate': [-179.1473400003406, 17.67439566600018]}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usmap['transform']" ] }, { "cell_type": "markdown", "metadata": { "id": "uU0dewNaqoJ7" }, "source": [ "This `transform` is used to *quantize* the data so that the coordinates can be stored as integers (which are more compact to store than floating-point numbers). The original coordinates are recovered by multiplying by `scale` and adding `translate`.\n", 
"\n", "https://github.com/topojson/topojson-specification#212-transforms" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 4, "status": "ok", "timestamp": 1690232887710, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "JkbKMqAfqoJ7", "outputId": "b93217f4-29f9-4291-b669-6b3f65e33500" }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['counties', 'states', 'land'])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usmap['objects'].keys()" ] }, { "cell_type": "markdown", "metadata": { "id": "xpDlwdPuqoJ8" }, "source": [ "This data contains not only county-level boundaries (objects) but also state and land boundaries." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 394, "status": "ok", "timestamp": 1690232937202, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "yUGjDhY5qoJ_", "outputId": "eaeb9c68-c1ed-4e9e-e1a3-92229c93f864" }, "outputs": [ { "data": { "text/plain": [ "('MultiPolygon', 'GeometryCollection', 'GeometryCollection')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usmap['objects']['land']['type'], usmap['objects']['states']['type'], usmap['objects']['counties']['type']" ] }, { "cell_type": "markdown", "metadata": { "id": "JYlcwe0iqoJ_" }, "source": [ "`land` is a single multipolygon (one object), while `states` and `counties` are collections that contain many geometries (such as multipolygons), one for each state or county. We can look at a state as a set of arcs that define it. Its `id` identifies the state and is the key for linking to other datasets." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3, "status": "ok", "timestamp": 1690232937472, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "osMJi24KqoKA", "outputId": "13195f5e-cfee-4291-d6aa-81fcad02ec26" }, "outputs": [ { "data": { "text/plain": [ "{'type': 'MultiPolygon',\n", " 'arcs': [[[10337]],\n", " [[10342]],\n", " [[10341]],\n", " [[10343]],\n", " [[10834, 10340]],\n", " [[10344]],\n", " [[10345]],\n", " [[10338]]],\n", " 'id': 15}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state1 = usmap['objects']['states']['geometries'][1]\n", "state1" ] }, { "cell_type": "markdown", "metadata": { "id": "qktmc0T8qoKA" }, "source": [ "The `arcs` referred to here are defined in `usmap['arcs']`." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3, "status": "ok", "timestamp": 1690232938354, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "8S4lgQtTqoKA", "outputId": "c6c42b0e-bc3d-4f7c-c724-3d362b13cfb5" }, "outputs": [ { "data": { "text/plain": [ "[[[15739, 57220], [0, 0]],\n", " [[15739, 57220], [29, 62], [47, -273]],\n", " [[15815, 57009], [-6, -86]],\n", " [[15809, 56923], [0, 0]],\n", " [[15809, 56923], [-36, -8], [6, -210], [32, 178]],\n", " [[15811, 56883], [9, -194], [44, -176], [-29, -151], [-24, -319]],\n", " [[15811, 56043], [-12, -216], [26, -171]],\n", " [[15825, 55656], [-2, 1]],\n", " [[15823, 55657], [-19, 10], [26, -424], [-26, -52]],\n", " [[15804, 55191], [-30, -72], [-47, -344]]]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usmap['arcs'][:10]" ] }, { "cell_type": "markdown", "metadata": { "id": "k5r3XGSTqoKB" }, "source": [ "It seems pretty daunting to work 
with this dataset, right? But fortunately people have already built tools to handle such data." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "executionInfo": { "elapsed": 3, "status": "ok", "timestamp": 1690232939346, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "Xq6w_c7fqoKC" }, "outputs": [], "source": [ "states = alt.topo_feature(data.us_10m.url, 'states')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3, "status": "ok", "timestamp": 1690232939761, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "94sDpllZqoKC", "outputId": "55bb5c0e-d5d9-4e00-e4a9-10515f07fbd2" }, "outputs": [ { "data": { "text/plain": [ "UrlData({\n", " format: TopoDataFormat({\n", " feature: 'states',\n", " type: 'topojson'\n", " }),\n", " url: 'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/us-10m.json'\n", "})" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states" ] }, { "cell_type": "markdown", "metadata": { "id": "y_FDWolcqoKD" }, "source": [ "Can you find a mark for geographical shapes from here https://altair-viz.github.io/user_guide/marks/index.html# and draw the states?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 331 }, "executionInfo": { "elapsed": 327, "status": "ok", "timestamp": 1690232957050, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "oOLvExieqoKD", "outputId": "62cf18ce-f128-4f52-e60f-903ea8c930ef" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "LAVrvLESqoKD" }, "source": [ "And then project it using the `albersUsa`?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 331 }, "executionInfo": { "elapsed": 17, "status": "ok", "timestamp": 1690232957991, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "256_fw7hqoKE", "outputId": "164a356e-1b0f-4823-c0de-6e9056252773" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "sD5c6TroqoKE" }, "source": [ "Can you do the same thing with counties and draw county boundaries?" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 331 }, "executionInfo": { "elapsed": 5, "status": "ok", "timestamp": 1690232958857, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "PucM5vM9qoKF", "outputId": "f4a74e7f-1cbd-4e30-c6c7-ff3a3f700700" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "JhiS_GwWqoKF" }, "source": [ "Let's load some county-level unemployment data." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 13, "status": "ok", "timestamp": 1690232959933, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "dsLXey0VqoKG", "outputId": "641cee21-733e-4d0c-80d6-16d66e31f345" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id rate
0 1001 0.097
1 1003 0.091
2 1005 0.134
3 1007 0.121
4 1009 0.099
\n", "
" ], "text/plain": [ " id rate\n", "0 1001 0.097\n", "1 1003 0.091\n", "2 1005 0.134\n", "3 1007 0.121\n", "4 1009 0.099" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unemp_data = data.unemployment(sep='\\t')\n", "unemp_data.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "jCQX5fJEqoKG" }, "source": [ "This dataset contains county-level unemployment rates. From when? I don't know. We don't care about data provenance here because the goal is to quickly try out a choropleth. But if you're working with a real dataset, you should pay close attention to its provenance. Make sure you understand where the data came from and how it was processed.\n", "\n", "Anyway, each county is identified by its `id`. To combine the two datasets, we use the \"Lookup transform\" - https://vega.github.io/vega/docs/transforms/lookup/. Essentially, we use the `id` in the map data to look up the matching `id` field in `unemp_data` and then bring in the `rate` variable. Then, we can use that `rate` variable to encode the color of the `geoshape` mark." 
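, "\n", "\n", "To make the lookup concrete, here is a minimal pure-Python sketch of what the transform does conceptually. It is not how Vega implements it: the `lookup` helper and the county id 9999 are hypothetical, and only the two rates are taken from the table above.\n", "\n", "
```python
# Hypothetical stand-ins for the two datasets: map features and a rate table.
features = [{'id': 1001}, {'id': 1003}, {'id': 9999}]
rates = [{'id': 1001, 'rate': 0.097}, {'id': 1003, 'rate': 0.091}]


def lookup(primary, secondary, key, fields):
    # Index the secondary table by key, then copy the requested fields
    # onto each primary row. Rows without a match get None.
    index = {row[key]: row for row in secondary}
    joined = []
    for row in primary:
        match = index.get(row[key], {})
        joined.append({**row, **{f: match.get(f) for f in fields}})
    return joined


joined = lookup(features, rates, 'id', ['rate'])
for row in joined:
    print(row)
```
\n", "\n", "Counties without a match get `None` in this sketch, so a real choropleth may show gaps wherever the two datasets disagree on ids."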
] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 431, "resources": { "http://localhost:8080/altair-data-141893a9f0c2b2c58be329ef58d66780.json": { "data": "CjwhRE9DVFlQRSBodG1sPgo8aHRtbCBsYW5nPWVuPgogIDxtZXRhIGNoYXJzZXQ9dXRmLTg+CiAgPG1ldGEgbmFtZT12aWV3cG9ydCBjb250ZW50PSJpbml0aWFsLXNjYWxlPTEsIG1pbmltdW0tc2NhbGU9MSwgd2lkdGg9ZGV2aWNlLXdpZHRoIj4KICA8dGl0bGU+RXJyb3IgNDA0IChOb3QgRm91bmQpISExPC90aXRsZT4KICA8c3R5bGU+CiAgICAqe21hcmdpbjowO3BhZGRpbmc6MH1odG1sLGNvZGV7Zm9udDoxNXB4LzIycHggYXJpYWwsc2Fucy1zZXJpZn1odG1se2JhY2tncm91bmQ6I2ZmZjtjb2xvcjojMjIyO3BhZGRpbmc6MTVweH1ib2R5e21hcmdpbjo3JSBhdXRvIDA7bWF4LXdpZHRoOjM5MHB4O21pbi1oZWlnaHQ6MTgwcHg7cGFkZGluZzozMHB4IDAgMTVweH0qID4gYm9keXtiYWNrZ3JvdW5kOnVybCgvL3d3dy5nb29nbGUuY29tL2ltYWdlcy9lcnJvcnMvcm9ib3QucG5nKSAxMDAlIDVweCBuby1yZXBlYXQ7cGFkZGluZy1yaWdodDoyMDVweH1we21hcmdpbjoxMXB4IDAgMjJweDtvdmVyZmxvdzpoaWRkZW59aW5ze2NvbG9yOiM3Nzc7dGV4dC1kZWNvcmF0aW9uOm5vbmV9YSBpbWd7Ym9yZGVyOjB9QG1lZGlhIHNjcmVlbiBhbmQgKG1heC13aWR0aDo3NzJweCl7Ym9keXtiYWNrZ3JvdW5kOm5vbmU7bWFyZ2luLXRvcDowO21heC13aWR0aDpub25lO3BhZGRpbmctcmlnaHQ6MH19I2xvZ297YmFja2dyb3VuZDp1cmwoLy93d3cuZ29vZ2xlLmNvbS9pbWFnZXMvbG9nb3MvZXJyb3JwYWdlL2Vycm9yX2xvZ28tMTUweDU0LnBuZykgbm8tcmVwZWF0O21hcmdpbi1sZWZ0Oi01cHh9QG1lZGlhIG9ubHkgc2NyZWVuIGFuZCAobWluLXJlc29sdXRpb246MTkyZHBpKXsjbG9nb3tiYWNrZ3JvdW5kOnVybCgvL3d3dy5nb29nbGUuY29tL2ltYWdlcy9sb2dvcy9lcnJvcnBhZ2UvZXJyb3JfbG9nby0xNTB4NTQtMngucG5nKSBuby1yZXBlYXQgMCUgMCUvMTAwJSAxMDAlOy1tb3otYm9yZGVyLWltYWdlOnVybCgvL3d3dy5nb29nbGUuY29tL2ltYWdlcy9sb2dvcy9lcnJvcnBhZ2UvZXJyb3JfbG9nby0xNTB4NTQtMngucG5nKSAwfX1AbWVkaWEgb25seSBzY3JlZW4gYW5kICgtd2Via2l0LW1pbi1kZXZpY2UtcGl4ZWwtcmF0aW86Mil7I2xvZ297YmFja2dyb3VuZDp1cmwoLy93d3cuZ29vZ2xlLmNvbS9pbWFnZXMvbG9nb3MvZXJyb3JwYWdlL2Vycm9yX2xvZ28tMTUweDU0LTJ4LnBuZykgbm8tcmVwZWF0Oy13ZWJraXQtYmFja2dyb3VuZC1zaXplOjEwMCUgMTAwJX19I2xvZ297ZGlzcGxheTppbmxpbmUtYmxvY2s7aGVpZ2h0OjU0cHg7d2lkdGg6MTUwcHh9CiAgPC9zdHlsZT4KICA8YSBocmVmPS8vd3d3Lmdvb2dsZS5jb20vPjxzcGFuIGlkP
WxvZ28gYXJpYS1sYWJlbD1Hb29nbGU+PC9zcGFuPjwvYT4KICA8cD48Yj40MDQuPC9iPiA8aW5zPlRoYXTigJlzIGFuIGVycm9yLjwvaW5zPgogIDxwPiAgPGlucz5UaGF04oCZcyBhbGwgd2Uga25vdy48L2lucz4K", "headers": [ [ "content-length", "1449" ], [ "content-type", "text/html; charset=utf-8" ] ], "ok": false, "status": 404, "status_text": "" } } }, "executionInfo": { "elapsed": 12, "status": "ok", "timestamp": 1690232960674, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "UknNEGG_qoKH", "outputId": "dcce3265-2bf7-46d3-97a4-17264b815d00" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(us_counties).mark_geoshape().project(\n", " type='albersUsa'\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(unemp_data, 'id', ['rate'])\n", ").encode(\n", " color='rate:Q'\n", ").properties(\n", " width=700,\n", " height=400\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "mMHYUbQsqoKH" }, "source": [ "There you have it, a nice choropleth map. 😎\n" ] }, { "cell_type": "markdown", "metadata": { "id": "gdInAvwdqoKI" }, "source": [ "## Raster visualization with datashader\n", "\n", "Although many geovisualizations use vector graphics, raster visualization is still useful, especially when you deal with images or lots of data points. Datashader is a package that aggregates and visualizes a large amount of data very quickly. Given a *scene* (visualization boundary, resolution, etc.), it quickly aggregates the data, produces **pixels**, and sends them to you.\n", "\n", "To appreciate its power, we need a fairly large dataset. Let's use the NYC taxi trip dataset on Kaggle: https://www.kaggle.com/kentonnlp/2014-new-york-city-taxi-trips You can download even bigger trip data from the NYC open data website: https://opendata.cityofnewyork.us/data/\n", "\n", "Ah, and you'll want to install datashader, bokeh, and holoviews first if you don't have them yet. 
\n", "\n", " pip install datashader bokeh holoviews jupyter-bokeh\n", "\n", "or\n", "\n", " conda install datashader bokeh holoviews jupyter-bokeh\n", " " ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "executionInfo": { "elapsed": 3016, "status": "ok", "timestamp": 1690233051824, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "Coy9hAAfqoKI" }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas as pd\n", "import datashader as ds\n", "from datashader import transfer_functions as tf\n", "from colorcet import fire" ] }, { "cell_type": "markdown", "metadata": { "id": "camMhr7pqoKI" }, "source": [ "Because the dataset is pretty big, let's use a small sample first. For this visualization, we only keep the dropoff location." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 345 }, "executionInfo": { "elapsed": 319, "status": "error", "timestamp": 1690233059618, "user": { "displayName": "Vincent Wong", "userId": "06927694896148305320" }, "user_tz": 240 }, "id": "0DxKvpjoqoKJ", "outputId": "b9b581ae-328f-42de-8bdf-aeba33b1ff65" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dropoff_longitude dropoff_latitude
0 -73.982227 40.731790
1 -73.960449 40.763995
2 -73.986626 40.765217
3 -73.979863 40.777050
4 -73.984367 40.720524
\n", "
" ], "text/plain": [ " dropoff_longitude dropoff_latitude\n", "0 -73.982227 40.731790\n", "1 -73.960449 40.763995\n", "2 -73.986626 40.765217\n", "3 -73.979863 40.777050\n", "4 -73.984367 40.720524" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nyctaxi_small = pd.read_csv('~/Downloads/archive/nyc_taxi_data_2014.csv', nrows=10000,\n", " usecols=['dropoff_longitude', 'dropoff_latitude'])\n", "nyctaxi_small.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "bXyxUFG-qoKJ" }, "source": [ "Although the dataset is different, we can still follow the example here: https://datashader.org/getting_started/Introduction.html" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "L6ZCUMFAqoKJ", "outputId": "94c8f839-4197-4413-b19a-a867b964eb8f" }, "outputs": [ { "data": { "image/png": "", "text/html": [ "" ], "text/plain": [ "\n", "array([[4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278230783],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " ...,\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080]], dtype=uint32)\n", "Coordinates:\n", " * dropoff_longitude (dropoff_longitude) float64 -74.31 -74.18 ... -0.06197\n", " * dropoff_latitude (dropoff_latitude) float64 0.03423 0.1027 ... 
40.97 41.04" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agg = ds.Canvas().points(nyctaxi_small, 'dropoff_longitude', 'dropoff_latitude')\n", "tf.set_background(tf.shade(agg, cmap=fire), \"black\")" ] }, { "cell_type": "markdown", "metadata": { "id": "OKsxrWfhqoKK" }, "source": [ "Why can't we see anything? Wait, do you see the small dots at the top left? Could that be New York City? Maybe we don't see anything because some people travel very far? Or because the dataset has some missing data?\n", "\n", "**Q: Can you first check whether there are NaNs? Then drop them and draw the map again?**" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "id": "IczXK0shqoKK", "outputId": "11dc377d-ba70-4976-a9e3-daa71f7a8824" }, "outputs": [ { "data": { "text/plain": [ "dropoff_longitude 1\n", "dropoff_latitude 1\n", "dtype: int64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "id": "ac27UgIfqoKL", "outputId": "723f1494-65d5-4a7a-fdc7-1f03142643a2" }, "outputs": [ { "data": { "image/png": "", "text/html": [ "" ], "text/plain": [ "\n", "array([[4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278230783],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " ...,\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080]], dtype=uint32)\n", "Coordinates:\n", " * dropoff_longitude (dropoff_longitude) float64 -74.31 -74.18 ... -0.06197\n", " * dropoff_latitude (dropoff_latitude) float64 0.03423 0.1027 ... 
40.97 41.04" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# drop the rows with NaN and then draw the map again.\n", "\n", "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "t-76U_7vqoKM" }, "source": [ "So it's not about the missing data.\n", "\n", "**Q: Can you identify the issue and draw the map like the following?**\n", "\n", "hint: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.between.html this method may be helpful." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "id": "kZFc77QhqoKM", "outputId": "2003389f-aaa0-4b54-ca8e-67a82ee1d642" }, "outputs": [], "source": [ "# You can use multiple cells to figure out what's going on.\n", "\n", "# YOUR SOLUTION HERE" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "id": "WyBuABbSqoKN", "outputId": "a6b22583-78ae-4356-9ee0-44c1049031df" }, "outputs": [ { "data": { "image/png": "", "text/html": [ "" ], "text/plain": [ "\n", "array([[4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " ...,\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080]], dtype=uint32)\n", "Coordinates:\n", " * dropoff_longitude (dropoff_longitude) float64 -74.09 -74.09 ... -73.71\n", " * dropoff_latitude (dropoff_latitude) float64 40.58 40.58 ... 
40.98 40.98" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agg = ds.Canvas().points(nyctaxi_small_filtered, 'dropoff_longitude', 'dropoff_latitude')\n", "tf.set_background(tf.shade(agg, cmap=fire), \"black\")" ] }, { "cell_type": "markdown", "metadata": { "id": "nq6Px8I8qoKO" }, "source": [ "Do you see the empty black space at the center? That looks like Central Park. This is cool, but it would be awesome if we could explore the data interactively." ] }, { "cell_type": "markdown", "metadata": { "id": "FP4BKa0qqoKP" }, "source": [ "Ok, now let's get serious by loading the whole dataset. It may take some time. Apply the same data cleaning procedure." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "id": "-hKJ8Eo0qoKP" }, "outputs": [], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "DIp6mf1GqoKP" }, "source": [ "Can you feed the data directly to datashader to reproduce the static plot, this time with the full data?" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "id": "b3vV9m7RqoKQ", "outputId": "c7e39a6a-e484-4720-e8f9-4364a75b4a26" }, "outputs": [ { "data": { "image/png": "", "text/html": [ "" ], "text/plain": [ "\n", "array([[4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " ...,\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080],\n", " [4278190080, 4278190080, 4278190080, ..., 4278190080, 4278190080,\n", " 4278190080]], dtype=uint32)\n", "Coordinates:\n", " * dropoff_longitude (dropoff_longitude) float64 -74.1 -74.1 ... 
-73.7 -73.7\n", " * dropoff_latitude (dropoff_latitude) float64 40.5 40.5 40.5 ... 41.0 41.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "vk9lsI3RqoKQ" }, "source": [ "Wow, that's fast. Also it looks cool!\n", "\n", "Let's try the interactive version from here: https://datashader.org/getting_started/Introduction.html" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dropoff_longitude dropoff_latitude
0 -73.982227 40.731790
1 -73.960449 40.763995
2 -73.986626 40.765217
3 -73.979863 40.777050
4 -73.984367 40.720524
... ... ...
14999994 -74.000675 40.725737
14999995 -73.991287 40.692535
14999996 -73.776505 40.740790
14999997 -74.005953 40.710922
14999998 -73.972407 40.747463
\n", "

14751421 rows × 2 columns

\n", "
" ], "text/plain": [ " dropoff_longitude dropoff_latitude\n", "0 -73.982227 40.731790\n", "1 -73.960449 40.763995\n", "2 -73.986626 40.765217\n", "3 -73.979863 40.777050\n", "4 -73.984367 40.720524\n", "... ... ...\n", "14999994 -74.000675 40.725737\n", "14999995 -73.991287 40.692535\n", "14999996 -73.776505 40.740790\n", "14999997 -74.005953 40.710922\n", "14999998 -73.972407 40.747463\n", "\n", "[14751421 rows x 2 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nyctaxi_filtered" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We currently only have longitudes and latitudes. We need to convert them into a coordinate system that datashader understands. " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/y9/6g21ty616b783y993xqz28hr0000gq/T/ipykernel_99414/1009409252.py:4: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df['dropoff_x'], df['dropoff_y'] = lnglat_to_meters(df.dropoff_longitude, df.dropoff_latitude)\n", "/var/folders/y9/6g21ty616b783y993xqz28hr0000gq/T/ipykernel_99414/1009409252.py:4: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df['dropoff_x'], df['dropoff_y'] = lnglat_to_meters(df.dropoff_longitude, df.dropoff_latitude)\n" ] } ], "source": [ "from datashader.utils import lnglat_to_meters\n", "\n", "df = nyctaxi_filtered\n", "df['dropoff_x'], df['dropoff_y'] = 
lnglat_to_meters(df.dropoff_longitude, df.dropoff_latitude)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can visualize the data interactively. See https://datashader.org/getting_started/Introduction.html" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "id": "I-BDGl86qoKR", "outputId": "be2ff1c4-1abe-4d7e-cda4-07928816f93a" }, "outputs": [ { "data": { "application/javascript": [ "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", " var py_version = '3.3.0'.replace('rc', '-rc.').replace('.dev', '-dev.');\n", " var is_dev = py_version.indexOf(\"+\") !== -1 || py_version.indexOf(\"-\") !== -1;\n", " var reloading = false;\n", " var Bokeh = root.Bokeh;\n", " var bokeh_loaded = Bokeh != null && (Bokeh.version === py_version || (Bokeh.versions !== undefined && Bokeh.versions.has(py_version)));\n", "\n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks;\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, js_modules, js_exports, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", " if (js_modules == null) js_modules = [];\n", " if (js_exports == null) js_exports = {};\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", "\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls.length === 0 && js_modules.length === 0 && Object.keys(js_exports).length === 0) {\n", " run_callbacks();\n", " return 
null;\n", " }\n", " if (!reloading) {\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " }\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", " window._bokeh_on_load = on_load\n", "\n", " function on_error() {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " var skip = [];\n", " if (window.requirejs) {\n", " window.requirejs.config({'packages': {}, 'paths': {'jspanel': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/jspanel', 'jspanel-modal': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/extensions/modal/jspanel.modal', 'jspanel-tooltip': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/extensions/tooltip/jspanel.tooltip', 'jspanel-hint': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/extensions/hint/jspanel.hint', 'jspanel-layout': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/extensions/layout/jspanel.layout', 'jspanel-contextmenu': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/extensions/contextmenu/jspanel.contextmenu', 'jspanel-dock': 'https://cdn.jsdelivr.net/npm/jspanel4@4.12.0/dist/extensions/dock/jspanel.dock', 'gridstack': 'https://cdn.jsdelivr.net/npm/gridstack@7.2.3/dist/gridstack-all', 'notyf': 'https://cdn.jsdelivr.net/npm/notyf@3/notyf.min'}, 'shim': {'jspanel': {'exports': 'jsPanel'}, 'gridstack': {'exports': 'GridStack'}}});\n", " require([\"jspanel\"], function(jsPanel) {\n", "\twindow.jsPanel = jsPanel\n", "\ton_load()\n", " })\n", " require([\"jspanel-modal\"], function() {\n", "\ton_load()\n", " })\n", " require([\"jspanel-tooltip\"], function() {\n", "\ton_load()\n", " })\n", " require([\"jspanel-hint\"], function() {\n", "\ton_load()\n", " })\n", " require([\"jspanel-layout\"], function() {\n", "\ton_load()\n", " })\n", " require([\"jspanel-contextmenu\"], function() {\n", 
"\ton_load()\n", " })\n", " require([\"jspanel-dock\"], function() {\n", "\ton_load()\n", " })\n", " require([\"gridstack\"], function(GridStack) {\n", "\twindow.GridStack = GridStack\n", "\ton_load()\n", " })\n", " require([\"notyf\"], function() {\n", "\ton_load()\n", " })\n", " root._bokeh_is_loading = css_urls.length + 9;\n", " } else {\n", " root._bokeh_is_loading = css_urls.length + js_urls.length + js_modules.length + Object.keys(js_exports).length;\n", " }\n", "\n", " var existing_stylesheets = []\n", " var links = document.getElementsByTagName('link')\n", " for (var i = 0; i < links.length; i++) {\n", " var link = links[i]\n", " if (link.href != null) {\n", "\texisting_stylesheets.push(link.href)\n", " }\n", " }\n", " for (var i = 0; i < css_urls.length; i++) {\n", " var url = css_urls[i];\n", " if (existing_stylesheets.indexOf(url) !== -1) {\n", "\ton_load()\n", "\tcontinue;\n", " }\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " } if (((window['jsPanel'] !== undefined) && (!(window['jsPanel'] instanceof HTMLElement))) || window.requirejs) {\n", " var urls = ['https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/jspanel.js', 'https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/extensions/modal/jspanel.modal.js', 'https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/extensions/tooltip/jspanel.tooltip.js', 'https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/extensions/hint/jspanel.hint.js', 'https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/extensions/layout/jspanel.layout.js', 
'https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/extensions/contextmenu/jspanel.contextmenu.js', 'https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/floatpanel/jspanel4@4.12.0/dist/extensions/dock/jspanel.dock.js'];\n", " for (var i = 0; i < urls.length; i++) {\n", " skip.push(urls[i])\n", " }\n", " } if (((window['GridStack'] !== undefined) && (!(window['GridStack'] instanceof HTMLElement))) || window.requirejs) {\n", " var urls = ['https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/gridstack/gridstack@7.2.3/dist/gridstack-all.js'];\n", " for (var i = 0; i < urls.length; i++) {\n", " skip.push(urls[i])\n", " }\n", " } if (((window['Notyf'] !== undefined) && (!(window['Notyf'] instanceof HTMLElement))) || window.requirejs) {\n", " var urls = ['https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/bundled/notificationarea/notyf@3/notyf.min.js'];\n", " for (var i = 0; i < urls.length; i++) {\n", " skip.push(urls[i])\n", " }\n", " } var existing_scripts = []\n", " var scripts = document.getElementsByTagName('script')\n", " for (var i = 0; i < scripts.length; i++) {\n", " var script = scripts[i]\n", " if (script.src != null) {\n", "\texisting_scripts.push(script.src)\n", " }\n", " }\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " if (skip.indexOf(url) !== -1 || existing_scripts.indexOf(url) !== -1) {\n", "\tif (!window.requirejs) {\n", "\t on_load();\n", "\t}\n", "\tcontinue;\n", " }\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " for (var i = 0; i < js_modules.length; i++) {\n", " var url = js_modules[i];\n", " if (skip.indexOf(url) !== -1 || existing_scripts.indexOf(url) !== -1) {\n", "\tif (!window.requirejs) {\n", "\t on_load();\n", 
"\t}\n", "\tcontinue;\n", " }\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " element.type = \"module\";\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " for (const name in js_exports) {\n", " var url = js_exports[name];\n", " if (skip.indexOf(url) >= 0 || root[name] != null) {\n", "\tif (!window.requirejs) {\n", "\t on_load();\n", "\t}\n", "\tcontinue;\n", " }\n", " var element = document.createElement('script');\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.type = \"module\";\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " element.textContent = `\n", " import ${name} from \"${url}\"\n", " window.${name} = ${name}\n", " window._bokeh_on_load()\n", " `\n", " document.head.appendChild(element);\n", " }\n", " if (!js_urls.length && !js_modules.length) {\n", " on_load()\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-3.3.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-3.3.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-3.3.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-3.3.0.min.js\", \"https://cdn.holoviz.org/panel/1.3.0-rc.3/dist/panel.min.js\"];\n", " var js_modules = [];\n", " var js_exports = {};\n", " var css_urls = [];\n", " var inline_js = [ function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", "function(Bokeh) {} // ensure no trailing comma for IE\n", " ];\n", "\n", " function run_inline_js() {\n", " if ((root.Bokeh !== undefined) || (force === true)) {\n", " for (var i = 0; i 
< inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " // Cache old bokeh versions\n", " if (Bokeh != undefined && !reloading) {\n", "\tvar NewBokeh = root.Bokeh;\n", "\tif (Bokeh.versions === undefined) {\n", "\t Bokeh.versions = new Map();\n", "\t}\n", "\tif (NewBokeh.version !== Bokeh.version) {\n", "\t Bokeh.versions.set(NewBokeh.version, NewBokeh)\n", "\t}\n", "\troot.Bokeh = Bokeh;\n", " }} else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " }\n", " root._bokeh_is_initializing = false\n", " }\n", "\n", " function load_or_wait() {\n", " // Implement a backoff loop that tries to ensure we do not load multiple\n", " // versions of Bokeh and its dependencies at the same time.\n", " // In recent versions we use the root._bokeh_is_initializing flag\n", " // to determine whether there is an ongoing attempt to initialize\n", " // bokeh, however for backward compatibility we also try to ensure\n", " // that we do not start loading a newer (Panel>=1.0 and Bokeh>3) version\n", " // before older versions are fully initialized.\n", " if (root._bokeh_is_initializing && Date.now() > root._bokeh_timeout) {\n", " root._bokeh_is_initializing = false;\n", " root._bokeh_onload_callbacks = undefined;\n", " console.log(\"Bokeh: BokehJS was loaded multiple times but one version failed to initialize.\");\n", " load_or_wait();\n", " } else if (root._bokeh_is_initializing || (typeof root._bokeh_is_initializing === \"undefined\" && root._bokeh_onload_callbacks !== undefined)) {\n", " setTimeout(load_or_wait, 100);\n", " } else {\n", " Bokeh = root.Bokeh;\n", " bokeh_loaded = Bokeh != null && (Bokeh.version === py_version || (Bokeh.versions !== undefined && Bokeh.versions.has(py_version)));\n", " root._bokeh_is_initializing = true\n", " 
root._bokeh_onload_callbacks = []\n", " if (!reloading && (!bokeh_loaded || is_dev)) {\n", "\troot.Bokeh = undefined;\n", " }\n", " load_libs(css_urls, js_urls, js_modules, js_exports, function() {\n", "\tconsole.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", "\trun_inline_js();\n", " });\n", " }\n", " }\n", " // Give older versions of the autoload script a head-start to ensure\n", " // they initialize before we start loading newer version.\n", " setTimeout(load_or_wait, 100)\n", "}(window));" ], "application/vnd.holoviews_load.v0+json": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "if ((window.PyViz === undefined) || (window.PyViz instanceof HTMLElement)) {\n", " window.PyViz = {comms: {}, comm_status:{}, kernels:{}, receivers: {}, plot_index: []}\n", "}\n", "\n", "\n", " function JupyterCommManager() {\n", " }\n", "\n", " JupyterCommManager.prototype.register_target = function(plot_id, comm_id, msg_handler) {\n", " if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n", " var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n", " comm_manager.register_target(comm_id, function(comm) {\n", " comm.on_msg(msg_handler);\n", " });\n", " } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n", " window.PyViz.kernels[plot_id].registerCommTarget(comm_id, function(comm) {\n", " comm.onMsg = msg_handler;\n", " });\n", " } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n", " google.colab.kernel.comms.registerTarget(comm_id, (comm) => {\n", " var messages = comm.messages[Symbol.asyncIterator]();\n", " function processIteratorResult(result) {\n", " var message = result.value;\n", " console.log(message)\n", " var content = {data: message.data, comm_id};\n", " var buffers = []\n", " for (var buffer of message.buffers || []) {\n", " buffers.push(new DataView(buffer))\n", " }\n", " 
var metadata = message.metadata || {};\n", " var msg = {content, buffers, metadata}\n", " msg_handler(msg);\n", " return messages.next().then(processIteratorResult);\n", " }\n", " return messages.next().then(processIteratorResult);\n", " })\n", " }\n", " }\n", "\n", " JupyterCommManager.prototype.get_client_comm = function(plot_id, comm_id, msg_handler) {\n", " if (comm_id in window.PyViz.comms) {\n", " return window.PyViz.comms[comm_id];\n", " } else if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n", " var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n", " var comm = comm_manager.new_comm(comm_id, {}, {}, {}, comm_id);\n", " if (msg_handler) {\n", " comm.on_msg(msg_handler);\n", " }\n", " } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n", " var comm = window.PyViz.kernels[plot_id].connectToComm(comm_id);\n", " comm.open();\n", " if (msg_handler) {\n", " comm.onMsg = msg_handler;\n", " }\n", " } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n", " var comm_promise = google.colab.kernel.comms.open(comm_id)\n", " comm_promise.then((comm) => {\n", " window.PyViz.comms[comm_id] = comm;\n", " if (msg_handler) {\n", " var messages = comm.messages[Symbol.asyncIterator]();\n", " function processIteratorResult(result) {\n", " var message = result.value;\n", " var content = {data: message.data};\n", " var metadata = message.metadata || {comm_id};\n", " var msg = {content, metadata}\n", " msg_handler(msg);\n", " return messages.next().then(processIteratorResult);\n", " }\n", " return messages.next().then(processIteratorResult);\n", " }\n", " }) \n", " var sendClosure = (data, metadata, buffers, disposeOnDone) => {\n", " return comm_promise.then((comm) => {\n", " comm.send(data, metadata, buffers, disposeOnDone);\n", " });\n", " };\n", " var comm = {\n", " send: sendClosure\n", " };\n", " }\n", " window.PyViz.comms[comm_id] = 
comm;\n", " return comm;\n", " }\n", " window.PyViz.comm_manager = new JupyterCommManager();\n", " \n", "\n", "\n", "var JS_MIME_TYPE = 'application/javascript';\n", "var HTML_MIME_TYPE = 'text/html';\n", "var EXEC_MIME_TYPE = 'application/vnd.holoviews_exec.v0+json';\n", "var CLASS_NAME = 'output';\n", "\n", "/**\n", " * Render data to the DOM node\n", " */\n", "function render(props, node) {\n", " var div = document.createElement(\"div\");\n", " var script = document.createElement(\"script\");\n", " node.appendChild(div);\n", " node.appendChild(script);\n", "}\n", "\n", "/**\n", " * Handle when a new output is added\n", " */\n", "function handle_add_output(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", " if ((output.data == undefined) || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", " var id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", " if (id !== undefined) {\n", " var nchildren = toinsert.length;\n", " var html_node = toinsert[nchildren-1].children[0];\n", " html_node.innerHTML = output.data[HTML_MIME_TYPE];\n", " var scripts = [];\n", " var nodelist = html_node.querySelectorAll(\"script\");\n", " for (var i in nodelist) {\n", " if (nodelist.hasOwnProperty(i)) {\n", " scripts.push(nodelist[i])\n", " }\n", " }\n", "\n", " scripts.forEach( function (oldScript) {\n", " var newScript = document.createElement(\"script\");\n", " var attrs = [];\n", " var nodemap = oldScript.attributes;\n", " for (var j in nodemap) {\n", " if (nodemap.hasOwnProperty(j)) {\n", " attrs.push(nodemap[j])\n", " }\n", " }\n", " attrs.forEach(function(attr) { newScript.setAttribute(attr.name, attr.value) });\n", " newScript.appendChild(document.createTextNode(oldScript.innerHTML));\n", " oldScript.parentNode.replaceChild(newScript, oldScript);\n", " });\n", " if (JS_MIME_TYPE in output.data) {\n", " 
toinsert[nchildren-1].children[1].textContent = output.data[JS_MIME_TYPE];\n", " }\n", " output_area._hv_plot_id = id;\n", " if ((window.Bokeh !== undefined) && (id in Bokeh.index)) {\n", " window.PyViz.plot_index[id] = Bokeh.index[id];\n", " } else {\n", " window.PyViz.plot_index[id] = null;\n", " }\n", " } else if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].childNodes[1].setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", "}\n", "\n", "/**\n", " * Handle when an output is cleared or removed\n", " */\n", "function handle_clear_output(event, handle) {\n", " var id = handle.cell.output_area._hv_plot_id;\n", " var server_id = handle.cell.output_area._bokeh_server_id;\n", " if (((id === undefined) || !(id in PyViz.plot_index)) && (server_id !== undefined)) { return; }\n", " var comm = window.PyViz.comm_manager.get_client_comm(\"hv-extension-comm\", \"hv-extension-comm\", function () {});\n", " if (server_id !== null) {\n", " comm.send({event_type: 'server_delete', 'id': server_id});\n", " return;\n", " } else if (comm !== null) {\n", " comm.send({event_type: 'delete', 'id': id});\n", " }\n", " delete PyViz.plot_index[id];\n", " if ((window.Bokeh !== undefined) & (id in window.Bokeh.index)) {\n", " var doc = window.Bokeh.index[id].model.document\n", " doc.clear();\n", " const i = window.Bokeh.documents.indexOf(doc);\n", " if (i > -1) {\n", " window.Bokeh.documents.splice(i, 1);\n", " }\n", " }\n", "}\n", "\n", "/**\n", " * Handle kernel restart event\n", " */\n", "function handle_kernel_cleanup(event, handle) {\n", " delete 
PyViz.comms[\"hv-extension-comm\"];\n", " window.PyViz.plot_index = {}\n", "}\n", "\n", "/**\n", " * Handle update_display_data messages\n", " */\n", "function handle_update_output(event, handle) {\n", " handle_clear_output(event, {cell: {output_area: handle.output_area}})\n", " handle_add_output(event, handle)\n", "}\n", "\n", "function register_renderer(events, OutputArea) {\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[0]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " events.on('output_added.OutputArea', handle_add_output);\n", " events.on('output_updated.OutputArea', handle_update_output);\n", " events.on('clear_output.CodeCell', handle_clear_output);\n", " events.on('delete.Cell', handle_clear_output);\n", " events.on('kernel_ready.Kernel', handle_kernel_cleanup);\n", "\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " safe: true,\n", " index: 0\n", " });\n", "}\n", "\n", "if (window.Jupyter !== undefined) {\n", " try {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " } catch(err) {\n", " }\n", "}\n" ], "application/vnd.holoviews_load.v0+json": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.holoviews_exec.v0+json": "", "text/html": [ "
\n", "
\n", "
\n", "" ] }, "metadata": { "application/vnd.holoviews_exec.v0+json": { "id": "p1002" } }, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "\n", "\n", "\n", " \n", " \n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "76e07b4b59944219b768cc3346cf1d13", "version_major": 2, "version_minor": 0 }, "text/plain": [ "BokehModel(combine_events=True, render_bundle={'docs_json': {'76e7b05b-06e4-41dc-9993-8d1e7f97fb60': {'version…" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import holoviews as hv\n", "import colorcet as cc\n", "\n", "from holoviews.element.tiles import EsriImagery\n", "from holoviews.operation.datashader import datashade\n", "hv.extension('bokeh')\n", "\n", "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "hUQHpvavqoKT" }, "source": [ "**Q: How many rows (data points) are we visualizing right now?**" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "id": "m-_JQKvZqoKU", "outputId": "b3170f1e-758e-47eb-cdf2-8cf32f44fa20" }, "outputs": [ { "data": { "text/plain": [ "14751421" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# YOUR SOLUTION HERE" ] }, { "cell_type": "markdown", "metadata": { "id": "nxd8DnIrqoKU" }, "source": [ "That's a lot of data points. With a vector format, it would be hopeless to expect any interactivity, because the browser would have to redraw that many points on every pan or zoom! Yet datashader + holoviews + bokeh renders everything almost in real time, because datashader aggregates the points into a fixed-size raster image before anything is sent to the browser." ] }, { "cell_type": "markdown", "metadata": { "id": "m36lbKUDqoKU" }, "source": [ "## Leaflet\n", "\n", "Another useful tool is Leaflet. It lets you combine various map tile sources (Google Maps, OpenStreetMap, ...) with many types of marks (points, heatmaps, etc.). [Leaflet.js](https://leafletjs.com) is one of the easiest ways to do that on the web, and there is a Python bridge for it: https://github.com/jupyter-widgets/ipyleaflet. Although we will not go into details, it is certainly worth checking out if you work with geographical data." 
] } ], "metadata": { "anaconda-cloud": {}, "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 0 }