{ "cells": [ { "cell_type": "markdown", "id": "noticed-illness", "metadata": {}, "source": [ "## Visualize Parquet data\n", "*Use the Measured Parameter Data Access Parquet format to visualize STOQS data*\n", "\n", "Executing this Notebook requires a personal STOQS server - these instructions are for a Docker installation. This Notebook builds on issues raised in https://github.com/stoqs/stoqs/issues/227.\n", "\n", "### Docker Instructions\n", "Install and start the software as \n", "[detailed in the README](https://github.com/stoqs/stoqs#production-deployment-with-docker). (Note that on MacOS you will need to modify settings in your `docker-compose.yml` and `.env` files — look for comments referencing 'HOST_UID'.)\n", "\n", "Then, from your `$STOQS_HOME/docker` directory start the Jupyter Notebook server - you can query from the remote database or from a copy that you've made to your local system: \n", "\n", "#### Option A: Query from MBARI's master database\n", "Start the Jupyter Notebook server pointing to MBARI's master STOQS database server. (Note: firewall rules limit unprivileged access to such resources):\n", "\n", " docker-compose exec \\\n", " -e DATABASE_URL=postgis://everyone:guest@kraken.shore.mbari.org:5432/stoqs \\\n", " stoqs stoqs/manage.py shell_plus --notebook\n", "\n", "#### Option B: Query from your local Docker Desktop\n", "Restore a database of your choice from https://stoqs.shore.mbari.org/, for example below is how to make a local copy of the `stoqs_september2013` database from MBARI's server onto your local database and then start the Jupyter Notebook server using the default DATABASE_URL (which should be your local system) also **make sure that your Docker Desktop has at least 16 GB of RAM allocated to it**:\n", "\n", " cd $STOQS_HOME/docker\n", " docker-compose exec stoqs createdb -U postgres stoqs_september2013\n", " curl -k https://stoqs.shore.mbari.org/media/pg_dumps/stoqs_september2013.pg_dump | \\\n", " docker exec -i stoqs pg_restore -Fc -U postgres -d stoqs_september2013\n", " docker-compose exec stoqs stoqs/manage.py shell_plus --notebook\n", "\n", "### Opening this Notebook\n", "Following execution of the `stoqs/manage.py shell_plus --notebook` command a message is displayed giving a URL for you to use in a browser on your host, e.g.:\n", "\n", " http://127.0.0.1:8888/?token=\n", "\n", "In the browser window opened to this URL navigate to this file (`visualize_parquet.ipynb`) and open it. You will then be able to execute the cells and modify the code to suit your needs.\n", "\n", "The information in the output cells result from execution on a 2019 MacBook Pro with a 2.4 GHz 8-Core Intel Core i9 processor, 32 GB 2667 MHz DDR4 RAM, running Docker Desktop 3.1.0 with 16 GB with 4 CPUs and 16 GB allocated." ] }, { "cell_type": "markdown", "id": "imposed-kitty", "metadata": {}, "source": [ "In a browser navigate to https://stoqs.mbari.org/stoqs_september2013 and make selections as shown in this screen grab:\n", "![Constructing a parquet download URL](https://user-images.githubusercontent.com/1771866/110736610-d15b2180-81e0-11eb-9913-3f8a5f0a94c4.png)\n", "\n", "We will attempt to recreate this image from [Issue 227](https://github.com/stoqs/stoqs/issues/227):\n", "![biplot](https://raw.githubusercontent.com/stoqs/stoqs/master/doc/papers/AUV2014/LabeledSelectionUI.png)\n", "\n", "but this time using Datashader which can handle a lot more data." ] }, { "cell_type": "code", "execution_count": 1, "id": "peaceful-ethnic", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2021-03-17 17:52:07-- https://stoqs.shore.mbari.org/stoqs_september2013/api/measuredparameter.parquet?parameter__name=bbp420¶meter__name=fl700_uncorr¶meter__name=salinity¶meter__name=temperature&measurement__instantpoint__activity__platform__name=dorado&collect=name\n", "Resolving stoqs.shore.mbari.org (stoqs.shore.mbari.org)... 134.89.12.71\n", "Connecting to stoqs.shore.mbari.org (stoqs.shore.mbari.org)|134.89.12.71|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 8334441 (7.9M) [application/octet-stream]\n", "Saving to: ‘stoqs_september2013_dorado.parquet’\n", "\n", "stoqs_september2013 100%[===================>] 7.95M 4.80MB/s in 1.7s \n", "\n", "2021-03-17 17:52:16 (4.80 MB/s) - ‘stoqs_september2013_dorado.parquet’ saved [8334441/8334441]\n", "\n", "0.01user 0.11system 0:08.53elapsed 1%CPU (0avgtext+0avgdata 6860maxresident)k\n", "0inputs+0outputs (0major+364minor)pagefaults 0swaps\n", "--2021-03-17 17:52:16-- https://stoqs.shore.mbari.org/stoqs_september2013/api/measuredparameter.parquet?parameter__name=bb470¶meter__name=chlorophyll¶meter__name=salinity¶meter__name=temperature&measurement__instantpoint__activity__platform__name=daphne&measurement__instantpoint__activity__platform__name=tethys&collect=name\n", "Resolving stoqs.shore.mbari.org (stoqs.shore.mbari.org)... 134.89.12.71\n", "Connecting to stoqs.shore.mbari.org (stoqs.shore.mbari.org)|134.89.12.71|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 60694551 (58M) [application/octet-stream]\n", "Saving to: ‘stoqs_september2013_lrauvs.parquet’\n", "\n", "stoqs_september2013 100%[===================>] 57.88M 4.42MB/s in 14s \n", "\n", "2021-03-17 17:53:46 (4.18 MB/s) - ‘stoqs_september2013_lrauvs.parquet’ saved [60694551/60694551]\n", "\n", "0.14user 0.76system 1:29.96elapsed 1%CPU (0avgtext+0avgdata 6948maxresident)k\n", "0inputs+0outputs (0major+363minor)pagefaults 0swaps\n" ] } ], "source": [ "import time\n", "t_start = time.time()\n", "\n", "# Issuing a STOQS api request from inside the stoqs container - where this\n", "# notebook is running - is not really possible. For testing with a \n", "# host='localhost' url you need to make that request from your system and \n", "# then copy the .parquet file to this directory: stoqs/contrib/notebooks.\n", "\n", "# We have make two downloads as lrauvs and dorado have different Parameter names.\n", "# It's theoretically possible to download all Parameter names from all three\n", "# platforms, but that exceeds the container's RAM in my 16 GB Docker machine.\n", "# It's more efficient to download just what we need.\n", "\n", "##host = 'localhost'\n", "host = 'stoqs.shore.mbari.org'\n", "url_dorado = (f'https://{host}/stoqs_september2013/api/measuredparameter.parquet?'\n", " 'parameter__name=bbp420¶meter__name=fl700_uncorr&'\n", " 'parameter__name=salinity¶meter__name=temperature&'\n", " 'measurement__instantpoint__activity__platform__name=dorado&'\n", " 'collect=name')\n", "##print(url_dorado) # Uncoment for 'localhost' download from system browser\n", "url_lrauvs = (f'https://{host}/stoqs_september2013/api/measuredparameter.parquet?'\n", " 'parameter__name=bb470¶meter__name=chlorophyll&'\n", " 'parameter__name=salinity¶meter__name=temperature&'\n", " 'measurement__instantpoint__activity__platform__name=daphne&'\n", " 'measurement__instantpoint__activity__platform__name=tethys&'\n", " 'collect=name')\n", "##print(url_lrauv) # Uncoment for 'localhost' download from system browser\n", "\n", "!time wget --no-check-certificate -O stoqs_september2013_dorado.parquet \"{url_dorado}\"\n", "!time wget --no-check-certificate -O stoqs_september2013_lrauvs.parquet \"{url_lrauvs}\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "significant-victorian", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 152 ms, sys: 32 ms, total: 184 ms\n", "Wall time: 226 ms\n", "dorado data: (120505, 4)\n", "CPU times: user 1.7 s, sys: 204 ms, total: 1.9 s\n", "Wall time: 2.05 s\n", "lrauv data: (1531962, 4)\n" ] } ], "source": [ "import pandas as pd\n", "\n", "%time df_dorado = pd.read_parquet('stoqs_september2013_dorado.parquet')\n", "print(f\"dorado data: {df_dorado.shape}\")\n", "##print(df_dorado.head(2))\n", "\n", "%time df_lrauvs = pd.read_parquet('stoqs_september2013_lrauvs.parquet')\n", "print(f\"lrauv data: {df_lrauvs.shape}\")\n", "##print(df_lrauvs.head(2))\n", "\n", "# Combine into single DataFrame for more generalized follow-on processing\n", "df = df_dorado.append(df_lrauvs)" ] }, { "cell_type": "code", "execution_count": 3, "id": "electric-signature", "metadata": {}, "outputs": [], "source": [ "# Commit with do_plots = False, change to True for plots, but don't check it in that way\n", "do_plots = False\n", "plots = None\n", "if do_plots:\n", " import colorcet\n", " import holoviews as hv\n", " from holoviews.operation.datashader import datashade\n", "\n", " hv.extension(\"bokeh\")\n", "\n", " pts_dorado = hv.Points(df, kdims=['bbp420', 'fl700_uncorr'])\n", " pts_daphne = hv.Points(df, kdims=['bb470', 'chlorophyll'])\n", " pts_tethys = hv.Points(df, kdims=['bb470', 'chlorophyll'])\n", "\n", " plots = ( datashade(pts_dorado, cmap=colorcet.fire).opts(title='dorado')\n", " + datashade(pts_daphne, cmap=colorcet.fire).opts(title='daphne') \n", " + datashade(pts_tethys, cmap=colorcet.fire).opts(title='tethys') )\n", "plots" ] }, { "cell_type": "code", "execution_count": 4, "id": "boxed-population", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time to execute this notebook: 103.8 seconds\n" ] } ], "source": [ "print(f\"Time to execute this notebook: {(time.time() - t_start):.1f} seconds\")" ] } ], "metadata": { "kernelspec": { "display_name": "Django Shell-Plus", "language": "python", "name": "django_extensions" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 5 }