{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Access SOOP CO2 data in Parquet\n", "\n", "A Jupyter notebook showing how to access and plot SOOP CO2 data available as a [Parquet](https://parquet.apache.org) dataset on S3." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "dataset_name = \"vessel_co2_delayed_qc\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install/update packages and load common functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# only run once, then restart the session if needed\n", "!pip install uv\n", "\n", "import os\n", "import sys\n", "\n", "def is_colab():\n", "    try:\n", "        import google.colab\n", "        return True\n", "    except ImportError:\n", "        return False\n", "\n", "# Get the current directory of the notebook\n", "current_dir = os.getcwd()\n", "\n", "# Check if requirements.txt exists in the current directory\n", "local_requirements = os.path.join(current_dir, 'requirements.txt')\n", "if os.path.exists(local_requirements):\n", "    requirements_path = local_requirements\n", "else:\n", "    # Fall back to the online requirements.txt file\n", "    requirements_path = 'https://raw.githubusercontent.com/aodn/aodn_cloud_optimised/main/notebooks/requirements.txt'\n", "\n", "# Install packages using uv and the determined requirements file\n", "if is_colab():\n", "    os.system(f'uv pip install --system -r {requirements_path}')\n", "else:\n", "    os.system('uv venv')\n", "    os.system(f'uv pip install -r {requirements_path}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import os\n", "\n", "# Download the helper module once; later cells import it as parquet_queries\n", "if not os.path.exists('parquet_queries.py'):\n", "    print('Downloading parquet_queries.py')\n", "    url = 'https://raw.githubusercontent.com/aodn/aodn_cloud_optimised/main/aodn_cloud_optimised/lib/ParquetDataQuery.py'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # fail loudly instead of saving an error page\n", "    with open('parquet_queries.py', 'w') as f:\n", "        f.write(response.text)" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/lbesnard/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning\n", " warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')\n" ] } ], "source": [ "from parquet_queries import create_time_filter, create_bbox_filter, query_unique_value, plot_spatial_extent, \\\n", "    get_temporal_extent, get_schema_metadata\n", "import pyarrow.parquet as pq\n", "import pyarrow.dataset as pds\n", "import pyarrow as pa\n", "import pandas as pd\n", "import pyarrow.compute as pc\n", "import matplotlib.pyplot as plt\n", "from matplotlib.collections import LineCollection\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Location of the Parquet dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "BUCKET_OPTIMISED_DEFAULT=\"aodn-cloud-optimised\"\n", "dname = f\"s3://anonymous@{BUCKET_OPTIMISED_DEFAULT}/{dataset_name}.parquet/\"\n", "parquet_ds = pq.ParquetDataset(dname, partitioning='hive')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Understanding the Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get partition keys\n", "Partitioning in Parquet organises data files by the values of one or more columns, known as partition keys. When data is written to Parquet with partitioning enabled, the files are physically stored in a directory structure that reflects the partition keys. With Hive-style partitioning, each directory is named `key=value` (for example `platform_code=VLMJ`), so a query that filters on a partition key only needs to read the matching directories. This makes it easier and faster to retrieve and process specific subsets of the data."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "timestamp: int32\n", "polygon: string\n", "platform_code: string\n" ] } ], "source": [ "dataset = pds.dataset(dname, format=\"parquet\", partitioning=\"hive\")\n", "\n", "partition_keys = dataset.partitioning.schema\n", "print(partition_keys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List unique partition values" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ZMFR', 'VLMJ']\n", "CPU times: user 19.8 ms, sys: 0 ns, total: 19.8 ms\n", "Wall time: 18.9 ms\n" ] } ], "source": [ "%%time\n", "unique_partition_value = query_unique_value(parquet_ds, 'platform_code')\n", "print(list(unique_partition_value)[0:2]) # showing a subset only" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualise Spatial Extent of the dataset\n", "In this section, we're plotting the polygons where data exists. 
This helps then with creating a bounding box where there is data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAACQCAYAAAD9ReqrAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/TGe4hAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAVeUlEQVR4nO3de1BU1x3A8R8Qd4HKAspLFAWiwVZFJCkUWzO2UsGmbfqY1ljbinXIi4zTQJOGpgG1D6hakoxjatoZ8Y9kQpJOGzutMWmtNA8xqRZ8a9VoglEwiZHFxIDAr3+k3MnKXtwVlvXA9zOzM+y5Zy/nt/dy98fl/DghqqoCAABgqNBgDwAAAGAgSGYAAIDRSGYAAIDRSGYAAIDRSGYAAIDRSGYAAIDRSGYAAIDRSGYAAIDRrgv2AAaqp6dHTp8+LVFRURISEhLs4QAAAB+oqrS3t0tycrKEhg7s3orxyczp06clJSUl2MMAAABXobm5WSZMmDCgfRifzERFRYnIx2+Gy+UK8mgAAIAv3G63pKSkWJ/jA2F8MtP7pyWXy0UyAwCAYQZjiggTgAEAgNGMvzNzzVmxwnt7fb339rlz/dsPAADwwJ0ZAABgtGsimVm/fr2kpqZKeHi45Obmyuuvvx7sIQEAAEMEPZl5+umnpbS0VCorK+U///mPzJw5UwoKCuTs2bPBHhoAADBA0JOZmpoaKS4ulqVLl8pnPvMZ2bBhg0RGRsrGjRuDPTQAAGCAoCYznZ2dsnv3bsnPz7faQkNDJT8/XxoaGry+pqOjQ9xut8cDAACMXEGtZnr33Xelu7tbEhMTPdoTExPl8OHDXl9TVVUlK1euHIrh2euv0mjTJu/t588HYCAAAA+Brii9mkpTqlMDLuh/ZvJXeXm5tLW1WY/m5uZgDwkAAARRUO/MxMXFSVhYmLS2tnq0t7a2SlJSktfXOJ1OcTqdQzE8AABggKDemXE4HHLjjTfKtm3brLaenh7Ztm2b5OXlBXFkAADAFEH/D8ClpaWyZMkSuemmmyQnJ0ceeeQR+eCDD2Tp0qXBHhoAADBA0JOZhQsXyjvvvCMVFRXS0tIiWVlZsnXr1j6TggEAALwJUVUN9iAGwu12S3R0tLS1tQ3dqtl2s99FRJqa/NtXVpb3druZ9wBgAn+rigbLyZPe2+0qSmNivLenpvq3//7Y7ctfga68GuKqq8H8/DaumgkAAOCTSGYAAIDRSGYAAIDRSGYAAIDRSGYAAIDRSGYAAIDRgv5/ZgAAw5BdCba//75isPhbgm3nakq2Ax2zXUm13cLHg7X/awh3ZgAAgNFIZgAAgNFIZgAAgNFIZgAAgNFIZgAAgNGoZgIADB92VUtFRd7bB6tSZzAXILZjVzFlV7Xk7+KaBuPODAAAMBrJDAAAMBrJDAAAMBrJDAAAMBrJDAAAMBrVTAAA8wSraslOf9VMdlVIdtVGdvztb8dufSkD1mCyw50ZAABgNJIZAABgNJIZAABgNJIZAABgNJIZAABgNKqZ+mM3s9tuZjoAjCT9Vb8E6zpZX++9vb9qI3/Y7ae/98JuTIO1ZpO/7I6NXQwGVDlxZwYAABiNZAYAABiNZAYAABiNZAYAABiNZAYAABiNaiYAwNWxq9IRGbx1hPzdf6ArhOwqgfp7L6iADTjuzAAAAKORzAAAAKORzAAAAKORzAAAAKORzAAAAKOFqKoGexAD4Xa7JTo6Wtra2sTlcg3NN+1vjQ9/Z9LHxHhvLyry3m7AGhkAhhm7686mTfavCXQ1kx27a6qdYI0zmLKyvLf3V5EVAIP5+c2dGQAAYDSSGQAAYDSSGQAAYDSSG
QAAYDSSGQAAYDTWZgKAkcauOsmumsVubaGRWAmEaxJ3ZgAAgNFIZgAAgNFIZgAAgNFIZgAAgNFIZgAAgNGoZroa/a3N5O+s/9RU7+2swQRgoPxdU2k4VCcNhxgGi906Vf19hhmKOzMAAMBoAUtmUlNTJSQkxONRXV3t0Wfv3r0yZ84cCQ8Pl5SUFFm9enWghgMAAIapgP6ZadWqVVJcXGw9j4qKsr52u90yf/58yc/Plw0bNsi+ffvkRz/6kcTExMjtt98eyGEBAIBhJKDJTFRUlCQlJXnd9uSTT0pnZ6ds3LhRHA6HTJs2TZqamqSmpoZkBgAA+Cygc2aqq6tl7NixMmvWLFmzZo10dXVZ2xoaGuTmm28Wh8NhtRUUFMiRI0fk/ffft91nR0eHuN1ujwcAABi5AnZnZvny5ZKdnS1jxoyRHTt2SHl5uZw5c0ZqampERKSlpUXS0tI8XpOYmGhti42N9brfqqoqWblyZaCG7Zv+Ko3s1jZpagrAQADgKthVUbIGk5nsqpaKiry3D8NqWb/uzDzwwAN9JvVe/jh8+LCIiJSWlsrcuXMlMzNT7rzzTvntb38r69atk46OjgENuLy8XNra2qxHc3PzgPYHAADM5tedmbKyMimyy/T+Lz093Wt7bm6udHV1ycmTJyUjI0OSkpKktbXVo0/vc7t5NiIiTqdTnE6nP8MGAADDmF/JTHx8vMTHx1/VN2pqapLQ0FBJSEgQEZG8vDx58MEH5dKlSzJq1CgREfn73/8uGRkZtn9iAgAAuFxAJgA3NDTII488Inv27JE33nhDnnzySbn33nvl+9//vpWofO973xOHwyHLli2TAwcOyNNPPy2PPvqolJaWBmJIAABgmArIBGCn0yl1dXWyYsUK6ejokLS0NLn33ns9EpXo6Gh58cUXpaSkRG688UaJi4uTiooKyrIBAIBfApLMZGdny86dO6/YLzMzU15++eVADCGw+psJblcNAACXs7uW2FVFYmSwq07yl915ZHfeGVzlxNpMAADAaCQzAADAaCQzAADAaCQzAADAaCQzAADAaCQzAADAaAFbaHJY669skgXZAPhqOC9Ma1defK0tculvGfRQXOPt3qO5c723U8rPnRkAAGA2khkAAGA0khkAAGA0khkAAGA0khkAAGA0qpmuht2M8mDvC4BZhvPPv11s/i5yGOhKHX+PwVBUDvn73oE7MwAAwGwkMwAAwGgkMwAAwGgkMwAAwGgkMwAAwGghqqrBHsRAuN1uiY6Olra2NnG5XMEeDgAA8MFgfn5zZwYAABiNZAYAABjN+H+a1/tXMrfbHeSRAAAAX/V+bg/GbBfjk5n29nYREUlJSQnySAAAgL/a29slOjp6QPswfgJwT0+PnD59WqKioiQkJETcbrekpKRIc3PziJkQTMzEPFwRMzEPV8TsElWV9vZ2SU5OltDQgc16Mf7OTGhoqEyYMKFPu8vlGjEnSC9iHhmIeWQg5pFhpMc80DsyvZgADAAAjEYyAwAAjDbskhmn0ymVlZXidDqDPZQhQ8wjAzGPDMQ8MhDz4DJ+AjAAABjZht2dGQAAMLKQzAAAAKORzAAAAKORzAAAAKMZm8ycPHlSli1bJmlpaRIRESHXX3+9VFZWSmdnp0e/vXv3ypw5cyQ8PFxSUlJk9erVffb17LPPytSpUyU8PFxmzJghW7ZsGaow/ParX/1KZs+eLZGRkRITE+O1T0hISJ9HXV2dR5/6+nrJzs4Wp9MpkydPlk2bNgV+8FfBl3jfeustueWWWyQyMlISEhLkvvvuk66uLo8+psRrJzU1tc8xra6u9ujjy7lukvXr10tqaqqEh4dLbm6uvP7668Ee0qBZsWJFn+M5depUa/tHH30kJSUlMnbsWBk9erR8+9vfltbW1iCO2H8vvfSSfO1rX5Pk5GQJCQmR5557zmO7qkpFRYWMGzdOIiIiJD8/X44ePerR59y5c7J48WJxuVwSExMjy5YtkwsXLgxhF
P65UsxFRUV9jnthYaFHH5Nirqqqks9+9rMSFRUlCQkJ8o1vfEOOHDni0ceXc9mXa/gVqaGef/55LSoq0hdeeEGPHz+umzdv1oSEBC0rK7P6tLW1aWJioi5evFj379+vTz31lEZEROjjjz9u9Xn11Vc1LCxMV69erQcPHtSf//znOmrUKN23b18wwrqiiooKramp0dLSUo2OjvbaR0S0trZWz5w5Yz0uXrxobX/jjTc0MjJSS0tL9eDBg7pu3ToNCwvTrVu3DlEUvrtSvF1dXTp9+nTNz8/XxsZG3bJli8bFxWl5ebnVx6R47UyaNElXrVrlcUwvXLhgbfflXDdJXV2dOhwO3bhxox44cECLi4s1JiZGW1tbgz20QVFZWanTpk3zOJ7vvPOOtf3OO+/UlJQU3bZtm+7atUs/97nP6ezZs4M4Yv9t2bJFH3zwQf3Tn/6kIqJ//vOfPbZXV1drdHS0Pvfcc7pnzx79+te/rmlpaR7XqsLCQp05c6bu3LlTX375ZZ08ebIuWrRoiCPx3ZViXrJkiRYWFnoc93Pnznn0MSnmgoICra2t1f3792tTU5N+5Stf0YkTJ3pcm650LvtyDfeFscmMN6tXr9a0tDTr+WOPPaaxsbHa0dFhtf30pz/VjIwM6/l3v/tdveWWWzz2k5ubq3fccUfgBzwAtbW1/SYzl/8QfdL999+v06ZN82hbuHChFhQUDOIIB5ddvFu2bNHQ0FBtaWmx2n73u9+py+WyjruJ8V5u0qRJ+vDDD9tu9+VcN0lOTo6WlJRYz7u7uzU5OVmrqqqCOKrBU1lZqTNnzvS67fz58zpq1Ch99tlnrbZDhw6piGhDQ8MQjXBwXX5N6unp0aSkJF2zZo3Vdv78eXU6nfrUU0+pqurBgwdVRPTf//631ef555/XkJAQffvtt4ds7FfLLpm59dZbbV9jesxnz55VEdF//etfqurbuezLNdwXxv6ZyZu2tjYZM2aM9byhoUFuvvlmcTgcVltBQYEcOXJE3n//fatPfn6+x34KCgqkoaFhaAYdICUlJRIXFyc5OTmyceNGjyXWh1PMDQ0NMmPGDElMTLTaCgoKxO12y4EDB6w+wyHe6upqGTt2rMyaNUvWrFnjcRvWl3PdFJ2dnbJ7926PYxYaGir5+fnGHbP+HD16VJKTkyU9PV0WL14sb731loiI7N69Wy5duuQR/9SpU2XixInDJv4TJ05IS0uLR4zR0dGSm5trxdjQ0CAxMTFy0003WX3y8/MlNDRUXnvttSEf82Cpr6+XhIQEycjIkLvuukvee+89a5vpMbe1tYmIWJ/DvpzLvlzDfWH8QpO9jh07JuvWrZO1a9dabS0tLZKWlubRr/cNa2lpkdjYWGlpafF4E3v7tLS0BH7QAbJq1Sr50pe+JJGRkfLiiy/K3XffLRcuXJDly5eLiNjG7Ha75eLFixIRERGMYV8Vu1h6t/XXx6R4ly9fLtnZ2TJmzBjZsWOHlJeXy5kzZ6SmpkZEfDvXTfHuu+9Kd3e312N2+PDhII1qcOXm5sqmTZskIyNDzpw5IytXrpQ5c+bI/v37paWlRRwOR585YqZflz6pN47+rr0tLS2SkJDgsf26666TMWPGGPs+FBYWyre+9S1JS0uT48ePy89+9jNZsGCBNDQ0SFhYmNEx9/T0yI9//GP5/Oc/L9OnTxcR8elc9uUa7otrLpl54IEH5De/+U2/fQ4dOuQxWe7tt9+WwsJC+c53viPFxcWBHuKgu5qY+/PQQw9ZX8+aNUs++OADWbNmjZXMBNtgx2sqf96H0tJSqy0zM1McDofccccdUlVVNaL+HfpwsWDBAuvrzMxMyc3NlUmTJskzzzxjRHKNq3PbbbdZX8+YMUMyMzPl+uuvl/r6epk3b14QRzZwJSUlsn//fnnllVeC8v2vuWSmrKxMioqK+u2Tnp5ufX369Gn54he/KLNnz5bf//73Hv2SkpL6zJrufZ6UlNRvn97tQ8HfmP2Vm
5srv/jFL6Sjo0OcTqdtzC6Xa0gupIMZb1JSUp8qF1+P8VDFa2cg70Nubq50dXXJyZMnJSMjw6dz3RRxcXESFhYW9J/LoRQTEyM33HCDHDt2TL785S9LZ2ennD9/3uM32uEUf28cra2tMm7cOKu9tbVVsrKyrD5nz571eF1XV5ecO3du2LwP6enpEhcXJ8eOHZN58+YZG/M999wjf/3rX+Wll16SCRMmWO1JSUlXPJd9uYb7ZHCm/QTHqVOndMqUKXrbbbdpV1dXn+29kyI7OzuttvLy8j4TgL/61a96vC4vL8/oCcCX++Uvf6mxsbHW8/vvv1+nT5/u0WfRokXX9ITYK00A/mSVy+OPP64ul0s/+ugjVTUz3it54oknNDQ01KqE8OVcN0lOTo7ec8891vPu7m4dP378sJkAfLn29naNjY3VRx991Jo0+cc//tHafvjw4WE5AXjt2rVWW1tbm9cJwLt27bL6vPDCC8ZMhr08Zm+am5s1JCREN2/erKrmxdzT06MlJSWanJys//3vf/ts9+Vc9uUa7gtjk5lTp07p5MmTdd68eXrq1CmPUrde58+f18TERP3BD36g+/fv17q6Oo2MjOxTmn3dddfp2rVr9dChQ1pZWXlNl2a/+eab2tjYqCtXrtTRo0drY2OjNjY2ant7u6qq/uUvf9E//OEPum/fPj169Kg+9thjGhkZqRUVFdY+ekuV77vvPj106JCuX7/+mi1VvlK8vWV98+fP16amJt26davGx8d7Lc02IV5vduzYoQ8//LA2NTXp8ePH9YknntD4+Hj94Q9/aPXx5Vw3SV1dnTqdTt20aZMePHhQb7/9do2JifGoeDBZWVmZ1tfX64kTJ/TVV1/V/Px8jYuL07Nnz6rqx+WsEydO1H/+85+6a9cuzcvL07y8vCCP2j/t7e3Wz6uIaE1NjTY2Nuqbb76pqh+XZsfExOjmzZt17969euutt3otzZ41a5a+9tpr+sorr+iUKVOu2TJl1f5jbm9v15/85Cfa0NCgJ06c0H/84x+anZ2tU6ZM8fjQNinmu+66S6Ojo7W+vt7jM/jDDz+0+lzpXPblGu4LY5OZ2tpaFRGvj0/as2ePfuELX1Cn06njx4/X6urqPvt65pln9IYbblCHw6HTpk3Tv/3tb0MVht+WLFniNebt27er6sdlfFlZWTp69Gj91Kc+pTNnztQNGzZod3e3x362b9+uWVlZ6nA4ND09XWtra4c+GB9cKV5V1ZMnT+qCBQs0IiJC4+LitKysTC9duuSxH1Pi9Wb37t2am5ur0dHRGh4erp/+9Kf117/+dZ/fWnw5102ybt06nThxojocDs3JydGdO3cGe0iDZuHChTpu3Dh1OBw6fvx4XbhwoR47dszafvHiRb377rs1NjZWIyMj9Zvf/KbHL2om2L59u9ef3SVLlqjqx7/VP/TQQ5qYmKhOp1PnzZunR44c8djHe++9p4sWLdLRo0ery+XSpUuXWr/IXIv6i/nDDz/U+fPna3x8vI4aNUonTZqkxcXFfRJ0k2K2+wz+5PXVl3PZl2v4lYT8f0AAAABGGlb/ZwYAAIw8JDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBoJDMAAMBo/wNQscvPmYAiLgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_spatial_extent(parquet_ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Temporal Extent of the dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly to the spatial extent, we retrieve the minimum and maximum timestamp partition values of the dataset. These do not necessarily match the actual TIME values, since the timestamp partitions can be yearly or monthly, but they give an idea of the temporal coverage." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(datetime.datetime(2008, 1, 1, 11, 0), datetime.datetime(2024, 4, 1, 11, 0))" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_temporal_extent(parquet_ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read Metadata\n", "\n", "For every Parquet dataset, we create a sidecar file named **_common_metadata** in the root of the dataset. It contains the variable attributes." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'TIME': {'type': 'timestamp[ns]',\n", " 'standard_name': 'time',\n", " 'long_name': 'analysis_time',\n", " 'axis': 'T',\n", " 'valid_min': 0.0,\n", " 'valid_max': 999999.0,\n", " 'ancillary_variables': 'TIME_quality_control'},\n", " 'TIME_quality_control': {'type': 'float',\n", " 'standard_name': 'time status_flag',\n", " 'long_name': 'Quality Control flag for time',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'LATITUDE': {'type': 'double',\n", " 'standard_name': 'latitude',\n", " 'long_name': 'latitude',\n", " 'units': 'degrees_north',\n", " 'axis': 'Y',\n", " 'valid_min': -90.0,\n", " 'valid_max': 90.0,\n", " 'reference_datum': 'geographical coordinates, WGS84 projection',\n", " 'ancillary_variables': 'LATITUDE_quality_control'},\n", " 'LATITUDE_quality_control': {'type': 'float',\n", " 'standard_name': 'latitude status_flag',\n", " 'long_name': 'Quality Control flag for latitude',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'LONGITUDE': {'type': 'double',\n", " 'standard_name': 'longitude',\n", " 'long_name': 'longitude',\n", " 'units': 'degrees_east',\n", " 'axis': 'X',\n", " 'valid_min': -180.0,\n", " 'valid_max': 180.0,\n", " 'reference_datum': 'geographical coordinates, WGS84 projection',\n", " 'ancillary_variables': 'LONGITUDE_quality_control'},\n", " 'LONGITUDE_quality_control': {'type': 'float',\n", " 'standard_name': 'longitude status_flag',\n", " 'long_name': 'Quality Control flag for longitude',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'TEMP': {'type': 'double',\n", " 'standard_name': 'sea_surface_temperature',\n", " 'long_name': 'sea surface temperature',\n", " 'units': 'degree_Celsius',\n", " 'valid_min': -2.0,\n", " 'valid_max': 40.0,\n", " 'ancillary_variables': 'TEMP_quality_control'},\n", " 'TEMP_quality_control': {'type': 'float',\n", " 'standard_name': 'sea_surface_temperature status_flag',\n", " 'long_name': 'Quality Control flag for sea_surface_temperature',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'TEMP_2': {'type': 'double',\n", " 'long_name': 'equilibrator water temperature',\n", " 'units': 'degree_Celsius',\n", " 'valid_min': -2.0,\n", " 'valid_max': 40.0,\n", " 'ancillary_variables': 'TEMP_2_quality_control'},\n", " 'TEMP_2_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for sea_surface_temperature',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'PSAL': {'type': 'double',\n", " 'standard_name': 'sea_surface_salinity',\n", " 'long_name': 'sea surface salinity',\n", " 'units': '1e-3',\n", " 'valid_min': 0.0,\n", " 'valid_max': 42.0,\n", " 'ancillary_variables': 'PSAL_quality_control'},\n", " 'PSAL_quality_control': {'type': 'float',\n", " 'standard_name': 'sea_surface_salinity status_flag',\n", " 'long_name': 'Quality Control flag for sea_surface_salinity',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'WSPD': {'type': 'double',\n", " 'standard_name': 'wind_speed',\n", " 'long_name': 'wind speed',\n", " 'units': 'm s-1',\n", " 'ancillary_variables': 'WSPD_quality_control'},\n", " 'WSPD_quality_control': {'type': 'float',\n", " 'standard_name': 'wind_speed status_flag',\n", " 'long_name': 'Quality Control flag for wind speed',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'WDIR': {'type': 'double',\n", " 'long_name': 'wind direction',\n", " 'units': 'degree',\n", " 'ancillary_variables': 'WDIR_quality_control',\n", " 'comment': 'true wind direction where 0 is North and 90 is East'},\n", " 'WDIR_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for wind direction',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'Press_Equil': {'type': 'double',\n", " 'long_name': 'equilibrator head space pressure',\n", " 'units': 'hPa',\n", " 'ancillary_variables': 'Press_Equil_quality_control'},\n", " 'Press_Equil_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for equilibrator head space pressure',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'Press_ATM': {'type': 'double',\n", " 'long_name': 'barometric pressure',\n", " 'units': 'hPa',\n", " 'ancillary_variables': 'Press_ATM_quality_control'},\n", " 'Press_ATM_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for barometric pressure',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'xCO2EQ_PPM': {'type': 'double',\n", " 'long_name': 'mole fraction of CO2 in the equilibrator head space (dry)',\n", " 'units': '1e-6',\n", " 'ancillary_variables': 'xCO2EQ_PPM_quality_control',\n", " 'comment': 'the unit 1e-6 is also called parts per million (ppm)'},\n", " 'xCO2EQ_PPM_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for xCO2EQ_PPM',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'xCO2ATM_PPM': {'type': 'double',\n", " 'long_name': 'mole fraction of CO2 in the atmosphere (dry) measured every 4 hours after standard runs',\n", " 'units': '1e-6',\n", " 'ancillary_variables': 'xCO2ATM_PPM_quality_control',\n", " 'comment': 'the unit 1e-6 is also called parts per million (ppm)'},\n", " 'xCO2ATM_PPM_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for xCO2ATM_PPM',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'xCO2ATM_PPM_INTERPOLATED': {'type': 'double',\n", " 'long_name': 'mole fraction of CO2 in the atmosphere (dry) measured every 4 hours after standard runs and values linearly interpolated to the times shown',\n", " 'units': '1e-6',\n", " 'ancillary_variables': 'xCO2ATM_PPM_INTERPOLATED_quality_control',\n", " 'comment': 'the unit 1e-6 is also called parts per million (ppm)'},\n", " 'xCO2ATM_PPM_INTERPOLATED_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for xCO2ATM_PPM_INTERPOLATED',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'fCO2SW_UATM': {'type': 'double',\n", " 'long_name': 'fugacity of carbon dioxide at surface water salinity and temperature',\n", " 'units': 'microatmospheres',\n", " 'ancillary_variables': 'fCO2SW_UATM_quality_control'},\n", " 'fCO2SW_UATM_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for fCO2SW_UATM',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'fCO2ATM_UATM_INTERPOLATED': {'type': 'double',\n", " 'long_name': 'fugacity of CO2 in the atmosphere',\n", " 'units': 'microatmospheres',\n", " 'ancillary_variables': 'fCO2ATM_UATM_INTERPOLATED_quality_control'},\n", " 'fCO2ATM_UATM_INTERPOLATED_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for fCO2ATM_UATM_INTERPOLATED',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'DfCO2': {'type': 'double',\n", " 'long_name': 'Difference between fCO2SW and fCO2ATM',\n", " 'units': 'microatmospheres',\n", " 'ancillary_variables': 'DfCO2_quality_control'},\n", " 'DfCO2_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for DfCO2',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'LICORflow': {'type': 'double',\n", " 'long_name': 'Gas flow through infrared gas analyser',\n", " 'units': 'ml min-1',\n", " 'ancillary_variables': 'LICORflow_quality_control'},\n", " 'LICORflow_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for LICORflow',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'H2OFLOW': {'type': 'double',\n", " 'long_name': 'water flow to equilibrator',\n", " 'units': 'L min-1',\n", " 'ancillary_variables': 'H2OFLOW_quality_control'},\n", " 'H2OFLOW_quality_control': {'type': 'float',\n", " 'long_name': 'Quality Control flag for H2OFLOW',\n", " 'quality_control_conventions': 'WOCE quality control procedure',\n", " 'valid_min': 2,\n", " 'valid_max': 4,\n", " 'flag_values': [2, 3, 4],\n", " 'flag_meanings': 'good questionable bad',\n", " 'references': 'Pierrot,D. et al. 2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005',\n", " 'ancillary_variables': 'SUBFLAG'},\n", " 'SUBFLAG': {'type': 'float',\n", " 'long_name': 'secondary flags, only for questionable measurements, WOCE flag 3 (Pierrot et Al 2009)',\n", " 'valid_min': 1,\n", " 'valid_max': 10,\n", " 'flag_values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", " 'flag_meanings': 'Outside_of_standard_range Questionable_or_interpolated_SST Questionable_EQU_temperature Anomalous_EQU_temperature-SST_+or-1degC Questionable_sea-surface_salinity Questionable_pressure Low_EQU_gas_flow Questionable_air_value Interpolated_standard Other_see_metadata',\n", " 'references': 'Pierrot,D. et al. 
2009, Recommendations for Autonomous Underway pCO2 Measuring Systems and Data Reduction Routines, Deep-Sea Research II, doi:10.1016/j.dsr2.2008.12.005'},\n", " 'TYPE': {'type': 'string',\n", " 'long_name': 'measurement type (equilibrator, standard or atmosphere)',\n", " 'units': 'categorical'},\n", " 'timestamp': {'type': 'int64'},\n", " 'polygon': {'type': 'string'},\n", " 'platform_code': {'type': 'string'},\n", " 'cruise_id': {'type': 'string'},\n", " 'vessel_name': {'type': 'string'},\n", " 'filename': {'type': 'string'},\n", " 'dataset_metadata': {'metadata_uuid': '63db5801-cc19-40ef-83b3-85ccba884cf7',\n", " 'title': 'Upper Ocean Thermal Data collected using XBT (expendable bathythermographs)',\n", " 'principal_investigator': 'Cowley, Rebecca',\n", " 'principal_investigator_email': 'rebecca.cowley@csiro.au',\n", " 'featureType': 'profile'}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# parquet_meta = pa.parquet.read_schema(os.path.join(dname + '_common_metadata')) # parquet metadata\n", "metadata = get_schema_metadata(dname) # schema metadata\n", "metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Query and Plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a TIME and BoundingBox filter" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "filter_time = create_time_filter(parquet_ds, date_start='2020-12-23 10:14:00', date_end='2024-01-01 07:50:00')\n", "filter_geo = create_bbox_filter(parquet_ds, lat_min=-34, lat_max=-32, lon_min=150, lon_max=155)\n", "\n", "\n", "filter = filter_geo & filter_time" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 10199 entries, 0 to 10198\n", "Data columns (total 44 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 TIME 10199 non-null 
datetime64[ns]\n", " 1 TIME_quality_control 10199 non-null float32 \n", " 2 LATITUDE 10199 non-null float64 \n", " 3 LATITUDE_quality_control 10199 non-null float32 \n", " 4 LONGITUDE 10199 non-null float64 \n", " 5 LONGITUDE_quality_control 10199 non-null float32 \n", " 6 TEMP 10199 non-null float64 \n", " 7 TEMP_quality_control 10199 non-null float32 \n", " 8 TEMP_2 10196 non-null float64 \n", " 9 TEMP_2_quality_control 10199 non-null float32 \n", " 10 PSAL 10199 non-null float64 \n", " 11 PSAL_quality_control 10199 non-null float32 \n", " 12 WSPD 9592 non-null float64 \n", " 13 WSPD_quality_control 10199 non-null float32 \n", " 14 WDIR 10157 non-null float64 \n", " 15 WDIR_quality_control 10199 non-null float32 \n", " 16 Press_Equil 10199 non-null float64 \n", " 17 Press_Equil_quality_control 10199 non-null float32 \n", " 18 Press_ATM 10199 non-null float64 \n", " 19 Press_ATM_quality_control 10199 non-null float32 \n", " 20 xCO2EQ_PPM 9931 non-null float64 \n", " 21 xCO2EQ_PPM_quality_control 10199 non-null float32 \n", " 22 xCO2ATM_PPM 268 non-null float64 \n", " 23 xCO2ATM_PPM_quality_control 10199 non-null float32 \n", " 24 xCO2ATM_PPM_INTERPOLATED 10199 non-null float64 \n", " 25 xCO2ATM_PPM_INTERPOLATED_quality_control 10199 non-null float32 \n", " 26 fCO2SW_UATM 9931 non-null float64 \n", " 27 fCO2SW_UATM_quality_control 10199 non-null float32 \n", " 28 fCO2ATM_UATM_INTERPOLATED 10199 non-null float64 \n", " 29 fCO2ATM_UATM_INTERPOLATED_quality_control 10199 non-null float32 \n", " 30 DfCO2 9931 non-null float64 \n", " 31 DfCO2_quality_control 10199 non-null float32 \n", " 32 LICORflow 10199 non-null float64 \n", " 33 LICORflow_quality_control 10199 non-null float32 \n", " 34 H2OFLOW 10199 non-null float64 \n", " 35 H2OFLOW_quality_control 10199 non-null float32 \n", " 36 SUBFLAG 0 non-null float32 \n", " 37 TYPE 10199 non-null object \n", " 38 cruise_id 10199 non-null object \n", " 39 vessel_name 10199 non-null object \n", " 40 filename 10199 non-null 
object \n", " 41 timestamp 10199 non-null category \n", " 42 polygon 10199 non-null category \n", " 43 platform_code 10199 non-null category \n", "dtypes: category(3), datetime64[ns](1), float32(19), float64(17), object(4)\n", "memory usage: 2.5+ MB\n", "CPU times: user 415 ms, sys: 54.1 ms, total: 469 ms\n", "Wall time: 3.38 s\n" ] } ], "source": [ "%%time\n", "# pandas forwards the filter expression to pyarrow, which prunes non-matching partitions and also drops non-matching rows\n", "df = pd.read_parquet(dname, engine='pyarrow', filters=filter)\n", "df.info()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " TIME TIME_quality_control LATITUDE \\\n", "0 2021-05-10 18:05:01.000000000 2.0 -33.999871 \n", "1 2021-05-10 18:06:23.000000000 2.0 -33.996769 \n", "2 2021-05-10 18:07:41.999999744 2.0 -33.993654 \n", "3 2021-05-10 18:09:03.999999744 2.0 -33.990543 \n", "4 2021-05-10 18:10:25.000000000 2.0 -33.987426 \n", "... ... ... ... \n", "10194 2023-11-01 16:37:56.000000000 2.0 -33.852700 \n", "10195 2023-11-01 16:39:14.000000000 2.0 -33.850300 \n", "10196 2023-11-01 16:40:34.000000000 2.0 -33.847900 \n", "10197 2023-11-01 16:41:54.000000000 2.0 -33.845800 \n", "10198 2023-11-01 16:43:14.000000000 2.0 -33.843700 \n", "\n", " LATITUDE_quality_control LONGITUDE LONGITUDE_quality_control \\\n", "0 2.0 151.395629 2.0 \n", "1 2.0 151.399301 2.0 \n", "2 2.0 151.402956 2.0 \n", "3 2.0 151.406601 2.0 \n", "4 2.0 151.410262 2.0 \n", "... ... ... ... \n", "10194 2.0 151.471700 2.0 \n", "10195 2.0 151.470400 2.0 \n", "10196 2.0 151.469200 2.0 \n", "10197 2.0 151.468100 2.0 \n", "10198 2.0 151.467100 2.0 \n", "\n", " TEMP TEMP_quality_control TEMP_2 TEMP_2_quality_control ... \\\n", "0 20.435 2.0 20.66 2.0 ... \n", "1 20.429 2.0 20.66 2.0 ... \n", "2 20.438 2.0 20.67 2.0 ... \n", "3 20.428 2.0 20.66 2.0 ... \n", "4 20.432 2.0 20.66 2.0 ... \n", "... ... ... ... ... ... \n", "10194 20.172 2.0 20.45 2.0 ... \n", "10195 20.164 2.0 20.45 2.0 ... \n", "10196 20.147 2.0 20.43 2.0 ... \n", "10197 20.129 2.0 20.42 2.0 ... \n", "10198 20.110 2.0 20.40 2.0 ... \n", "\n", " H2OFLOW H2OFLOW_quality_control SUBFLAG TYPE cruise_id \\\n", "0 2.09 2.0 NaN EQU IN2021_V03 \n", "1 2.09 2.0 NaN EQU IN2021_V03 \n", "2 2.09 2.0 NaN EQU IN2021_V03 \n", "3 2.09 2.0 NaN EQU IN2021_V03 \n", "4 2.09 2.0 NaN EQU IN2021_V03 \n", "... ... ... ... ... ... 
\n", "10194 2.12 2.0 NaN EQU IN2023_V06 \n", "10195 2.12 2.0 NaN EQU IN2023_V06 \n", "10196 2.12 2.0 NaN EQU IN2023_V06 \n", "10197 2.12 2.0 NaN EQU IN2023_V06 \n", "10198 2.12 2.0 NaN EQU IN2023_V06 \n", "\n", " vessel_name filename \\\n", "0 RV Investigator IMOS_SOOP-CO2_GST_20210508T033536Z_VLMJ_FV01.nc \n", "1 RV Investigator IMOS_SOOP-CO2_GST_20210508T033536Z_VLMJ_FV01.nc \n", "2 RV Investigator IMOS_SOOP-CO2_GST_20210508T033536Z_VLMJ_FV01.nc \n", "3 RV Investigator IMOS_SOOP-CO2_GST_20210508T033536Z_VLMJ_FV01.nc \n", "4 RV Investigator IMOS_SOOP-CO2_GST_20210508T033536Z_VLMJ_FV01.nc \n", "... ... ... \n", "10194 RV Investigator IMOS_SOOP-CO2_GST_20231009T002809Z_VLMJ_FV01.nc \n", "10195 RV Investigator IMOS_SOOP-CO2_GST_20231009T002809Z_VLMJ_FV01.nc \n", "10196 RV Investigator IMOS_SOOP-CO2_GST_20231009T002809Z_VLMJ_FV01.nc \n", "10197 RV Investigator IMOS_SOOP-CO2_GST_20231009T002809Z_VLMJ_FV01.nc \n", "10198 RV Investigator IMOS_SOOP-CO2_GST_20231009T002809Z_VLMJ_FV01.nc \n", "\n", " timestamp polygon \\\n", "0 1619827200 0103000000010000000500000000000000002062400000... \n", "1 1619827200 0103000000010000000500000000000000002062400000... \n", "2 1619827200 0103000000010000000500000000000000002062400000... \n", "3 1619827200 0103000000010000000500000000000000002062400000... \n", "4 1619827200 0103000000010000000500000000000000002062400000... \n", "... ... ... \n", "10194 1698796800 0103000000010000000500000000000000002062400000... \n", "10195 1698796800 0103000000010000000500000000000000002062400000... \n", "10196 1698796800 0103000000010000000500000000000000002062400000... \n", "10197 1698796800 0103000000010000000500000000000000002062400000... \n", "10198 1698796800 0103000000010000000500000000000000002062400000... \n", "\n", " platform_code \n", "0 VLMJ \n", "1 VLMJ \n", "2 VLMJ \n", "3 VLMJ \n", "4 VLMJ \n", "... ... 
\n", "10194 VLMJ \n", "10195 VLMJ \n", "10196 VLMJ \n", "10197 VLMJ \n", "10198 VLMJ \n", "\n", "[10199 rows x 44 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_sorted = df.sort_values('TIME')\n", "\n", "# Pair consecutive (lon, lat) points into line segments\n", "points = np.array([df_sorted['LONGITUDE'], df_sorted['LATITUDE']]).T.reshape(-1, 1, 2)\n", "segments = np.concatenate([points[:-1], points[1:]], axis=1)\n", "\n", "# Create a LineCollection with segments colored by temperature\n", "norm = plt.Normalize(df_sorted['TEMP'].min(), df_sorted['TEMP'].max())\n", "lc = LineCollection(segments, cmap='RdYlBu_r', norm=norm)\n", "lc.set_array(df_sorted['TEMP'])\n", "lc.set_linewidth(2)\n", "\n", "fig, ax = plt.subplots()\n", "ax.add_collection(lc)\n", "ax.autoscale()\n", "ax.set_xlabel(metadata['LONGITUDE']['standard_name'])\n", "ax.set_ylabel(metadata['LATITUDE']['standard_name'])\n", "\n", "# Add a color bar\n", "cbar = plt.colorbar(lc, ax=ax)\n", "cbar.set_label('Temperature')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a TIME and scalar (platform_code) filter" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "filter_time = create_time_filter(parquet_ds, date_start='2020-01-31 10:14:00', date_end='2022-02-01 07:50:00')\n", "\n", "# VLMJ is the platform code of RV Investigator\n", "expr_1 = pc.field('platform_code') == pa.scalar(\"VLMJ\")\n", "filter = expr_1 & filter_time" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 244944 entries, 0 to 244943\n", "Data columns (total 44 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 TIME 244944 non-null datetime64[ns]\n", " 1 TIME_quality_control 244944 non-null float32 \n", " 2 LATITUDE 244944 non-null float64 \n", " 3 LATITUDE_quality_control 244944 non-null float32 \n", " 4 LONGITUDE 244944 non-null float64 \n", " 5 LONGITUDE_quality_control 244944 non-null float32 \n", " 6 TEMP 244680 non-null float64 \n", " 7 
TEMP_quality_control 244944 non-null float32 \n", " 8 TEMP_2 244780 non-null float64 \n", " 9 TEMP_2_quality_control 244944 non-null float32 \n", " 10 PSAL 244944 non-null float64 \n", " 11 PSAL_quality_control 244944 non-null float32 \n", " 12 WSPD 222686 non-null float64 \n", " 13 WSPD_quality_control 244944 non-null float32 \n", " 14 WDIR 237361 non-null float64 \n", " 15 WDIR_quality_control 244944 non-null float32 \n", " 16 Press_Equil 244944 non-null float64 \n", " 17 Press_Equil_quality_control 244944 non-null float32 \n", " 18 Press_ATM 241825 non-null float64 \n", " 19 Press_ATM_quality_control 244944 non-null float32 \n", " 20 xCO2EQ_PPM 236906 non-null float64 \n", " 21 xCO2EQ_PPM_quality_control 244944 non-null float32 \n", " 22 xCO2ATM_PPM 8038 non-null float64 \n", " 23 xCO2ATM_PPM_quality_control 244944 non-null float32 \n", " 24 xCO2ATM_PPM_INTERPOLATED 244944 non-null float64 \n", " 25 xCO2ATM_PPM_INTERPOLATED_quality_control 244944 non-null float32 \n", " 26 fCO2SW_UATM 236647 non-null float64 \n", " 27 fCO2SW_UATM_quality_control 244944 non-null float32 \n", " 28 fCO2ATM_UATM_INTERPOLATED 241561 non-null float64 \n", " 29 fCO2ATM_UATM_INTERPOLATED_quality_control 244944 non-null float32 \n", " 30 DfCO2 233528 non-null float64 \n", " 31 DfCO2_quality_control 244944 non-null float32 \n", " 32 LICORflow 244944 non-null float64 \n", " 33 LICORflow_quality_control 244944 non-null float32 \n", " 34 H2OFLOW 244944 non-null float64 \n", " 35 H2OFLOW_quality_control 244944 non-null float32 \n", " 36 SUBFLAG 0 non-null float32 \n", " 37 TYPE 244944 non-null object \n", " 38 cruise_id 244944 non-null object \n", " 39 vessel_name 244944 non-null object \n", " 40 filename 244944 non-null object \n", " 41 timestamp 244944 non-null category \n", " 42 polygon 244944 non-null category \n", " 43 platform_code 244944 non-null category \n", "dtypes: category(3), datetime64[ns](1), float32(19), float64(17), object(4)\n", "memory usage: 60.0+ MB\n", "CPU times: user 
1.73 s, sys: 305 ms, total: 2.03 s\n", "Wall time: 8.04 s\n" ] } ], "source": [ "%%time\n", "# pandas forwards the filter expression to pyarrow, which prunes non-matching partitions and also drops non-matching rows\n", "df = pd.read_parquet(dname, engine='pyarrow', filters=filter)\n", "df.info()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_sorted = df.sort_values('TIME')\n", "\n", "# Pair consecutive (lon, lat) points into line segments\n", "points = np.array([df_sorted['LONGITUDE'], df_sorted['LATITUDE']]).T.reshape(-1, 1, 2)\n", "segments = np.concatenate([points[:-1], points[1:]], axis=1)\n", "\n", "# Create a LineCollection with segments colored by temperature\n", "norm = plt.Normalize(df_sorted['TEMP'].min(), df_sorted['TEMP'].max())\n", "lc = LineCollection(segments, cmap='RdYlBu_r', norm=norm)\n", "lc.set_array(df_sorted['TEMP'])\n", "lc.set_linewidth(2)\n", "\n", "fig, ax = plt.subplots()\n", "ax.add_collection(lc)\n", "ax.autoscale()\n", "ax.set_xlabel(metadata['LONGITUDE']['standard_name'])\n", "ax.set_ylabel(metadata['LATITUDE']['standard_name'])\n", "\n", "# Add a color bar\n", "cbar = plt.colorbar(lc, ax=ax)\n", "cbar.set_label('Temperature')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 4 }