{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quality Control CTD data with PySeabird\n", "### Author: Guilherme Castelão" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a minimal example of how to use the Python Seabird package to read a CTD output file and apply quality control to it. For more details, please check the [documentation](https://seabird.readthedocs.io/en/latest/).\n", "\n", "### Requirements\n", "\n", "This notebook requires the packages seabird, supportdata, and cotede. You can install them with pip as follows:\n", "\n", "```shell\n", "pip install seabird[QC]\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "#%matplotlib inline\n", "\n", "from seabird.cnv import fCNV\n", "from seabird.qc import fProfileQC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first download an example file with some CTD data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-08-29 21:17:27-- https://raw.githubusercontent.com/castelao/seabird/master/sampledata/CTD/dPIRX003.cnv\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.196.133\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.196.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 47291 (46K) [text/plain]\n", "Saving to: ‘dPIRX003.cnv’\n", "\n", "dPIRX003.cnv 100%[===================>] 46.18K --.-KB/s in 0.03s \n", "\n", "2019-08-29 21:17:27 (1.59 MB/s) - ‘dPIRX003.cnv’ saved [47291/47291]\n", "\n" ] } ], "source": [ "!wget https://raw.githubusercontent.com/castelao/seabird/master/sampledata/CTD/dPIRX003.cnv" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "profile = fCNV('dPIRX003.cnv')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Header: dict_keys(['sbe_model', 'seasave', 'instrument_type', 'nquan', 'nvalues', 'start_time', 'bad_flag', 'file_type', 'md5', 'datetime', 'LATITUDE', 'LONGITUDE', 'filename'])\n" ] } ], "source": [ "print(\"Header: %s\" % profile.attributes.keys())" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data: ['timeS', 'PRES', 'TEMP', 'TEMP2', 'CNDC', 'CNDC2', 'potemperature', 'potemperature2', 'PSAL', 'PSAL2', 'flag']\n" ] } ], "source": [ "print(\"Data: %s\" % profile.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's apply the quality control procedure recommended by GTSPP." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "profile = fProfileQC('dPIRX003.cnv', cfg='gtspp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The QC flags are grouped by variable. In this example there are temperature, salinity, and their respective secondary sensors." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['common', 'TEMP', 'TEMP2', 'PSAL', 'PSAL2'])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile.flags.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check which tests were performed, and hence which flags are available, for the primary temperature sensor." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'profile_envelop', 'gradient', 'spike', 'woa_normbias', 'overall'])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile.flags['TEMP'].keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The flagging standard is described in [CoTeDe's manual](https://cotede.readthedocs.io/en/latest/). The one used here is 0 for no QC performed, 1 for approved data, 4 for bad data, and 9 for missing data.\n", "\n", "Note that the overall flag is the combined result of all the tests applied. In the example above it considers the other 7 flags and takes the highest value. Therefore, an overall flag of 1 means that all possible tests approved that measurement, while a value of 4 means that at least one test suggests it is a bad measurement." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 9, 0, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 9, 0, 9, 0, 9, 9, 0, 9, 9,\n", " 9, 0, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9],\n", " dtype=int8)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile.flags['TEMP']['spike']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "idx = profile.flags['TEMP']['overall'] <= 2" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'dPIRX003.cnv')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib import pyplot as plt\n", "\n", "plt.figure(figsize=(12,8))\n", "plt.plot(profile['TEMP'][idx], profile['PRES'][idx],'b')\n", "plt.plot(profile['TEMP'][~idx], profile['PRES'][~idx],'ro')\n", "plt.gca().invert_yaxis()\n", "# plt.plot(profile['TEMP2'], profile['PRES'],'g')\n", "plt.xlabel('Temperature')\n", "plt.ylabel('Pressure')\n", "plt.title(profile.attributes['filename'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other pre defined quality control procedures are available, please check [CoTeDe's manual](https://cotede.readthedocs.io/en/latest/) to learn the details of the tests and what is available. For instance, to apply the EuroGOOS recommendations change the cfg argument" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'gradient_depthconditional', 'spike_depthconditional', 'digit_roll_over', 'woa_normbias', 'overall'])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile = fProfileQC('dPIRX003.cnv', cfg='eurogoos')\n", "profile.flags['TEMP'].keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If not defined, the default configuration is a collection of tests resulted for our work on [IQuOD](http://www.iquod.org/), and is equivalent to define `cfg='cotede'`." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Deprecated cfg format. It should contain a threshold item.\n", "Deprecated cfg format. 
It should contain a threshold item.\n" ] }, { "data": { "text/plain": [ "dict_keys(['valid_datetime', 'location_at_sea', 'global_range', 'profile_envelop', 'gradient', 'gradient_depthconditional', 'spike', 'spike_depthconditional', 'stuck_value', 'tukey53H_norm', 'digit_roll_over', 'woa_normbias', 'cars_normbias', 'rate_of_change', 'cum_rate_of_change', 'anomaly_detection', 'overall'])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile = fProfileQC('dPIRX003.cnv')\n", "profile.flags['TEMP'].keys()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }