{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading CTD data with PySeabird" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Author: Guilherme Castelão" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pySeabird is a package to parse and load CTD data files. That should be an easy task, but the format of these files has changed over time. Working with data from multiple ships or cruises requires first understanding each file and normalizing it into a common format, and only then starting the analysis. That could be done with a few general regular-expression rules, but I prefer strict rules: if I'm loading hundreds or thousands of profiles, I want to be sure that no mistake slips through. I would rather skip a doubtful file and issue a warning than believe it was loaded correctly and let it into my analysis.\n", "\n", "With that in mind, I wrote this package so that it can load multiple rules, and new rules can be added without changing the main engine.\n", "\n", "For more information, check the documentation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "from seabird.cnv import fCNV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first download an example file with some CTD data:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2018-11-12 14:00:41-- https://raw.githubusercontent.com/castelao/seabird/master/sampledata/CTD/dPIRX003.cnv\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.136.133\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.136.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 47291 (46K) [text/plain]\n", "Saving to: ‘dPIRX003.cnv’\n", "\n", "dPIRX003.cnv 100%[===================>] 46.18K --.-KB/s in 0.01s \n", "\n", "2018-11-12 14:00:41 (4.33 MB/s) - ‘dPIRX003.cnv’ saved [47291/47291]\n", "\n" ] } ], "source": [ "!wget https://raw.githubusercontent.com/castelao/seabird/master/sampledata/CTD/dPIRX003.cnv" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "profile = fCNV('dPIRX003.cnv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The profile dPIRX003.cnv.OK was loaded with the default rule, cnv.yaml." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The header (metadata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The header is loaded into `profile.attributes` as a dictionary. Note that the date has already been converted into a `datetime` object.\n", "\n", "There is also a new attribute that is not in the file, 'md5', which is the MD5 hash of the original file. 
This can be useful to double-check the inputs when reproducing an analysis.\n", "\n", "Since it is a dictionary, the geographical coordinates, for example, can be accessed by key:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The profile coordinates are latitude: 12.6743, longitude: -38.0018\n" ] } ], "source": [ "print(\"The profile coordinates are latitude: %.4f, longitude: %.4f\" %\n", " (profile.attributes['LATITUDE'], profile.attributes['LONGITUDE']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, for an overview of all the header attributes:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Header: dict_keys(['sbe_model', 'seasave', 'instrument_type', 'nquan', 'nvalues', 'start_time', 'bad_flag', 'file_type', 'md5', 'datetime', 'LATITUDE', 'LONGITUDE', 'filename'])\n", "{'sbe_model': '9', 'seasave': 'Win32 V 5.37d', 'instrument_type': 'CTD', 'nquan': '11', 'nvalues': '349', 'start_time': 'Apr 02 2008 18:52:30', 'bad_flag': '-9.990e-29', 'file_type': 'ascii', 'md5': '1ad70243bdea4bfd4c6f60ca7141bf2b', 'datetime': datetime.datetime(2008, 4, 2, 18, 52, 30), 'LATITUDE': 12.674333333333333, 'LONGITUDE': -38.00183333333333, 'filename': 'dPIRX003.cnv'}\n" ] } ], "source": [ "print(\"Header: %s\" % profile.attributes.keys())\n", "print(profile.attributes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `profile` object behaves like a dictionary of the data variables. 
To list the available variables:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['timeS', 'PRES', 'TEMP', 'TEMP2', 'CNDC', 'CNDC2', 'potemperature', 'potemperature2', 'PSAL', 'PSAL2', 'flag']\n" ] } ], "source": [ "print(profile.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each variable is returned as a masked array, so all values equal to `profile.attributes['bad_flag']` are masked (shown as `--` below):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "masked_array(data=[15.7969, 15.5144, 15.4179, 15.3232, 15.1983, 15.1154,\n", " 15.076, 15.0842, 15.0535, 15.0212, 14.9611, 14.8273,\n", " 14.7881, 14.7453, 14.7223, 14.7371, 14.7438, 14.7413,\n", " 14.7404, --, 14.4535, 14.3612, 14.2561, 14.2155,\n", " 14.2098],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, True, False, False, False, False,\n", " False],\n", " fill_value=-9.99e-29)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "profile['TEMP2'][:25]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since these are regular NumPy masked arrays, statistics such as the mean and standard deviation ignore the masked values. Let's compare the two temperature sensors:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11.580700613496932 1.5675053880983678\n", "11.540421100917431 1.6110723038427497\n" ] } ], "source": [ "print(profile['TEMP'].mean(), profile['TEMP'].std())\n", "print(profile['TEMP2'].mean(), profile['TEMP2'].std())" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "[Figure: TEMP (blue) and TEMP2 (green) profiles vs. pressure [dbar], pressure axis inverted, titled dPIRX003.cnv]
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib import pyplot as plt\n", "\n", "# Plot both temperature sensors against pressure\n", "plt.plot(profile['TEMP'], profile['PRES'], 'b', label='TEMP')\n", "plt.plot(profile['TEMP2'], profile['PRES'], 'g', label='TEMP2')\n", "plt.gca().invert_yaxis()  # pressure increases downward\n", "plt.xlabel('temperature [°C]')\n", "plt.ylabel('pressure [dbar]')\n", "plt.legend()\n", "plt.title(profile.attributes['filename'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also export the data into a [pandas](http://pandas.pydata.org) DataFrame for easier data manipulation later on:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " timeS PRES TEMP TEMP2 CNDC CNDC2 potemperature \\\n", "0 246.446 162.0 15.8635 15.7969 4.468541 4.497739 15.8372 \n", "1 249.417 163.0 15.4798 15.5144 4.459094 4.460689 15.4545 \n", "2 250.462 164.0 15.3983 15.4179 4.444983 4.448821 15.3729 \n", "3 251.428 165.0 15.2930 15.3232 4.432046 4.436992 15.2675 \n", "4 252.285 166.0 15.1721 15.1983 4.417620 4.422013 15.1466 \n", "\n", " potemperature2 PSAL PSAL2 flag LATITUDE LONGITUDE \n", "0 15.7713 35.7561 36.0788 0.0 12.674333 -38.001833 \n", "1 15.4889 36.0197 36.0024 0.0 12.674333 -38.001833 \n", "2 15.3924 35.9660 35.9829 0.0 12.674333 -38.001833 \n", "3 15.2977 35.9446 35.9618 0.0 12.674333 -38.001833 \n", "4 15.1728 35.9237 35.9397 0.0 12.674333 -38.001833 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = profile.as_DataFrame()\n", "df.head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }