{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2016 - IPGP - ObsPy Tutorial\n", "\n", "**Introduction to File Formats and read/write in ObsPy**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "File formats are often not taught, but they are fairly important to understand, at least at a basic level. This notebook also shows how to work with them in ObsPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#%matplotlib inline\n", "from __future__ import print_function\n", "import matplotlib.pyplot as plt\n", "plt.switch_backend(\"nbagg\")\n", "plt.style.use('ggplot')\n", "plt.rcParams['figure.figsize'] = 12, 8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SEED Identifiers\n", "\n", "According to the [SEED standard](http://www.fdsn.org/seed_manual/SEEDManual_V2.4.pdf), which is fairly widely adopted, the following nomenclature is used to identify seismic receivers:\n", "\n", "* **Network code**: Identifies the network/owner of the data. Assigned by the FDSN and thus unique.\n", "* **Station code**: The station within a network. *NOT UNIQUE IN PRACTICE!* Always use together with a network code!\n", "* **Location ID**: Identifies different data streams within one station. Commonly used to logically separate multiple instruments at a single station.\n", "* **Channel codes**: Three-character code: 1) band and approximate sampling rate, 2) type of instrument, 3) orientation.\n", "\n", "This results in full ids of the form **NET.STA.LOC.CHAN**, e.g. **IV.PRMA..HHE** (here with an empty location ID).\n", "\n", "\n", "---\n", "\n", "\n", "In seismology we generally distinguish between three separate types of data:\n", "\n", "1. **Waveform Data** - The actual waveforms as time series.\n", "2. **Station Data** - Information about the stations' operators, geographical locations, and instrument responses.\n", "3. **Event Data** - Information about earthquakes.\n", "\n", "Some formats have elements of two or more of these." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Waveform Data\n", "\n", "![stream](images/Stream_Trace.svg)\n", "\n", "There is a myriad of waveform data formats, but in Europe and the USA two formats dominate: **MiniSEED** and **SAC**.\n", "\n", "\n", "### MiniSEED\n", "\n", "* This is what you get from datacenters and also what they store, thus the original data\n", "* Can store integers and single/double precision floats\n", "* Integer data (e.g. counts from a digitizer) are heavily compressed: a factor of 3-5 depending on the data\n", "* Can deal with gaps and overlaps\n", "* Multiple components per file\n", "* Contains only the really necessary parameters and some information for the data providers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import obspy\n", "\n", "# ObsPy automatically detects the file format.\n", "st = obspy.read(\"data/example.mseed\")\n", "print(st)\n", "\n", "# File-format-specific information is stored here.\n", "print(st[0].stats.mseed)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "st.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# This is a quick interlude to teach you the basics of how to work\n", "# with Stream/Trace objects.\n", "\n", "# Most operations work in-place, i.e. they modify the existing\n", "# objects. We'll create a copy here.\n", "st2 = st.copy()\n", "\n", "# To use only part of a Stream, use the select() function.\n", "print(st2.select(component=\"Z\"))\n", "\n", "# Stream objects behave like a list of Trace objects.\n", "tr = st2[0]\n", "\n", "tr.plot()\n", "\n", "# Some basic processing. 
Please note that these modify the\n", "# existing object.\n", "tr.detrend(\"linear\")\n", "tr.taper(type=\"hann\", max_percentage=0.05)\n", "tr.filter(\"lowpass\", freq=0.5)\n", "\n", "tr.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# You can write it again by simply specifying the format.\n", "st.write(\"temp.mseed\", format=\"mseed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SAC\n", "\n", "* Custom format of the `sac` program.\n", "* Simple header and single precision floating point data.\n", "* Only a single component per file and no concept of gaps/overlaps.\n", "* Used a lot due to `sac` being very popular and the additional basic information that can be stored in the header." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "st = obspy.read(\"data/example.sac\")\n", "print(st)\n", "st[0].stats.sac.__dict__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "st.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# You can once again write it with the write() method.\n", "st.write(\"temp.sac\", format=\"sac\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Station Data\n", "\n", "![inv](images/Inventory.svg)\n", "\n", "Station data contains information about the organization that collects the data, geographical information, as well as the instrument response. It mainly comes in three formats:\n", "\n", "* `(dataless) SEED`: Very complete but pretty complex and binary. Still used a lot, e.g. for the ArcLink protocol\n", "* `RESP`: A strict subset of SEED. ASCII based. Contains **ONLY** the response.\n", "* `StationXML`: Essentially like SEED but cleaner and based on XML. Most modern format and what the datacenters nowadays serve. 
**Use this if you can.**\n", "\n", "\n", "ObsPy can work with all of them, but today we will focus on StationXML." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "StationXML files are plain XML files:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!head data/all_stations.xml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import obspy\n", "\n", "# Use the read_inventory function to open them.\n", "inv = obspy.read_inventory(\"data/all_stations.xml\")\n", "print(inv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that they can contain an arbitrary number of networks, stations, and channels." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# ObsPy is also able to plot a map of them.\n", "inv.plot(projection=\"local\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# It can also plot the instrument response.\n", "inv.select(network=\"IV\", station=\"SALO\", channel=\"BH?\").plot_response(0.001);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Coordinates of single channels can also be extracted. 
This function\n", "# also takes a datetime argument to extract information at different\n", "# points in time.\n", "inv.get_coordinates(\"IV.SALO..BHZ\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# And it can naturally be written again, also in modified state.\n", "inv.select(channel=\"BHZ\").write(\"temp.xml\", format=\"stationxml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Event Data\n", "\n", "![events](./images/Event.svg)\n", "\n", "Event data is often served in very simple formats like NDK or the CMTSOLUTION format used by many waveform solvers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!cat data/GCMT_2014_04_01__Mw_8_1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Datacenters, on the other hand, offer **QuakeML** files, which are surprisingly complex in structure but can store complex relations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Read QuakeML files with the read_events() function.\n", "cat = obspy.read_events(\"data/GCMT_2014_04_01__Mw_8_1.xml\")\n", "print(cat)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(cat[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cat.plot(projection=\"ortho\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Once again they can be written with the write() method.\n", "cat.write(\"temp_quake.xml\", format=\"quakeml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To show off some more things, I added a file containing all events from 2014 in the GCMT catalog." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import obspy\n", "\n", "cat = obspy.read_events(\"data/2014.ndk\")\n", "\n", "print(cat)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cat.plot();" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Keep only events deeper than 100 km (depth is given in meters)\n", "# with a magnitude above 7. filter() returns a new Catalog.\n", "cat.filter(\"depth > 100000\", \"magnitude > 7\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgements\n", "\n", "The background picture at the very top is from Matthias Meschede." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }