{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"background-image:url(images/meschede-seismic-waves.png); padding: 10px 30px 20px 30px; background-size:cover; background-opacity:50%; border-radius:5px; background-position: 0px -200px\">\n",
    "<p style=\"float:right; margin-top:20px; padding: 20px 60px 0px 10px; background:rgba(255,255,255,0.75); border-radius:10px;\">\n",
    "<img width=\"400px\" src=images/obspy_logo_full_524x179px.png?raw=true>\n",
    "</p>\n",
    "\n",
    "<h1 style=\"color:#BBB; padding-bottom: 80px\">2016 - IPGP - ObsPy Tutorial</h1>\n",
    "\n",
    "<h2 style=\"color:#FFF; padding-bottom: 30px\">Introduction to File Formats and read/write in ObsPy</h2>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is oftentimes not taught, but fairly important to understand, at least at a basic level. This also teaches you how to work with these in ObsPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#%matplotlib inline\n",
    "from __future__ import print_function\n",
    "import matplotlib.pylab as plt\n",
    "plt.switch_backend(\"nbagg\")\n",
    "plt.style.use('ggplot')\n",
    "plt.rcParams['figure.figsize'] = 12, 8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SEED Identifiers\n",
    "\n",
    "According to the  [SEED standard](www.fdsn.org/seed_manual/SEEDManual_V2.4.pdf), which is fairly well adopted, the following nomenclature is used to identify seismic receivers:\n",
    "\n",
    "* **Network code**: Identifies the network/owner of the data. Assigned by the FDSN and thus unique.\n",
    "* **Station code**: The station within a network. *NOT UNIQUE IN PRACTICE!* Always use together with a network code!\n",
    "* **Location ID**: Identifies different data streams within one station. Commonly used to logically separate multiple instruments at a single station.\n",
    "* **Channel codes**: Three character code: 1) Band and approximate sampling rate, 2) The type of instrument, 3) The orientation\n",
    "\n",
    "This results in full ids of the form **NET.STA.LOC.CHAN**, e.g. **IV.PRMA..HHE**.\n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "\n",
    "In seismology we generally distinguish between three separate types of data:\n",
    "\n",
    "1. **Waveform Data** - The actual waveforms as time series.\n",
    "2. **Station Data** - Information about the stations' operators, geographical locations, and the instrument's responses.\n",
    "3. **Event Data** - Information about earthquakes.\n",
    "\n",
    "Some formats have elements of two or more of these."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Waveform Data\n",
    "\n",
    "![stream](images/Stream_Trace.svg)\n",
    "\n",
    "There are a myriad of waveform data formats but in Europe and the USA two formats dominate: **MiniSEED** and **SAC**\n",
    "\n",
    "\n",
    "### MiniSEED\n",
    "\n",
    "* This is what you get from datacenters and also what they store, thus the original data\n",
    "* Can store integers and single/double precision floats\n",
    "* Integer data (e.g. counts from a digitizer) are heavily compressed: a factor of 3-5 depending on the data\n",
    "* Can deal with gaps and overlaps\n",
    "* Multiple components per file\n",
    "* Contains only the really necessary parameters and some information for the data providers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import obspy\n",
    "\n",
    "# ObsPy automatically detects the file format.\n",
    "st = obspy.read(\"data/example.mseed\")\n",
    "print(st)\n",
    "\n",
    "# Fileformat specific information is stored here.\n",
    "print(st[0].stats.mseed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "st.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# This is a quick interlude to teach you basics about how to work\n",
    "# with Stream/Trace objects.\n",
    "\n",
    "# Most operations work in-place, e.g. they modify the existing\n",
    "# objects. We'll create a copy here.\n",
    "st2 = st.copy()\n",
    "\n",
    "# To use only part of a Stream, use the select() function.\n",
    "print(st2.select(component=\"Z\"))\n",
    "\n",
    "# Stream objects behave like a list of Trace objects.\n",
    "tr = st2[0]\n",
    "\n",
    "tr.plot()\n",
    "\n",
    "# Some basic processing. Please note that these modify the\n",
    "# existing object.\n",
    "tr.detrend(\"linear\")\n",
    "tr.taper(type=\"hann\", max_percentage=0.05)\n",
    "tr.filter(\"lowpass\", freq=0.5)\n",
    "\n",
    "tr.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# You can write it again by simply specifing the format.\n",
    "st.write(\"temp.mseed\", format=\"mseed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SAC\n",
    "\n",
    "* Custom format of the `sac` code.\n",
    "* Simple header and single precision floating point data.\n",
    "* Only a single component per file and no concept of gaps/overlaps.\n",
    "* Used a lot due to `sac` being very popular and the additional basic information that can be stored in the header."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "st = obspy.read(\"data/example.sac\")\n",
    "print(st)\n",
    "st[0].stats.sac.__dict__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "st.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# You can once again write it with the write() method.\n",
    "st.write(\"temp.sac\", format=\"sac\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Station Data\n",
    "\n",
    "![inv](images/Inventory.svg)\n",
    "\n",
    "Station data contains information about the organziation that collections the data, geographical information, as well as the instrument response. It mainly comes in three formats:\n",
    "\n",
    "* `(dataless) SEED`: Very complete but pretty complex and binary. Still used a lot, e.g. for the Arclink protocol\n",
    "* `RESP`: A strict subset of SEED. ASCII based. Contains **ONLY** the response.\n",
    "* `StationXML`: Essentially like SEED but cleaner and based on XML. Most modern format and what the datacenters nowadays serve. **Use this if you can.**\n",
    "\n",
    "\n",
    "ObsPy can work with all of them but today we will focus on StationXML."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "They are XML files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "!head data/all_stations.xml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import obspy\n",
    "\n",
    "# Use the read_inventory function to open them.\n",
    "inv = obspy.read_inventory(\"data/all_stations.xml\")\n",
    "print(inv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that they can contain an arbirary number of networks, stations, and channels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# ObsPy is also able to plot a map of them.\n",
    "inv.plot(projection=\"local\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# As well as a plot the instrument response.\n",
    "inv.select(network=\"IV\", station=\"SALO\", channel=\"BH?\").plot_response(0.001);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Coordinates of single channels can also be extraced. This function\n",
    "# also takes a datetime arguments to extract information at different\n",
    "# points in time.\n",
    "inv.get_coordinates(\"IV.SALO..BHZ\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# And it can naturally be written again, also in modified state.\n",
    "inv.select(channel=\"BHZ\").write(\"temp.xml\", format=\"stationxml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Event Data\n",
    "\n",
    "![events](./images/Event.svg)\n",
    "\n",
    "Event data is essentially served in either very simple formats like NDK or the CMTSOLUTION format used by many waveform solvers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "!cat data/GCMT_2014_04_01__Mw_8_1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Datacenters on the hand offer **QuakeML** files, which are surprisingly complex in structure but can store complex relations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Read QuakeML files with the read_events() function.\n",
    "cat = obspy.read_events(\"data/GCMT_2014_04_01__Mw_8_1.xml\")\n",
    "print(cat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(cat[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cat.plot(projection=\"ortho\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Once again they can be written with the write() function.\n",
    "cat.write(\"temp_quake.xml\", format=\"quakeml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show off some more things, I added a file containing all events from 2014 in the GCMT catalog."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import obspy\n",
    "\n",
    "cat = obspy.read_events(\"data/2014.ndk\")\n",
    "\n",
    "print(cat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cat.plot();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cat.filter(\"depth > 100000\", \"magnitude > 7\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Acknowledgements\n",
    "\n",
    "Background picture at the very top is from Matthias Meschede."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}