"# Introduction to File Formats and read/write in ObsPy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"File formats are often not taught, but understanding them, at least at a basic level, is fairly important. This notebook also shows you how to work with them in ObsPy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#%matplotlib inline\n",
"from __future__ import print_function\n",
"import matplotlib.pyplot as plt\n",
"plt.switch_backend(\"nbagg\")\n",
"plt.style.use('ggplot')\n",
"plt.rcParams['figure.figsize'] = 12, 8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SEED Identifiers\n",
"\n",
"According to the [SEED standard](http://www.fdsn.org/seed_manual/SEEDManual_V2.4.pdf), which is widely adopted, the following nomenclature is used to identify seismic receivers:\n",
"\n",
"* **Network code**: Identifies the network/owner of the data. Assigned by the FDSN and thus unique.\n",
"* **Station code**: The station within a network. *NOT UNIQUE IN PRACTICE!* Always use together with a network code!\n",
"* **Location ID**: Identifies different data streams within one station. Commonly used to logically separate multiple instruments at a single station.\n",
"* **Channel code**: Three-character code: 1) band and approximate sampling rate, 2) type of instrument, 3) orientation.\n",
"\n",
"This results in full IDs of the form **NET.STA.LOC.CHAN**, e.g. **IV.PRMA..HHE**.\n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"In seismology we generally distinguish between three separate types of data:\n",
"\n",
"1. **Waveform Data** - The actual waveforms as time series.\n",
"2. **Station Data** - Information about the stations' operators, geographical locations, and instrument responses.\n",
"3. **Event Data** - Information about earthquakes.\n",
"\n",
"Some formats have elements of two or more of these."
]
},
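{
"cell_type": "markdown",
"metadata": {},
"source": [
"Returning to the SEED identifiers introduced above: the following minimal sketch splits the example ID **IV.PRMA..HHE** into its four parts and then splits the channel code into its three characters. It uses only plain Python string handling; the variable names are chosen here for illustration and are not part of any ObsPy API. Note that the location ID is an empty string in this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch only: take a full SEED ID apart with plain Python.\n",
"seed_id = \"IV.PRMA..HHE\"\n",
"\n",
"# The four parts are separated by dots; an empty location ID is allowed.\n",
"network, station, location, channel = seed_id.split(\".\")\n",
"\n",
"# The three-character channel code: band, instrument type, orientation.\n",
"band, instrument, orientation = channel\n",
"\n",
"print(\"Network: %s, station: %s, location: %r, channel: %s\"\n",
"      % (network, station, location, channel))\n",
"print(\"Band: %s, instrument: %s, orientation: %s\"\n",
"      % (band, instrument, orientation))"
]
},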
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Waveform Data\n",
"\n",
"![stream](images/Stream_Trace.svg)\n",
"\n",
"There are myriad waveform data formats, but in Europe and the USA two formats dominate: **MiniSEED** and **SAC**.\n",
"\n",
"\n",
"### MiniSEED\n",
"\n",
"* This is what you get from datacenters and also what they store, thus the original data\n",
"* Can store integers and single/double precision floats\n",
"* Integer data (e.g. counts from a digitizer) are heavily compressed: a factor of 3-5 depending on the data\n",
"* Can deal with gaps and overlaps\n",
"* Multiple components per file\n",
"* Contains only the strictly necessary parameters and some information for the data providers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import obspy\n",
"\n",
"# ObsPy automatically detects the file format.\n",
"st = obspy.read(\"data/example.mseed\")\n",
"print(st)\n",
"\n",
"# File-format-specific information is stored here.\n",
"print(st[0].stats.mseed)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"st.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This is a quick interlude to teach you basics about how to work\n",
"# with Stream/Trace objects.\n",
"\n",
"# Most operations work in-place, i.e. they modify the existing\n",
"# objects. We'll create a copy here.\n",
"st2 = st.copy()\n",
"\n",
"# To use only part of a Stream, use the select() function.\n",
"print(st2.select(component=\"Z\"))\n",
"\n",
"# Stream objects behave like a list of Trace objects.\n",
"tr = st2[0]\n",
"\n",
"tr.plot()\n",
"\n",
"# Some basic processing. Please note that these modify the\n",
"# existing object.\n",
"tr.detrend(\"linear\")\n",
"tr.taper(type=\"hann\", max_percentage=0.05)\n",
"tr.filter(\"lowpass\", freq=0.5)\n",
"\n",
"tr.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# You can write it again by simply specifying the format.\n",
"st.write(\"temp.mseed\", format=\"mseed\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### SAC\n",
"\n",
"* Custom format of the `sac` code.\n",
"* Simple header and single precision floating point data.\n",
"* Only a single component per file and no concept of gaps/overlaps.\n",
"* Used a lot due to `sac` being very popular and the additional basic information that can be stored in the header."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"st = obspy.read(\"data/example.sac\")\n",
"print(st)\n",
"st[0].stats.sac.__dict__"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"st.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# You can once again write it with the write() method.\n",
"st.write(\"temp.sac\", format=\"sac\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Station Data\n",
"\n",
"![inv](images/Inventory.svg)\n",
"\n",
"Station data contains information about the organization that collects the data, geographical information, as well as the instrument response. It mainly comes in three formats:\n",
"\n",
"* `(dataless) SEED`: Very complete but pretty complex and binary. Still used a lot, e.g. for the ArcLink protocol\n",
"* `RESP`: A strict subset of SEED. ASCII based. Contains **ONLY** the response.\n",
"* `StationXML`: Essentially like SEED but cleaner and based on XML. Most modern format and what the datacenters nowadays serve. **Use this if you can.**\n",
"\n",
"\n",
"ObsPy can work with all of them but today we will focus on StationXML."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"StationXML files are plain XML files:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!head data/all_stations.xml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import obspy\n",
"\n",
"# Use the read_inventory function to open them.\n",
"inv = obspy.read_inventory(\"data/all_stations.xml\")\n",
"print(inv)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see that they can contain an arbitrary number of networks, stations, and channels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# ObsPy is also able to plot a map of them.\n",
"inv.plot(projection=\"local\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# It can also plot the instrument response.\n",
"inv.select(network=\"IV\", station=\"SALO\", channel=\"BH?\").plot_response(0.001);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Coordinates of single channels can also be extracted. This function\n",
"# also takes a datetime argument to extract information at different\n",
"# points in time.\n",
"inv.get_coordinates(\"IV.SALO..BHZ\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# And it can naturally be written again, also in modified state.\n",
"inv.select(channel=\"BHZ\").write(\"temp.xml\", format=\"stationxml\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Event Data\n",
"\n",
"![events](./images/Event.svg)\n",
"\n",
"Event data is often served in very simple formats such as NDK or the CMTSOLUTION format used by many waveform solvers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cat data/GCMT_2014_04_01__Mw_8_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Datacenters, on the other hand, offer **QuakeML** files, which are surprisingly complex in structure but can store complex relations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Read QuakeML files with the read_events() function.\n",
"cat = obspy.read_events(\"data/GCMT_2014_04_01__Mw_8_1.xml\")\n",
"print(cat)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(cat[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"cat.plot(projection=\"ortho\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Once again they can be written with the write() method.\n",
"cat.write(\"temp_quake.xml\", format=\"quakeml\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To show off some more things, I added a file containing all events from 2014 in the GCMT catalog."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import obspy\n",
"\n",
"cat = obspy.read_events(\"data/2014.ndk\")\n",
"\n",
"print(cat)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"cat.plot();"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
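"# Keep only events deeper than 100 km (depth is given in meters)\n",
"# with a magnitude above 7.\n",
"cat.filter(\"depth > 100000\", \"magnitude > 7\")"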
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgements\n",
"\n",
"The background picture at the very top is by Matthias Meschede."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}