{ "cells": [ { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import os\n", "from IPython.display import Video # new module" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lecture 8:\n", "\n", "- more about **matplotlib**: adding notes and saving images\n", "\n", "- about DataFrames and Series, two new _data structures_, that are part of the **Pandas** package \n", "\n", "- some basic filtering tricks with **Pandas**\n", "\n", "- how to read in and save data files with **Pandas**\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More tricks in matplotlib. \n", "\n", "A few lectures ago we read in the record of an earthquake and plotted it: " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "EQ=np.loadtxt('Datasets/seismicRecord/earthquake.txt') # read in data\n", "plt.plot(EQ,'r-') # plots as a red line\n", "plt.xlabel('Arbitrary Time') # puts a label on the X axis\n", "plt.ylabel('Velocity'); # puts a label on the Y axis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a few things that can be improved about this plot:\n", "\n", "1) What are the time units? \n", "\n", "2) Seismologists can recognize the arrival of different phases, including the \"P\" wave (for primary) and the \"S\" wave (for secondary or also shear). It would be nice to label the P and S wave arrivals. \n", "\n", "So let's start with point one, converting the arbitrary time units to minutes. \n", "\n", "The measurement rate of this seismometer was 20 measurements per second. Let's convert this to minutes. First we'll need to create a data structure (say, an array) of the original \"arbitrary time\" and then convert \"arbitrary time\" to minutes by dividing by 1200 (the number of seconds in 20 minutes):\n", "\n", "$20{\\hbox{measurements}\\over {\\hbox{seconds}}} \\times 60 {\\hbox{seconds}\\over {\\hbox{minute}}} =1200 {\\hbox{measurements}\\over {\\hbox{minute}}}$.\n", "\n", "To create a data structure of 'arbitrary time' with the same length as the original array, we can use the _built-in_ **len( )** function which gives us the length of a list or a 1D array, then use **np.arange( )** to give us an array from 0 to the length of the array. Because it is an array (and not a list), we can divide the whole array by 1200 to create a new array that is in minutes. \n", "We can then plot the minutes on the $x$-axis and the velocities on the $y$-axis:\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "velocity=np.loadtxt('Datasets/seismicRecord/earthquake.txt') # read in data\n", "# remember np.arange? it makes an array that is N long.\n", "# here, we want N to be the length of the velocity array, so len(velocity)\n", "time_units=np.arange(len(velocity)) # makes an array of arbitrary time units\n", "# now I want an array that is normalized to minutes: \n", "minutes=time_units/1200. # sampling rate=20/sec = 1/1200 minutes\n", "# the plt.plot method can plot X, versus Y with plt.plot(X,Y), so:\n", "plt.plot(minutes,velocity,'r-') # plots X=Minutes, Y= velocity as a red line\n", "# we can change the labels to reflect the new reality:\n", "plt.xlabel('Minutes') # puts a label on the X axis\n", "plt.ylabel('Velocity'); # puts a label on the Y axis\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is something you should notice in the above script. \n", "\n", "Did you see you we just took the **time_units** array and divided it by 1200? 1200 is a scalar, so each element is divided by 1200. Try that with a list! As we did a few lectures ago, we can make a list with **range( )** that is very similar to time_units but a list, not an **NumPy** array. But if we try to divide it by scalar, we get an error! 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for /: 'range' and 'float'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mlist_of_units\u001b[0m\u001b[0;34m=\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m360\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mlist_of_units\u001b[0m\u001b[0;34m/\u001b[0m\u001b[0;36m1200.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'range' and 'float'" ] } ], "source": [ "list_of_units= range(360)\n", "list_of_units/1200." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned before, this is one big advantage of arrays over lists. \n", "\n", "### Saving plots and putting in notes \n", "\n", "Saving turns out to be easy with the **plt.savefig( )** method. The argument is the desired filename (including any PATH if desired). \n", "\n", "I mentioned earlier that it would be nice to label the 'P' and 'S' wave arrivals. You may remember that the first wave that hits a station is the 'P' wave (for primary). The second wave is the 'S' wave (for secondary). P waves are 'compressional' waves, while the S waves are 'shear' waves. They are a bit slower, which is why they arrive second. So, let's label the arrivals in our plot.\n", "\n", "To put notes on a figure, we use **plt.text( )** which has many options. 
Check out this web site for hints on making beautiful notes: \n", "\n", "http://matplotlib.org/users/text_props.html\n", "\n", "The keyword arguments we want are _rotation_, which rotates the note by some angle in degrees (say, 90), and the vertical (**verticalalignment** or **va**) and horizontal (**horizontalalignment** or **ha**) alignments, which position the note relative to the x and y values given. " ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# same plot as before: \n", "plt.plot(minutes,velocity,'r-') # plots as a red line\n", "plt.xlabel('Minutes') # puts a label on the X axis\n", "plt.ylabel('Velocity') # puts a label on the Y axis\n", "# here I add a few more decorations with the plt.text method\n", "plt.text(1.08,100000, \"P wave\",rotation=90,va='bottom',ha='center') # put on the P wave label\n", "plt.text(11.76,-70000, \"S wave\",rotation=90,va='top',ha='center') # put on the S wave label\n", "plt.ylim([-150000,250000]) # increase vertical axis bounds to include S wave label\n", "plt.savefig('seismogram.png') # let's just save this as a png file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's do some seismology!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The farther away an earthquake is from a receiver, the more time there is between the arrivals of the P and S waves. This makes sense if you think about racing a little kid (who is on a tricycle) around a track. The distance between you will just keep increasing as you run because the kid on the trike is slower (like the S wave). [Well, until you lap the little tyke.] \n", "\n", "You can use the difference between the arrival times of the two waves to calculate the distance to the earthquake source, if we know the velocities of the waves through the Earth. So first we need to know how these two waves behave. \n", "\n", "There are plenty of data on earthquakes and the arrival times of different waves. 
Here is a short video demonstration that I found here: https://www.iris.edu/hq/inclass/animation/traveltime_curves_how_they_are_created\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Video('Figures/A_6_seismictraveltimeirisbounc.mp4',width=500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Now we can look at some data from the model for the Earth. I found the data we need on this website: https://earthquake.usgs.gov/learn/topics/ttgraph.php and saved it in the datafile Datasets/TravelTimeDelta/DeltaTime.txt \n", "\n", "Let's take a look at the contents of the file. We can look at the first 10 lines combining the tricks for printing out lines in a file we learned in Lecture 1, with the list slicing we learned a few lectures ago. To look at the top 10 lines, we can do this:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['https://earthquake.usgs.gov/learn/topics/ttgraph.php\\n',\n", " 'Delta Time of P S-P Time\\n',\n", " ' Deg M S M S\\n',\n", " ' 0.0 0 5.4 0 4.0\\n',\n", " ' 0.5 0 10.6 0 7.8\\n',\n", " ' 1.0 0 17.7 0 13.5\\n',\n", " ' 1.5 0 24.6 0 19.0\\n',\n", " ' 2.0 0 31.4 0 24.4\\n',\n", " ' 2.5 0 38.3 0 29.9\\n',\n", " ' 3.0 0 45.2 0 35.4\\n']" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "open('Datasets/TravelTimeDelta/DeltaTime.txt').readlines()[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, the bottom ten lines can be seen this way: " ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[' 96.0 13 23.6 11 16.1\\n',\n", " ' 97.0 13 28.1 11 20.1\\n',\n", " ' 98.0 13 32.6 11 24.1\\n',\n", " ' 99.0 13 37.0 11 28.1\\n',\n", " '100.0 13 41.5 11 32.0\\n',\n", " '101.0 
13 45.9 11 35.8\\n',\n", " '102.0 13 50.4 11 39.7\\n',\n", " '103.0 13 54.8 11 43.6\\n',\n", " '104.0 13 59.2 11 47.5\\n',\n", " '105.0 14 3.7 11 51.4\\n']" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "open('Datasets/TravelTimeDelta/DeltaTime.txt').readlines()[-10:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first three lines in our dataset contain WORDS, so **np.loadtxt** cannot read the file as it is. One strategy to remove the text would be to edit the text file, but instead let's up our game and use the wonderful **Pandas** package, a useful recent addition to Python. \n", "\n", "\n", "\n", "### The Joy of Pandas\n", "\n", "**Pandas** is a relatively new package for Python. It allows us to read in more complicated data file formats than **NumPy**, and wrangle the data in powerful ways. It also provides many useful data analysis tools.\n", "\n", "There are two basic data structures in **Pandas**: the **DataFrame**, which is essentially a spreadsheet with multiple columns, and the **Series**, which is a single column of data. A **Series** is like a **list** or **array** on steroids. \n", "\n", "The DeltaTime file includes the website where I got the original data at the top of the file, a description in line 2 of what the data are about in general, and then some column headers in line 3. This kind of file does not play nicely with **np.loadtxt( )**, but we can use the **Pandas** function **read_csv( )** to read in the datafile. This function not only reads in 'comma separated values' files (.csv), but also other data formats once we tell it how the file is delimited. \n", "\n", "Of course we must first import **Pandas** into the notebook:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a few things we need to let **Pandas** know. 
 \n", "\n", "- We need to skip the first two rows. We use the keyword argument **skiprows=2** to do that. \n", "- **pd.read_csv( )** reads 'comma separated values' by default, but this file is _whitespace_ delimited. _Whitespace_ is either spaces or tabs. The keyword argument **delim_whitespace=True** will split on whitespace. (Recent versions of **Pandas** deprecate this argument in favor of **sep='\\s+'**, which does the same thing.)\n", "- The _header_ is in the third row of the file; this is the row with the column names in it. Python starts counting from zero and we skipped two rows, so **header=0** reads the first row after the skipped rows as the header. \n", "- Fun fact: with **skiprows=2** we don't need the **header** argument, because **header=0** is the default, but you need to know what it is. For example, you could use **header=2** instead of **skiprows=2**. Why 2? Because row numbering starts at 0, so the third line of the file is row 2, and **Pandas** ignores everything above the header row. \n", "\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
     Deg  M     S  M.1   S.1
0    0.0  0   5.4    0   4.0
1    0.5  0  10.6    0   7.8
2    1.0  0  17.7    0  13.5
3    1.5  0  24.6    0  19.0
4    2.0  0  31.4    0  24.4
\n", "
" ], "text/plain": [ " Deg M S M.1 S.1\n", "0 0.0 0 5.4 0 4.0\n", "1 0.5 0 10.6 0 7.8\n", "2 1.0 0 17.7 0 13.5\n", "3 1.5 0 24.6 0 19.0\n", "4 2.0 0 31.4 0 24.4" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeltaTimeData=pd.read_csv('Datasets/TravelTimeDelta/DeltaTime.txt',\\\n", " delim_whitespace=True,skiprows=2,header=0)\n", "# we specify the path of the file (relative to our current directory), \n", "# then all the other arguments.\n", "DeltaTimeData.head() # this is \"panda-ish\" for looking at the \"head\" of the object we read in. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**DeltaTimeData** is now a Pandas **DataFrame**. \n", "\n", "So what is a **DataFrame**? It is a new data container that is more sophisticated than any we have learned about so far (**lists, tuples, sets, dictionaries, arrays**). \n", "It has named columns (like an Excel spreadsheet) and identifies the rows by _indices_ starting with 0. \n", "\n", "The file we read in included column headers and **Pandas** knows which line they were in (after the header or skiprows arguments). \n", "\n", "If we want to be sure, we can use the **DataFrame.columns** attribute on the DeltaTimeData DataFrame:\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Deg', 'M', 'S', 'M.1', 'S.1'], dtype='object')" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeltaTimeData.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that a **DataFrame** is of type _object_, similar to one of the **NumPy** array types that mixed data types we briefly encountered before. Let's explore these objects with Pandas DataFrames. 
 \n", "\n", "We see that the columns of **DeltaTimeData** are: \n", "- \"Deg\": the degrees away from the source (the angle from the center of the Earth)\n", "- \"M\": the time of the P wave arrival in minutes\n", "- \"S\": the P wave arrival in seconds (added to the minutes)\n", "- \"M.1\": the difference between the P and S wave arrival times in minutes\n", "- \"S.1\": the seconds for the time difference between the P and S waves (added to the minutes). \n", "\n", "The file uses the names M and S twice, so **Pandas** appended '.1' to the second pair to keep the column names distinct. \n", "\n", "Each one of these columns is a **Pandas Series**. So to review: **DataFrames** are like Excel spreadsheets and a **Series** is one column of the spreadsheet. \n", "\n", "If we like (and I do), we can change the column names by setting **DataFrame.columns** to a list with the new (more meaningful) column names: " ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Degrees  P_wave_minutes  P_wave_seconds  S-P_minutes  S-P_seconds
0      0.0               0             5.4            0          4.0
1      0.5               0            10.6            0          7.8
2      1.0               0            17.7            0         13.5
3      1.5               0            24.6            0         19.0
4      2.0               0            31.4            0         24.4
\n", "
" ], "text/plain": [ " Degrees P_wave_minutes P_wave_seconds S-P_minutes S-P_seconds\n", "0 0.0 0 5.4 0 4.0\n", "1 0.5 0 10.6 0 7.8\n", "2 1.0 0 17.7 0 13.5\n", "3 1.5 0 24.6 0 19.0\n", "4 2.0 0 31.4 0 24.4" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeltaTimeData.columns=['Degrees','P_wave_minutes',\\\n", " 'P_wave_seconds','S-P_minutes','S-P_seconds']\n", "DeltaTimeData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To save a DataFrame to a file, we use the **to_csv** method: " ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "DeltaTimeData.to_csv('PSArrival.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without the argument **index=False**, there is an annoying extra column with all the DataFrame's index numbers, with **index** set to False, these do not appear. You can check it out with excel or something. \n", "\n", "Also, there are many other file formats besides 'comma separated variable' (.csv) which can be saved using the **sep** argument. **sep** stands for \"separator\". For example, sep='\\t' makes it a tab delimited (separated) file: " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "DeltaTimeData.to_csv('PSArrival.txt',sep='\\t', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Back to the science\n", "\n", "What we really want for our \"science\" problem is the arrival time in decimal minutes, not minutes and seconds as in this data file. We can do this by defining a new column (\"P\\_decimal\\_minutes\"), converting the seconds data to decimal minutes (by dividing by 60) and adding that to the minutes: " ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Degrees  P_wave_minutes  P_wave_seconds  S-P_minutes  S-P_seconds  P_decimal_minutes
0      0.0               0             5.4            0          4.0           0.090000
1      0.5               0            10.6            0          7.8           0.176667
2      1.0               0            17.7            0         13.5           0.295000
3      1.5               0            24.6            0         19.0           0.410000
4      2.0               0            31.4            0         24.4           0.523333
\n", "
" ], "text/plain": [ " Degrees P_wave_minutes P_wave_seconds S-P_minutes S-P_seconds \\\n", "0 0.0 0 5.4 0 4.0 \n", "1 0.5 0 10.6 0 7.8 \n", "2 1.0 0 17.7 0 13.5 \n", "3 1.5 0 24.6 0 19.0 \n", "4 2.0 0 31.4 0 24.4 \n", "\n", " P_decimal_minutes \n", "0 0.090000 \n", "1 0.176667 \n", "2 0.295000 \n", "3 0.410000 \n", "4 0.523333 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeltaTimeData['P_decimal_minutes']=DeltaTimeData['P_wave_minutes']\\\n", " +DeltaTimeData['P_wave_seconds']/60.\n", "DeltaTimeData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how we have a new column which is the decimal minutes after the Earthquake that the P wave arrived at that angular distance (Deg). \n", "\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Degrees', 'P_wave_minutes', 'P_wave_seconds', 'S-P_minutes',\n", " 'S-P_seconds', 'P_decimal_minutes'],\n", " dtype='object')" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeltaTimeData.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would also like the time of S wave arrival, rather than the time between the S and P wave arrivals" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Degrees  P_wave_minutes  P_wave_seconds  S-P_minutes  S-P_seconds  P_decimal_minutes  SP_decimal_minutes  S_decimal_minutes
0      0.0               0             5.4            0          4.0           0.090000            0.066667           0.156667
1      0.5               0            10.6            0          7.8           0.176667            0.130000           0.306667
2      1.0               0            17.7            0         13.5           0.295000            0.225000           0.520000
3      1.5               0            24.6            0         19.0           0.410000            0.316667           0.726667
4      2.0               0            31.4            0         24.4           0.523333            0.406667           0.930000
\n", "
" ], "text/plain": [ " Degrees P_wave_minutes P_wave_seconds S-P_minutes S-P_seconds \\\n", "0 0.0 0 5.4 0 4.0 \n", "1 0.5 0 10.6 0 7.8 \n", "2 1.0 0 17.7 0 13.5 \n", "3 1.5 0 24.6 0 19.0 \n", "4 2.0 0 31.4 0 24.4 \n", "\n", " P_decimal_minutes SP_decimal_minutes S_decimal_minutes \n", "0 0.090000 0.066667 0.156667 \n", "1 0.176667 0.130000 0.306667 \n", "2 0.295000 0.225000 0.520000 \n", "3 0.410000 0.316667 0.726667 \n", "4 0.523333 0.406667 0.930000 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeltaTimeData['SP_decimal_minutes']=DeltaTimeData['S-P_minutes']+\\\n", " DeltaTimeData['S-P_seconds']/60. # convert delay time to decimal minutes\n", "DeltaTimeData['S_decimal_minutes']=DeltaTimeData['P_decimal_minutes']+\\\n", " DeltaTimeData['SP_decimal_minutes'] # calculate S wave arrival time in minutes\n", "DeltaTimeData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK. Now we are ready to make a plot of our travel time and angular distances (just like in the movie!). Combining **Pandas DataFrames** with **matplotlib** turns out to be pretty simply, where we just use the name of the **Series** we want to plot as an argument in **plt.plot( )**. " ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(DeltaTimeData['Degrees'],DeltaTimeData['P_decimal_minutes'],'r-',label='P waves',linewidth=2)\n", " # plots the P wave arrival as red lines\n", "# notice the linewidth=2 makes the line heavier. the default is linewidth=1\n", "plt.plot(DeltaTimeData['Degrees'],DeltaTimeData['S_decimal_minutes'],'b-',label='S waves') \n", " # plots the S wave arrival as blue lines\n", "plt.xlabel('Arc Distance (degrees)') # labels the x axis\n", "plt.ylabel('Time (minutes)'); # labels the y axis\n", "plt.legend(loc=2); # location 2 is in the upper left hand corner" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we could plot the data as squares and triangles" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(DeltaTimeData['Degrees'],DeltaTimeData['P_decimal_minutes'],'r^',label='P waves')\n", " # plots the P wave arrival as red triangles (^)\n", "plt.plot(DeltaTimeData['Degrees'],DeltaTimeData['S_decimal_minutes'],'bs',label='S waves') \n", " # plots the S wave arrival as blue squares (s)\n", "plt.xlabel('Arc Distance (degrees)') # labels the x axis\n", "plt.ylabel('Time (minutes)'); # labels the y axis\n", "plt.legend(loc=2); # location 2 is in the upper left hand corner" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have two different symbols, red triangles and blue squares. We used the **label** argument to label the symbol types and then use the **plt.legend( )** method to place the legend onto the figure. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next lecture, we'll use this on real P and S wave arrival data. We will calulate the time delay between the P and S wave arrival, find that time delay in our DeltaTimeData DataFrame, find the corresponding angular distance, and then calculate the actual great circle between the two points to help find the location of the source. So stay tuned. :) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we go, let's leave a nice data file to work on. We can make a **DataFrame** of our earthquake data called **EQ**." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   0  1  2  3  4  5  6  7  8  9  ...  25190  25191  25192  25193  25194  25195  25196  25197  25198  25199
0  0.0  0.000833  0.001667  0.0025  0.003333  0.004167  0.005  0.005833  0.006667  0.0075  ...  20.991667  20.9925  20.993333  20.994167  20.995  20.995833  20.996667  20.9975  20.998333  20.999167
1  1807.0  1749.000000  1694.000000  1618.0000  1516.000000  1394.000000  1282.000  1198.000000  1077.000000  957.0000  ...  -9275.000000  -10063.0000  -10806.000000  -11515.000000  -12214.000  -12915.000000  -13599.000000  -14264.0000  -14888.000000  -15489.000000
\n", "

2 rows × 25200 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 \\\n", "0 0.0 0.000833 0.001667 0.0025 0.003333 0.004167 \n", "1 1807.0 1749.000000 1694.000000 1618.0000 1516.000000 1394.000000 \n", "\n", " 6 7 8 9 ... 25190 \\\n", "0 0.005 0.005833 0.006667 0.0075 ... 20.991667 \n", "1 1282.000 1198.000000 1077.000000 957.0000 ... -9275.000000 \n", "\n", " 25191 25192 25193 25194 25195 \\\n", "0 20.9925 20.993333 20.994167 20.995 20.995833 \n", "1 -10063.0000 -10806.000000 -11515.000000 -12214.000 -12915.000000 \n", "\n", " 25196 25197 25198 25199 \n", "0 20.996667 20.9975 20.998333 20.999167 \n", "1 -13599.000000 -14264.0000 -14888.000000 -15489.000000 \n", "\n", "[2 rows x 25200 columns]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "EQ=pd.DataFrame([minutes,velocity])\n", "EQ.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, that doesn't look like what we really need. First, there are only two rows (minutes and velocity) and we want\n", "two columns, not two rows. To do that, we can transpose the **DataFrame**, just like a **NumPy** array: " ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          0       1
0  0.000000  1807.0
1  0.000833  1749.0
2  0.001667  1694.0
3  0.002500  1618.0
4  0.003333  1516.0
\n", "
" ], "text/plain": [ " 0 1\n", "0 0.000000 1807.0\n", "1 0.000833 1749.0\n", "2 0.001667 1694.0\n", "3 0.002500 1618.0\n", "4 0.003333 1516.0" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "EQ=pd.DataFrame([minutes,velocity]).transpose()\n", "EQ.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is better, but we would really like column headers with names, not numbers so we change the column headers with the **column** attribute (as before). " ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Minutes  Velocity
0  0.000000    1807.0
1  0.000833    1749.0
2  0.001667    1694.0
3  0.002500    1618.0
4  0.003333    1516.0
\n", "
" ], "text/plain": [ " Minutes Velocity\n", "0 0.000000 1807.0\n", "1 0.000833 1749.0\n", "2 0.001667 1694.0\n", "3 0.002500 1618.0\n", "4 0.003333 1516.0" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "EQ.columns=['Minutes','Velocity']\n", "EQ.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we now we can save our DataFrames as a file. " ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "EQ.to_csv('minutes_velocity.csv',index=None)\n", "DeltaTimeData.to_csv('DeltaTimeData.csv',index=None)\n", "# by setting index to None, we don't have the indices as a column in the datafile. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can safely leave this project until later. \n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# cleanup\n", "os.remove('seismogram.png')\n", "os.remove('PSArrival.csv') \n", "os.remove('PSArrival.txt') \n", "# this command will delete anything that starts with PSArrival.\n", "os.remove('minutes_velocity.csv') # I already put this into the next Lecture so don't worry. \n", "os.remove('DeltaTimeData.csv')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 1 }