{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "ipub": { "ignore": true } }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Analysing structured data with data frames \n", "\n", "(c) 2019 [Steve Phelps](mailto:sphelps@sphelps.net)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Data frames\n", "\n", "- The `pandas` module provides a powerful data-structure called a data frame.\n", "\n", "- It is similar, but not identical to:\n", " - a table in a relational database,\n", " - an Excel spreadsheet,\n", " - a dataframe in R.\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Types of data\n", "\n", "Data frames can be used to represent:\n", "\n", "- [Panel data](https://en.wikipedia.org/wiki/Panel_data)\n", "- [Time series](https://en.wikipedia.org/wiki/Time_series) data\n", "- [Relational data](https://en.wikipedia.org/wiki/Relational_model)\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Loading data\n", "\n", "- Data frames can be read and written to/from:\n", " - financial web sites\n", " - database queries\n", " - database tables\n", " - CSV files\n", " - json files\n", " \n", "- Beware that data frames are memory resident;\n", " - If you read a large amount of data your PC might crash\n", " - With big data, typically you would read a subset or summary of the data via e.g. a select statement." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Importing pandas\n", "\n", "- The pandas module is usually imported with the alias `pd`.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Series\n", "\n", "- A Series contains a one-dimensional array of data, *and* an associated sequence of labels called the *index*.\n", "\n", "- The index can contain numeric, string, or date/time values.\n", "\n", "- When the index is a time value, the series is a [time series](https://en.wikipedia.org/wiki/Time_series).\n", "\n", "- The index must be the same length as the data.\n", "\n", "- If no index is supplied it is automatically generated as `range(len(data))`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Creating a series from an array\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.03245675, 0.41263151, -0.27993028, -0.95398035, -0.01473876])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "data = np.random.randn(5)\n", "data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.032457\n", "b 0.412632\n", "c -0.279930\n", "d -0.953980\n", "e -0.014739\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])\n", "my_series" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Plotting a series\n", "\n", "- We can plot a series by invoking the `plot()` method on an instance of a `Series` object.\n", "\n", "- The x-axis 
will automatically be labelled with the series index." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "my_series.plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Creating a series with automatic index\n", "\n", "- In the following example the index is creating automatically:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.032457\n", "1 0.412632\n", "2 -0.279930\n", "3 -0.953980\n", "4 -0.014739\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Creating a Series from a `dict`\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.0\n", "b 1.0\n", "c 2.0\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {'a' : 0., 'b' : 1., 'c' : 2.}\n", "my_series = pd.Series(d)\n", "my_series" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Indexing a series with `[]`\n", "\n", "- Series can be accessed using the same syntax as arrays and dicts.\n", "\n", "- We use the labels in the index to access each element.\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_series['b']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can also use the label like an attribute:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "my_series.b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Slicing a series\n", "\n", "\n", "- We can specify a range of labels to obtain a slice:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b 1.0\n", "c 2.0\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_series[['b', 'c']]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Arithmetic and vectorised functions\n", "\n", "- `numpy` vectorization works for series objects too.\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 0.0\n", "b 1.0\n", "c 4.0\n", "dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {'a' : 0., 'b' : 1., 'c' : 2.}\n", "squared_values = pd.Series(d) ** 2\n", "squared_values" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 3.0\n", "b 5.0\n", "c 7.0\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = pd.Series({'a' : 0., 'b' : 1., 'c' : 2.})\n", "y = pd.Series({'a' : 3., 'b' : 4., 'c' : 5.})\n", "x + y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Time series" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',\n", " '2000-01-05'],\n", " dtype='datetime64[ns]', freq='D')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates = pd.date_range('1/1/2000', periods=5)\n", "dates" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "2000-01-01 0.032457\n", "2000-01-02 0.412632\n", "2000-01-03 -0.279930\n", "2000-01-04 -0.953980\n", "2000-01-05 -0.014739\n", "Freq: D, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "time_series = pd.Series(data, index=dates)\n", "time_series" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Plotting a time-series" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = time_series.plot()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Missing values\n", "\n", "- Pandas uses `nan` to represent missing data.\n", "\n", "- So `nan` is used to represent missing, invalid or unknown data values.\n", "\n", "- It is important to note that this only convention only applies within pandas.\n", " - Other frameworks have very different semantics for these values.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## DataFrame\n", "\n", "- A data frame has multiple columns, each of which can hold a *different* type of value.\n", "\n", "- Like a series, it has an index which provides a label for each and every row. \n", "\n", "- Data frames can be constructed from:\n", " - dict of arrays,\n", " - dict of lists,\n", " - dict of dict\n", " - dict of Series\n", " - 2-dimensional array\n", " - a single Series\n", " - another DataFrame" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "## Creating a dict of series" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'x': a 1.0\n", " b 2.0\n", " c 3.0\n", " dtype: float64, 'y': a 4.0\n", " b 5.0\n", " c 6.0\n", " d 7.0\n", " dtype: float64, 'z': a 0.1\n", " b 0.2\n", " c 0.3\n", " d 0.4\n", " dtype: float64}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "series_dict = {\n", " 'x' : \n", " pd.Series([1., 2., 3.], index=['a', 'b', 'c']),\n", " 'y' : \n", " pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd']),\n", " 'z' :\n", " pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])\n", "}\n", "\n", "series_dict" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Converting 
the dict to a data frame" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " x y z\n", "a 1.0 4.0 0.1\n", "b 2.0 5.0 0.2\n", "c 3.0 6.0 0.3\n", "d NaN 7.0 0.4" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(series_dict)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Plotting data frames\n", "\n", "- When plotting a data frame, each column is plotted as its own series on the same graph.\n", "\n", "- The column names are used to label each series.\n", "\n", "- The row names (index) is used to label the x-axis." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = df.plot()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The outer dimension is the column index.\n", "\n", "- When we retrieve a single column, the result is a Series" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1.0\n", "b 2.0\n", "c 3.0\n", "d NaN\n", "Name: x, dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['x']" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['x']['b']" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.x.b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Projections\n", "\n", "- Data frames can be sliced just like series.\n", "- When we slice columns we call this a *projection*, because it is analogous to specifying a subset of attributes in a relational query, e.g. 
`SELECT x FROM table`.\n", "- If we project a single column the result is a series:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "b 2.0\n", "c 3.0\n", "Name: x, dtype: float64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slice = df['x'][['b', 'c']]\n", "slice" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(slice)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Projecting multiple columns\n", "\n", "- When we include multiple columns in the projection the result is a DataFrame." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " x y\n", "a 1.0 4.0\n", "b 2.0 5.0\n", "c 3.0 6.0\n", "d NaN 7.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slice = df[['x', 'y']]\n", "slice" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(slice)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Vectorization\n", "\n", "- Vectorized functions and operators work just as with series objects:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 5.0\n", "b 7.0\n", "c 9.0\n", "d NaN\n", "dtype: float64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['x'] + df['y']" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " x y z\n", "a 1.0 16.0 0.01\n", "b 4.0 25.0 0.04\n", "c 9.0 36.0 0.09\n", "d NaN 49.0 0.16" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df ** 2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Logical indexing\n", "\n", "- We can use logical indexing to retrieve a subset of the data.\n", "\n" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a False\n", "b True\n", "c True\n", "d False\n", "Name: x, dtype: bool" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['x'] >= 2" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " x y z\n", "b 2.0 5.0 0.2\n", "c 3.0 6.0 0.3" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['x'] >= 2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Descriptive statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- To quickly obtain descriptive statistics on numerical values use the `describe` method." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " x y z\n", "count 3.0 4.000000 4.000000\n", "mean 2.0 5.500000 0.250000\n", "std 1.0 1.290994 0.129099\n", "min 1.0 4.000000 0.100000\n", "25% 1.5 4.750000 0.175000\n", "50% 2.0 5.500000 0.250000\n", "75% 2.5 6.250000 0.325000\n", "max 3.0 7.000000 0.400000" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Accessing a single statistic\n", "\n", "- The result is itself a DataFrame, so we can index a particular statistic like so:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()['x']['mean']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Accessing the row and column labels\n", "\n", "- The row labels (index) and column labels can be accessed:\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['a', 'b', 'c', 'd'], dtype='object')" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['x', 'y', 'z'], dtype='object')" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Head and tail\n", "\n", "- Data frames have `head()` and `tail()` methods which behave analgously to the Unix commands of the same name." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Financial data\n", "\n", "- Pandas was originally developed to analyse financial data.\n", "\n", "- We can download tabulated data in a portable format called [Comma Separated Values (CSV)](https://www.loc.gov/preservation/digital/formats/fdd/fdd000323.shtml)." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "googl = pd.read_csv('data/GOOGL.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examining the first few rows\n", "\n", "- When working with large data sets it is useful to view just the first/last few rows in the dataset.\n", "\n", "- We can use the `head()` method to retrieve the first rows:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Open High Low Close Adj Close \\\n", "0 2013-11-13 503.878876 516.941956 503.753754 516.751770 516.751770 \n", "1 2013-11-14 517.477478 520.395386 515.690674 518.133118 518.133118 \n", "2 2013-11-15 517.952942 519.519531 515.670654 517.297302 517.297302 \n", "3 2013-11-18 518.393372 524.894897 515.135132 516.291321 516.291321 \n", "4 2013-11-19 516.376404 517.892883 512.037048 513.113098 513.113098 \n", "\n", " Volume \n", "0 3155600 \n", "1 2331000 \n", "2 2550000 \n", "3 3515800 \n", "4 2260900 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "googl.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examining the last few rows" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Open High Low Close \\\n", "1505 2019-11-06 1290.089966 1292.989990 1282.270020 1291.010010 \n", "1506 2019-11-07 1294.280029 1322.650024 1293.750000 1306.939941 \n", "1507 2019-11-08 1301.520020 1317.109985 1301.520020 1309.000000 \n", "1508 2019-11-11 1304.000000 1304.900024 1295.869995 1298.280029 \n", "1509 2019-11-12 1298.569946 1309.349976 1294.239990 1297.209961 \n", "\n", " Adj Close Volume \n", "1505 1291.010010 1231300 \n", "1506 1306.939941 2257000 \n", "1507 1309.000000 1519600 \n", "1508 1298.280029 861700 \n", "1509 1297.209961 1442600 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "googl.tail()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Converting to datetime values\n", "\n", "- So far, the `Date` attribute is of type string." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2013-11-13'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "googl.Date[0]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(googl.Date[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- In order to work with time-series data, we need to construct an index containing time values.\n", "\n", "- Time values are of type `datetime` or `Timestamp`.\n", "\n", "- We can use the function `to_datetime()` to convert strings to time values." 
] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2013-11-13\n", "1 2013-11-14\n", "2 2013-11-15\n", "3 2013-11-18\n", "4 2013-11-19\n", "Name: Date, dtype: datetime64[ns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_datetime(googl['Date']).head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Setting the index\n", "\n", "- Now we need to set the index of the data-frame so that it contains the sequence of dates.\n" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Timestamp('2013-11-13 00:00:00')" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "googl.set_index(pd.to_datetime(googl['Date']), inplace=True)\n", "googl.index[0]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas._libs.tslibs.timestamps.Timestamp" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(googl.index[0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Plotting series\n", "\n", "- We can plot a series in a dataframe by invoking its `plot()` method.\n", "\n", "- Here we plot a time-series of the daily traded volume:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = googl['Volume'].plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Adjusted closing prices as a time series" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "googl['Adj Close'].plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "### Slicing series using date/time stamps\n", "\n", "- We can slice a time series by specifying a range of dates or times.\n", "\n", "- Date and time stamps are specified strings representing dates in the required format." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "googl['Adj Close']['1-1-2016':'1-1-2017'].plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Resampling \n", "\n", "- We can *resample* to obtain e.g. weekly or monthly prices.\n", "\n", "- In the example below the `'W'` denotes weekly.\n", "\n", "- See [the documentation](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) for other frequencies.\n", "\n", "- We group data into weeks, and then take the last value in each week.\n", "\n", "- For details of other ways to resample the data, see [the documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Resampled time-series plot" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "Date\n", "2013-11-17 517.297302\n", "2013-11-24 516.461487\n", "2013-12-01 530.325317\n", "2013-12-08 535.470459\n", "2013-12-15 530.925903\n", "Freq: W-SUN, Name: Adj Close, dtype: float64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_prices = googl['Adj Close'].resample('W').last()\n", "weekly_prices.head()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "weekly_prices.plot()\n", "plt.title('Prices for GOOGL sampled at weekly frequency')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Converting prices to log returns" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "weekly_rets = np.diff(np.log(weekly_prices))\n", "plt.plot(weekly_rets)\n", "plt.xlabel('t'); plt.ylabel('$r_t$')\n", "plt.title('Weekly log-returns for GOOGL')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Converting the returns to a series\n", "\n", "- Notice that in the above plot the time axis is missing the dates.\n", "\n", "- This is because the `np.diff()` function returns an array instead of a data-frame.\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(weekly_rets)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can convert it to a series thus:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "2013-11-24 -0.001617\n", "2013-12-01 0.026490\n", "2013-12-08 0.009655\n", "2013-12-15 -0.008523\n", "2013-12-22 0.036860\n", "Freq: W-SUN, dtype: float64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_rets_series = pd.Series(weekly_rets, index=weekly_prices.index[1:])\n", "weekly_rets_series.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Plotting with the correct time axis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Now when we plot the series we will obtain the correct time axis:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(weekly_rets_series)\n", "plt.title('GOOGL weekly log-returns'); plt.xlabel('t'); plt.ylabel('$r_t$')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Plotting a return histogram" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAASuElEQVR4nO3dfZBddX3H8fe3RB5kWxKMbpkE3TCmtkhslS21MtW7xgqCCjPFmTjUBkonU5/K1HRKKNNhpjPMYDvU2umDzYg1jtYFqS0pUTFGVutMQRNElgcxEVNMSIMKRBcz2Oi3f+xhelk32bvn3Ht378/3a2Zn7z3n/M757MnJZ8+e+xSZiSSpLD+30AEkSd1nuUtSgSx3SSqQ5S5JBbLcJalASxY6AMDy5ctzZGSk1tinnnqKk08+ubuB+mRQs5u7v8zdX4OUe9euXd/NzOfPNm9RlPvIyAg7d+6sNXZiYoJWq9XdQH0yqNnN3V/m7q9Byh0R/320eV6WkaQCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAi2KV6hqcIxs2sbGNUe4bNO2vm537/UX9nV70qDzzF2SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVKA5yz0iPhQRj0XEfW3T/ioivh4R90bEv0XE0rZ5V0fEnoh4KCLO61VwSdLRdXLm/mHg/BnTtgNnZebLgG8AVwNExJnAOuCl1Zh/iIjjupZWktSROcs9M78IPD5j2mcz80h1905gZXX7ImA8M5/OzG8Be4BzuphXktSByMy5F4oYAW7LzLNmmfcfwE2Z+dGI+Dvgzsz8aDXvRuDTmXnLLOM2ABsAhoeHzx4fH6/1A0xNTTE0NFRr7EIbxOyT+w8xfBIcPNzf7a5ZcUrjdQzi/gZz99sg5R4bG9uVmaOzzWv0AdkRcQ1wBPjYM5NmWWzW3x6ZuRnYDDA6OpqtVqtWhomJCeqOXWiDmP2y6gOyb5js72er77201Xgdg7i/wdz9Nqi5Z6r9PzQi1gNvBNbm/5/+7wNOb1tsJfBo/XiSpDpqPRUyIs4HrgLenJk/bJu1FVgXESdExCpgNfDl5jElSfMx55l7RHwcaAHLI2IfcC3Tz445AdgeETB9nf0PM/P+iLgZeIDpyzXvzMwf9yq8JGl2c5Z7Zr51lsk3HmP564DrmoSSJDXjK1QlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KB5iz3iPhQRDwWEfe1TTs1IrZHxO7q+7JqekTE30bEnoi4NyJe0cvwkqTZdXLm/mHg/BnTNgE7MnM1sKO6D/AGY
HX1tQH4x+7ElCTNx5zlnplfBB6fMfkiYEt1ewtwcdv0j+S0O4GlEXFat8JKkjoTmTn3QhEjwG2ZeVZ1/8nMXNo2/4nMXBYRtwHXZ+aXquk7gKsyc+cs69zA9Nk9w8PDZ4+Pj9f6AaamphgaGqo1dqENYvbJ/YcYPgkOHu7vdtesOKXxOgZxf4O5+22Qco+Nje3KzNHZ5i3p8rZilmmz/vbIzM3AZoDR0dFstVq1NjgxMUHdsQttELNftmkbG9cc4YbJbh86x7b30lbjdQzi/gZz99ug5p6p7v/QgxFxWmYeqC67PFZN3wec3rbcSuDRJgH100Y2bVvoCJIWubpPhdwKrK9urwdubZv+e9WzZl4JHMrMAw0zSpLmac4z94j4ONAClkfEPuBa4Hrg5oi4AngEeEu1+KeAC4A9wA+By3uQWZI0hznLPTPfepRZa2dZNoF3Ng0lSWrGV6hKUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCNSr3iPjjiLg/Iu6LiI9HxIkRsSoi7oqI3RFxU0Qc362wkqTO1C73iFgB/BEwmplnAccB64D3Au/LzNXAE8AV3QgqSepc08syS4CTImIJ8FzgAPBa4JZq/hbg4obbkCTNU2Rm/cERVwLXAYeBzwJXAndm5our+acDn67O7GeO3QBsABgeHj57fHy8VoapqSmGhobq/QALrG72yf2HepCmc8MnwcHD/d3mmhWnNF7HoB4r5u6vQco9Nja2KzNHZ5u3pO5KI2IZcBGwCngS+ATwhlkWnfW3R2ZuBjYDjI6OZqvVqpVjYmKCumMXWt3sl23a1v0w87BxzRFumKx96NSy99JW43UM6rFi7v4a1NwzNbks8zrgW5n5ncz8X+CTwKuApdVlGoCVwKMNM0qS5qlJuT8CvDIinhsRAawFHgDuAC6pllkP3NosoiRpvmqXe2bexfQDp3cDk9W6NgNXAe+JiD3A84Abu5BTkjQPjS6cZua1wLUzJj8MnNNkvZKkZnyFqiQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKlCjco+IpRFxS0R8PSIejIjfjIhTI2J7ROyuvi/rVlhJUmeanrm/H/hMZv4y8KvAg8AmYEdmrgZ2VPclSX1Uu9wj4heAVwM3AmTmjzLzSeAiYEu12Bbg4qYhJUnzE5lZb2DErwGbgQeYPmvfBVwJ7M/MpW3LPZGZP3VpJiI2ABsAhoeHzx4fH6+VY2pqiqGhoVpjF1rd7JP7D/UgTeeGT4KDh/u7zTUrTmm8jkE9VszdX4OUe2xsbFdmjs42r0m5jwJ3Audm5l0R8X7g+8C7Oyn3dqOjo7lz585aOSYmJmi1WrXGLrS62Uc2bet+mHnYuOYIN0wu6es2915/YeN1DOqxYu7+GqTcEXHUcm9yzX0fsC8z76ru3wK8AjgYEadVGz4NeKzBNiRJNdQu98z8H+DbEfGSatJapi/RbAXWV9PWA7c2SihJmremf1u/G/hYRBwPPAxczvQvjJsj4grgEeAtDbchSZqnRuWemfcAs13vWdtkvZKkZnyFqiQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKlDjco+I4yLiqxFxW3V/VUTcFRG7I+KmiDi+eUxJ0nx048z9SuDBtvvvBd6XmauBJ4ArurANSdI8NCr3iFgJXAh8sLofwGuBW6pFtgAXN9mGJGn+mp65/w3wp8BPq
vvPA57MzCPV/X3AiobbkCTNU2RmvYERbwQuyMx3REQL+BPgcuC/MvPF1TKnA5/KzDWzjN8AbAAYHh4+e3x8vFaOqakphoaGao1daHWzT+4/1IM0nRs+CQ4e7u8216w4pfE6BvVYMXd/DVLusbGxXZk5Otu8JQ3Wey7w5oi4ADgR+AWmz+SXRsSS6ux9JfDobIMzczOwGWB0dDRbrVatEBMTE9Qdu9DqZr9s07buh5mHjWuOcMNkk0Nn/vZe2mq8jkE9VszdX4Oae6bal2Uy8+rMXJmZI8A64POZeSlwB3BJtdh64NbGKSVJ89KL57lfBbwnIvYwfQ3+xh5sQ5J0DF352zozJ4CJ6vbDwDndWK8kqR5foSpJBervo2JSTSNdeBB545ojtR6M3nv9hY23LfWbZ+6SVCDLXZIKZLlLUoG85t5A0+vAda8BS9JcPHOXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBWodrlHxOkRcUdEPBgR90fEldX0UyNie0Tsrr4v615cSVInmpy5HwE2ZuavAK8E3hkRZwKbgB2ZuRrYUd2XJPVR7XLPzAOZeXd1+wfAg8AK4CJgS7XYFuDipiElSfPTlWvuETECvBy4CxjOzAMw/QsAeEE3tiFJ6lxkZrMVRAwBXwCuy8xPRsSTmbm0bf4TmflT190jYgOwAWB4ePjs8fHxWtufmppiaGioXviGJvcfajR++CQ4eLhLYfroZy33mhWndD/MPCzkMd6EuXtvbGxsV2aOzjavUblHxHOA24DbM/Ovq2kPAa3MPBARpwETmfmSY61ndHQ0d+7cWSvDxMQErVar1timRjZtazR+45oj3DC5pEtp+udnLffe6y/sQZrOLeQx3oS5ey8ijlruTZ4tE8CNwIPPFHtlK7C+ur0euLXuNiRJ9TQ5/ToXeBswGRH3VNP+DLgeuDkirgAeAd7SLKIkab5ql3tmfgmIo8xeW3e9kqTmfIWqJBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAg/dBmFKfNf2s3LoW+rNbNdg8c5ekAg38mfvk/kNctkBnVpK0WHnmLkkFstwlqUCWuyQVyHKXpAL1rNwj4vyIeCgi9kTEpl5tR5L003rybJmIOA74e+C3gX3AVyJia2Y+0IvtSVIT7a9l2LjmSF+fgder1zP06sz9HGBPZj6cmT8CxoGLerQtSdIMkZndX2nEJcD5mfkH1f23Ab+Rme9qW2YDsKG6+xLgoZqbWw58t0HchTSo2c3dX+bur0HK/aLMfP5sM3r1IqaYZdqzfotk5mZgc+MNRezMzNGm61kIg5rd3P1l7v4a1Nwz9eqyzD7g9Lb7K4FHe7QtSdIMvSr3rwCrI2JVRBwPrAO29mhbkqQZenJZJjOPRMS7gNuB44APZeb9vdgWXbi0s4AGNbu5+8vc/TWouZ+lJw+oSpIWlq9QlaQCWe6SVKCBKPeIODUitkfE7ur7sqMs95mIeDIibpsxfVVE3FWNv6l6kHcx5V5fLbM7Ita3TZ+o3sLhnurrBT3Oe8y3jIiIE6r9t6fanyNt866upj8UEef1Mme3ckfESEQcbtu/H+hn7g6zvzoi7o6II9XrR9rnzXrc9EPD3D9u2+d9faJFB7nfExEPRMS9EbEjIl7UNm/B9nctmbnov4C/BDZVtzcB7z3KcmuBNwG3zZh+M7Cuuv0B4O2LJTdwKvBw9X1ZdXtZNW8CGO1T1uOAbwJnAMcDXwPOnLHMO4APVLfXATdVt8+slj8BWFWt57gByD0C3NfPY7lG9hHgZcBHgEs6OW4Wc+5q3tQi3t9jwHOr229vO1YWbH/X/RqIM3em37pgS3V7C3DxbAtl5g7gB+3TIiKA1wK3zDW+BzrJfR6wP
TMfz8wngO3A+X3K166Tt4xo/3luAdZW+/ciYDwzn87MbwF7qvUt9twLbc7smbk3M+8FfjJj7EIeN01yL6ROct+RmT+s7t7J9Gt0YPH8P+3YoJT7cGYeAKi+z+fyxPOAJzPzSHV/H7Ciy/mOppPcK4Bvt92fme+fqz9f/7zHhTRXjmctU+3PQ0zv307G9kqT3ACrIuKrEfGFiPitXoc9Wq7KfPbbYt/nx3JiROyMiDsjol8nWjD/3FcAn645dsEtms9QjYjPAb84y6xrmq56lmlde/5nF3IfK9+lmbk/In4e+FfgbUz/mdsLneynoy3T0308hya5DwAvzMzvRcTZwL9HxEsz8/vdDnkUTfbbYt/nx/LCzHw0Is4APh8Rk5n5zS5lO5aOc0fE7wKjwGvmO3axWDTlnpmvO9q8iDgYEadl5oGIOA14bB6r/i6wNCKWVGdtXX0rhC7k3ge02u6vZPpaO5m5v/r+g4j4F6b/rOxVuXfylhHPLLMvIpYApwCPdzi2V2rnzumLqU8DZOauiPgm8EvAzp6nfnauZ8xnvx31uOmDRv/emflo9f3hiJgAXs70tfBe6yh3RLyO6ZOz12Tm021jWzPGTvQkZZcMymWZrcAzj06vB27tdGD1H/gO4JlH7Oc1vqFOct8OvD4illXPpnk9cHtELImI5QAR8RzgjcB9PczayVtGtP88lwCfr/bvVmBd9ayUVcBq4Ms9zNqV3BHx/Jj+7AGqs8jVTD9Q1i9N3qZj1uOmRzlnqp27yntCdXs5cC7Qr895mDN3RLwc+CfgzZnZfjK2kPu7noV+RLeTL6avj+4AdlffT62mjwIfbFvuP4HvAIeZ/k17XjX9DKbLZg/wCeCERZb796tse4DLq2knA7uAe4H7gffT42egABcA32D6LOqaatpfMH2gA5xY7b891f48o23sNdW4h4A39Pn4qJUb+J1q334NuBt40wIc23Nl//XqWH4K+B5w/7GOm8WeG3gVMFnt80ngikWW+3PAQeCe6mvrYtjfdb58+wFJKtCgXJaRJM2D5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIK9H8jxou66WsQfwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "weekly_rets_series.hist()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 313.000000\n", "mean 0.002937\n", "std 0.032039\n", "min -0.099918\n", "25% -0.013341\n", "50% 0.004653\n", "75% 0.021327\n", "max 0.229571\n", "dtype: float64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_rets_series.describe()" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "ipub": { "customcss": "fitch.css" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": "3", "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "227.8px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }