{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[(watch section)](https://class.coursera.org/compinvesting1-003/lecture/view?lecture_id=163) \n", "[(read the wiki)](http://wiki.quantsoftware.org/index.php?title=QSTK_Tutorial_1)\n", "\n", "### Imports\n", "\n", "* matplotlib gives Python MATLAB-like plotting capabilities, and pandas provides the DataFrame objects used below. \n", "* datetime helps us manipulate dates. \n", "* The qstkutil items are from the QuantSoftware ToolKit (QSTK)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import QSTK.qstkutil.qsdateutil as du\n", "import QSTK.qstkutil.tsutil as tsu\n", "import QSTK.qstkutil.DataAccess as da\n", "\n", "import datetime as dt\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notebook magic that renders plots inline; it can be ignored." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some symbols and dates\n", "\n", "We'll be using historical adjusted close data. QSTK has a DataAccess class designed to quickly read this data into pandas DataFrame objects. We must first select which symbols we're interested in, and for which time period. Note that the wiki tutorial script uses 2006, but in the video tutorial we use 2010. The end date differs too, so that we have two weeks of data. `$SPX` is the S&P 500 index." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "ls_symbols = [\"AAPL\", \"GLD\", \"GOOG\", \"$SPX\", \"XOM\"]\n", "dt_start = dt.datetime(2010, 1, 1)\n", "dt_end = dt.datetime(2010, 1, 15)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The market closes at 16:00, so we use that as the time of day for each timestamp." ] }, { "cell_type": "code", "collapsed": false, "input": [ "dt_timeofday = dt.timedelta(hours=16)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function ```getNYSEdays(dt_start, dt_end, dt_timeofday)``` returns the days in the interval on which the New York Stock Exchange was open. It adds the time of day given as a parameter to each datetime in the resulting list." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ldt_timestamps = du.getNYSEdays(dt_start, dt_end, dt_timeofday)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "ldt_timestamps" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "[Timestamp('2010-01-04 16:00:00', tz=None),\n", " Timestamp('2010-01-05 16:00:00', tz=None),\n", " Timestamp('2010-01-06 16:00:00', tz=None),\n", " Timestamp('2010-01-07 16:00:00', tz=None),\n", " Timestamp('2010-01-08 16:00:00', tz=None),\n", " Timestamp('2010-01-11 16:00:00', tz=None),\n", " Timestamp('2010-01-12 16:00:00', tz=None),\n", " Timestamp('2010-01-13 16:00:00', tz=None),\n", " Timestamp('2010-01-14 16:00:00', tz=None)]" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that Jan 4 was the first open day of the interval and Jan 14 the last.\n", "\n", "### Data access\n", "\n", "Create a DataAccess object that reads from our Yahoo data source." ] }, { "cell_type": "code", "collapsed": 
false, "input": [ "c_dataobj = da.DataAccess('Yahoo')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "```c_dataobj.get_data``` returns a list of DataFrame objects, one per key in ```ls_keys```. Zipping the keys with this list and passing the pairs to ```dict``` converts it into a dictionary, so each DataFrame can be looked up by key." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ls_keys = ['open', 'high', 'low', 'close', 'volume', 'actual_close']\n", "ldf_data = c_dataobj.get_data(ldt_timestamps, ls_symbols, ls_keys)\n", "d_data = dict(zip(ls_keys, ldf_data))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in the list of keys, ```'close'``` refers to the adjusted close, and ```'actual_close'``` is the raw close price. We can check the type of the object returned by ```get_data```:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ldf_data.__class__, ldf_data[1].__class__" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "(list, pandas.core.frame.DataFrame)" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "d_data['close']" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
|  | AAPL | GLD | GOOG | $SPX | XOM |\n", "
|---|---|---|---|---|---|\n", "
| 2010-01-04 16:00:00 | 213.10 | 109.80 | 626.75 | 1132.99 | 64.55 |\n", "
| 2010-01-05 16:00:00 | 213.46 | 109.70 | 623.99 | 1136.52 | 64.80 |\n", "
| 2010-01-06 16:00:00 | 210.07 | 111.51 | 608.26 | 1137.14 | 65.36 |\n", "
| 2010-01-07 16:00:00 | 209.68 | 110.82 | 594.10 | 1141.69 | 65.15 |\n", "
| 2010-01-08 16:00:00 | 211.07 | 111.37 | 602.02 | 1144.98 | 64.89 |\n", "
| 2010-01-11 16:00:00 | 209.21 | 112.85 | 601.11 | 1146.98 | 65.62 |\n", "
| 2010-01-12 16:00:00 | 206.83 | 110.49 | 590.48 | 1136.22 | 65.29 |\n", "
| 2010-01-13 16:00:00 | 209.75 | 111.54 | 587.09 | 1145.68 | 65.03 |\n", "
| 2010-01-14 16:00:00 | 208.53 | 112.03 | 589.85 | 1148.46 | 65.04 |\n", "