{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Standardized Option Quotes Data Format\n", "======================================\n", "\n", "To facilitate model calibration, we propose the definition of a standard data set, which contains all the\n", "necessary information. The data is held in a [Panda](http://pandas.pydata.org) table of type *option_quotes*, with one row per quote and\n", "8 columns. This table is defined in *quantlib/reference/data_structures.py*. The column names are defined in reference/names.py, and are as follows:\n", "\n", "* TRADE_DATE: Quote date, or time stamp\n", "* STRIKE: Ditto\n", "* EXPIRY_DATE: Option expiry date\n", "* OPTION_TYPE: Call/Put flag, coded as \"C\" or \"P\"\n", "* SPOT: Price of underlying asset\n", "* EXERCISE_STYLE: European/American, coded as \"Amer\" or \"Euro\"\n", "* PRICE_BID: Bid price\n", "* PRICE_ASK: Ask price\n", "\n", "We do not include the dividend yield nor the risk-free rate in the data set: The \n", "implied forward price and risk-free rate are estimated from the call/put parity. \n", "\n", "This notebook demonstrates the creation of such data file by processing the quotes on the S&P 500 index options (SPX) \n", "provided by the Chicago Board of Options Exchange (CBOE).\n", "\n", "\n", "Obtaining SPX Option Quotes\n", "---------------------------\n", "\n", "SPX delayed options quotes are published by the [CBOE](http://www.cboe.com/DelayedQuote/QuoteTableDownload.aspx), \n", "in a comma-separated format. The file provides:\n", "\n", "* the value of the underlying index\n", "* bid and ask prices for calls and puts, by strike and expiry date. The expiry date can be extracted from the option ticker. \n", "* other information, such as volume and open interest, also by strike and option type.\n", "\n", "SPX Option Data Processing\n", "--------------------------\n", "\n", "We provide below the procedure for converting the raw SPX option data file into the standardized option quotes data format.\n", "\n" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "### SPX Utility functions\n", "\n", "These functions parse the SPX option names, and extract expiry date and strike.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function\n", "import pandas\n", "import datetime\n", "import dateutil\n", "import re\n", "import os\n", "import quantlib.reference.names as nm\n", "import quantlib.reference.data_structures as ds\n", "\n", "def ExpiryMonth(s):\n", " \"\"\"\n", " Convert SPX contract months into month number\n", " \"\"\"\n", " call_months = \"ABCDEFGHIJKL\"\n", " put_months = \"MNOPQRSTUVWX\"\n", "\n", " try:\n", " m = call_months.index(s)\n", " except ValueError:\n", " m = put_months.index(s)\n", "\n", " return m\n", "\n", "spx_symbol = re.compile(\"\\\\(SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})-E\\\\)\")\n", "\n", "def parseSPX(s):\n", " \"\"\"\n", " Parse an SPX quote string, return expiry date and strike\n", " \"\"\"\n", " tokens = spx_symbol.split(s)\n", "\n", " if len(tokens) == 1:\n", " return {'dtExpiry': None, 'strike': -1}\n", "\n", " year = 2000 + int(tokens[1])\n", " day = int(tokens[2])\n", " month = ExpiryMonth(tokens[3])\n", " strike = float(tokens[4])\n", "\n", " dtExpiry = datetime.date(year, month, day)\n", "\n", " return ({'dtExpiry': dtExpiry, 'strike': strike})\n", "\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading the SPX raw data file\n", "\n", "The csv file downloaded from the CBOE site can be converted into a \n", "standard *option_quotes* panda data frame by the following function. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_SPX_file(option_data_file):\n", " \"\"\"\n", " Read SPX csv file,\n", " return spot and a data frame of type option_quotes\n", " \"\"\"\n", " \n", " # read two lines for spot price and trade date\n", " with open(option_data_file) as fid:\n", " lineOne = fid.readline()\n", " spot = float(lineOne.split(',')[1])\n", "\n", " lineTwo = fid.readline()\n", " dt = lineTwo.split('@')[0]\n", " dtTrade = dateutil.parser.parse(dt).date()\n", "\n", " print('Dt Calc: %s Spot: %f' % (dtTrade, spot))\n", " \n", " # read all option price records as a data frame\n", " df = pandas.io.parsers.read_csv(option_data_file, header=0, sep=',', skiprows=[0,1])\n", " \n", " # split and stack calls and puts\n", " \n", " call_df = df[['Calls', 'Bid', 'Ask']]\n", " call_df = call_df.rename(columns={'Calls':'Spec', 'Bid':'PBid', 'Ask': 'PAsk'}) \n", " call_df['Type'] = nm.CALL_OPTION\n", " \n", " put_df = df[['Puts', 'Bid.1', 'Ask.1']]\n", " put_df = put_df.rename(columns = {'Puts':'Spec', 'Bid.1':'PBid',\n", " 'Ask.1':'PAsk'}) \n", " put_df['Type'] = nm.PUT_OPTION\n", " \n", " df_all = call_df.append(put_df, ignore_index=True)\n", " \n", " # parse Calls and Puts columns for strike and contract month\n", " # insert into data frame\n", " \n", " cp = [parseSPX(s) for s in df_all['Spec']]\n", " \n", " option_quotes = ds.option_quotes_template()\n", " option_quotes = option_quotes.reindex(index=range(len(cp)))\n", " \n", " # Fill the option_quotes data frame\n", " \n", " option_quotes[nm.STRIKE] = [x['strike'] for x in cp] \n", " option_quotes[nm.EXPIRY_DATE] = [x['dtExpiry'] for x in cp]\n", " option_quotes[nm.OPTION_TYPE] = df_all['Type']\n", " option_quotes[nm.EXERCISE_STYLE] = nm.EURO_EXERCISE\n", " option_quotes[nm.PRICE_BID] = df_all['PBid']\n", " option_quotes[nm.PRICE_ASK] = df_all['PAsk']\n", " option_quotes[nm.TRADE_DATE] = dtTrade\n", " \n", " option_quotes = option_quotes[(option_quotes[nm.STRIKE] > 0) & \\\n", " (option_quotes[nm.PRICE_BID]>0) & \\\n", " (option_quotes[nm.PRICE_ASK]>0)]\n", " \n", " option_quotes[nm.SPOT] = spot\n", " \n", " return option_quotes\n", "\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example\n", "-------\n", "\n", "In the example below, the file 'SPX-Options-24jan2011.csv' was downloaded from the CBOE web site.\n", "The standardized option quotes data file is saved as a csv file and as a panda data frame.\n", "\n", "File paths are relative to the notebooks folder, so it's important that the notebook browser be\n", "started with the command:\n", "\n", "ipython notebook --pylab inline path-to-the-notebooks-folder \n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "option_data_file = os.path.join('..', 'data', 'SPX-Options-24jan2011.csv')\n", "\n", "df_SPX = read_SPX_file(option_data_file)\n", "print('%d records processed' % len(df_SPX))\n", " \n", "# save a csv file and pickled data frame\n", "df_SPX.to_csv(os.path.join('..', 'data', 'df_SPX_24jan2011.csv'), index=False)\n", "df_SPX.to_pickle(os.path.join('..', 'data', 'df_SPX_24jan2011.pkl'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Dt Calc: 2011-01-24 Spot: 1290.590000\n", "1472 records processed" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 3 } ], "metadata": {} } ] }