{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Standardized Option Quotes Data Format\n",
      "======================================\n",
      "\n",
      "To facilitate model calibration, we propose the definition of a standard data set, which contains all the\n",
      "necessary information. The data is held in a [Panda](http://pandas.pydata.org) table of type *option_quotes*, with one row per quote and\n",
      "8 columns. This table is defined in *quantlib/reference/data_structures.py*. The column names are defined in reference/names.py, and are as follows:\n",
      "\n",
      "* TRADE_DATE: Quote date, or time stamp\n",
      "* STRIKE: Ditto\n",
      "* EXPIRY_DATE: Option expiry date\n",
      "* OPTION_TYPE: Call/Put flag, coded as \"C\" or \"P\"\n",
      "* SPOT: Price of underlying asset\n",
      "* EXERCISE_STYLE: European/American, coded as \"Amer\" or \"Euro\"\n",
      "* PRICE_BID: Bid price\n",
      "* PRICE_ASK: Ask price\n",
      "\n",
      "We do not include the dividend yield nor the risk-free rate in the data set: The \n",
      "implied forward price and risk-free rate are estimated from the call/put parity. \n",
      "\n",
      "This notebook demonstrates the creation of such data file by processing the quotes on the S&P 500 index options (SPX) \n",
      "provided by the Chicago Board of Options Exchange (CBOE).\n",
      "\n",
      "\n",
      "Obtaining SPX Option Quotes\n",
      "---------------------------\n",
      "\n",
      "SPX delayed options quotes are published by the [CBOE](http://www.cboe.com/DelayedQuote/QuoteTableDownload.aspx), \n",
      "in a comma-separated format. The file provides:\n",
      "\n",
      "* the value of the underlying index\n",
      "* bid and ask prices for calls and puts, by strike and expiry date. The expiry date can be extracted from the option ticker. \n",
      "* other information, such as volume and open interest, also by strike and option type.\n",
      "\n",
      "SPX Option Data Processing\n",
      "--------------------------\n",
      "\n",
      "We provide below the procedure for converting the raw SPX option data file into the standardized option quotes data format.\n",
      "\n"
     ]
    },
    {
     "cell_type": "raw",
     "metadata": {},
     "source": [
      "### SPX Utility functions\n",
      "\n",
      "These functions parse the SPX option names, and extract expiry date and strike.\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from __future__ import print_function\n",
      "import pandas\n",
      "import datetime\n",
      "import dateutil\n",
      "import re\n",
      "import os\n",
      "import quantlib.reference.names as nm\n",
      "import quantlib.reference.data_structures as ds\n",
      "\n",
      "def ExpiryMonth(s):\n",
      "    \"\"\"\n",
      "    Convert SPX contract months into month number\n",
      "    \"\"\"\n",
      "    call_months = \"ABCDEFGHIJKL\"\n",
      "    put_months = \"MNOPQRSTUVWX\"\n",
      "\n",
      "    try:\n",
      "        m = call_months.index(s)\n",
      "    except ValueError:\n",
      "        m = put_months.index(s)\n",
      "\n",
      "    return m\n",
      "\n",
      "spx_symbol = re.compile(\"\\\\(SPX(1[0-9])([0-9]{2})([A-Z])([0-9]{3,4})-E\\\\)\")\n",
      "\n",
      "def parseSPX(s):\n",
      "    \"\"\"\n",
      "    Parse an SPX quote string, return expiry date and strike\n",
      "    \"\"\"\n",
      "    tokens = spx_symbol.split(s)\n",
      "\n",
      "    if len(tokens) == 1:\n",
      "        return {'dtExpiry': None, 'strike': -1}\n",
      "\n",
      "    year = 2000 + int(tokens[1])\n",
      "    day = int(tokens[2])\n",
      "    month = ExpiryMonth(tokens[3])\n",
      "    strike = float(tokens[4])\n",
      "\n",
      "    dtExpiry = datetime.date(year, month, day)\n",
      "\n",
      "    return ({'dtExpiry': dtExpiry, 'strike': strike})\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Reading the SPX raw data file\n",
      "\n",
      "The csv file downloaded from the CBOE site can be converted into a \n",
      "standard *option_quotes* panda data frame by the following function. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def read_SPX_file(option_data_file):\n",
      "    \"\"\"\n",
      "    Read SPX csv file,\n",
      "    return spot and a data frame of type option_quotes\n",
      "    \"\"\"\n",
      "    \n",
      "    # read two lines for spot price and trade date\n",
      "    with open(option_data_file) as fid:\n",
      "        lineOne = fid.readline()\n",
      "        spot = float(lineOne.split(',')[1])\n",
      "\n",
      "        lineTwo = fid.readline()\n",
      "        dt = lineTwo.split('@')[0]\n",
      "        dtTrade = dateutil.parser.parse(dt).date()\n",
      "\n",
      "        print('Dt Calc: %s Spot: %f' % (dtTrade, spot))\n",
      "    \n",
      "    # read all option price records as a data frame\n",
      "    df = pandas.io.parsers.read_csv(option_data_file, header=0, sep=',', skiprows=[0,1])\n",
      "    \n",
      "    # split and stack calls and puts\n",
      "    \n",
      "    call_df = df[['Calls', 'Bid', 'Ask']]\n",
      "    call_df = call_df.rename(columns={'Calls':'Spec', 'Bid':'PBid', 'Ask': 'PAsk'}) \n",
      "    call_df['Type'] = nm.CALL_OPTION\n",
      "    \n",
      "    put_df = df[['Puts', 'Bid.1', 'Ask.1']]\n",
      "    put_df = put_df.rename(columns = {'Puts':'Spec', 'Bid.1':'PBid',\n",
      "    'Ask.1':'PAsk'}) \n",
      "    put_df['Type'] = nm.PUT_OPTION\n",
      "        \n",
      "    df_all = call_df.append(put_df,  ignore_index=True)\n",
      "    \n",
      "    # parse Calls and Puts columns for strike and contract month\n",
      "    # insert into data frame\n",
      "    \n",
      "    cp = [parseSPX(s) for s in df_all['Spec']]\n",
      "    \n",
      "    option_quotes = ds.option_quotes_template()\n",
      "    option_quotes = option_quotes.reindex(index=range(len(cp)))\n",
      "    \n",
      "    # Fill the option_quotes data frame\n",
      "    \n",
      "    option_quotes[nm.STRIKE] = [x['strike'] for x in cp] \n",
      "    option_quotes[nm.EXPIRY_DATE] = [x['dtExpiry'] for x in cp]\n",
      "    option_quotes[nm.OPTION_TYPE] = df_all['Type']\n",
      "    option_quotes[nm.EXERCISE_STYLE] = nm.EURO_EXERCISE\n",
      "    option_quotes[nm.PRICE_BID] = df_all['PBid']\n",
      "    option_quotes[nm.PRICE_ASK] = df_all['PAsk']\n",
      "    option_quotes[nm.TRADE_DATE] = dtTrade\n",
      "    \n",
      "    option_quotes = option_quotes[(option_quotes[nm.STRIKE] > 0) & \\\n",
      "                    (option_quotes[nm.PRICE_BID]>0) & \\\n",
      "                    (option_quotes[nm.PRICE_ASK]>0)]\n",
      "                    \n",
      "    option_quotes[nm.SPOT] = spot\n",
      "    \n",
      "    return option_quotes\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Example\n",
      "-------\n",
      "\n",
      "In the example below, the file 'SPX-Options-24jan2011.csv' was downloaded from the CBOE web site.\n",
      "The standardized option quotes data file is saved as a csv file and as a panda data frame.\n",
      "\n",
      "File paths are relative to the notebooks folder, so it's important that the notebook browser be\n",
      "started with the command:\n",
      "\n",
      "ipython notebook --pylab inline path-to-the-notebooks-folder \n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "option_data_file = os.path.join('..', 'data', 'SPX-Options-24jan2011.csv')\n",
      "\n",
      "df_SPX = read_SPX_file(option_data_file)\n",
      "print('%d records processed' % len(df_SPX))\n",
      "    \n",
      "# save a csv file and pickled data frame\n",
      "df_SPX.to_csv(os.path.join('..', 'data', 'df_SPX_24jan2011.csv'), index=False)\n",
      "df_SPX.to_pickle(os.path.join('..', 'data', 'df_SPX_24jan2011.pkl'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Dt Calc: 2011-01-24 Spot: 1290.590000\n",
        "1472 records processed"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n"
       ]
      }
     ],
     "prompt_number": 3
    }
   ],
   "metadata": {}
  }
 ]
}