{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Info from the web\n",
    "\n",
    "**This notebook goes with [a blog post at Agile*](http://ageo.co/xlines02).**\n",
    "\n",
    "We're going to get some info from Wikipedia, and some financial prices from Yahoo Finance. We'll make good use of [the `requests` library](http://docs.python-requests.org/en/master/), a really nicely designed Python library for making web requests in Python.\n",
    "    \n",
    "## Geological ages from Wikipedia\n",
    "\n",
    "We'll start with the Jurassic, then generalize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "url = \"http://en.wikipedia.org/wiki/Jurassic\"  # Line 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I used `View Source` in my browser to figure out where the age range is on the page, and what it looks like. The most predictable spot, that will work on every period's page, is in the infobox. It's given as a range, in italic text, with \"million years ago\" right after it.\n",
    "\n",
    "Try to find the same string here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import requests  # I don't count these lines.\n",
    "r = requests.get(url)  # Line 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have the entire text of the webpage, along with some metadata. The text is stored in `r.text`, and I happen to know roughly where the relevant bit of text is: around the 7500th character, give or take:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'>\\n<td colspan=\"2\" style=\"text-align:center;background-color:rgb(0,187,231)\"><b>Jurassic Period</b><br />\\n<i>201.3–145&#160;million years ago</i><br />\\n<div id=\"Timeline-row\" style=\"margin: 4px auto 0;'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "r.text[7400:7600]  # I don't count these lines either."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can get at that bit of text using a [regular expression](https://docs.python.org/2/library/re.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'201.3–145&#160;million years ago'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "s = re.search(r'<i>(.+?million years ago)</i>', r.text)\n",
    "text = s.group(1)\n",
    "text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And if we're really cunning, we can get the start and end ages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "According to Wikipedia, the Jurassic lasted 56.30 Ma.\n"
     ]
    }
   ],
   "source": [
    "start, end = re.search(r'<i>([\\.0-9]+)–([\\.0-9]+)&#160;million years ago</i>', r.text).groups()  # Line 3\n",
    "duration = float(start) - float(end)  # Line 4\n",
    "\n",
    "print(\"According to Wikipedia, the Jurassic lasted {:.2f} Ma.\".format(duration))  # Line 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An exercise for you, dear reader: Make a function to get the start and end ages of *any* geologic period, taking the name of the period as an argument. I have left some hints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def get_age(period):\n",
    "    url =   # Make a URL out of a base URL and the period name\n",
    "    r =     # Make the request.\n",
    "    start, end =   # Provide the regex.\n",
    "    return float(start), float(end)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should be able to call your function like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "period = \"Jurassic\"\n",
    "get_age(period)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can make a function that makes the sentence we made before, calling the function you just wrote:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def duration(period):\n",
    "    t0, t1 = get_age(period)\n",
    "    duration = t0 - t1\n",
    "    response = \"According to Wikipedia, the {0} lasted {1:.2f} Ma.\".format(period, duration)\n",
    "    return response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "duration('Cretaceous')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Natural gas prices from Yahoo Finance\n",
    "\n",
    "[Here is an explanation](http://mdbitz.com/2010/02/02/understanding-yahoo-finance-stock-quotes-and-sl1d1t1c1ohgv/) of how to form Yahoo Finance queries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "url = \"http://download.finance.yahoo.com/d/quotes.csv\"  # Line 6\n",
    "params = {'s': 'HHG17.NYM', 'f': 'l1'}  # Line 7\n",
    "\n",
    "r = requests.get(url, params=params)  # Line 8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Henry Hub price for Feb 2017: $2.86\n"
     ]
    }
   ],
   "source": [
    "price = float(r.text)  # Line 9\n",
    "\n",
    "print(\"Henry Hub price for Feb 2017: ${:.2f}\".format(price))  # Line 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The symbol `s` we're passing is `HHF17.NYM`. This means:\n",
    "\n",
    "The ticker symbols we're passing look like XXMYY.NYM, with components as follows:\n",
    "\n",
    "* `XX` — commodity 'benchmark' symbol, as explained below.\n",
    "* `M` — a month code, symbolizing January to December: `[F,G,H,J,K,M,N,Q,U,V,X,Z]`\n",
    "* `YY` — a two-digit year, like 17.\n",
    "* `.NYM` — the Nymex symbol.\n",
    "\n",
    "Benchmarks that seem to work with this service:\n",
    "\n",
    "* `CL` — West Texas Intermediate or WTI, light sweet crude\n",
    "* `BB` — Brent crude penultimate financial futures\n",
    "* `BZ` — Brent look-alike crude oil futures\n",
    "* `MB` — Gulf Coast Sour Crude\n",
    "* `RE` — Russian Export Blend Crude Oil (REBCO) futures\n",
    "\n",
    "Gas spot prices that work:\n",
    "\n",
    "* `NG` — Henry Hub physical futures\n",
    "* `HH` — Henry Hub last day financial futures\n",
    "\n",
    "Symbols that don't work:\n",
    "\n",
    "* `DC` — Dubai crude calendar futures\n",
    "* `WCC` — Canadian Heavy (differential, cf CL)\n",
    "* `WCE` — Western Canadian select crude oil futures (differential, cf CL) — but this seems to be the same price as WCC, which can't be right\n",
    "* `LN` — European options"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an exercise, write a function to get the futures price for a given benchmark, based on the contract price 90 days from 'now', whenever now is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "def get_symbol(benchmark):\n",
    "    \n",
    "    # I'll help you with the time.\n",
    "    # We compute a time 45 days in the future for a price\n",
    "    future = time.gmtime(time.time() + 90*24*60*60)\n",
    "    month = future.tm_mon\n",
    "    year = future.tm_year\n",
    "    month_codes = ['F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z']\n",
    "\n",
    "    # This is where you come in.\n",
    "    month =  #### Get the appropriate code for the month.\n",
    "    year =   #### Make a string for the year.\n",
    "\n",
    "    return benchmark + month + year + \".NYM\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This should work:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "get_symbol('CL')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "<hr />\n",
    "\n",
    "<div>\n",
    "<img src=\"https://avatars1.githubusercontent.com/u/1692321?s=50\"><p style=\"text-align:center\">© Agile Geoscience 2016</p>\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
   "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}