{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Info from the web\n", "\n", "**This notebook goes with [a blog post at Agile*](http://ageo.co/xlines02).**\n", "\n", "We're going to get some info from Wikipedia, and some financial prices from Yahoo Finance. We'll make good use of [the `requests` library](http://docs.python-requests.org/en/master/), a nicely designed Python library for making web requests.\n", " \n", "## Geological ages from Wikipedia\n", "\n", "We'll start with the Jurassic, then generalize." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "url = \"http://en.wikipedia.org/wiki/Jurassic\" # Line 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used `View Source` in my browser to figure out where the age range is on the page, and what it looks like. The most predictable spot, one that should work on every period's page, is the infobox. There, the age is given as a range, in italic text, with \"million years ago\" right after it.\n", "\n", "Try to find the same string here." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import requests # I don't count these lines.\n", "r = requests.get(url) # Line 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have the entire text of the webpage, along with some metadata. The text is stored in `r.text`, and I happen to know that the relevant bit is somewhere around the 7500th character:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'>\\n
© Agile Geoscience 2016
\n", "