{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import requests # how python goes onto the internet!\n", "import re # regex\n", "from bs4 import BeautifulSoup # a python HTML parser (Beautiful Soup 4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# let's turn to stock prices\n", "# http://finance.google.com/finance?q=aapl" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "symbol = 'aapl'" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Let's grab the raw html from the page\n", "r = requests.get('http://finance.google.com/finance?q='+symbol) # the url of the google finance page goes in here" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b = BeautifulSoup(r.text, \"html5lib\") # create a BeautifulSoup object\n", "# b = BeautifulSoup(r.text, 'html.parser') # try this line instead if html5lib is not installed" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'\\n\\n \\n \\n \\n Apple Inc.: NASDAQ:AAPL quotes & news - Google Finance\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
\\n
\\n \\n \\n \\n \\n
\\n \\n \\n \\n \\n \\n \\n \\n \\n Help\\n \\n |\\n \\n Sign in\\n \\n \\n
\\n \\n \"Google\\n \\n

\\n Recent Quotes\\n \\n (\\n \\n 30 days\\n \\n )\\n \\n

\\n
\\n You have no recent quotes\\n
\\n
\\n
\\n \\n chg\\n \\n |\\n \\n %\\n \\n

\\n Apple Inc.\\n

\\n (Public, NASDAQ:AAPL)\\n \\n Watch this stock\\n \\n
\\n
\\n Find more results for\\n \\n \\n aapl\\n \\n \\n
\\n \\n \\n 108.54\\n \\n \\n
\\n \\n \\n -2.42\\n \\n \\n (-2.18%)\\n \\n \\n
\\n
\\n
\\n After Hours:\\n \\n 108.39\\n \\n \\n -0.15\\n \\n \\n (-0.14%)\\n \\n
\\n Apr 7, 7:59PM EDT\\n
\\n \\n \\n NASDAQ\\nreal-time data -\\n \\n Disclaimer\\n \\n \\n \\n
\\n Currency in USD\\n
\\n Range\\n \\n 108.12 - 110.42\\n
\\n 52 week\\n \\n 92.00 - 134.54\\n
\\n Open\\n \\n 109.95\\n
\\n Vol / Avg.\\n \\n 31.80M/33.01M\\n
\\n Mkt cap\\n \\n 608.11B\\n
\\n P/E\\n \\n 11.53\\n
\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
\\n Div/yield\\n \\n 0.52/1.92\\n
\\n EPS\\n \\n 9.41\\n
\\n Shares\\n \\n 5.54B\\n
\\n Beta\\n \\n 1.01\\n
\\n Inst. own\\n \\n 59%\\n

\\n News\\n

\\n Advertisement\\n

\\n Events\\n

\\n
\\n \\n
\\n
\\n
\\n
\\n Apr 25, 2016\\n
\\n
\\n Q2 2016 Apple Inc Earnings Release\\n-\\n \\n 4:00PM EDT\\n \\n -\\n \\n \"Add\\n \\n
\\n
\\n
\\n
\\n Mar 9, 2016\\n
\\n
\\n Apple Inc Annual Shareholders Meeting (Estimated)\\n
\\n
\\n
\\n
\\n Feb 26, 2016\\n
\\n
\\n Apple Inc Annual Shareholders Meeting\\n
\\n
\\n
\\n
\\n Jan 26, 2016\\n
\\n
\\n Q1 2016 Apple Inc Earnings Call\\n
\\n
\\n
\\n
\\n Jan 26, 2016\\n
\\n
\\n Q1 2016 Apple Inc Earnings Release\\n
\\n
\\n \\n \\n \\n \\n \\n \\n
\\n \\n More events from DailyFinance \\xbb\\n \\n \\n \\n
\\n
\\n
\\n

\\n Key stats and ratios\\n

\\n
\\n
\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
\\n \\n Q4 (Dec \\'15)\\n \\n 2015\\n
\\n Net profit margin\\n \\n 24.20%\\n \\n 22.85%\\n
\\n Operating margin\\n \\n 31.86%\\n \\n 30.48%\\n
\\n EBITD margin\\n \\n -\\n \\n 34.97%\\n
\\n Return on average assets\\n \\n 25.23%\\n \\n 20.45%\\n
\\n Return on average equity\\n \\n 59.48%\\n \\n 46.25%\\n
\\n Employees\\n \\n 110,000\\n \\n -\\n
\\n CDP Score\\n \\n -\\n \\n \\n 99 A\\n \\n
\\n \\n
\\n
\\n

\\n Address\\n

\\n
\\n
\\n 1 Infinite Loop\\n
\\n CUPERTINO, CA 95014-2083\\n
\\n United States\\n-\\n \\n Map\\n \\n
\\n +1-408-9961010 (Phone)\\n
\\n
\\n
\\n

\\n Website links\\n

\\n
\\n \\n
\\n

\\n External links\\n

\\n
\\n
\\n
\\n \\n Analyst Estimates\\n \\n -\\n \\n MarketWatch\\n \\n
\\n
\\n \\n SEC Filings\\n \\n -\\n \\n EDGAR Online\\n \\n
\\n
\\n \\n Major Holders\\n \\n -\\n \\n MSN Money\\n \\n
\\n
\\n \\n Research Reports\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n \\n Transcripts\\n \\n -\\n \\n SeekingAlpha\\n \\n
\\n
\\n \\n About Company\\n \\n -\\n \\n Wikipedia\\n \\n
\\n \\n Volume delayed by 15 mins.\\n
\\n Prices are not from all markets.\\n
\\n Sources include SIX.\\n
\\n Advertisement\\n

\\n Description\\n

\\n Apple Inc. (Apple) designs, manufactures and markets mobile communication and media devices, personal computers, and portable digital music players, and a variety of related software, services, peripherals, networking solutions, and third-party digital content and applications. The Company\\'s products and services include iPhone, iPad, Mac, iPod, Apple TV, a portfolio of consumer and professional software applications, the iOS and OS X operating systems, iCloud, and a variety of accessory, service and support offerings. The Company also delivers digital content and applications through the iTunes Store, App StoreSM, iBookstoreSM, and Mac App Store. The Company distributes its products worldwide through its retail stores, online stores, and direct sales force, as well as through third-party cellular network carriers, wholesalers, retailers, and value-added resellers. In February 2012, the Company acquired app-search engine Chomp.\\n \\n

\\n Officers and directors\\n

\\n Art D. Levinson Ph.D.\\n \\n \\n Independent Chairman of the Board\\n
\\n \\n \\n \\n Age: 65\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Timothy D. Cook\\n \\n \\n Chief Executive Officer, Director\\n
\\n \\n \\n \\n Age: 55\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n \\n Trading\\xa0Activity\\n \\n -\\n \\n Yahoo\\xa0Finance\\n \\n
\\n
\\n Luca Maestri\\n \\n \\n Chief Financial Officer, Senior Vice President, Principal Accounting Officer\\n
\\n \\n \\n \\n Age: 52\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Jeff Williams\\n \\n \\n Chief Operating Officer\\n
\\n \\n \\n \\n Age: 52\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n .. Bruce Sewell\\n \\n \\n Senior Vice President, General Counsel, Secretary\\n
\\n \\n \\n \\n Age: 57\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Phil Schiller\\n \\n \\n Senior Vice President - Worldwide Marketing\\n
\\n \\n \\n \\n Age: 55\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Angela J. Ahrendts\\n \\n \\n Senior Vice President - Retail and Online Stores\\n
\\n \\n \\n \\n Age: 55\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Eddy Cue\\n \\n \\n Senior Vice President - Internet Software and Services\\n
\\n \\n \\n \\n Age: 51\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Craig Federighi\\n \\n \\n Senior Vice President - Software Engineering\\n
\\n \\n \\n \\n Age: 46\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n Dan Riccio\\n \\n \\n Senior Vice President - Hardware Engineering\\n
\\n \\n \\n \\n Age: 53\\n
\\n \\n Bio\\xa0&\\xa0Compensation\\n \\n -\\n \\n Reuters\\n \\n
\\n
\\n \\n \\n'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.prettify() # will print the html nicely" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ,\n", " (30 days),\n", " chg,\n", " %,\n", " \\n109.38\\n,\n", " 109.38,\n", " -0.10\\n(-0.09%)\\n,\n", " -0.10,\n", " (-0.09%),\n", " 109.38,\n", " 0.00,\n", " (0.00%),\n", " NASDAQ\\nreal-time data -\\nDisclaimer\\n,\n", " 4:00PM EDT,\n", " Settings,\n", " Technicals,\n", " \\n\"Link\\n\\nLink to this view\\n,\n", " \\nLink to this view,\n", " Volume delayed by 15 mins.
Prices are not from all markets.
Sources include SIX.
\\n


,\n", " Add or remove columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# find all span tags\n", "b.findAll('span')\n", "# b.find_all('span') is the preferred spelling in bs4; findAll still works as an alias" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# the id pattern we recognized in the page source\n", "re_tag = re.compile(r\"ref_\\d+_l\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# this finds the span tag with the price in it\n", "span_tag = b.find('span', attrs={'id': re_tag})\n", "# find returns the FIRST span tag whose id matches our regex (or None if nothing matches)\n", "# use findAll to get every match" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'109.38'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quote = span_tag.text\n", "quote" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true }, "outputs": [], "source": [ "##### EXERCISE #####\n", "# Write a function get_stock_price that takes in ANY stock ticker and grabs the current price\n", "# If the stock ticker doesn't exist, return -1\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def get_stock_price(ticker):\n", "    # fetch the quote page for this ticker and parse it\n", "    response = requests.get(\"http://google.com/finance?q=\"+ticker)\n", "    parser = BeautifulSoup(response.text, \"html.parser\")\n", "    pattern = re.compile(r\"ref_\\d+_l\")\n", "    span_tag = parser.find(\"span\", attrs={\"id\":pattern})\n", "    if span_tag:\n", "        return span_tag.text  # the price, as a string\n", "    else:\n", "        return -1  # no price tag found, so the ticker doesn't exist" ] }, { "cell_type": "code", "execution_count": 13, "metadata": 
{ "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "31.19\n", "217.96\n" ] } ], "source": [ "for ticker in ['ge', 'spy']:\n", "    print(get_stock_price(ticker))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": true }, "outputs": [], "source": [ "###### UFO #######" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "r = requests.get(\"http://www.nuforc.org/webreports/ndxe201608.html\")\n", "b = BeautifulSoup(r.text, 'html.parser')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8/2/16 00:45\n", "8/2/16 00:45\n", "8/2/16 00:45\n", "Alexandria (UK/Scotland)\n", "Alexandria (UK/Scotland)\n", "\n", "\n", "\n", "Light\n", "Light\n", "2 minutes\n", "2 minutes\n", "Myself and my partner where out smoking when we both seen a light (the same size as a star but brighter) travelling at great speed (abo\n", "Myself and my partner where out smoking when we both seen a light (the same size as a star but brighter) travelling at great speed (abo\n", "8/2/16\n", "8/2/16\n" ] } ], "source": [ "# Let's take a look at the first sighting\n", "for tr in b.findAll('tr', attrs = {'valign':'TOP'})[:1]:\n", "    # findChildren returns every descendant tag, so nested tags repeat the same text\n", "    for child in tr.findChildren():\n", "        print(child.text)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# OK, it's a bit messy, let's clean it up\n", "# Looks like the 1st element is the date, the 4th is the city, the 6th is the state, the 8th is the shape (this one's blank)\n", "# and the 13th is the summary" ] }, { 
"cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'City': [u'Alexandria (UK/Scotland)',\n", " u'Hiltons',\n", " u'Port Colbourne (Ontario)(Canada)',\n", " u'Zurich (Switzerland)',\n", " u'Shoreline',\n", " u'Shelbyville',\n", " u'Colorado Springs',\n", " u'Omaha',\n", " u'Jacksonville',\n", " u'Lakewood'],\n", " 'Date': [u'8/2/16 00:45',\n", " u'8/1/16 22:45',\n", " u'8/1/16 22:30',\n", " u'8/1/16 22:00',\n", " u'8/1/16 21:45',\n", " u'8/1/16 21:40',\n", " u'8/1/16 21:30',\n", " u'8/1/16 21:27',\n", " u'8/1/16 20:45',\n", " u'8/1/16 03:25'],\n", " 'Shape': [u'',\n", " u'Formation',\n", " u'Light',\n", " u'',\n", " u'Unknown',\n", " u'Light',\n", " u'Formation',\n", " u'Circle',\n", " u'Changing',\n", " u'Light'],\n", " 'State': [u'', u'VA', u'ON', u'', u'WA', u'IN', u'CO', u'NE', u'FL', u'CO'],\n", " 'Summary': [u'Myself and my partner where out smoking when we both seen a light (the same size as a star but brighter) travelling at great speed (abo',\n", " u'Two slow moving strobing lights moving steady in a western direction.',\n", " u'LOTS OF ACTIVITY OVER LAKE ERIE DURING FIREWORKS ON CIVIC HOLIDAY.',\n", " u'Four floating lights, hovered about 500m above ground then disappeared. ((anonymous report))',\n", " u'I noticed a light way up in the sky, and quickly realized a second light following. ((anonymous report))',\n", " u\"Orange orb gliding above tree line from east to west. 
It blinked it's light off then on 3-4 times and dropped a faint silver object.\",\n", " u'5 colored lights moving in sky east to west.',\n", " u'Black ufo spotted on a late night drive.',\n", " u'Three bright reds lights crossing the sky, and straight up and out of site.',\n", " u'Sky cracked open.']}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo_sightings = {\n", "    'Date':[],\n", "    'City':[],\n", "    'State':[],\n", "    'Shape':[],\n", "    'Summary':[]\n", "}\n", "\n", "for tr in b.findAll('tr', attrs = {'valign':'TOP'}):\n", "    # findChildren returns every descendant tag, not just the direct children\n", "    ufo_sighting_info = []\n", "    for child in tr.findChildren():\n", "        ufo_sighting_info.append(child.text)\n", "    # pick out the positions identified above\n", "    ufo_sightings['Date'].append(ufo_sighting_info[0])\n", "    ufo_sightings['City'].append(ufo_sighting_info[3])\n", "    ufo_sightings['State'].append(ufo_sighting_info[5])\n", "    ufo_sightings['Shape'].append(ufo_sighting_info[7])\n", "    ufo_sightings['Summary'].append(ufo_sighting_info[12])\n", "\n", "ufo_sightings" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
  | City | Date | Shape | State | Summary
0 | Alexandria (UK/Scotland) | 8/2/16 00:45 |  |  | Myself and my partner where out smoking when w...
1 | Hiltons | 8/1/16 22:45 | Formation | VA | Two slow moving strobing lights moving steady ...
2 | Port Colbourne (Ontario)(Canada) | 8/1/16 22:30 | Light | ON | LOTS OF ACTIVITY OVER LAKE ERIE DURING FIREWOR...
3 | Zurich (Switzerland) | 8/1/16 22:00 |  |  | Four floating lights, hovered about 500m above...
4 | Shoreline | 8/1/16 21:45 | Unknown | WA | I noticed a light way up in the sky, and quick...
5 | Shelbyville | 8/1/16 21:40 | Light | IN | Orange orb gliding above tree line from east t...
6 | Colorado Springs | 8/1/16 21:30 | Formation | CO | 5 colored lights moving in sky east to west.
7 | Omaha | 8/1/16 21:27 | Circle | NE | Black ufo spotted on a late night drive.
8 | Jacksonville | 8/1/16 20:45 | Changing | FL | Three bright reds lights crossing the sky, and...
9 | Lakewood | 8/1/16 03:25 | Light | CO | Sky cracked open.
\n", "
" ], "text/plain": [ " City Date Shape State \\\n", "0 Alexandria (UK/Scotland) 8/2/16 00:45 \n", "1 Hiltons 8/1/16 22:45 Formation VA \n", "2 Port Colbourne (Ontario)(Canada) 8/1/16 22:30 Light ON \n", "3 Zurich (Switzerland) 8/1/16 22:00 \n", "4 Shoreline 8/1/16 21:45 Unknown WA \n", "5 Shelbyville 8/1/16 21:40 Light IN \n", "6 Colorado Springs 8/1/16 21:30 Formation CO \n", "7 Omaha 8/1/16 21:27 Circle NE \n", "8 Jacksonville 8/1/16 20:45 Changing FL \n", "9 Lakewood 8/1/16 03:25 Light CO \n", "\n", " Summary \n", "0 Myself and my partner where out smoking when w... \n", "1 Two slow moving strobing lights moving steady ... \n", "2 LOTS OF ACTIVITY OVER LAKE ERIE DURING FIREWOR... \n", "3 Four floating lights, hovered about 500m above... \n", "4 I noticed a light way up in the sky, and quick... \n", "5 Orange orb gliding above tree line from east t... \n", "6 5 colored lights moving in sky east to west. \n", "7 Black ufo spotted on a late night drive. \n", "8 Three bright reds lights crossing the sky, and... \n", "9 Sky cracked open. 
" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(ufo_sightings) # MAGIC" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# SOME MORE EXAMPLES" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# NY TIMES ARTICLES ON HOME PAGE" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Release of Code Raises Fears That the N.S.A. Was Hacked\n", "Early Voting Leaves Trump With Less Time to Catch Up\n", "Ousted Fox News Chief Ailes Advising Trump for Debates\n", "Clinton Warns Supporters to Avoid Complacency 8:42 PM ET\n", "F.B.I. 
Gives Congress Documents Related to Clinton E-Mail Case 5:38 PM ET\n", " Podcast: Why They Don’t Trust Her\n", "Death Toll Rises as Louisiana Faces a Flood ‘Disaster’\n", "For a Glimpse of Climate Change, Look to the South \n", "\n", " Deadly Floods \n", "Hand in Hand: Did German Twins’ Finish Cross a Line?\n", "\n", " Meet the Biles Family \n", "Rio Today: Simone Biles Soars to Fourth Gold Medal\n", "Your Evening Briefing\n", "18 Recipes to Replace That Sad Desk Lunch\n", "How to Get the Most out of Visiting World-Famous Sites\n", "John McLaughlin, Scrappy Political Pundit, Dies at 89\n", "Univision Is Said to Buy Gawker for $135 Million\n", "Evictions by Armed Men Rattle Mexican Tourist Town\n", "In August, City Council Slows but It Does Not Stop\n", "Russia Uses Iran Base to Raid Syria, Expanding Its Reach 7:43 PM ET\n", "Bus of Nepal Quake Survivors Falls Off Road, Killing 27 5:39 PM ET\n", "Pennsylvania’s Top Prosecutor Resigns After Conviction 8:30 PM ET\n", "El Chapo’s Son Is Kidnapped From a Party in Mexico 10:05 PM ET\n", "University of California, Berkeley’s Chancellor Resigns 10:09 PM ET\n", "\n", "In This ‘Groundhog Day,’ a Strikingly Fresh Déjà Vu\n", "\n", "\n", "Q. and A.: Anthony Weiner Is Keeping Busy\n", "\n", "\n", "Where Germany (but Not German) Inspired Twain\n", "\n", "Obamacare Will Survive Aetna’s Retreat\n", "Editorial: Mr. Trump’s Foreign Policy Confusions \n", "Room for Debate: How Do Olympians Stay So Driven? \n", "Op-Ed: The Demise of a Prison Lord \n", "Join us on Facebook » \n", "The Umbrella Movement Fights Back\n", "Taking Note: Why Donald Trump’s Test for Immigrants Won’t Work \n", "Op-Ed: How Do Trump’s Conspiracy Theories Go Over in the Middle East? Dangerously. 
\n", "\n", "Brazil Bureau Chief’s Notebook: When the Olympics Media Circus Comes to Town\n", "\n", "\n", "I Tracked and Tried to Outsmart ‘Hamilton’ Scalpers — With 341 Lines of Code\n", "\n", "\n", "Brazil Bureau Chief’s Notebook: When the Olympics Media Circus Comes to Town\n", "\n", "\n", "Play Today’s Puzzle \n", "\n", "\n", "Play Today’s Puzzle \n", "\n", "\n", "Come On Along With Us\n", "\n", "341 Lines of Code to Track ‘Hamilton’ Scalpers\n", "Op-Ed: Get Out of Gun Control, Apple\n", "Taking Summer School to Get Ahead, Not Catch Up\n", "MTV Classic: Millennial Nostalgia, Only on TV\n", "Peter Thiel: Privacy Issues Won’t End With Gawker\n", "Caleb Carr’s New Thriller Takes On Fancy Forensics\n", "The Way to Get Rid of Racism? Redefine It.\n", "How Do Olympians Stay So Driven?\n", "Guantánamo Transfer Is Largest of the Obama Era\n", "How the Arab World Came Apart\n", "Op-Ed: The Umbrella Movement Fights Back\n", "The Next Steps for Marijuana in Canada\n", "\n", "\n", " Tulum Journal: Evictions by Armed Men Rattle a Mexican Tourist Paradise \n", "\n", "\n", "\n", " Son of El Chapo Is Kidnapped at Gunpoint From a Party in Mexico \n", "\n", "\n", "\n", " Deal Professor: Tech Giants Gobble Start-Ups in an Antitrust Blind Spot \n", "\n", "\n", "\n", " Ford Promises Fleets of Driverless Cars Within Five Years \n", "\n", "\n", "\n", " Editorial: Obamacare Will Survive Aetna’s Retreat \n", "\n", "\n", "\n", " Roger Cohen: Brazil’s Uplifting Olympics \n", "\n", "\n", "\n", " As Louisiana Floodwaters Recede, the Scope of Disaster Comes Into View \n", "\n", "\n", "\n", " Pennsylvania Attorney General Quits on Heels of Perjury Conviction \n", "\n", "\n", "\n", " Deal Professor: Tech Giants Gobble Start-Ups in an Antitrust Blind Spot \n", "\n", "\n", "\n", " China Launches Quantum Satellite in Bid to Pioneer Secure Communications \n", "\n", "\n", "\n", " Brantley in Britain: Review: ‘Groundhog Day,’ All Over Again, Now With Song and Dance \n", "\n", "\n", "\n", " Books of 
The Times: Review: ‘The Fire This Time,’ Stoked by Baldwin’s Legacy \n", "\n", "\n", "\n", " Political Memo: Martha’s Vineyard Longs for a President Who R.S.V.P.s ‘Yes’ \n", "\n", "\n", "\n", " Trump Casinos’ Tax Debt Was $30 Million. Then Christie Took Office. \n", "\n", "\n", "\n", " Sharing a Life, but Not a Worldview: Readers React to Trump-Clinton Couples \n", "\n", "\n", "\n", " Pippa Middleton Goes Out on Her Own. But She’s Not Alone. \n", "\n", "\n", "\n", " 8 Gold-Worthy Olympics Movies \n", "\n", "\n", "\n", " Review: ‘When Two Worlds Collide’ Portrays a Battle for the Amazon \n", "\n", "\n", "\n", " First-Degree Murder Charge Added in Killing of Queens Imam and Aide \n", "\n", "\n", "\n", " Searching for a Great White Shark, in the Waters Off Long Island \n", "\n", "\n", "\n", " On Olympics: The Anatomy of a Dive Across the Finish Line \n", "\n", "\n", "\n", " Rio 2016: Rio Olympics Today: Simone Biles Soars to Fourth Gold Medal \n", "\n", "\n", "\n", " Critic’s Notebook: Finding Some Treasures at FringeNYC \n", "\n", "\n", "\n", " Brantley in Britain: Summertime, and the Revivals in London are Breezy \n", "\n", "\n", "\n", " The Psychiatric Question: Is It Fair to Analyze Donald Trump From Afar? \n", "\n", "\n", "\n", " China Launches Quantum Satellite in Bid to Pioneer Secure Communications \n", "\n", "\n", "\n", " Bobby Hutcherson, Vibraphonist With Coloristic Range of Sound, Dies at 75 \n", "\n", "\n", "\n", " Joel Cornette, Star Basketball Player for Butler, Dies at 35 \n", "\n", "\n", "\n", " After His Show Is Canceled, Larry Wilmore Thanks Fans and Comedy Central \n", "\n", "\n", "\n", " The Strange Business of Scoring ‘Stranger Things’ \n", "\n", "\n", "\n", " Osteoporosis, a Disease With Few Treatment Options, May Soon Have One More \n", "\n", "\n", "\n", " The Psychiatric Question: Is It Fair to Analyze Donald Trump From Afar? 
\n", "\n", "\n", "\n", " Pursuits: Uncovering Gay History in San Francisco \n", "\n", "\n", "\n", " 36 Hours: 36 Hours in Minneapolis \n", "\n", "\n", "\n", " Books of The Times: From Bare Knuckles to Idealism in ‘Bobby Kennedy: The Making of a Liberal Icon’ \n", "\n", "\n", "\n", " Egos: ‘Inside My Head I’m a Girl’: Three Ways of Growing Up Gay \n", "\n", "\n", "\n", " Your Money Adviser: Earlier Date for Filing Fafsa Form for College Aid \n", "\n", "\n", "\n", " Employees Sue Four More Universities Over Retirement Plan Fees \n", "\n", "\n", "\n", " Recipe Lab: Building a Better Vegetable Gratin \n", "\n", "\n", "\n", " The Movement to Define Native American Cuisine \n", "\n", "\n", "\n", " Editorial: Donald Trump Courts the Gun Zealots \n", "\n", "\n", "\n", " Frank Bruni: To Get to Harvard, Go to Haiti? \n", "\n", "\n", "\n", " An Island in Maine, Four Sisters, Four Houses \n", "\n", "\n", "\n", " What's Selling Now: Homes That Sold for $700,000 to $899,000 \n", "\n", "\n", "\n", " Who Will Be President? \n", "\n", "\n", "\n", " Public Safety: Is Terrorism Getting Worse? In the West, Yes. In the World, No. \n", "\n", "\n", "\n", " First Words: The Easiest Way to Get Rid of Racism? Just Redefine It. \n", "\n", "\n", "\n", " On Sports: How Do You Tell a Better Story in Sports? 
\n", "\n", "\n", "\n", " Driven: Video Review: The Ferrari 488 GTB Is an Operatic Thrill \n", "\n", "\n", "\n", " New Rules Require Heavy-Duty Trucks to Reduce Emissions by 25% Over the Next Decade \n", "\n", "\n", "\n", " A Brooklyn Cocktail Bar Dedicated to One Unexpected Ingredient \n", "\n", "\n", "\n", " Three's a Trend: Fashion, Food and Fitness Greats Give Back to Rio \n", "\n", "\n", "\n", " Cooking Club: The Gray Ladle: Summer Produce \n", "\n", "\n", "\n", " Brazil Bureau Chief’s Notebook: When the Olympics Media Circus Comes to Town \n", "\n", "Brown Water in a Rental\n", "Search for Homes for Sale or Rent\n", "Sell Your Home\n" ] } ], "source": [ "response = requests.get(\"http://www.nytimes.com/\")\n", "parser = BeautifulSoup(response.text, \"html.parser\")\n", "# every headline on the home page is an h2 tag with class 'story-heading'\n", "for story in parser.findAll(\"h2\", attrs={'class':'story-heading'}):\n", "    print(story.text)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# WIKIPEDIA FEATURED ARTICLE\n", "response = requests.get(\"https://en.wikipedia.org/wiki/Wikipedia:Today%27s_featured_article\")\n", "parser = BeautifulSoup(response.text, \"html.parser\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'\\n\\n\\nToday\\'s featured article\\n\\nAt the top of the Main Page, a summarized lead section from one of Wikipedia\\'s featured articles is displayed as \"Today\\'s featured article\" (TFA). The current month\\'s queue can be found here. TFAs are scheduled by the TFA coordinators, Brianboulton (Brian), Crisco 1492 (Chris) and Dank (Dan). Community discussion of suggestions takes place at the TFA requests page.\\nIf you notice an error in a future TFA summary, you\\'re welcome to fix it yourself, but if the mistake is in today\\'s or tomorrow\\'s summary, you can leave a message at WP:ERRORS to ask an administrator to fix it. 
The summaries are formatted as a single paragraph of around 1,150 characters (including spaces), with no reference tags or alternative names. Only the link to the specified featured article is bolded, and this must be the first link. The summary should be preceded by an appropriate image when available; fair use images are not allowed.\\nThe editnotice template for Today\\'s Featured Article is {{TFA-editnotice}}. It is automatically applied by {{Editnotices/Namespace/Main}} when the article\\'s title matches the contents of {{TFA title}}. To contact the TFA coordinators, please leave a message on the TFA talk page, or type \"{{@TFA}}\" in a signed comment on any talk page.\\n\\n\\n\\nShortcuts:\\n\\nWP:TFA\\nWP:TOFA\\n\\n\\n\\n\\nv\\nt\\ne\\n\\n\\nFeatured content:\\n\\nFeatured articles \\u2190\\nFeatured lists\\nFeatured pictures\\nFeatured portals\\nFeatured topics\\n\\nToday\\'s featured article (TFA):\\n\\nThis month\\'s queue\\nRecent TFAs and statistics\\nCurrent TFA requests\\nPotential TFA requests\\nTFA oddities\\nMost viewed TFAs\\nFeatured articles yet to appear as TFA\\n\\nFeatured article tools:\\n\\nFeatured article criteria\\nFeatured article candidates\\nFeatured article review\\nFeatured article log\\nFeatured article statistics\\nRandom featured article\\nFormer featured articles\\n\\n\\n\\n\\n\\n\\n\\nToday\\'s featured article archive\\n\\n2004\\n2005\\n2006\\n2007\\n2008\\n2009\\n2010\\n2011\\n2012\\n2013\\n2014\\n2015\\n2016\\n\\n\\nJanuary\\nFebruary\\nMarch\\nApril\\nMay\\nJune\\nJuly\\nAugust\\nSeptember\\nOctober\\nNovember\\nDecember\\n\\n\\nPurge cache for this page \\xb7 Purge the main page cache\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nToday\\'s featured article\\n\\n\\n\\n\\n\\nBanksia scabrella, commonly known as the Burma Road banksia, is a species of woody shrub in the genus Banksia. It is classified in the series Abietinae, a group of several species of shrubs with small round or oval flower spikes. 
It occurs in several isolated populations south of Geraldton, Western Australia; the largest is south and east of Mount Adams. Found on sandy soils in heathland or shrubland, it grows to 2\\xa0m (7\\xa0ft) high and 3\\xa0m (10\\xa0ft) across with fine needle-like leaves. Appearing in spring and summer, the flower spikes are tan to cream with purple styles. B. scabrella is killed by fire and regenerates by seed. Originally collected in 1966, it was one of several species previously considered to be forms of Banksia sphaerocarpa, before it was finally described by banksia expert Alex George in his 1981 revision of the genus. Like many members of the Abietinae, it is rarely seen in cultivation, but has been described as having horticultural potential. (Full\\xa0article...)\\nRecently featured:\\n\\n\\n24th Waffen Mountain Division of the SS Karstj\\xe4ger\\nThe Seduction of Ingmar Bergman\\nMeteorological history of Hurricane Dean\\n\\n\\n\\n\\n\\nArchive\\nBy email\\nMore featured articles...\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nTomorrow\\'s featured article\\n\\n\\n\\n\\n\\n\\n\\nHMS Formidable was an Illustrious-class aircraft carrier ordered for the Royal Navy before World War II. Transferred to the Mediterranean Fleet as a replacement for the crippled sister ship Illustrious, Formidable\\'s aircraft played a key role in the Battle of Cape Matapan in early 1941, then provided cover for Allied ships and attacked Axis forces until the carrier was badly damaged by German dive bombers in May. Assigned to the Eastern Fleet in the Indian Ocean in early 1942, the carrier covered the invasion of Diego Suarez in Vichy Madagascar in mid-1942 against the possibility of a sortie by the Japanese into the Indian Ocean. The ship participated in Operation Torch, the invasion of French North Africa, in November, and covered the invasions of Sicily and mainland Italy in 1943. 
Formidable made several attacks on the German battleship\\xa0Tirpitz in Norway with the Home Fleet in mid-1944, and in 1945 attacked targets in the Japanese Home Islands. After repatriating liberated Allied prisoners of war and soldiers and ferrying British personnel across the globe, the ship was placed in reserve, and finally sold for scrap in 1953. (Full\\xa0article...)\\nRecently featured:\\n\\n\\nBanksia scabrella\\n24th Waffen Mountain Division of the SS Karstj\\xe4ger\\nThe Seduction of Ingmar Bergman\\n\\n\\n\\n\\n\\nArchive\\nBy email\\nMore featured articles...\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parser.find('div', attrs={'id':'mw-content-text'}).text" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [sfdat26-env]", "language": "python", "name": "Python [sfdat26-env]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }