{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# EventVestor: Contract Wins\n", "\n", "In this notebook, we'll take a look at EventVestor's *Contract Wins* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents major contract wins by companies.\n", "\n", "### Blaze\n", "Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.\n", "\n", "Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.\n", "\n", "It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.\n", "\n", "Helpful links:\n", "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n", "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n", "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html).\n", "\n", "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:\n", "> `from odo import odo` \n", "> `odo(expr, pandas.DataFrame)`\n", "\n", "### Free samples and limits\n", "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.\n", "\n", "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n", "\n", "With preamble in place, let's get started:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# import the dataset\n", "from quantopian.interactive.data.eventvestor import contract_win\n", "# or if you want to import the free dataset, use:\n", "# from quantopian.data.eventvestor import contract_win_free\n", "\n", "# import data operations\n", "from odo import odo\n", "# import other libraries we will use\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "dshape(\"\"\"var * {\n", " event_id: ?float64,\n", " asof_date: datetime,\n", " trade_date: ?datetime,\n", " symbol: ?string,\n", " event_type: ?string,\n", " event_headline: ?string,\n", " contract_amount: ?float64,\n", " amount_units: ?string,\n", " contract_entity: ?string,\n", " event_rating: ?float64,\n", " timestamp: datetime,\n", " sid: ?int64\n", " }\"\"\")" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's use blaze to understand the data a bit using Blaze dshape()\n", "contract_win.dshape" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "4135" ], "text/plain": [ "4135" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# And how many rows are there?\n", "# N.B. we're using a Blaze function to do this, not len()\n", "contract_win.count()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
event_idasof_datetrade_datesymbolevent_typeevent_headlinecontract_amountamount_unitscontract_entityevent_ratingtimestampsid
09074712007-01-032007-01-03CECEContract WinCECO Environmental Gets Two Orders for $55M Plus55.0$MNaN12007-01-041396
11488872007-01-042007-01-04ATKContract WinAlliant Techsystems Gets $90M Contract from U....90.0$MU.S. Department of Homeland Security12007-01-05NaN
29083412007-01-042007-01-04BCRXContract WinBioCryst Pharma Gets $102.6M Contract From US ...102.6$MU.S. Department of Health and Human Services12007-01-0510905
" ], "text/plain": [ " event_id asof_date trade_date symbol event_type \\\n", "0 907471 2007-01-03 2007-01-03 CECE Contract Win \n", "1 148887 2007-01-04 2007-01-04 ATK Contract Win \n", "2 908341 2007-01-04 2007-01-04 BCRX Contract Win \n", "\n", " event_headline contract_amount \\\n", "0 CECO Environmental Gets Two Orders for $55M Plus 55.0 \n", "1 Alliant Techsystems Gets $90M Contract from U.... 90.0 \n", "2 BioCryst Pharma Gets $102.6M Contract From US ... 102.6 \n", "\n", " amount_units contract_entity event_rating \\\n", "0 $M NaN 1 \n", "1 $M U.S. Department of Homeland Security 1 \n", "2 $M U.S. Department of Health and Human Services 1 \n", "\n", " timestamp sid \n", "0 2007-01-04 1396 \n", "1 2007-01-05 NaN \n", "2 2007-01-05 10905 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's see what the data looks like. We'll grab the first three rows.\n", "contract_win[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's go over the columns:\n", "- **event_id**: the unique identifier for this contract win.\n", "- **asof_date**: EventVestor's timestamp of event capture.\n", "- **trade_date**: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.\n", "- **symbol**: stock ticker symbol of the affected company.\n", "- **event_type**: this should always be *Contract Win*.\n", "- **contract_amount**: the amount of amount_units the contract is for.\n", "- **amount_units**: the currency or other units for the value of the contract. Most commonly in millions of dollars.\n", "- **contract_entity**: name of the customer, if available\n", "- **event_rating**: this is always 1. The meaning of this is uncertain.\n", "- **timestamp**: this is our timestamp on when we registered the data.\n", "- **sid**: the equity's unique identifier. Use this instead of the symbol." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the `sid` across all our equity databases.\n", "\n", "We can select columns and rows with ease. Below, we'll fetch all contract wins by Boeing. We'll display only the contract_amount, amount_units, contract_entity, and timestamp. We'll sort by date." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampcontract_amountamount_unitscontract_entity
02007-04-192500$MSouth Korea
12007-04-20295$MCIT Aerospace
22007-04-241600$MAviation Capital Group
32007-04-253600$MVirgin Atlantic
42007-04-27700$MSpiceJet
52007-05-174700$MTUI Group
62007-05-302400$MRussian Airline S7
72007-06-011900$MRyanair Holdings PLC
82007-06-053000$MKuwait Airways
92007-06-07500$MPhilippine Airlines
102007-06-191420$MGE Commercial Aviation Services
" ], "text/plain": [ " timestamp contract_amount amount_units contract_entity\n", "0 2007-04-19 2500 $M South Korea\n", "1 2007-04-20 295 $M CIT Aerospace\n", "2 2007-04-24 1600 $M Aviation Capital Group\n", "3 2007-04-25 3600 $M Virgin Atlantic \n", "4 2007-04-27 700 $M SpiceJet\n", "5 2007-05-17 4700 $M TUI Group\n", "6 2007-05-30 2400 $M Russian Airline S7\n", "7 2007-06-01 1900 $M Ryanair Holdings PLC\n", "8 2007-06-05 3000 $M Kuwait Airways\n", "9 2007-06-07 500 $M Philippine Airlines\n", "..." ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ba_sid = symbols('BA').sid\n", "wins = contract_win[contract_win.sid == ba_sid][['timestamp', 'contract_amount','amount_units','contract_entity']].sort('timestamp')\n", "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n", "wins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, suppose we want the above as a DataFrame:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampcontract_amountamount_unitscontract_entity
02007-04-192500.0$MSouth Korea
12007-04-20295.0$MCIT Aerospace
22007-04-241600.0$MAviation Capital Group
32007-04-253600.0$MVirgin Atlantic
42007-04-27700.0$MSpiceJet
52007-05-174700.0$MTUI Group
62007-05-302400.0$MRussian Airline S7
72007-06-011900.0$MRyanair Holdings PLC
82007-06-053000.0$MKuwait Airways
92007-06-07500.0$MPhilippine Airlines
102007-06-191420.0$MGE Commercial Aviation Services
112007-06-208800.0$MInternational Lease Finance Corp
122007-06-212700.0$MAir France KLM
132007-07-012000.0$MU.S. Air Force
142007-07-06810.0$MCIT Aerospace
152007-07-094000.0$MAir Berlin
162007-08-03523.0$MAeroSvit
172007-08-041100.0$MAir New Zealand
182007-08-091400.0$MCathay Pacific Airways
192007-08-313100.0$MNorwegian Air Shuttle ASA
202007-09-073800.0$MChina Southern Airlines
212007-09-121100.0$MUS Air Force
222007-10-171500.0$MNaN
232007-11-065000.0$MLAN Airlines
242007-11-095200.0$MCathay Pacific
252007-11-123200.0$MEmirates
262007-11-14523.0$Mtransavia.com
272007-11-23716.0$MKLM Royal Dutch Airlines
282007-12-051700.0$MLion Air
292007-12-111500.0$MBabcock & Brown
...............
1912014-01-078800.0$Mflydubai
1922014-01-213900.0$MGE Capital Aviation Services
1932014-02-06228.0$MLinhas Aereas de Mocambique
1942014-02-15357.5$MCargolux Airlines
1952014-03-134400.0$MSpiceJet Ltd.
1962014-03-20830.0$MComair Limited
1972014-05-01452.0$MRyanair
1982014-05-311100.0$MJapan Transocean Air
1992014-06-171600.0$MTurkish Airlines
2002014-06-27272.0$MBelarus flag carrier Belavia Airlines
2012014-07-1056000.0$MEmirates Airline
2022014-07-15980.0$MOkay Airways Company
2032014-07-152000.0$MAvolon
2042014-07-161900.0$MIntrepid Aviation
2052014-07-171890.0$MQatar Airways
2062014-07-17152.0$MNaN
2072014-08-268800.0$MBOC Aviation
2082014-09-182100.0$MAvolon
2092014-09-212100.0$MEthiopian Airlines
2102014-10-07990.0$MAlaska Airlines
2112014-12-0211000.0$MRyanair
2122014-12-16438.0$MJetlines
2132014-12-19186.0$MBOC Aviation
2142014-12-243300.0$MKuwait Airways
2152015-01-07514.0$MAir New Zealand
2162015-01-071240.0$MQatar Airways
2172015-01-163600.0$MAir Europa
2182015-02-131600.0$MTransavia Company
2192015-02-131500.0$MKorean Air
2202015-03-28900.0$MAll Nippon Airways
\n", "

221 rows × 4 columns

\n", "
" ], "text/plain": [ " timestamp contract_amount amount_units \\\n", "0 2007-04-19 2500.0 $M \n", "1 2007-04-20 295.0 $M \n", "2 2007-04-24 1600.0 $M \n", "3 2007-04-25 3600.0 $M \n", "4 2007-04-27 700.0 $M \n", "5 2007-05-17 4700.0 $M \n", "6 2007-05-30 2400.0 $M \n", "7 2007-06-01 1900.0 $M \n", "8 2007-06-05 3000.0 $M \n", "9 2007-06-07 500.0 $M \n", "10 2007-06-19 1420.0 $M \n", "11 2007-06-20 8800.0 $M \n", "12 2007-06-21 2700.0 $M \n", "13 2007-07-01 2000.0 $M \n", "14 2007-07-06 810.0 $M \n", "15 2007-07-09 4000.0 $M \n", "16 2007-08-03 523.0 $M \n", "17 2007-08-04 1100.0 $M \n", "18 2007-08-09 1400.0 $M \n", "19 2007-08-31 3100.0 $M \n", "20 2007-09-07 3800.0 $M \n", "21 2007-09-12 1100.0 $M \n", "22 2007-10-17 1500.0 $M \n", "23 2007-11-06 5000.0 $M \n", "24 2007-11-09 5200.0 $M \n", "25 2007-11-12 3200.0 $M \n", "26 2007-11-14 523.0 $M \n", "27 2007-11-23 716.0 $M \n", "28 2007-12-05 1700.0 $M \n", "29 2007-12-11 1500.0 $M \n", ".. ... ... ... \n", "191 2014-01-07 8800.0 $M \n", "192 2014-01-21 3900.0 $M \n", "193 2014-02-06 228.0 $M \n", "194 2014-02-15 357.5 $M \n", "195 2014-03-13 4400.0 $M \n", "196 2014-03-20 830.0 $M \n", "197 2014-05-01 452.0 $M \n", "198 2014-05-31 1100.0 $M \n", "199 2014-06-17 1600.0 $M \n", "200 2014-06-27 272.0 $M \n", "201 2014-07-10 56000.0 $M \n", "202 2014-07-15 980.0 $M \n", "203 2014-07-15 2000.0 $M \n", "204 2014-07-16 1900.0 $M \n", "205 2014-07-17 1890.0 $M \n", "206 2014-07-17 152.0 $M \n", "207 2014-08-26 8800.0 $M \n", "208 2014-09-18 2100.0 $M \n", "209 2014-09-21 2100.0 $M \n", "210 2014-10-07 990.0 $M \n", "211 2014-12-02 11000.0 $M \n", "212 2014-12-16 438.0 $M \n", "213 2014-12-19 186.0 $M \n", "214 2014-12-24 3300.0 $M \n", "215 2015-01-07 514.0 $M \n", "216 2015-01-07 1240.0 $M \n", "217 2015-01-16 3600.0 $M \n", "218 2015-02-13 1600.0 $M \n", "219 2015-02-13 1500.0 $M \n", "220 2015-03-28 900.0 $M \n", "\n", " contract_entity \n", "0 South Korea \n", "1 CIT Aerospace \n", "2 Aviation Capital Group \n", "3 Virgin Atlantic \n", "4 SpiceJet \n", "5 TUI Group \n", "6 Russian Airline S7 \n", "7 Ryanair Holdings PLC \n", "8 Kuwait Airways \n", "9 Philippine Airlines \n", "10 GE Commercial Aviation Services \n", "11 International Lease Finance Corp \n", "12 Air France KLM \n", "13 U.S. Air Force \n", "14 CIT Aerospace \n", "15 Air Berlin \n", "16 AeroSvit \n", "17 Air New Zealand \n", "18 Cathay Pacific Airways \n", "19 Norwegian Air Shuttle ASA \n", "20 China Southern Airlines \n", "21 US Air Force \n", "22 NaN \n", "23 LAN Airlines \n", "24 Cathay Pacific \n", "25 Emirates \n", "26 transavia.com \n", "27 KLM Royal Dutch Airlines \n", "28 Lion Air \n", "29 Babcock & Brown \n", ".. ... \n", "191 flydubai \n", "192 GE Capital Aviation Services \n", "193 Linhas Aereas de Mocambique \n", "194 Cargolux Airlines \n", "195 SpiceJet Ltd. \n", "196 Comair Limited \n", "197 Ryanair \n", "198 Japan Transocean Air \n", "199 Turkish Airlines \n", "200 Belarus flag carrier Belavia Airlines \n", "201 Emirates Airline \n", "202 Okay Airways Company \n", "203 Avolon \n", "204 Intrepid Aviation \n", "205 Qatar Airways \n", "206 NaN \n", "207 BOC Aviation \n", "208 Avolon \n", "209 Ethiopian Airlines \n", "210 Alaska Airlines \n", "211 Ryanair \n", "212 Jetlines \n", "213 BOC Aviation \n", "214 Kuwait Airways \n", "215 Air New Zealand \n", "216 Qatar Airways \n", "217 Air Europa \n", "218 Transavia Company \n", "219 Korean Air \n", "220 All Nippon Airways \n", "\n", "[221 rows x 4 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ba_df = odo(wins, pd.DataFrame)\n", "# Printing a pandas DataFrame displays the first 30 and last 30 items, and truncates the middle.\n", "ba_df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }