# EventVestor: Earnings Guidance

In this notebook, we'll take a look at EventVestor's *Earnings Guidance* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents forward-looking earnings guidance provided by companies.

### Blaze
Before we dig into the data, here is how you generally access Quantopian Store datasets. They are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze serves an important function here. Some of these datasets contain many millions of records, and bringing that much data directly into Quantopian Research is not viable. Blaze lets us provide a simple querying interface and shift the burden to the server side.

It is common to use Blaze to reduce your dataset in size, convert it to Pandas, and then use Pandas for further computation, manipulation, and visualization.

Helpful links:
* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)
* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)
* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html)

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:
> `from odo import odo`  
> `odo(expr, pandas.DataFrame)`

### Free samples and limits
One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.
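
Because each response is capped at 10,000 rows, one way to pull a longer history is to split the date range into smaller windows and query each window separately. The helper below is a minimal sketch of that idea. `month_windows` is a hypothetical name, not part of Blaze or the Quantopian API, and the sketch assumes that a single month of data stays under the cap.

```python
from datetime import date


def month_windows(start, end):
    """Yield (window_start, window_end) date pairs covering [start, end).

    Each window is half-open: window_start <= asof_date < window_end.
    """
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month.
        if current.month == 12:
            following = date(current.year + 1, 1, 1)
        else:
            following = date(current.year, current.month + 1, 1)
        # Clip the first and last windows to the requested range.
        yield max(current, start), min(following, end)
        current = following


windows = list(month_windows(date(2012, 1, 1), date(2013, 1, 1)))
print(len(windows))  # 12
print(windows[0])
```

Each pair could then bound one Blaze filter on `asof_date` (for example after formatting the dates with `.isoformat()`), with the per-window results concatenated in pandas afterwards.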

There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:

In [9]:
# import the dataset
from quantopian.interactive.data.eventvestor import earnings_guidance
# or if you want to import the free dataset, use:
# from quantopian.interactive.data.eventvestor import earnings_guidance_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [10]:
# Let's use Blaze's dshape attribute to get a feel for the data
earnings_guidance.dshape

dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  event_phase: ?string,
  guidance_content: ?string,
  guidance_gaap: ?string,
  guidance_trend: ?string,
  guidance_quality: ?string,
  fiscal_quarter: ?string,
  eps_low: ?float64,
  eps_high: ?float64,
  revenue_low: ?float64,
  revenue_high: ?float64,
  netincome_low: ?float64,
  netincome_high: ?float64,
  fiscal_year: ?string,
  annual_trend: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")
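
In a datashape, a leading `?` marks a field as optional, meaning it may hold missing values. The sketch below shows one way to count missing values per column once the data is in pandas. The small DataFrame is hand-built from a few fields of the first three rows of the dataset, and it assumes that missing strings arrive as `None`; how the conversion actually represents them is not confirmed here.

```python
import pandas as pd

# Hand-built stand-in for a converted slice of the dataset.
sample = pd.DataFrame({
    'symbol': ['INOD', 'SLG', 'AEO'],
    'guidance_trend': ['Higher', None, 'Higher'],
    'fiscal_year': ['FY 06', 'FY 07', None],
})

# isnull() flags missing entries; summing the flags counts them per column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```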

In [11]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
earnings_guidance.count()

In [12]:
# Let's see what the data looks like. We'll grab the first three rows.
earnings_guidance[:3]

Unnamed: 0,event_id,asof_date,trade_date,symbol,event_type,event_headline,event_phase,guidance_content,guidance_gaap,guidance_trend,guidance_quality,fiscal_quarter,eps_low,eps_high,revenue_low,revenue_high,netincome_low,netincome_high,fiscal_year,annual_trend,event_rating,timestamp,sid
0,933903,2007-01-02,2007-01-02,INOD,Guidance,Numerex Raises 4Q and FY 06 Guidance,,Other Financial,GAAP,Higher,Open Ended,4Q 06,0.0,0.0,0,0,0,0,FY 06,Higher,1,2007-01-03,9581
1,138379,2007-01-02,2007-01-03,SLG,Guidance,SL Green Realty Issues FY 07 FFO Guidance,,Other Financial,Non-GAAP,,,,0.0,0.0,0,0,0,0,FY 07,New,1,2007-01-03,17448
2,137809,2007-01-03,2007-01-03,AEO,Guidance,American Eagle Raises 4Q 06 EPS Guidance,,EPS,GAAP,Higher,Range,4Q 06,0.64,0.65,0,0,0,0,,,1,2007-01-04,11086


Let's go over the columns:
- **event_id**: the unique identifier for this event.
- **asof_date**: EventVestor's timestamp of event capture.
- **trade_date**: for event announcements made before trading ends, trade_date is the same as the event date. For announcements issued after market close, trade_date is the next market open day.
- **symbol**: stock ticker symbol of the affected company.
- **event_type**: this should always be *Guidance*.
- **event_headline**: a brief description of the event
- **event_phase**: the inclusion of this field is likely an error on the part of the data vendor. We're currently attempting to resolve this.
- **guidance_content**: values include *EPS, EPS & Financial, Operational, Other Financial*
- **guidance_gaap**: values include *GAAP, Non-GAAP*
- **guidance_trend**: values include *Higher, Lower, Narrows, New, Reiterate, Withdrawal*
- **guidance_quality**: values include *Open Ended, Point, Range*
- **fiscal_quarter**: fiscal quarter for which guidance is provided
- **eps_low**: low end of the quarterly EPS guidance
- **eps_high**: high end of the quarterly EPS guidance
- **revenue_low**: low end of the quarterly revenue guidance
- **revenue_high**: high end of the quarterly revenue guidance
- **netincome_low**: low end of the quarterly net income guidance
- **netincome_high**: high end of the quarterly net income guidance
- **fiscal_year**: fiscal year for which the quarterly guidance is provided
- **annual_trend**: the annual guidance trend. Values include *Higher, Lower, Narrows, New, Reiterate, Withdrawal*
- **event_rating**: this is always 1. The meaning of this is uncertain.
- **timestamp**: this is our timestamp on when we registered the data.
- **sid**: the equity's unique identifier. Use this instead of the symbol.
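
The `trade_date` rule in the column list above can be sketched in a few lines. This is only an illustration: `next_trade_date` is a hypothetical helper, the 16:00 close and the announcement times are assumptions, and the sketch skips weekends but not exchange holidays, so it should not be read as EventVestor's actual logic.

```python
from datetime import datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumed closing time


def next_trade_date(announced_at):
    """Return the first trading date on which the announcement can be acted upon."""
    day = announced_at.date()
    # After the close, the event belongs to the following day.
    if announced_at.time() >= MARKET_CLOSE:
        day += timedelta(days=1)
    # Roll weekend dates forward to Monday (holidays are ignored here).
    while day.weekday() >= 5:
        day += timedelta(days=1)
    return day


print(next_trade_date(datetime(2012, 1, 24, 16, 30)))  # 2012-01-25
print(next_trade_date(datetime(2012, 7, 26, 9, 0)))    # 2012-07-26
```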

We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, including every equity database, so the datasets are easy to combine.
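
As a rough illustration of why a shared `sid` helps, the sketch below joins two small pandas DataFrames on that column. Both frames are hand-built stand-ins: the `guidance` rows reuse values from the first three rows of the dataset, while `other_events` and its `note` column are hypothetical and do not correspond to any actual Store dataset.

```python
import pandas as pd

# Stand-in for a converted slice of the guidance dataset.
guidance = pd.DataFrame({
    'sid': [9581, 17448, 11086],
    'guidance_trend': ['Higher', None, 'Higher'],
})

# Hypothetical second dataset keyed by the same sid.
other_events = pd.DataFrame({
    'sid': [11086, 17448],
    'note': ['hypothetical event A', 'hypothetical event B'],
})

# An inner join keeps only the sids present in both frames.
combined = guidance.merge(other_events, on='sid', how='inner')
print(combined)
```

Joining on `sid` rather than `symbol` avoids mismatches when a ticker changes or is reused.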

We can select columns and rows with ease. Below, we'll fetch all of Apple's entries from 2012.

In [13]:
# get apple's sid first
aapl_sid = symbols('AAPL').sid
aapl_earnings = earnings_guidance[
    ('2011-12-31' < earnings_guidance['asof_date']) &
    (earnings_guidance['asof_date'] < '2013-01-01') &
    (earnings_guidance.sid == aapl_sid)
]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
aapl_earnings.sort('asof_date')

Unnamed: 0,event_id,asof_date,trade_date,symbol,event_type,event_headline,event_phase,guidance_content,guidance_gaap,guidance_trend,guidance_quality,fiscal_quarter,eps_low,eps_high,revenue_low,revenue_high,netincome_low,netincome_high,fiscal_year,annual_trend,event_rating,timestamp,sid
0,1385926,2012-01-24,2012-01-25,AAPL,Guidance,Apple Issues 2Q 12 Guidance,,EPS & Financial,GAAP,New,Point,2Q 12,8.5,8.5,3250,3250,0,0,,,1,2012-01-25,24
1,1421102,2012-04-24,2012-04-25,AAPL,Guidance,Apple Issues 3Q 12 Guidance,,EPS & Financial,GAAP,New,Point,3Q 12,8.68,8.68,34000,34000,0,0,,,1,2012-04-25,24
2,1456501,2012-07-24,2012-07-25,AAPL,Guidance,Apple Issues 4Q 12 Guidance,,EPS & Financial,GAAP,New,Point,4Q 12,7.65,7.65,34000,34000,0,0,,,1,2012-07-25,24
3,1496798,2012-10-25,2012-10-26,AAPL,Guidance,Apple Issues 1Q 13 Guidance,,EPS & Financial,GAAP,New,Point,1Q 13,11.75,11.75,5200,5200,0,0,,,1,2012-10-26,24
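
For comparison, the same kind of row selection in plain pandas looks almost identical. The sketch below is an assumption-laden stand-in: it uses a tiny hand-built DataFrame (three rows with values copied from the outputs in this notebook) instead of the real dataset, and the names `sample` and `aapl_2012` are introduced here for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    'asof_date': pd.to_datetime(['2007-01-03', '2012-01-24', '2012-10-25']),
    'symbol': ['AEO', 'AAPL', 'AAPL'],
    'sid': [11086, 24, 24],
})

# Combine the date bounds and the sid test into one boolean mask.
mask = (
    (sample['asof_date'] > pd.Timestamp('2011-12-31'))
    & (sample['asof_date'] < pd.Timestamp('2013-01-01'))
    & (sample['sid'] == 24)
)
aapl_2012 = sample[mask].sort_values('asof_date')
print(aapl_2012)
```

Each comparison needs its own parentheses because `&` binds more tightly than `<` and `==` in Python.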


Finally, suppose we want a DataFrame of all earnings guidance releases in 2012 in which revenue_low and revenue_high differ. Then we'll compute by how much they differ!

In [14]:
# manipulate with Blaze first:
# use < '2013-01-01' as the upper bound so that events dated 2012-12-31 are included
twentytwelve = earnings_guidance[('2012-01-01' <= earnings_guidance['asof_date']) & (earnings_guidance['asof_date'] < '2013-01-01')]
# now that we've got a much smaller object (len: ~39000 rows), we can convert it to a pandas DataFrame
df = odo(twentytwelve, pd.DataFrame)
df = df[df.revenue_low != df.revenue_high]
df['revenue_difference'] = df.revenue_high - df.revenue_low
# sort_values replaces DataFrame.sort, which was deprecated in pandas 0.17 and later removed
df = df.sort_values('revenue_difference', ascending=False)
df = df.reset_index(drop=True)
# When printed: pandas DataFrames display the head(30) and tail(30) rows, and truncate the middle.
df

Unnamed: 0,event_id,asof_date,trade_date,symbol,event_type,event_headline,event_phase,guidance_content,guidance_gaap,guidance_trend,...,revenue_low,revenue_high,netincome_low,netincome_high,fiscal_year,annual_trend,event_rating,timestamp,sid,revenue_difference
0,1930090,2012-07-26,2012-07-26,ATE,Guidance,Advantest Corp. Issues 2Q 12 & Narrows FY 12 O...,,Other Financial,GAAP,New,...,72000.00,77000.00,0.0,0.0,FY 12,Narrows,1,2012-07-27,23052,5000.00
1,1496843,2012-10-25,2012-10-26,AMZN,Guidance,Amazon.com Issues 4Q 12 Guidance,,Other Financial,GAAP,New,...,20250.00,22750.00,0.0,0.0,,,1,2012-10-26,16841,2500.00
2,1496029,2012-10-25,2012-10-25,TSM,Guidance,Taiwan Semiconductor Issues 4Q 12 Guidance,,Other Financial,GAAP,New,...,129000.00,131000.00,0.0,0.0,,,1,2012-10-26,17773,2000.00
3,1490534,2012-09-10,2012-09-10,TSM,Guidance,Taiwan Semiconductor Raises 3Q 12 Guidance,,Other Financial,GAAP,Higher,...,136000.00,138000.00,0.0,0.0,,,1,2012-09-11,17773,2000.00
4,1454839,2012-07-19,2012-07-19,TSM,Guidance,Taiwan Semiconductor Issues 3Q 12 Guidance,,Other Financial,GAAP,New,...,136000.00,138000.00,0.0,0.0,,,1,2012-07-20,17773,2000.00
5,1384154,2012-01-18,2012-01-18,TSM,Guidance,Taiwan Semiconductor Issues 1Q & FY 12 Guidance,,Other Financial,GAAP,New,...,103000.00,105000.00,0.0,0.0,FY 12,New,1,2012-01-19,17773,2000.00
6,1423149,2012-04-26,2012-04-26,TSM,Guidance,Taiwan Semiconductor Issues 2Q & Revises FY 12...,,Operational,,New,...,126000.00,128000.00,0.0,0.0,FY 12,,1,2012-04-27,17773,2000.00
7,1419797,2012-04-20,2012-04-20,INTU,Guidance,Intuit Reaffirms 3Q & FY 12 Earnings Guidance,,EPS & Financial,GAAP,Reiterate,...,0.00,1950.00,0.0,0.0,FY 12,Reiterate,1,2012-04-21,8655,1950.00
8,1388974,2012-01-31,2012-02-01,AMZN,Guidance,Amazon Issues 1Q 12 Guidance,,Other Financial,GAAP,New,...,12000.00,13400.00,0.0,0.0,,,1,2012-02-01,16841,1400.00
9,1422959,2012-04-26,2012-04-27,AMZN,Guidance,Amazon.com Issues 2Q 12 Guidance,,Other Financial,GAAP,New,...,11900.00,13300.00,0.0,0.0,,,1,2012-04-27,16841,1400.00
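
The absolute difference favors companies with large revenues. One possible follow-up, sketched below, is to scale the spread by the midpoint of the guided range. The DataFrame is hand-built from three rows of the output above, and `revenue_spread_pct` is a column name introduced here for illustration.

```python
import pandas as pd

ranges = pd.DataFrame({
    'symbol': ['ATE', 'AMZN', 'TSM'],
    'revenue_low': [72000.0, 20250.0, 129000.0],
    'revenue_high': [77000.0, 22750.0, 131000.0],
})

# Spread as a percentage of the midpoint of the guided range.
midpoint = (ranges['revenue_low'] + ranges['revenue_high']) / 2
ranges['revenue_spread_pct'] = 100 * (ranges['revenue_high'] - ranges['revenue_low']) / midpoint

ranked = ranges.sort_values('revenue_spread_pct', ascending=False).reset_index(drop=True)
print(ranked)
```

On these three rows the ordering changes: AMZN has the widest relative range even though ATE has the largest absolute difference.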
