# Lecture 34: A Look Ahead; Regression Example, Sampling from a Finite Population


## Stat 110, Prof. Joe Blitzstein, Harvard University

----

## The Top-10 List

1. Conditioning ... is the soul of statistics!
1. Symmetry ... is powerful but dangerous
1. Random variables and their distributions
1. Stories (proofs, backgrounds of the distributions covered)
1. Linearity
1. Indicator random variables
1. LOTUS
1. Law of Large Numbers
1. Central Limit Theorem
1. Markov Chains

Items 1 through 4 deal with the Big Picture<sup>&trade;</sup> questions: _What is randomness? How do we think about uncertainty?_

Items 5 through 7 are for computing expected values (mean, variance &amp; standard deviation).

Items 8 through 10 are important for understanding long-run behavior.

## Where to go from here?

Some topics to study from here on out:

* Statistical inference (we have data, need to estimate parameters or make predictions)
* Regression &amp; linear models
* Finance
* Computational biology
* Stochastic processes

## Advice

* Learn R
* Learn C
* Read Mostly Harmless Econometrics

## Ex. A Simple Linear Regression

You've seen this before:

\begin{align}
  Y &= \beta_0 + \beta_1 \, X + \epsilon
\end{align}

* We want to use $X$ to predict $Y$
* $\beta_j$ are linear coefficients, with the intercept $\beta_0$ being the expected value of $Y$ when $X=0$ (default value)
* $\epsilon$ is the error term (since $X$ is not a perfect predictor of $Y$)
* a common assumption is $\mathbb{E}(\epsilon | X) = 0$ (centered at 0, $\epsilon$'s distribution may or may not be normal)

So how would we solve for $\beta_1$?

We can start by treating $Cov$ as an _operator_!

\begin{align}
  Cov(Y, X) &= Cov\left( (\beta_0 + \beta_1 \, X + \epsilon), X \right) \\
  &= Cov(\beta_0, X) + Cov\left( (\beta_1 \, X), X\right) + Cov(\epsilon, X) \\
  \\
  \text{now } Cov(\beta_0, X) &= 0 &\quad \text{ since } Cov \text{ of constant with anything is } 0 \\
  \\
  \text{and } Cov\left( (\beta_1 \, X), X\right) &= \beta_1 \, Cov(X, X) &\quad \text{since constants factor out of } Cov \\
  &= \beta_1 \, Var(X) &\quad \text{by definition of }Var \\
  \\
  \text{and since } \mathbb{E}(\epsilon) &= \mathbb{E}\left( \mathbb{E}(\epsilon|X) \right) = \mathbb{E}(0) = 0 \\
  \text{and further } \mathbb{E}(\epsilon \, X) &= \mathbb{E}\left( \mathbb{E}(\epsilon \, X | X)  \right) &\quad \text{ by Adam's Law} \\
  &= \mathbb{E}\left( X \mathbb{E}(\epsilon | X) \right) &\quad \text{ since } X \text{ is known, we can pull it out} \\
  &= \mathbb{E}(0) \\
  &= 0 \\
  \text{so }Cov(\epsilon, X) &= \mathbb{E}(\epsilon \, X) - \mathbb{E}(\epsilon) \, \mathbb{E}(X) \\
  &= 0 - 0 = 0 \\\\
  \text{so } Cov(Y, X) &= \beta_1 \, Var(X) \\
  \Rightarrow \beta_1 &= \frac{Cov(X,Y)}{Var(X)} &\quad \text{(population version)} 
\end{align}
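
As a sanity check on the derivation above, here is a minimal simulation sketch. The "true" coefficients, the distribution of $X$, and the error distribution are all arbitrary choices made for this illustration, not part of the lecture. The errors are deliberately non-normal and their spread depends on $X$, but they still satisfy $\mathbb{E}(\epsilon | X) = 0$. The intercept line uses $\mathbb{E}(Y) = \beta_0 + \beta_1 \, \mathbb{E}(X)$, which follows from taking expectations of the model with $\mathbb{E}(\epsilon) = 0$.

```python
import numpy as np

rng = np.random.default_rng(110)  # arbitrary seed, for reproducibility

# Hypothetical "true" coefficients, chosen only for this illustration.
beta_0, beta_1 = 2.0, 0.5
n = 200_000

X = rng.exponential(scale=3.0, size=n)
# Non-normal errors whose spread grows with X, but with E(eps | X) = 0.
eps = rng.uniform(-1.0, 1.0, size=n) * (1.0 + X)
Y = beta_0 + beta_1 * X + eps

# Sample analogues of Cov(eps, X), Cov(Y, X) and Var(X).
cov_eps_x = np.cov(eps, X)[0, 1]
cov_yx = np.cov(Y, X)[0, 1]
var_x = np.var(X, ddof=1)

beta_1_hat = cov_yx / var_x
# From E(Y) = beta_0 + beta_1 E(X), solve for the intercept.
beta_0_hat = Y.mean() - beta_1_hat * X.mean()

print("Cov(eps, X) ~", cov_eps_x)
print("beta_1 ~", beta_1_hat)
print("beta_0 ~", beta_0_hat)
```

With a sample this large, the printed values should land close to $0$, $0.5$ and $2.0$ respectively, even though nothing here is normally distributed.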

### Calculate $\beta_1$ with $Cov(X,Y)$ and $Var(X)$

Here we calculate $\beta_1 = \frac{Cov(X,Y)}{Var(X)} \,$ using [numpy.cov](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.cov.html):

In [1]:
import numpy as np

X = np.array([95, 85, 80, 70, 60])
Y = np.array([85, 95, 70, 65, 70])

# numpy.cov(X, Y) returns the matrix
# [ Cov(X,X), Cov(X,Y)]
# [ Cov(X,Y), Cov(Y,Y)]
covM = np.cov(X,Y)

beta_1 = covM[0,1]/covM[0,0]
beta_1

0.64383561643835618

### Calculate $\beta_1$ via sklearn LinearRegression API

For comparison's sake, we also obtain $\beta_1$ via [`sklearn.linear_model.LinearRegression`](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

In [2]:
from sklearn import linear_model

regr = linear_model.LinearRegression()
# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
regr.fit(X.reshape(-1, 1), Y).coef_[0]

0.64383561643835607
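
The two cells above recover only the slope. As a small sketch (not from the lecture), the intercept can be obtained the same way via $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \, \bar{X}$, the sample analogue of taking expectations on both sides of the model. Here it is cross-checked against `numpy.polyfit`, which fits a degree-1 least-squares polynomial and returns the coefficients highest degree first.

```python
import numpy as np

X = np.array([95, 85, 80, 70, 60])
Y = np.array([85, 95, 70, 65, 70])

# Slope from the sample covariance matrix, as before.
cov_m = np.cov(X, Y)
beta_1 = cov_m[0, 1] / cov_m[0, 0]

# Intercept: the fitted line passes through the point of means.
beta_0 = Y.mean() - beta_1 * X.mean()

# Cross-check with a degree-1 least-squares fit: returns [slope, intercept].
slope, intercept = np.polyfit(X, Y, deg=1)

print(beta_0, beta_1)
print(intercept, slope)
```

Both lines should print the same pair of numbers up to floating-point rounding.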

----

## Ex. Sampling from a Finite Population

Here's the set-up:

* We have a finite population of size $N$.
* Let $Y_1, Y_2, ..., Y_N$ be the values of some quantity of interest (height, weight, opinion), one per person.
* Each person in the population can be uniquely identified.
* $Y_j$ are fixed, non-random values, but nevertheless _unknown_.
* We use some sampling scheme to obtain a sample of size $n$.
* Using this sample, we want to infer the sum of the $Y_j$ (or perhaps the average).
* Assume that the _inclusion probability_ of person $j$, i.e. the probability that person $j$ ends up in our sample, is $p_j$ (and assume that its true value is known).
* Our sample data takes the form $(X_1, Z_1), (X_2, Z_2), ..., (X_n, Z_n)$, where
   * $Z_j$ is the ID of the $j^{th}$ person in our sample
   * $X_j = Y_{Z_j}$ is the value of interest for that person

### The difference between $X_j$ and $Y_j$

It is important to understand the difference between $X_j$ and $Y_j$:

* $Y_j$ is a **_fixed, non-random value_**.
* $X_j$ is a **_random variable_**: the ID $Z_j$ of the $j^{th}$ sampled person is random (person $i$ ends up in the sample with probability $p_i$), so the value $X_j$ recorded for that person is random too.

### How do we get an unbiased estimator for the total?

Let $t_y$ be the true population total $\sum_{j=1}^{N} Y_j$. How can we use random sampling of this finite population to construct an estimator $\hat{t}_y$ of $t_y$?

The claim is that $\hat{t}_y = \sum_{j=1}^{n} \frac{X_j}{p_{Z_j}}$ is an unbiased estimator for $t_y$.

\begin{align}
  \hat{t}_y &= \sum_{j=1}^{n} \frac{X_j}{p_{Z_j}} \\
  &= \sum_{j=1}^{N} \frac{I_j \, Y_j}{p_j} &\quad \text{where } I_j = 1 \text{ if person } j \text{ is included in the sample, } 0 \text{ otherwise}\\\\
  \mathbb{E}(\hat{t}_y) &=  \mathbb{E}\left( \sum_{j=1}^{N} \frac{I_j \, Y_j}{p_j} \right) \\
  &= \sum_{j=1}^{N} \frac{p_j \, Y_j}{p_j} &\quad \text{ by linearity, since } \mathbb{E}(I_j) = p_j \\
  &= \boxed{ \sum_{j=1}^{N} Y_j } = t_y
\end{align}

This estimator is known as the [Horvitz-Thompson Estimator](https://en.wikipedia.org/wiki/Horvitz%E2%80%93Thompson_estimator); the underlying technique is also called inverse probability weighting.
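
The unbiasedness claim can be checked numerically. The sketch below makes several assumptions that are not in the lecture: the population values and inclusion probabilities are made up, and the sampling scheme is one where each person is included independently with probability $p_j$ (so the sample size is random rather than a fixed $n$). The helper name `horvitz_thompson` is likewise just for this illustration.

```python
import numpy as np

rng = np.random.default_rng(34)  # arbitrary seed, for reproducibility

N = 1_000
# Fixed, non-random population values (hypothetical data, generated once).
Y = rng.gamma(shape=2.0, scale=10.0, size=N)
t_y = Y.sum()  # the true total we are trying to estimate

# Known inclusion probabilities, deliberately unequal (assumed for this sketch).
p = rng.uniform(0.05, 0.5, size=N)


def horvitz_thompson(Y, p, rng):
    """Draw one sample and return the inverse-probability-weighted total."""
    included = rng.random(Y.size) < p  # indicator I_j, with P(I_j = 1) = p_j
    return np.sum(Y[included] / p[included])


# Repeat the sampling many times; Y and p stay fixed, only the sample changes.
estimates = np.array([horvitz_thompson(Y, p, rng) for _ in range(5_000)])

print("true total:        ", t_y)
print("mean of estimates: ", estimates.mean())
print("SD of estimates:   ", estimates.std(ddof=1))
```

The mean of the estimates should sit very close to the true total, while any single estimate can be noticeably off. Unbiasedness says nothing about that spread.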

_But is an unbiased estimator good?_

### Basu's Circus Elephants

Statistics is not easy: it takes real effort to keep your eyes open and question whether a tentative method will actually yield a sensible answer. Here is an anecdote illustrating how blindly applying a Horvitz-Thompson estimate can end in disaster.

> The circus owner is planning to ship his 50 adult elephants and so he needs a rough estimate of the total weight of the elephants. As weighing an elephant is a cumbersome process, the owner wants to estimate the total weight by weighing just one elephant. Which elephant should he weigh?
>
> So the owner looks back on his records and discovers a list of the elephants' weights taken 3 years ago. He finds that 3 years ago Sambo the middle-sized elephant was the average (in weight) elephant in his herd. He checks with the elephant trainer who reassures him (the owner) that Sambo may still be considered to be the average elephant in the herd. Therefore, the owner plans to weigh Sambo and take 50y (where y is the present weight of Sambo) as an estimate of the total weight of the 50 elephants.
>
> But the circus statistician is horrified when he learns of the owner's proposed sampling plan. "How can you get an unbiased estimate of Y this way?", protests the statistician.
>
> So, together they work out a compromise sampling plan. With the help of a table of random numbers they devise a plan that allots a selection probability of 99/100 to Sambo and equal selection probabilities of 1/4900 to each of the other 49 elephants. Naturally, Sambo is selected and the owner is happy.
>
> "How are you going to estimate Y?", asks the statistician.
>
> "Why? The estimate ought to be 50y of course," says the owner.
>
> "Oh! No! That cannot possibly be right," says the statistician, "I recently read an article in the Annals of Mathematical Statistics where it is proved that the Horvitz-Thompson estimator is the unique hyperadmissible estimator in the class of all generalized polynomial unbiased estimators."
>
> "What is the Horvitz-Thompson estimate in this case?" asks the owner, duly impressed.
>
> "Since the selection probability for Sambo in our plan was 99/100," says the statistician, "the proper estimate of Y is 100y/99 and not 50y."
>
> "And, how would you have estimated Y", inquires the incredulous owner, "if our sampling plan made us select, say, the big elephant Jumbo?"
>
> "According to what I understand of the Horvitz-Thompson estimation method," says the unhappy statistician, "the proper estimate of Y would then have been 4900y, where y is Jumbo's weight".
>
> That is how the statistician lost his circus job (and perhaps became a teacher of statistics)
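
To put numbers on the anecdote, here is a rough sketch using exact fractions. The elephant weights are invented for illustration (the story gives none). Only the herd size and the selection probabilities $99/100$ and $1/4900$ come from the text. Since exactly one elephant is weighed, each elephant's selection probability is also its inclusion probability.

```python
from fractions import Fraction

# Hypothetical weights in kg (NOT from the anecdote): index 0 is Sambo,
# roughly the average elephant; the last entry is Jumbo, the heaviest.
weights = [4000] + [3510 + 20 * k for k in range(49)]
total = sum(weights)  # the true total weight of the herd

# Selection probabilities from the compromise sampling plan.
probs = [Fraction(99, 100)] + [Fraction(1, 4900)] * 49
assert sum(probs) == 1

# Horvitz-Thompson estimate if elephant j is the one selected: y_j / p_j.
ht = [w / p for w, p in zip(weights, probs)]

# Exact mean and variance of the estimator over the sampling plan.
expected = sum(p * e for p, e in zip(probs, ht))
variance = sum(p * (e - expected) ** 2 for p, e in zip(probs, ht))
sd = float(variance) ** 0.5

print("true total:           ", total)
print("owner's 50y (Sambo):  ", 50 * weights[0])
print("HT estimate, Sambo:   ", float(ht[0]))
print("HT estimate, Jumbo:   ", float(ht[-1]))
print("E(HT estimate):       ", expected)
print("SD(HT estimate):      ", sd)
```

The expected value matches the true total exactly, so the estimator is unbiased, yet every individual estimate is wildly wrong: far too small if Sambo is picked, absurdly large otherwise. Under these made-up weights the owner's biased $50y$ is much closer to the truth.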

----

View [Lecture 34: A Look Ahead | Statistics 110](http://bit.ly/2Npa7ze) on YouTube.