## Linear Regression: Model Estimation

**Functions**

`sm.OLS`

### Exercise 32
Use the OLS function to regress the monthly returns of the Fama-French portfolios on the
market, size and value factors. Include a constant in the regressions. Use only the four
extreme portfolios, that is, the 1-1, 1-5, 5-1 and 5-5 portfolios (the SMALL LoBM, SMALL HiBM,
BIG LoBM and BIG HiBM columns). Estimate the model with homoskedastic errors and with White's
covariance estimator.
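For reference, the two estimators give the same coefficients and differ only in the estimated covariance of $\hat{\beta}$. In standard notation, with $X$ the $n \times k$ regressor matrix including the constant, $x_i'$ its $i$-th row and $\hat{\varepsilon}_i$ the OLS residuals, statsmodels' default `fit()` uses the first covariance expression below and `fit(cov_type="HC0")` uses White's sandwich form without a small-sample correction:

$$
\hat{\beta} = (X'X)^{-1}X'y, \qquad \hat{\varepsilon} = y - X\hat{\beta}
$$

$$
\widehat{V}_{\text{homosk}} = \hat{\sigma}^2 (X'X)^{-1}, \qquad \hat{\sigma}^2 = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-k}
$$

$$
\widehat{V}_{\text{White}} = (X'X)^{-1}\left(\sum_{i=1}^{n}\hat{\varepsilon}_i^2\, x_i x_i'\right)(X'X)^{-1}
$$

The standard errors are the square roots of the diagonal elements of each matrix.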

In [1]:
import pandas as pd

ff = pd.read_hdf("data/ff.h5", "ff")

factors = ff.iloc[:, :3]
portfolios = ff.iloc[:, 4:]
portfolios = portfolios[["SMALL LoBM", "SMALL HiBM", "BIG LoBM", "BIG HiBM"]]

In [2]:
import statsmodels.api as sm

factors = sm.add_constant(factors)
all_results = {}
homosk_results = {}
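To see what the two `fit` calls below compute, here is a minimal NumPy sketch of OLS with both covariance estimators. It is not statsmodels' implementation: `ols_with_cov` is a hypothetical helper, and the data are simulated with an error spread that grows with the regressor, rather than read from the Fama-French file.

```python
import numpy as np


def ols_with_cov(y, X):
    """Return OLS coefficients, classical SEs and White (HC0) SEs.

    Hypothetical helper for illustration; X must already contain the constant.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical: sigma^2 (X'X)^{-1}, with sigma^2 = e'e / (n - k)
    sigma2 = resid @ resid / (n - k)
    cov_homosk = sigma2 * xtx_inv

    # White/HC0 sandwich: (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X
    cov_white = xtx_inv @ meat @ xtx_inv

    return beta, np.sqrt(np.diag(cov_homosk)), np.sqrt(np.diag(cov_white))


# Simulated data (not the Fama-French file): the error spread grows with |x|,
# so the errors are heteroskedastic by construction.
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
y = 0.5 + 1.0 * x + x * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

beta, se_homosk, se_white = ols_with_cov(y, X)
print("coefficients:     ", beta)
print("homoskedastic SEs:", se_homosk)
print("White (HC0) SEs:  ", se_white)
```

With statsmodels installed, `sm.OLS(y, X).fit().bse` and `sm.OLS(y, X).fit(cov_type="HC0").bse` should match `se_homosk` and `se_white`. Because the simulated error variance rises with x², the White standard error of the slope comes out noticeably larger than the classical one, the same pattern seen in the portfolio regressions.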

In [3]:
mod = sm.OLS(portfolios["SMALL LoBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["SMALL LoBM"] = res
homosk_results["SMALL LoBM"] = mod.fit()
res.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | SMALL LoBM | R-squared: | 0.655 |
| Model: | OLS | Adj. R-squared: | 0.654 |
| Method: | Least Squares | F-statistic: | 180.8 |
| Date: | Wed, 22 Sep 2021 | Prob (F-statistic): | 1.77e-95 |
| Time: | 11:07:17 | Log-Likelihood: | -3776.8 |
| No. Observations: | 1117 | AIC: | 7562.0 |
| Df Residuals: | 1113 | BIC: | 7582.0 |
| Df Model: | 3 | | |
| Covariance Type: | HC0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -0.4355 | 0.172 | -2.526 | 0.012 | -0.773 | -0.098 |
| Mkt-RF | 1.2832 | 0.119 | 10.766 | 0.000 | 1.050 | 1.517 |
| SMB | 1.4336 | 0.177 | 8.120 | 0.000 | 1.088 | 1.780 |
| HML | 0.4214 | 0.274 | 1.536 | 0.125 | -0.116 | 0.959 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 896.514 | Durbin-Watson: | 2.065 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 102194.78 |
| Skew: | 2.973 | Prob(JB): | 0.0 |
| Kurtosis: | 49.48 | Cond. No. | 5.68 |


In [4]:
mod = sm.OLS(portfolios["SMALL HiBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["SMALL HiBM"] = res
homosk_results["SMALL HiBM"] = mod.fit()
res.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | SMALL HiBM | R-squared: | 0.939 |
| Model: | OLS | Adj. R-squared: | 0.938 |
| Method: | Least Squares | F-statistic: | 941.0 |
| Date: | Wed, 22 Sep 2021 | Prob (F-statistic): | 1.19e-304 |
| Time: | 11:07:17 | Log-Likelihood: | -2506.2 |
| No. Observations: | 1117 | AIC: | 5020.0 |
| Df Residuals: | 1113 | BIC: | 5041.0 |
| Df Model: | 3 | | |
| Covariance Type: | HC0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.3696 | 0.062 | 6.008 | 0.000 | 0.249 | 0.490 |
| Mkt-RF | 0.9830 | 0.024 | 40.929 | 0.000 | 0.936 | 1.030 |
| SMB | 1.3001 | 0.065 | 20.068 | 0.000 | 1.173 | 1.427 |
| HML | 0.9124 | 0.058 | 15.624 | 0.000 | 0.798 | 1.027 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 589.467 | Durbin-Watson: | 2.272 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 15321.552 |
| Skew: | 1.888 | Prob(JB): | 0.0 |
| Kurtosis: | 20.747 | Cond. No. | 5.68 |


In [5]:
mod = sm.OLS(portfolios["BIG LoBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["BIG LoBM"] = res
homosk_results["BIG LoBM"] = mod.fit()
res.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | BIG LoBM | R-squared: | 0.952 |
| Model: | OLS | Adj. R-squared: | 0.952 |
| Method: | Least Squares | F-statistic: | 4050.0 |
| Date: | Wed, 22 Sep 2021 | Prob (F-statistic): | 0.0 |
| Time: | 11:07:17 | Log-Likelihood: | -1758.6 |
| No. Observations: | 1117 | AIC: | 3525.0 |
| Df Residuals: | 1113 | BIC: | 3545.0 |
| Df Model: | 3 | | |
| Covariance Type: | HC0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.3594 | 0.035 | 10.270 | 0.000 | 0.291 | 0.428 |
| Mkt-RF | 1.0229 | 0.009 | 109.366 | 0.000 | 1.005 | 1.041 |
| SMB | -0.1485 | 0.023 | -6.524 | 0.000 | -0.193 | -0.104 |
| HML | -0.2629 | 0.016 | -16.645 | 0.000 | -0.294 | -0.232 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 109.852 | Durbin-Watson: | 1.791 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 248.017 |
| Skew: | 0.578 | Prob(JB): | 1.39e-54 |
| Kurtosis: | 4.998 | Cond. No. | 5.68 |


In [6]:
mod = sm.OLS(portfolios["BIG HiBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["BIG HiBM"] = res
homosk_results["BIG HiBM"] = mod.fit()
res.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | BIG HiBM | R-squared: | 0.838 |
| Model: | OLS | Adj. R-squared: | 0.838 |
| Method: | Least Squares | F-statistic: | 376.3 |
| Date: | Wed, 22 Sep 2021 | Prob (F-statistic): | 1.04e-168 |
| Time: | 11:07:17 | Log-Likelihood: | -2956.0 |
| No. Observations: | 1117 | AIC: | 5920.0 |
| Df Residuals: | 1113 | BIC: | 5940.0 |
| Df Model: | 3 | | |
| Covariance Type: | HC0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.1042 | 0.103 | 1.016 | 0.309 | -0.097 | 0.305 |
| Mkt-RF | 1.1823 | 0.038 | 31.058 | 0.000 | 1.108 | 1.257 |
| SMB | -0.1573 | 0.069 | -2.268 | 0.023 | -0.293 | -0.021 |
| HML | 1.0162 | 0.073 | 13.961 | 0.000 | 0.874 | 1.159 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 436.752 | Durbin-Watson: | 1.808 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 9887.813 |
| Skew: | 1.257 | Prob(JB): | 0.0 |
| Kurtosis: | 17.357 | Cond. No. | 5.68 |


### Exercise 33
Are the parameter standard errors similar using the two covariance estimators?
If not, what does this mean? 

In [7]:
from IPython.display import HTML, display

for key in all_results:
    white_res = all_results[key]
    homosk_res = homosk_results[key]
    std_err = pd.DataFrame({"Homosk": homosk_res.bse, "White": white_res.bse})
    display(HTML(key))
    display(std_err)

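**SMALL LoBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 0.215569 | 0.172414 |
| Mkt-RF | 0.043167 | 0.119187 |
| SMB | 0.070797 | 0.176554 |
| HML | 0.063235 | 0.274367 |

**SMALL HiBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 0.069118 | 0.061514 |
| Mkt-RF | 0.013841 | 0.024017 |
| SMB | 0.022699 | 0.064785 |
| HML | 0.020275 | 0.058401 |

**BIG LoBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 0.035391 | 0.034997 |
| Mkt-RF | 0.007087 | 0.009353 |
| SMB | 0.011623 | 0.022765 |
| HML | 0.010382 | 0.015797 |

**BIG HiBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 0.103384 | 0.102525 |
| Mkt-RF | 0.020702 | 0.038067 |
| SMB | 0.033953 | 0.069341 |
| HML | 0.030327 | 0.072789 |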

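The standard errors are clearly not similar. The short sketch below only re-uses the values printed above and computes the ratio of White to homoskedastic standard errors. For the three factor loadings the White standard errors are roughly 1.3 to 4.3 times larger, while for the constant they are about equal or slightly smaller (ratios of about 0.80 to 0.99). This suggests that the residual variance is related to the factor returns, that is, the errors are heteroskedastic, so the homoskedastic standard errors understate the uncertainty in the factor loadings and inference should rely on White's estimator.

```python
# Standard errors (homoskedastic, White) copied from the tables above
std_errs = {
    "SMALL LoBM": {"const": (0.215569, 0.172414), "Mkt-RF": (0.043167, 0.119187),
                   "SMB": (0.070797, 0.176554), "HML": (0.063235, 0.274367)},
    "SMALL HiBM": {"const": (0.069118, 0.061514), "Mkt-RF": (0.013841, 0.024017),
                   "SMB": (0.022699, 0.064785), "HML": (0.020275, 0.058401)},
    "BIG LoBM": {"const": (0.035391, 0.034997), "Mkt-RF": (0.007087, 0.009353),
                 "SMB": (0.011623, 0.022765), "HML": (0.010382, 0.015797)},
    "BIG HiBM": {"const": (0.103384, 0.102525), "Mkt-RF": (0.020702, 0.038067),
                 "SMB": (0.033953, 0.069341), "HML": (0.030327, 0.072789)},
}

# A ratio above 1 means White's estimator reports more uncertainty than the classical one
ratios = {
    port: {param: white / homosk for param, (homosk, white) in params.items()}
    for port, params in std_errs.items()
}
for port, params in ratios.items():
    print(port, {param: round(r, 2) for param, r in params.items()})
```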

In [8]:
for key in all_results:
    white_res = all_results[key]
    homosk_res = homosk_results[key]
    t_stats = pd.DataFrame({"Homosk": homosk_res.tvalues, "White": white_res.tvalues})
    display(HTML(key))
    display(t_stats)

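**SMALL LoBM**

| Parameter | Homosk | White |
|---|---|---|
| const | -2.020037 | -2.525647 |
| Mkt-RF | 29.725191 | 10.765843 |
| SMB | 20.248866 | 8.119638 |
| HML | 6.66408 | 1.535905 |

**SMALL HiBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 5.346747 | 6.007648 |
| Mkt-RF | 71.022641 | 40.929483 |
| SMB | 57.274909 | 20.06801 |
| HML | 45.003201 | 15.623556 |

**BIG LoBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 10.155474 | 10.269912 |
| Mkt-RF | 144.336719 | 109.365862 |
| SMB | -12.777091 | -6.523564 |
| HML | -25.327247 | -16.64512 |

**BIG HiBM**

| Parameter | Homosk | White |
|---|---|---|
| const | 1.007916 | 1.016364 |
| Mkt-RF | 57.107702 | 31.057592 |
| SMB | -4.632601 | -2.268375 |
| HML | 33.508204 | 13.960789 |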

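Because both fits share the same coefficient estimates and each t-statistic is the coefficient divided by its standard error, the t-statistics shrink by exactly the factor by which the White standard errors grow. The check below uses only the SMALL LoBM values printed above. The practical consequence is visible in the HML loading for SMALL LoBM: its t-statistic is 6.66 with homoskedastic standard errors but only 1.54 with White's, below the usual 1.96 cutoff, so the value loading looks significant at the 5% level only if heteroskedasticity is ignored.

```python
# SMALL LoBM: t-statistics and standard errors copied from the tables above
t_homosk = {"const": -2.020037, "Mkt-RF": 29.725191, "SMB": 20.248866, "HML": 6.66408}
t_white = {"const": -2.525647, "Mkt-RF": 10.765843, "SMB": 8.119638, "HML": 1.535905}
se_homosk = {"const": 0.215569, "Mkt-RF": 0.043167, "SMB": 0.070797, "HML": 0.063235}
se_white = {"const": 0.172414, "Mkt-RF": 0.119187, "SMB": 0.176554, "HML": 0.274367}

# With identical coefficients, t_homosk / t_white equals se_white / se_homosk
for param in t_homosk:
    t_ratio = t_homosk[param] / t_white[param]
    se_ratio = se_white[param] / se_homosk[param]
    print(f"{param:7s} t ratio {t_ratio:.4f}  SE ratio {se_ratio:.4f}")
```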

### Exercise 34
How much of the variation is explained by these three regressors?
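For a regression that includes a constant, the R² reported by statsmodels is the centered one: the share of the sample variation in the portfolio return that is captured by the fitted values.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n}\hat{\varepsilon}_i^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

It depends only on the point estimates, so it is the same for the homoskedastic and White fits.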

In [9]:
rsquare = {}
for key in all_results:
    rsquare[key] = all_results[key].rsquared
pd.Series(rsquare, name="R2").to_frame()

| Portfolio | R2 |
|---|---|
| SMALL LoBM | 0.654851 |
| SMALL HiBM | 0.938606 |
| BIG LoBM | 0.951723 |
| BIG HiBM | 0.838409 |


In [10]:
capm_factor = factors.iloc[:, :2]
capm_rsquare = {}
for key in portfolios:
    res = sm.OLS(portfolios[key], capm_factor).fit()
    capm_rsquare[key] = res.rsquared
pd.Series(capm_rsquare, name="R2").to_frame()

| Portfolio | R2 |
|---|---|
| SMALL LoBM | 0.508956 |
| SMALL HiBM | 0.629589 |
| BIG LoBM | 0.915187 |
| BIG HiBM | 0.674243 |
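Read together, the two tables answer the question: the three factors explain between about 65% (SMALL LoBM) and 95% (BIG LoBM) of the variation, while the market factor alone explains between about 51% and 92%. The sketch below only re-uses the printed R² values and computes the gain from adding SMB and HML; it is largest for SMALL HiBM (about 0.31) and smallest for BIG LoBM (about 0.04).

```python
# R-squared values copied from the two tables above
r2_three_factor = {"SMALL LoBM": 0.654851, "SMALL HiBM": 0.938606,
                   "BIG LoBM": 0.951723, "BIG HiBM": 0.838409}
r2_capm = {"SMALL LoBM": 0.508956, "SMALL HiBM": 0.629589,
           "BIG LoBM": 0.915187, "BIG HiBM": 0.674243}

# Increase in explained variation from adding SMB and HML to the market factor
gain = {key: r2_three_factor[key] - r2_capm[key] for key in r2_capm}
for key, value in gain.items():
    print(f"{key:10s} {r2_capm[key]:.3f} -> {r2_three_factor[key]:.3f} (+{value:.3f})")
```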
