{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear regression\n",
"\n",
"with [statsmodels](http://www.statsmodels.org/stable/index.html)\n",
"\n",
"\n",
"Dataset: Auto MPG, **mileage per gallon performance of various cars**\n",
"\n",
"Available at https://www.kaggle.com/uciml/autompg-dataset\n",
"\n",
"Target to predict:\n",
"* mpg: continuous\n",
"\n",
"Predictor variables:\n",
"\n",
"* cylinders: multi-valued discrete\n",
"* displacement: continuous\n",
"* horsepower: continuous\n",
"* weight: continuous\n",
"* acceleration: continuous\n",
"* origin: multi-valued discrete (entered as a numeric code)\n",
"\n",
"Not included in the model:\n",
"\n",
"* model year: multi-valued discrete\n",
"* car name: string (unique for each instance)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import statsmodels.formula.api as smf\n",
"\n",
"df = pd.read_csv('../data/autos_mpg.csv')\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: mpg R-squared: 0.717\n",
"Model: OLS Adj. R-squared: 0.713\n",
"Method: Least Squares F-statistic: 165.5\n",
"Date: Sat, 22 Sep 2018 Prob (F-statistic): 4.84e-104\n",
"Time: 17:58:03 Log-Likelihood: -1131.1\n",
"No. Observations: 398 AIC: 2276.\n",
"Df Residuals: 391 BIC: 2304.\n",
"Df Model: 6 \n",
"Covariance Type: nonrobust \n",
"================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"--------------------------------------------------------------------------------\n",
"Intercept 42.7111 2.693 15.861 0.000 37.417 48.005\n",
"cylinders -0.5256 0.404 -1.302 0.194 -1.320 0.268\n",
"displacement 0.0106 0.009 1.133 0.258 -0.008 0.029\n",
"horsepower -0.0529 0.016 -3.277 0.001 -0.085 -0.021\n",
"weight -0.0051 0.001 -6.441 0.000 -0.007 -0.004\n",
"acceleration 0.0043 0.120 0.036 0.972 -0.232 0.241\n",
"origin 1.4269 0.345 4.136 0.000 0.749 2.105\n",
"==============================================================================\n",
"Omnibus: 32.659 Durbin-Watson: 0.886\n",
"Prob(Omnibus): 0.000 Jarque-Bera (JB): 43.338\n",
"Skew: 0.624 Prob(JB): 3.88e-10\n",
"Kurtosis: 4.028 Cond. No. 3.99e+04\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 3.99e+04. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n",
"\"\"\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Ordinary least squares fit of mpg on engine size, power, weight, acceleration and origin.\n",
"lm = smf.ols(formula='mpg ~ cylinders + displacement + horsepower + weight + acceleration + origin', data=df).fit()\n",
"lm.summary()\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Unnamed: 0 mpg cylinders displacement horsepower \\\n",
"Unnamed: 0 1.000000 0.585131 -0.363040 -0.386976 -0.417861 \n",
"mpg 0.585131 1.000000 -0.775396 -0.804203 -0.771437 \n",
"cylinders -0.363040 -0.775396 1.000000 0.950721 0.838939 \n",
"displacement -0.386976 -0.804203 0.950721 1.000000 0.893646 \n",
"horsepower -0.417861 -0.771437 0.838939 0.893646 1.000000 \n",
"weight -0.318869 -0.831741 0.896017 0.932824 0.860574 \n",
"acceleration 0.287634 0.420289 -0.505419 -0.543684 -0.684259 \n",
"model year 0.996800 0.579267 -0.348746 -0.370164 -0.411651 \n",
"origin 0.199702 0.563450 -0.562543 -0.609409 -0.453669 \n",
"\n",
" weight acceleration model year origin \n",
"Unnamed: 0 -0.318869 0.287634 0.996800 0.199702 \n",
"mpg -0.831741 0.420289 0.579267 0.563450 \n",
"cylinders 0.896017 -0.505419 -0.348746 -0.562543 \n",
"displacement 0.932824 -0.543684 -0.370164 -0.609409 \n",
"horsepower 0.860574 -0.684259 -0.411651 -0.453669 \n",
"weight 1.000000 -0.417457 -0.306564 -0.581024 \n",
"acceleration -0.417457 1.000000 0.288137 0.205873 \n",
"model year -0.306564 0.288137 1.000000 0.180662 \n",
"origin -0.581024 0.205873 0.180662 1.000000 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
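"# Pairwise Pearson correlations, restricted to the numeric columns so that\n",
"# any non-numeric column does not raise an error in recent pandas versions.\n",
"df.select_dtypes('number').corr()"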
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Polynomial regression\n",
"\n",
"Compare\n",
"\n",
"    mpg = β0 + β1 × horsepower + ε\n",
"\n",
"with\n",
"\n",
"    mpg = β0 + β1 × horsepower + β2 × horsepower^2 + ε\n"
]
},
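{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to set up this comparison is sketched below. It is a minimal sketch, not a reference solution: `compare_fits` is a hypothetical helper name, and the `demo` frame is synthetic data with arbitrary coefficients, made up only so that the snippet runs without the CSV file. The formula term `I(horsepower ** 2)` uses the `I()` wrapper so that `**` is evaluated as arithmetic instead of being read as formula syntax.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"import statsmodels.formula.api as smf\n",
"\n",
"\n",
"def compare_fits(data):\n",
"    # Hypothetical helper: fit both models on a frame with mpg and horsepower columns.\n",
"    linear = smf.ols('mpg ~ horsepower', data=data).fit()\n",
"    # I(...) protects the square from being parsed as a formula operator.\n",
"    quadratic = smf.ols('mpg ~ horsepower + I(horsepower ** 2)', data=data).fit()\n",
"    return linear, quadratic\n",
"\n",
"\n",
"# Synthetic stand-in data (assumption: made up for illustration, not the auto mpg file).\n",
"rng = np.random.default_rng(0)\n",
"hp = rng.uniform(50, 220, size=200)\n",
"demo = pd.DataFrame({\n",
"    'horsepower': hp,\n",
"    'mpg': 57 - 0.47 * hp + 0.0012 * hp ** 2 + rng.normal(0, 2, size=200),\n",
"})\n",
"\n",
"linear, quadratic = compare_fits(demo)\n",
"\n",
"# Adjusted R-squared and AIC both penalise the extra parameter of the quadratic model.\n",
"print(f'linear    adj. R2 = {linear.rsquared_adj:.3f}, AIC = {linear.aic:.1f}')\n",
"print(f'quadratic adj. R2 = {quadratic.rsquared_adj:.3f}, AIC = {quadratic.aic:.1f}')\n",
"```\n",
"\n",
"On the notebook's own data the same helper could be called as `compare_fits(df)`, assuming `horsepower` has been read as a numeric column. Because the linear model is nested in the quadratic one, plain R-squared can only stay equal or increase, which is why the sketch reports adjusted R-squared and AIC instead.\n"
]
},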
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}