{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear regression\n", "\n", "with [statsmodels](http://www.statsmodels.org/stable/index.html)\n", "\n", "\n", "Dataset: Auto MPG, **Mileage per gallon performances of various cars**\n", "\n", "Available at https://www.kaggle.com/uciml/autompg-dataset\n", "\n", "Target to predict:\n", "\n", "* mpg: continuous\n", "\n", "Predictor variables:\n", "\n", "* cylinders: multi-valued discrete\n", "* displacement: continuous\n", "* horsepower: continuous\n", "* weight: continuous\n", "* acceleration: continuous\n", "* origin: multi-valued discrete\n", "\n", "Not taken into account:\n", "\n", "* model year: multi-valued discrete\n", "* car name: string (unique for each instance)\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import statsmodels.formula.api as smf\n", "\n", "df = pd.read_csv('../data/autos_mpg.csv')\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: mpg R-squared: 0.717\n", "Model: OLS Adj. R-squared: 0.713\n", "Method: Least Squares F-statistic: 165.5\n", "Date: Sat, 22 Sep 2018 Prob (F-statistic): 4.84e-104\n", "Time: 17:58:03 Log-Likelihood: -1131.1\n", "No. Observations: 398 AIC: 2276.\n", "Df Residuals: 391 BIC: 2304.\n", "Df Model: 6 \n", "Covariance Type: nonrobust \n", "================================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "--------------------------------------------------------------------------------\n", "Intercept 42.7111 2.693 15.861 0.000 37.417 48.005\n", "cylinders -0.5256 0.404 -1.302 0.194 -1.320 0.268\n", "displacement 0.0106 0.009 1.133 0.258 -0.008 0.029\n", "horsepower -0.0529 0.016 -3.277 0.001 -0.085 -0.021\n", "weight -0.0051 0.001 -6.441 0.000 -0.007 -0.004\n", "acceleration 0.0043 0.120 0.036 0.972 -0.232 0.241\n", "origin 1.4269 0.345 4.136 0.000 0.749 2.105\n", "==============================================================================\n", "Omnibus: 32.659 Durbin-Watson: 0.886\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 43.338\n", "Skew: 0.624 Prob(JB): 3.88e-10\n", "Kurtosis: 4.028 Cond. No. 3.99e+04\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The condition number is large, 3.99e+04. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n", "\"\"\"" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm = smf.ols(formula='mpg ~ cylinders + displacement + horsepower + weight + acceleration + origin', data=df).fit()\n", "lm.summary()\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 mpg cylinders displacement horsepower \\\n", "Unnamed: 0 1.000000 0.585131 -0.363040 -0.386976 -0.417861 \n", "mpg 0.585131 1.000000 -0.775396 -0.804203 -0.771437 \n", "cylinders -0.363040 -0.775396 1.000000 0.950721 0.838939 \n", "displacement -0.386976 -0.804203 0.950721 1.000000 0.893646 \n", "horsepower -0.417861 -0.771437 0.838939 0.893646 1.000000 \n", "weight -0.318869 -0.831741 0.896017 0.932824 0.860574 \n", "acceleration 0.287634 0.420289 -0.505419 -0.543684 -0.684259 \n", "model year 0.996800 0.579267 -0.348746 -0.370164 -0.411651 \n", "origin 0.199702 0.563450 -0.562543 -0.609409 -0.453669 \n", "\n", " weight acceleration model year origin \n", "Unnamed: 0 -0.318869 0.287634 0.996800 0.199702 \n", "mpg -0.831741 0.420289 0.579267 0.563450 \n", "cylinders 0.896017 -0.505419 -0.348746 -0.562543 \n", "displacement 0.932824 -0.543684 -0.370164 -0.609409 \n", "horsepower 0.860574 -0.684259 -0.411651 -0.453669 \n", "weight 1.000000 -0.417457 -0.306564 -0.581024 \n", "acceleration -0.417457 1.000000 0.288137 0.205873 \n", "model year -0.306564 0.288137 1.000000 0.180662 \n", "origin -0.581024 0.205873 0.180662 1.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.corr()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Polynomial regression\n", "\n", "Compare\n", " \n", " mpg = β0 + β1 × horsepower + ε\n", " \n", "with\n", "\n", " mpg = β0 + β1 × horsepower + β2 × horsepower^2 + ε\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, 
"nbformat": 4, "nbformat_minor": 2 }