{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyzing a Linear Regression Model in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analyze a Simple Linear Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goal: Obtain a statistical summary of the linear regression of the topsoil lead concentration (`lead` column, on the y-axis) on the topsoil cadmium concentration (`cadmium` column, on the x-axis)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Prerequisite steps\n", "# import packages\n", "import pandas as pd\n", "import numpy as np\n", "from sklearn.linear_model import LinearRegression\n", "# import dataset\n", "data = pd.read_csv(\"meuse.csv\")\n", "# build the model; scikit-learn expects a 2-D feature array, so reshape the\n", "# underlying NumPy values (Series.reshape is deprecated)\n", "regression_model = LinearRegression()\n", "lr = regression_model.fit(data.cadmium.values.reshape((-1, 1)), data.lead)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### R^2\n", "R-squared measures how closely the data fit the regression line: it is the proportion of the variance in the response that the model explains." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.6383156080918473\n" ] } ], "source": [ "# score() returns R-squared for the given features and targets\n", "print(lr.score(data.cadmium.values.reshape((-1, 1)), data.lead))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we can conclude that about 64% of the variance in `lead` is explained by the linear model `lr` fitted on `cadmium`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more advanced analyses, please refer to our later posts on advanced topics in linear regression modeling." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }