{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### OSCR Machine Learning in Python\n", "\n", "**Linear Regression Module**\n", "\n", "**© Kaixin Wang**, Fall 2019\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Module and package imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np # numpy module for linear algebra\n", "import pandas as pd # pandas module for data manipulation\n", "import matplotlib.pyplot as plt # Matplotlib plotting interface\n", "import seaborn as sns # statistical plotting built on Matplotlib" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import warnings # to control warning messages\n", "warnings.filterwarnings('ignore') # suppress all warnings in the notebook output" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression # scikit-learn linear regression estimator\n", "import statsmodels.api as sm # statsmodels array-based API (e.g. OLS)\n", "import statsmodels.formula.api as smf # statsmodels formula (R-style) API\n", "import scipy as sp # SciPy scientific computing library" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split # split data into training and testing sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dataset import" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the `meuse` dataset.\n", "\n", "As described by the author of the data: \"This data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m $\\times$ 15 m.\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | x | y | cadmium | copper | lead | zinc | elev | dist | om | ffreq | soil | lime | landuse | dist.m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 181072 | 333611 | 11.7 | 85 | 299 | 1022 | 7.909 | 0.001358 | 13.6 | 1 | 1 | 1 | Ah | 50 |
| 1 | 181025 | 333558 | 8.6 | 81 | 277 | 1141 | 6.983 | 0.012224 | 14.0 | 1 | 1 | 1 | Ah | 30 |
| 2 | 181165 | 333537 | 6.5 | 68 | 199 | 640 | 7.800 | 0.103029 | 13.0 | 1 | 1 | 1 | Ah | 150 |
| 3 | 181298 | 333484 | 2.6 | 81 | 116 | 257 | 7.655 | 0.190094 | 8.0 | 1 | 2 | 0 | Ga | 270 |
| 4 | 181307 | 333330 | 2.8 | 48 | 117 | 269 | 7.480 | 0.277090 | 8.7 | 1 | 2 | 0 | Ah | 380 |
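The imported `LinearRegression` estimator can be exercised on the rows displayed above. This is a minimal sketch only, assuming nothing beyond the five `head()` rows (a small subset of the full dataset) and an illustrative model choice, log zinc concentration regressed on `dist`. Neither the subset nor the model is taken from the original analysis, and the name `rows` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical illustration: only the five rows shown in the head() output above,
# restricted to two columns; this is NOT the full meuse dataset.
rows = pd.DataFrame({
    'zinc': [1022, 1141, 640, 257, 269],
    'dist': [0.001358, 0.012224, 0.103029, 0.190094, 0.277090],
})

# Assumed modelling choice for illustration: log(zinc) as a linear function of dist.
X = rows[['dist']]        # 2-D feature matrix, shape (n_samples, 1)
y = np.log(rows['zinc'])  # 1-D response vector

model = LinearRegression().fit(X, y)
print(f'intercept: {model.intercept_:.3f}')
print(f'slope for dist: {model.coef_[0]:.3f}')
```

With only five points the fitted coefficients are not meaningful estimates. The example is mainly about input shapes: scikit-learn expects `X` to be 2-D (hence the double brackets in `rows[['dist']]`), while `y` can be a 1-D Series.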