{ "cells": [ { "cell_type": "markdown", "id": "5b7cfe86", "metadata": {}, "source": [ "## Linear Regression Exercise" ] }, { "cell_type": "markdown", "id": "55e146d6", "metadata": {}, "source": [ "We will use two of the variables we measured, soil moisture content and aggregate stability, to see whether there is any relationship between them. The data is located at https://github.com/gabbymyers/516X-Project/blob/master/_data/Class%20Exercise%20Data.xlsx \n", "\n", "Soil moisture content refers to the amount of water in the soil. We determine this by collecting a sample from each of our 36 plots at the Northeast Iowa Research Farm. Each sample is taken back to the lab, and the moisture content is determined by weighing the soil before and after drying. \n", "\n", "Aggregate stability is a measure of how the soil aggregates (groups of soil particles) fall apart when wet. We use the SLAKES app to get the aggregate stability numbers for our samples. The SLAKES app takes continuous images of soil aggregates as they are submerged in water and returns aggregate stability values on a scale of 0-14. The lower the number, the more stable the aggregates are. Stable aggregates indicate the soil likely has better water infiltration, reducing the risk of runoff. Cover crops are known to increase aggregate stability, so I would expect our cover crop treatments to have lower aggregate stability values from the SLAKES app." ] }, { "cell_type": "markdown", "id": "ab8678f1", "metadata": {}, "source": [ "We have 12 different treatments in triplicate on our 36 plots. They are listed below. 
\n", "\n", "* 1C: Corn (on corn/soy rotation) with spring UAN \n", "* 1S: Soy (on corn/soy rotation) with spring UAN \n", "* 2C: Corn (on corn/soy rotation) with spring manure \n", "* 2S: Soy (on corn/soy rotation) with spring manure \n", "* 3.1: Continuous Corn \n", "* 3.2: Continuous Corn + Interseeded Cover crop (30 inch row) \n", "* 4.1: Continuous Corn + Perennial Groundcover \n", "* 4.2: Continuous Corn + Interseeded Cover crop (60 inch row) \n", "* 5C: Corn (on corn/soy rotation) + cereal rye \n", "* 5S: Soy (on corn/soy rotation) + cereal rye \n", "* 6C: Corn (on corn/soy rotation) + fall manure \n", "* 6S: Soy (on corn/soy rotation) + fall manure " ] }, { "cell_type": "markdown", "id": "7dec9b3a", "metadata": {}, "source": [ "### Research Question:\n", "\n", "Is there any relationship between aggregate stability and moisture content?" ] }, { "cell_type": "code", "execution_count": 45, "id": "40929ec1", "metadata": {}, "outputs": [], "source": [ "# imports\n", "import pandas as pd\n", "import seaborn as sns\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import statsmodels.api as sm\n", "# scipy is used below as sp.stats; importing scipy.stats makes the submodule available\n", "import scipy as sp\n", "import scipy.stats\n", "# LinearRegression is used in the last part of the exercise\n", "from sklearn.linear_model import LinearRegression\n", "# allow plots to appear directly in the notebook\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "49ec3ad1", "metadata": {}, "source": [ "#### Read the soil moisture data into its own data frame." ] }, { "cell_type": "code", "execution_count": 16, "id": "3aca81e7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Number Treatment Block Sample Date Moisture Content Unnamed: 5 \\\n", "0 1 2S 2 1 0.050720 NaN \n", "1 2 6S 2 1 0.053265 NaN \n", "2 3 1C 2 1 0.075589 NaN \n", "3 4 3.1 2 1 0.059354 NaN \n", "4 5 4.1 2 1 0.082545 NaN \n", "\n", " Unnamed: 6 Unnamed: 7 \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soil_moisture = pd.read_excel(___, sheet_name = ______)\n", "soil_moisture.head()" ] }, { "cell_type": "markdown", "id": "5e169d5f", "metadata": {}, "source": [ "Get rid of unneeded columns" ] }, { "cell_type": "code", "execution_count": 19, "id": "20394f09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Number Treatment Block Sample Date Moisture Content\n", "0 1 2S 2 1 0.050720\n", "1 2 6S 2 1 0.053265\n", "2 3 1C 2 1 0.075589\n", "3 4 3.1 2 1 0.059354\n", "4 5 4.1 2 1 0.082545" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soil_moisture = soil_moisture.iloc[____:____ ]\n", "soil_moisture.head()" ] }, { "cell_type": "markdown", "id": "bc793af3", "metadata": {}, "source": [ "See how many weeks of data you have for soil moisture" ] }, { "cell_type": "code", "execution_count": 20, "id": "3428bdf6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max(_____)" ] }, { "cell_type": "markdown", "id": "34958712", "metadata": {}, "source": [ "We have 12 weeks of data for the soil moisture content. " ] }, { "cell_type": "markdown", "id": "1d2b93ab", "metadata": {}, "source": [ "#### Read the aggregate stability data into its own data frame. " ] }, { "cell_type": "code", "execution_count": 21, "id": "e2c922f5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Treatment Block Date Aggregate Stability Unnamed: 5 Unnamed: 6 \\\n", "0 1 2S 2 1 2.5 NaN NaN \n", "1 2 6S 2 1 6.2 NaN NaN \n", "2 3 1C 2 1 2.5 NaN NaN \n", "3 4 3.1 2 1 1.7 NaN NaN \n", "4 5 4.1 2 1 0.8 NaN NaN \n", "\n", " Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 Unnamed: 11 Unnamed: 12 \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN NaN \n", "\n", " Unnamed: 13 Unnamed: 14 Unnamed: 15 \n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aggregate_stability = pd.read_excel(____, sheet_name = _____ )\n", "aggregate_stability.head()" ] }, { "cell_type": "markdown", "id": "b5f9e9c5", "metadata": {}, "source": [ "We need to filter out the extra columns that are contained in the aggregate stability data frame. " ] }, { "cell_type": "code", "execution_count": 22, "id": "78c30fb8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Treatment Block Date Aggregate Stability\n", "0 1 2S 2 1 2.5\n", "1 2 6S 2 1 6.2\n", "2 3 1C 2 1 2.5\n", "3 4 3.1 2 1 1.7\n", "4 5 4.1 2 1 0.8" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aggregate_stability = aggregate_stability.iloc[___:___ ]\n", "aggregate_stability.head()" ] }, { "cell_type": "markdown", "id": "f8ba85f6", "metadata": {}, "source": [ "Make sure that the dates match the soil moisture so we have the same amount of data:" ] }, { "cell_type": "code", "execution_count": 23, "id": "2d1a42b5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max(________)" ] }, { "cell_type": "markdown", "id": "9aa84032", "metadata": {}, "source": [ "We only have 10 weeks of data for aggregate stability, so we need to filter out the extra two weeks of soil moisture data. 10 weeks * 36 plots = 360 rows of data. " ] }, { "cell_type": "code", "execution_count": 24, "id": "44ab1b52", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soil_moisture = soil_moisture.iloc[___:____]\n", "max(soil_moisture['Sample Date'])" ] }, { "cell_type": "code", "execution_count": 25, "id": "cb896722", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Number Treatment Block Sample Date Moisture Content\n", "0 1 2S 2 1 0.050720\n", "1 2 6S 2 1 0.053265\n", "2 3 1C 2 1 0.075589\n", "3 4 3.1 2 1 0.059354\n", "4 5 4.1 2 1 0.082545\n", ".. ... ... ... ... ...\n", "355 32 3.2 1 10 0.159200\n", "356 33 3.1 3 10 0.153900\n", "357 34 5C 3 10 0.148500\n", "358 35 4.2 3 10 0.161500\n", "359 36 3.2 3 10 0.168700\n", "\n", "[360 rows x 5 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soil_moisture" ] }, { "cell_type": "markdown", "id": "683576a5", "metadata": {}, "source": [ "Check that the last row of data is correct. It is, because it shows plot 36 on date 10. " ] }, { "cell_type": "markdown", "id": "fe6c9906", "metadata": {}, "source": [ "#### Merging the data frames\n", "First we need to rename the columns so they are the same across the two data frames. " ] }, { "cell_type": "code", "execution_count": 27, "id": "c1548f92", "metadata": {}, "outputs": [], "source": [ "soil_moisture = soil_moisture.rename(columns={'Plot Number': 'Plot'})\n", "soil_moisture = soil_moisture.rename(columns={'Sample Date': 'Date'})" ] }, { "cell_type": "markdown", "id": "6182b952", "metadata": {}, "source": [ "Now we can merge the two data frames." ] }, { "cell_type": "code", "execution_count": 37, "id": "cdfb893f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Treatment Block Date Moisture Content Aggregate Stability\n", "0 1 2S 2 1 0.050720 2.5\n", "1 2 6S 2 1 0.053265 6.2\n", "2 3 1C 2 1 0.075589 2.5\n", "3 4 3.1 2 1 0.059354 1.7\n", "4 5 4.1 2 1 0.082545 0.8" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_data = pd.merge(left=___, right=______, \n", " left_on=[__,___,__,__], right_on=[__,___,__,__])\n", "merged_data.head()" ] }, { "cell_type": "markdown", "id": "6d632ced", "metadata": {}, "source": [ "Count the number of na values in the data frame" ] }, { "cell_type": "code", "execution_count": 38, "id": "38490dd4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Plot 0\n", "Treatment 0\n", "Block 0\n", "Date 0\n", "Moisture Content 0\n", "Aggregate Stability 4\n", "dtype: int64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 39, "id": "dfae02e3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(360, 6)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_data.shape" ] }, { "cell_type": "markdown", "id": "0faa77c1", "metadata": {}, "source": [ "There are four na values in the aggregate stability column. 
Drop those rows:" ] }, { "cell_type": "code", "execution_count": 40, "id": "15fd3c50", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(356, 6)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_data = merged_data.____()\n", "merged_data.shape" ] }, { "cell_type": "markdown", "id": "6ff610e0", "metadata": {}, "source": [ "#### Explore the data" ] }, { "cell_type": "markdown", "id": "548f5d7d", "metadata": {}, "source": [ "###### Make box plots of the data" ] }, { "cell_type": "code", "execution_count": 106, "id": "c9c2e025", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'Moisture Content by Treatment')" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = sns.boxplot(x = ________, y = _________, data = merged_data)\n", "plt.title('Moisture Content by Treatment')" ] }, { "cell_type": "code", "execution_count": 107, "id": "646a1816", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'Aggregate Stability by Treatment')" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = sns.boxplot(x = __________, y = ________, data = merged_data)\n", "plt.title('Aggregate Stability by Treatment')" ] }, { "cell_type": "markdown", "id": "2a95b4ab", "metadata": {}, "source": [ "###### How many unstable aggregate stability measurements do we have and which treatments have the most?\n", "\n", "Unstable is defined as values greater than or equal to 7. \n", "\n", "Start by making a data frame of the unstable measurements:" ] }, { "cell_type": "code", "execution_count": 108, "id": "608ba708", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Treatment Block Date Moisture Content Aggregate Stability\n", "5 6 3.2 2 1 0.106014 10.3\n", "19 20 6S 1 1 0.164036 11.1\n", "27 28 1C 1 1 0.046167 7.8\n", "31 32 3.2 1 1 0.082044 9.9\n", "41 6 3.2 2 2 0.297426 8.1" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unstable_df = merged_data[merged_data[____] >= ___]\n", "unstable_df.head()" ] }, { "cell_type": "code", "execution_count": 109, "id": "3ba5d223", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Plot Treatment Block Date Moisture Content Aggregate Stability\n", "0 6 3.2 2 1 0.106014 10.3\n", "1 20 6S 1 1 0.164036 11.1\n", "2 28 1C 1 1 0.046167 7.8\n", "3 32 3.2 1 1 0.082044 9.9\n", "4 6 3.2 2 2 0.297426 8.1" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unstable_df = unstable_df.reset_index(drop=True)\n", "unstable_df.head()" ] }, { "cell_type": "code", "execution_count": 110, "id": "b96db93d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(____)" ] }, { "cell_type": "markdown", "id": "ef9eed0a", "metadata": {}, "source": [ "There are 17 unstable measurements. " ] }, { "cell_type": "markdown", "id": "72d1e263", "metadata": {}, "source": [ "See which treatments the unstable measurements belong to, using \"value_counts\":" ] }, { "cell_type": "code", "execution_count": 111, "id": "340e99e0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.2 4\n", "3.1 3\n", "6C 2\n", "1C 2\n", "2C 1\n", "6S 1\n", "1S 1\n", "5C 1\n", "5S 1\n", "4.2 1\n", "Name: Treatment, dtype: int64" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "id": "a682b4cd", "metadata": {}, "source": [ "The treatment with the most unstable measurements is 3.2, followed by 3.1." ] }, { "cell_type": "markdown", "id": "9c0ba903", "metadata": {}, "source": [ "###### How are the unstable measurements distributed among the blocks?" ] }, { "cell_type": "code", "execution_count": 112, "id": "dd19158b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 7\n", "2 6\n", "3 4\n", "Name: Block, dtype: int64" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "id": "e449958c", "metadata": {}, "source": [ "The block with the most unstable measurements is block 1." ] }, { "cell_type": "markdown", "id": "4e968cda", "metadata": {}, "source": [ "### Regression" ] }, { "cell_type": "markdown", "id": "858d0b5c", "metadata": {}, "source": [ "Plot moisture content vs. aggregate stability to see whether any relationship is visible." ] }, { "cell_type": "code", "execution_count": 58, "id": "94b723dd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Aggregate Stability')" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(____, ____)\n", "plt.title('MC vs. AS')\n", "plt.xlabel('Soil Moisture Content')\n", "plt.ylabel('Aggregate Stability')" ] }, { "cell_type": "code", "execution_count": 61, "id": "0143fdf6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(356,)\n", "(356,)\n" ] } ], "source": [ "X = merged_data['Moisture Content']\n", "Y = merged_data['Aggregate Stability']\n", "print(X.shape)\n", "print(Y.shape)" ] }, { "cell_type": "markdown", "id": "eaca029c", "metadata": {}, "source": [ "Use scipy.stats.linregress (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html)" ] }, { "cell_type": "code", "execution_count": 67, "id": "eaab2ea3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R-squared: 0.001122\n", "Slope: 0.946637\n", "Intercept: 2.570731\n" ] } ], "source": [ "res = sp.stats.___________\n", "print(f\"R-squared: {res.rvalue**2:.6f}\")\n", "print(f\"Slope: {res.slope:.6f}\")\n", "print(f\"Intercept: {res.intercept:.6f}\")" ] }, { "cell_type": "code", "execution_count": 77, "id": "2e6e714b", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(X, Y, 'o', label='original data')\n", "plt.plot(X, res.intercept + res.slope*X, 'r', label='fitted line')\n", "plt.title('MC vs. AS')\n", "plt.xlabel('Soil Moisture Content')\n", "plt.ylabel('Aggregate Stability')\n", "plt.legend()\n", "# save before show(): calling savefig after show() writes an empty figure\n", "plt.savefig('MC_AS_Regress.jpg', bbox_inches='tight')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "76e225fd", "metadata": {}, "source": [ "#### Using the method we learned in class: " ] }, { "cell_type": "code", "execution_count": 72, "id": "95bfbe3f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(356, 1)\n" ] } ], "source": [ "# convert the Series to a NumPy array before adding an axis;\n", "# indexing a Series with [:, np.newaxis] is deprecated\n", "X = X.to_numpy()[:, np.newaxis]\n", "print(X.shape)" ] }, { "cell_type": "code", "execution_count": 73, "id": "124724eb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression()" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = LinearRegression(fit_intercept=True)\n", "model\n", "model.fit(____,________)" ] }, { "cell_type": "code", "execution_count": 74, "id": "f8c5397f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.94663685])" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.coef_" ] }, { "cell_type": "code", "execution_count": 75, "id": "2cc426c3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.5707314408029305" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.intercept_" ] }, { "cell_type": "markdown", "id": "c73ccd84", "metadata": {}, "source": [ "### Equation: Y = 0.9466X + 2.5707" ] }, { "cell_type": "markdown", "id": "83f668da", "metadata": {}, "source": [ "### R^2 value: 0.001122" ] }, { "cell_type": "markdown", "id": "982e2d6d", "metadata": {}, "source": [ "With an R^2 of about 0.001, the regression explains almost none of the variation in aggregate stability, so this data shows no meaningful linear relationship between moisture content and aggregate stability. " ] }, { "cell_type": "code", "execution_count": null, "id": "8d2131db", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }