{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Statistics example problems\n",
"\n",
"[1. ANOVA - geological example](#1.-ANOVA-geological-example)\n",
"\n",
"[2. Non-parametric tests](#2.-Non-parametric-tests)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from scipy import stats\n",
"import pandas as pd\n",
"import pingouin as pg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. ANOVA geological example\n",
"\n",
"#### a. Implementation\n",
"\n",
"Data come from Table 10.1 of McKillup and Dyar, Geostatistics Explained, Cambridge University Press, 2010 (excerpt available on class Google Drive). Values represent the weight percent of MgO present in tourmalines from three locations in Maine. \n",
"\n",
"Use two different methods to test whether there is a significant difference in the mean MgO content between the three different sites.\n",
"\n",
"##### Method 1: Scipy"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('data/MgO_example/MgO_Maine.csv') # dataframe"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Mount Mica \n",
" Sebago Batholith \n",
" Black Mountain \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 7 \n",
" 4 \n",
" 1 \n",
" \n",
" \n",
" 1 \n",
" 8 \n",
" 5 \n",
" 2 \n",
" \n",
" \n",
" 2 \n",
" 10 \n",
" 7 \n",
" 4 \n",
" \n",
" \n",
" 3 \n",
" 11 \n",
" 8 \n",
" 5 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Mount Mica Sebago Batholith Black Mountain\n",
"0 7 4 1\n",
"1 8 5 2\n",
"2 10 7 4\n",
"3 11 8 5"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"F_onewayResult(statistic=10.799999999999999, pvalue=0.004058306777237465)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.f_oneway(df['Mount Mica'],df['Sebago Batholith'],df['Black Mountain'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Method 2: Pingouin"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"df2 = pd.read_csv('data/MgO_example/MgO_Maine_list.csv')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" MgO \n",
" Location \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 7 \n",
" Mount Mica \n",
" \n",
" \n",
" 1 \n",
" 8 \n",
" Mount Mica \n",
" \n",
" \n",
" 2 \n",
" 10 \n",
" Mount Mica \n",
" \n",
" \n",
" 3 \n",
" 11 \n",
" Mount Mica \n",
" \n",
" \n",
" 4 \n",
" 4 \n",
" Sebago Batholith \n",
" \n",
" \n",
" 5 \n",
" 5 \n",
" Sebago Batholith \n",
" \n",
" \n",
" 6 \n",
" 7 \n",
" Sebago Batholith \n",
" \n",
" \n",
" 7 \n",
" 8 \n",
" Sebago Batholith \n",
" \n",
" \n",
" 8 \n",
" 1 \n",
" Black Mountain \n",
" \n",
" \n",
" 9 \n",
" 2 \n",
" Black Mountain \n",
" \n",
" \n",
" 10 \n",
" 4 \n",
" Black Mountain \n",
" \n",
" \n",
" 11 \n",
" 5 \n",
" Black Mountain \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" MgO Location\n",
"0 7 Mount Mica\n",
"1 8 Mount Mica\n",
"2 10 Mount Mica\n",
"3 11 Mount Mica\n",
"4 4 Sebago Batholith\n",
"5 5 Sebago Batholith\n",
"6 7 Sebago Batholith\n",
"7 8 Sebago Batholith\n",
"8 1 Black Mountain\n",
"9 2 Black Mountain\n",
"10 4 Black Mountain\n",
"11 5 Black Mountain"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Source \n",
" ddof1 \n",
" ddof2 \n",
" F \n",
" p-unc \n",
" np2 \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" Location \n",
" 2 \n",
" 9 \n",
" 10.8 \n",
" 0.004058 \n",
" 0.705882 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Source ddof1 ddof2 F p-unc np2\n",
"0 Location 2 9 10.8 0.004058 0.705882"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pg.anova(data=df2,dv='MgO',between='Location')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Post-hoc test"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" A \n",
" B \n",
" mean(A) \n",
" mean(B) \n",
" diff \n",
" se \n",
" T \n",
" p-tukey \n",
" hedges \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" Black Mountain \n",
" Mount Mica \n",
" 3 \n",
" 9 \n",
" -6 \n",
" 1.290994 \n",
" -4.64758 \n",
" 0.001000 \n",
" -2.857683 \n",
" \n",
" \n",
" 1 \n",
" Black Mountain \n",
" Sebago Batholith \n",
" 3 \n",
" 6 \n",
" -3 \n",
" 1.290994 \n",
" -2.32379 \n",
" 0.061925 \n",
" -1.428841 \n",
" \n",
" \n",
" 2 \n",
" Mount Mica \n",
" Sebago Batholith \n",
" 9 \n",
" 6 \n",
" 3 \n",
" 1.290994 \n",
" 2.32379 \n",
" 0.061925 \n",
" 1.428841 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B mean(A) mean(B) diff se \\\n",
"0 Black Mountain Mount Mica 3 9 -6 1.290994 \n",
"1 Black Mountain Sebago Batholith 3 6 -3 1.290994 \n",
"2 Mount Mica Sebago Batholith 9 6 3 1.290994 \n",
"\n",
" T p-tukey hedges \n",
"0 -4.64758 0.001000 -2.857683 \n",
"1 -2.32379 0.061925 -1.428841 \n",
"2 2.32379 0.061925 1.428841 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pg.pairwise_tukey(data=df2,dv='MgO',between='Location')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### b. ANOVA interpretation\n",
"\n",
"Write a summary of your interpretation of the statistical results conducted above. Address the following questions.\n",
"\n",
"* What is the null hypothesis being tested?\n",
"* Should the null hypothesis be accepted or rejected?\n",
"* What does the post-hoc test tell you?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Non-parametric tests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### a. Wilcoxon signed-rank test: implementation\n",
"\n",
"This example uses data from:\n",
"http://www.biostathandbook.com/wilcoxonsignedrank.html\n",
"\n",
"The data are observations of aluminum content in 13 different poplar clones in a polluted area. The scientific question is whether there is a significant change in the aluminum content between August and November."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"df_al = pd.read_csv('data/wilcoxon_example/Al_content.csv',\n",
" delimiter='\\t')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Clone \n",
" August \n",
" November \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" Columbia River \n",
" 18.3 \n",
" 12.7 \n",
" \n",
" \n",
" 1 \n",
" Fritzi Pauley \n",
" 13.3 \n",
" 11.1 \n",
" \n",
" \n",
" 2 \n",
" Hazendans \n",
" 16.5 \n",
" 15.3 \n",
" \n",
" \n",
" 3 \n",
" Primo \n",
" 12.6 \n",
" 12.7 \n",
" \n",
" \n",
" 4 \n",
" Raspalje \n",
" 9.5 \n",
" 10.5 \n",
" \n",
" \n",
" 5 \n",
" Hoogvorst \n",
" 13.6 \n",
" 15.6 \n",
" \n",
" \n",
" 6 \n",
" Balsam Spire \n",
" 8.1 \n",
" 11.2 \n",
" \n",
" \n",
" 7 \n",
" Gibecq \n",
" 8.9 \n",
" 14.2 \n",
" \n",
" \n",
" 8 \n",
" Beaupre \n",
" 10.0 \n",
" 16.3 \n",
" \n",
" \n",
" 9 \n",
" Unal \n",
" 8.3 \n",
" 15.5 \n",
" \n",
" \n",
" 10 \n",
" Trichobel \n",
" 7.9 \n",
" 19.9 \n",
" \n",
" \n",
" 11 \n",
" Gaver \n",
" 8.1 \n",
" 20.4 \n",
" \n",
" \n",
" 12 \n",
" Wolterson \n",
" 13.4 \n",
" 36.8 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Clone August November\n",
"0 Columbia River 18.3 12.7\n",
"1 Fritzi Pauley 13.3 11.1\n",
"2 Hazendans 16.5 15.3\n",
"3 Primo 12.6 12.7\n",
"4 Raspalje 9.5 10.5\n",
"5 Hoogvorst 13.6 15.6\n",
"6 Balsam Spire 8.1 11.2\n",
"7 Gibecq 8.9 14.2\n",
"8 Beaupre 10.0 16.3\n",
"9 Unal 8.3 15.5\n",
"10 Trichobel 7.9 19.9\n",
"11 Gaver 8.1 20.4\n",
"12 Wolterson 13.4 36.8"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_al"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAKn0lEQVR4nO3cf4jk913H8dfbu0hLG2g1awlN1hURoQRN5Ih/REoNRdJGrApKA0oF4fzDQgqCRv+x9a8oWvxHxNMGo9aWQFotDf4I2FADmnoX05p4LZZyamzIJYTS3D9K0rd/7Fxyve7eTpKb3ffOPR6w3OzM92beHz63T+a+M7PV3QFgrm876AEAuDShBhhOqAGGE2qA4YQaYLijq7jTa665pre2tlZx1wBr6dSpU89298ZOt60k1FtbWzl58uQq7hpgLVXVf+52m1MfAMMJNcBwQg0wnFADDCfUAMMJNcBwS709r6rOJHk+yYtJXujuY6scCoCXvZL3Uf9odz+7skkA2JFTHwDDLfuMupP8fVV1kj/q7hMXH1BVx5McT5LNzc1XPdDWXQ+86r/7Wpy5+/YDeVyAvSz7jPqW7v6hJO9K8stV9faLD+juE919rLuPbWzs+HF1AF6FpULd3V9d/Hk2ySeT3LzKoQB42Z6hrqo3VNXV5y8n+bEkj696MAC2LXOO+i1JPllV54//y+7+25VOBcBL9gx1d38lyQ/uwywA7MDb8wCGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhOqAGGE2qA4YQaYDihBhhu6VBX1ZGq+teq+vQqBwLgm72SZ9R3Jjm9qkEA2NlSoa6q65LcnuRPVjsOABc7uuRxv5/kV5NcvdsBVXU8yfEk2dzcfM2D7betux44sMc+c/ftB/bYwHx7PqOuqh9Pcra7T13quO4+0d3HuvvYxsbGZRsQ4Eq3zKmPW5L8RFWdSfLxJLdW1V+sdCoAXrJnqLv717v7uu7eSvLeJP/Q3T+38skASOJ91ADjLftiYpKkux9K8tBKJgFgR55RAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMNyeoa6q11XV56rq81X1RFV9aD8GA2Db0SWO+d8kt3b3uaq6KsnDVfU33f3PK54NgCwR6u7uJOcW3161+OpVDgXAy5Y6R11VR6rqsSRnkzzY3Y+sdCoAXrJUqLv7xe6+Mcl1SW6uqhsuPqaqjlfVyao6+cwzz1zmMQGuXK/oXR/d/bUkDyW5bYfbTnT3se4+trGxcXmmA2Cpd31sVNWbFpdfn+SdSb644rkAWFjmXR/XJrm3qo5kO+z3dfenVzsWAOct866PLyS5aR9mAWAHPpkIMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMNyeoa6q66vqM1V1uqqeqKo792MwALYdXeKYF5L8Snc/WlVXJzlVVQ9297+veDYAssQz6u5+qrsfXVx+PsnpJG9d9WAAbFvmGfVLqmoryU1JHtnhtuNJjifJ5ubm5ZiNFdu664EDe+wzd99+YI8Nh83SLyZW1RuT3J/kA9399Ytv7+4T3X2su49tbGxczhkBrmhLhbqqrsp2pD/a3Z9Y7UgAXGiZd31Uko8kOd3dH179SABcaJln1Lck+fkkt1bVY4uvd694LgAW9nwxsbsfTlL7MAsAO/DJRIDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYYTaoDhhBpgOKEGGE6oAYbbM9RVdU9Vna2qx/djIAC+2TLPqP80yW0rngOAXewZ6u7+bJLn9mEWAHZw9HLdUVUdT3I8STY3Ny/X3V4Rtu564KBHgLVyUD9TZ+6+fSX3e9leTOzuE919rLuPbWxsXK67BbjiedcHwHBCDTDcMm/P+1iSf0ry/VX1ZFX94urHAuC8PV9M7O479mMQAHbm1AfAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEINMJxQAwwn1ADDCTXAcEuFuqpuq6ovVdWXq+quVQ8FwMv2DHVVHUnyB0neleRtSe6oqretejAAti3zjPrmJF/u7q909/8l+XiS96x2LADOO7rEMW9N8t8XfP9kkh+++KCqOp7k+OLbc1X1pT3u95okzy4z5CFnnTuo317hJKt1JeznlbDGZAXrfI3/rr97txuWCXXtcF1/yxXdJ5KcWHaiqjrZ3ceWPf6wss71ciWs80pYY3K41rnMqY8nk1x/wffXJfnqasYB4GLLhPpfknxfVX1PVX17kvcm+dRqxwLgvD1PfXT3C1X1/iR/l+RIknu6+4nL8NhLnyY55KxzvVwJ67wS1pgconVW97ecbgZgEJ9MBBhOqAGG25dQV9U9VXW2qh6/4LrvqKoHq+o/Fn++eT9mWaVd1vnBqvqfqnps8fXug5zxtaqq66vqM1V1uqqeqKo7F9ev1X5eYp3rtp+vq6rPVdXnF+v80OL6ddvP3dZ5KPZzX85RV9Xbk5xL8mfdfcPiut9J8lx33734/SFv7u5fW/kwK7TLOj+Y5Fx3/+5Bzna5VNW1Sa7t7ker6uokp5L8ZJJfyBrt5yXW+bNZr/2sJG/o7nNVdVWSh5PcmeSns177uds6b8sh2M99eUbd3Z9N8txFV78nyb2Ly/dm+4fgUNtlnWulu5/q7kcXl59Pcjrbn15dq/28xDrXSm87t/j2qsVXZ/32c7d1HgoHeY76Ld39VLL9Q5Hkuw5wllV7f1V9YXFq5FD/F/JCVbWV5KYkj2SN9/OidSZrtp9VdaSqHktyNsmD3b2W+7nLOpNDsJ9eTFy9P0zyvUluTPJUkt870Gkuk6p6Y5L7k3ygu79+0POsyg7rXLv97O4Xu/vGbH/q+OaquuGAR1qJXdZ5KPbzIEP99OI84PnzgWcPcJaV6e6nF/9AvpHkj7P92wgPtcU5vvuTfLS7P7G4eu32c6d1ruN+ntfdX0vyULbP267dfp534ToPy34eZKg/leR9i8vvS/LXBzjLypz/x77wU0ke3+3Yw2DxosxHkpzu7g9fcNNa7edu61zD/dyoqjctLr8+yTuTfDHrt587rvOw7Od+vevjY0neke1fK/h0kt9M8ldJ7kuymeS/kvxMdx/qF+J2Wec7sv3fqk5yJskvnT/3dxhV1Y8k+cck/5bkG4urfyPb52/XZj8vsc47sl77+QPZfrHwSLafuN3X3b9VVd+Z9drP3db55zkE++kj5ADDeTERYDihBhhOqAGGE2qA4YQaYDihBhhOqAGG+38/s+3T2oZmYwAAAABJRU5ErkJggg==",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure()\n",
"plt.hist(df_al['November']);"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SkewtestResult(statistic=3.449022139607473, pvalue=0.0005626205706886182)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.skewtest(df_al['November'])"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NormaltestResult(statistic=21.55304457655946, pvalue=2.0884103462437462e-05)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.normaltest(df_al['November'])"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"WilcoxonResult(statistic=16.0, pvalue=0.039794921875)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.wilcoxon(df_al['August'],df_al['November'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### b. Interpretation\n",
"\n",
"Under what situations are non-parametric statistics useful? What are the potential drawbacks in using non-parametric statistics when a parametric approach is justified?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3.10.6 ('data-book')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:38:29) [Clang 13.0.1 ]"
},
"vscode": {
"interpreter": {
"hash": "0ef88d3abb6b62f34a20525ce337090c4512fe8aecf32c74604482b944e1c3bd"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}