In statistics, a Q–Q plot (quantile-quantile plot) is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against each other. If the two distributions being compared are similar, the points in the Q–Q plot will approximately lie on the identity line $y = x$.
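As a rough illustration of what "plotting quantiles against each other" means, the sketch below pairs each ordered observation with the matching quantile of a normal distribution fitted to the sample. The data values, the helper name `normal_qq_pairs`, and the plotting position $(i - 0.5)/n$ are assumptions made for this example; plotting libraries may use slightly different conventions.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_pairs(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Reference normal uses the sample's own mean and standard deviation,
    # so a good fit puts the points near the identity line y = x.
    reference = NormalDist(mean(ordered), stdev(ordered))
    pairs = []
    for i, observed in enumerate(ordered, start=1):
        prob = (i - 0.5) / n  # assumed plotting position, one common choice
        pairs.append((reference.inv_cdf(prob), observed))
    return pairs


# Hypothetical measurements, for illustration only
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0]
qq_pairs = normal_qq_pairs(sample)
for theoretical, observed in qq_pairs:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the two printed columns track each other closely, the sample is roughly consistent with a normal distribution.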
The goal of hypothesis testing is to see if there is enough evidence against the null hypothesis. If there is not enough evidence, then we fail to reject the null hypothesis.
For a fixed significance level $\alpha$, increasing the sample size shrinks the standard error of the sample mean and so lowers the probability of a Type II error!
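One way to see the effect of sample size is through the standard error of the sample mean, $s/\sqrt{n}$. The sketch below uses a hypothetical sample standard deviation and a few hypothetical sample sizes; the helper name `standard_error` is made up for this example.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)


sample_sd = 1.2          # hypothetical sample standard deviation
sizes = [25, 100, 400]   # hypothetical sample sizes
errors = [standard_error(sample_sd, n) for n in sizes]
for n, se in zip(sizes, errors):
    print(f"n = {n:4d}  standard error = {se:.3f}")
```

Each fourfold increase in $n$ halves the standard error, so the same difference between $\bar{x}$ and $\mu_0$ becomes easier to detect.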
The mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. Suppose the builder observes that the sample mean of 61 pieces of lumber is 8.3 feet, with a sample standard deviation of 1.2 feet. What will she conclude? Is 8.3 very different from 8.5?
Standardizing the difference gives the test statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{8.3 - 8.5}{1.2/\sqrt{61}} \approx -1.3$. Thus, we are asking if $-1.3$ is very far away from zero, since zero corresponds to the case when $\bar{x}$ is equal to $\mu_0$. If it is far away, then data like ours would be unlikely if the null hypothesis were true, and one rejects it. Otherwise, one cannot reject the null hypothesis.
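The value $-1.3$ can be reproduced from the numbers in the lumber example. This is a minimal sketch of the usual one-sample $t$ statistic; the variable names are chosen here for illustration.

```python
import math

mu_0 = 8.5   # hypothesized mean length (feet)
x_bar = 8.3  # observed sample mean (feet)
s = 1.2      # sample standard deviation (feet)
n = 61       # number of pieces of lumber

# Standardize the difference between the sample mean and mu_0
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error
print(f"t = {t_stat:.2f}")  # t = -1.30
```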
The $p$-value is defined to be the smallest Type I error rate ($\alpha$) that you have to be willing to tolerate if you want to reject the null hypothesis.
Equivalently, the $p$-value (or probability value) is the probability that the test statistic equals the observed value or a more extreme value, under the assumption that the null hypothesis is true.
If our p-value is less than or equal to $\alpha$, then there is enough evidence to reject the null hypothesis. If our p-value is greater than $\alpha$, there is not enough evidence to reject the null hypothesis.
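To make the decision rule concrete, the sketch below computes a two-sided $p$-value for the lumber example and compares it with $\alpha$. It assumes $\alpha = 0.05$ and integrates the Student $t$ density numerically with Simpson's rule so that it needs only the standard library; the helper names `t_pdf` and `two_sided_p_value` are made up for this example, and in practice a statistics library would normally do this step.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t_stat|)
    return 1.0 - 2.0 * central_area


alpha = 0.05  # assumed significance level
t_stat = (8.3 - 8.5) / (1.2 / math.sqrt(61))
p_value = two_sided_p_value(t_stat, df=61 - 1)
decision = "reject" if p_value <= alpha else "fail to reject"
print(f"p-value = {p_value:.3f}: {decision} the null hypothesis")
```

Under these assumptions the $p$-value is well above 0.05, so the builder would not have enough evidence that the mean length differs from 8.5 feet.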
import seaborn as sns
penguins = sns.load_dataset("penguins")
penguins
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
... | ... | ... | ... | ... | ... | ... | ... |
339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | NaN |
340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | FEMALE |
341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | MALE |
342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | FEMALE |
343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | MALE |
344 rows × 7 columns
import matplotlib.pyplot as plt
g = sns.pairplot(penguins, corner=True, diag_kind='kde')
g.fig.set_size_inches(4,4)
penguins.body_mass_g.dropna().plot(kind='hist')
<AxesSubplot: ylabel='Frequency'>
What kind of t-test is this?
from scipy import stats
print(stats.ttest_1samp(penguins.body_mass_g.dropna(), popmean=4201))
print('Our p-value is not below alpha, thus we cannot reject the null hypothesis.')
print(f'We do not have enough statistical evidence to reject the null hypothesis.')
print('We conclude that there is not enough statistical evidence that indicates that the mean\n mass of penguins differs from 4201 g.')
Ttest_1sampResult(statistic=0.01739630065824133, pvalue=0.9861306342800811)
Our p-value is not below alpha, thus we cannot reject the null hypothesis.
We do not have enough statistical evidence to reject the null hypothesis.
We conclude that there is not enough statistical evidence that indicates that the mean
 mass of penguins differs from 4201 g.
What kind of t-test is this?
from scipy import stats
print(stats.ttest_1samp(penguins.body_mass_g.dropna(), popmean=4300, alternative='less'))
print('Our p-value is below alpha, thus we reject the null hypothesis.')
print('We have enough statistical evidence to reject the null hypothesis.')
print('We conclude that there is enough statistical evidence that indicates that the mean\n mass of penguins is less than 4300 g.')
Ttest_1sampResult(statistic=-2.265564736887436, pvalue=0.012052359189978712)
Our p-value is below alpha, thus we reject the null hypothesis.
We have enough statistical evidence to reject the null hypothesis.
We conclude that there is enough statistical evidence that indicates that the mean
 mass of penguins is less than 4300 g.
BIG QUESTION: Does flipper length vary depending on the sex of the penguin?
# QQPLOT
import pingouin as pg
ax = pg.qqplot(penguins.flipper_length_mm, dist='norm')
# Shapiro-Wilk test
from scipy.stats import shapiro
shapiro(penguins.flipper_length_mm.dropna()) # p-val<0.05 - departure from normality
ShapiroResult(statistic=0.9515460133552551, pvalue=3.541138271501154e-09)
# visualize difference with a plot
sns.pointplot(x = 'sex', y = 'flipper_length_mm', data = penguins)
sns.despine()
SAMPLES ARE NOT PAIRED AND NOT RELATED TO ONE ANOTHER IN ANY WAY!
Pooled standard deviation:
$$ s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} $$
penguins
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | MALE |
... | ... | ... | ... | ... | ... | ... | ... |
338 | Gentoo | Biscoe | 47.2 | 13.7 | 214.0 | 4925.0 | FEMALE |
340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | FEMALE |
341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | MALE |
342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | FEMALE |
343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | MALE |
333 rows × 7 columns
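The pooled standard deviation formula given earlier can be sketched in a few lines. The example below uses two small hypothetical groups and also shows how $s_p$ would enter the equal-variance two-sample $t$ statistic, which is an assumption about how the formula is meant to be used; the helper names are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def pooled_t_statistic(a, b):
    """Two-sample t statistic under the equal-variance assumption."""
    sp = pooled_sd(a, b)
    return (mean(a) - mean(b)) / (sp * math.sqrt(1 / len(a) + 1 / len(b)))


# Hypothetical samples, for illustration only
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 2.0, 3.0, 4.0, 5.0]
sp = pooled_sd(group_a, group_b)
t_stat = pooled_t_statistic(group_a, group_b)
print(f"pooled sd = {sp:.3f}, t = {t_stat:.3f}")
```

The pooled form weights each sample variance by its degrees of freedom, so the larger group contributes more to $s_p$.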
from pingouin import ttest, mwu
penguins.dropna(inplace=True) # drop null values
males = penguins.loc[penguins.sex=='MALE', 'flipper_length_mm']
females = penguins.loc[penguins.sex=='FEMALE', 'flipper_length_mm']
print(males.shape, females.shape)
ttest(males, females)
(168,) (165,)
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | 4.807866 | 325.278352 | two-sided | 0.000002 | [4.22, 10.06] | 0.526244 | 5880.711 | 0.997654 |
# use Mann-Whitney non-parametric test (to account for non-normal data)
mwu(males, females, alternative='two-sided')
| | U-val | alternative | p-val | RBC | CLES |
---|---|---|---|---|---|
MWU | 18173.0 | two-sided | 9.011341e-07 | -0.311183 | 0.655592 |
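To give a feel for what the $U$ statistic measures, here is a minimal sketch that counts, over all pairs, how often a value from the first group exceeds a value from the second (ties count one half). The data and the helper name `mann_whitney_u` are hypothetical, and the common-language effect size is taken here as $U / (n_1 n_2)$, which is an assumption about the convention in use.

```python
def mann_whitney_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical flipper lengths (mm), for illustration only
group_x = [195, 201, 210, 188]
group_y = [190, 186, 201, 181, 184]
u_value = mann_whitney_u(group_x, group_y)
cles = u_value / (len(group_x) * len(group_y))
print(u_value, cles)
```

The same arithmetic applied to the table above gives $18173 / (168 \times 165) \approx 0.6556$, which agrees with the reported CLES.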
from scipy.stats import bartlett
stat, p = bartlett(males, females)
print(f'the p-value: {p}')
print('no evidence against equal variances at alpha=0.05 (p > 0.05)')
the p-value: 0.05197230787956047
no evidence against equal variances at alpha=0.05 (p > 0.05)
Analysis of Variance
Under the null hypothesis, the $F$ statistic (the ratio of between-group to within-group variance) should be close to 1; if the ratio is large, we have evidence against the null.
Here $G$ is the number of groups, $N$ is the number of observations (penguins), and $Y_{ik}$ is the body mass of the $i$th individual in group (species) $k$.
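In this notation, the one-way ANOVA statistic is usually written as below, where $n_k$ (the size of group $k$), $\bar{Y}_k$ (the group mean) and $\bar{Y}$ (the grand mean) are symbols introduced here for the sketch:

$$ F = \frac{\sum_{k=1}^{G} n_k (\bar{Y}_k - \bar{Y})^2 / (G-1)}{\sum_{k=1}^{G} \sum_{i=1}^{n_k} (Y_{ik} - \bar{Y}_k)^2 / (N-G)} $$

A minimal version of that calculation, on hypothetical data and with a made-up helper name `one_way_f`, might look like this:

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    n_groups = len(groups)   # G
    n_obs = len(all_values)  # N
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (n_groups - 1)) / (ss_within / (n_obs - n_groups))


# Hypothetical measurements for three groups, for illustration only
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_f(groups)
print(f"F = {f_value:.2f}")
```

Here the group means are far apart relative to the spread inside each group, so $F$ is well above 1.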
penguins.groupby('species')['body_mass_g'].agg(['mean', 'std', 'size'])
species | mean | std | size |
---|---|---|---|
Adelie | 3706.164384 | 458.620135 | 146 |
Chinstrap | 3733.088235 | 384.335081 | 68 |
Gentoo | 5092.436975 | 501.476154 | 119 |
sns.boxplot(data=penguins, x='species', y='body_mass_g', hue='sex')
<AxesSubplot: xlabel='species', ylabel='body_mass_g'>
#ANOVA
import pingouin as pg
penguins.anova(dv='body_mass_g', between='species', detailed=False)
| | Source | ddof1 | ddof2 | F | p-unc | np2 |
---|---|---|---|---|---|---|
0 | species | 2 | 330 | 341.894895 | 3.744505e-81 | 0.674489 |
# ANOVA VIA SCIPY
import scipy.stats as stats
# stats.f_oneway takes the groups as input and returns the ANOVA F statistic and p-value
fvalue, pvalue = stats.f_oneway(penguins.loc[penguins.species=='Adelie', 'body_mass_g'],
                                penguins.loc[penguins.species=='Chinstrap', 'body_mass_g'],
                                penguins.loc[penguins.species=='Gentoo', 'body_mass_g'])
print(fvalue, pvalue)
341.8948949481461 3.74450512630046e-81
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = ols('body_mass_g ~ C(species)', data=penguins).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table
| | sum_sq | df | F | PR(>F) |
---|---|---|---|---|
C(species) | 1.451902e+08 | 2.0 | 341.894895 | 3.744505e-81 |
Residual | 7.006945e+07 | 330.0 | NaN | NaN |