{ "cells": [ { "cell_type": "markdown", "id": "57c13277-a239-452b-a202-e499e0a068b0", "metadata": {}, "source": [ "### Independent-Sample Tests\n", "#### Two-sample Test\n", "In [Individual Comparisons by Ranking Methods](https://www.jstor.org/stable/3001968#metadata_info_tab_contents), Wilcoxon considers two sprays designed to kill flying insects. A subset of the data, the percentage of flies killed in repeated trials of two treatments, is recorded below." ] }, { "cell_type": "code", "execution_count": 1, "id": "fe264441-ebf5-47f0-83fb-81b824751b2d", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "x = np.array([61, 62, 67, 63, 56, 58])\n", "y = np.array([60, 68, 59, 72, 64])" ] }, { "cell_type": "markdown", "id": "8525c260-6699-4f90-b20c-85e0d948bb3a", "metadata": {}, "source": [ "In the paper, Wilcoxon describes a test of whether the two samples are drawn from the same population. The test is now commonly described as a *nonparametric* version of the independent-sample t-test; that is, a version of the t-test that does not make a normality (or any particular distributional) assumption. \n", "\n", "Given samples `x` and `y`, Wilcoxon introduces a statistic which is proportional to an empirical estimate of the probability that a random observation from the distribution underlying `x` will be greater than a random observation from the distribution underlying `y`. Suppose we want to test the null hypothesis that the samples are drawn from the same distribution against the alternative that they are drawn from different distributions which tend to produce samples that give lower values of the statistic. Under certain assumptions, a low value of the statistic can be interpreted as evidence that the location of the distribution underlying `x` is less than the location of the distribution underlying `y`. 
To perform this test, we pass the data into [`scipy.stats.mannwhitneyu`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html) with `alternative='less'`." ] }, { "cell_type": "code", "execution_count": 2, "id": "9fc1c550-67a2-4483-b760-e119b16e6b2d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.16450216450216448" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy import stats\n", "_, pvalue = stats.mannwhitneyu(x, y, alternative='less')\n", "pvalue # p-value is greater than our threshold; test is inconclusive" ] }, { "cell_type": "markdown", "id": "4b709400-3601-4ce5-9a36-538498abab45", "metadata": {}, "source": [ "Like the mean comparison test in Efron's example from the previous tutorial on [Permutation Tests](https://nbviewer.org/github/scipy/scipy-cookbook/blob/main/ipython/ResamplingAndMonteCarloMethods/resampling_tutorial_2.ipynb), this is an example of an \"independent sample\" test of the null hypothesis that group labels (`x`, `y`) are entirely random. In fact, because `mannwhitneyu` claims to produce an exact $p$-value, we would expect `permutation_test` to return precisely the same $p$-value (using `mannwhitneyu` only to compute the statistic)." 
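, "\n", "\n", "As a rough sketch of what \"exact\" means here (the symbols $U_{\\text{obs}}$ and $U_\\pi$ are notation introduced for illustration, and the count in the numerator is inferred from the $p$-value printed above rather than enumerated by hand): with 6 observations in `x` and 5 in `y`, there are $\\binom{11}{6} = 462$ equally likely ways to assign the group labels under the null hypothesis, so the one-sided $p$-value is the fraction of relabelings $\\pi$ whose statistic is at most the observed one.\n", "\n", "$$p = \\frac{\\text{number of relabelings } \\pi \\text{ with } U_\\pi \\le U_{\\text{obs}}}{\\binom{11}{6}} = \\frac{76}{462} \\approx 0.1645$$"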
] }, { "cell_type": "code", "execution_count": 3, "id": "f526ac5a-b0a8-4c60-a47d-324c22215431", "metadata": {}, "outputs": [], "source": [ "def statistic(x, y):\n", " # return just the Mann-Whitney U statistic\n", " return stats.mannwhitneyu(x, y, alternative='less').statistic\n", "\n", "# \"independent\" is the default `permutation type`, so we are not required to pass it here\n", "# We pass `alternative='less'` because lesser values of the statistic are more extreme\n", "res = stats.permutation_test((x, y), statistic, permutation_type='independent', alternative='less')\n", "np.testing.assert_allclose(res.pvalue, pvalue, atol=1e-15)" ] }, { "cell_type": "markdown", "id": "5c346fa8-1b34-4443-8220-c018b6d0f5a8", "metadata": {}, "source": [ "Just as with `monte_carlo_test`, vectorizing the `statistic` function can greatly improve the speed of the test." ] }, { "cell_type": "code", "execution_count": 4, "id": "4f2dae73-2bf1-4c66-89b3-ac1ff0487e4c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "272 ms ± 4.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "# Before\n", "%timeit stats.permutation_test((x, y), statistic, alternative='less')" ] }, { "cell_type": "code", "execution_count": 5, "id": "b5c8a5c2-e958-4164-bd19-0b9fffc342b5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "56.5 ms ± 859 µs per loop (mean ± std. dev. 
of 7 runs, 10 loops each)\n" ] } ], "source": [ "# After \n", "def statistic_vectorized(x, y, axis=0):\n", " # return just the Mann-Whitney U statistic\n", " return stats.mannwhitneyu(x, y, axis=axis, alternative='less').statistic\n", "\n", "%timeit stats.permutation_test((x, y), statistic_vectorized, alternative='less', vectorized=True)" ] }, { "cell_type": "markdown", "id": "a751fc04-e796-4d8e-bf5a-3c47da9e4323", "metadata": {}, "source": [ "Although `mannwhitneyu` provides an exact $p$-value for the data above, `permutation_test` comes in handy when there are ties in the samples. As the [`mannwhitneyu` documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html) states,\n", "> `'exact'`: computes the exact p-value by comparing the observed statistic against the exact distribution of the statistic under the null hypothesis. **No correction is made for ties.**\n", "\n", "The complete data set in Wilcoxon's original paper had ties." ] }, { "cell_type": "code", "execution_count": 6, "id": "ccadcdf2-1341-48c6-bcc3-92be7b80b725", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.014763014763014764 0.01351981351981352\n" ] } ], "source": [ "x = [60, 67, 61, 62, 67, 63, 56, 58]\n", "y = [68, 68, 59, 72, 64, 67, 70, 74]\n", "res1 = stats.mannwhitneyu(x, y, method='exact', alternative='two-sided')\n", "# By default, only 9,999 random permutations are used. \n", "# We pass n_resamples=np.inf to ensure that all 12,870 possible permutations are used\n", "res2 = stats.permutation_test((x, y), statistic_vectorized, alternative='two-sided', vectorized=True, n_resamples=np.inf)\n", "print(res1.pvalue, res2.pvalue)" ] }, { "cell_type": "markdown", "id": "7c85b266-314c-490f-9a22-a5c1e1400496", "metadata": {}, "source": [ "The two $p$-values are similar despite the ties, but only `permutation_test` is truly \"exact\" in this case. 
Either way, our 1% threshold for statistical significance is not met, and the test is inconclusive." ] }, { "cell_type": "markdown", "id": "6cfb7ac5-6ec6-43a4-ae3b-f5ef32050f17", "metadata": {}, "source": [ "#### Multi-sample Test\n", "`scipy.stats.kruskal` is a many-sample extension of the Mann-Whitney U test, but SciPy provides only an approximate (asymptotic) $p$-value. It is possible to perform an exact version of the test using `permutation_test` for very small samples, and a randomized test using a subset of the possible permutations may yield more accurate results than the approximation implemented by `kruskal`, especially if there are ties or the sample size is small. Using the (artificial) data for milk cap production from [Kruskal and Wallis' original paper](https://www.tandfonline.com/doi/abs/10.1080/01621459.1952.10483441), we have:" ] }, { "cell_type": "code", "execution_count": 7, "id": "d8879fa7-9576-4d84-acfb-85962ebb0b02", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "KruskalResult(statistic=5.656410256410254, pvalue=0.059118869289796136)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [340, 345, 330, 342, 338]\n", "y = [339, 333, 344]\n", "z = [347, 343, 349, 355]\n", "stats.kruskal(x, y, z)" ] }, { "cell_type": "markdown", "id": "246840a9-f7fe-4af6-942e-ad97540e5730", "metadata": {}, "source": [ "At the expense of some time, the exact p-value for this data is given by `permutation_test`." 
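, "\n", "\n", "To get a feel for the cost, the number of distinct ways to split the 12 observations into groups of sizes 5, 3, and 4 can be worked out directly (a back-of-the-envelope count, not something stated in the original paper):\n", "\n", "$$\\frac{12!}{5!\\,3!\\,4!} = \\frac{479001600}{120 \\cdot 6 \\cdot 24} = 27720$$\n", "\n", "With `n_resamples=np.inf`, the statistic is evaluated once per distinct partition, and the exact $p$-value is the fraction of partitions whose statistic is at least as large as the observed one."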
] }, { "cell_type": "code", "execution_count": 8, "id": "5b274945-72d2-4597-a2dd-e9d7efbe293e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.048629148629148626" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def statistic(x, y, z, axis=0):\n", " return stats.kruskal(x, y, z, axis=axis).statistic\n", "\n", "res = stats.permutation_test((x, y, z), statistic, vectorized=True, alternative='greater', n_resamples=np.inf)\n", "res.pvalue" ] }, { "cell_type": "markdown", "id": "fc877591-9f75-4990-a035-a6a48b4e4597", "metadata": {}, "source": [ "Note that we passed `alternative='greater'` into `permutation_test` but not into `kruskal`. This is because the `kruskal` test is inherently one-sided: data generated under the null hypothesis tends to generate small positive values of the statistic with greater values always being more exceptional. This raises the point that setting up a permutation test requires some study of both the underlying statistic and SciPy's implementation. Another example of this is shown in the next section." ] }, { "cell_type": "markdown", "id": "2eb247ee-5dc9-4951-9124-11bd3a88c510", "metadata": {}, "source": [ "#### Gotchas\n", "Suppose that we wish to perform the two-sample Kolmogorov-Smirnov test to test the null hypothesis that two samples were drawn from the same distribution against the alternative that the distribution $X$ underlying sample `x` is [stochastically greater](https://en.wikipedia.org/wiki/Stochastic_ordering) than the distribution $Y$ underlying sample `y`. Roughly speaking, this is the alternative that $X$ \"tends to be\" greater than $Y$.\n", "\n", "Here, we'll use randomly-generated data that best illustrates some confusing (but important) points. We choose shapes of the samples to generate the `RuntimeWarning` reported in [gh-14019](https://github.com/scipy/scipy/issues/14019)." 
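, "\n", "\n", "For reference, the usual definition of this ordering can be written in terms of cumulative distribution functions (here $F_X$ and $F_Y$ denote the CDFs of $X$ and $Y$, and $\\Phi$ the standard normal CDF; the second line is an illustration that assumes two unit-variance normal distributions with means $\\mu_X > \\mu_Y$):\n", "\n", "$$X \\text{ is stochastically greater than } Y \\iff P(X > t) \\ge P(Y > t) \\ \\text{for all } t \\iff F_X(t) \\le F_Y(t) \\ \\text{for all } t$$\n", "\n", "$$F_X(t) = \\Phi(t - \\mu_X) < \\Phi(t - \\mu_Y) = F_Y(t)$$"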
] }, { "cell_type": "code", "execution_count": 9, "id": "08329d40-cd24-482b-9a40-24e2ed90b23e", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from scipy import stats\n", "import matplotlib.pyplot as plt\n", "\n", "# Indeed, the distribution $X$ is stochastically greater than the distribution $Y$\n", "X = stats.norm(loc=+0.2)\n", "Y = stats.norm(loc=0)\n", "x = X.rvs(size=801)\n", "y = Y.rvs(size=399)\n", "\n", "grid = np.linspace(-4, 4, 100)\n", "plt.plot(grid, X.pdf(grid), 'C0')\n", "plt.plot(grid, Y.pdf(grid), 'C1')\n", "plt.hist(x, density=True, color='C0', bins=30, alpha=0.5)\n", "plt.hist(y, density=True, color='C1', bins=30, alpha=0.5)\n", "plt.title('Distribution PDFs and Sample Histograms')\n", "plt.legend(['x', 'y'])\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "d10f5fe3-43f8-46e9-8347-01b926170eaa", "metadata": {}, "source": [ "Our first difficulty is determining the correct value of `alternative` to pass into `ks_2samp`. From its [documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html), we see that the alternatives are expressed not in terms of the values of the samples or the location of the underlying distributions, but in terms of the _cumulative distribution functions_ of the underlying distributions. \n", "\n", "> - `two-sided`: The null hypothesis is that the two distributions are identical, $F(x)=G(x)$ for all $x$; the alternative is that they are not identical.\n", "> - `less`: The null hypothesis is that $F(x) >= G(x)$ for all $x$; the alternative is that $F(x) < G(x)$ for at least one $x$.\n", "> - `greater`: The null hypothesis is that $F(x) <= G(x)$ for all $x$; the alternative is that $F(x) > G(x)$ for at least one $x$.\n", "\n", "Note that if a distribution $X$ tends to be greater than $Y$, we find that the cumulative distribution function of $X$ lies _below_ the cumulative distribution function of $Y$." 
] }, { "cell_type": "code", "execution_count": 10, "id": "5ebe1d2f-f067-46f1-a415-986b19b933be", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(grid, X.cdf(grid), 'C0')\n", "plt.plot(grid, Y.cdf(grid), 'C1')\n", "plt.title('Distribution CDFs')\n", "plt.legend(['x', 'y'])\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "c44e1bfe-8ebe-41d1-b96d-90f970cb8311", "metadata": {}, "source": [ "Therefore, to test the alternative that $X$ is stochastically greater than $Y$, we pass `alternative='less'` into `ks_2samp`." ] }, { "cell_type": "code", "execution_count": 11, "id": "97815b21-eb3e-401e-9cff-1da838141a70", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "KstestResult(statistic=0.12484394506866417, pvalue=0.00022195733373729093)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\matth\\AppData\\Local\\Temp\\ipykernel_35644\\3756467023.py:1: RuntimeWarning: ks_2samp: Exact calculation unsuccessful. Switching to method=asymp.\n", " res1 = stats.ks_2samp(x, y, alternative='less', method='exact')\n" ] } ], "source": [ "res1 = stats.ks_2samp(x, y, alternative='less', method='exact')\n", "print(res1)" ] }, { "cell_type": "markdown", "id": "33065824-d0c3-4ad6-9c08-296a160636fd", "metadata": {}, "source": [ "The $p$-value is tiny, confirming what we already know: the data are inconsistent with the null hypothesis, and we have evidence to reject it in favor of the alternative.\n", "\n", "The warning states that `ks_2samp` was unable to compute an exact $p$-value, and an asymptotic $p$-value is being returned instead. To determine whether the asymptotic $p$-value is accurate for these sample sizes, we can perform a permutation test." 
] }, { "cell_type": "code", "execution_count": 12, "id": "649978b2-02ff-42a6-90d1-e04b3051c31c", "metadata": {}, "outputs": [], "source": [ "def statistic(x, y):\n", " return stats.ks_2samp(x, y, alternative='less').statistic\n", "\n", "# This would be extremely slow!\n", "# res2 = stats.permutation_test((x, y), statistic, alternative='greater')" ] }, { "cell_type": "markdown", "id": "69c52b5c-f7aa-4eaf-b2ad-9e53723aa9c7", "metadata": {}, "source": [ "The calculation above would be extremely slow to run. Unfortunately, `ks_2samp` does not accept an `axis` argument, so we can't speed it up using vectorization without implementing the statistic ourselves. However, lack of vectorization is not the bottleneck here. Note that the call to `ks_2samp` is quite slow with the default parameters *even for 1D inputs*." ] }, { "cell_type": "code", "execution_count": 13, "id": "4825ddb9-5bf9-445f-bb82-031d3b330fc8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "374 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "# No need for the warning; we know the exact calculation is unsuccessful\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "%timeit stats.ks_2samp(x, y, alternative='less', method='exact')" ] }, { "cell_type": "markdown", "id": "f728c880-bbde-4003-87a6-c4090925088d", "metadata": {}, "source": [ "By default, `permutation_test` needs to call `ks_2samp` 9999 times, which would take about an hour. We can speed this up dramatically by noting that `permutation_test` only uses `ks_2samp` to compute the test statistic, so the `pvalue` attribute of the `ks_2samp` result object is not used at all. We can use `ks_2samp` to compute essentially the same value of the test statistic, but much faster, by specifying `method='asymp'`." 
] }, { "cell_type": "code", "execution_count": 14, "id": "abfe9b45-4e7b-4284-bd1b-2620eced89ff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "167 µs ± 947 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "# method='asymp' and method='exact' result in the same statistic value\n", "res1 = stats.ks_2samp(x, y, alternative='less', method='asymp')\n", "res2 = stats.ks_2samp(x, y, alternative='less', method='exact')\n", "np.testing.assert_allclose(res1.statistic, res2.statistic, atol=1e-15)\n", "\n", "# but method='asymp' is much faster\n", "%timeit stats.ks_2samp(x, y, alternative='less', method='asymp')" ] }, { "cell_type": "markdown", "id": "bafdc39b-6132-4121-b7b8-6853603b129a", "metadata": {}, "source": [ "Now we can run a randomized `permutation_test` in reasonable time. " ] }, { "cell_type": "code", "execution_count": 15, "id": "c0c64b0b-31c5-4586-b696-36f03642040b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0002219573337372917 0.9999\n" ] } ], "source": [ "def statistic(x, y):\n", " return stats.ks_2samp(x, y, alternative='less', method='asymp').statistic\n", "\n", "res3 = stats.permutation_test((x, y), statistic, alternative='less')\n", "print(res1.pvalue, res3.pvalue)" ] }, { "cell_type": "markdown", "id": "c9cb3241-fea3-4b3d-90af-a60a7852ffbd", "metadata": {}, "source": [ "This was much faster, but something is still wrong. Either the approximate $p$-value is wildly inaccurate, or we have set up our test incorrectly. The latter turns out to be the case: the value of `alternative` passed into `ks_2samp` changes *the definition of the test statistic*, but a *greater* statistic is always considered more extreme. Therefore, even if we wish to perform a test equivalent to `ks_2samp` with `alternative='less'`, we actually need to pass `alternative='greater'` into `permutation_test`!" 
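, "\n", "\n", "One way to see this is to write out the one-sided statistics as described in the `ks_2samp` documentation (a sketch: $F_n$ and $G_m$ are notation assumed here for the empirical CDFs of `x` and `y`, and $D^{+}$, $D^{-}$ are labels introduced for illustration):\n", "\n", "$$D^{+} = \\max_t \\left[ F_n(t) - G_m(t) \\right] \\ \\text{(greater)}, \\qquad D^{-} = \\max_t \\left[ G_m(t) - F_n(t) \\right] \\ \\text{(less)}$$\n", "\n", "Both are nonnegative by construction, and in each case a larger value indicates stronger evidence against the null hypothesis, which is why the matching `permutation_test` call uses `alternative='greater'` regardless of which one-sided statistic is computed."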
] }, { "cell_type": "code", "execution_count": 16, "id": "3fc268c6-db67-4bee-9c3e-b50ea2896e98", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0002219573337372917 0.0005\n" ] } ], "source": [ "# greater values of the statistic returned by `ks_2samp` are more extreme\n", "res4 = stats.permutation_test((x, y), statistic, alternative='greater')\n", "print(res1.pvalue, res4.pvalue)" ] }, { "cell_type": "markdown", "id": "9b06fd03-ca61-498c-9542-5dabc43bcddd", "metadata": {}, "source": [ "At last, `permutation_test` is invoked correctly. Indeed, the asymptotic $p$-value produced by `ks_2samp` appears to be reliable for these sample sizes." ] }, { "cell_type": "markdown", "id": "a8013eef-9aba-475d-b83a-ffbbbe311e8d", "metadata": {}, "source": [ "### Other Tests\n", "As we can see, `permutation_test` with `permutation_type='independent'` is a versatile tool for comparing independent samples. Provided only data and a statistic, it can produce the null distribution and replicate the $p$-value of many such tests in SciPy, and it may be more accurate than these existing implementations, especially for small samples and when there are ties:\n", "\n", "- [`ttest_ind`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)\n", "- [`cramervonmises_2samp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cramervonmises_2samp.html)\n", "- [`ks_2samp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html)\n", "- [`epps_singleton_2samp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.epps_singleton_2samp.html)\n", "- [`mannwhitneyu`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html)\n", "- [`kruskal`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html)\n", "- [`friedmanchisquare`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.friedmanchisquare.html)\n", "- 
[`brunnermunzel`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.brunnermunzel.html)\n", "- [`ansari`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ansari.html)\n", "- [`bartlett`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html)\n", "- [`levene`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html)\n", "- [`anderson_ksamp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html)\n", "- [`fligner`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.fligner.html)\n", "- [`median_test`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.median_test.html)\n", "- [`mood`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mood.html)\n", "\n", "In addition, `permutation_test` with `permutation_type='independent'` can be used to perform tests not yet implemented in SciPy.\n", "\n", "However, there are other types of permutation tests that do not assume that the samples are entirely independent. We continue the study of `permutation_test` with [Paired-Sample Tests](https://nbviewer.org/github/scipy/scipy-cookbook/blob/main/ipython/ResamplingAndMonteCarloMethods/resampling_tutorial_2b.ipynb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 5 }