{ "cells": [ { "cell_type": "markdown", "id": "cfae45f6", "metadata": {}, "source": [ "# Fisher検定とPearsonのχ²検定のどちらを使うべきか\n", "\n", "* 黒木玄\n", "* 2022-08-13" ] }, { "cell_type": "markdown", "id": "5581a2e0", "metadata": {}, "source": [ "誤り❌と正解⭕:\n", "\n", "❌ Fisher検定は正確な検定である.
\n", "⭕ Fisher検定のP値には過剰に大きくなりすぎる傾向がある.\n", "\n", "❌ χ²検定はFisher検定の近似なので, Fisher検定よりも正確でないのは当然である.
\n", "⭕ χ²検定はFisher検定とは違ってconditional testではない.
\n", "⭕ χ²検定の性質をFisher検定の近似とみなして理解しようとすることは誤りである.
\n", "⭕ χ²検定の方が第一種の過誤の確率が名目有意水準に近くなる場合が多い.\n", "\n", "❌ 小さな値のセルを持つ分割表でχ²検定を使ってはいけない.
\n", "⭕ 実際には多くの場合にはχ²検定を使っても大丈夫である.
\n", "⭕ 小さな値のセルを持つ分割表でFisher検定を使うと検出力が低くなってしまう.\n", "\n", "❌ 小さな値のセルを持つ分割表でχ²検定を使う場合にはYatesの連続性補正を使うべきである.
\n", "⭕ χ²検定においてはいかなる場合もYatesの連続性補正は使うべきではない.
\n", "⭕ Yatesの連続性補正されたχ²検定では, Fisher検定よりもP値がさらに過剰に大きめになる傾向がある.
\n", "⭕ 第一種の過誤の確率を確実に有意水準以下に押さえたければ, Yatesの連続補正されたχ²検定ではなく, Fisher検定の方を使うべきである. Yatesの連続補正されたχ²検定を使うことにメリットがある場面はないように思われる.\n", "\n", "⭕ Fisher検定には, 第一種の過誤の確率を確実に名目有意水準以下にできるという利点がある.
\n", "⭕ Fisher検定には, 第一種の過誤の確率が名目有意水準よりも過剰に小さくなり易いという欠点がある.
\n", "⭕ Fisher検定の検出力はχ²検定よりも低い.\n", "\n", "⭕ χ²検定には, 第一種の過誤の確率が名目有意水準に近くなるという利点がある.
\n", "⭕ χ²検定の検出力はFisher検定よりも高い.
\n", "⭕ χ²検定には, 第一種の過誤の確率が名目有意水準よりも少し大きくなる場合があるという欠点がある.\n", "\n", "⭕ χ²検定を行うときにも, オッズ比などの信頼区間も計算しておくべきである.
\n", "⭕ Rの`chisq.test`は信頼区間を表示してくれないので不便である.
\n", "https://twitter.com/genkuroki/status/1496109997721538565
\n", "⭕ Rで`chisq.test`と整合的な信頼区間を表示したいなら, 自分でその方法を探す必要がある.
\n", "⭕ `chisq.test`と整合性がないWald型の信頼区間を表示してくれるパッケージならば容易に見つかる.\n", "\n", "⭕ Fisher検定を行うときにも, オッズ比の信頼区間も計算しておくべきである.
\n", "⭕ Rの`fisher.test`が表示するP値と信頼区間の間には整合性がない.
\n", "https://twitter.com/genkuroki/status/1496064721652690946\n", "\n", "__お勧め文献__\n", "\n", "* [https://www.jstage.jst.go.jp/article/dds/30/3/30_235/_pdf](https://www.jstage.jst.go.jp/article/dds/30/3/30_235/_pdf)
\n", "連載 第3回
\n", "医学データの統計解析の基本 2つの割合の比較
\n", "朝倉こう子・濱﨑俊光" ] }, { "cell_type": "code", "execution_count": 1, "id": "7ed40c3a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "⪅ (generic function with 1 method)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Distributions\n", "using Random\n", "using Roots\n", "using StatsBase\n", "using StatsFuns\n", "using StatsPlots\n", "default(size=(480, 300), titlefontsize=10, tickfontsize=6, guidefontsize=9)\n", "\n", "safemul(x, y) = x == 0 ? x : isinf(x) ? typeof(x)(Inf) : x*y\n", "safediv(x, y) = x == 0 ? x : isinf(y) ? zero(y) : x/y\n", "x ⪅ y = x < y || x ≈ y" ] }, { "cell_type": "code", "execution_count": 2, "id": "1a7b60ee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "confint_or_wald (generic function with 1 method)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "oddsratiohat(a, b, c, d) = safediv(a*d, b*c)\n", "stderr_logoddsratiohat(a, b, c, d) = √(1/a + 1/b + 1/c + 1/d)\n", "\n", "function pvalue_or_wald(a, b, c, d; ω=1)\n", " logORhat = log(oddsratiohat(a, b, c, d))\n", " SEhat_logORhat = stderr_logoddsratiohat(a, b, c, d)\n", " 2ccdf(Normal(0, 1), safediv(abs(logORhat - log(ω)), SEhat_logORhat))\n", "end\n", "\n", "function confint_or_wald(a, b, c, d; α=0.05)\n", " z = quantile(Normal(), 1-α/2)\n", " ORhat = oddsratiohat(a, b, c, d)\n", " SEhat_logORhat = stderr_logoddsratiohat(a, b, c, d)\n", " [safemul(exp(-z*SEhat_logORhat), ORhat), safemul(exp(z*SEhat_logORhat), ORhat)]\n", "end" ] }, { "cell_type": "code", "execution_count": 3, "id": "886569ed", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "confint_or_pearson_chisq (generic function with 1 method)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function delta(a, b, c, d; ω=1)\n", " A, B, C = 1-ω, a+d+ω*(b+c), a*d-ω*b*c\n", " isinf(ω) ? typeof(ω)(-min(b, c)) : safediv(2C, B + √(B^2 - 4A*C))\n", "end\n", "\n", "# correction = 0.5 は連続性補正を与える.\n", "function _chisqstat_or(a, b, c, d, δ; correction=0.0)\n", " ã, b̃, c̃, d̃ = a-δ, b+δ, c+δ, d-δ\n", " safemul(max(0, abs(δ)-correction)^2, 1/ã + 1/b̃ + 1/c̃ + 1/d̃)\n", "end\n", "\n", "function chisqstat_or(a, b, c, d; ω=1, correction=0.0)\n", " δ = delta(a, b, c, d; ω)\n", " _chisqstat_or(a, b, c, d, δ; correction)\n", "end\n", "\n", "function pvalue_or_pearson_chisq(a, b, c, d; ω=1, correction=0.0)\n", " χ² = chisqstat_or(a, b, c, d; ω, correction)\n", " ccdf(Chisq(1), χ²)\n", "end\n", "\n", "function confint_or_pearson_chisq(a, b, c, d; α=0.05, correction=0.0)\n", " (a+b==0 || c+d==0 || a+c==0 || b+d==0) && return [0, Inf]\n", " f(logω) = logit(pvalue_or_pearson_chisq(a, b, c, d; ω=exp(logω), correction)) - logit(α)\n", " ps = if a == 0 || d == 0\n", " [0, exp(find_zero(f, 0.0))]\n", " elseif b == 0 || c == 0\n", " [exp(find_zero(f, 0.0)), Inf]\n", " else\n", " ORhat = oddsratiohat(a, b, c, d)\n", " ω_L, ω_U = ORhat/2, 2ORhat\n", " [exp(find_zero(f, log(ω_L))), exp(find_zero(f, log(ω_U)))]\n", " end\n", "end" ] }, { "cell_type": "code", "execution_count": 4, "id": "660ddd85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "confint_or_fisher_sterne (generic function with 1 method)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_pdf_le(x, (dist, y)) = pdf(dist, x) ⪅ y\n", "\n", "function _search_boundary(f, x0, Δx, param)\n", " x = x0\n", " if f(x, param)\n", " while f(x - Δx, param) x -= Δx end\n", " else\n", " x += Δx\n", " while !f(x, param) x += Δx end\n", " end\n", " x\n", "end\n", "\n", "function pvalue_fisher_sterne(dist::DiscreteUnivariateDistribution, x)\n", " Px = pdf(dist, x)\n", " Px == 0 && return Px\n", " Px == 1 && return Px\n", " m = mode(dist)\n", " Px ≈ pdf(dist, m) && return one(Px)\n", " if x < m\n", " y = _search_boundary(_pdf_le, 2m - x, 1, (dist, Px))\n", " cdf(dist, x) + ccdf(dist, y-1)\n", " else # x > m\n", " y = _search_boundary(_pdf_le, 2m - x, -1, (dist, Px))\n", " cdf(dist, y) + ccdf(dist, x-1)\n", " end\n", "end\n", "\n", "function pvalue_or_fisher_sterne(a, b, c, d; ω=1)\n", " fnch = if ω == 1\n", " Hypergeometric(a+b, c+d, a+c)\n", " else\n", " FisherNoncentralHypergeometric(a+b, c+d, a+c, ω)\n", " end\n", " pvalue_fisher_sterne(fnch, a)\n", "end\n", "\n", "function confint_or_fisher_sterne(a, b, c, d; α = 0.05)\n", " (a+b==0 || c+d==0 || a+c==0 || b+d==0) && return [0, Inf]\n", " f(logω) = logit(pvalue_or_fisher_sterne(a, b, c, d; ω=exp(logω))) - logit(α)\n", " if a == 0 || d == 0\n", " [0.0, exp(find_zero(f, 0.0))]\n", " elseif b == 0 || c == 0\n", " [exp(find_zero(f, 0.0)), Inf]\n", " else\n", " ω_L, ω_U = confint_or_wald(a, b, c, d; α = α/10)\n", " ps = exp.(find_zeros(f, log(ω_L), log(ω_U)))\n", " # 次の行は稀に区間にならない場合への対策\n", " [first(ps), last(ps)]\n", " end\n", "end" ] }, { "cell_type": "code", "execution_count": 5, "id": "10be6c85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "plot_probalphaerror (generic function with 1 method)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function another_p(q, ω)\n", " u = q/(1 - q)\n", " p = ω*u/(1 + ω*u)\n", "end\n", "\n", "function sim(; m = 10, n = 10, q = 0.5, ω = 1, p = another_p(q, ω), L=10^5)\n", " pval_chisq = similar(zeros(), L)\n", " pval_fisher = similar(zeros(), L)\n", " bin1, bin2 = Binomial(m, p), Binomial(n, q)\n", " Threads.@threads for i in 1:L\n", " a, c = rand(bin1), rand(bin2)\n", " b, d = m - a, n - c\n", " pval_chisq[i] = pvalue_or_pearson_chisq(a, b, c, d; ω)\n", " pval_fisher[i] = pvalue_or_fisher_sterne(a, b, c, d; ω)\n", " end\n", " F = ecdf(pval_chisq)\n", " G = ecdf(pval_fisher)\n", " f(x) = F(x)\n", " g(x) = G(x)\n", " f, g\n", "end\n", "\n", "function plot_probalphaerror(; m = 10, n = 10, q = 0.5, ω = 1, L = 10^5, kwargs...)\n", " f, g = sim(; m, n, q, ω, L)\n", " \n", " xs = 0:0.001:1\n", " P1 = plot(legend=:topleft)\n", " plot!(xs, f; label=\"Pearson χ²\")\n", " plot!(xs, g; label=\"Fisher\", ls=:dash)\n", " plot!(identity; label=\"\", c=:black, ls=:dot)\n", " plot!(xtick=0:0.1:1, ytick=0:0.1:1)\n", " plot!(xguide=\"α\", yguide=\"probability of α-error\")\n", " title!(\"m=$m, n=$n, q=$q, ω=$ω\")\n", " plot!(; kwargs...)\n", " \n", " xs = 0:0.0001:0.1\n", " P2 = plot(legend=:topleft)\n", " plot!(xs, f; label=\"Pearson χ²\")\n", " plot!(xs, g; label=\"Fisher\", ls=:dash)\n", " plot!(identity; label=\"\", c=:black, ls=:dot)\n", " plot!(xtick=0:0.01:1, ytick=0:0.01:1)\n", " plot!(xguide=\"α\", yguide=\"probability of α-error\")\n", " title!(\"m=$m, n=$n, q=$q, ω=$ω\")\n", " plot!(; kwargs...)\n", " \n", " plot(P1, P2; size=(700, 350))\n", "end" ] }, { "cell_type": "code", "execution_count": 6, "id": "93ac5e66", "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 10, n = 10, q = 0.4, ω = 1)" ] }, { "cell_type": "code", "execution_count": 7, "id": "2f389fc3", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 20, n = 20, q = 0.4, ω = 1)" ] }, { "cell_type": "code", "execution_count": 8, "id": "c8582ece", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 40, n = 40, q = 0.4, ω = 1)" ] }, { "cell_type": "code", "execution_count": 9, "id": "321e7ffd", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 80, n = 80, q = 0.4, ω = 1)" ] }, { "cell_type": "code", "execution_count": 10, "id": "050eda40", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 20, n = 10, q = 0.2, ω = 1)" ] }, { "cell_type": "code", "execution_count": 11, "id": "55815856", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 40, n = 20, q = 0.2, ω = 1)" ] }, { "cell_type": "code", "execution_count": 12, "id": "3d30b61d", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_probalphaerror(m = 80, n = 40, q = 0.2, ω = 1)" ] }, { "cell_type": "code", "execution_count": 13, "id": "473a3a49", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "power (generic function with 1 method)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function mysupport(dist, c=5)\n", " μ, σ = mean(dist), std(dist)\n", " a = max(minimum(dist), round(Int, μ-c*σ))\n", " b = min(maximum(dist), round(Int, μ+c*σ))\n", " a:b\n", "end\n", "\n", "function power(pvaluefunc, m, n, q;\n", " ω = 1, p = another_p(q, ω), α = 0.05)\n", " bin1, bin2 = Binomial(m, p), Binomial(n, q)\n", " supp1, supp2 = mysupport(bin1), mysupport(bin2)\n", " sum(Iterators.product(supp1, supp2)) do (a, c)\n", " (pvaluefunc(a, m-a, c, n-c; ω) < α) * pdf(bin1, a) * pdf(bin2, c)\n", " end\n", "end" ] }, { "cell_type": "code", "execution_count": 14, "id": "17e2b5fd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "plot_powers (generic function with 1 method)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function logtick(; xlim=(0.03, 500))\n", " xmin, xmax = xlim\n", " a = floor(Int, log10(xmin))\n", " b = ceil(Int, log10(xmax))\n", " nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]\n", " mask = Bool[1, 1, 0, 0, 1, 0, 0, 0, 0]\n", " \n", " logtick = foldl(vcat, ([10.0^k*x for x in nums if xmin ≤ 10.0^k*x ≤ xmax] for k in a:b))\n", " logticklabel_a = foldl(vcat,\n", " ([mask[i] ? string(round(10.0^k*x; digits=-k)) : \"\"\n", " for (i, x) in enumerate(nums) if xmin ≤ 10.0^k*x ≤ xmax]\n", " for k in a:-1))\n", " logticklabel_b = foldl(vcat,\n", " ([mask[i] ? string(10^k*x) : \"\"\n", " for (i, x) in enumerate(nums) if xmin ≤ 10.0^k*x ≤ xmax]\n", " for k in 0:b))\n", " logticklabel = vcat(logticklabel_a, logticklabel_b)\n", " (logtick, logticklabel)\n", "end\n", "\n", "function plot_powers(; m = 10, n = 10, q = 0.5, α = 0.05,\n", " ωmin = 0.01, ωmax = 100.0, kwargs...)\n", " f(ω) = power(pvalue_or_pearson_chisq, m, n, q; p=another_p(q, ω), α)\n", " g(ω) = power(pvalue_or_fisher_sterne, m, n, q; p=another_p(q, ω), α)\n", " plot(legend=:bottomright)\n", " plot!(f, ωmin, ωmax; label=\"Pearson χ²\")\n", " plot!(g, ωmin, ωmax; label=\"Fisher\", ls=:dash)\n", " plot!(xscale=:log10, xtick=logtick(xlim=(ωmin, ωmax)), ytick=0:0.05:1)\n", " plot!(xguide=\"ω\", yguide=\"power\")\n", " title!(\"m=$m, n=$n, q=$q, α=$α\")\n", " plot!(; kwargs...)\n", "end" ] }, { "cell_type": "code", "execution_count": 15, "id": "48de75fa", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_powers(m = 10, n = 10, q = 0.4, α = 0.05, ωmin = 0.001, ωmax = 300.0)" ] }, { "cell_type": "code", "execution_count": 16, "id": "63c24888", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_powers(m = 20, n = 20, q = 0.4, α = 0.05, ωmin = 0.05, ωmax = 20.0)" ] }, { "cell_type": "code", "execution_count": 17, "id": "c9daefd6", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_powers(m = 40, n = 40, q = 0.4, α = 0.05, ωmin = 0.1, ωmax = 10.0)" ] }, { "cell_type": "code", "execution_count": 18, "id": "5cac5b6e", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_powers(m = 80, n = 80, q = 0.4, α = 0.05, ωmin = 0.2, ωmax = 5.0)" ] }, { "cell_type": "code", "execution_count": 19, "id": "4412919b", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_powers(m = 160, n = 160, q = 0.4, α = 0.05, ωmin = 0.33, ωmax = 3.0)" ] }, { "cell_type": "code", "execution_count": 20, "id": "00ccbc8f", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_powers(m = 40, n = 20, q = 0.4, α = 0.05, ωmin = 0.1, ωmax = 10.0)" ] }, { "cell_type": "code", "execution_count": 21, "id": "ea1cefaf", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 40, 10, 10, 4\n", "plot_powers(m = 50, n = 14, q = 50/63, α = 0.05, ωmin = 0.1, ωmax = 100.0)" ] }, { "cell_type": "code", "execution_count": 22, "id": "0299190e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "power(pvalue_or_pearson_chisq, m, n, q; p = another_p(q, ω)) = 0.5415496553181354\n", "power(pvalue_or_fisher_sterne, m, n, q; p = another_p(q, ω)) = 0.4594987393142976\n" ] } ], "source": [ "m, n, q = 50, 14, 50/63\n", "ω = 7\n", "@show power(pvalue_or_pearson_chisq, m, n, q; p=another_p(q, ω))\n", "@show power(pvalue_or_fisher_sterne, m, n, q; p=another_p(q, ω));" ] }, { "cell_type": "code", "execution_count": 23, "id": "a0b2ef3b", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 40, 30, 20, 20\n", "plot_powers(m = 70, n = 40, q = 60/110, α = 0.05, ωmin = 0.2, ωmax = 5.0)" ] }, { "cell_type": "code", "execution_count": 24, "id": "158c0b4f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "power(pvalue_or_pearson_chisq, m, n, q; p = another_p(q, ω)) = 0.602837136107077\n", "power(pvalue_or_fisher_sterne, m, n, q; p = another_p(q, ω)) = 0.543752512625262\n" ] } ], "source": [ "m, n, q = 70, 40, 60/110\n", "ω = 2.5\n", "@show power(pvalue_or_pearson_chisq, m, n, q; p=another_p(q, ω))\n", "@show power(pvalue_or_fisher_sterne, m, n, q; p=another_p(q, ω));" ] }, { "cell_type": "code", "execution_count": null, "id": "b173d131", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "jupytext": { "formats": "ipynb,jl:hydrogen" }, "kernelspec": { "display_name": "Julia 1.7.3", "language": "julia", "name": "julia-1.7" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.7.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }