--- title: "The sign test" editor: markdown: wrap: 72 --- ## Packages {r} library(tidyverse) library(smmr)  smmr is new. See later how to install it. ## Duality between confidence intervals and hypothesis tests - Tests and CIs really do the same thing, if you look at them the right way. They are both telling you something about a parameter, and they use same things about data. - To illustrate, some data (two groups): {r inference-3-R-1} my_url <- "http://ritsokiguess.site/datafiles/duality.txt" twogroups <- read_delim(my_url," ")  ## The data {r inference-3-R-2} twogroups  ## 95% CI (default) for difference in means, group 1 minus group 2: {r inference-3-R-3} t.test(y ~ group, data = twogroups)  ## 90% CI {r inference-3-R-4} t.test(y ~ group, data = twogroups, conf.level = 0.90)  ## Hypothesis test Null is that difference in means is zero: {r inference-3-R-5} t.test(y ~ group, mu=0, data = twogroups)  ## Comparing results Recall null here is $H_0 : \mu_1 - \mu_2 = 0$. P-value 0.0668. - 95% CI from $-5.6$ to $0.2$, contains $0$. - 90% CI from $-5.0$ to $-0.3$, does not contain $0$. - At $\alpha = 0.05$, would not reject $H_0$ since P-value $> 0.05$. - At $\alpha = 0.10$, *would* reject $H_0$ since P-value $< 0.10$. ## Test and CI Not just coincidence. Let $C = 100(1 - \alpha)$, so C% gives corresponding CI to level-$\alpha$ test. Then following always true. (Symbol $\iff$ means "if and only if".) | Test decision | | Confidence interval | |:--------------------------|:---------------:|:--------------------------| | Reject $H_0$ at level $\alpha$ | $\iff$ | $C\%$ CI does not contain $H_0$ value | | Do not reject $H_0$ at level $\alpha$ | $\iff$ | $C\%$ CI contains $H_0$ value | Idea: "Plausible" parameter value inside CI, not rejected; "Implausible" parameter value outside CI, rejected. ## The value of this - If you have a test procedure but no corresponding CI: - you make a CI by including all the parameter values that would not be rejected by your test. - Use: - $\alpha = 0.01$ for a 99% CI, - $\alpha = 0.05$ for a 95% CI, - $\alpha = 0.10$ for a 90% CI, and so on. ## Testing for non-normal data - The IRS ("Internal Revenue Service") is the US authority that deals with taxes (like Revenue Canada). - One of their forms is supposed to take no more than 160 minutes to complete. A citizen's organization claims that it takes people longer than that on average. - Sample of 30 people; time to complete form recorded. - Read in data, and do $t$-test of $H_0 : \mu = 160$ vs. $H_a : \mu > 160$. - For reading in, there is only one column, so can pretend it is delimited by anything. ## Read in data {r inference-3-R-6, message=FALSE} my_url <- "http://ritsokiguess.site/datafiles/irs.txt" irs <- read_csv(my_url) irs  ## Test whether mean is 160 or greater {r inference-3-R-7} with(irs, t.test(Time, mu = 160, alternative = "greater"))  Reject null; mean (for all people to complete form) greater than 160. ## But, look at a graph {r inference-3-R-8, fig.height=3.5} ggplot(irs, aes(x = Time)) + geom_histogram(bins = 6)  ## Comments - Skewed to right. - Should look at *median*, not mean. ## The sign test - But how to test whether the median is greater than 160? - Idea: if the median really is 160 ($H_0$ true), the sampled values from the population are equally likely to be above or below 160. - If the population median is greater than 160, there will be a lot of sample values greater than 160, not so many less. Idea: test statistic is number of sample values greater than hypothesized median. ## Getting a P-value for sign test 1/3 - How to decide whether "unusually many" sample values are greater than 160? Need a sampling distribution. - If $H_0$ true, pop. median is 160, then each sample value independently equally likely to be above or below 160. - So number of observed values above 160 has binomial distribution with $n = 30$ (number of data values) and $p = 0.5$ (160 is hypothesized to be *median*). ## Getting P-value for sign test 2/3 - Count values above/below 160: {r inference-3-R-9} irs %>% count(Time > 160)  - 17 above, 13 below. How unusual is that? Need a *binomial table*. ## Getting P-value for sign test 3/3 - R function dbinom gives the probability of eg. exactly 17 successes in a binomial with $n = 30$ and $p = 0.5$: {r inference-3-R-10} dbinom(17, 30, 0.5)  - but we want probability of 17 *or more*, so get all of those, find probability of each, and add them up: {r inference-3-R-11} tibble(x=17:30) %>% mutate(prob=dbinom(x, 30, 0.5)) %>% summarize(total=sum(prob))  or {r} pbinom(17, 30, 0.5) # prob of <= 17  and hence (note first input): {r} pbinom(16, 30, 0.5, lower.tail = FALSE)  This last is $P(X \ge 17) = P(X > 16)$. ## Using my package smmr - I wrote a package smmr to do the sign test (and some other things). Installation is a bit fiddly: - Install devtools (once) with {r} #| eval = FALSE install.packages("devtools")  - then install smmr using devtools (once): {r inference-3-R-12} #| eval = FALSE library(devtools) install_github("nxskok/smmr")  - Then load it: {r inference-3-R-13, eval=FALSE} library(smmr)  ## smmr for sign test - smmr's function sign_test needs three inputs: a data frame, a column and a null median: {r inference-3-R-14} sign_test(irs, Time, 160)  ## Comments (1/3) - Testing whether population median *greater than* 160, so want *upper-tail* P-value 0.2923. Same as before. - Also get table of values above and below; this too as we got. ## Comments (2/3) - P-values are: | Test | P-value | |:-----|--------:| | $t$ | 0.0392 | | Sign | 0.2923 | - These are very different: we reject a mean of 160 (in favour of the mean being bigger), but clearly *fail* to reject a median of 160 in favour of a bigger one. - Why is that? Obtain mean and median: {r inference-3-R-15} irs %>% summarize(mean_time = mean(Time), median_time = median(Time))  ## Comments (3/3) {.smaller} - The mean is pulled a long way up by the right skew, and is a fair bit bigger than 160. - The median is quite close to 160. - We ought to be trusting the sign test and not the t-test here (median and not mean), and therefore there is no evidence that the "typical" time to complete the form is longer than 160 minutes. - Having said that, there are clearly some people who take a lot longer than 160 minutes to complete the form, and the IRS could focus on simplifying its form for these people. - In this example, looking at any kind of average is not really helpful; a better question might be "do an unacceptably large fraction of people take longer than (say) 300 minutes to complete the form?": that is, thinking about worst-case rather than average-case. ## Confidence interval for the median - The sign test does not naturally come with a confidence interval for the median. - So we use the "duality" between test and confidence interval to say: the (95%) confidence interval for the median contains exactly those values of the null median that would not be rejected by the two-sided sign test (at $\alpha = 0.05$). ## For our data - The procedure is to try some values for the null median and see which ones are inside and which outside our CI. - smmr has pval_sign that gets just the 2-sided P-value: {r inference-3-R-16} pval_sign(160, irs, Time)  - Try a couple of null medians: {r inference-3-R-17} pval_sign(200, irs, Time) pval_sign(300, irs, Time)  - So 200 inside the 95% CI and 300 outside. ## Doing a whole bunch - Choose our null medians first: {r inference-3-R-18} (d <- tibble(null_median=seq(100,300,20)))  ## ... and then "for each null median, run the function pval_sign for that null median and get the P-value": {r inference-3-R-19} d %>% rowwise() %>% mutate(p_value = pval_sign(null_median, irs, Time))  ## Make it easier for ourselves {r inference-3-R-20} d %>% rowwise() %>% mutate(p_value = pval_sign(null_median, irs, Time)) %>% mutate(in_out = ifelse(p_value > 0.05, "inside", "outside"))  ## confidence interval for median? - 95% CI to this accuracy from 120 to 200. - Can get it more accurately by looking more closely in intervals from 100 to 120, and from 200 to 220. ## A more efficient way: bisection - Know that top end of CI between 200 and 220: {r inference-3-R-21} lo <- 200 hi <- 220  - Try the value halfway between: is it inside or outside? {r inference-3-R-22} try <- (lo + hi) / 2 try pval_sign(try,irs,Time)  - Inside, so upper end is between 210 and 220. Repeat (over): ## ... bisection continued {r inference-3-R-23} lo <- try try <- (lo + hi) / 2 try pval_sign(try, irs, Time)  - 215 is inside too, so upper end between 215 and 220. - Continue until have as accurate a result as you want. ## Bisection automatically - A loop, but not a for since we don't know how many times we're going around. Keep going while a condition is true: {r inference-3-R-24, eval=F} lo = 200 hi = 220 while (hi - lo > 1) { try = (hi + lo) / 2 ptry = pval_sign(try, irs, Time) print(c(try, ptry)) if (ptry <= 0.05) hi = try else lo = try }  ## The output from this loop {r inference-3-R-25, echo=F} lo = 200 hi = 220 while (hi - lo > 1) { try = (hi + lo) / 2 ptry = pval_sign(try, irs, Time) print(c(try, ptry)) if (ptry <= 0.05) hi = try else lo = try }  - 215 inside, 215.625 outside. Upper end of interval to this accuracy is 215. ## Using smmr - smmr has function ci_median that does this (by default 95% CI): {r inference-3-R-26} ci_median(irs, Time)  - Uses a more accurate bisection than we did. - Or get, say, 90% CI for median: {r inference-3-R-27} ci_median(irs, Time, conf.level=0.90)  - 90% CI is shorter, as it should be. ## Bootstrap - but, was the sample size (30) big enough to overcome the skewness? - Bootstrap, again: {r inference-3-R-28, echo=FALSE} set.seed(457299)  {r inference-3-R-29} tibble(sim = 1:1000) %>% rowwise() %>% mutate(my_sample = list(sample(irs$Time, replace = TRUE))) %>% mutate(my_mean = mean(my_sample)) %>% ggplot(aes(x=my_mean)) + geom_histogram(bins=10) -> g  ## The sampling distribution {r inference-3-R-30} g  ## Comments - A little skewed to right, but not nearly as much as I was expecting. - The$t\$-test for the mean might actually be OK for these data, *if the mean is what you want*. - In actual data, mean and median very different; we chose to make inference about the median. - Thus for us it was right to use the sign test.