---
title: "The sign test"
editor: 
  markdown: 
    wrap: 72
---

## Packages

```{r}
library(tidyverse)
library(smmr)
```

`smmr` is new. See later how to install it.

## Duality between confidence intervals and hypothesis tests

-   Tests and CIs really do the same thing, if you look at them the
    right way. They are both telling you something about a parameter,
    and they use same things about data.
-   To illustrate, some data (two groups):

```{r inference-3-R-1}
my_url <- "http://ritsokiguess.site/datafiles/duality.txt"
twogroups <- read_delim(my_url," ")
```

## The data

```{r inference-3-R-2}
twogroups
```

## 95% CI (default)

for difference in means, group 1 minus group 2:

```{r inference-3-R-3}
t.test(y ~ group, data = twogroups)
```

## 90% CI

```{r inference-3-R-4}
t.test(y ~ group, data = twogroups, conf.level = 0.90)
```

## Hypothesis test

Null is that difference in means is zero:

```{r inference-3-R-5}
t.test(y ~ group, mu=0, data = twogroups)
```

## Comparing results

Recall null here is $H_0 : \mu_1 - \mu_2 = 0$. P-value 0.0668.

-   95% CI from $-5.6$ to $0.2$, contains $0$.
-   90% CI from $-5.0$ to $-0.3$, does not contain $0$.
-   At $\alpha = 0.05$, would not reject $H_0$ since P-value $> 0.05$.
-   At $\alpha = 0.10$, *would* reject $H_0$ since P-value $< 0.10$.

## Test and CI

Not just coincidence. Let $C = 100(1 - \alpha)$, so C% gives
corresponding CI to level-$\alpha$ test. Then following always true.
(Symbol $\iff$ means "if and only if".)

| Test decision                         |        | Confidence interval                   |
|:--------------------------|:---------------:|:--------------------------|
| Reject $H_0$ at level $\alpha$        | $\iff$ | $C\%$ CI does not contain $H_0$ value |
| Do not reject $H_0$ at level $\alpha$ | $\iff$ | $C\%$ CI contains $H_0$ value         |

Idea: "Plausible" parameter value inside CI, not rejected; "Implausible"
parameter value outside CI, rejected.

## The value of this

-   If you have a test procedure but no corresponding CI:
-   you make a CI by including all the parameter values that would not
    be rejected by your test.
-   Use:
    -   $\alpha = 0.01$ for a 99% CI,
    -   $\alpha = 0.05$ for a 95% CI,
    -   $\alpha = 0.10$ for a 90% CI, and so on.

## Testing for non-normal data

-   The IRS ("Internal Revenue Service") is the US authority that deals
    with taxes (like Revenue Canada).
-   One of their forms is supposed to take no more than 160 minutes to
    complete. A citizen's organization claims that it takes people
    longer than that on average.
-   Sample of 30 people; time to complete form recorded.
-   Read in data, and do $t$-test of $H_0 : \mu = 160$ vs.
    $H_a : \mu > 160$.
-   For reading in, there is only one column, so can pretend it is
    delimited by anything.

## Read in data

```{r inference-3-R-6, message=FALSE}
my_url <- "http://ritsokiguess.site/datafiles/irs.txt"
irs <- read_csv(my_url)
irs
```

## Test whether mean is 160 or greater

```{r inference-3-R-7}
with(irs, t.test(Time, mu = 160, 
                 alternative = "greater"))
```

Reject null; mean (for all people to complete form) greater than 160.

## But, look at a graph

```{r inference-3-R-8, fig.height=3.5}
ggplot(irs, aes(x = Time)) + geom_histogram(bins = 6)
```

## Comments

-   Skewed to right.
-   Should look at *median*, not mean.

## The sign test

-   But how to test whether the median is greater than 160?
-   Idea: if the median really is 160 ($H_0$ true), the sampled values
    from the population are equally likely to be above or below 160.
-   If the population median is greater than 160, there will be a lot of
    sample values greater than 160, not so many less. Idea: test
    statistic is number of sample values greater than hypothesized
    median.

## Getting a P-value for sign test 1/3

-   How to decide whether "unusually many" sample values are greater
    than 160? Need a sampling distribution.
-   If $H_0$ true, pop. median is 160, then each sample value
    independently equally likely to be above or below 160.
-   So number of observed values above 160 has binomial distribution
    with $n = 30$ (number of data values) and $p = 0.5$ (160 is
    hypothesized to be *median*).

## Getting P-value for sign test 2/3

-   Count values above/below 160:

```{r inference-3-R-9}
irs %>% count(Time > 160)
```

-   17 above, 13 below. How unusual is that? Need a *binomial table*.

## Getting P-value for sign test 3/3

-   R function `dbinom` gives the probability of eg. exactly 17
    successes in a binomial with $n = 30$ and $p = 0.5$:

```{r inference-3-R-10}
dbinom(17, 30, 0.5)
```

-   but we want probability of 17 *or more*, so get all of those, find
    probability of each, and add them up:

```{r inference-3-R-11}
tibble(x=17:30) %>% 
  mutate(prob=dbinom(x, 30, 0.5)) %>% 
  summarize(total=sum(prob))
```

or

```{r}
pbinom(17, 30, 0.5) # prob of <= 17
```

and hence (note first input):

```{r}
pbinom(16, 30, 0.5, lower.tail = FALSE)
```

This last is $P(X \ge 17) = P(X > 16)$.

## Using my package `smmr`

-   I wrote a package `smmr` to do the sign test (and some other
    things). Installation is a bit fiddly:
    -   Install devtools (once) with

```{r}
#| eval = FALSE
install.packages("devtools")
```

-   then install `smmr` using `devtools` (once):

```{r inference-3-R-12}
#| eval = FALSE 
library(devtools)
install_github("nxskok/smmr")
```

-   Then load it:

```{r inference-3-R-13, eval=FALSE}
library(smmr)
```

## `smmr` for sign test

-   `smmr`'s function `sign_test` needs three inputs: a data frame, a
    column and a null median:

```{r inference-3-R-14}
sign_test(irs, Time, 160)
```

## Comments (1/3)

-   Testing whether population median *greater than* 160, so want
    *upper-tail* P-value 0.2923. Same as before.
-   Also get table of values above and below; this too as we got.

## Comments (2/3)

-   P-values are:

| Test | P-value |
|:-----|--------:|
| $t$  |  0.0392 |
| Sign |  0.2923 |

-   These are very different: we reject a mean of 160 (in favour of the
    mean being bigger), but clearly *fail* to reject a median of 160 in
    favour of a bigger one.
-   Why is that? Obtain mean and median:

```{r inference-3-R-15}
irs %>% summarize(mean_time = mean(Time), 
                  median_time = median(Time))
```

## Comments (3/3) {.smaller}

-   The mean is pulled a long way up by the right skew, and is a fair
    bit bigger than 160.
-   The median is quite close to 160.
-   We ought to be trusting the sign test and not the t-test here
    (median and not mean), and therefore there is no evidence that the
    "typical" time to complete the form is longer than 160 minutes.
-   Having said that, there are clearly some people who take a lot
    longer than 160 minutes to complete the form, and the IRS could
    focus on simplifying its form for these people.
-   In this example, looking at any kind of average is not really
    helpful; a better question might be "do an unacceptably large
    fraction of people take longer than (say) 300 minutes to complete
    the form?": that is, thinking about worst-case rather than
    average-case.

## Confidence interval for the median

-   The sign test does not naturally come with a confidence interval for
    the median.
-   So we use the "duality" between test and confidence interval to say:
    the (95%) confidence interval for the median contains exactly those
    values of the null median that would not be rejected by the
    two-sided sign test (at $\alpha = 0.05$).

## For our data

-   The procedure is to try some values for the null median and see
    which ones are inside and which outside our CI.
-   smmr has pval_sign that gets just the 2-sided P-value:

```{r inference-3-R-16}
pval_sign(160, irs, Time)
```

-   Try a couple of null medians:

```{r inference-3-R-17}
pval_sign(200, irs, Time)
pval_sign(300, irs, Time)
```

-   So 200 inside the 95% CI and 300 outside.

## Doing a whole bunch

-   Choose our null medians first:

```{r inference-3-R-18}
(d <- tibble(null_median=seq(100,300,20)))
```

## ... and then

"for each null median, run the function `pval_sign` for that null median
and get the P-value":

```{r inference-3-R-19}
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time))
```

## Make it easier for ourselves

```{r inference-3-R-20}
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time)) %>% 
  mutate(in_out = ifelse(p_value > 0.05, "inside", "outside"))
```

## confidence interval for median?

-   95% CI to this accuracy from 120 to 200.
-   Can get it more accurately by looking more closely in intervals from
    100 to 120, and from 200 to 220.

## A more efficient way: bisection

-   Know that top end of CI between 200 and 220:

```{r inference-3-R-21}
lo <- 200 
hi <- 220
```

-   Try the value halfway between: is it inside or outside?

```{r inference-3-R-22}
try <- (lo + hi) / 2
try
pval_sign(try,irs,Time)
```

-   Inside, so upper end is between 210 and 220. Repeat (over):

## ... bisection continued

```{r inference-3-R-23}
lo <- try
try <- (lo + hi) / 2
try
pval_sign(try, irs, Time)
```

-   215 is inside too, so upper end between 215 and 220.
-   Continue until have as accurate a result as you want.

## Bisection automatically

-   A loop, but not a `for` since we don't know how many times we're
    going around. Keep going `while` a condition is true:

```{r inference-3-R-24, eval=F}
lo = 200
hi = 220
while (hi - lo > 1) {
  try = (hi + lo) / 2
  ptry = pval_sign(try, irs, Time)
  print(c(try, ptry))
  if (ptry <= 0.05)
    hi = try
  else
    lo = try
}
```

## The output from this loop

```{r inference-3-R-25, echo=F}
lo = 200
hi = 220
while (hi - lo > 1) {
  try = (hi + lo) / 2
  ptry = pval_sign(try, irs, Time)
  print(c(try, ptry))
  if (ptry <= 0.05)
    hi = try
  else
    lo = try
}
```

-   215 inside, 215.625 outside. Upper end of interval to this accuracy
    is 215.

## Using smmr

-   `smmr` has function `ci_median` that does this (by default 95% CI):

```{r inference-3-R-26}
ci_median(irs, Time)
```

-   Uses a more accurate bisection than we did.
-   Or get, say, 90% CI for median:

```{r inference-3-R-27}
ci_median(irs, Time, conf.level=0.90)
```

-   90% CI is shorter, as it should be.

## Bootstrap

-   but, was the sample size (30) big enough to overcome the skewness?
-   Bootstrap, again:

```{r inference-3-R-28, echo=FALSE}
set.seed(457299)
```

```{r inference-3-R-29}
tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(irs$Time, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(x=my_mean)) + geom_histogram(bins=10) -> g
```

## With normal quantile plot (see after that section)

```{r inference-3-R-29xxx}
tibble(sim = 1:10000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(irs$Time, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(sample = my_mean)) + stat_qq() + stat_qq_line()

```


## The sampling distribution

```{r inference-3-R-30}
g
```

## Comments

-   A little skewed to right, but not nearly as much as I was expecting.
-   The $t$-test for the mean might actually be OK for these data, *if
    the mean is what you want*.
-   In actual data, mean and median very different; we chose to make
    inference about the median.
-   Thus for us it was right to use the sign test.