--- title: "Analyzing Contingency Tables" author: "Kevin Donovan" date: "`r format(Sys.time(), '%B %d, %Y')`" output: slidy_presentation --- ```{r setup, include=FALSE, echo=FALSE} knitr::opts_chunk$set(echo=FALSE, message = FALSE, warning = FALSE, fig.width = 8, fig.height = 4) library(tidyverse) library(shiny) library(rmarkdown) library(broom) library(gtsummary) library(flextable) library(ggpubr) library(vcd) library(effsize) library(ggfortify) library(irr) library(summarytools) ``` ```{r} ibis_data <- read_csv("../Data/Cross-sec_full.csv") ``` # Introduction - We have discussed how to do many analyses with continuous outcomes - Also discussed regression with categorical outcomes - What about analyses with **all categorical outcomes?** # Contingency Tables - **Def** - Table containing multivariate frequency distributions of variables - Can be 2+ dimensions - Ex.

# Research Questions - Examples of questions one can ask with such data 1. Are two or more categorical variables independent? 2. Are two or raters similar for a given measure (i.e. inter-rater reliability) # Testing for independence - **Def** - Let $f_x$ denote the distribution of $X$, $f_y$ distribution of $Y$ - Let $f_{xy}$ denote the *joint* distribution of $X, Y$ - Random variables $X$ and $Y$ are independent $\leftrightarrow$ $f_{xy}=f_x*f_y$ - This is equivalent to $f_{y|x}=f_y$ and $f_{x|y}=f_x$ ```{r} addmargins(table(ibis_data$SSM_ASD_v24, ibis_data$Gender)) addmargins(prop.table(table(ibis_data$SSM_ASD_v24, ibis_data$Gender))) freq(x=ibis_data$SSM_ASD_v24) ``` # Testing for independence - **Null**: $X$ and $Y$ are independent - **Alt.**: $X$ and $Y$ are dependent - Methods 1. Chi-square test 2. Fisher's Exact Test # Chi-square test - **Test statistic**: $\chi^2=\sum_{i} (O_i-E_i)^2/E_i$ - $O_i=$ observed count in cell $i$ - $E_i=$ expected count in cell $i$ *under null/independence* - Large deviation $\rightarrow$ large test statistic $\rightarrow$ low p-value - Under null: - As $n \rightarrow \infty$, $\chi^2 \rightarrow$ chi-square distribution - Degrees of freedom = $(R-1)*(C-1)$ ($R=\#$ of rows, $R=\#$ of columns) ```{r} table_xtabs <- xtabs(~SSM_ASD_v24+Gender, data=ibis_data) blah <- summary(table_xtabs) ``` # Fisher's Exact Test - What if $n$ is small? - $\rightarrow$ chi-square distribution approx is poor - Need alternative test - Exact test has no ''test statistic'', only a p-value/confidence interval - P-value is accurate no matter sample size - However, more computationally intensive - **Use chi-square test** if sample size is large ```{r} table_xtabs <- xtabs(~SSM_ASD_v24+Gender, data=ibis_data) fisher.test(table_xtabs) ``` # More then two variables - What if we have more then 2 count variables? - Ex. may want to assess association between 2 count variables, controlling for third - I.e., want to **stratify by 3rd variable**, see if you can reject null

# Mantel-Haenszel Test - **Null**: Odds Ratio($X,Y$)=1 for all values of $Z=z$ - **Alt.**: At least one $\neq 1$ - **Test statistic**: $\chi^2_{MH}$ - Under null: - As $n \rightarrow \infty$, $\chi^2 \rightarrow$ chi-square distribution - Degrees of freedom = 1 - Valid for series of 2 by 2 tables ```{r} table_xtabs <- xtabs(~SSM_ASD_v24+Gender+Study_Site, data=ibis_data) table_xtabs summary(table_xtabs) ``` # Ordinal categories - Above all assumed categories had no explicit ordering - **Def**: an *ordinal* variable is a categorical variable where categories have inherent ordering - Can inform analyses, improve statistical power - Ex. 1. ''linear-by-linear'' test 2. Cochran-Armitage test (one ordinal, one nominal) # Ordinal regression - Can also use **regression models** to analyze an ordinal outcome variable - **Recall**: Logistic Regression $$ \begin{align} &\text{logit}(\frac{p_x}{1-p_x})=\beta_0+\beta_1X_1+\ldots+\beta_pX_p \\ & p_x=\Pr(Y=1|X_1=x_1, \ldots, X_p=x_p) \end{align} $$ - Ordinal Logistic Regression $$ \begin{align} &\text{logit}\bigg(\frac{p_x(j)}{1-p_x(j)}\bigg)=\beta_{0j}+\beta_{1j}X_1+\ldots+\beta_{pj}X_p \\ & p_x(j)=\Pr(Y\leq j|X_1=x_1, \ldots, X_p=x_p) \end{align} $$ # Ordinal regression - Predictors can be of any form (continuous, categorical, etc.) - Many contingency table tests for association are equivalent to corresponding regreesion model - **Issue**: Number of parameters in model can be large (as $\beta$ function of $j$) - **Solution**: *Proportional odds model* $$ \begin{align} &\text{logit}\bigg(\frac{p_x(j)}{1-p_x(j)}\bigg)=\beta_{0j}+\beta_{1}X_1+\ldots+\beta_{p}X_p \\ & p_x(j)=\Pr(Y\leq j|X_1=x_1, \ldots, X_p=x_p) \end{align} $$ - Can test for deviation from proportion odds in R (and other software packages) # Inter-rater reliability - Suppose you have a measurement, but want to validate it - One concern: do two people grade the same data similarly using your measurement - Can visualize the raters' data using a contingency table

# Quantifying Aggreement - Common metric: *Cohen's* $\kappa$ - $P_0$ = proportion of chance agreement - $P_E$ = percentage of agreement expected by random chance - Ex. Suppose measurement can be yes or no - Say rater one says "yes" $50\%$ of the time, rate two $25\%$ of the time - Say rater one says "no" $25\%$ of the time, rate two $50\%$ of the time - Then under independence, rater would agree $0.5*0.25+0.25*0.5=25\%$ of the time - so $P_E=0.25$ - $\kappa=\frac{P_0-P_E}{1-P_0}$ ```{r, echo=FALSE} # Demo data diagnoses <- as.table(rbind( c(7, 1, 2, 3, 0), c(0, 8, 1, 1, 0), c(0, 0, 2, 0, 0), c(0, 0, 0, 1, 0), c(0, 0, 0, 0, 4) )) categories <- c("Depression", "Personality Disorder", "Schizophrenia", "Neurosis", "Other") dimnames(diagnoses) <- list(Doctor1 = categories, Doctor2 = categories) diagnoses # Compute kappa res.k <- Kappa(diagnoses) res.k # Confidence intervals confint(res.k) ``` # Quantifying Aggreement - Extensions 1. What if more then 2 raters? **Light's $\kappa$** (averages all pairwise agreement) 2. What if variables are ordinal/levels no arbitrary? **Weighted $\kappa$** - **Note**: Null is rater agree *simply by random chance (or equivalent)* - Other common measure: intraclass correlation (ICC) - `icc` function in `psy` or `irr` packages ```{r, echo=FALSE} # Load and inspect a demo data data("diagnoses", package = "irr") diagnoses # Compute Light's kappa between 6 raters kappam.light(diagnoses) ```