5.12 Chapter summary

Chi-square tests can be generally used to perform two types of tests, the Independence and Homogeneity tests. The appropriate analysis is determined based on the data collection methodology. The parametric Chi-square distribution for which these tests are named is appropriate when the expected cell counts are large enough (related to having a large enough overall sample). When the expected cell count condition is violated, the permutation approach can provide valuable inferences in these situations in most situations.

Data displays of the stacked barchart (Homogeneity) and mosaic plots (Independence) provide a visual summary of the results that can also be found in contingency tables. You should have learned how to calculate the \(X^2\) (X-squared) test statistic based on first finding the expected cell counts. Under certain assumptions, it will follow a Chi-Square distribution with \((R-1)(C-1)\) degrees of freedom. When those assumptions are not met, it is better to use a permutation approach to find p-values. Either way, the same statistic is used to test either kind of hypothesis, independence or homogeneity. After assessing evidence against the null hypothesis, it is interesting to see which cells in the table contributed to the deviations from the null hypothesis. The standardized residuals provide that information. Graphing them in a mosaic plot makes for a fun display to identify the large residuals and often allows you to better understand the results. This should tie back into the original data display (tableplot, stacked bar-chart or mosaic plot) and contingency table where you identified initial patterns and help to tell the story of the results.