References

Akaike, Hirotugu. 1974. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19: 716–23.

Allaire, JJ. 2014. Manipulate: Interactive Plots for RStudio. https://CRAN.R-project.org/package=manipulate.

Azzalini, Adelchi, and Adrian W. Bowman. 1990. “A Look at Some Data on the Old Faithful Geyser.” Applied Statistics 39: 357–65.

Barton, Kamil. 2019. MuMIn: Multi-Model Inference. https://CRAN.R-project.org/package=MuMIn.

Benson, Roger B. J., and Philip D. Mannion. 2012. “Multi-Variate Models Are Essential for Understanding Vertebrate Diversification in Deep Time.” Biology Letters 8: 127–30. https://doi.org/10.1098/rsbl.2011.0460.

Bland, J Martin, and Douglas G Altman. 1995. “Multiple Significance Tests: The Bonferroni Method.” BMJ 310 (6973): 170. https://doi.org/10.1136/bmj.310.6973.170.

Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference. New York: Springer.

Clevenger, Anthony P, and Nigel Waltho. 2005. “Performance Indices to Identify Attributes of Highway Crossing Structures Facilitating Movement of Large Mammals.” Biological Conservation 121 (3): 453–64.

Crampton, E. 1947. “The Growth of the Odontoblast of the Incisor Teeth as a Criterion of Vitamin C Intake of the Guinea Pig.” The Journal of Nutrition 33 (5): 491–504. http://jn.nutrition.org/content/33/5/491.full.pdf.

Dayton, C. Mitchell. 1998. Latent Class Scaling Analysis. Thousand Oaks, CA: SAGE Publications.

De Veaux, Richard D., Paul F. Velleman, and David E. Bock. 2011. Stats: Data and Models, 3rd Edition. Pearson.

Dieser, Markus, Mark C. Greenwood, and Christine M. Foreman. 2010. “Carotenoid Pigmentation in Antarctic Heterotrophic Bacteria as a Strategy to Withstand Environmental Stresses.” Arctic, Antarctic, and Alpine Research 42 (4): 396–405. https://doi.org/10.1657/1938-4246-42.4.396.

Diez, David M, Christopher D Barr, and Mine Cetinkaya-Rundel. 2017. Openintro: Data Sets and Supplemental Functions from ’Openintro’ Textbooks. https://CRAN.R-project.org/package=openintro.

Doolittle, Alan E., and Catherine Welch. 1989. “Gender Differences in Performance on a College-Level Achievement Test.” ACT Research Report, 89–90.

Faraway, Julian. 2016. Faraway: Functions and Datasets for Books by Julian Faraway. https://CRAN.R-project.org/package=faraway.

F. L. Ramsey and D. W. Schafer (original); modifications by Daniel W. Schafer, Jeannie Sifneos, and Berwin A. Turlach; vignettes contributed by Nicholas Horton, Kate Aloisio, and Ruobing Zhang; corrections by Randall Pruim. 2019. Sleuth2: Data Sets from Ramsey and Schafer’s "Statistical Sleuth (2nd Ed)". https://CRAN.R-project.org/package=Sleuth2.

Fox, John. 2003. “Effect Displays in R for Generalised Linear Models.” Journal of Statistical Software 8 (15): 1–27. http://www.jstatsoft.org/v08/i15/.

Fox, John, and Michael Friendly. 2018. Heplots: Visualizing Hypothesis Tests in Multivariate Linear Models. https://CRAN.R-project.org/package=heplots.

Fox, John, and Sanford Weisberg. 2011. An R Companion to Applied Regression, Second Edition. Thousand Oaks, CA: SAGE Publications. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion.

Fox, John, Sanford Weisberg, and Brad Price. 2018. CarData: Companion to Applied Regression Data Sets. https://CRAN.R-project.org/package=carData.

———. 2019. Car: Companion to Applied Regression. https://CRAN.R-project.org/package=car.

Fox, John, Sanford Weisberg, Brad Price, Michael Friendly, and Jangman Hong. 2019. Effects: Effect Displays for Linear, Generalized Linear, and Other Models. https://CRAN.R-project.org/package=effects.

Gandrud, Christopher. 2015. Reproducible Research with R and R Studio, Second Edition. Chapman & Hall/CRC.

Garnier, Simon. 2018. Viridis: Default Color Maps from ’Matplotlib’. https://CRAN.R-project.org/package=viridis.

Greenwood, Mark C., Joel Harper, and Johnnie Moore. 2011. “An Application of Statistics in Climate Change: Detection of Nonlinear Changes in a Streamflow Timing Measure in the Columbia and Missouri Headwaters.” In Handbook of the Philosophy of Science, Vol. 7: Statistics, edited by P. S. Bandyopadhyay and M. Forster, 1117–42. Elsevier.

Greenwood, Mark C., and N. F. Humphrey. 2002. “Glaciated Valley Profiles: An Application of Nonlinear Regression.” Computing Science and Statistics 34: 452–60.

Gude, Patricia H., J. Anthony Cookson, Mark C. Greenwood, and Mark Haggerty. 2009. “Homes in Wildfire-Prone Areas: An Empirical Analysis of Wildfire Suppression Costs and Climate Change.” www.headwaterseconomics.org.

Gundale, Michael J., Lisbet H. Bach, and Annika Nordin. 2013. “The Impact of Simulated Chronic Nitrogen Deposition on the Biomass and N2-Fixation Activity of Two Boreal Feather Moss–Cyanobacteria Associations.” Biology Letters 9 (6). https://doi.org/10.1098/rsbl.2013.0797.

Hothorn, Torsten, Frank Bretz, and Peter Westfall. 2008. “Simultaneous Inference in General Parametric Models.” Biometrical Journal 50 (3): 346–63.

———. 2019. Multcomp: Simultaneous Inference in General Parametric Models. https://CRAN.R-project.org/package=multcomp.

Hurlbert, Stuart H. 1984. “Pseudoreplication and the Design of Ecological Field Experiments.” Ecological Monographs 54 (2): 187–211. http://www.jstor.org/stable/1942661.

Jones, Owen, Robert Maillardet, Andrew Robinson, Olga Borovkova, and Steven Carnie. 2018. SpuRs: Functions and Datasets for "Introduction to Scientific Programming and Simulation Using R". https://CRAN.R-project.org/package=spuRs.

Kampstra, Peter. 2008. “Beanplot: A Boxplot Alternative for Visual Comparison of Distributions.” Journal of Statistical Software, Code Snippets 28 (1): 1–9. http://www.jstatsoft.org/v28/c01/.

Lea, Stephen E. G., Paul Webley, and Catherine M. Walker. 1995. “Psychological Factors in Consumer Debt: Money Management, Economic Socialization, and Credit Use.” Journal of Economic Psychology 16 (4): 681–701.

Liao, Xiyue, and Mary C. Meyer. 2014. “Coneproj: An R Package for the Primal or Dual Cone Projections with Routines for Constrained Regression.” Journal of Statistical Software 61 (12): 1–22. http://www.jstatsoft.org/v61/i12/.

Likert, Rensis. 1932. “A Technique for the Measurement of Attitudes.” Archives of Psychology 140: 1–55.

Linzer, Drew, and Jeffrey Lewis. 2011. “PoLCA: An R Package for Polytomous Variable Latent Class Analysis.” Journal of Statistical Software 42 (10): 1–29.

———. 2014. PoLCA: Polytomous Variable Latent Class Analysis. https://CRAN.R-project.org/package=poLCA.

Lumley, Thomas. 2019. Survey: Analysis of Complex Survey Samples. https://CRAN.R-project.org/package=survey.

Merkle, Ed, and Michael Smithson. 2018. Smdata: Data to Accompany Smithson & Merkle, 2013. https://CRAN.R-project.org/package=smdata.

Meyer, David, Achim Zeileis, and Kurt Hornik. 2017. Vcd: Visualizing Categorical Data. https://CRAN.R-project.org/package=vcd.

Meyer, Mary C., and Xiyue Liao. 2018. Coneproj: Primal or Dual Cone Projections with Routines for Constrained Regression. https://CRAN.R-project.org/package=coneproj.

Moore, Johnnie N., Joel T. Harper, and Mark C. Greenwood. 2007. “Significance of Trends Toward Earlier Snowmelt Runoff, Columbia and Missouri Basin Headwaters, Western United States.” Geophysical Research Letters 34 (16). https://doi.org/10.1029/2007GL031022.

Phillips, Nathaniel. 2017a. Yarrr: A Companion to the E-Book "Yarrr!: The Pirate’s Guide to R". https://CRAN.R-project.org/package=yarrr.

———. 2017b. Yarrr: A Companion to the E-Book "Yarrr!: The Pirate’s Guide to R". https://CRAN.R-project.org/package=yarrr.

Piepho, Hans-Peter. 2004. “An Algorithm for a Letter-Based Representation of All-Pairwise Comparisons.” Journal of Computational and Graphical Statistics 13 (2): 456–66.

Pruim, Randall, Daniel Kaplan, and Nicholas Horton. 2018. MosaicData: Project Mosaic Data Sets. https://CRAN.R-project.org/package=mosaicData.

Pruim, Randall, Daniel T. Kaplan, and Nicholas J. Horton. 2019. Mosaic: Project Mosaic Statistics and Mathematics Teaching Utilities. https://CRAN.R-project.org/package=mosaic.

Puhan, Milo A, Alex Suarez, Christian Lo Cascio, Alfred Zahn, Markus Heitz, and Otto Braendli. 2006. “Didgeridoo Playing as Alternative Treatment for Obstructive Sleep Apnoea Syndrome: Randomised Controlled Trial.” BMJ 332 (7536): 266–70. https://doi.org/10.1136/bmj.38705.470590.55.

Ramsey, Fred, and Daniel Schafer. 2012. The Statistical Sleuth: A Course in Methods of Data Analysis. Cengage Learning. https://books.google.com/books?id=eSlLjA9TwkUC.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Revelle, William. 2019. Psych: Procedures for Psychological, Psychometric, and Personality Research. https://CRAN.R-project.org/package=psych.

Ripley, Brian. 2019. MASS: Support Functions and Datasets for Venables and Ripley’s MASS. https://CRAN.R-project.org/package=MASS.

Robinson, Rebekah, and Homer White. 2016. Tigerstats: R Functions for Elementary Statistics. https://CRAN.R-project.org/package=tigerstats.

RStudio Team. 2018. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, Inc. http://www.rstudio.com/.

Santibáñez, Pamela A., Olivia J. Maselli, Mark C. Greenwood, Mackenzie M. Grieman, Eric S. Saltzman, Joseph R. McConnell, and John C. Priscu. 2018. “Prokaryotes in the Wais Divide Ice Core Reflect Source and Transport Changes Between Last Glacial Maximum and the Early Holocene.” Global Change Biology 24 (5): 2182–97. https://doi.org/10.1111/gcb.14042.

Sasaki, Takao, and Stephen C. Pratt. 2013. “Ants Learn to Rely on More Informative Attributes During Decision-Making.” Biology Letters 9 (6). https://doi.org/10.1098/rsbl.2013.0667.

Schneck, Andreas. 2017. “Examining Publication Bias—a Simulation-Based Evaluation of Statistical Tests on Publication Bias.” PeerJ 5 (November): e4115. https://doi.org/10.7717/peerj.4115.

Smith, Michael L. 2014. “Honey Bee Sting Pain Index by Body Location.” PeerJ 2 (April): e338. https://doi.org/10.7717/peerj.338.

Tennekes, Martijn, and Edwin de Jonge. 2019. Tabplot: Tableplot, a Visualization of Large Datasets. https://CRAN.R-project.org/package=tabplot.

Vsevolozhskaya, Olga A., Dmitri V. Zaykin, Mark C. Greenwood, Changshuai Wei, and Qing Lu. 2014. “Functional Analysis of Variance for Association Studies.” PLOS ONE 9 (9): 13. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105074.

Walker, Ian, Ian Garrard, and Felicity Jowitt. 2014. “The Influence of a Bicycle Commuter’s Appearance on Drivers’ Overtaking Proximities: An on-Road Test of Bicyclist Stereotypes, High-Visibility Clothing and Safety Aids in the United Kingdom.” Accident Analysis & Prevention 64: 69–77. https://doi.org/10.1016/j.aap.2013.11.007.

Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.

Wei, Taiyun, and Viliam Simko. 2017. Corrplot: Visualization of a Correlation Matrix. https://CRAN.R-project.org/package=corrplot.

Weisberg, Sanford. 2005. Applied Linear Regression, Third Edition. Hoboken, NJ: Wiley.

———. 2018. Alr3: Data to Accompany Applied Linear Regression 3rd Edition. https://CRAN.R-project.org/package=alr3.

Westfall, Peter H., and S. Stanley Young. 1993. Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. New York: Wiley.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani. 2019. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.


  1. A placebo is a treatment level designed to mimic the potentially efficacious level(s) but that can have no actual effect. The placebo effect is the effect that thinking that an effective treatment was received has on subjects. There are other related issues in performing experiments like the Hawthorne or observer effect where subjects modify behavior because they are being observed.↩︎

  2. We will reserve the term “effect” for situations where we could potentially infer causal impacts on the response of the explanatory variable which occurs in situations where the levels of the explanatory variable are randomly assigned to the subjects.↩︎

  3. There is a cloud version of RStudio available at https://rstudio.cloud/ if you want to avoid these steps. We still recommend following the steps to be able to work locally, but try this option if you have issues with the installation process.↩︎

  4. I recorded a video that walks through getting R and RStudio installed on a PC. If you want to see them installed on a Mac, you can try this video on YouTube. Or, for either version, try searching YouTube for “How to install R and RStudio”.↩︎

  5. The need to keep the code up-to-date as R continues to evolve is one reason that this book is locally published and that this is the 6th time it has been revised in six years…↩︎

  6. There are ways to read “.xls” and “.xlsx” files directly into R that we will explore later so you can also use that format if you prefer.↩︎

  7. If you are having trouble getting the file converted and read into R, copy and run the following code: treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv").↩︎

  8. You can also use Ctrl+Enter if you like hot keys.↩︎

  9. Tibbles are R objects that can contain both categorical and quantitative variables on your \(n\) subjects with a name for each variable that is also the name of each column in a matrix. Each subject is a row of the data set. The name (supposedly) is due to the way table sounds in the accent of a particularly influential developer at RStudio who is from New Zealand.↩︎
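
For readers who have not seen one built directly, here is a minimal sketch of creating a tibble by hand; the variable names and values are made up for illustration:

```r
library(tibble)

# Each named argument becomes a column (a variable); each row is a subject
small_data <- tibble(
  subject = c("A", "B", "C"),       # a categorical variable
  distance = c(114, 150, 132)       # a quantitative variable
)
small_data
```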

  10. The median, quartiles and whiskers sometimes occur at the same values when there are many tied observations. If you can’t see all the components of the boxplot, produce the numerical summary to help you understand what happened.↩︎

  11. You will more typically hear “data is” but that more often refers to information, sometimes even statistical summaries of data sets, than to observations made on subjects collected as part of a study, suggesting the confusion of this term in the general public. We will explore a data set in Chapter 5 related to perceptions of this issue collected by researchers at http://fivethirtyeight.com/.↩︎

  12. Either try to remember “data is a plural word” or replace “data” with “things” in your sentence and consider whether it sounds right.↩︎

  13. Of particular interest to the bicycle rider might be the “close” passes and we will revisit this as a categorical response with “close” and “not close” as its two categories later.↩︎

  14. Thanks to Ian Walker for allowing me to use and post these data.↩︎

  15. As noted previously, we reserve the term “effect” for situations where random assignment allows us to consider causality as the reason for the differences in the response variable among levels of the explanatory variable, but this is only the case if we find evidence against the null hypothesis of no difference in the groups.↩︎

  16. If you’ve taken calculus, you will know that the curve is being constructed so that the integral from \(-\infty\) to \(\infty\) is 1. If you don’t know calculus, think of a rectangle with area of 1 based on its height and width. These cover the same area but the top of the region wiggles.↩︎

  17. Jittering typically involves adding random variability to each observation that is uniformly distributed in a range determined by the spacing of the observations. The idea is to jitter just enough to see all the points but not too much. This means that if you re-run the jitter function, the results will change if you do not set the random number seed using set.seed, which is discussed more below. For more details, type help(jitter) in the console in RStudio.↩︎
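
A minimal sketch of reproducible jittering (the seed value and the tied values are arbitrary):

```r
set.seed(406)              # fix the random number seed so the jittering is repeatable
x <- c(1, 1, 2, 2, 2, 3)   # tied observations that would over-plot
jitter(x)                  # adds a small amount of uniform noise to each value
```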

  18. If you want to type this character in Rmarkdown, try $\sim$ outside of codechunks.↩︎

  19. Remember the bell-shaped curve you encountered in introductory statistics? If not, you can see some at https://en.wikipedia.org/wiki/Normal_distribution.↩︎

  20. The package and function are intentionally amusingly titled but are based on ideas in the beanplot in Kampstra (2008) and provide what they call an RDI graphic - Raw data, Descriptive, and Inferential statistics in the same display.↩︎

  21. The hypothesis of no difference that is typically generated in the hopes of being rejected in favor of the alternative hypothesis, which contains the sort of difference that is of interest in the application.↩︎

  22. The null model is the statistical model that is implied by the chosen null hypothesis. Here, a null hypothesis of no difference translates to having a model with the same mean for both groups.↩︎

  23. While not required, we often set our random number seed using the set.seed function so that when we re-run code with randomization in it we get the same results.↩︎
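
As a sketch of why this matters (the seed value 1234 is arbitrary), re-setting the same seed reproduces the same "random" result:

```r
set.seed(1234)
sample(1:10, size = 5)   # one particular random selection

set.seed(1234)
sample(1:10, size = 5)   # same seed, so the same selection is produced again
```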

  24. We’ll see the shuffle function in a more common usage below; while the code to generate Perm1 is provided, it isn’t something to worry about right now.↩︎

  25. This is a bit like getting a new convertible sports car and driving it to the grocery store - there might be better ways to get groceries, but we want to drive our new car as soon as we get it.↩︎

  26. This will be formalized and explained more in the next chapter when we encounter more than two groups in these same models. For now, it is recommended to start with the sample means from favstats for the two groups and then use that to sort out which direction the differencing was done in the lm output.↩︎

  27. P-values are the probability of obtaining a result as extreme as or more extreme than we observed given that the null hypothesis is true.↩︎

  28. In statistics, vectors are one-dimensional lists of numeric elements – basically a column from a matrix or from our tibble.↩︎
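
For example, a vector can be built directly with c() or extracted as one column of a tibble; dsamp and Distance are the example data set and variable used in this chapter, so treat the second line as a sketch:

```r
y <- c(3, 1, 4, 1, 5)   # a numeric vector created directly
dsamp$Distance          # extracting a single column of a tibble also returns a vector
```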

  29. We often say “under” in statistics and we mean “given that the following is true”.↩︎

  30. This is a fancy way of saying “in advance”, here in advance of seeing the observations.↩︎

  31. Statistically, a conservative method is one that provides less chance of rejecting the null hypothesis in comparison to some other method or less than some pre-defined standard. A liberal method provides higher rates of false rejections.↩︎

  32. Both approaches are reasonable. By using both tails of the distribution we can incorporate potential differences in shape in both tails of the permutation distribution.↩︎
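
A sketch of the "use both tails" calculation, assuming Tobs holds the observed statistic and Tstar holds the vector of permuted statistics (those object names follow the chapter's usage, but this particular line is just one way to do the calculation):

```r
# Proportion of permuted statistics at least as extreme (in absolute value) as the observed one
pvalue <- mean(abs(Tstar) >= abs(Tobs))
pvalue
```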

  33. We’ll leave the discussion of the CLT to your previous statistics coursework or an internet search. For this material, just remember that it has something to do with distributions of statistics looking more normal as the sample size increases.↩︎

  34. The t.test function with the var.equal=T option is the more direct route to calculating this statistic (here that would be t.test(Distance~Condition, data=dsamp, var.equal=T)), but since we can get the result of interest by fitting a linear model, we will use that approach.↩︎
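
The two routes to the same test statistic, as a sketch using the chapter's dsamp data set:

```r
# Direct two-sample t-test assuming equal variances
t.test(Distance ~ Condition, data = dsamp, var.equal = TRUE)

# The same t-statistic shows up in the coefficient table of the linear model
summary(lm(Distance ~ Condition, data = dsamp))
```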

  35. On exams, you might be asked to describe the area of interest, sketch a picture of the area of interest, and/or note the distribution you would use. Make sure you think about what you are trying to do here as much as learning the mechanics of how to get p-values from R.↩︎

  36. In some studies, the same subject is measured in both conditions and this violates the assumptions of this procedure.↩︎

  37. At this level, it is critical to learn the tools and learn where they might provide inaccurate inferences. If you explore more advanced statistical resources, you will encounter methods that can allow you to obtain valid inferences in even more scenarios.↩︎

  38. Only male and female were provided as options on the survey. These data were collected as part of a project to study learning of material using online versus paper versions of the book but we focus just on the gender differences in GPA here.↩︎

  39. The data are provided and briefly discussed in the Practice Problems for Chapter 3.↩︎

  40. You can correctly call octothorpes number symbols or, in the twitter verse, hashtags. For more on this symbol, see “http://blog.dictionary.com/octothorpe/”. Even after reading this, I call them number symbols.↩︎

  41. An unbiased estimator is a statistic that is on average equal to the population parameter.↩︎

  42. Some perform bootstrap sampling in this situation by re-sampling within each of the groups. We will discuss using this technique in situations without clearly defined groups, so we prefer to sample with replacement from the entire data set. It also directly corresponds to situations where the data came from one large sample and then the grouping variable of interest was measured on the \(n\) subjects.↩︎

  43. The as.numeric function is also used here. It really isn’t important but makes sure the output of table is sorted by observation number by first converting the orig.id variable into a numeric vector.↩︎

  44. In any bootstrap sample, about 1/3 of the observations are not used at all.↩︎

  45. There are actually many ways to use this information to make a confidence interval. We are using the simplest method that is called the “percentile” method.↩︎
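
A sketch of the percentile method, assuming Tstar contains the bootstrap distribution of the statistic of interest:

```r
# 95% percentile bootstrap interval: the middle 95% of the bootstrapped statistics
quantile(Tstar, probs = c(0.025, 0.975))
```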

  46. When hypothesis tests “work well” they have high power to detect differences while having Type I error rates that are close to what we choose a priori. When confidence intervals “work well”, they contain the true parameter value in repeated random samples at around the selected confidence level, which is called the coverage rate. ↩︎

  47. We will often use this term to indicate performing a calculation using the favstats results – not that you need to go back to the data set and calculate the means and standard deviations yourself.↩︎

  48. In Chapter 4, methods are discussed for when there are two categorical explanatory variables, which is called the Two-Way ANOVA, and related ANOVA tests are used in Chapter 8 for working with extensions of these models.↩︎

  49. In Chapter 2, we used lm to get these estimates and focused on the estimate of the difference between the second group and the baseline - that was and still is the difference in the sample means. Now there are potentially more than two groups and we need to formalize notation to handle this more complex situation.↩︎

  50. If you look closely in the code for the rest of the book, any model for a quantitative response will use this function, suggesting a common thread in the most commonly used statistical models.↩︎

  51. We can and will select the order of the levels of categorical variables as it can make plots easier to interpret.↩︎

  52. Suppose we were doing environmental monitoring and were studying asbestos levels in soils. We might be hoping that the mean-only model were reasonable to use if the groups being compared were in remediated areas and in areas known to have never been contaminated.↩︎

  53. Make sure you can work from left to right and up and down to fill in the ANOVA table given just the necessary information to determine the other components or from a study description to complete the DF part of the table – there are always questions like these on exams…↩︎

  54. Any further claimed precision is an exaggeration and eventually we might see p-values that approach the precision of the computer at 2.2e-16 and anything below 0.0001 should just be reported as being below 0.0001. Also note the way that R represents small or extremely large numbers using scientific notation such as 3e-4 which is \(3 \cdot 10^{-4} = 0.0003\).↩︎

  55. This would be another type of publication bias – where researchers search across groups and only report their biggest differences and fail to report the other pairs that they compared. As discussed before, this biases the results toward detecting differences more often than they should be detected, and then when other researchers try to repeat the same studies and compare just, say, two groups, they will likely fail to find similar results unless they also search across many different possible comparisons and only report the most extreme. The better approach is to do the ANOVA \(F\)-test first and then Tukey’s comparisons and report all these results, as discussed below.↩︎

  56. We have been using this function quite a bit to make multi-panel graphs but did not show you that line of code. But you need to use this command for linear model diagnostics or you won’t get the plots we want from the model. And you really just need plot(lm2) but the pch=16 option makes it easier to see some of the points in the plots.↩︎
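
Putting the two pieces together, assuming lm2 is the fitted model being diagnosed:

```r
par(mfrow = c(2, 2))   # arrange the four diagnostic plots in a 2 by 2 grid
plot(lm2, pch = 16)    # default lm diagnostics; pch = 16 just uses filled points
```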

  57. Along with multiple names, there is variation of what is plotted on the x and y axes, the scaling of the values plotted, and even the way the line is chosen to represent the 1-1 relationship, increasing the challenge of interpreting QQ-plots. We are consistent about the x and y axis choices throughout this book and how the line is drawn but different versions of these plots do vary in what is presented, so be careful with using QQ-plots.↩︎

  58. Here this means re-scaled so that they should have similar scaling to a standard normal with mean 0 and standard deviation 1. This does not change the shape of the distribution but can make outlier identification simpler – having a standardized residual more extreme than 5 or -5 would suggest a deviation from normality since we rarely see values that many standard deviations from the mean in a normal distribution. But mainly focus on the pattern in points in the QQ-plot and whether it matches the 1-1 line that is being plotted.↩︎

  59. A resistant procedure is one that is not severely impacted by a particular violation of an assumption. For example, the median is resistant to the impact of an outlier. But the mean is not a resistant measure as changing the value of a single point changes the mean.↩︎

  60. A violation of the independence assumption could have easily been created if they measured cells in two locations on each guinea pig or took measurements over time on each subject.↩︎

  61. Note that to see all the group labels in the plot when making the figure, you have to widen the plot window before copying the figure out of R. You can resize the plot window using the small vertical and horizontal “=” signs in the grey bars that separate the different panels in RStudio.↩︎

  62. In working with researchers on hundreds of projects, my experience has been that many conversations are often required to discover all the potential sources of issues in data sets, especially related to assessing independence of the observations. Discussing how the data were collected is sometimes the only way to understand whether violations of independence are present or not.↩︎

  63. When this procedure is used with unequal group sizes it is also sometimes called Tukey-Kramer’s method.↩︎

  64. We often use “spurious” to describe falsely rejected null hypotheses, but they are also called false detections.↩︎

  65. In more complex models, this code can be used to create pair-wise comparisons on one of many explanatory variables.↩︎

  66. The plot of results usually contains all the labels of groups but if the labels are long or there are many groups, sometimes the row labels are hard to see even with re-sizing the plot to make it taller in RStudio. The numerical output is useful as a guide to help you read the plot in those situations.↩︎

  67. Note that this method is implemented slightly differently than explained here in some software packages so if you see this in a journal article, read the discussion carefully.↩︎

  68. There is a warning message produced by the default Tukey’s code here related to the algorithms used to generate approximate p-values and then the CLD, but the results seem reasonable and just a few p-values seem to vary in the second or third decimal points.↩︎

  69. We would not suggest throwing away observations to get balanced designs. Plan in advance to try to have a balanced design but analyze the responses you get.↩︎

  70. Copy and include this code in a codechunk any time you want to use the intplot or inplotarray functions.↩︎

  71. We will use “main effects” to refer to the two explanatory variables in the additive model even if they are not randomly assigned to contrast with having those variables interacting in the model. It is the one place in the book where we use “effects” without worrying about the causal connotation of that word.↩︎

  72. In the standard ANOVA table, \(\text{SS}_A + \text{SS}_B + \text{SS}_{AB} + \text{SS}_E = \text{SS}_{\text{Total}}\). However, to get the tests we really desire when our designs are not balanced, a slight modification of the SS is used, using what are called Type II sums of squares and this result doesn’t hold in the output you will see for additive models. This is discussed further below.↩︎

  73. The anova results are not wrong, just not what we want in all situations.↩︎

  74. Actually, the tests are only conditional on other main effects if Type II Sums of Squares are used for an interaction model, but we rarely focus on the main effect tests when the interaction is present.↩︎

  75. In Multiple Linear Regression models in Chapter 8, the reasons for this wording will (hopefully) become clearer.↩︎

  76. Just so you don’t think that perfect R code should occur on the first try, I have made similarly serious coding mistakes even after accumulating more than a decade of experience with R. It is finding those mistakes (in time) that matters.↩︎

  77. To get dosef on the x-axis in the plot, the x.var="dosef" option was employed to force the Dose to be the variable on the x-axis.↩︎

  78. We switched back to the anova function here as the Anova function only reports Error in Anova.lm(lm(responses ~ dropsf * brand, data = ptR)) : residual df = 0, which is fine but not as useful for understanding this issue as what anova provides.↩︎

  79. In larger data sets, multiple subjects are displayed in each row as proportions of the rows in each category.↩︎

  80. Quantitative variables are displayed with boxplot-like bounds to describe the variability in the variable for that row of responses for larger data sets.↩︎

  81. While randomization is typically useful in trying to “equalize” the composition of groups, a possible randomization of subjects to the groups is to put all the males into the treatment group. Sometimes we add additional constraints to randomization of subjects to treatments to guarantee that we don’t get stuck with an unusual and highly unlikely assignment like that. It is important at least to check the demographics of different treatment groups to see if anything odd occurred.↩︎

  82. The vertical line, “|”, in ~ y|x is available on most keyboards on the same key as “\”. It is the mathematical symbol that means “conditional on” whatever follows.↩︎
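
As a sketch of the conditioning syntax using the mosaic package and one of its example data sets (HELPrct from mosaicData; the variables chosen here are just for illustration):

```r
library(mosaic)

# Proportions of homeless status conditional on ("|") sex
tally(~ homeless | sex, data = HELPrct, format = "proportion")
```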

  83. Standardizing involves dividing by the standard deviation of a quantity so that it has a standard deviation of 1 regardless of its original variability, and that is what is happening here even though it doesn’t look like the standardization you are used to with continuous variables.↩︎

  84. Note that in smaller data sets, to get results as discussed here, use the correct=F option. If you get output that contains “...with Yates’ continuity correction”, a slightly modified version of this test is being used.↩︎
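
A sketch with a small made-up table of counts showing where the option goes:

```r
counts <- matrix(c(12, 8, 5, 15), nrow = 2)   # hypothetical 2x2 table of counts

chisq.test(counts)                    # default applies Yates' continuity correction to 2x2 tables
chisq.test(counts, correct = FALSE)   # matches the uncorrected statistic discussed in the text
```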

  85. Here it allows us to relax a requirement that all the expected cell counts are larger than 5 for the parametric test (Section 5.6).↩︎

  86. Doctors are faced with this exact dilemma – with little more training than you have now in statistics, they read a result like this in a paper and used to be encouraged to focus on the p-value to decide about treatment recommendations. Would you recommend the treatment here just based on the small p-value? Would having Figure 5.11 to go with the small p-value help you make a more educated decision? Now the recommendations are starting to move past just focusing on the p-values and thinking about the practical importance and size of the differences. The potential benefits of a treatment need to be balanced with risks of complications too, but that takes us back into discussing having multiple analyses in the same study (treatment improvement, complications/not, etc.).↩︎

  87. Independence tests can’t be causal by their construction. Homogeneity tests could be causal or just associational, depending on how the subjects ended up in the groups.↩︎

  88. A stratified random sample involves taking a simple random sample from each group or strata of the population. It is useful to make sure that each group is represented at a chosen level (for example the sample proportion of the total size). If a simple random sample of all schools had been taken, it is possible that a level could have no schools selected.↩︎

  89. There are measures of correlation between categorical variables but when statisticians say correlation they mean correlation of quantitative variables. If they are discussing correlations of other types, they will make that clear.↩︎

  90. Some of the details of this study have been lost, so we will assume that the subjects were randomly assigned and that a beer means a regular sized can of beer and that the beer was of regular strength. We don’t know if any of that is actually true. It would be nice to repeat this study to know more details and possibly have a larger sample size but I doubt if our institutional review board would allow students to drink as much as 9 beers.↩︎

  91. I added pch=16, col=30 to fill in the default circles and make the points in something other than black, an entirely unnecessary addition here.↩︎

  92. This interface with the cor function only works after you load the mosaic package.↩︎

  93. The natural log (\(\log_e\) or \(\ln\)) is used in statistics so much that the R function log actually takes the natural log, and if you want a \(\log_{10}\) you have to use the function log10. When statisticians say log, we mean the natural log.↩︎
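
To see the difference:

```r
log(100)     # natural log (base e), approximately 4.605
log10(100)   # base-10 log, exactly 2
log(exp(1))  # the natural log of e is 1
```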

  94. This is related to what is called Simpson’s paradox, where the overall analysis (ignoring a grouping variable) leads to a conclusion of a relationship in one direction, but when the relationship is broken down into subgroups it is in the opposite direction in each group. This emphasizes the importance of checking and accounting for differences in groups and the more complex models we are setting the stage to consider in the coming chapters.↩︎

  95. The interval is “far” from the reference value under the null (0) so this provides at least strong evidence. With using confidence intervals for tests, we really don’t know much about the strength of evidence against the null hypothesis but the hypothesis test here is a bit more complicated to construct and understand and we will have to tolerate just having crude information about the p-value to assess strength of evidence.↩︎

  96. Observations at the edge of the \(x\text{'s}\) will be called high leverage points in Section 6.9; this point is a low leverage point because it is close to the mean of the \(x\text{'s}\).↩︎

  97. Even with clear scientific logic, we sometimes make choices to flip the model directions to facilitate different types of analyses. In Vsevolozhskaya et al. (2014) we looked at genomic differences based on obesity groups, even though we were really interested in exploring how gene-level differences explained differences in obesity.↩︎

  98. The residuals from these methods and ANOVA are the same because they all come from linear models but are completely different from the standardized residuals used in the Chi-square material in Chapter 5.↩︎

  99. We can also write this as \(E(y_i|x_i) = \mu\{y_i|x_i\} = \beta_0 + \beta_1x_i\), which is the notation you will see in books like the Statistical Sleuth (Ramsey and Schafer 2012). We will use notation that is consistent with how we originally introduced the methods.↩︎

  100. There is an area of statistical research on how to optimally choose \(x\)-values to get the most precise estimate of a slope coefficient. In observational studies we have to deal with whatever pattern of \(x\text{'s}\) we ended up with. If you can choose, generate an even spread of \(x\text{'s}\) over some range of interest similar to what was used in the Beers vs BAC study to provide the best distribution of values to discover the relationship across the selected range of \(x\)-values.↩︎

  101. See http://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/ for an interesting discussion of weather variability where Great Falls, MT had a very high rating on “unpredictability”.↩︎

  102. It is actually pretty amazing that there are hundreds of locations with nearly complete daily records for over 100 years.↩︎

  103. All joking aside, if researchers can find evidence of climate change using conservative methods (methods that reject the null hypothesis when it is true less often than stated), then their results are even harder to ignore.↩︎

  104. It took many permutations to get competitor plots this close to the real data set and they really aren’t that close.↩︎

  105. If the removal is of a point that is extreme in \(x\)-values, then it is appropriate to note that the results only apply to the restricted range of \(x\)-values that were actually analyzed in the scope of inference discussion. Our results only ever apply to the range of \(x\)-values we had available so this is a relatively minor change.↩︎

  106. Note exp(x) is the same as \(e^{(x)}\) but easier to read in-line and exp() is the R function name to execute this calculation.↩︎

  107. You can read my dissertation if you want my take on modeling U and V-shaped valley elevation profiles that included some discussion of these models and some related discussions in Greenwood and Humphrey (2002).↩︎

  108. This transformation could not be applied directly to the education growth score data in Chapter 5 because there were negative “growth” scores.↩︎

  109. This silly nomenclature was inspired by De Veaux, Velleman, and Bock (2011) Stats: Data and Models text. If you find this too cheesy, you can just call it x-vee.↩︎

  110. I have suppressed some of the code for making plots in this and the next chapter to make “pretty” pictures – which you are probably happy not to see until you want to make a pretty plot on your own. All the code used is available upon request.↩︎

  111. I have really enjoyed writing this book but kind of hope to at least have updated this data set before 2050.↩︎

  112. If you take advanced applied mathematics courses, you can learn more about the algorithms being used by lm. Everyone else only cares about the algorithms when they don’t work – which is usually due to the user’s inputs in these models not the algorithm itself.↩︎

  113. Sometimes the effects plots ignore the edge explanatory observations with the default display. Always check the original variable summaries when considering the range of observed values. By turning on the “partial residuals” with SLR models, the plots show the original observations along with the fitted values and 95% confidence interval band. In more complex models, these displays with residuals are more complicated but can be used to assess linearity with each predictor in the model after accounting for other variables.↩︎

  114. We used this same notation in fitting the additive Two-Way ANOVA and this model is also additive in terms of these variables. Interaction models are discussed later in the chapter.↩︎

  115. I have not given you a formula for calculating partial residuals. We will leave that for more advanced material.↩︎

  116. Imagine showing up to a ski area expecting a 40 inch base and there only being 11 inches. I’m sure ski areas are always more accurate than this model in their reporting of amounts of snow on the ground…↩︎

  117. The site name is redacted to protect the innocence of the reader. More information on this site, located in Beaverhead County, is available at http://www.wcc.nrcs.usda.gov/nwcc/site?sitenum=355&state=mt.↩︎

  118. They hold factor variables at their modal category but we don’t have factor variables in the MLR models, yet.↩︎

  119. The seq function has syntax of seq(from=startingpoint, to=endingpoint, length.out=#ofvalues_between_start_and_end) and the rep function has syntax of rep(numbertorepeat, #oftimes).↩︎
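
For example:

```r
seq(from = 0, to = 10, length.out = 5)   # 0.0  2.5  5.0  7.5 10.0
rep(3, 4)                                # 3 3 3 3
```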

  120. Also see Section 8.13 for another method of picking among different models.↩︎

  121. This section was inspired by a similar section from De Veaux, Velleman, and Bock (2011).↩︎

  122. There are some social science models where the model is fit with the mean subtracted from each predictor so all have mean 0 and the precision of the \(y\)-intercept is interesting. In some cases both the response and predictor variables are “standardized” to have means of 0 and standard deviations of 1. The interpretations of coefficients then relates to changes in standard deviations around the means. These coefficients are called “standardized betas”. But even in these models where the \(x\)-values of 0 are of interest, the test for the \(y\)-intercept being 0 is rarely of interest.↩︎
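
A sketch of fitting such a model with scale(), using hypothetical variable names y, x1, and x2 in a data set dat:

```r
# scale() subtracts each variable's mean and divides by its standard deviation
std_model <- lm(scale(y) ~ scale(x1) + scale(x2), data = dat)
summary(std_model)   # the slopes are "standardized betas": SD change in y per 1 SD change in each x
```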

  123. Either someone had a weighted GPA with bonus points, or more likely here, there was a coding error in the data set since only one observation was over 4.0 in the GPA data. Either way, we could remove it and note that our inferences for HSGPA do not extend above 4.0.↩︎

  124. When there are just two predictors, the VIFs have to be the same since the proportion of information shared is the same in both directions. With more than two predictors, each variable can have a different VIF value.↩︎

  125. We are actually making an educated guess about what these codes mean. Other similar data sets used 1 for males but the documentation on these data is a bit sparse. We proceed with a small potential that the conclusions regarding differences in gender are in the wrong direction.↩︎

  126. Some people also call them dummy variables to reflect that they are stand-ins for dealing with the categorical information. But it seems like a harsh anthropomorphism so I prefer “indicators”.↩︎

  127. This is true for additive uses of indicator variables. In Section 8.11, we consider interactions between quantitative and categorical variables which has the effect of changing slopes and intercepts. The simplification ideas to produce estimated equations for each group are used there but we have to account for changing slopes by group too.↩︎

  128. Models like this with a categorical variable and quantitative variable are often called ANCOVA or analysis of covariance models but really are just versions of our linear models we’ve been using throughout this material.↩︎

  129. Note that we employed some specific options in the legend option to get the legend to fit on this scatterplot better. Usually you can avoid this but the coords option defined a location and the columns option made it a two column legend. The viridis(4) code makes the plot in a suite of four colors from the viridis package (Garnier 2018).↩︎

  130. The strength of this recommendation drops when you have many predictors as you can’t do this for every variable, but the concern remains about an assumption of no interaction whenever you fit models without them. In more complex situations, think about variables that are most likely to interact in their impacts on the response based on the situation being studied and try to explore those.↩︎

  131. Standardizing quantitative predictor variables is popular in social sciences, often where the response variable is also standardized. In those situations, they generate what are called “standardized betas” (https://en.wikipedia.org/wiki/Standardized_coefficient) that estimate the change in SDs in the response for a 1 SD increase in the explanatory variable.↩︎

  132. There is a way to test for a difference in the two lines at a particular \(x\) value but it is beyond the scope of this material.↩︎

  133. This is an example of what is called “step down” testing for model refinement which is a commonly used technique for arriving at a final model to describe response variables. Note that each step in the process should be reported, not just the final model that only has variables with small p-values remaining in it.↩︎

  134. We could also use the anova function to do this but using Anova throughout this material provides the answers we want in the additive model and it has no impact for the only test of interest in the interaction model since the interaction is the last component in the model.↩︎

  135. In most situations, it would be crazy to assume that the true model for a process has been obtained so we can never pick the “correct” model. In fact, we won’t even know if we are picking a “good” model, but just the best from a set of the candidate models on a criterion. But we can study the general performance of methods using simulations where we know the true model and the AIC has some useful properties in identifying the correct model when it is in the candidate set of models. No such similar theory exists for the adjusted \(R^2\).↩︎

  136. Most people now call this Akaike’s (pronounced ah-kah-ee-kay) Information Criterion, but he used the AIC nomenclature to mean An Information Criterion – he was not so vain as to name the method after himself in the original paper that proposed it. But it is now common to use “A” for his last name.↩︎

  137. More details on these components of the methods will be left for more advanced material - we will focus on an introduction to using the AIC measure here.↩︎

  138. Although sometimes excluded, the count of parameters should include counting the residual variance as a parameter.↩︎
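
One way to see this in R, assuming m1 is a fitted lm model: the df attribute of logLik counts all estimated parameters, including the residual variance, and using it in -2*logLik + 2*k reproduces what AIC reports.

```r
ll <- logLik(m1)               # log-likelihood of the fitted model
k <- attr(ll, "df")            # parameter count, which includes the residual variance
-2 * as.numeric(ll) + 2 * k    # matches AIC(m1)
AIC(m1)
```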

  139. We put quotes on “full” or sometimes call it the “fullish” model because we could always add more to the model, like interactions or other explanatory variables. So we rarely have a completely full model but we do have our “most complicated that we are considering” model.↩︎

  140. The options in extra=... are to get extra information displayed that you do not necessarily need. You can simply run dredge(m6,rank="AIC") to get just the AIC results.↩︎
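
A sketch of the basic call, with m6 as the "full" model from this section; dredge also requires setting the na.action option so that models with missing values fail loudly instead of being fit on different subsets of the data:

```r
library(MuMIn)

options(na.action = "na.fail")   # required by dredge to guard against silent row-dropping from missing data
dredge(m6, rank = "AIC")         # all-subsets model comparison ranked by AIC
```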

  141. The researchers did not do this analysis so never directly addressed this research question although they did discuss it in general ways.↩︎

  142. Instructors often get asked what a problem with non-constant variance actually looks like – this is a perfect example of it!↩︎

  143. This was not even close to their top AIC model so they made an odd choice.↩︎

  144. I had students read this paper in a class and one decided that this was a reasonable way to report small p-values – it is WRONG. We are interested in how small a p-value might be and saying it is over a value is never useful, especially if you say it is larger than a tiny number.↩︎

  145. All too often, I read journal articles that have under-utilized, under-reported, mis-applied, or mis-interpreted statistical methods and results. One of the reasons that I wanted to write this book was to help more people move from basic statistical knowledge to correct use of intermediate statistical methods and beginning to see the potential in more advanced statistical methods. It took me many years of being a statistician (after getting a PhD) just to feel armed for battle when confronted with new applications, and two stat courses are not enough to get you there, but you have to start somewhere. You are only maybe two or three hundred hours into your 10,000 hours required for mastery. This book is intended to get you some solid fundamentals to build on or a few intermediate tools to use if this is your last statistics training experience.↩︎

  146. They also had an error in their AIC results that is difficult to explain here but was due to an un-careful usage of the results from the more advanced models that account for autocorrelation, which seems to provide the proper ranking of models (that they ignored) but did not provide the correct differences among models.↩︎
