In this section, the analysis of a survey of \(N=464\) randomly sampled
adults will be analyzed from a survey conducted by Lea, Webley, and Walker (1995)
and available in the debt
data set from the faraway
package
(Faraway 2016).
The subjects responded to a variety of questions
including whether they buy cigarettes (cigbuy
: 0 if
no, 1 if yes), their housing situation (house
: 1 = rent, 2 = mortgage,
and 3 = owned outright), their income group (incomegp
: 1 =
lowest, 5 = highest), and their score on a continuous scale of attitudes about
debt (prodebt
: 1 = least favorable, 6 = most favorable). The variable prodebt
was derived as the average of a series of questions about debt with each
question measured on an ordinal 1 to 6 scale, with higher values
corresponding to more positive responses about \(\underline{\text{going into debt}}\)
of various kinds. The ordered scale on surveys that try to elicit your opinions on
topics with scales from 1 to 5, 1 to 6, 1 to 7 or even, sometimes, 1 to 10 is called
a Likert scale (Likert 1932). It is not a quantitative scale and really
should be handled more carefully than taking an average of a set responses as was done here. That
said, it is extremely common practice in social science research to treat ordinal
responses as if they are quantitative and take the average of many of them to
create a more continuous response variable like the one we are using here. If
you continue your statistics explorations, you will see some better techniques
for analyzing ordinal responses. That said, the scale of the
response is relatively easy to understand as an amount of willingness to go
into debt on a scale from 1 to 6 with higher values corresponding to more
willingness to be in debt.
These data are typical of survey data where respondents were not required
to answer all questions and there are some
missing responses. We could clean out any individuals that failed to respond to
all questions (called “complete cases”) using the na.omit
function, which will return responses only for subjects
that responded to every question in
the data set, debt
. The change in sample size is available by running the dim
function on the two data sets – there were \(464\) observations (rows) initially along with \(13\) variables (columns) and once observations with any missing values were dropped there are \(N=304\) for us to analyze. Losing 35% of the observations is a pretty noticeable loss.
library(faraway)
data(debt)
library(tibble)
debt <- as_tibble(debt)
debt$incomegp <- factor(debt$incomegp)
debt$cigbuy <- factor(debt$cigbuy)
debtc <- na.omit(debt)
dim(debt)
## [1] 464 13
## [1] 304 13
If we just focus on the three variables we are using in this model (debtR
), the missingness is less dramatic, retaining \(N=388\) observations in debtRc
.
debtR <- debt[,c("incomegp", "cigbuy","prodebt")] # Select only variables of interest for model
debtRc <- na.omit(debtR)
dim(debtRc)
## [1] 388 3
The second approach seems better as it drops fewer observations so we will use that below. But suppose that people did not want to
provide their income levels if they were in the lowest or highest
income groups and that is why they are missing. Then we would be missing responses systematically and conclusions
could be biased because of ignoring particular types of subjects. We don’t have particular statistical tools to easily handle this problem but every researcher should worry about non-response when selected subjects do not respond at
all or fail to answer some questions. When the missing values are systematic in some fashion and not just missing randomly (missing randomly might be thought of as caused by “demonic intrusion” (Hurlbert 1984) that can’t be easily explained or related to the types of responses), then we worry about non-response bias that is systematically biasing our results because of the missing responses. This also ties back into our discussion of who was sampled. We need to
think carefully about who was part of the sample but refused to participate and
how that might impact our inferences. And whether we can even address the research question of interest based on what was measured given those that refused/failed to respond. For example, suppose we are studying river flows and are interested in the height of a river. Missingness in these responses could arise because a battery fails or the data logger “crashes” (not related to the responses and so not definitely problematic) or because of something about the measurements to be taken that causes the missingness (suppose the gage can only can measure up to three feet deep and the river is running at four feet deep during a flood). Those are two very different reasons to fail to observe the river height and the second one clearly leads to bias in estimating mean river height because of what can not be observed. In Chapter ??, we introduce the tableplot
as another tool to visualize data that can also show missing data patterns to help you think about these sorts of issues further.
Ignoring this potential for bias in the
results for the moment, we are first interested in whether buying
cigarettes/not and income groups interact in their explanation of the
respondent’s mean opinions on being in debt. The interaction plot
(Figure 2.63) may suggest an interaction between cigbuy
and incomegp
where the lines cross, switching which of the cigbuy
levels is higher (income levels 2, 3, and 5) or even almost not different (income levels 1 and 4). But it is
not as clear as the previous examples, especially with how large the SEs are relative the variation in the means. The interaction \(F\)-test helps us
objectively assess evidence against the null hypothesis of no interaction. Based on the plot, there do
not appear to be differences based on cigarette purchasing but there might be some
differences between the income groups if we drop the interaction from the model. If we drop the interaction, then
this suggests that we might be in Scenario 2 or 3 where a single main effect of
interest is present.
(ref:fig4-15) Interaction plot array of prodebt
by income group (1 to 5) and whether they buy cigarettes
(0=no, 1=yes).
As in other situations, and especially with observational studies where a single large sample is collected and then the levels of the factor variables are observed, it is important to check for balance – whether all the combinations of the two predictor variables are similarly represented. Even more critically, we need to check whether all the combinations of levels of factors are measured. If a combination is not measured, then we lose the ability to estimate the mean for that combination and the ability to test for an interaction. A solution to that problem would be to collapse the categories of one of the variables, changing the definitions of the levels but if you fail to obtain information for all combinations, you can’t work with the interaction model. In this situation, we barely have enough information to proceed (the smallest \(n_{jk}\) is 13 for income group 4 that buys cigarettes). We have a very unbalanced design with counts between 13 and 60 in the different combinations, so lose some resistance to violation of assumptions but can proceed to explore the model with a critical eye on how the diagnostic plots look.
## incomegp
## cigbuy 1 2 3 4 5
## 0 36 49 54 53 60
## 1 37 45 20 13 21
The test for the interaction is always how we start our modeling in Two-Way ANOVA situations. The ANOVA table suggests that there is little evidence against the null hypothesis of no interaction between the income level and buying cigarettes on the opinions of the respondents towards debt (\(F(4,378)=0.686\), p-value=0.6022). This suggests that the initial assessment that the interaction wasn’t too prominent was correct. We should move to the additive model here but first need to check the assumptions to make sure we can trust this initial test.
## Anova Table (Type II tests)
##
## Response: prodebt
## Sum Sq Df F value Pr(>F)
## incomegp 10.742 4 5.5246 0.0002482
## cigbuy 0.010 1 0.0201 0.8874246
## incomegp:cigbuy 1.333 4 0.6857 0.6022065
## Residuals 183.746 378
The diagnostic plots (Figure 2.64) seem to be pretty well-behaved with no apparent violations of the normality assumption and no clear evidence of a violation of the constant variance assumption. There is no indication of a problem with the independence assumption because there is no indication of structure to the measurements of the survey respondents that might create dependencies. In observational studies, violations of the independence assumption might come from repeated measures of the same person over time or multiple measurements within the same family/household or samples that are clustered geographically, none of which are part of the survey information we have. The random sampling from a population should allow inferences to a larger population except for that issue of removing partially missing responses so we can’t safely generalize results beyond the complete observations we are using without worry that the missing subjects are systematically different from those we are able to analyze. We also don’t have much information on the exact population sampled, so will just leave this vague here but know that there would be a population these conclusions apply since it was random sample (at least those that would answer the questions). All of this suggests proceeding to fitting and exploring the additive model is reasonable here. No causal inferences are possible because this is an observational study.
(ref:fig4-16) Diagnostic plot for prodebt
by income group and buy
cigarettes/not interaction model.
After ruling out the interaction of income and cigarette status on opinions about debt, we can focus on the additive model.
Hypotheses (Two sets apply when the additive model is the focus!):
\(H_0\): No difference in means for prodebt
for income groups in
population, given cigarette buying in model
\(\Leftrightarrow\) All \(\tau_j\text{'s} = 0\) in additive model.
\(H_A\): Some difference in means for prodebt
for income group in
population, given cigarette buying in model
\(\Leftrightarrow\) Not all \(\tau_j\text{'s} = 0\) in additive model.
\(H_0\): No difference in means for prodebt
for cigarette
buying/not in population, given income group in model
\(\Leftrightarrow\) All \(\gamma_k\text{'s} = 0\) in additive model.
\(H_A\): Some difference in means for prodebt
for cigarette
buying/not in population, given income group in model
\(\Leftrightarrow\) Not all \(\gamma_k\text{'s} = 0\) in additive model.
Validity conditions – discussed above but with new plots for the additive model:
Constant Variance:
In the Residuals vs Fitted and the Scale-Location plots in Figure 2.65, the differences in variability among groups is minor and nothing suggests a violation. If you change models, you should always revisit the diagnostic plots to make sure you didn’t create problems that were not present in more complicated models.
We can also explore the partial residuals here as provided in Figure 2.66. The variability in the partial residuals appears to be similar across the different levels of each predictor, controlled for the other variable, and so does suggest any issues that were missed by just looking at the overall residuals versus fitted values in our regular diagnostic plots. Note how hard it is to see differences in the mean for levels of cigbuy
in this plot relative to the variability in the partial residuals but that the differences in the means in incomegp
are at least somewhat obvious.
Normality of residuals:
Calculate the test statistics and p-values for the two main effect tests.
## Anova Table (Type II tests)
##
## Response: prodebt
## Sum Sq Df F value Pr(>F)
## incomegp 10.742 4 5.5428 0.0002399
## cigbuy 0.010 1 0.0201 0.8872394
## Residuals 185.079 382
Conclusions (including for the initial work with the interaction test):
So we learned that the additive model was more appropriate for these responses and that the results resemble Scenario 2 or 3 with only one main effect being important. In the additive model, the coefficients can be interpreted as shifts from the baseline after controlling for the other variable in the model.
Figure 2.67 shows the
increasing average
comfort with being in debt as the income groups go up except between groups 1 and 2 where 1 is a little higher than two. Being a cigarette buyer
was related to a lower comfort level with debt but is really no different from those that did not report buying cigarettes. It would be possible to consider follow-up tests akin to the Tukey’s HSD comparisons for the levels of incomegp
here but that is a bit beyond the scope of this course – focus on the estimated mean for the 5th income group being over 3.5 and none of the others over 3.2. That seems like an interesting although modest difference in mean responses across income groups after controlling for cigarette purchasing or not.
(ref:fig4-19) Term-plots for the prodebt
response additive model with left
panel for income group and the right panel for buying cigarettes or not
(0 for no, 1 for yes).
Scope of inference:
The estimated coefficients can also be interesting to interpret for the additive model. Here are the model summary coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.13127172 0.09027437 34.6861672 4.283917e-120
## incomegp2 -0.05371924 0.10860898 -0.4946114 6.211588e-01
## incomegp3 0.02680595 0.11624894 0.2305909 8.177561e-01
## incomegp4 0.09072124 0.12059542 0.7522777 4.523474e-01
## incomegp5 0.40760033 0.11392712 3.5777288 3.911633e-04
## cigbuy1 -0.01088742 0.07672982 -0.1418929 8.872394e-01
In the model, the baseline group is for non-cigarette buyers (cigbuy=0
)
and income group 1 with \(\hat{\alpha}= 3.131\) points. Regardless of the
cigbuy
level, the difference between income groups 2 and 1 is estimated
to be \(\hat{\tau}_2=-0.054\), an decrease in the mean score of 0.054 points.
The difference between income groups 3 and 1 is \(\hat{\tau}_3=0.027\)
points, regardless of cigarette smoking status. The estimated difference
between cigarette buyers and non-buyers was estimated as
\(\hat{\gamma}_2=-0.011\) points for any income group, remember that this
variable had a large p-value in this model. The additive model-based estimates for all
six combinations can be found in Table 2.7.
(ref:tab4-3) Calculations to construct the estimates for all combinations
of variables for the prodebt
additive model.
\(\color{red}{\text{Cig}}\) \(\color{red}{\text{Buy}}\) |
\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 1}}\) |
\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 2}}\) |
\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 3}}\) |
\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 4}}\) |
\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 5}}\) |
---|---|---|---|---|---|
\(\color{red}{\text{0:No}}\) | \(\hat{\alpha} ={3.131}\) | \(\hat{\alpha} + \hat{\tau}_2\) \(=3.131 - 0.016\) \(= 3.115\) |
\(\hat{\alpha} + \hat{\tau}_3\) \(=3.131 + 0.027\) \(= 3.158\) |
\(\hat{\alpha} + \hat{\tau}_4\) \(=3.131 + 0.091\) \(= 3.222\) |
\(\hat{\alpha} + \hat{\tau}_5\) \(=3.131 + 0.408\) \(= 3.539\) |
\(\color{red}{\text{1:}\text{Yes}}\) | \(\hat{\alpha}+\hat{\gamma}_2\) \(=3.131\) \(-0.011\) \(=3.12\) |
\(\hat{\alpha}+\hat{\tau}_2+\hat{\gamma}_2\) \(=3.131 - 0.016\) \(-0.011\) \(=3.104\) |
\(\hat{\alpha}+\hat{\tau}_3+\hat{\gamma}_2\) \(=3.131 + 0.027\) \(-0.011\) \(=3.147\) |
\(\hat{\alpha}+\hat{\tau}_4+\hat{\gamma}_2\) \(=3.131 + 0.091\) \(-0.011\) \(=3.211\) |
\(\hat{\alpha}+\hat{\tau}_5+\hat{\gamma}_2\) \(=3.131 + 0.408\) \(-0.011\) \(=3.528\) |
One final plot of the fitted values
from this additive model in Figure 2.68 hopefully crystallizes
the implications
of an additive model and reinforces that this model creates and assumes that
the differences across levels of one variable are the same regardless of the
level of the other variable and that this creates parallel lines. The
difference between cigbuy
levels across all income groups is a drop in
-0.011 points. The income groups
have the same differences regardless of cigarette buying or not, with income
group 5 much higher than the other four groups. The minor differences in cigarette purchasing and large p-value for it controlled for income group suggest that we could also refine the model further and drop the cigbuy
additive term and just focus on the income groups as a predictor – and this takes us right back to a One-Way ANOVA model so is not repeated here.
(ref:fig4-20) Illustration of the results from Table 2.6
showing the combined impacts
of the components of the additive model for prodebt
. Panel (a) uses income
groups on the x-axis and different lines for cigarette buyers (1) or not (0).
Panel (b) displays the different income groups as lines with the cigarette
buying status on the x-axis.
In general, we proceed through the following steps in any 2-WAY ANOVA situation:
Make a pirate-plot and an interaction plot.
Fit the interaction model; examine the test for the interaction.
Check the residual diagnostic plots for the interaction model (especially normality and equal variance).
If the interaction test has a small p-value, that is your main result. Focus on
the term-plot and the interaction plot from (1) to fully understand the results, adding Tukey’s
HSD results to intplot
to see which means of the combinations of levels are detected as
being different. Discuss the sizes of differences and the pattern of the estimated interaction.
If the interaction is not considered important, then re-fit the model without the interaction (additive model) and re-check the diagnostic plots. If the diagnostics are reasonable to proceed:
Focus on the results for each explanatory variable, using Type II tests especially if the design is not balanced. Possibly consider further model refinement to only retain one of the two variables (the one with the smaller p-value) if a p-value is large. Follow One-Way ANOVA recommendations from this point on.
Report the initial interaction test results and the results for the test for each variable from the model that is re-fit without the interaction.
Model coefficients in the additive model are interesting as they are shifts from baseline for each level of each variable, controlling for the other variable – interpret those differences if the number of levels is not too great.
Whether you end up favoring an additive or interaction model or do further model refinement, all steps of the hypothesis testing protocol should be engaged and a story based on the final results should be compiled, supported by the graphical displays such as the term-plots and interaction plots.
Faraway, Julian. 2016. Faraway: Functions and Datasets for Books by Julian Faraway. https://CRAN.R-project.org/package=faraway.
Hurlbert, Stuart H. 1984. “Pseudoreplication and the Design of Ecological Field Experiments.” Ecological Monographs 54 (2): 187–211. http://www.jstor.org/stable/1942661.
Lea, Stephen E. G., Paul Webley, and Catherine M. Walker. 1995. “Psychological Factors in Consumer Debt: Money Management, Economic Socialization, and Credit Use.” Journal of Economic Psychology 16 (4): 681–701.
Likert, Rensis. 1932. “A Technique for the Measurement of Attitudes.” Archives of Psychology 140: 1–55.