4.5 Observational study example: The Psychology of Debt

In this section, the analysis of a survey of \(N=464\) randomly sampled adults will be analyzed from a survey conducted by Lea, Webley, and Walker (1995) and available in the debt data set from the faraway package (Faraway 2016). The subjects responded to a variety of questions including whether they buy cigarettes (cigbuy: 0 if no, 1 if yes), their housing situation (house: 1 = rent, 2 = mortgage, and 3 = owned outright), their income group (incomegp: 1 = lowest, 5 = highest), and their score on a continuous scale of attitudes about debt (prodebt: 1 = least favorable, 6 = most favorable). The variable prodebt was derived as the average of a series of questions about debt with each question measured on an ordinal 1 to 6 scale, with higher values corresponding to more positive responses about \(\underline{\text{going into debt}}\) of various kinds. The ordered scale on surveys that try to elicit your opinions on topics with scales from 1 to 5, 1 to 6, 1 to 7 or even, sometimes, 1 to 10 is called a Likert scale (Likert 1932). It is not a quantitative scale and really should be handled more carefully than taking an average of a set responses as was done here. That said, it is extremely common practice in social science research to treat ordinal responses as if they are quantitative and take the average of many of them to create a more continuous response variable like the one we are using here. If you continue your statistics explorations, you will see some better techniques for analyzing ordinal responses. That said, the scale of the response is relatively easy to understand as an amount of willingness to go into debt on a scale from 1 to 6 with higher values corresponding to more willingness to be in debt.

These data are typical of survey data where respondents were not required to answer all questions and there are some missing responses. We could clean out any individuals that failed to respond to all questions (called “complete cases”) using the na.omit function, which will return responses only for subjects that responded to every question in the data set, debt. The change in sample size is available by running the dim function on the two data sets – there were \(464\) observations (rows) initially along with \(13\) variables (columns) and once observations with any missing values were dropped there are \(N=304\) for us to analyze. Losing 35% of the observations is a pretty noticeable loss.

library(faraway)
data(debt)
library(tibble)
debt <- as_tibble(debt)
debt$incomegp <- factor(debt$incomegp)
debt$cigbuy <- factor(debt$cigbuy)
debtc <- na.omit(debt)
dim(debt)

## [1] 464  13

dim(debtc)

## [1] 304  13

If we just focus on the three variables we are using in this model (debtR), the missingness is less dramatic, retaining \(N=388\) observations in debtRc.

debtR <- debt[,c("incomegp", "cigbuy","prodebt")] # Select only variables of interest for model
debtRc <- na.omit(debtR)
dim(debtRc)

## [1] 388   3

The second approach seems better as it drops fewer observations so we will use that below. But suppose that people did not want to provide their income levels if they were in the lowest or highest income groups and that is why they are missing. Then we would be missing responses systematically and conclusions could be biased because of ignoring particular types of subjects. We don’t have particular statistical tools to easily handle this problem but every researcher should worry about non-response when selected subjects do not respond at all or fail to answer some questions. When the missing values are systematic in some fashion and not just missing randomly (missing randomly might be thought of as caused by “demonic intrusion” (Hurlbert 1984) that can’t be easily explained or related to the types of responses), then we worry about non-response bias that is systematically biasing our results because of the missing responses. This also ties back into our discussion of who was sampled. We need to think carefully about who was part of the sample but refused to participate and how that might impact our inferences. And whether we can even address the research question of interest based on what was measured given those that refused/failed to respond. For example, suppose we are studying river flows and are interested in the height of a river. Missingness in these responses could arise because a battery fails or the data logger “crashes” (not related to the responses and so not definitely problematic) or because of something about the measurements to be taken that causes the missingness (suppose the gage can only can measure up to three feet deep and the river is running at four feet deep during a flood). Those are two very different reasons to fail to observe the river height and the second one clearly leads to bias in estimating mean river height because of what can not be observed. In Chapter ??, we introduce the tableplot as another tool to visualize data that can also show missing data patterns to help you think about these sorts of issues further.

Ignoring this potential for bias in the results for the moment, we are first interested in whether buying cigarettes/not and income groups interact in their explanation of the respondent’s mean opinions on being in debt. The interaction plot (Figure 2.63) may suggest an interaction between cigbuy and incomegp where the lines cross, switching which of the cigbuy levels is higher (income levels 2, 3, and 5) or even almost not different (income levels 1 and 4). But it is not as clear as the previous examples, especially with how large the SEs are relative the variation in the means. The interaction \(F\)-test helps us objectively assess evidence against the null hypothesis of no interaction. Based on the plot, there do not appear to be differences based on cigarette purchasing but there might be some differences between the income groups if we drop the interaction from the model. If we drop the interaction, then this suggests that we might be in Scenario 2 or 3 where a single main effect of interest is present.

(ref:fig4-15) Interaction plot array of prodebt by income group (1 to 5) and whether they buy cigarettes (0=no, 1=yes).

intplotarray(prodebt~cigbuy*incomegp, data=debtRc, col=c(1,3,4,5,6), lwd=2)

Figure 2.63: (ref:fig4-15)

As in other situations, and especially with observational studies where a single large sample is collected and then the levels of the factor variables are observed, it is important to check for balance – whether all the combinations of the two predictor variables are similarly represented. Even more critically, we need to check whether all the combinations of levels of factors are measured. If a combination is not measured, then we lose the ability to estimate the mean for that combination and the ability to test for an interaction. A solution to that problem would be to collapse the categories of one of the variables, changing the definitions of the levels but if you fail to obtain information for all combinations, you can’t work with the interaction model. In this situation, we barely have enough information to proceed (the smallest \(n_{jk}\) is 13 for income group 4 that buys cigarettes). We have a very unbalanced design with counts between 13 and 60 in the different combinations, so lose some resistance to violation of assumptions but can proceed to explore the model with a critical eye on how the diagnostic plots look.

tally(cigbuy~incomegp, data=debtRc)

##       incomegp
## cigbuy  1  2  3  4  5
##      0 36 49 54 53 60
##      1 37 45 20 13 21

The test for the interaction is always how we start our modeling in Two-Way ANOVA situations. The ANOVA table suggests that there is little evidence against the null hypothesis of no interaction between the income level and buying cigarettes on the opinions of the respondents towards debt (\(F(4,378)=0.686\), p-value=0.6022). This suggests that the initial assessment that the interaction wasn’t too prominent was correct. We should move to the additive model here but first need to check the assumptions to make sure we can trust this initial test.

library(car)
debt1 <- lm(prodebt~incomegp*cigbuy, data=debtRc)
Anova(debt1)

## Anova Table (Type II tests)
## 
## Response: prodebt
##                  Sum Sq  Df F value    Pr(>F)
## incomegp         10.742   4  5.5246 0.0002482
## cigbuy            0.010   1  0.0201 0.8874246
## incomegp:cigbuy   1.333   4  0.6857 0.6022065
## Residuals       183.746 378

The diagnostic plots (Figure 2.64) seem to be pretty well-behaved with no apparent violations of the normality assumption and no clear evidence of a violation of the constant variance assumption. There is no indication of a problem with the independence assumption because there is no indication of structure to the measurements of the survey respondents that might create dependencies. In observational studies, violations of the independence assumption might come from repeated measures of the same person over time or multiple measurements within the same family/household or samples that are clustered geographically, none of which are part of the survey information we have. The random sampling from a population should allow inferences to a larger population except for that issue of removing partially missing responses so we can’t safely generalize results beyond the complete observations we are using without worry that the missing subjects are systematically different from those we are able to analyze. We also don’t have much information on the exact population sampled, so will just leave this vague here but know that there would be a population these conclusions apply since it was random sample (at least those that would answer the questions). All of this suggests proceeding to fitting and exploring the additive model is reasonable here. No causal inferences are possible because this is an observational study.

(ref:fig4-16) Diagnostic plot for prodebt by income group and buy cigarettes/not interaction model.

par(mfrow=c(2,2))
plot(debt1)

Figure 2.64: (ref:fig4-16)

After ruling out the interaction of income and cigarette status on opinions about debt, we can focus on the additive model.
Hypotheses (Two sets apply when the additive model is the focus!):
- \(H_0\): No difference in means for prodebt for income groups in population, given cigarette buying in model
  
  \(\Leftrightarrow\) All \(\tau_j\text{'s} = 0\) in additive model.
- \(H_A\): Some difference in means for prodebt for income group in population, given cigarette buying in model
  
  \(\Leftrightarrow\) Not all \(\tau_j\text{'s} = 0\) in additive model.
- \(H_0\): No difference in means for prodebt for cigarette buying/not in population, given income group in model
  
  \(\Leftrightarrow\) All \(\gamma_k\text{'s} = 0\) in additive model.
- \(H_A\): Some difference in means for prodebt for cigarette buying/not in population, given income group in model
  
  \(\Leftrightarrow\) Not all \(\gamma_k\text{'s} = 0\) in additive model.
Validity conditions – discussed above but with new plots for the additive model:
```
debt1r <- lm(prodebt~incomegp+cigbuy, data=debtRc)
par(mfrow=c(2,2))
plot(debt1r)
```
Figure 2.65: Diagnostic plot of additive model for prodebt by income group and whether they buy cigarettes/not
- Constant Variance:
  - In the Residuals vs Fitted and the Scale-Location plots in Figure 2.65, the differences in variability among groups is minor and nothing suggests a violation. If you change models, you should always revisit the diagnostic plots to make sure you didn’t create problems that were not present in more complicated models.
  - We can also explore the partial residuals here as provided in Figure 2.66. The variability in the partial residuals appears to be similar across the different levels of each predictor, controlled for the other variable, and so does suggest any issues that were missed by just looking at the overall residuals versus fitted values in our regular diagnostic plots. Note how hard it is to see differences in the mean for levels of cigbuy in this plot relative to the variability in the partial residuals but that the differences in the means in incomegp are at least somewhat obvious.
```
plot(allEffects(debt1r, residuals=T))
```
  Figure 2.66: Term-plot for additive model for prodebt by income group and whether they buy cigarettes/not with partial residuals
- Normality of residuals:
  - The QQ-Plot in Figure 2.65 does not suggest a problem with this assumption.

Calculate the test statistics and p-values for the two main effect tests.

Anova(debt1r)

## Anova Table (Type II tests)
## 
## Response: prodebt
##            Sum Sq  Df F value    Pr(>F)
## incomegp   10.742   4  5.5428 0.0002399
## cigbuy      0.010   1  0.0201 0.8872394
## Residuals 185.079 382

The test statistics are \(F(4,382)=5.54\) and \(F(1,382)=0.0201\) with p-values of 0.00024 and 0.887.

Conclusions (including for the initial work with the interaction test):
- There was initially little to no evidence against the null hypothesis of no interaction between income group and cigarette buying on pro-debt feelings (\(F(4,378)=0.686\), p-value=\(0.6022\)) so the interaction was dropped from the model. There is strong evidence against the null hypothesis of no difference in the mean pro-debt feelings in the population across the income groups, after adjusting for cigarette buying (\(F(4,382)=5.54\), p-value=\(0.00024\)), so we would conclude that there is some difference in them. There is little evidence against the null hypothesis of no difference in the mean pro-debt feelings in the population based on cigarette buying/not, after adjusting for income group (\(F(1,382)=0.0201\), p-value=\(0.887\)).

So we learned that the additive model was more appropriate for these responses and that the results resemble Scenario 2 or 3 with only one main effect being important. In the additive model, the coefficients can be interpreted as shifts from the baseline after controlling for the other variable in the model.

Size:

Figure 2.67 shows the increasing average comfort with being in debt as the income groups go up except between groups 1 and 2 where 1 is a little higher than two. Being a cigarette buyer was related to a lower comfort level with debt but is really no different from those that did not report buying cigarettes. It would be possible to consider follow-up tests akin to the Tukey’s HSD comparisons for the levels of incomegp here but that is a bit beyond the scope of this course – focus on the estimated mean for the 5^th income group being over 3.5 and none of the others over 3.2. That seems like an interesting although modest difference in mean responses across income groups after controlling for cigarette purchasing or not.

(ref:fig4-19) Term-plots for the prodebt response additive model with left panel for income group and the right panel for buying cigarettes or not (0 for no, 1 for yes).

plot(allEffects(debt1r))

Figure 2.67: (ref:fig4-19)

Scope of inference:
- Because the income group and cigarette purchasing were not (and really could not) be randomly assigned, causal inference is not possible here. The data set came from a random sample but from an unspecified population and then there were missing observations. At best we can make inferences to those in that population that would answer these questions and it would be nice to know more about the population to really understand who this actually applies to. There would certainly be concerns about non-response bias in doing inference to the entire population that these data were sampled from.

The estimated coefficients can also be interesting to interpret for the additive model. Here are the model summary coefficients:

summary(debt1r)$coefficients

##                Estimate Std. Error    t value      Pr(>|t|)
## (Intercept)  3.13127172 0.09027437 34.6861672 4.283917e-120
## incomegp2   -0.05371924 0.10860898 -0.4946114  6.211588e-01
## incomegp3    0.02680595 0.11624894  0.2305909  8.177561e-01
## incomegp4    0.09072124 0.12059542  0.7522777  4.523474e-01
## incomegp5    0.40760033 0.11392712  3.5777288  3.911633e-04
## cigbuy1     -0.01088742 0.07672982 -0.1418929  8.872394e-01

In the model, the baseline group is for non-cigarette buyers (cigbuy=0) and income group 1 with \(\hat{\alpha}= 3.131\) points. Regardless of the cigbuy level, the difference between income groups 2 and 1 is estimated to be \(\hat{\tau}_2=-0.054\), an decrease in the mean score of 0.054 points. The difference between income groups 3 and 1 is \(\hat{\tau}_3=0.027\) points, regardless of cigarette smoking status. The estimated difference between cigarette buyers and non-buyers was estimated as \(\hat{\gamma}_2=-0.011\) points for any income group, remember that this variable had a large p-value in this model. The additive model-based estimates for all six combinations can be found in Table 2.7.

(ref:tab4-3) Calculations to construct the estimates for all combinations of variables for the prodebt additive model.

Table 2.7: (ref:tab4-3)
\(\color{red}{\text{Cig}}\) \(\color{red}{\text{Buy}}\)	\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 1}}\)	\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 2}}\)	\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 3}}\)	\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 4}}\)	\(\color{blue}{\textbf{Income}}\) \(\color{blue}{\textbf{Group 5}}\)
\(\color{red}{\text{0:No}}\)	\(\hat{\alpha} ={3.131}\)	\(\hat{\alpha} + \hat{\tau}_2\) \(=3.131 - 0.016\) \(= 3.115\)	\(\hat{\alpha} + \hat{\tau}_3\) \(=3.131 + 0.027\) \(= 3.158\)	\(\hat{\alpha} + \hat{\tau}_4\) \(=3.131 + 0.091\) \(= 3.222\)	\(\hat{\alpha} + \hat{\tau}_5\) \(=3.131 + 0.408\) \(= 3.539\)
\(\color{red}{\text{1:}\text{Yes}}\)	\(\hat{\alpha}+\hat{\gamma}_2\) \(=3.131\) \(-0.011\) \(=3.12\)	\(\hat{\alpha}+\hat{\tau}_2+\hat{\gamma}_2\) \(=3.131 - 0.016\) \(-0.011\) \(=3.104\)	\(\hat{\alpha}+\hat{\tau}_3+\hat{\gamma}_2\) \(=3.131 + 0.027\) \(-0.011\) \(=3.147\)	\(\hat{\alpha}+\hat{\tau}_4+\hat{\gamma}_2\) \(=3.131 + 0.091\) \(-0.011\) \(=3.211\)	\(\hat{\alpha}+\hat{\tau}_5+\hat{\gamma}_2\) \(=3.131 + 0.408\) \(-0.011\) \(=3.528\)

One final plot of the fitted values from this additive model in Figure 2.68 hopefully crystallizes the implications of an additive model and reinforces that this model creates and assumes that the differences across levels of one variable are the same regardless of the level of the other variable and that this creates parallel lines. The difference between cigbuy levels across all income groups is a drop in -0.011 points. The income groups have the same differences regardless of cigarette buying or not, with income group 5 much higher than the other four groups. The minor differences in cigarette purchasing and large p-value for it controlled for income group suggest that we could also refine the model further and drop the cigbuy additive term and just focus on the income groups as a predictor – and this takes us right back to a One-Way ANOVA model so is not repeated here.

(ref:fig4-20) Illustration of the results from Table 2.6 showing the combined impacts of the components of the additive model for prodebt. Panel (a) uses income groups on the x-axis and different lines for cigarette buyers (1) or not (0). Panel (b) displays the different income groups as lines with the cigarette buying status on the x-axis.

Figure 2.68: (ref:fig4-20)

In general, we proceed through the following steps in any 2-WAY ANOVA situation:

Make a pirate-plot and an interaction plot.
Fit the interaction model; examine the test for the interaction.
Check the residual diagnostic plots for the interaction model (especially normality and equal variance).
- If there is a problem with normality or equal variance, consider a “transformation” of the response as discussed in Chapter ??. This can help make the responses have similar variances or responses (and the model residuals) to be more normal, but sometimes not both.
If the interaction test has a small p-value, that is your main result. Focus on the term-plot and the interaction plot from (1) to fully understand the results, adding Tukey’s HSD results to intplot to see which means of the combinations of levels are detected as being different. Discuss the sizes of differences and the pattern of the estimated interaction.
If the interaction is not considered important, then re-fit the model without the interaction (additive model) and re-check the diagnostic plots. If the diagnostics are reasonable to proceed:
- Focus on the results for each explanatory variable, using Type II tests especially if the design is not balanced. Possibly consider further model refinement to only retain one of the two variables (the one with the smaller p-value) if a p-value is large. Follow One-Way ANOVA recommendations from this point on.
- Report the initial interaction test results and the results for the test for each variable from the model that is re-fit without the interaction.
- Model coefficients in the additive model are interesting as they are shifts from baseline for each level of each variable, controlling for the other variable – interpret those differences if the number of levels is not too great.

Whether you end up favoring an additive or interaction model or do further model refinement, all steps of the hypothesis testing protocol should be engaged and a story based on the final results should be compiled, supported by the graphical displays such as the term-plots and interaction plots.

References

Faraway, Julian. 2016. Faraway: Functions and Datasets for Books by Julian Faraway. https://CRAN.R-project.org/package=faraway.

Hurlbert, Stuart H. 1984. “Pseudoreplication and the Design of Ecological Field Experiments.” Ecological Monographs 54 (2): 187–211. http://www.jstor.org/stable/1942661.

Lea, Stephen E. G., Paul Webley, and Catherine M. Walker. 1995. “Psychological Factors in Consumer Debt: Money Management, Economic Socialization, and Credit Use.” Journal of Economic Psychology 16 (4): 681–701.

Likert, Rensis. 1932. “A Technique for the Measurement of Attitudes.” Archives of Psychology 140: 1–55.