---
title: "Assignment 03"
subtitle: "EVIDENCE AND MODEL SELECTION"
author: 
output: 
  html_document:
    highlight: zenburn
    css: ['style/assignment-style.css', 'style/syntax.css']
---

The goal of this assignment is to build your understanding of using information criteria for model selection. In this assignment, you will use the data from the file *wine.csv* to examine several different predictors of wine rating (a measure of the wine's quality). The literature has suggested that price of wine is quite predictive of a wine's quality. You will be carrying out a replication study (using a different data set) of a study published by Snipes and Taylor (2014).

- [[CSV]](https://raw.githubusercontent.com/zief0002/epsy-8252/master/data/wine.csv)
- [[Data Codebook]](http://zief0002.github.io/epsy-8252/codebooks/wine.html)

```{r echo=FALSE, out.width="50%", fig.align='center'}
knitr::include_graphics("figs/assign-03.png")
```

## Instructions

Submit either an HTML file or, if you are not using R Markdown, a PDF file of your responses to the following questions. Please adhere to the following guidelines for further formatting your assignment:

- All graphics should be resized so that they do not take up more room than necessary and should have an appropriate caption.
- Any typed mathematics (equations, matrices, vectors, etc.) should be appropriately typeset within the document using Markdown's equation typesetting.
- All syntax should be hidden (i.e., not displayed) unless specifically asked for.
- Any messages or warnings produced (e.g., from loading packages) should also be hidden.

This assignment is worth 15 points.

## Preparation

Read the article [Model selection and Akaike Information Criteria: An example from wine ratings and prices](http://www.sciencedirect.com/science/article/pii/S2212977414000064).

Fit the same nine candidate models that Snipes and Taylor fitted in their analysis, using the *wine.csv* data. In these models, use wine rating (`rating`) as the outcome. The point is not to replicate their exact data but to use the same set of predictors, even though the predictors in our data set have different levels (e.g., our data includes more regions than Snipes and Taylor's data, and we will treat year as a continuous variable; do not categorize it). By using a different data set, we can more rigorously evaluate the underlying working hypotheses.

## Model Selection: Likelihood Framework of Evidence

1. Compute and report the likelihood for Model 1 given the residuals and set of model assumptions. Use `dnorm()` for this computation, and show your syntax for full credit.

2. Create a table of the log-likelihoods for the nine candidate models. (Use the `logLik()` function to compute these values.)

3. Compute and interpret the likelihood ratio for comparing the empirical support between Model 3 and Model 4.

4. Can we carry out a likelihood ratio test to evaluate whether the amount of empirical support when comparing Model 3 and Model 4 is more than we expect because of sampling error? If so, compute and report the results from the $\chi^2$-test. If not, explain why not.

5. Compute and interpret the likelihood ratio for comparing the empirical support between Model 3 and Model 6.

6. Can we carry out a likelihood ratio test to evaluate whether the amount of empirical support when comparing Model 3 and Model 6 is more than we expect because of sampling error? If so, compute and report the results from the $\chi^2$-test. If not, explain why not.

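
As a rough illustration of the kind of computation involved in the likelihood questions, the sketch below uses R's built-in `mtcars` data rather than *wine.csv*. The models `lm.a` and `lm.b` are hypothetical stand-ins, not any of the nine candidate models. The sketch assumes the residual standard deviation is estimated by maximum likelihood (the sum of squared residuals divided by $n$), which is the estimate `logLik()` uses; if your course notes use the residual standard error instead, the values will differ slightly.

```r
# Illustration only: built-in mtcars data, not the wine data
lm.a <- lm(mpg ~ 1 + wt, data = mtcars)        # simpler model
lm.b <- lm(mpg ~ 1 + wt + hp, data = mtcars)   # adds one predictor, so lm.a is nested in lm.b

# Residuals and the ML estimate of the residual SD (SSE divided by n)
e <- resid(lm.a)
sigma_ml <- sqrt(sum(e^2) / length(e))

# Likelihood = product of the normal densities of the residuals
lik_a <- prod(dnorm(e, mean = 0, sd = sigma_ml))

# The log-likelihood computed by hand should match logLik()
ll_a_manual <- sum(dnorm(e, mean = 0, sd = sigma_ml, log = TRUE))
ll_a <- as.numeric(logLik(lm.a))
ll_b <- as.numeric(logLik(lm.b))

# Likelihood ratio: empirical support for lm.b relative to lm.a
lr <- exp(ll_b - ll_a)

# Likelihood ratio test for this nested pair of models
chi_sq <- 2 * (ll_b - ll_a)
df <- attr(logLik(lm.b), "df") - attr(logLik(lm.a), "df")
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
```

A likelihood ratio greater than 1 indicates more empirical support for the model in the numerator (here, `lm.b`). With many observations the raw likelihood can underflow to zero, which is one reason to work on the log scale.
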
## Model Selection: Information Criteria

7. Create a table of model evidence that includes the following information for each of the nine candidate models. **(2 pts.)**

    - Model
    - Log-likelihood
    - K
    - AICc
    - $\Delta$AICc
    - Model Probability

Use this table of model evidence to answer Questions 8--14.

8. Use the AICc values to select the working hypothesis with the most empirical evidence.

9. Interpret the model probability/AICc weight for the working hypothesis with the most empirical evidence.

10. Compute and interpret the evidence ratio that compares the two working hypotheses with the most empirical evidence.

11. Based on previous literature, Snipes and Taylor hypothesized that price was an important predictor of wine quality. Based on your analyses, is price an important predictor of wine quality? Justify your response by referring to the model evidence. (Hint: Pay attention to which models include price and which do not.)

12. Does the empirical evidence support adopting more than one working hypothesis? Justify your response by referring to the model evidence.

13. Does the empirical evidence from the Snipes and Taylor analyses support adopting more than one candidate model? Justify your response by referring to the model evidence.

14. Based on your responses to the last two questions, which set of analyses (yours or Snipes and Taylor's) has more model selection uncertainty? Explain.

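
The columns of a model evidence table can be sketched in base R as follows. This is only an illustration on the built-in `mtcars` data with three hypothetical models, and all object names are assumptions. It uses the usual small-sample correction, $\mathrm{AICc} = -2\ln(\mathcal{L}) + 2K + \frac{2K(K+1)}{n-K-1}$, where $K$ counts every estimated parameter, including the residual variance.

```r
# Illustration only: three hypothetical candidate models on built-in mtcars data
models <- list(
  "Model A" = lm(mpg ~ 1 + wt, data = mtcars),
  "Model B" = lm(mpg ~ 1 + wt + hp, data = mtcars),
  "Model C" = lm(mpg ~ 1 + wt + hp + qsec, data = mtcars)
)

n <- nrow(mtcars)
log_lik <- sapply(models, function(m) as.numeric(logLik(m)))
k <- sapply(models, function(m) attr(logLik(m), "df"))  # coefficients + residual variance

# AICc, difference from the best model, and model probabilities (AICc weights)
aicc <- -2 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
delta <- aicc - min(aicc)
weight <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

evidence <- data.frame(
  model = names(models), log_lik, k, aicc, delta_aicc = delta, model_prob = weight,
  row.names = NULL
)
evidence <- evidence[order(evidence$aicc), ]

# Evidence ratio comparing the two best-supported models
evidence_ratio <- evidence$model_prob[1] / evidence$model_prob[2]
```

Sorting by AICc puts the best-supported model in the first row, so the evidence ratio is the first model probability divided by the second.
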