--- title: "Assignment 03" subtitle: "EVIDENCE AND MODEL SELECTION" format: html: css: "assets/styles.css" date: "`r format(Sys.Date(), '%B %d, %Y')`" ---
The goal of this assignment is to build your understanding of using information criteria for model selection. In this assignment, you will use the data from the file *wine.csv* to examine several different predictors of wine rating (a measure of the wine's quality). The literature has suggested that price of wine is quite predictive of a wine's quality. - [[CSV]](https://raw.githubusercontent.com/zief0002/bespectacled-antelope/main/data/wine.csv) - [[Data Codebook]](http://zief0002.github.io/bespectacled-antelope/codebooks/wine.html) ```{r} #| echo: false #| out-width: "50%" #| fig-align: "center" knitr::include_graphics("figs/assign-03.png") ``` ## Instructions Submit either your QMD and HTML file or, if you are not using Quarto, a PDF file of your responses to the following questions. Please adhere to the following guidelines for further formatting your assignment: - All graphics should be resized so that they do not take up more room than necessary and should have an appropriate caption. - Any typed mathematics (equations, matrices, vectors, etc.) should be appropriately typeset within the document using Markdown's equation typesetting. - All syntax should be hidden (i.e., not displayed) unless specifically asked for. - Any messages or warnings produced (e.g., from loading packages) should also be hidden. This assignment is worth 15 points.
## Preparation Read the article: Snipes, M., & Taylor, D. C. (2014). Model selection and Akaike Information Criteria: An example from wine ratings and prices. *Wine Economics and Policy, 3*(1), 3--9. https://doi.org/10.1016/j.wep.2014.03.001 You will be carrying out a form of a robustness study to evaluate the working hypotheses proposed in Snipes and Taylor (2014). In this study, we will fit the same candidate models that Snipes and Taylor fitted in their analysis, however we will be using a different data set (e.g., our data includes more regions than Snipes and Taylor’s data). We will also make one change in the analysis, and that is we will treat vintage as a continuous variable; we won’t categorize it like Snipes and Taylor did. By using a different set of data and making slightly different analytic decisions we can more vigorously evaluate the underlying working hypotheses. Fit the same nine candidate models that Snipes and Taylor fitted in their analysis, using the *wine.csv* data. In these models use wine rating (`rating`) as the outcome. Remember to treat `vintage` as a continuous variable in the models that include it. These models will be named Model 0, Model 1, ..., Model 8 to be consistent with the naming in the Snipes and Taylor paper. Use these fitted models to answer the questions in the assignment.
## Model Selection: Likelihood Framework of Evidence 1. Compute and report the likelihood for Model 0 given the residuals and set of model assumptions. Use `dnorm()` for this computation, and show your syntax for full credit. 2. Create a table of the log-likelihoods for the nine candidate models. (Use the `logLik()` function to compute these values.) 3. Compute and interpret the likelihood ratio for comparing the empirical support between Model 2 and Model 3. 4. Can we carry out a likelihood ratio test to evaluate whether the amount of empirical support when comparing Model 2 and Model 3 is more than we expect because of sampling error? If so, compute and report the results from the $\chi^2$-test. If not, explain why not. 5. Compute and interpret the likelihood ratio for comparing the empirical support between Model 2 and Model 5. 6. Can we carry out a likelihood ratio test to evaluate whether the amount of empirical support when comparing Model 2 and Model 5 is more than we expect because of sampling error? If so, compute and report the results from the $\chi^2$-test. If not, explain why not.
## Model Selection: Information Criteria 7. Create a table of model evidence that includes the following information for each of the nine candidate models. **(2pts.)** - Model - Log-likelihood - *K* - AICc - $\Delta$ AICc - Model Probability (AICc Weight) Use this table of model evidence to answer Questions 8--14. 8. Use the AICc values to select the working hypothesis with the most empirical evidence. 9. Interpret the model probability (i.e., AICc weight) for the working hypothesis with the most empirical evidence. 10. Compute and interpret the evidence ratio that compares the two working hypotheses with the most empirical evidence. 11. Based on previous literature, Snipes and Taylor hypothesized that price was an important predictor of wine quality. Based on your analyses, is price an important predictor of wine quality? Justify your response by referring to the model evidence. (Hint: Pay attention to which models include price and which do not.) 12. Does the empirical evidence support adopting more than one working hypothesis? Justify your response by referring to the model evidence. 13. Does the empirical evidence from the Snipes and Taylor analyses support adopting more than one candidate model? Justify your response by by referring to the model evidence. 14. Based on your responses to the last two questions, which set of analyses (yours or Snipes and Taylor) has more model selection uncertainty? Explain.
## What to Submit You need to submit a zipped version of your entire `assignment-01` project directory. When the TA unzips this and opens your R project file they will render your QMD document. This should produce the HTML file that will be graded. (You can include your HTML file as an extra attachment if you want, but the QMD document will need to render. If it doesn't render the TA will return it to you to try again.)