---
title: "Conditional Probability"
author: "Your name here!"
date: "02/17/2025"
---

**Abstract:**

This technical blog post is available as **both** an HTML file *and* a [.qmd file](https://raw.githubusercontent.com/cd-public/D505/refs/heads/master/hws/src/cond.qmd) hosted on GitHub Pages.

# 0. Quarto Type-setting

- This document is rendered with Quarto and configured to embed images using the `embed-resources` option in the header.
- If you wish to use a similar header, here is the format specification for this document:

```yaml
format:
  html:
    embed-resources: true
```

# 1. Setup

**Set Up Code:**

```{r}
# Load packages quietly
sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
# Read the Pinot dataset directly from GitHub
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))
```

# 2. Conditional Probability

Calculate the probability that a Pinot comes from Burgundy given that it has the word 'fruit' in the description.

$$
P({\rm Burgundy}~|~{\rm Fruit})
$$

```{r}
# TODO
```

# 3. Naive Bayes Algorithm

We train a Naive Bayes algorithm to classify a wine's province using:

1. An 80-20 train-test split.
2. Three features engineered from the description.
3. 5-fold cross-validation.

We report Kappa after using the model to predict provinces in the holdout sample.

```{r}
# TODO
```

# 4. Frequency Differences

We find the three words that most distinguish New York Pinots from all other Pinots.

```{r}
# TODO
```

# 5. Extension

> Either do this as a bonus problem, or delete this section.

Calculate the variance of the logged word-frequency distributions for each province.
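
One way to make this quantity concrete is sketched below. It is an assumption rather than a definition given by the assignment: it uses the natural log, the sample variance with an $n - 1$ denominator, and raw word counts. The symbols are introduced here for illustration only: $V_p$ is the set of distinct words appearing in descriptions from province $p$, and $n_{p,w}$ is the number of times word $w$ appears in them.

$$
\bar{\ell}_p = \frac{1}{|V_p|}\sum_{w \in V_p} \log n_{p,w},
\qquad
s_p^2 = \frac{1}{|V_p| - 1}\sum_{w \in V_p} \left(\log n_{p,w} - \bar{\ell}_p\right)^2
$$

Under these assumptions, switching from counts to relative frequencies $n_{p,w} / N_p$, where $N_p$ is the total word count for province $p$, shifts every logged value by the same constant $-\log N_p$ and therefore leaves $s_p^2$ unchanged.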