---
title: "Conditional Probability"
author: "Your name here!"
date: "02/17/2025"

---

**Abstract:**

This is a technical blog post of **both** an HTML file *and* [.qmd file](https://raw.githubusercontent.com/cd-public/D505/refs/heads/master/hws/src/cond.qmd) hosted on GitHub pages.

# 0. Quarto Type-setting

- This document is rendered with Quarto, and configured to embed an images using the `embed-resources` option in the header.
- If you wish to use a similar header, here's is the format specification for this document:

```email
format: 
  html:
    embed-resources: true
```

# 1. Setup

**Step Up Code:**

```{r}
sh <- suppressPackageStartupMessages
sh(library(tidyverse))
sh(library(caret))
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))
```

# 2. Conditional Probability

Calculate the probability that a Pinot comes from Burgundy given it has the word 'fruit' in the description.

$$
P({\rm Burgundy}~|~{\rm Fruit})
$$

```{r}
# TODO
```

# 3. Naive Bayes Algorithm

We train a naive bayes algorithm to classify a wine's province using:
1. An 80-20 train-test split.
2. Three features engineered from the description
3. 5-fold cross validation.

We report Kappa after using the model to predict provinces in the holdout sample.

```{r}
# TODO
```


# 4. Frequency Differences

We find the three words that most distinguish New York Pinots from all other Pinots.

```{r}
# TODO
```

# 5. Extension

> Either do this as a bonus problem, or delete this section.

Calculate the variance of the logged word-frequency distributions for each province.