---
title: $K$NN
author: "Your name here!"
date: "02/10/2025"
---
**Abstract:**
This is a technical blog post published as **both** an HTML file *and* a [.qmd file](https://raw.githubusercontent.com/cd-public/D505/refs/heads/master/hws/src/knn.qmd) hosted on GitHub Pages.
## 0. Quarto Type-setting
- This document is rendered with Quarto and configured to embed images using the `embed-resources` option in the header.
- If you wish to use a similar header, here is the format specification for this document:
```yaml
format:
  html:
    embed-resources: true
```
## 1. Setup
```{r}
library(tidyverse)
library(caret)
wine <- readRDS(gzcon(url("https://github.com/cd-public/D505/raw/master/dat/pinot.rds")))
```
## 2. $K$NN Concepts
> TODO: *Explain how the choice of K affects the quality of your prediction when using a $K$ Nearest Neighbors algorithm.*
## 3. Feature Engineering
1. Create a version of the year column that is a *factor* (instead of numeric).
2. Create dummy variables that indicate the presence of "cherry", "chocolate" and "earth" in the description.
- Take care to handle upper and lower case characters.
3. Create 3 new features that represent the interaction between *time* and the cherry, chocolate and earth indicators.
4. Remove the description column from the data.
```{r}
# your code here
```
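One possible sketch of these four steps, assuming the `wine` dataframe keeps the `year` and `description` columns referenced above; the names of the new columns (`year_f`, `cherry_year`, and so on) are illustrative choices, not requirements.

```{r}
# A possible sketch of the feature engineering steps; new column names are illustrative.
wine <- wine %>%
  mutate(
    year_f    = factor(year),                                # year as a factor (numeric year kept for interactions)
    lower     = str_to_lower(description),                   # normalize case before matching
    cherry    = as.numeric(str_detect(lower, "cherry")),     # 0/1 indicator columns
    chocolate = as.numeric(str_detect(lower, "chocolate")),
    earth     = as.numeric(str_detect(lower, "earth")),
    cherry_year    = year * cherry,                          # interactions with time
    chocolate_year = year * chocolate,
    earth_year     = year * earth
  ) %>%
  select(-description, -lower)                               # drop the raw text
```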
## 4. Preprocessing
1. Preprocess the dataframe from the previous code block using BoxCox, centering, and scaling of the numeric features.
2. Create dummy variables for the `year` factor column.
```{r}
# your code here
```
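A minimal sketch of both steps with caret, assuming the engineered dataframe from the sketch above; `preProcess()` only transforms numeric columns, so character and factor columns pass through untouched.

```{r}
# A possible sketch: Box-Cox, center, and scale the numeric features,
# then expand the year factor into dummy columns.
pp      <- preProcess(wine, method = c("BoxCox", "center", "scale"))
wine_pp <- predict(pp, wine)

# dummyVars() builds an encoder for the year_f factor; predict() applies it.
dummies <- dummyVars(~ year_f, data = wine_pp)
wine_pp <- bind_cols(
  wine_pp %>% select(-year_f),
  as.data.frame(predict(dummies, newdata = wine_pp))
)
```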
## 5. Running $K$NN
1. Split the dataframe into an 80/20 training and test set.
2. Use caret to run a $K$NN model that uses our engineered features to predict province.
   - use 5-fold cross-validated subsampling
   - allow caret to try 15 different values for $K$
3. Display the confusion matrix on the test data.
```{r}
```
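A possible sketch of the full workflow, assuming the preprocessed `wine_pp` dataframe from the previous sketch; the seed value and object names are arbitrary.

```{r}
# A possible sketch: 80/20 split, 5-fold CV, 15 candidate values of K,
# then the confusion matrix on the held-out test set.
set.seed(505)                                                # arbitrary seed for reproducibility

wine_pp <- wine_pp %>% mutate(province = factor(province))   # ensure the outcome is a factor

split_index <- createDataPartition(wine_pp$province, p = 0.8, list = FALSE)
wine_train  <- wine_pp[ split_index, ]
wine_test   <- wine_pp[-split_index, ]

fit <- train(province ~ .,
             data       = wine_train,
             method     = "knn",
             trControl  = trainControl(method = "cv", number = 5),
             tuneLength = 15)

confusionMatrix(predict(fit, wine_test), wine_test$province)
```

The `fit` object also reports the resampled accuracy and Kappa for each candidate value of $K$, which is useful for the discussion in the next two sections.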
## 6. Kappa
How do we determine whether a Kappa value represents a good, a bad, or some other outcome?
> TODO: *Explain*
## 7. Improvement
How can we interpret the confusion matrix, and how can we improve our predictions?
> TODO: *Explain*