---
title: 'Unit 9 (Advanced Models: Decision Trees, Bagging Trees, and Random Forest)'
author: "Your name here"
date: "`r lubridate::today()`"
format:
  html:
    embed-resources: true
    toc: true
    toc-depth: 4
editor_options:
  chunk_output_type: console
---

## Assignment overview

To begin, download the following from the course web book (Unit 9):

* `hw_unit_09_random_forest.qmd` (notebook for this assignment)
* `fifa.csv` (training & test data for this assignment)

The data for this week's assignment are rankings of FIFA soccer players. While the entire database is available [online](https://www.fifaindex.com/), we subsetted the data down to a random sample of 2000 players to keep computational costs manageable. The outcome variable in this dataset, `overall`, is the player's overall quality rating out of 100. The file also includes 15 predictor variables that describe different attributes of each player (e.g., `age`, `nationality`, `dribbling`), as well as an ID variable listing the `name` of the player.

The goal of this assignment is to build decision tree and random forest regression models to predict the overall player quality of FIFA soccer players. If you get stuck, example sketches of one possible approach for each step are collected in an appendix at the end of this notebook.

Let's get started!

-----

## Setup

Be sure to initiate parallel processing and set up your path.

```{r}

```

## Prep data

Prepare the data to be used to fit a decision tree and random forest model to predict `overall` from all predictor variables.

* Make sure variable names are clean
* Make sure your code is reproducible
* Split your data into training and test sets
* Set up a recipe that works for both a decision tree and a random forest to predict `overall` from all predictor variables. Include only the minimum steps necessary to fit each model (i.e., use as few feature engineering steps as possible; you do not need to do any additional/advanced feature engineering)

```{r}

```

## Decision Tree

Fit a decision tree to predict `overall` from all predictor variables in your training data.

* Use the `rpart` engine
* Use `rmse` as your metric
* Tune `cost_complexity` and `min_n` using 100 bootstraps. Set `tree_depth` to 10 (in practice you would never fix this; it is just to lower the run time of your models)
* Provide visualization(s) (graph(s)) to support that you considered a wide enough range of hyperparameter values
* Print your best hyperparameter values and the held-out (out-of-bag) training RMSE

```{r}

```

## Random Forest

Now let's see what bagging and decorrelating can do for this model. Fit a random forest model to predict `overall` from all predictor variables in your training data.

* Use the `ranger` engine
* Use `rmse` as your metric
* Tune `mtry` and `min_n` using 100 bootstraps. Set `trees` to 100 (in practice you would never fix this; it is just to lower the run time of your models)
* Provide visualization(s) (graph(s)) to support that you considered a wide enough range of hyperparameter values
* Print your best hyperparameter values and the held-out (out-of-bag) training RMSE

```{r}

```

## Evaluate best model in test

* Select your best model based on bootstrapped performance in the training data
* Evaluate the performance of this best model in your held-out test set
* Print the RMSE of your best model in test
* Provide a visualization (graph) of your model's performance in test

```{r}

```

Great Work!
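
-----

## Appendix: example sketches

The chunks above are left blank for you to complete. The sketches below show one possible tidymodels approach for each step; they are not the required solution. Object names (e.g., `path_data`, `data_trn`, `rec`), helper packages beyond those named in the instructions (e.g., `here`, `janitor`, `doParallel`), and all seed and grid values are placeholder assumptions you should adapt.

A minimal setup sketch, assuming you parallelize with `doParallel` and that `fifa.csv` lives in a course data folder (the path is a placeholder):

```{r}
#| eval: false
# Core packages used throughout these sketches
library(tidyverse)
library(tidymodels)

# Initiate parallel processing, leaving one physical core free
library(doParallel)
cl <- makePSOCKcluster(parallel::detectCores(logical = FALSE) - 1)
registerDoParallel(cl)

# Placeholder path to where you saved fifa.csv; adjust for your machine
path_data <- "homework/unit_09"
```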
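
A sketch of the data prep, assuming a 75/25 split stratified on the outcome. The seed value is arbitrary (any fixed value makes the split reproducible), and `name` is removed because it is an ID variable, not a predictor:

```{r}
#| eval: false
# Read the data and clean variable names
data_all <- read_csv(here::here(path_data, "fifa.csv"),
                     show_col_types = FALSE) |>
  janitor::clean_names()

# A fixed seed makes the split (and all resampling below) reproducible
set.seed(123456)
splits <- initial_split(data_all, prop = 3/4, strata = "overall")
data_trn  <- training(splits)
data_test <- testing(splits)

# Minimal recipe: rpart and ranger both handle factors natively, so we only
# remove the ID variable and convert character predictors to factors
rec <- recipe(overall ~ ., data = data_trn) |>
  step_rm(name) |>
  step_string2factor(all_nominal_predictors())
```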
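
A sketch of the decision tree, assuming a regular grid over `cost_complexity` (tuned on its default log10 scale) and `min_n`. The grid bounds are assumptions; widen them if the best values land at an edge of the plot:

```{r}
#| eval: false
# 100 bootstraps of the training data for out-of-bag performance estimates
set.seed(123456)
splits_boot <- bootstraps(data_trn, times = 100)

# Candidate values; cost_complexity() ranges are on the log10 scale
grid_tree <- grid_regular(cost_complexity(range = c(-10, -1)),
                          min_n(range = c(2, 40)),
                          levels = 5)

fits_tree <- decision_tree(cost_complexity = tune(),
                           tree_depth = 10,   # fixed per the instructions
                           min_n = tune()) |>
  set_engine("rpart") |>
  set_mode("regression") |>
  tune_grid(preprocessor = rec,
            resamples = splits_boot,
            grid = grid_tree,
            metrics = metric_set(rmse))

# RMSE across the grid; flat performance at the edges suggests the
# range of candidate values was wide enough
autoplot(fits_tree)

# Best hyperparameter values and their bootstrapped (held-out) RMSE
show_best(fits_tree, metric = "rmse", n = 1)
```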
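
A sketch of the random forest, reusing the bootstrap splits above. The candidate values are assumptions; note that `mtry` cannot exceed the 15 predictors that remain after removing `name`:

```{r}
#| eval: false
# Irregular grid via tidyr::expand_grid(); mtry is capped at 15 predictors
grid_rf <- expand_grid(mtry = c(2, 5, 10, 15),
                       min_n = c(1, 2, 5, 10))

fits_rf <- rand_forest(trees = 100,   # fixed per the instructions
                       mtry = tune(),
                       min_n = tune()) |>
  set_engine("ranger", seed = 102030) |>   # engine-level seed for reproducibility
  set_mode("regression") |>
  tune_grid(preprocessor = rec,
            resamples = splits_boot,
            grid = grid_rf,
            metrics = metric_set(rmse))

autoplot(fits_rf)
show_best(fits_rf, metric = "rmse", n = 1)
```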
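
A sketch of the test evaluation, assuming the random forest had the lower bootstrapped RMSE (swap in `fits_tree` if your decision tree won). `last_fit()` refits the finalized workflow on the full training set and scores it once on the test set:

```{r}
#| eval: false
# Finalize a workflow with the winning hyperparameter values
best_config <- select_best(fits_rf, metric = "rmse")

wf_best <- workflow() |>
  add_recipe(rec) |>
  add_model(rand_forest(trees = 100, mtry = tune(), min_n = tune()) |>
              set_engine("ranger", seed = 102030) |>
              set_mode("regression")) |>
  finalize_workflow(best_config)

# Train on all of data_trn, evaluate once in data_test
fit_test <- last_fit(wf_best, split = splits, metrics = metric_set(rmse))
collect_metrics(fit_test)

# Predicted vs. observed ratings in the test set
collect_predictions(fit_test) |>
  ggplot(aes(x = overall, y = .pred)) +
  geom_point(alpha = 0.5) +
  geom_abline(linetype = "dashed") +   # perfect-prediction line
  labs(x = "Observed overall rating", y = "Predicted overall rating")
```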