---
title: 'Unit 9 (Advanced Models: Decision Trees, Bagging Trees, and Random Forest)'
author: "Your name here"
date: "`r lubridate::today()`"
format:
  html:
    embed-resources: true
    toc: true
    toc-depth: 4
editor_options:
  chunk_output_type: console
---

## Assignment overview

To begin, download the following from the course web book (Unit 9):

* `hw_unit_09_random_forest.qmd` (notebook for this assignment)
* `fifa.csv` (training & test data for this assignment)

The data for this week's assignment are rankings of FIFA soccer players. While the entire database is available [online](https://www.fifaindex.com/), we subsetted the data down to a random sample of 2000 players to keep computational costs manageable. The outcome variable in this dataset, `overall`, is the player's overall quality rating out of 100. The file also includes 15 predictor variables that describe different attributes of each player (e.g., `age`, `nationality`, `dribbling`), as well as an ID variable listing the `name` of the player.

The goal of this assignment is to build decision tree and random forest regression models to predict the overall player quality of FIFA soccer players. If you get stuck, example sketches of one possible approach for each step are collected in an appendix at the end of this notebook.

Let's get started!

-----

## Setup

Be sure to initiate parallel processing and set up your path.

```{r}

```

## Prep data

Prepare the data to be used to fit a decision tree and random forest model to predict `overall` from all predictor variables.

* Make sure variable names are clean
* Make sure your code is reproducible
* Split your data into training and test sets
* Set up a recipe that works for both a decision tree and a random forest to predict `overall` from all predictor variables. Include only the minimum steps necessary to fit each model (i.e., use as few feature engineering steps as possible; you do not need to do any additional/advanced feature engineering)

```{r}

```

## Decision Tree

Fit a decision tree to predict `overall` from all predictor variables in your training data.

* Use the `rpart` engine
* Use `rmse` as your metric
* Tune `cost_complexity` and `min_n` using 100 bootstraps. Set `tree_depth` to 10 (in practice you would never fix this; it is just to lower the run time of your models)
* Provide visualization(s) (graph(s)) to support that you considered a wide enough range of hyperparameter values
* Print your best hyperparameter values and the held-out (out-of-bag) training RMSE

```{r}

```

## Random Forest

Now let's see what bagging and decorrelating can do for this model. Fit a random forest model to predict `overall` from all predictor variables in your training data.

* Use the `ranger` engine
* Use `rmse` as your metric
* Tune `mtry` and `min_n` using 100 bootstraps. Set `trees` to 100 (in practice you would never fix this; it is just to lower the run time of your models)
* Provide visualization(s) (graph(s)) to support that you considered a wide enough range of hyperparameter values
* Print your best hyperparameter values and the held-out (out-of-bag) training RMSE

```{r}

```

## Evaluate best model in test

* Select your best model based on bootstrapped performance in the training data
* Evaluate the performance of this best model in your held-out test set
* Print the RMSE of your best model in test
* Provide a visualization (graph) of your model's performance in test

```{r}

```

Great Work!
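
-----

## Appendix: example sketches

The chunks above are left blank for you to complete. The sketches below show one possible tidymodels approach for each step; they are not the required solution. Object names (e.g., `path_data`, `data_trn`, `rec`), helper packages beyond those named in the instructions (e.g., `here`, `janitor`, `doParallel`), and all seed and grid values are placeholder assumptions you should adapt.

A minimal setup sketch, assuming you parallelize with `doParallel` and that `fifa.csv` lives in a course data folder (the path is a placeholder):

```{r}
#| eval: false
# Core packages used throughout these sketches
library(tidyverse)
library(tidymodels)

# Initiate parallel processing, leaving one physical core free
library(doParallel)
cl <- makePSOCKcluster(parallel::detectCores(logical = FALSE) - 1)
registerDoParallel(cl)

# Placeholder path to where you saved fifa.csv; adjust for your machine
path_data <- "homework/unit_09"
```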
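
A sketch of the data prep, assuming a 75/25 split stratified on the outcome. The seed value is arbitrary (any fixed value makes the split reproducible), and `name` is removed because it is an ID variable, not a predictor:

```{r}
#| eval: false
# Read the data and clean variable names
data_all <- read_csv(here::here(path_data, "fifa.csv"),
                     show_col_types = FALSE) |>
  janitor::clean_names()

# A fixed seed makes the split (and all resampling below) reproducible
set.seed(123456)
splits <- initial_split(data_all, prop = 3/4, strata = "overall")
data_trn  <- training(splits)
data_test <- testing(splits)

# Minimal recipe: rpart and ranger both handle factors natively, so we only
# remove the ID variable and convert character predictors to factors
rec <- recipe(overall ~ ., data = data_trn) |>
  step_rm(name) |>
  step_string2factor(all_nominal_predictors())
```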
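
A sketch of the decision tree, assuming a regular grid over `cost_complexity` (tuned on its default log10 scale) and `min_n`. The grid bounds are assumptions; widen them if the best values land at an edge of the plot:

```{r}
#| eval: false
# 100 bootstraps of the training data for out-of-bag performance estimates
set.seed(123456)
splits_boot <- bootstraps(data_trn, times = 100)

# Candidate values; cost_complexity() ranges are on the log10 scale
grid_tree <- grid_regular(cost_complexity(range = c(-10, -1)),
                          min_n(range = c(2, 40)),
                          levels = 5)

fits_tree <- decision_tree(cost_complexity = tune(),
                           tree_depth = 10,   # fixed per the instructions
                           min_n = tune()) |>
  set_engine("rpart") |>
  set_mode("regression") |>
  tune_grid(preprocessor = rec,
            resamples = splits_boot,
            grid = grid_tree,
            metrics = metric_set(rmse))

# RMSE across the grid; flat performance at the edges suggests the
# range of candidate values was wide enough
autoplot(fits_tree)

# Best hyperparameter values and their bootstrapped (held-out) RMSE
show_best(fits_tree, metric = "rmse", n = 1)
```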
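
A sketch of the random forest, reusing the bootstrap splits above. The candidate values are assumptions; note that `mtry` cannot exceed the 15 predictors that remain after removing `name`:

```{r}
#| eval: false
# Irregular grid via tidyr::expand_grid(); mtry is capped at 15 predictors
grid_rf <- expand_grid(mtry = c(2, 5, 10, 15),
                       min_n = c(1, 2, 5, 10))

fits_rf <- rand_forest(trees = 100,   # fixed per the instructions
                       mtry = tune(),
                       min_n = tune()) |>
  set_engine("ranger", seed = 102030) |>   # engine-level seed for reproducibility
  set_mode("regression") |>
  tune_grid(preprocessor = rec,
            resamples = splits_boot,
            grid = grid_rf,
            metrics = metric_set(rmse))

autoplot(fits_rf)
show_best(fits_rf, metric = "rmse", n = 1)
```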
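
A sketch of the test evaluation, assuming the random forest had the lower bootstrapped RMSE (swap in `fits_tree` if your decision tree won). `last_fit()` refits the finalized workflow on the full training set and scores it once on the test set:

```{r}
#| eval: false
# Finalize a workflow with the winning hyperparameter values
best_config <- select_best(fits_rf, metric = "rmse")

wf_best <- workflow() |>
  add_recipe(rec) |>
  add_model(rand_forest(trees = 100, mtry = tune(), min_n = tune()) |>
              set_engine("ranger", seed = 102030) |>
              set_mode("regression")) |>
  finalize_workflow(best_config)

# Train on all of data_trn, evaluate once in data_test
fit_test <- last_fit(wf_best, split = splits, metrics = metric_set(rmse))
collect_metrics(fit_test)

# Predicted vs. observed ratings in the test set
collect_predictions(fit_test) |>
  ggplot(aes(x = overall, y = .pred)) +
  geom_point(alpha = 0.5) +
  geom_abline(linetype = "dashed") +   # perfect-prediction line
  labs(x = "Observed overall rating", y = "Predicted overall rating")
```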