{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Validation Basics\n", "> This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets to building an understanding of the bias-variance tradeoff, we lay the foundation for the K-Fold and Leave-One-Out validation techniques practiced in chapter three. This is the summary of the lecture \"Model Validation in Python\", via DataCamp.\n", "\n", "- toc: true\n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Machine_Learning]\n", "- image: images/train_test_score.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (8, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating train, test, and validation datasets\n", "- Traditional train/test split\n", "  - Seen data (used for training)\n", "  - Unseen data (unavailable for training)\n", "![holdout](image/holdout.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create one holdout set\n", "Your boss has asked you to create a simple random forest model on the `tic_tac_toe` dataset. She doesn't want you to spend much time selecting parameters; rather, she wants to know how well the model will perform on future data. For future Tic-Tac-Toe games, it would be nice to know whether your model can predict which player will win." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Top-Left | \n", "Top-Middle | \n", "Top-Right | \n", "Middle-Left | \n", "Middle-Middle | \n", "Middle-Right | \n", "Bottom-Left | \n", "Bottom-Middle | \n", "Bottom-Right | \n", "Class | \n", "
---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "x | \n", "x | \n", "x | \n", "x | \n", "o | \n", "o | \n", "x | \n", "o | \n", "o | \n", "positive | \n", "
1 | \n", "x | \n", "x | \n", "x | \n", "x | \n", "o | \n", "o | \n", "o | \n", "x | \n", "o | \n", "positive | \n", "
2 | \n", "x | \n", "x | \n", "x | \n", "x | \n", "o | \n", "o | \n", "o | \n", "o | \n", "x | \n", "positive | \n", "
3 | \n", "x | \n", "x | \n", "x | \n", "x | \n", "o | \n", "o | \n", "o | \n", "b | \n", "b | \n", "positive | \n", "
4 | \n", "x | \n", "x | \n", "x | \n", "x | \n", "o | \n", "o | \n", "b | \n", "o | \n", "b | \n", "positive | \n", "