{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Overfitting II\n", "\n", "Last time, we saw a theoretical example of *overfitting*, in which a machine learning model fit the data it saw perfectly but performed extremely poorly on fresh, unseen data. In this lecture, we'll observe overfitting in a more practical context, using the Titanic data set again. We'll then begin to study *validation* techniques for finding models with \"just the right amount\" of flexibility." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | Survived | Pclass | Name | Sex | Age | Siblings/Spouses Aboard | Parents/Children Aboard | Fare
---|---|---|---|---|---|---|---|---
0 | 0 | 3 | Mr. Owen Harris Braund | male | 22.0 | 1 | 0 | 7.2500
1 | 1 | 1 | Mrs. John Bradley (Florence Briggs Thayer) Cum... | female | 38.0 | 1 | 0 | 71.2833
2 | 1 | 3 | Miss. Laina Heikkinen | female | 26.0 | 0 | 0 | 7.9250
3 | 1 | 1 | Mrs. Jacques Heath (Lily May Peel) Futrelle | female | 35.0 | 1 | 0 | 53.1000
4 | 0 | 3 | Mr. William Henry Allen | male | 35.0 | 0 | 0 | 8.0500
... | ... | ... | ... | ... | ... | ... | ... | ...
882 | 0 | 2 | Rev. Juozas Montvila | male | 27.0 | 0 | 0 | 13.0000
883 | 1 | 1 | Miss. Margaret Edith Graham | female | 19.0 | 0 | 0 | 30.0000
884 | 0 | 3 | Miss. Catherine Helen Johnston | female | 7.0 | 1 | 2 | 23.4500
885 | 1 | 1 | Mr. Karl Howell Behr | male | 26.0 | 0 | 0 | 30.0000
886 | 0 | 3 | Mr. Patrick Dooley | male | 32.0 | 0 | 0 | 7.7500

887 rows × 8 columns
\n", "