{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sample grouping\n", "In this notebook we present the concept of **sample groups**. We use the\n", "handwritten digits dataset to highlight some surprising results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", "\n", "digits = load_digits()\n", "data, target = digits.data, digits.target" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a model consisting of a logistic regression classifier with a\n", "preprocessor to scale the data.\n", "\n", "
Note
\n", "Here we use a MinMaxScaler as we know that each pixel's gray-scale is\n", "strictly bounded between 0 (white) and 16 (black). This makes MinMaxScaler\n", "more suited in this case than StandardScaler, as some pixels consistently\n", "have low variance (pixels at the borders might almost always be zero if most\n", "digits are centered in the image). Then, using StandardScaler can result in\n", "a very high scaled value due to division by a small number.
\n", "