{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning overview - assignment 2" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Digit Recognizer\n", "\n", "Although this is a computer vision problem, we employ here a simple model using **K-Nearest Neighbors** algorithm in this notebook to be a good starting point. We use the **GridSearchCV** to fine tune the hyperparameters such as *\"n_neighbors\", and \"weights\"*. Furthermore, we use **Data Augmentation** or **Artificial Data Synthesis** technique in this notebook to boost the model's performance on the test set." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np # Linear algebra\n", "import pandas as pd # For data manipulation\n", "import json\n", "import os\n", "import matplotlib.pyplot as plt # For visualization\n", "from sklearn.neighbors import KNeighborsClassifier # For modelling\n", "from sklearn.model_selection import cross_val_score, GridSearchCV # For evaluation and hyperparameter tuning\n", "from sklearn.metrics import confusion_matrix, classification_report # For evaluation\n", "from scipy.ndimage import shift, rotate, zoom # For data augmentation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Peeking the data**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loading the datasets into dataframes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0", "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a" }, "outputs": [], "source": [ "train_df = pd.read_csv(\n", " \"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/mnist_train.csv\"\n", ")\n", "\n", "test_df = pd.read_csv(\n", " 
\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/mnist_test.csv\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Knowing about the features in the datasets" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_df.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Converting the train and test dataframes into numpy arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train = train_df.iloc[:6000, 1:].values\n", "y_train = train_df.iloc[:6000, 0].values\n", "X_test = test_df.iloc[:1000, 1:].values\n", "y_test = test_df.iloc[:1000, 0].values\n", "\n", "print(f\"X_train shape: {X_train.shape}\")\n", "print(f\"y_train shape: {y_train.shape}\")\n", "print(f\"X_test shape: {X_test.shape}\")\n", "print(f\"y_test shape: {y_test.shape}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualizing a digit from the training data as a 28 X 28 image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "some_digit = X_train[46]\n", "\n", "some_digit_image = some_digit.reshape(28, 28)\n", "print(f\"Label: {y_train[40]}\")\n", "plt.imshow(some_digit_image, cmap=\"binary\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Train Model**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimator = KNeighborsClassifier()\n", "estimator.fit(X_train, y_train)\n", "predictions = estimator.predict(X_test)\n", "\n", "print(classification_report(y_test, predictions, digits=3), end=\"\\n\\n\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(confusion_matrix(y_test, predictions), end=\"\\n\\n\")" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Fine-tuning the model by finding the best values for the hyperparameters (weights, n_neighbors) using GridSearchCV" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid_params = {\n", " \"weights\": ['distance'],\n", " \"n_neighbors\": [3, 5, 7, 9, 11]\n", "}\n", "\n", "estimator = KNeighborsClassifier()\n", "grid_estimator = GridSearchCV(estimator, # Base estimator\n", " grid_params, # Parameters to tune\n", " verbose=2, # Verbosity of the logs\n", " n_jobs=-1) # Number of jobs to be run concurrently with -1 meaning all the processors\n", "\n", "# Fitting the estimator with training data\n", "grid_estimator.fit(X_train, y_train)\n", "\n", "print(f\"Best Score: {grid_estimator.best_score_}\", end=\"\\n\\n\")\n", "print(f\"Best Parameters: \\n{json.dumps(grid_estimator.best_params_, indent=4)}\",\n", " end=\"\\n\\n\")\n", "print(\"Grid Search CV results:\")\n", "results_df = pd.DataFrame(grid_estimator.cv_results_)\n", "results_df" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Best parameter values found:** {n_neighbors: 3, weights: 'distance'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fitting a new model with the found hyperparameter values to the training data and making predictions on the test data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimator = KNeighborsClassifier(n_neighbors=3, weights='distance')\n", "estimator.fit(X_train, y_train)\n", "predictions = estimator.predict(X_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "confusion_matrix(y_test, predictions)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(classification_report(y_test, predictions, digits=3), end=\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Data 
Augmentation**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each image in the training set is\n", "\n", "* shifted down, up, left and right by one pixel\n", "* rotated clockwise and anti-clockwise by 10 degrees\n", "* clipped and zoomed by factors of 1.1 and 1.2\n", "\n", "generating eight additional images per original. The image is clipped before zooming to preserve the image size." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def shift_in_one_direction(image, direction):\n", "    \"\"\"\n", "    Shifts an image by one pixel in the specified direction\n", "    \"\"\"\n", "    if direction == \"DOWN\":\n", "        image = shift(image, [1, 0])\n", "    elif direction == \"UP\":\n", "        image = shift(image, [-1, 0])\n", "    elif direction == \"LEFT\":\n", "        image = shift(image, [0, -1])\n", "    else:\n", "        image = shift(image, [0, 1])\n", "\n", "    return image\n", "\n", "\n", "def shift_in_all_directions(image):\n", "    \"\"\"\n", "    Shifts an image in all the directions by one pixel\n", "    \"\"\"\n", "    reshaped_image = image.reshape(28, 28)\n", "\n", "    down_shifted_image = shift_in_one_direction(reshaped_image, \"DOWN\")\n", "    up_shifted_image = shift_in_one_direction(reshaped_image, \"UP\")\n", "    left_shifted_image = shift_in_one_direction(reshaped_image, \"LEFT\")\n", "    right_shifted_image = shift_in_one_direction(reshaped_image, \"RIGHT\")\n", "\n", "    return (down_shifted_image, up_shifted_image,\n", "            left_shifted_image, right_shifted_image)\n", "\n", "\n", "def rotate_in_all_directions(image, angle):\n", "    \"\"\"\n", "    Rotates an image clockwise and anti-clockwise\n", "    \"\"\"\n", "    reshaped_image = image.reshape(28, 28)\n", "\n", "    rotated_images = (rotate(reshaped_image, angle, reshape=False),\n", "                      rotate(reshaped_image, -angle, reshape=False))\n", "\n", "    return rotated_images\n", "\n", "\n", "def clipped_zoom(image, zoom_ranges):\n", "    \"\"\"\n", "    Clips and zooms an image at the specified zooming ranges\n", "    \"\"\"\n", 
"    reshaped_image = image.reshape(28, 28)\n", "\n", "    h, w = reshaped_image.shape\n", "\n", "    zoomed_images = []\n", "    for zoom_range in zoom_ranges:\n", "        # Crop a centered window so that zooming restores the original size\n", "        zh = int(np.round(h / zoom_range))\n", "        zw = int(np.round(w / zoom_range))\n", "        top = (h - zh) // 2\n", "        left = (w - zw) // 2\n", "\n", "        zoomed_images.append(zoom(reshaped_image[top:top+zh, left:left+zw],\n", "                                  zoom_range))\n", "\n", "    return zoomed_images\n", "\n", "\n", "def alter_image(image):\n", "    \"\"\"\n", "    Alters an image by shifting, rotating, and zooming it\n", "    \"\"\"\n", "    shifted_images = shift_in_all_directions(image)\n", "    rotated_images = rotate_in_all_directions(image, 10)\n", "    zoomed_images = clipped_zoom(image, [1.1, 1.2])\n", "\n", "    return np.r_[shifted_images, rotated_images, zoomed_images]\n", "\n", "\n", "X_train_add = np.apply_along_axis(alter_image, 1, X_train).reshape(-1, 784)\n", "y_train_add = np.repeat(y_train, 8)\n", "\n", "print(f\"X_train_add shape: {X_train_add.shape}\")\n", "print(f\"y_train_add shape: {y_train_add.shape}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Combining the synthesized data with the actual training data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train_combined = np.r_[X_train, X_train_add]\n", "y_train_combined = np.r_[y_train, y_train_add]\n", "\n", "del X_train\n", "del X_train_add\n", "del y_train\n", "del y_train_add\n", "\n", "print(f\"X_train_combined shape: {X_train_combined.shape}\")\n", "print(f\"y_train_combined shape: {y_train_combined.shape}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fitting a new model with the tuned hyperparameters to the combined dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cdata_estimator = KNeighborsClassifier(n_neighbors=3, weights='distance')\n", "cdata_estimator.fit(X_train_combined, y_train_combined)\n", "cdata_estimator_predictions = cdata_estimator.predict(X_test)" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "confusion_matrix(y_test, cdata_estimator_predictions)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(classification_report(y_test, cdata_estimator_predictions, digits=3), end=\"\\n\\n\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** With **Data Augmentation** the accuracy jumped from 91.6% to 95.3% on the test data." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgments\n", "\n", "Thanks to SkalskiP for creating the open-source [Kaggle jupyter notebook](https://www.kaggle.com/code/gauthampughazh/digit-recognition-using-knn), licensed under Apache 2.0. It inspires the majority of the content of this assignment." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }