{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Building neural network classifier with Keras" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports\n", "First let's import some prerequisites" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "from keras.datasets import mnist\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Dropout, Activation\n", "from keras.layers import Conv2D, MaxPooling2D, Flatten\n", "from keras.utils import to_categorical\n", "from keras import backend as K" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.rcParams['figure.figsize'] = (10,10) # Make the figures a bit bigger" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load training data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nb_classes = 10\n", "\n", "# the data, shuffled and split between tran and test sets\n", "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n", "print(\"X_train original shape\", X_train.shape)\n", "print(\"y_train original shape\", y_train.shape)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# input image dimensions\n", "img_rows, img_cols = 28, 28" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at some examples of the training data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(9):\n", " plt.subplot(3,3,i+1)\n", " plt.imshow(X_train[i], cmap='gray', interpolation='none')\n", " plt.title(\"Class {}\".format(y_train[i]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Format the data for training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we are building Fully Connected 
network, our model is going to take a single vector for each training example, so we need to reshape the input so that each 28x28 image becomes a single 784-dimensional vector. We'll also scale the inputs to the range [0, 1] rather than [0, 255]." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train1 = X_train.reshape(60000, 784)\n", "X_test1 = X_test.reshape(10000, 784)\n", "X_train1 = X_train1.astype('float32')\n", "X_test1 = X_test1.astype('float32')\n", "X_train1 /= 255\n", "X_test1 /= 255\n", "print(\"Training matrix shape\", X_train1.shape)\n", "print(\"Testing matrix shape\", X_test1.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modify the target matrices to be in the one-hot format, i.e.\n", "\n", "0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", "\n", "1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]\n", "\n", "2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]\n", "\n", "etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Y_train = to_categorical(y_train, nb_classes)\n", "Y_test = to_categorical(y_test, nb_classes)\n", "print(\"Y_train shape\", Y_train.shape)\n", "print(\"Y_test shape\", Y_test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build the neural network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we'll build a simple 3-layer fully connected (Dense) network.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model1 = Sequential()\n", "model1.add(Dense(128, input_shape=(784,)))\n", "model1.add(Activation('relu')) # An \"activation\" is just a non-linear function applied to the output\n", " # of the layer above. 
Here, with a \"rectified linear unit\",\n", " # we clamp all values below 0 to 0.\n", " \n", "#model1.add(Dropout(0.2)) # Dropout helps protect the model from memorizing or \"overfitting\" the training data\n", "model1.add(Dense(128))\n", "model1.add(Activation('relu'))\n", "#model1.add(Dropout(0.2))\n", "model1.add(Dense(10))\n", "model1.add(Activation('softmax')) # This special \"softmax\" activation among other things,\n", " # ensures the output is a valid probaility distribution, that is\n", " # that its values are all non-negative and sum to 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile the model\n", "When compiing a model, Keras asks you to specify your loss function and your optimizer. The loss function we'll use here is called categorical crossentropy, and is a loss function well-suited to comparing two probability distributions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set hyperparameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_size = 128\n", "epochs = 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training!\n", "This is the fun part: you can feed the training data loaded in earlier into this model and it will learn to classify digits" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model1.fit(X_train1, Y_train,\n", " batch_size=batch_size, epochs=epochs, verbose=1,\n", " validation_data=(X_test1, Y_test))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model1.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate its performance" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "score1 = model1.evaluate(X_test1, Y_test, verbose=1)\n", "print('Test score:', score1[0])\n", "print('Test accuracy:', score1[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspecting the output\n", "It's always a good idea to inspect the output and make sure everything looks sane. Here we'll look at some examples it gets right, and some examples it gets wrong." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The predict_classes function outputs the highest probability class\n", "# according to the trained classifier for each input example.\n", "predicted_classes1 = model1.predict(X_test1, verbose=1)\n", "predicted_classes1 = np.argmax(predicted_classes1, axis=1)\n", "# Check which items we got right / wrong\n", "correct_indices1 = np.nonzero(predicted_classes1 == y_test)[0]\n", "incorrect_indices1 = np.nonzero(predicted_classes1 != y_test)[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "for i, correct in enumerate(correct_indices1[:9]):\n", " plt.subplot(3,3,i+1)\n", " plt.imshow(X_test1[correct].reshape(28,28), cmap='gray', interpolation='none')\n", " plt.title(\"Predicted {}, Class {}\".format(predicted_classes1[correct], y_test[correct]))\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "for i, incorrect in enumerate(incorrect_indices1[:9]):\n", " plt.subplot(3,3,i+1)\n", " plt.imshow(X_test1[incorrect].reshape(28,28), cmap='gray', interpolation='none')\n", " plt.title(\"Predicted {}, Class {}\".format(predicted_classes1[incorrect], y_test[incorrect]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Different approach: CNN" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## Format the data for training\n", "Now we are building Convolutional Neural Network (CNN). This type of neural-network is going to take an image for each training example, so we pass the input as 28x28 image. We'll also scale the inputs to be in the range [0-1] rather than [0-255]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if K.image_data_format() == 'channels_first':\n", " X_train2 = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)\n", " X_test2 = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)\n", " input_shape = (1, img_rows, img_cols)\n", "else:\n", " X_train2 = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)\n", " X_test2 = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)\n", " input_shape = (img_rows, img_cols, 1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train2 = X_train2.astype('float32')\n", "X_test2 = X_test2.astype('float32')\n", "X_train2 /= 255\n", "X_test2 /= 255\n", "print('X_train shape:', X_train2.shape)\n", "print(X_train2.shape[0], 'train samples')\n", "print(X_test2.shape[0], 'test samples')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Output stays the same - encoded as one-hot vectors." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Y_train shape\", Y_train.shape)\n", "print(\"Y_test shape\", Y_test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Building CNN\n", "Here we'll do a Convolutional Neural Network (CNN) with 2 conv layers followed by MaxPooling.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model2 = Sequential()\n", "model2.add(Conv2D(16, kernel_size=(3, 3),\n", " activation='relu',\n", " input_shape=input_shape))\n", "#model2.add(Conv2D(64, (3, 3), activation='relu'))\n", "model2.add(MaxPooling2D(pool_size=(2, 2)))\n", "model2.add(Flatten())\n", "#model2.add(Dense(128, activation='relu'))\n", "model2.add(Dense(nb_classes, activation='softmax'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model2.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compile CNN model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# hyperparameters\n", "batch_size = 128\n", "epochs = 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train CNN" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model2.fit(X_train2, Y_train,\n", " batch_size=batch_size,\n", " epochs=epochs,\n", " verbose=1,\n", " validation_data=(X_test2, Y_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evalate CNN" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "score = model2.evaluate(X_test2, Y_test, verbose=1)\n", "print('Test loss:', score[0])\n", 
"print('Test accuracy:', score[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspecting the output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# model.predict outputs a probability distribution over the classes for each\n", "# input example; np.argmax then picks the highest-probability class.\n", "predicted_classes2 = model2.predict(X_test2, verbose=1)\n", "predicted_classes2 = np.argmax(predicted_classes2, axis=1)\n", "# Check which items we got right / wrong\n", "correct_indices2 = np.nonzero(predicted_classes2 == y_test)[0]\n", "incorrect_indices2 = np.nonzero(predicted_classes2 != y_test)[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "for i, correct in enumerate(correct_indices2[:9]):\n", " plt.subplot(3,3,i+1)\n", " plt.imshow(X_test2[correct].reshape(28,28), cmap='gray', interpolation='none')\n", " plt.title(\"Predicted {}, Class {}\".format(predicted_classes2[correct], y_test[correct]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "for i, incorrect in enumerate(incorrect_indices2[:9]):\n", " plt.subplot(3,3,i+1)\n", " plt.imshow(X_test2[incorrect].reshape(28,28), cmap='gray', interpolation='none')\n", " plt.title(\"Predicted {}, Class {}\".format(predicted_classes2[incorrect], y_test[incorrect]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Comparison: FC vs CNN" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "score1 = model1.evaluate(X_test1, Y_test, verbose=0)\n", "#print('FC model Test loss:', score1[0])\n", "print('FC model Test accuracy:', score1[1])\n", "params1 = model1.count_params()\n", "print('FC model # params:', str(params1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "score2 = model2.evaluate(X_test2, Y_test, verbose=0)\n", 
"#print('CNN model Test loss:', score2[0])\n", "print('CNN model Test accuracy:', score2[1])\n", "params2 = model2.count_params()\n", "print('CNN model # params:', str(params2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary:\n", "CNNs achieve better performance with fewer parameters, and hence lower memory and compute consumption. CNNs can also handle image data at a scale that is infeasible with FC layers alone: a single FC layer with 1000 neurons on a 224x224x3 image needs 224 * 224 * 3 * 1000, roughly 150 million, weights. That's 150M for just one layer, whereas modern CNN architectures with 50-100 layers have only a couple dozen million parameters overall; for example, ResNet50 has about 23M parameters and InceptionV3 about 21M." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TL;DR\n", "CNNs are better for image data." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }