{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "%%html\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Install the necessary dependencies (tensorflow provides the Keras modules imported below)\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib tensorflow jupyterlab_myst ipython " ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0MRC0e0KhQ0S", "slideshow": { "slide_type": "slide" } }, "source": [ "# Neural Network" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Introduction\n", "\n", "\n", "* Logistic Regression (which we learned in our last session) is the simplest form of Neural Network; artificial neural networks can be viewed as an extension of Logistic Regression\n", "* Logistic Regression: produces linear decision boundaries (a straight line in two dimensions)\n", "* Neural Networks: can generate more complex decision boundaries\n", "* (Deep) Neural Networks: a universal approximator!\n", "* In this session, we will learn to use TensorFlow Keras for digit recognition\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "LWd1UlMnhT2s", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": 
"YvGPUQaHhXfL", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np # linear algebra\n", "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n", "import matplotlib.pyplot as plt # plotting library\n", "from keras.models import Sequential\n", "from keras.layers import Dense , Activation, Dropout\n", "from tensorflow.keras.optimizers import Adam, RMSprop\n", "from keras import backend as K" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "K1VMqkGvhc3-", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Importing the dataset" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- MNIST is a collection of handwritten digits ranging from the number 0 to 9. \n", "\n", "- It has a training set of 60,000 images, and 10,000 test images that are classified into corresponding categories or labels. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "M52QDmyzhh9s", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# import dataset\n", "from keras.datasets import mnist\n", "\n", "# load dataset\n", "(x_train, y_train),(x_test, y_test) = mnist.load_data()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# count the number of unique train labels\n", "unique, counts = np.unique(y_train, return_counts=True)\n", "print(\"Train labels: \", dict(zip(unique, counts)))\n", "\n", "# count the number of unique test labels\n", "unique, counts = np.unique(y_test, return_counts=True)\n", "print(\"\\nTest labels: \", dict(zip(unique, counts)))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YvxIPVyMhmKp", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Data visualization" ] }, { "cell_type": "markdown", 
"metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Let's sample the 25 random MNIST digits and visualize them." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "AVzJWAXIhxoC", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# sample 25 mnist digits from train dataset\n", "indexes = np.random.randint(0, x_train.shape[0], size=25)\n", "images = x_train[indexes]\n", "labels = y_train[indexes]\n", "\n", "# plot the 25 mnist digits\n", "plt.figure(figsize=(5,5))\n", "for i in range(len(indexes)):\n", " plt.subplot(5, 5, i + 1)\n", " image = images[i]\n", " plt.imshow(image, cmap='gray')\n", " plt.axis('off')\n", " \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "kW3c7UYih0hT", "slideshow": { "slide_type": "slide" } }, "source": [ "## Designing model architecture using Keras" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Import Keras layers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from keras.models import Sequential\n", "from keras.layers import Dense, Activation, Dropout\n", "from tensorflow.keras.utils import to_categorical, plot_model" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Compute the number of labels" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "num_labels = len(np.unique(y_train))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## One-Hot Encoding\n", "\n", "- At this point, the labels are in digits format, 0 to 9. 
\n", "- A more suitable format is a one-hot vector: a 10-dimensional vector whose elements are all 0 except for a 1 at the index of the digit class. \n", "- For example, if the label is 2, the equivalent one-hot vector is [0,0,1,0,0,0,0,0,0,0]. The first label has index 0." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# convert to one-hot vector\n", "y_train = to_categorical(y_train)\n", "y_test = to_categorical(y_test)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yyxW5b395mR2", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Data Preprocessing " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# image dimensions (assumed square)\n", "image_size = x_train.shape[1]\n", "input_size = image_size * image_size\n", "input_size" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# flatten each image into a vector and scale pixel values to [0, 1]\n", "x_train = np.reshape(x_train, [-1, input_size])\n", "x_train = x_train.astype('float32') / 255\n", "x_test = np.reshape(x_test, [-1, input_size])\n", "x_test = x_test.astype('float32') / 255" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Setting network parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- **batch_size** is the number of samples used for each update of the model parameters.\n", "\n", "- **hidden_units** is the number of units in each hidden layer.\n", "\n", "- **dropout** is the dropout rate (related to **Overfitting and Regularization**)." 
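, "\n", "\n", "As a rough illustration of these three settings, the next cell is a minimal NumPy sketch and not part of the model code. The `demo_*` names are hypothetical, chosen so they do not clash with the notebook's own variables. It assumes the 60,000 MNIST training images mentioned earlier and imitates inverted dropout, the rescaling of kept units by 1 / (1 - rate) that the Keras `Dropout` layer documents for training time." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [
"import math\n",
"import numpy as np\n",
"\n",
"# Hypothetical demo values; they mirror the settings used in this notebook.\n",
"demo_num_samples = 60000  # size of the MNIST training set\n",
"demo_batch_size = 128\n",
"demo_dropout = 0.45\n",
"\n",
"# One parameter update per batch, so an epoch needs ceil(samples / batch_size) updates.\n",
"updates_per_epoch = math.ceil(demo_num_samples / demo_batch_size)\n",
"\n",
"# Inverted dropout on a toy activation vector: zero each unit with probability\n",
"# demo_dropout and rescale the survivors so the expected activation is unchanged.\n",
"rng = np.random.default_rng(0)\n",
"activations = np.ones(10000, dtype='float32')\n",
"keep_mask = rng.random(activations.shape) >= demo_dropout\n",
"dropped = activations * keep_mask / (1.0 - demo_dropout)\n",
"\n",
"print('updates per epoch:', updates_per_epoch)\n",
"print('fraction of units kept:', keep_mask.mean())\n",
"print('mean activation after dropout:', dropped.mean())"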
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 2118, "status": "ok", "timestamp": 1588265315505, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "f8YOXsQy58rP", "outputId": "2e1b0063-548e-4924-cf3a-93a79d97e35e", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# network parameters\n", "batch_size = 128\n", "hidden_units = 256\n", "dropout = 0.45" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vKYVQH-l5NpE", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Designing the model architecture" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "executionInfo": { "elapsed": 2112, "status": "ok", "timestamp": 1588265315506, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "p6VMTb2O4hwM", "outputId": "a4f03a97-2942-45cd-f735-f4063277a96c", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# model is a 3-layer MLP: two hidden layers (ReLU + dropout) and a softmax output layer\n", "model = Sequential()\n", "model.add(Dense(hidden_units, input_dim=input_size))\n", "model.add(Activation('relu'))\n", "model.add(Dropout(dropout))\n", "model.add(Dense(hidden_units))\n", "model.add(Activation('relu'))\n", "model.add(Dropout(dropout))\n", "model.add(Dense(num_labels))\n", "model.add(Activation('softmax'))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "h4Hwj34ziWQW", "slideshow": { "slide_type": "subslide" } }, "source": [ "## View 
model summary\n", "\n", "- Keras provides the `summary()` method to inspect the model description." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "executionInfo": { "elapsed": 2107, "status": "ok", "timestamp": 1588265315506, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "D6bpZwUiiXic", "outputId": "f202fcb3-5882-4d93-e5df-50791185067e", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## How big is our model (number of parameters)?\n", "\n", "- From input to Dense layer: 784 × 256 + 256 = 200,960. \n", "\n", "- From first Dense to second Dense: 256 × 256 + 256 = 65,792. \n", "\n", "- From second Dense to the output layer: 10 × 256 + 10 = 2,570. \n", "\n", "- The total is 200,960 + 65,792 + 2,570 = 269,322." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6OMC_P0diaoD", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Compile the model with compile() method" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "model.compile(loss='categorical_crossentropy', \n", " optimizer='adam',\n", " metrics=['accuracy'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Loss function (categorical_crossentropy)\n", "\n", "- The **loss** measures how far the predicted tensor is from the one-hot ground-truth vector.\n", "\n", "- In this example, we use **categorical_crossentropy** as the loss function. 
It is the negative of the sum, over all classes, of the product of the target and the logarithm of the prediction. \n", "\n", "- There are other loss functions in Keras, such as mean_absolute_error and binary_crossentropy. The choice of the loss function is not arbitrary: it should reflect the criterion that the model is meant to learn. \n", "\n", "- For classification by category, categorical_crossentropy or mean_squared_error is a good choice after the softmax activation layer. The binary_crossentropy loss function is normally used after the sigmoid activation layer, while mean_squared_error is an option for tanh output." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Optimization (optimizer adam)\n", "\n", "- With optimization, the objective is to minimize the loss function. The idea is that if the loss is reduced to an acceptable level, the model has indirectly learned the function mapping input to output.\n", "\n", "- In Keras, there are several choices for optimizers. The most commonly used optimizers are **Stochastic Gradient Descent (SGD)**, **Adaptive Moments (Adam)** and **Root Mean Squared Propagation (RMSprop)**. \n", "\n", "- Each optimizer features tunable parameters like learning rate, momentum, and decay. \n", "\n", "- Adam and RMSprop are variations of SGD with adaptive learning rates. In the proposed classifier network, Adam is used since it has the highest test accuracy." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Metrics (accuracy)\n", "\n", "- Performance metrics are used to determine if a model has learned the underlying data distribution. The default metric in Keras is loss. \n", "\n", "- During training, validation, and testing, other metrics such as **accuracy** can also be included. \n", "\n", "- **Accuracy** is the percentage, or fraction, of predictions that match the ground truth." 
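, "\n", "\n", "To make the loss and accuracy definitions concrete, the next cell is a minimal NumPy sketch that computes both by hand on a toy batch. The `toy_*` arrays are made-up values for illustration, not outputs of the model above, and the small `eps` constant is an assumption added only to avoid taking the logarithm of zero." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Toy one-hot targets and softmax-like predictions: 3 samples, 3 classes.\n",
"# These numbers are invented for illustration; they are not MNIST results.\n",
"toy_targets = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype='float32')\n",
"toy_predictions = np.array([[0.7, 0.2, 0.1],\n",
"                            [0.1, 0.8, 0.1],\n",
"                            [0.3, 0.4, 0.3]], dtype='float32')\n",
"\n",
"# Categorical cross-entropy per sample: -sum(target * log(prediction)).\n",
"eps = 1e-7  # assumed small constant that guards against log(0)\n",
"per_sample_loss = -np.sum(toy_targets * np.log(toy_predictions + eps), axis=1)\n",
"toy_loss = per_sample_loss.mean()\n",
"\n",
"# Accuracy: fraction of samples whose most probable class equals the target class.\n",
"predicted_classes = np.argmax(toy_predictions, axis=1)\n",
"true_classes = np.argmax(toy_targets, axis=1)\n",
"toy_accuracy = np.mean(predicted_classes == true_classes)\n",
"\n",
"print('loss: %.4f, accuracy: %.4f' % (toy_loss, toy_accuracy))"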
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Train the model with fit() method" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "model.fit(x_train, y_train, epochs=20, batch_size=batch_size)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "SZ-j28aPihZx", "slideshow": { "slide_type": "subslide" } }, "source": [ "## Evaluating model performance with evaluate() method" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 349 }, "colab_type": "code", "executionInfo": { "elapsed": 43807, "status": "ok", "timestamp": 1588265357223, "user": { "displayName": "Hadelin de Ponteves", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GhEuXdT7eQweUmRPW8_laJuPggSK6hfvpl5a6WBaA=s64", "userId": "15047218817161520419" }, "user_tz": -240 }, "id": "qeTjz2vDilAC", "outputId": "00fb10bc-c726-46b8-8eaa-c5c6b584aa54", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "loss, acc = model.evaluate(x_test, y_test, batch_size=batch_size)\n", "print(\"\\nTest accuracy: %.1f%%\" % (100.0 * acc))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Neural Network from scratch\n", "\n", "- It's for tomorrow!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Acknowledgments\n", "\n", "Thanks to PRASHANT BANERJEE for creating the open-source [Kaggle Jupyter notebook](https://www.kaggle.com/code/prashant111/mnist-deep-neural-network-with-keras), licensed under Apache 2.0. It inspired most of the content of these slides." 
] } ], "metadata": { "celltoolbar": "Slideshow", "colab": { "authorship_tag": "ABX9TyOsvB/iqEjYj3VN6C/JbvkE", "collapsed_sections": [], "machine_shape": "hm", "name": "logistic_regression.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }