{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Networks\n", "> The previous chapters taught you how to build models in TensorFlow 2.0. In this chapter, you will apply those same tools to build, train, and make predictions with neural networks. You will learn how to define dense layers, apply activation functions, select an optimizer, and apply regularization to reduce overfitting. You will take advantage of TensorFlow's flexibility by using both low-level linear algebra and high-level Keras API operations to define and train models. This is the Summary of lecture \"Introduction to TensorFlow in Python\", via datacamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Tensorflow-Keras, Deep_Learning]\n", "- image: images/10_7_3_1_network.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.2.0'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import tensorflow as tf\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (8, 8)\n", "\n", "tf.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dense layers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The linear algebra of dense layers\n", "There are two ways to define a dense layer in tensorflow. The first involves the use of low-level, linear algebraic operations. The second makes use of high-level keras operations. In this exercise, we will use the first method to construct the network shown in the image below.\n", "\n", "\"drawing\"\n", "\n", "The input layer contains 3 features -- education, marital status, and age -- which are available as `borrower_features`. The hidden layer contains 2 nodes and the output layer contains a single node.\n", "\n", "For each layer, you will take the previous layer as an input, initialize a set of weights, compute the product of the inputs and weights, and then apply an activation function." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "borrower_features = np.array([[2., 2., 43.]], np.float32)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dense1's output shape: (1, 2)\n" ] } ], "source": [ "# Initialize bias1\n", "bias1 = tf.Variable(1.0, tf.float32)\n", "\n", "# Initialize weights1 as 3x2 variable of ones\n", "weights1 = tf.Variable(tf.ones((3, 2)))\n", "\n", "# Perform matrix multiplication of borrower_features and weights1\n", "product1 = tf.matmul(borrower_features, weights1)\n", "\n", "# Apply sigmoid activation function to product1 + bias1\n", "dense1 = tf.keras.activations.sigmoid(product1 + bias1)\n", "\n", "# Print shape of dense1\n", "print(\"dense1's output shape: {}\".format(dense1.shape))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "prediction: 0.9525741338729858\n", "\n", " actual: 1\n" ] } ], "source": [ "# Initialize bias2 and weights2\n", "bias2 = tf.Variable(1.0)\n", "weights2 = tf.Variable(tf.ones((2, 1)))\n", "\n", "# Perform matrix multiplication of dense1 and weights2\n", "product2 = tf.matmul(dense1, weights2)\n", "\n", "# Apply activation to product2 + bias2 and print the prediction\n", "prediction = tf.keras.activations.sigmoid(product2 + bias2)\n", "print('prediction: {}'.format(prediction.numpy()[0, 0]))\n", "print('\\n actual: 1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our model produces predicted values in the interval between 0 and 1. For the example we considered, the actual value was 1 and the predicted value was a probability between 0 and 1. This, of course, is not meaningful, since we have not yet trained our model's parameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The low-level approach with multiple examples\n", "In this exercise, we'll build further intuition for the low-level approach by constructing the first dense hidden layer for the case where we have multiple examples. We'll assume the model is trained and the first layer weights, `weights1`, and bias, `bias1`, are available. We'll then perform matrix multiplication of the `borrower_features` tensor by the `weights1` variable. Recall that the `borrower_features` tensor includes education, marital status, and age. 
Finally, we'll apply the sigmoid function to the elements of `products1 + bias1`, yielding `dense1`.\n", "\n", "$$ \\text{products1} = \\begin{bmatrix} 3 & 3 & 23 \\\\ 2 & 1 & 24 \\\\ 1 & 1 & 49 \\\\ 1 & 1 & 49 \\\\ 2 & 1 & 29 \\end{bmatrix} \\begin{bmatrix} -0.6 & 0.6 \\\\ 0.8 & -0.3 \\\\ -0.09 & -0.08 \\end{bmatrix} $$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " shape of borrower_features: (1, 3)\n", "\n", " shape of weights1: (3, 2)\n", "\n", " shape of bias1: (1,)\n", "\n", " shape of dense1: (1, 2)\n" ] } ], "source": [ "# Initialize bias1 (dtype is passed as a keyword argument)\n", "bias1 = tf.Variable([0.1], dtype=tf.float32)\n", "\n", "# Compute the product of borrower_features and weights1\n", "products1 = tf.matmul(borrower_features, weights1)\n", "\n", "# Apply a sigmoid activation function to products1 + bias1\n", "dense1 = tf.keras.activations.sigmoid(products1 + bias1)\n", "\n", "# Print the shapes of borrower_features, weights1, bias1, and dense1\n", "print('\\n shape of borrower_features: ', borrower_features.shape)\n", "print('\\n shape of weights1: ', weights1.shape)\n", "print('\\n shape of bias1: ', bias1.shape)\n", "print('\\n shape of dense1: ', dense1.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the original exercise, `borrower_features` is 5x3, holding the 5 examples of 3 features shown in the matrix product above; since we reused the single-example tensor here, the printed shape is 1x3. Either way, the shape of `weights1` is 3x2, as it was in the previous exercise, because it does not depend on the number of examples, and `bias1` is a single value that is broadcast across examples. Finally, `dense1` has one row per example and 2 columns, which means that we can multiply it by the next set of weights, `weights2`, which we defined to be 2x1 in the previous exercise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the dense layer operation\n", "We've now seen how to define dense layers in tensorflow using linear algebra. In this exercise, we'll skip the linear algebra and let keras work out the details. This will allow us to construct the network below, which has 2 hidden layers and 10 features, using less code than we needed for the network with 1 hidden layer and 3 features.\n", "\n", "\"drawing\"\n", "\n", "To construct this network, we'll need to define three dense layers, each of which takes the previous layer as an input, multiplies it by weights, and applies an activation function." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDLIMIT_BALSEXEDUCATIONMARRIAGEAGEPAY_0PAY_2PAY_3PAY_4...BILL_AMT4BILL_AMT5BILL_AMT6PAY_AMT1PAY_AMT2PAY_AMT3PAY_AMT4PAY_AMT5PAY_AMT6default.payment.next.month
0120000.02212422-1-1...0.00.00.00.0689.00.00.00.00.01
12120000.022226-1200...3272.03455.03261.00.01000.01000.01000.00.02000.01
2390000.0222340000...14331.014948.015549.01518.01500.01000.01000.01000.05000.00
3450000.0221370000...28314.028959.029547.02000.02019.01200.01100.01069.01000.00
4550000.012157-10-10...20940.019146.019131.02000.036681.010000.09000.0689.0679.00
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 \\\n", "0 1 20000.0 2 2 1 24 2 2 -1 -1 \n", "1 2 120000.0 2 2 2 26 -1 2 0 0 \n", "2 3 90000.0 2 2 2 34 0 0 0 0 \n", "3 4 50000.0 2 2 1 37 0 0 0 0 \n", "4 5 50000.0 1 2 1 57 -1 0 -1 0 \n", "\n", " ... BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 \\\n", "0 ... 0.0 0.0 0.0 0.0 689.0 0.0 \n", "1 ... 3272.0 3455.0 3261.0 0.0 1000.0 1000.0 \n", "2 ... 14331.0 14948.0 15549.0 1518.0 1500.0 1000.0 \n", "3 ... 28314.0 28959.0 29547.0 2000.0 2019.0 1200.0 \n", "4 ... 20940.0 19146.0 19131.0 2000.0 36681.0 10000.0 \n", "\n", " PAY_AMT4 PAY_AMT5 PAY_AMT6 default.payment.next.month \n", "0 0.0 0.0 0.0 1 \n", "1 1000.0 0.0 2000.0 1 \n", "2 1000.0 1000.0 5000.0 0 \n", "3 1100.0 1069.0 1000.0 0 \n", "4 9000.0 689.0 679.0 0 \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('./dataset/uci_credit_card.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "features = df.columns[1:11].tolist()\n", "borrower_features = df[features].values\n", "borrower_features = tf.convert_to_tensor(borrower_features, np.float32)\n", "idx = tf.constant(list(range(0,100)))\n", "borrower_features = tf.gather(borrower_features, idx)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " shape of dense1: (100, 7)\n", "\n", " shape of dense2: (100, 3)\n", "\n", " shape of predictions: (100, 1)\n" ] } ], "source": [ "# Define the first dense layer\n", "dense1 = tf.keras.layers.Dense(7, activation='sigmoid')(borrower_features)\n", "\n", "# Define a dense layer with 3 output nodes\n", "dense2 = tf.keras.layers.Dense(3, activation='sigmoid')(dense1)\n", "\n", "# Define a dense layer with 1 output node\n", "predictions = tf.keras.layers.Dense(1, activation='sigmoid')(dense2)\n", "\n", "# Print the shapes of dense1, dense2, and predictions\n", "print('\\n shape of dense1: ', dense1.shape)\n", "print('\\n shape of dense2: ', dense2.shape)\n", "print('\\n shape of predictions: ', predictions.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With just 8 lines of code, you were able to define 2 dense hidden layers and an output layer. This is the advantage of using high-level operations in tensorflow. Note that each layer has 100 rows because the input data contains 100 examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Activation functions\n", "- Activation function\n", " - Component of a typical hidden layer\n", " - Linear: Matrix multiplication\n", " - Nonlinear: Activation function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binary classification problems\n", "In this exercise, you will again make use of credit card data. The target variable, `default`, indicates whether a credit card holder defaults on his or her payment in the following period. Since there are only two options--default or not--this is a binary classification problem. While the dataset has many features, you will focus on just three: the size of the three latest credit card bills. Finally, you will compute predictions from your untrained network, `outputs`, and compare those the target variable, `default`." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "bill_amounts = df[['BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3']].to_numpy()\n", "default = df[['default.payment.next.month']].to_numpy()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. ]\n", " [ 0.5]\n", " [-0.5]\n", " [-1. ]\n", " [-0.5]]\n" ] } ], "source": [ "# Construct input layer from features\n", "inputs = tf.constant(bill_amounts, tf.float32)\n", "\n", "# Define first dense layer\n", "dense1 = tf.keras.layers.Dense(3, activation='relu')(inputs)\n", "\n", "# Define second dense layer\n", "dense2 = tf.keras.layers.Dense(2, activation='relu')(dense1)\n", "\n", "# Define output layer\n", "outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense2)\n", "\n", "# Print error for first five examples\n", "error = default[:5] - outputs.numpy()[:5]\n", "print(error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you run the code several times, you'll notice that the errors change each time. This is because you're using an untrained model with randomly initialized parameters. Furthermore, the errors fall on the interval between -1 and 1 because `default` is a binary variable that takes on values of 0 and 1 and `outputs` is a probability between 0 and 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiclass classification problems\n", "In this exercise, we expand beyond binary classification to cover multiclass problems. A multiclass problem has targets that can take on three or more values. In the credit card dataset, the education variable can take on 6 different values, each corresponding to a different level of education. We will use that as our target in this exercise and will also expand the feature set from 3 to 10 columns.\n", "\n", "As in the previous problem, you will define an input layer, dense layers, and an output layer. You will also print the untrained model's predictions, which are probabilities assigned to the classes." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "features = df.columns[1:11].tolist()\n", "borrower_features = df[features].values" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.11079348 0.2970938 0.15755723 0.13226885 0.19852833 0.10375827]\n", " [0.11079348 0.2970938 0.15755723 0.13226885 0.19852833 0.10375827]\n", " [0.11079348 0.2970938 0.15755723 0.13226885 0.19852833 0.10375827]]\n" ] } ], "source": [ "# Construct input layer from borrower features\n", "inputs = tf.constant(borrower_features, tf.float32)\n", "\n", "# Define first dense layer\n", "dense1 = tf.keras.layers.Dense(10, activation='sigmoid')(inputs)\n", "\n", "# Define second dense layer\n", "dense2 = tf.keras.layers.Dense(8, activation='relu')(dense1)\n", "\n", "# Define output layer\n", "outputs = tf.keras.layers.Dense(6, activation='softmax')(dense2)\n", "\n", "# Print first five predictions\n", "print(outputs.numpy()[:3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that each row of `outputs` sums to one. This is because a row contains the predicted class probabilities for one example. As with the previous exercise, our predictions are not yet informative, since we are using an untrained model with randomly initialized parameters. This is why the model tends to assign similar probabilities to each class." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimizers\n", "- Stochastic Gradient Descent (SGD) optimizer\n", " - Simple and easy to interpret\n", "- Root Mean Squared (RMS) propagation optimizer\n", " - Applies different learning rates to each feature\n", " - Allows for momentum to both build and decay\n", "- Adaptive Momemtum (Adam) optimizer\n", " - performs well with default parameter values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The dangers of local minima\n", "Consider the plot of the following loss function, `loss_function()`, which contains a global minimum, marked by the dot on the right, and several local minima, including the one marked by the dot on the left.\n", "\n", "\"drawing\"\n", "\n", "In this exercise, you will try to find the global minimum of `loss_function()` using `keras.optimizers.SGD()`. You will do this twice, each time with a different initial value of the input to `loss_function()`. First, you will use `x_1`, which is a variable with an initial value of 6.0. Second, you will use `x_2`, which is a variable with an initial value of 0.3. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "def loss_function(x):\n", " return 4.0 * math.cos(x - 1) + math.cos(2.0 * math.pi * x) / x" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6.027515 0.25\n" ] } ], "source": [ "# Initialize x_1 and x_2\n", "x_1 = tf.Variable(6.0, tf.float32)\n", "x_2 = tf.Variable(0.3, tf.float32)\n", "\n", "# Define the optimization operation\n", "opt = tf.keras.optimizers.SGD(learning_rate=0.01)\n", "\n", "for j in range(100):\n", " # Perform minimization using the loss function and x_1\n", " opt.minimize(lambda: loss_function(x_1), var_list=[x_1])\n", " \n", " # Perform minimization using the loss function and x_2\n", " opt.minimize(lambda: loss_function(x_2), var_list=[x_2])\n", "\n", "# Print x_1 and x_2 as numpy arrays\n", "print(x_1.numpy(), x_2.numpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that we used the same optimizer and loss function, but two different initial values. When we started at 6.0 with `x_1`, we found the global minimum at 6.03(?), marked by the dot on the right. When we started at 0.3, we stopped around 0.25 with `x_2`, the local minimum marked by a dot on the far left." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Avoiding local minima\n", "The previous problem showed how easy it is to get stuck in local minima. We had a simple optimization problem in one variable and gradient descent still failed to deliver the global minimum when we had to travel through local minima first. One way to avoid this problem is to use momentum, which allows the optimizer to break through local minima. We will again use the loss function from the previous problem, which has been defined and is available for you as `loss_function()`.\n", "\n", "Several optimizers in tensorflow have a momentum parameter, including SGD and RMSprop. You will make use of RMSprop in this exercise. Note that `x_1` and `x_2` have been initialized to the same value this time. 
" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.744511 0.24999999\n" ] } ], "source": [ "# Initialize x_1 and x_2\n", "x_1 = tf.Variable(0.05, tf.float32)\n", "x_2 = tf.Variable(0.05, tf.float32)\n", "\n", "# Define the optimization operation for opt_1 and opt_2\n", "opt_1 = tf.keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.99)\n", "opt_2 = tf.keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.00)\n", "\n", "for j in range(100):\n", " opt_1.minimize(lambda: loss_function(x_1), var_list=[x_1])\n", " # Define the minimization operation for opt_2\n", " opt_2.minimize(lambda: loss_function(x_2), var_list=[x_2])\n", " \n", "# Print x_1 and x_2 as numpy arrays\n", "print(x_1.numpy(), x_2.numpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that the global minimum is approximately 4.38. Notice that opt_1 built momentum, bringing `x_1` closer to the global minimum. To the contrary, `opt_2`, which had a momentum parameter of 0.0, got stuck in the local minimum on the left." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training a network in TensorFlow\n", "- Random Initializers\n", " - Often need to initialize thousands of variables\n", " - `tf.ones()` may perform poorly\n", " - Tedious and difficult to initialize variables individually\n", " - Alternatively, draw initial values from distribution\n", " - Normal\n", " - Uniform\n", " - Glorot initializer\n", "- Applying dropout\n", "\"drawing\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialization in TensorFlow\n", "A good initialization can reduce the amount of time needed to find the global minimum. In this exercise, we will initialize weights and biases for a neural network that will be used to predict credit card default decisions. To build intuition, we will use the low-level, linear algebraic approach, rather than making use of convenience functions and high-level keras operations. We will also expand the set of input features from 3 to 23." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Define the layer 1 weights\n", "w1 = tf.Variable(tf.random.normal([23, 7]), tf.float32)\n", "\n", "# Initialize the layer 1 bias\n", "b1 = tf.Variable(tf.ones([7]), tf.float32)\n", "\n", "# Define the layer 2 weights\n", "w2 = tf.Variable(tf.random.normal([7, 1]), tf.float32)\n", "\n", "# Define the layer 2 bias\n", "b2 = tf.Variable(0.0, tf.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining the model and loss function\n", "In this exercise, you will train a neural network to predict whether a credit card holder will default. The features and targets you will use to train your network are available in the Python shell as `borrower_features` and `default`. You defined the weights and biases in the previous exercise.\n", "\n", "Note that the predictions layer is defined as $\\sigma(\\text{layer1} \\times w2 + b2)$, where $\\sigma$ is the sigmoid activation, `layer1` is a tensor of nodes for the first hidden dense layer, `w2` is a tensor of weights, and `b2` is the bias tensor." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "df.head()\n", "\n", "X = df.iloc[:3000 ,1:24].astype(np.float32).to_numpy()\n", "y = df.iloc[:3000, 24].astype(np.float32).to_numpy()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1., 1., 0., ..., 0., 1., 0.], dtype=float32)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "borrower_features, test_features, borrower_targets, test_targets = train_test_split(X, \n", " y, \n", " test_size=0.25,\n", " stratify=y)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Define the model\n", "def model(w1, b1, w2, b2, features=borrower_features):\n", " # Apply relu activation function to layer 1\n", " layer1 = tf.keras.activations.relu(tf.matmul(features, w1) + b1)\n", " \n", " # Apply Dropout\n", " dropout = tf.keras.layers.Dropout(0.25)(layer1)\n", " return tf.keras.activations.sigmoid(tf.matmul(dropout, w2) + b2)\n", "\n", "# Define the loss function\n", "def loss_function(w1, b1, w2, b2, features=borrower_features, targets = borrower_targets):\n", " predictions = model(w1, b1, w2, b2)\n", " \n", " # Pass targets and predictions to the cross entropy loss\n", " return tf.keras.losses.binary_crossentropy(targets, predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training neural networks with TensorFlow\n", "In the previous exercise, you defined a model, `model(w1, b1, w2, b2, features)`, and a loss function, `loss_function(w1, b1, w2, b2, features, targets)`, both of which are available to you in this exercise. You will now train the model and then evaluate its performance by predicting default outcomes in a test set, which consists of `test_features` and `test_targets` and is available to you. The trainable variables are `w1`, `b1`, `w2`, and `b2`." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "opt = tf.keras.optimizers.Adam(learning_rate=0.1, beta_1=0.9, beta_2=0.999, amsgrad=False)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[575, 8],\n", " [166, 1]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", "# Train the model\n", "for j in range(1000):\n", " # Complete the optimizer\n", " opt.minimize(lambda: loss_function(w1, b1, w2, b2), var_list=[w1, b1, w2, b2])\n", " \n", "# Make predictions with model\n", "model_predictions = model(w1, b1, w2, b2, test_features)\n", "\n", "# Construct the confusion matrix\n", "confusion_matrix(test_targets.reshape(-1, 1), model_predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional : Plot heatmap" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "\n", "def confusion_matrix_plot(default, model_predictions):\n", " df = pd.DataFrame(np.hstack([default, model_predictions.numpy() > 0.5]),\n", " columns = ['Actual','Predicted'])\n", " confusion_matrix = pd.crosstab(df['Actual'], df['Predicted'], \n", " rownames=['Actual'], colnames=['Predicted'])\n", " sns.heatmap(confusion_matrix, cmap=\"Greys\", fmt=\"d\", annot=True, cbar=False)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAesAAAHgCAYAAACFNEViAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAVqklEQVR4nO3de+zldX3n8debmQEkDCAYkYoCynihqziDYLMIK4hUmrbYURE2Qq1GoqtYStqou5tsq0ZdL9QQK6tN663LJaWZDZWysrJF3C0oFwVBcOUiXmCpURwIHWRn5rN//H4Mw2RmGOB3fuc9v3k8kl9yzvd8z/m+MZ485/s93/M9NcYIANDXTtMeAADYOrEGgObEGgCaE2sAaE6sAaA5sQaA5hZPe4AtqSrfKYMpWL9+/bRHgB1WVdXmltuzBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaG7xtAdg+3TnnXfmgQceyLp167J27docfvjhueCCC/LCF74wSbLXXnvll7/8ZZYvX54DDjggt9xyS77//e8nSa6++uq8853vnOb4sOB84QtfyEUXXZSqyrJly/KRj3wku+yyy7THYo6INU/aMccck5///Ocb7p988skbbn/iE5/I6tWrN9y//fbbs3z58nmdD3YU9957b7785S/nkksuya677pozzzwzl1xySVauXDnt0ZgjYs1EnHTSSTn22GOnPQbsMNatW5eHHnooixcvzpo1a/LMZz5z2iMxhyb+mXVV7V1VT5/0dphfY4xcdtllufbaa/P2t7/9MY8dddRRuffee3PbbbdtWHbQQQfl+uuvzxVXXJFXvvKV8z0uLGj77rtv3vrWt+bYY4/NUUcdlaVLl3qfLTATiXVVPbeqLqiqnyX5ZpJrquqfZ5cdOIltMr+OPPLIHHbYYTnhhBPyrne9K0cdddSGx0455ZScf/75G+7fc889ee5zn5sVK1bkrLPOynnnnZelS5dOY2xYkFavXp3LL788X/va13LllVdmzZo1ufjii6c9FnNoUnvWFyZZleRZY4xlY4yDk+yX5L8luWBLT6qq06vq2qq6dkJzMUfuueeeJMnPfvazrFq1KkcccUSSZNGiRVm5cmUuvPDCDes+/PDD+cUvfpEkuf7663P77bfnBS94wfwPDQvUVVddlf333z977713lixZkte85jX59re/Pe2xmEOTivUzxhgXjjHWPbJgjLFujHFBkn229KQxxufGGC8fY7x8QnMxB3bbbbfsvvvuG24ff/zxuemmm5Ikxx13XG699db89Kc/3bD+M57xjOy008z/1Q466KAsW7Ysd9xxx/wPDgvUfvvtlxtuuCFr1qzJGCNXXXVVnve85017LObQpE4wu66qPpPki0l+PLvsOUl+P4l/7m3n9t1336xatSpJsnjx4p
x33nn56le/mmTmjPCND4EnydFHH50PfOADWbt2bdatW5d3vOMdue++++Z9blioDj300Bx//PFZuXJlFi9enBe/+MV505veNO2xmEM1xpj7F63aOcnbkpyY5NlJKjPR/vskfzXG+NU2vMbcDwY8rvXr1097BNhhVVVtdvkkYj0XxBqmQ6xherYU63m/3GhV/fZ8bxMAtmfTuDb44VPYJgBstyZ2GLyqXpRHP7MeSe5OcvEY45ZtfL7D4DAFDoPD9MzrYfCqem9mvk9dSb6V5JrZ2+dX1fsmsU0AWKgmdTb4/0ny62OM/7fJ8p2T3DzGWLYNr2HPGqbAnjVMz3yfYLY+ya9tZvl+s48BANtoUhdFOTPJ5VX1gzx6UZTnJjk4ybsntE0AWJAmeYLZTkmOyKMXRflJkms2vgTp4zzfYXCYAofBYXpcFAXYJmIN09PmoigAwBMj1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0AzYk1ADQn1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0AzYk1ADQn1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0AzYk1ADQn1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0AzYk1ADQn1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0AzYk1ADQn1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0AzYk1ADQn1gDQnFgDQHNiDQDNiTUANCfWANCcWANAc2INAM2JNQA0t3hLD1TV3ycZW3p8jPG7E5kIAHiMLcY6ySfmbQoAYItqjC3uPE9VVfUcDBa49evXT3sE2GFVVW1u+db2rB954rIkH0lySJJdH1k+xnjenE0HAGzRtpxg9vkk5yZZm+SYJF9K8uVJDgUAPGpbYv20McblmTlkftcY40+THDvZsQCARzzuYfAkD1XVTkl+UFXvTvLTJM+c7FgAwCMe9wSzqjo8yS1J9krywSR7JvnYGOPqiQ7mBDOYCieYwfRs6QQzZ4MDjyHWMD1P5Wzwf8xmLo4yxvC5NQDMg235zPqPN7q9a5LXZ+bMcABgHjypw+BV9fUxxr+ZwDwbb8NhcJgCh8Fhep7KYfC9N7q7U5LDkjxrjuYCAB7HthwGvy4zn1lXZg5/35nkbZMcKkluuOGGSW8C2Iwt/MMemKJtifWLxxgPbbygqnaZ0DwAwCa25Qpm/7SZZVfN9SAAwOZt7fesn5Xk2UmeVlXLM3MYPEn2SLLbPMwGAGTrh8F/M8lbkuyf5JN5NNb3J/n3kx0LAHjEFmM9xvhiki9W1evHGH83jzMBABvZls+sD6uqvR65U1VPr6oPTXAmAGAj2xLrE8YYv3zkzhjjviS/NbmRAICNbUusF238Va2qeloSX90CgHmyLd+z/pskl1fV52fv/0GSL05uJABgY48b6zHGx6rqxiTHZeaM8P+e5IBJDwYAzNiWw+BJ8n+TrM/ML269OsktE5sIAHiMrV0U5QVJTk5ySpKfJ7kwM7/Sdcw8zQYAZOuHwW9N8o0kvzPGuC1JquqP5mUqAGCDrR0Gf31mDn//Y1X9ZVW9Oo9exQwAmCdbjPUYY9UY401JXpTkiiR/lGTfqjq3qo6fp/kAYIf3uCeYjTEeHGP81zHGb2fmOuHfSfK+iU8GACTZ9rPBkyRjjF+MMT47xjh2UgMBAI/1hGINAMw/sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5hZPewC2P5/5zGdy3XXXZc8998zZZ5+9Yfmll16aSy+9NIsWLcqKFSty6qmnJknuuuuufPazn82aNWtSVfnoRz+anXfeeVrjw4L0/ve/P1dccUX22WeffOUrX5n2OMwxseYJe9WrXpXXvva1+fSnP71h2U033ZRrrrkmn/zkJ7NkyZKsXr06SbJu3bqcc845OeOMM3LggQfmgQceyKJFi6Y1OixYK1euzJvf/Oa8973vnfYoTIDD4DxhhxxySHbffffHLLvsssvyute9LkuWLEmS7LnnnkmSG264IQcccEAOPPDAJMnSpUvFGibg8MMP3/C+Y+GxZ82cuPvuu3PLLbfk/PPPz5IlS3Laaafl4IMPzj333JMk+dCHPpT7778/Rx55ZE488cQpTwuwfZnonnVV7VtVK6pqeVXtO8ltMV3r16/Pgw8+mA9/+MM59dRTc/bZZ2eMkXXr1uXWW2/Ne97znnzwgx/MN7/5zXz3u9+d9rgA25WJxLqqXlZVVye5IsnHknw8yder6uqqWrGV551eVddW1bUXXXTRJEZjQvbee++84hWvSFVl2bJl2WmnnXL//fdnn332ySGHHJI99tgju+yyS1asWJE77rhj2uMCbFcmtWf9hSR/OMZ48RjjuNm/FyU5M8nnt/SkMcbnxhgvH2O8/A1veMOERmMSjjjiiA17zHfffXfWrl2bPfbYI4ceemh+9KMf5Ve/+lXWrVuX733ve9l///2nPC3A9qXGGHP/olU/GGMs28Jjt40xDn6817jxxhvnfjDmxKc+9ancfPPNeeCBB7LnnnvmpJNOytFHH51zzz03P/zhD7N48eKceuqpeclLXpIkufLKK7Nq1apUVZYvX77hK1309NKXvnTaI/AknHXWWfnWt76V++67L/vss0/OOOOMvPGNb5z2WDxxtdmFE4r1OUmen+RLSX48u/g5SU5LcucY492P9xpiDdMh1jBVm431RM4GH2O8p6pOSHJikmfPbvwnSf5ijPEPk9gmACxUE/vq1hjj0iSXTur1AWBHMe8XRamq0+d7mwCwPZvGFcw2ezweANi8acT64SlsEwC2W9OI9Z9NYZsAsN2ayAlmVXXjlh5K4rKjAPAETOps8H2T/GaS+zZZXkn+aULbBIAFaVKx/kqS3ccY39n0gaq6YkLbBIAFaVIXRXnbVh77t5PYJgAsVNM4wQwAeALEGgCaE2sAaE6sAaA5sQaA5sQaA
JoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGhOrAGgObEGgObEGgCaE2sAaE6sAaA5sQaA5sQaAJoTawBoTqwBoDmxBoDmxBoAmhNrAGiuxhjTnoEFqKpOH2N8btpzwI7Ge29hsmfNpJw+7QFgB+W9twCJNQA0J9YA0JxYMyk+M4Pp8N5bgJxgBgDN2bMGgObEmqekql5bVd+vqtuq6n2beXyXqrpw9vFvVtWB8z8lLCxV9ddV9c9VddMWHq+qOmf2fXdjVa2Y7xmZW2LNk1ZVi5L8RZITkhyS5JSqOmST1d6W5L4xxsFJ/jzJf57fKWFB+kKS127l8ROSLJv9Oz3JufMwExMk1jwVRyS5bYxxxxjj4SQXJDlxk3VOTPLF2dsXJXl1VdU8zggLzhjjyiS/2MoqJyb50phxdZK9qmq/+ZmOSRBrnopnJ/nxRvd/Mrtss+uMMdYmWZ1kn3mZDnZc2/LeZDsi1jwVm9tD3vTrBduyDjC3vO8WGLHmqfhJkudsdH//JHdvaZ2qWpxkz2z98B3w1G3Le5PtiFjzVFyTZFlVHVRVOyc5OcnFm6xzcZLfn739hiT/c/hyP0zaxUlOmz0r/DeSrB5j3DPtoXjyFk97ALZfY4y1VfXuJF9NsijJX48xbq6qDyS5doxxcZK/SvLlqrotM3vUJ09vYlgYqur8JK9K8oyq+kmS/5RkSZKMMf5Lkn9I8ltJbkvyL0n+YDqTMldcwQwAmnMYHACaE2sAaE6sAaA5sQaA5sQaAJoTa9hOVdW6qvpOVd1UVX9bVbs9hdd6VVV9Zfb2727uF9Q2Wnevqvp3T2Ibf1pVf/xkZ4QdmVjD9mvNGONlY4x/leThJO/Y+MHZC2I84ff4GOPiMcZHt7LKXkmecKyBJ0+sYWH4RpKDq+rAqrqlqj6T5Pokz6mq46vqqqq6fnYPfPdkw2+R31pV/yvJykdeqKreUlWfnr29b1WtqqobZv/+dZKPJnn+7F79x2fX+5Oqumb2t5P/bKPX+g+zv3f+tSQvnLf/NWCBEWvYzs1ec/2EJN+dXfTCzPw84vIkDyb5j0mOG2OsSHJtkrOqatckf5nkd5IcleRZW3j5c5J8fYxxaJIVSW5O8r4kt8/u1f9JVR2fmd9NPiLJy5IcVlVHV9Vhmbli3fLM/GPg8Dn+T4cdhsuNwvbraVX1ndnb38jMpV1/Lclds79hnCS/keSQJP979mfEd05yVZIXJblzjPGDJKmqv0ly+ma2cWyS05JkjLEuyeqqevom6xw/+/ft2fu7ZybeS5OsGmP8y+w2Nr1uPLCNxBq2X2vGGC/beMFskB/ceFGS/zHGOGWT9V6WufvJxErykTHGZzfZxplzuA3YoTkMDgvb1UmOrKqDk6SqdquqFyS5NclBVfX82fVO2cLzL0/yztnnLqqqPZI8kJm95kd8NclbN/os/NlV9cwkVyb5vap6WlUtzcwhd+BJEGtYwMYYP0vyliTnV9WNmYn3i8YYD2XmsPclsyeY3bWFl/jDJMdU1XeTXJfk18cYP8/MYfWbqurjY4zLkpyX5KrZ9S5KsnSMcX2SC5N8J8nfZeZQPfAk+NUtAGjOnjUANCfWANCcWANAc2INAM2JNQA0J9YA0JxYA0BzYg0Azf1/MB8U9zYw82IAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "confusion_matrix_plot(test_targets.reshape(-1, 1), model_predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The diagram shown is called a \"confusion matrix.\" The diagonal elements show the number of correct predictions. The off-diagonal elements show the number of incorrect predictions. We can see that the model performs reasonably-well, but does so by overpredicting non-default. This suggests that we may need to train longer, tune the model's hyperparameters, or change the model's architecture." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }