{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create your first deep learning neural network\n", "\n", "## Introduction\n", "\n", "This is the first of our [beginner tutorial series](https://github.com/awslabs/djl/tree/master/jupyter/tutorial) that will take you through creating, training, and running inference on a neural network. In this tutorial, you will learn how to use the built-in `Block` to create your first neural network - a Multilayer Perceptron.\n", "\n", "## Neural Network\n", "\n", "A neural network is a black box function. Instead of coding this function yourself, you provide many sample input/output pairs for this function. Then, we try to train the network to learn how to match the behavior of the function given only these input/output pairs. A better model with more data can more accurately match the function.\n", "\n", "## Multilayer Perceptron\n", "\n", "A Multilayer Perceptron (MLP) is one of the simplest deep learning networks. The MLP has an input layer which contains your input data, an output layer which is produced by the network and contains the data the network is supposed to be learning, and some number of hidden layers. The example below contains an input of size 3, a single hidden layer of size 3, and an output of size 2. The number and sizes of the hidden layers are determined through experimentation but more layers enable the network to represent more complicated functions. Between each pair of layers is a linear operation (sometimes called a FullyConnected operation because each number in the input connected to each number in the output by a matrix multiplication). Not pictured, there is also a non-linear activation function after each linear operation. For more information, see [Multilayer Perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron).\n", "\n", "![MLP Image](https://upload.wikimedia.org/wikipedia/commons/c/c2/MultiLayerNeuralNetworkBigger_english.png)\n", "\n", "\n", "## Step 1: Setup development environment\n", "\n", "### Installation\n", "\n", "This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the [Jupyter README](https://github.com/awslabs/djl/blob/master/jupyter/README.md)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "// Add the snapshot repository to get the DJL snapshot artifacts\n", "// // %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/\n", "\n", "// Add the maven dependencies\n", "%maven ai.djl:api:0.4.0\n", "%maven org.slf4j:slf4j-api:1.7.26\n", "%maven org.slf4j:slf4j-simple:1.7.26\n", " \n", "// See https://github.com/awslabs/djl/blob/master/mxnet/mxnet-engine/README.md\n", "// for more MXNet library selection options\n", "%maven ai.djl.mxnet:mxnet-native-auto:1.6.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ai.djl.*;\n", "import ai.djl.nn.*;\n", "import ai.djl.nn.core.*;\n", "import ai.djl.training.*;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Determine your input and output size\n", "\n", "The MLP model uses a one dimensional vector as the input and the output. You should determine the appropriate size of this vector based on your input data and what you will use the output of the model for. In a later tutorial, we will use this model for Mnist image classification. \n", "\n", "Our input vector will have size `28x28` because the input images have a height and width of 28 and it takes only a single number to represent each pixel. For a color image, you would need to further multiply this by `3` for the RGB channels.\n", "\n", "Our output vector has size `10` because there are `10` possible classes for each image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "long inputSize = 28*28;\n", "long outputSize = 10;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Create a **SequentialBlock**\n", "\n", "### NDArray\n", "\n", "The core data type used for working with Deep Learning is the [NDArray](https://javadoc.djl.ai/api/0.4.0/index.html?ai/djl/ndarray/NDArray.html). An NDArray represents a multidimensional, fixed-size homogeneous array. It has very similar behavior to the Numpy python package with the addition of efficient computing. We also have a helper class, the [NDList](https://javadoc.djl.ai/api/0.4.0/index.html?ai/djl/ndarray/NDList.html) which is a list of NDArrays which can have different sizes and data types.\n", "\n", "### Block API\n", "\n", "In DJL, [Blocks](https://javadoc.djl.ai/api/0.4.0/index.html?ai/djl/nn/Block.html) serve a purpose similar to functions that convert an input `NDList` to an output `NDList`. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the function represented by the blocks get more and more accurate.\n", "\n", "When building these block functions, the easiest way is to use composition. Similar to how functions are built by calling other functions, blocks can be built by combining other blocks. We refer to the containing block as the parent and the sub-blocks as the children.\n", "\n", "\n", "We provide several helpers to make it easy to build common block composition structures. For the MLP we will use the [SequentialBlock](https://javadoc.djl.ai/api/0.4.0/index.html?ai/djl/nn/SequentialBlock.html), a container block whose children form a chain of blocks where each child block feeds its output to the next child block in a sequence.\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SequentialBlock block = new SequentialBlock();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Add blocks to SequentialBlock\n", "\n", "An MLP is organized into several layers. Each layer is composed of a [Linear Block](https://javadoc.djl.ai/api/0.4.0/index.html?ai/djl/nn/core/Linear.html) and a non-linear activation function. If we just had two linear blocks in a row, it would be the same as a combined linear block ($f(x) = W_2(W_1x) = (W_2W_1)x = W_{combined}x$). An activation is used to intersperse between the linear blocks to allow them to represent non-linear functions. We will use the popular [ReLU](https://javadoc.djl.ai/api/0.4.0/ai/djl/nn/Activation.html#reluBlock--) as our activation function.\n", "\n", "The first layer and last layers have fixed sizes depending on your desired input and output size. However, you are free to choose the number and sizes of the middle layers in the network. We will create a smaller MLP with two middle layers that gradually decrease the size. Typically, you would experiment with different values to see what works the best on your data set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "block.add(Blocks.batchFlattenBlock(inputSize));\n", "block.add(Linear.builder().setOutChannels(128).build());\n", "block.add(Activation::relu);\n", "block.add(Linear.builder().setOutChannels(64).build());\n", "block.add(Activation::relu);\n", "block.add(Linear.builder().setOutChannels(outputSize).build());\n", "\n", "block" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "Now that you've successfully created your first neural network, you can use this network to train your model.\n", "\n", "Next chapter: [Train your first model](train_your_first_model.ipynb)\n", "\n", "You can find the complete source code for this tutorial in the [model zoo](https://github.com/awslabs/djl/blob/master/model-zoo/src/main/java/ai/djl/basicmodelzoo/basic/Mlp.java)." ] } ], "metadata": { "kernelspec": { "display_name": "Java", "language": "java", "name": "java" }, "language_info": { "codemirror_mode": "java", "file_extension": ".jshell", "mimetype": "text/x-java-source", "name": "Java", "pygments_lexer": "java", "version": "11.0.5+10-LTS" } }, "nbformat": 4, "nbformat_minor": 2 }