# Implementing a simple deep neural network for MNIST classification

First, we need to load the mxnet gem.

In [2]:
require 'mxnet'

true

## Setting up the configuration

Initialize the variables that are used below.

In [3]:
@data_dir = File.expand_path("../data/mnist")
@data_ctx = MXNet.cpu
@model_ctx = MXNet.cpu
#@model_ctx = MXNet.gpu

Note that if you have a CUDA-capable GPU, setting `@model_ctx = MXNet.gpu` runs all of the model computation below on the GPU.

## Data loaders

Set up data loaders for both the training and validation sets.

In [4]:
num_inputs = 784
num_outputs = 10
batch_size = 64
num_examples = 60000

train_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 'train-images-idx3-ubyte'),
  label: File.join(@data_dir, 'train-labels-idx1-ubyte'),
  batch_size: batch_size,
  shuffle: true)
val_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 't10k-images-idx3-ubyte'),
  label: File.join(@data_dir, 't10k-labels-idx1-ubyte'),
  batch_size: batch_size,
  shuffle: false)
nil

## The parameters of the model

Initialize the weights and biases of the neural network.

In [5]:
#######################
# Set some constants so it's easy to modify the network later
#######################
num_hidden = 256
weight_scale = 0.01

#######################
# Allocate parameters for the first hidden layer
#######################
@w1 = MXNet::NDArray.random_normal(shape: [num_inputs, num_hidden], scale: weight_scale, ctx: @model_ctx)
@b1 = MXNet::NDArray.random_normal(shape: [num_hidden], scale: weight_scale, ctx: @model_ctx)

#######################
# Allocate parameters for the second hidden layer
#######################
@w2 = MXNet::NDArray.random_normal(shape: [num_hidden, num_hidden], scale: weight_scale, ctx: @model_ctx)
@b2 = MXNet::NDArray.random_normal(shape: [num_hidden], scale: weight_scale, ctx: @model_ctx)

#######################
# Allocate parameters for the output layer
#######################
@w3 = MXNet::NDArray.random_normal(shape: [num_hidden, num_outputs], scale: weight_scale, ctx: @model_ctx)
@b3 = MXNet::NDArray.random_normal(shape: [num_outputs], scale: weight_scale, ctx: @model_ctx)

nil

Mark all the parameters so that their gradients are computed automatically.

In [6]:
@all_params = [@w1, @b1, @w2, @b2, @w3, @b3]
@all_params.each(&:attach_grad)
nil
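
As a quick sanity check on the shapes allocated above, the number of trainable values can be worked out by hand. The sketch below uses plain Ruby only; `dense_param_count` is a hypothetical helper introduced for illustration and is not part of the mxnet gem.

```ruby
# Layer sizes taken from the cells above.
num_inputs = 784
num_hidden = 256
num_outputs = 10

# Hypothetical helper: a dense layer has a weight matrix [fan_in, fan_out]
# plus a bias vector [fan_out].
def dense_param_count(fan_in, fan_out)
  fan_in * fan_out + fan_out
end

layer_shapes = [
  [num_inputs, num_hidden],  # @w1, @b1
  [num_hidden, num_hidden],  # @w2, @b2
  [num_hidden, num_outputs]  # @w3, @b3
]

total = layer_shapes.sum { |fan_in, fan_out| dense_param_count(fan_in, fan_out) }
puts "trainable parameters: #{total}"
```

Most of the parameters sit in the first layer, because it maps all 784 input pixels to the hidden units.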

## Activation and loss functions

Define the ReLU activation function.

In [7]:
def relu(x)
  MXNet::NDArray.maximum(x, MXNet::NDArray.zeros_like(x))
end

:relu
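
To see what the element-wise maximum with zero does, here is a minimal plain-Ruby sketch on an ordinary array. `relu_array` is a hypothetical name used only for this illustration; it does not touch MXNet.

```ruby
# Plain-Ruby ReLU: negative entries become 0.0, all others pass through.
def relu_array(values)
  values.map { |v| [v, 0.0].max }
end

p relu_array([-2.0, -0.5, 0.0, 1.5, 3.0])
```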

Define the softmax cross-entropy function, which is used to compute the prediction loss.

In [8]:
def softmax_cross_entropy(y_hat_linear, y)
  return -MXNet::NDArray.nansum(y * MXNet::NDArray.log_softmax(y_hat_linear), axis: 0, exclude: true)
end

:softmax_cross_entropy
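
The same quantity can be sketched for a single example in plain Ruby, which may make the formula easier to follow. The helper names (`log_softmax_row`, `one_hot_row`, `cross_entropy_row`) are hypothetical and exist only in this illustration. The sketch assumes the label is one-hot encoded, as it is in the training loop later in this notebook.

```ruby
# Numerically stable log-softmax for one row of logits:
# subtract the maximum before exponentiating.
def log_softmax_row(logits)
  max = logits.max
  log_sum = Math.log(logits.sum { |z| Math.exp(z - max) })
  logits.map { |z| z - max - log_sum }
end

# One-hot encode a class index, e.g. label 1 with depth 3 gives [0.0, 1.0, 0.0].
def one_hot_row(label, depth)
  Array.new(depth) { |k| k == label ? 1.0 : 0.0 }
end

# Cross entropy: minus the sum of one_hot * log_softmax over the classes.
def cross_entropy_row(logits, one_hot)
  log_probs = log_softmax_row(logits)
  weighted = log_probs.zip(one_hot).sum { |log_p, y| y * log_p }
  -weighted
end

uniform = cross_entropy_row([0.0, 0.0, 0.0], one_hot_row(1, 3))
confident = cross_entropy_row([0.0, 5.0, 0.0], one_hot_row(1, 3))
puts "uniform logits:   #{uniform}"
puts "confident logits: #{confident}"
```

With uniform logits over three classes the loss equals log(3), and it shrinks as the logit of the correct class grows.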

## Model definition

Define the neural network: two hidden layers with ReLU activations, followed by a linear output layer.

In [9]:
def net(x)
  # first hidden layer
  h1_linear = MXNet::NDArray.dot(x, @w1) + @b1
  h1 = relu(h1_linear)

  # second hidden layer
  h2_linear = MXNet::NDArray.dot(h1, @w2) + @b2
  h2 = relu(h2_linear)

  # output layer
  y_hat_linear = MXNet::NDArray.dot(h2, @w3) + @b3
  return y_hat_linear
end

:net
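
The structure of `net` can be traced by hand on a tiny example. The following plain-Ruby sketch mirrors the three layers with 2-unit hidden layers and hand-picked weights; `dense_row`, `relu_row`, `tiny_net`, and `TINY_PARAMS` are all hypothetical names chosen for this illustration.

```ruby
# Affine map for one input row: out[j] = sum_i x[i] * w[i][j] + b[j].
def dense_row(x, w, b)
  b.each_with_index.map do |bias, j|
    x.each_with_index.sum { |xi, i| xi * w[i][j] } + bias
  end
end

def relu_row(values)
  values.map { |v| [v, 0.0].max }
end

# Two hidden layers with ReLU, then a linear output layer (raw scores, no softmax).
def tiny_net(x, params)
  h1 = relu_row(dense_row(x, params[:w1], params[:b1]))
  h2 = relu_row(dense_row(h1, params[:w2], params[:b2]))
  dense_row(h2, params[:w3], params[:b3])
end

# Hand-picked weights so the result is easy to verify on paper.
TINY_PARAMS = {
  w1: [[1.0, -1.0], [0.5, 0.5]], b1: [0.0, 0.0],
  w2: [[1.0, 0.0], [0.0, 1.0]],  b2: [0.0, 0.0],
  w3: [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]], b3: [0.5, 0.5, 0.5]
}.freeze

p tiny_net([2.0, 2.0], TINY_PARAMS)
```

For the input `[2.0, 2.0]`, the first layer yields `[3.0, -1.0]`, ReLU clips that to `[3.0, 0.0]`, and the output layer then produces one raw score per class.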

## Parameter optimizer

Define the parameter optimizer. This notebook uses stochastic gradient descent (SGD).

In [10]:
def sgd(params, lr)
  params.each do |param|
    param[0..-1] = param - lr * param.grad
  end
end

:sgd
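
The update rule is `param <- param - lr * grad`, applied in place to each parameter. As a rough illustration, the plain-Ruby sketch below performs one such step on ordinary arrays. `sgd_step!` is a hypothetical helper, and the gradients are passed in explicitly here, whereas the notebook reads them from `param.grad`.

```ruby
# One in-place SGD step on plain arrays: param[i] -= lr * grad[i].
def sgd_step!(params, grads, lr)
  params.each_with_index do |param, k|
    param.each_index { |i| param[i] -= lr * grads[k][i] }
  end
end

w = [1.0, -2.0]
b = [0.5]
sgd_step!([w, b], [[1.0, -2.0], [1.0]], 0.5)
p w
p b
```

Each value moves against its gradient by `lr` times the gradient, so a positive gradient lowers the parameter and a negative one raises it.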

## Evaluator

The next function calculates the prediction accuracy on the given data set.

In [11]:
def evaluate_accuracy(data_iter)
  numerator = 0.0
  denominator = 0.0
  data_iter.each_with_index do |batch, i|
    # Flatten each 28x28 image into a 784-element row.
    data = batch.data[0].as_in_context(@model_ctx).reshape([-1, 784])
    label = batch.label[0].as_in_context(@model_ctx)
    output = net(data)
    # The predicted class is the index of the largest output.
    predictions = MXNet::NDArray.argmax(output, axis: 1)
    numerator += MXNet::NDArray.sum(predictions == label)
    denominator += data.shape[0]
  end
  return (numerator / denominator).as_scalar
end

:evaluate_accuracy
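
The core of the evaluator is "take the argmax of each output row and compare it with the label". A minimal plain-Ruby sketch of that idea, using hypothetical helpers `argmax_index` and `accuracy` and made-up scores, could look like this:

```ruby
# Index of the largest value in a row of scores.
def argmax_index(row)
  row.each_with_index.max_by { |value, _index| value }.last
end

# Fraction of rows whose argmax matches the corresponding label.
def accuracy(score_rows, labels)
  correct = score_rows.zip(labels).count { |row, label| argmax_index(row) == label }
  correct.to_f / labels.size
end

# Made-up scores for four examples and three classes.
scores = [
  [0.1, 2.0, -1.0],
  [3.0, 0.0, 0.0],
  [0.0, 0.5, 0.2],
  [1.0, 1.5, 4.0]
]
labels = [1, 0, 2, 2]
puts accuracy(scores, labels)
```

Three of the four rows have their largest score at the labelled index, so the accuracy is 0.75.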

## Training loop

Execute the training loop for 10 epochs.

In [12]:
epochs = 10
learning_rate = 0.001
smoothing_constant = 0.01

epochs.times do |e|
  start = Time.now
  cumulative_loss = 0.0
  train_iter.each_with_index do |batch, i|
    data = batch.data[0].as_in_context(@model_ctx).reshape([-1, 784])
    label = batch.label[0].as_in_context(@model_ctx)
    label_one_hot = MXNet::NDArray.one_hot(label, depth: 10)
    # Record the forward pass so that gradients can be computed afterwards.
    loss = MXNet::Autograd.record do
      output = net(data)
      softmax_cross_entropy(output, label_one_hot)
    end
    loss.backward()
    sgd(@all_params, learning_rate)
    cumulative_loss += MXNet::NDArray.sum(loss).as_scalar
  end

  val_accuracy = evaluate_accuracy(val_iter)
  train_accuracy = evaluate_accuracy(train_iter)
  duration = Time.now - start
  puts "Epoch #{e}. Loss: #{cumulative_loss/num_examples}, Train_acc #{train_accuracy}, Val_acc #{val_accuracy} (#{duration} sec)"
end

Epoch 0. Loss: 1.2580198553880055, Train_acc 0.8733324408531189, Val_acc 0.8747996687889099 (3.4784769 sec)
Epoch 1. Loss: 0.3402405142625173, Train_acc 0.925293505191803, Val_acc 0.9240785241127014 (2.9854487 sec)
Epoch 2. Loss: 0.2273845267256101, Train_acc 0.9493563175201416, Val_acc 0.9460136294364929 (3.1201378 sec)
Epoch 3. Loss: 0.16553548452655475, Train_acc 0.9613627195358276, Val_acc 0.9580328464508057 (3.3423264 sec)
Epoch 4. Loss: 0.1294232620060444, Train_acc 0.969266951084137, Val_acc 0.9645432829856873 (3.1851936 sec)
Epoch 5. Loss: 0.10531740031341712, Train_acc 0.9744197130203247, Val_acc 0.9686498641967773 (3.008946 sec)
Epoch 6. Loss: 0.08776766262004773, Train_acc 0.9786719679832458, Val_acc 0.9706530570983887 (2.9635756 sec)
Epoch 7. Loss: 0.07455756265173356, Train_acc 0.9818903207778931, Val_acc 0.97265625 (2.9956695 sec)
Epoch 8. Loss: 0.06412682893921931, Train_acc 0.9842416048049927, Val_acc 0.973557710647583 (2.9654095 sec)
Epoch 9. Loss: 0.05563920784989993,

10

We now have a model with about 0.97 validation accuracy.

## Prediction with the trained model

Let's use the trained model for prediction.

We need the following helper function to display the input image.

In [13]:
require 'chunky_png'
require 'base64'

def imshow(ary)
  height, width = ary.shape
  fig = ChunkyPNG::Image.new(width, height, ChunkyPNG::Color::TRANSPARENT)
  # Min-max scale the pixel values to the 0..255 range.
  ary = ((ary - ary.min) / (ary.max - ary.min)) * 255
  0.upto(height - 1) do |i|
    0.upto(width - 1) do |j|
      v = ary[i, j].round
      fig[j, i] = ChunkyPNG::Color.rgba(v, v, v, 255)
    end
  end

  # Embed the PNG as a data URL so that the notebook can render it inline.
  src = 'data:image/png;base64,' + Base64.strict_encode64(fig.to_blob)
  IRuby.display "<img src='#{src}' />", mime: 'text/html'
end

:imshow
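
The scaling step of the helper can be illustrated on its own. The plain-Ruby sketch below shows one common variant, min-max scaling to integer gray levels, on a flat array. `to_gray_levels` is a hypothetical name, and the guard for constant input is an addition made for this example.

```ruby
# Min-max scale arbitrary values to integer gray levels in 0..255.
def to_gray_levels(pixels)
  min, max = pixels.minmax
  range = max - min
  # A constant image has no contrast; map it to black instead of dividing by zero.
  return pixels.map { 0 } if range.zero?

  pixels.map { |v| ((v - min) * 255.0 / range).round }
end

p to_gray_levels([-1.0, 0.0, 3.0])
p to_gray_levels([0.7, 0.7])
```

The smallest value maps to 0 and the largest to 255, whatever range the input started in.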

Define the function for prediction.

In [14]:
def predict(data)
  output = net(data)
  MXNet::NDArray.argmax(output, axis: 1)
end

:predict

Create a new data iterator for prediction and generate predictions for one batch of 10 samples.

In [15]:
sample_size = 10
sample_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 't10k-images-idx3-ubyte'),
  label: File.join(@data_dir, 't10k-labels-idx1-ubyte'),
  batch_size: sample_size,
  shuffle: true,
  seed: rand(100))
sample_iter.each do |batch|
  data = batch.data[0].as_in_context(@model_ctx)
  label = batch.label[0]

  # Stack the sampled images vertically into one tall image for display.
  im = data.transpose(axes: [1, 0, 2, 3]).reshape([sample_size * 28, 28, 1])
  imshow(im[0..-1, 0..-1, 0].to_narray)

  pred = predict(data.reshape([-1, 784]))
  puts "model predictions are: #{pred.inspect}"
  puts
  puts "true labels: #{label.inspect}"
  break
end

model predictions are: 
[0, 2, 7, 8, 7, 1, 5, 4, 3, 9]


true labels: 
[0, 2, 3, 8, 7, 1, 5, 8, 3, 9]
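
To make the comparison between the two lists explicit, the plain-Ruby sketch below copies the values printed above and lists the positions where they disagree. `mismatches` and the two constants are hypothetical names used only here.

```ruby
# Values copied from the output above.
SAMPLE_PREDICTIONS = [0, 2, 7, 8, 7, 1, 5, 4, 3, 9].freeze
SAMPLE_LABELS      = [0, 2, 3, 8, 7, 1, 5, 8, 3, 9].freeze

# Collect every position where the prediction differs from the true label.
def mismatches(predictions, labels)
  predictions.each_with_index
             .reject { |pred, i| pred == labels[i] }
             .map { |pred, i| { index: i, predicted: pred, actual: labels[i] } }
end

wrong = mismatches(SAMPLE_PREDICTIONS, SAMPLE_LABELS)
correct = SAMPLE_LABELS.size - wrong.size
puts "#{correct} of #{SAMPLE_LABELS.size} correct"
wrong.each { |m| puts "sample #{m[:index]}: predicted #{m[:predicted]}, actual #{m[:actual]}" }
```

For this particular batch, 8 of the 10 predictions match the labels. Because the iterator shuffles with a random seed, another run will draw a different batch.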



## Conclusion

In this notebook, we implemented a simple neural network for MNIST classification using the MXNet NDArray API.

A more complex example is available here: https://github.com/mrkn/mxnet.rb/blob/taiwan2018/example/scratch/resnet/wrn.rb