# Preparations


Execute the following code blocks to configure the session and import relevant modules.

In [None]:
%config InlineBackend.figure_format ='retina'
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import os
import sys
import math
import numpy as np
import pandas as pd
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
from keras.optimizer_v2.adam import Adam
# Alternatively:
# from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import rnnutils

# Lab session: predicting time series, discrete state space


## Aims

In this lab the aim is to predict the next character in the alphabet given a short subsequence. In other words, the network learns to output the probability distribution of a character conditional on a sequence of input characters. Since the state space is discrete, you need to think about what output activation and loss function to use.

As in the previous lab, to help you along the way, some of the steps have been prepared in advance, but in most cases, your task is to complete missing code. Don't hesitate to change parameter settings and experiment with the model architectures.

## Prepare data

We will work with the English alphabet, which consists of 26 characters (states). Predictions are based on alphabet substrings: given the input "CDE" the model should output "F", given "STUV" it should output "W", and so on.

In [None]:
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

Since a neural network cannot deal directly with characters, we map each individual letter to an integer (integer encoding):

In [None]:
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
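
As a quick check of the integer encoding, the sketch below wraps the two dictionaries in a pair of helper functions. The names `encode` and `decode` are illustrative and not part of the lab code.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}
int_to_char = {i: c for i, c in enumerate(alphabet)}


def encode(text):
    """Map a string of uppercase letters to a list of integer codes."""
    return [char_to_int[c] for c in text]


def decode(codes):
    """Map a list of integer codes back to a string."""
    return "".join(int_to_char[i] for i in codes)


encoded = encode("CDE")
print(encoded)          # [2, 3, 4]
print(decode(encoded))  # CDE
```

Note that "A" is mapped to 0, which matters later when sequences are padded with zeros.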

Training data are generated by selecting slices of n consecutive characters (2 <= n <= 6) from the alphabet; the last character of a slice is the output and the preceding characters (at most five) are the input. The following code generates the training data.

In [None]:
num_inputs = 200  # number of training samples (randomly generated)
max_len = 5  # maximum length of an input sequence
dataX = []
dataY = []
for i in range(num_inputs):
    # random start position, leaving room for at least one input and one output character
    start = np.random.randint(len(alphabet)-2)
    # random end position of the input (inclusive), at most max_len characters from start
    end = np.random.randint(start, min(start+max_len, len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
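
To see what the generated pairs look like, the following sketch reproduces the sampling logic in a small function and summarises the input lengths. The function name `make_pairs` and the seeded `np.random.default_rng` generator are assumptions made here for reproducibility; the lab code itself uses `np.random.randint`.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}


def make_pairs(num_inputs=200, max_len=5, seed=0):
    """Generate (input codes, target code) pairs from alphabet slices."""
    rng = np.random.default_rng(seed)  # seeded generator (illustrative choice)
    data_x, data_y = [], []
    for _ in range(num_inputs):
        start = rng.integers(len(alphabet) - 2)
        end = rng.integers(start, min(start + max_len, len(alphabet) - 1))
        data_x.append([char_to_int[c] for c in alphabet[start:end + 1]])
        data_y.append(char_to_int[alphabet[end + 1]])
    return data_x, data_y


data_x, data_y = make_pairs()
lengths = [len(seq) for seq in data_x]
# Count how many inputs have length 0, 1, ..., max_len
print(np.bincount(lengths, minlength=6))
# Show the first few pairs as characters
for seq, target in list(zip(data_x, data_y))[:3]:
    print("".join(alphabet[i] for i in seq), "->", alphabet[target])
```

Every input has between one and five characters, and the target is always the letter that follows the last input character.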

Take a minute to inspect the entries of `dataX`. As you will see, they differ in length. Prior to training, input sequences shorter than five characters must be padded with zeros. Can you think of why this is necessary?

In [None]:
# convert list of lists to array and pad sequences if needed (with zero)
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = np.reshape(X, (X.shape[0], max_len, 1))
# normalize by length of alphabet (26)
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
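
The preprocessing steps above can be sketched in plain NumPy to make explicit what each one does. The helpers `pad_pre` and `one_hot` are hypothetical stand-ins written for this illustration; they mimic the default left-padding of `pad_sequences` and the one-hot encoding of `to_categorical`, but are not the Keras implementations.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def pad_pre(sequences, maxlen, value=0.0):
    """Left-pad (and left-truncate) integer sequences to a fixed length."""
    out = np.full((len(sequences), maxlen), value, dtype="float32")
    for row, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]
        if trimmed:  # guard: an empty slice would otherwise select the whole row
            out[row, -len(trimmed):] = trimmed
    return out


def one_hot(labels, num_classes):
    """Return a (samples, num_classes) array with a single 1 per row."""
    out = np.zeros((len(labels), num_classes), dtype="float32")
    out[np.arange(len(labels)), labels] = 1.0
    return out


# Three example inputs: "CDE", "STUV" and "A", with targets "F", "W" and "B"
data_x = [[2, 3, 4], [18, 19, 20, 21], [0]]
data_y = [5, 22, 1]

X = pad_pre(data_x, maxlen=5)          # shape (3, 5)
X = X.reshape(X.shape[0], 5, 1)        # [samples, time steps, features]
X = X / float(len(alphabet))           # scale codes to the range [0, 1)
y = one_hot(data_y, num_classes=len(alphabet))

print(X[:, :, 0] * len(alphabet))
print(y.shape)
```

Two details are worth noticing. First, the input "A" becomes a row of zeros, so a padded position and an "A" look identical to the network. Second, the number of classes is passed explicitly here; `to_categorical` called without `num_classes` infers it from the largest label present, so the width of `y` would be smaller than 26 if "Z" never appeared as a target.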

For convenience, we define parameters related to training and instantiate the optimizer (Adam). Test different values to see what happens. For more information, see [the keras documentation on Adam](https://keras.io/api/optimizers/adam).

In [None]:
epochsVal = 100
learnRateVal = 0.001
batchSizeVal = 10
opt = Adam(learning_rate=learnRateVal, decay=learnRateVal / epochsVal)
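
To get a feeling for what the `decay` argument does, the sketch below computes the effective learning rate over training. It assumes the inverse-time schedule of the legacy Keras optimizers, `lr / (1 + decay * iterations)`, where `iterations` counts batch updates rather than epochs; check the documentation of your installed version, since this behaviour is an assumption here. The training set size of 180 assumes 200 samples and a 10% validation split.

```python
import math

epochs_val = 100
learn_rate_val = 0.001
batch_size_val = 10
num_train = 180  # assumption: 200 samples minus a 10% validation split

decay = learn_rate_val / epochs_val
steps_per_epoch = math.ceil(num_train / batch_size_val)


def effective_lr(iteration):
    """Inverse-time decay (assumed legacy Keras behaviour)."""
    return learn_rate_val / (1.0 + decay * iteration)


for epoch in (0, 50, 100):
    print(epoch, effective_lr(epoch * steps_per_epoch))
```

Under these assumptions the learning rate decreases by less than 2% over the full run, so the decay has little practical effect with the default values.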

Finally, build your model. See the [keras LSTM documentation](https://keras.io/api/layers/recurrent_layers/lstm/) for information on more parameter settings, and [documentation on losses](https://keras.io/api/losses/) to learn more about different loss functions.

In [None]:
model = Sequential()
# Add LSTM layers
# model.add(LSTM(units=32, input_shape=(...), return_sequences=..., activation=...))
# Add dense layer with activation for categorical output
# model.add(Dense(..., activation=...))
# Compile model using loss function for categorical data
# model.compile(loss=... , optimizer=opt, metrics=["accuracy"])

In [None]:
model = Sequential()
# Add LSTM layer; X.shape[1] is the number of time steps (window size) and each time step has one feature
model.add(LSTM(units=8, input_shape=(X.shape[1], 1), return_sequences=False, activation="tanh"))
# Add dense layer with activation for categorical output
model.add(Dense(len(alphabet), activation="softmax"))
# Compile model using loss function for categorical data
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
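
The combination of a softmax output and categorical cross-entropy loss can be illustrated with a few lines of NumPy. This is a minimal sketch of the two formulas, not the Keras implementation; the function names and the clipping constant `eps` are choices made for this example.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Negative log-probability assigned to the true class, per sample."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)


# One sample whose true class is "F" (index 5)
y_true = np.zeros((1, 26))
y_true[0, 5] = 1.0

# Raw scores that favour class 5
logits = np.zeros((1, 26))
logits[0, 5] = 3.0
probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)

# A model that has learned nothing predicts a uniform distribution
uniform = np.full((1, 26), 1.0 / 26)
baseline = categorical_crossentropy(y_true, uniform)

print(probs.sum(), loss, baseline)
```

The uniform baseline equals log(26), roughly 3.26, which is a useful reference point when reading the loss values printed during training.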

Once you have compiled the model, fit it using the parameters you set above and evaluate.

In [None]:
H = model.fit(X, y, epochs=epochsVal, batch_size=batchSizeVal, validation_split=0.1, verbose=1)

In [None]:
scores = model.evaluate(X, y, verbose=0)
print("Model train accuracy: %.2f%%" % (scores[1]*100))
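
When interpreting the accuracy above, it helps to know which samples were used for validation. The sketch below assumes, following the Keras documentation for `validation_split`, that the validation samples are taken from the end of the data before shuffling; the helper `split_indices` is hypothetical and only illustrates the index arithmetic.

```python
import numpy as np


def split_indices(num_samples, validation_split):
    """Return (train, validation) index arrays, validation taken from the end."""
    split_at = int(num_samples * (1.0 - validation_split))
    indices = np.arange(num_samples)
    return indices[:split_at], indices[split_at:]


train_idx, val_idx = split_indices(200, 0.1)
print(len(train_idx), len(val_idx))  # 180 20
print(val_idx[0], val_idx[-1])       # 180 199
```

Since `model.evaluate(X, y)` is run on all 200 samples, the reported number mixes the 180 training samples with the 20 validation samples. Evaluating on `X[:180]` and `X[180:]` separately would, under this assumption, give the two accuracies individually.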

In [None]:
# Plot loss and accuracy from the training history (uncomment to run; requires the rnnutils module)
# rnnutils.plot_loss_acc(H)

## Printing predictions

Finally, to test some predictions, select an entry from the input data and run `model.predict`. The code below picks random input sequences from the training data and prints the model's prediction for each. If you increase the number of examples, you will probably see cases where the predictions are wrong.

In [None]:
num_examples = 2
for i in range(num_examples):
    pattern_index = np.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
    x = np.reshape(x, (1, max_len, 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = np.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(pattern_index, pattern, seq_in, "->", result)
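
Because the network outputs a full probability distribution, it can be informative to look beyond the single most likely character. The sketch below lists the `k` most probable letters for one prediction. The probability vector is a made-up stand-in for `model.predict(x)[0]`, and `top_k` is a hypothetical helper.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def top_k(probs, k=3):
    """Return the k most probable letters with their probabilities."""
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(alphabet[i], float(probs[i])) for i in order]


# Hypothetical output standing in for model.predict(x)[0]
probs = np.full(26, 0.01)
probs[5] = 0.60   # "F"
probs[6] = 0.16   # "G"

for letter, p in top_k(probs, k=2):
    print(letter, round(p, 2))
```

For wrong predictions, checking whether the correct letter is among the top candidates shows how far off the model actually is.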