# DJL BERT Inference Demo

## Introduction

In this tutorial, you walk through running inference using DJL on a [BERT](https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270) QA model trained with MXNet. 
You can provide a question and a paragraph containing the answer to the model. The model is then able to find the best answer from the answer paragraph.

Example:
```text
Q: When did BBC Japan start broadcasting?
```

Answer paragraph:
```text
BBC Japan was a general entertainment channel, which operated between December 2004 and April 2006.
It ceased operations after its Japanese distributor folded.
```
The model picks the right answer:
```text
A: December 2004
```


## Preparation

This tutorial requires the Java kernel for Jupyter. To install it, see the [README](https://github.com/awslabs/djl/blob/v0.3.0/jupyter/README.md).

In [None]:
%maven ai.djl:api:0.3.0
%maven ai.djl.mxnet:mxnet-engine:0.3.0
%maven ai.djl:repository:0.3.0
%maven ai.djl.mxnet:mxnet-model-zoo:0.3.0
%maven org.slf4j:slf4j-api:1.7.26
%maven org.slf4j:slf4j-simple:1.7.26
%maven net.java.dev.jna:jna:5.3.0
 
// See https://github.com/awslabs/djl/blob/v0.3.0/mxnet/mxnet-engine/README.md
// for more MXNet library selection options
%maven ai.djl.mxnet:mxnet-native-auto:1.6.0

### Import Java packages by running the following:

In [None]:
import java.io.*;
import java.nio.charset.*;
import java.nio.file.*;
import java.util.*;
import com.google.gson.*;
import com.google.gson.annotations.*;
import ai.djl.*;
import ai.djl.inference.*;
import ai.djl.metric.*;
import ai.djl.mxnet.zoo.*;
import ai.djl.mxnet.zoo.nlp.qa.*;
import ai.djl.repository.zoo.*;
import ai.djl.ndarray.*;
import ai.djl.ndarray.types.*;
import ai.djl.training.util.*;
import ai.djl.translate.*;
import ai.djl.util.*;


Now that all of the prerequisites are complete, start writing code to run inference with this example.

## Load the model and input

The model requires three inputs:

- word indices: the index of each word (token) in the vocabulary
- word types: the type index of each word. All question tokens are labelled 0 and all answer tokens are labelled 1.
- valid length: the actual length of the question and answer tokens

You also need to fix the sequence length, because the input is padded to a fixed size. In this case, the sequence length is 384.

**First, load the input**


In [None]:
var question = "When did BBC Japan start broadcasting?";
var resourceDocument = "BBC Japan was a general entertainment Channel.\n" +
 "Which operated between December 2004 and April 2006.\n" +
 "It ceased operations after its Japanese distributor folded.";

QAInput input = new QAInput(question, resourceDocument, 384);

Then load the model and vocabulary. Create a variable `model` by using the `ModelZoo` as shown in the following code. 

In [None]:
Criteria<QAInput, String> criteria = Criteria.builder()
    .optApplication(Application.NLP.QUESTION_ANSWER)
    .setTypes(QAInput.class, String.class)
    .optFilter("backbone", "bert")
    .optFilter("dataset", "book_corpus_wiki_en_uncased")
    .optProgress(new ProgressBar()).build();
ZooModel<QAInput, String> model = ModelZoo.loadModel(criteria);

## Run inference
Once the model is loaded, you can create a `Predictor` and run inference as follows:

In [None]:
Predictor<QAInput, String> predictor = model.newPredictor();
String answer = predictor.predict(input);
answer

Running inference on DJL is that easy. In the example, you use a model from the `ModelZoo`. However, you can also load the model on your own and use custom classes as the input and output. The process for that is illustrated in greater detail later in this tutorial. 

## Dive deep into Translator

Inference in deep learning is the process of predicting the output for a given input based on a pre-defined model. 
DJL abstracts away the whole process for ease of use. It can load the model, perform inference on the input, and provide 
output. DJL also allows you to provide user-defined inputs. The workflow looks like the following:

![image](https://github.com/awslabs/djl/blob/v0.3.0/examples/docs/img/workFlow.png?raw=true)

The red block ("Images") in the workflow is the input that DJL expects from you. The green block ("Images 
bounding box") is the output that you expect. Because DJL does not know which input to expect or which output format you prefer, it provides the `Translator` interface so you can define your own 
input and output. 

The `Translator` interface encompasses the two white blocks: Pre-processing and Post-processing. The pre-processing 
component converts the user-defined input objects into an NDList, so that the `Predictor` in DJL can understand the 
input and make its prediction. Similarly, the post-processing block receives an NDList as the output from the 
`Predictor`. The post-processing block allows you to convert the output from the `Predictor` to the desired output 
format. 
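
The flow described above can be sketched without DJL at all. The following is a simplified illustration only: `MiniTranslator`, `predict`, `LENGTH_LABELER`, and `DOUBLER` are hypothetical names invented for this sketch, and a plain `float[]` stands in for DJL's `NDList`. It is not DJL's actual `Translator` or `Predictor` implementation.

```java
import java.util.function.Function;

// Hypothetical, DJL-free illustration of pre-processing -> model -> post-processing.
final class TranslatorSketch {

    // Simplified stand-in for a translator; float[] plays the role of NDList.
    interface MiniTranslator<I, O> {
        float[] processInput(I input);

        O processOutput(float[] rawOutput);
    }

    // Simplified stand-in for a predictor: it chains the three steps.
    static <I, O> O predict(
            Function<float[], float[]> model, MiniTranslator<I, O> translator, I input) {
        float[] tensorIn = translator.processInput(input); // pre-processing
        float[] tensorOut = model.apply(tensorIn);         // forward pass
        return translator.processOutput(tensorOut);        // post-processing
    }

    // Toy translator: encodes a String as its length, decodes a score as a label.
    static final MiniTranslator<String, String> LENGTH_LABELER =
            new MiniTranslator<String, String>() {
                @Override
                public float[] processInput(String input) {
                    return new float[] {input.length()};
                }

                @Override
                public String processOutput(float[] rawOutput) {
                    return rawOutput[0] > 10f ? "long" : "short";
                }
            };

    // Toy "model": doubles every input value.
    static final Function<float[], float[]> DOUBLER =
            in -> {
                float[] out = new float[in.length];
                for (int i = 0; i < in.length; i++) {
                    out[i] = in[i] * 2f;
                }
                return out;
            };

    public static void main(String[] args) {
        System.out.println(predict(DOUBLER, LENGTH_LABELER, "hello"));  // 5 * 2 = 10 -> short
        System.out.println(predict(DOUBLER, LENGTH_LABELER, "hello!")); // 6 * 2 = 12 -> long
    }
}
```

The caller only ever sees `String` in and `String` out; the tensor representation stays inside the translator, which is the same separation the `Translator` interface provides.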

### Pre-processing

Now, you need to convert the sentences into tokens. You can use `BertDataParser.tokenizer` to convert the question and the answer paragraph into tokens. Then, use `BertDataParser.formTokens` to create BERT-formatted tokens. Once you have properly formatted tokens, use `parser.token2idx` to look up their indices in the vocabulary.

The following code blocks convert the question and answer defined earlier into BERT-formatted tokens and create word types for the tokens. 

In [None]:
// Create token lists for question and answer
List<String> tokenQ = BertDataParser.tokenizer(question.toLowerCase());
List<String> tokenA = BertDataParser.tokenizer(resourceDocument.toLowerCase());
int validLength = tokenQ.size() + tokenA.size();
System.out.println("Question Token: " + tokenQ);
System.out.println("Answer Token: " + tokenA);
System.out.println("Valid length: " + validLength);
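
To make "BERT-formatted" concrete, here is a minimal sketch of the layout commonly used for BERT question answering: `[CLS]`, the question tokens, `[SEP]`, the answer tokens, `[SEP]`, then `[PAD]` up to the sequence length. The `layout` method is a hypothetical helper written for this illustration. It assumes the standard BERT convention and is not the source of `BertDataParser.formTokens`, which may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the usual BERT QA token layout.
final class BertLayoutSketch {

    // Builds [CLS] question [SEP] answer [SEP] and pads with [PAD] to seqLength.
    static List<String> layout(List<String> question, List<String> answer, int seqLength) {
        List<String> tokens = new ArrayList<>();
        tokens.add("[CLS]");
        tokens.addAll(question);
        tokens.add("[SEP]");
        tokens.addAll(answer);
        tokens.add("[SEP]");
        if (tokens.size() > seqLength) {
            throw new IllegalArgumentException("input is longer than seqLength");
        }
        while (tokens.size() < seqLength) {
            tokens.add("[PAD]");
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> q = Arrays.asList("when", "did");
        List<String> a = Arrays.asList("december", "2004");
        // Prints [[CLS], when, did, [SEP], december, 2004, [SEP], [PAD], [PAD]]
        System.out.println(layout(q, a, 9));
    }
}
```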

Normally, words and sentences are represented as indices instead of strings for training. Each index typically maps to a vector in an n-dimensional space. In this case, you need to map the tokens to indices. `formTokens` also pads the sentence to the required length.
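
As a rough illustration of the token-to-index mapping, the sketch below looks up each token in a tiny vocabulary and falls back to an unknown-token index. The vocabulary, its index values, and the `toIndices` helper are all made up for this example; the real mapping comes from the model's `vocab.json` through `parser.token2idx`, and its handling of unknown tokens may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of mapping tokens to vocabulary indices.
final class VocabSketch {

    // Toy vocabulary; the index values are invented and unrelated to vocab.json.
    static Map<String, Integer> toyVocab() {
        String[] entries = {"[PAD]", "[UNK]", "[CLS]", "[SEP]", "when", "did", "bbc"};
        Map<String, Integer> vocab = new HashMap<>();
        for (int i = 0; i < entries.length; i++) {
            vocab.put(entries[i], i);
        }
        return vocab;
    }

    // Looks up every token; tokens missing from the vocabulary map to [UNK].
    static List<Integer> toIndices(List<String> tokens, Map<String, Integer> vocab) {
        int unknown = vocab.get("[UNK]");
        List<Integer> indices = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            indices.add(vocab.getOrDefault(token, unknown));
        }
        return indices;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("[CLS]", "when", "did", "bbc", "japan", "[SEP]", "[PAD]", "[PAD]");
        // "japan" is not in the toy vocabulary, so it maps to [UNK] (index 1).
        System.out.println(toIndices(tokens, toyVocab())); // [2, 4, 5, 6, 1, 3, 0, 0]
    }
}
```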

In [None]:
// Create Bert-formatted tokens
List<String> tokens = BertDataParser.formTokens(tokenQ, tokenA, 384);
// Convert tokens into indices in the vocabulary
BertDataParser parser = model.getArtifact("vocab.json", BertDataParser::parse);
List<Integer> indices = parser.token2idx(tokens);

Finally, the model needs to understand which part is the Question and which part is the Answer. Mask the tokens as follows:
```
[Question tokens...AnswerTokens...padding tokens] => [000000...11111....0000]
```

In [None]:
// Get token types
List<Float> tokenTypes = BertDataParser.getTokenTypes(tokenQ, tokenA, 384);
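
The mask shown above can be reproduced with a few lines of plain Java. This is a sketch of the idea only: `tokenTypes` here is a hypothetical helper that takes segment lengths directly, and it makes no claim about how `BertDataParser.getTokenTypes` counts special tokens such as `[CLS]` and `[SEP]`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the question/answer/padding type mask.
final class TokenTypeSketch {

    // Returns 0 for question positions, 1 for answer positions, 0 for padding.
    static List<Float> tokenTypes(int questionLength, int answerLength, int seqLength) {
        if (questionLength < 0 || answerLength < 0
                || questionLength + answerLength > seqLength) {
            throw new IllegalArgumentException("segments do not fit into seqLength");
        }
        List<Float> types = new ArrayList<>(seqLength);
        for (int i = 0; i < seqLength; i++) {
            boolean inAnswer = i >= questionLength && i < questionLength + answerLength;
            types.add(inAnswer ? 1f : 0f);
        }
        return types;
    }

    public static void main(String[] args) {
        // 3 question positions, 2 answer positions, 2 padding positions
        System.out.println(tokenTypes(3, 2, 7)); // [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    }
}
```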

To convert these lists into `float[]` for `NDArray` creation, use the following helper function:

In [None]:
/**
 * Convert a List of Number to float array.
 *
 * @param list the list to be converted
 * @return float array
 */
public static float[] toFloatArray(List<? extends Number> list) {
    float[] ret = new float[list.size()];
    int idx = 0;
    for (Number n : list) {
        ret[idx++] = n.floatValue();
    }
    return ret;
}

float[] indicesFloat = toFloatArray(indices);
float[] types = toFloatArray(tokenTypes);

Now that you have everything you need, you can create an NDList and populate all of the inputs you formatted earlier. You're done with pre-processing! 

#### Construct `Translator`

You need to do this processing within an implementation of the `Translator` interface. `Translator` is designed to do pre-processing and post-processing. You must define the input and output types. It declares the following two methods for you to override:
- `public NDList processInput(TranslatorContext ctx, I input)`
- `public O processOutput(TranslatorContext ctx, NDList list)`

Every translator takes input and returns output in the form of generic objects. In this case, the translator takes input in the form of `QAInput` (I) and returns output as a `String` (O). `QAInput` is just an object that holds the question and the answer paragraph; the input class is already prepared for you.

Armed with the needed knowledge, you can write an implementation of the `Translator` interface. `BertTranslator` uses the code snippets explained previously to implement the `processInput` method. To create the `NDArray` inputs, it uses the `create` methods of [`NDManager`](https://javadoc.djl.ai/api/0.3.0/ai/djl/ndarray/NDManager.html), which take the following general form:

```
manager.create(Number[] data, Shape)
manager.create(Number[] data)
```

The `Shape` for `data0` and `data1` is (num_of_batches, sequence_length). `data2` holds a single value, the valid length.

In [None]:

public class BertTranslator implements Translator<QAInput, String> {
    private BertDataParser parser;
    private List<String> tokens;
    private int seqLength;

    BertTranslator(BertDataParser parser) {
        this.parser = parser;
        this.seqLength = 384;
    }

    @Override
    public Batchifier getBatchifier() {
        return null;
    }

    @Override
    public NDList processInput(TranslatorContext ctx, QAInput input) throws IOException {
        // Pre-processing - tokenize sentence
        // Create token lists for question and answer, reading both from the
        // QAInput argument rather than from the notebook's global variables
        List<String> tokenQ = BertDataParser.tokenizer(input.getQuestion().toLowerCase());
        List<String> tokenA = BertDataParser.tokenizer(input.getParagraph().toLowerCase());

        // Calculate valid length (length(Question tokens) + length(resourceDocument tokens))
        var validLength = tokenQ.size() + tokenA.size();

        // Create Bert-formatted tokens; keep them for use in processOutput
        tokens = BertDataParser.formTokens(tokenQ, tokenA, seqLength);

        if (tokens == null) {
            throw new IllegalStateException("tokens is not defined");
        }

        // Convert tokens into indices in the vocabulary
        List<Integer> indices = parser.token2idx(tokens);
        // Get token types
        List<Float> tokenTypes = BertDataParser.getTokenTypes(tokenQ, tokenA, seqLength);

        NDManager manager = ctx.getNDManager();

        // Using the manager, create NDArrays for the indices, types, and valid length,
        // in that order. The type of each NDArray should be float
        NDArray indicesNd = manager.create(toFloatArray(indices), new Shape(1, seqLength));
        indicesNd.setName("data0");
        NDArray typesNd = manager.create(toFloatArray(tokenTypes), new Shape(1, seqLength));
        typesNd.setName("data1");
        NDArray validLengthNd = manager.create(new float[]{validLength});
        validLengthNd.setName("data2");

        NDList list = new NDList(3);
        list.add(indicesNd);
        list.add(typesNd);
        list.add(validLengthNd);

        return list;
    }

    @Override
    public String processOutput(TranslatorContext ctx, NDList list) {
        NDArray array = list.singletonOrThrow();
        // Split the output into start logits and end logits
        NDList output = array.split(2, 2);
        // Get the formatted logits result
        NDArray startLogits = output.get(0).reshape(new Shape(1, -1));
        NDArray endLogits = output.get(1).reshape(new Shape(1, -1));
        // Get Probability distribution
        NDArray startProb = startLogits.softmax(-1);
        NDArray endProb = endLogits.softmax(-1);
        // The most probable start and end positions delimit the answer span
        int startIdx = (int) startProb.argMax(1).getLong();
        int endIdx = (int) endProb.argMax(1).getLong();
        return tokens.subList(startIdx, endIdx + 1).toString();
    }
}

Congrats! You have created your first Translator! The `processOutput()` method is pre-filled: it processes the `NDList` returned by the model and converts it into the desired format. Together, `processInput()` and `processOutput()` offer the flexibility to get the predictions from the model in any format you desire. 
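
The post-processing idea (softmax over the start and end logits, then argmax, then slicing the token list) can also be sketched with plain arrays. The `SpanSketch` class, its method names, and the logit values below are invented for illustration and do not come from the model. The guard for an end index that precedes the start index is an addition of this sketch, not something the translator in this tutorial does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of answer-span extraction from start/end logits.
final class SpanSketch {

    // Numerically stable softmax: subtract the maximum before exponentiating.
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) {
            max = Math.max(max, v);
        }
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Index of the largest value (first one wins on ties).
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    // Picks the most probable start and end positions and returns the tokens between them.
    static List<String> extractSpan(List<String> tokens, double[] startLogits, double[] endLogits) {
        int start = argMax(softmax(startLogits));
        int end = argMax(softmax(endLogits));
        if (end < start) {
            return Collections.emptyList(); // sketch-only guard against an inverted span
        }
        return new ArrayList<>(tokens.subList(start, end + 1));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("[CLS]", "when", "[SEP]", "december", "2004", "[SEP]");
        double[] startLogits = {0.1, 0.2, 0.0, 3.0, 0.5, 0.1}; // made-up values
        double[] endLogits = {0.0, 0.1, 0.2, 0.4, 2.5, 0.3};   // made-up values
        System.out.println(extractSpan(tokens, startLogits, endLogits)); // [december, 2004]
    }
}
```

Because softmax preserves ordering, taking the argmax of the raw logits would select the same positions; the probabilities are mainly useful if you also want a confidence score.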


With the Translator implemented, you need to bring up the predictor that uses your `Translator` to start making predictions. You can find the usage for `Predictor` in the [Predictor Javadoc](https://javadoc.djl.ai/api/0.3.0/ai/djl/inference/Predictor.html). Create a translator and use the `question` and `resourceDocument` provided previously.

In [None]:
String predictResult = null;

QAInput input = new QAInput(question, resourceDocument, 384);
BertTranslator translator = new BertTranslator(parser);

// Create a Predictor and use it to predict the output
try (Predictor<QAInput, String> predictor = model.newPredictor(translator)) {
    predictResult = predictor.predict(input);
}

System.out.println(question);
System.out.println(predictResult);

Based on the input, the following result will be shown:
```
[december, 2004]
```
That's it! 

You can try more questions and answer paragraphs. Here is a sample:

**Answer Material**

The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.


**Question**

Q: When were the Normans in Normandy?
A: 10th and 11th centuries

Q: In what country is Normandy located?
A: france