# BentoML Example: Keras Text Classification

**BentoML makes moving trained ML models to production easy:**

* Package models trained with **any ML framework** and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with *adaptive micro-batching* support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it *adaptable to your infrastructure*

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and to enable teams to deliver prediction services in a fast, repeatable, and scalable way.

Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML. 

This notebook demonstrates how to use BentoML to turn a Keras model into a Docker image containing a REST API server that serves the model, how to use an ML service built with BentoML as a CLI tool, and how to distribute it as a PyPI package.

This notebook is based on Keras's [IMDB LSTM tutorial](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py).


In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
!pip install -q bentoml tensorflow==1.14.0 "numpy>=1.16.6"



In [2]:
from __future__ import absolute_import, division, print_function

import numpy as np
import tensorflow as tf
print("Tensorflow Version: %s" % tf.__version__)

from tensorflow import keras
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.layers import LSTM
from tensorflow.keras.datasets import imdb

import bentoml
print("BentoML Version: %s" % bentoml.__version__)

Tensorflow Version: 1.14.0
BentoML Version: 0.9.0.pre+7.g8af1c8b


In [3]:
max_features = 1000
maxlen = 80 # cut texts after this number of words (among top max_features most common words)
batch_size = 300
index_from = 3 # word index offset

# Prepare Dataset
Download the IMDB dataset

In [4]:
# A dictionary mapping words to an integer index
imdb.load_data(num_words=max_features)
word_index = imdb.get_word_index()

# The first indices are reserved
word_index = {k:(v+index_from) for k,v in word_index.items()} 
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown

# Use decode_review to look at original review text in training/testing data
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
def decode_review(encoded_text):
    return ' '.join([reverse_word_index.get(i, '?') for i in encoded_text])

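To see what the index offset and the reserved tokens do, here is a minimal sketch using a toy vocabulary. The three-word `toy_word_index` is hypothetical and only stands in for the much larger dictionary returned by `imdb.get_word_index()`; the shift and the special tokens mirror the cell above.

```python
# Toy stand-in for imdb.get_word_index(); the real index is far larger.
toy_word_index = {"the": 1, "movie": 2, "great": 3}
index_from = 3  # same offset as in the notebook

# Shift every word by the offset, then use the freed low indices for special tokens.
shifted = {word: idx + index_from for word, idx in toy_word_index.items()}
shifted["<PAD>"] = 0
shifted["<START>"] = 1
shifted["<UNK>"] = 2

reverse = {idx: word for word, idx in shifted.items()}

def decode_review(encoded_text):
    # Indices that are missing from the vocabulary are rendered as '?'
    return " ".join(reverse.get(i, "?") for i in encoded_text)

encoded = [1, 4, 5, 2, 6, 0, 0]
decoded = decode_review(encoded)
print(decoded)
```

Each review in the dataset starts with the `<START>` index, and words outside the kept vocabulary appear as `<UNK>`, which is why decoded reviews contain those markers.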


In [5]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features, index_from=index_from)

In [6]:
x_train = sequence.pad_sequences(x_train,
                                 value=word_index["<PAD>"],
                                 padding='post',
                                 maxlen=maxlen)

x_test = sequence.pad_sequences(x_test,
                                value=word_index["<PAD>"],
                                padding='post',
                                maxlen=maxlen)
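The padding step can be pictured with a small pure-Python sketch. It assumes the `pad_sequences` default of `truncating='pre'`, which the calls above do not override, so long reviews keep their last `maxlen` tokens while short ones are padded at the end. The helper name `pad_post` is hypothetical and not part of Keras.

```python
def pad_post(seq, maxlen, pad_value=0):
    """Mimic pad_sequences(padding='post') with the default truncating='pre'."""
    # Default truncation drops tokens from the front, keeping the last `maxlen`.
    trimmed = seq[-maxlen:]
    # Post-padding appends the pad value until the row reaches `maxlen`.
    return trimmed + [pad_value] * (maxlen - len(trimmed))

short_row = pad_post([1, 14, 22], maxlen=5)
long_row = pad_post([1, 14, 22, 16, 43, 530, 973], maxlen=5)
print(short_row)
print(long_row)
```

After this step every row has the same length, which is what lets the reviews be stacked into a single array for training.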

# Model Training & Evaluation

In [7]:
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 128)         128000    
_________________________________________________________________
lstm (LSTM)                  (None, 128)               131584    
_________________________________________________________________
dense (Dense)                (None, 1)                 129       
Total params: 259,713
Trainable params: 259,713
Non-trainable params: 0
_________________________________________________________________
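The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the variable names are only illustrative.

```python
max_features = 1000   # vocabulary size (Embedding input_dim)
embedding_dim = 128   # Embedding output_dim
lstm_units = 128
dense_units = 1

# Embedding: one vector of size embedding_dim per vocabulary entry.
embedding_params = max_features * embedding_dim

# LSTM: four gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((embedding_dim + lstm_units) * lstm_units + lstm_units)

# Dense: one weight per LSTM unit plus a bias, for each output unit.
dense_params = (lstm_units + 1) * dense_units

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
```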


In [8]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=3, # for demo purpose :P
          validation_data=(x_test, y_test))

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 25000 samples, validate on 25000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f7308679898>

In [9]:
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

Test score: 0.40862307107448576
Test accuracy: 0.81524
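For a single sigmoid output, the reported accuracy effectively comes from thresholding each predicted probability at 0.5 and comparing it with the label. The sketch below illustrates that calculation in plain Python; the labels and probabilities are made up for illustration and are not taken from the model above.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Turn each probability into a 0/1 label, then count the matches.
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

labels = [1, 0, 1, 1, 0]                 # hypothetical ground truth
probs = [0.91, 0.12, 0.48, 0.77, 0.63]   # hypothetical sigmoid outputs
acc = binary_accuracy(labels, probs)
print(acc)
```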


## Define BentoService for model serving

In [13]:
%%writefile keras_text_classification_service.py
from typing import List

import pandas as pd
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing import sequence, text
from bentoml import api, env, BentoService, artifacts
from bentoml.frameworks.keras import KerasModelArtifact
from bentoml.service.artifacts.common import PickleArtifact
from bentoml.adapters import JsonInput
from bentoml.types import JsonSerializable


max_features = 1000

@artifacts([
    KerasModelArtifact('model'),
    PickleArtifact('word_index')
])
@env(pip_packages=['tensorflow==1.14.0', 'numpy', 'pandas'])
class KerasTextClassificationService(BentoService):
   
    def word_to_index(self, word):
        # Indices at or above max_features fall outside the Embedding layer's vocabulary
        if word in self.artifacts.word_index and self.artifacts.word_index[word] < max_features:
            return self.artifacts.word_index[word]
        else:
            return self.artifacts.word_index["<UNK>"]
    
    def preprocessing(self, text_str):
        # Local name `words` avoids shadowing the imported `sequence` module
        words = text.text_to_word_sequence(text_str)
        return list(map(self.word_to_index, words))
    
    @api(input=JsonInput(), batch=True)
    def predict(self, parsed_jsons: List[JsonSerializable]):
        input_datas = [self.preprocessing(parsed_json['text']) for parsed_json in parsed_jsons]
        input_datas = sequence.pad_sequences(input_datas,
                                             value=self.artifacts.word_index["<PAD>"],
                                             padding='post',
                                             maxlen=80)

        return self.artifacts.model.predict_classes(input_datas).T[0]

Overwriting keras_text_classification_service.py
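The preprocessing path of the service can be tried outside BentoML with a rough stand-in. In this sketch the `tokenize` helper only approximates Keras's `text_to_word_sequence` (lowercase, strip punctuation, split on whitespace), and the miniature `word_index` with its index values is hypothetical; the real service reads the full dictionary from its packed artifact.

```python
import re

max_features = 1000

# Hypothetical miniature vocabulary with made-up indices.
word_index = {
    "<PAD>": 0, "<START>": 1, "<UNK>": 2,
    "movie": 20, "best": 118, "ever": 126,
    "cinematography": 4500,  # known word, but outside the model's vocabulary
}

def tokenize(text_str):
    # Rough approximation: lowercase, replace punctuation with spaces, split.
    return re.sub(r"[^\w\s']", " ", text_str.lower()).split()

def word_to_index(word):
    idx = word_index.get(word)
    # Unseen words and words outside the model's vocabulary both map to <UNK>.
    if idx is None or idx >= max_features:
        return word_index["<UNK>"]
    return idx

encoded = [word_to_index(w) for w in tokenize("Best movie ever! Stunning cinematography.")]
print(encoded)
```

The encoded list is what the service then pads to length 80 before calling the model.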


## Save BentoService to file archive

In [14]:
# 1) import the custom BentoService defined above
from keras_text_classification_service import KerasTextClassificationService

# 2) `pack` it with required artifacts
bento_svc = KerasTextClassificationService()
bento_svc.pack('model', model)
bento_svc.pack('word_index', word_index)

# 3) save your BentoService
saved_path = bento_svc.save()


[2020-09-23 13:27:55,262] INFO - Detected non-PyPI-released BentoML installed, copying local BentoML modulefiles to target saved bundle path..


no previously-included directories found matching 'e2e_tests'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'benchmark'


UPDATING BentoML-0.9.0rc0+7.g8af1c8b/bentoml/_version.py
set BentoML-0.9.0rc0+7.g8af1c8b/bentoml/_version.py to '0.9.0.pre+7.g8af1c8b'
[2020-09-23 13:27:56,076] INFO - BentoService bundle 'KerasTextClassificationService:20200923132753_A6330E' saved to: /home/bentoml/bentoml/repository/KerasTextClassificationService/20200923132753_A6330E


## REST API Model Serving


To start a REST API model server with the BentoService saved above, use the `bentoml serve` command:

In [1]:
!bentoml serve KerasTextClassificationService:latest

[2020-09-23 13:28:28,754] INFO - Getting latest version KerasTextClassificationService:20200923132753_A6330E
[2020-09-23 13:28:28,754] INFO - Starting BentoML API server in development mode..
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2020-09-23 13:28:31.030552: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-09-23 13:28:31.045371: I tensorflow/stream_e

If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to access the API through a public endpoint managed by [ngrok](https://ngrok.com/):

In [None]:
!bentoml serve KerasTextClassificationService:latest --run-with-ngrok

### Send prediction request to REST API server

*Run the following command in a terminal to make an HTTP request to the API server*
```bash
curl -i \
--header "Content-Type: application/json" \
--request POST \
--data '{"text": "best movie ever"}' \
localhost:5000/predict
```
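The same request can also be built from Python. The sketch below uses only the standard library and assumes the server is listening on `localhost:5000` as in the curl command; the helper name `build_predict_request` is hypothetical. The actual send is left commented out because it needs the API server to be running.

```python
import json
import urllib.request

def build_predict_request(text, base_url="http://localhost:5000"):
    # Same JSON body and Content-Type header as the curl command.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_predict_request("best movie ever")
print(request.get_method(), request.full_url)

# Sending requires the API server to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```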

## Containerize model server with Docker


One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.

Note that Docker is **not available in Google Colab**. You will need to download and run this notebook locally to try out the containerization feature.

If you already have Docker configured, simply run the following command to produce a Docker image serving the KerasTextClassificationService prediction service created above:

In [2]:
!bentoml containerize KerasTextClassificationService:latest -t kerastextclassificationservice:latest

[2020-09-23 13:29:13,718] INFO - Getting latest version KerasTextClassificationService:20200923132753_A6330E
Found Bento: /home/bentoml/bentoml/repository/KerasTextClassificationService/20200923132753_A6330E
Building Docker image kerastextclassificationservice:latest from KerasTextClassificationService:latest
Step 1/15 : FROM bentoml/model-server:0.9.0.pre-py36
 ---> 4aac43d10e50
Step 2/15 : ARG EXTRA_PIP_INSTALL_ARGS=
 ---> Using cache
 ---> 790054f5ad85
Step 3/15 : ENV EXTRA_PIP_INSTALL_ARGS $EXTRA_PIP_INSTALL_ARGS
 ---> Using cache
 ---> 85b0a1b40542
Step 4/15 : COPY environment.yml requirements.txt setup.sh* bentoml-init.sh python_version* /bento/
 ---> 7b194657ef63
Step 5/15 : WORKDIR /bento
 ---> Running in 71f0d3c7e9c5
 ---> f08b2b76d924
Step 6/15 : RUN chmod +x /bento/bentoml-init.sh
 ---> Running in 04750f5c4d86
 ---> bdc7

In [3]:
!docker run -p 5000:5000 kerastextclassificationservice

[2020-09-23 05:36:30,701] INFO - Starting BentoML API server in production mode..
[2020-09-23 05:36:30,921] INFO - get_gunicorn_num_of_workers: 3, calculated by cpu count
[2020-09-23 05:36:30 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-09-23 05:36:30 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-09-23 05:36:30 +0000] [1] [INFO] Using worker: sync
[2020-09-23 05:36:30 +0000] [13] [INFO] Booting worker with pid: 13
[2020-09-23 05:36:30 +0000] [14] [INFO] Booting worker with pid: 14
[2020-09-23 05:36:31 +0000] [15] [INFO] Booting worker with pid: 15
2020-09-23 05:36:33.325834: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-09-23 05:36:33.328236: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-09-23 05:36:33.331867: I tensorflow/core/platform/profile_utils/cpu_utils

## Launch inference job from CLI

The BentoML CLI supports loading and running a packaged model directly from the command line. With the JsonInput adapter used by this service, the CLI command reads JSON input from a CLI argument or from local JSON files:

In [2]:
!bentoml run KerasTextClassificationService:latest predict --input '{"text": "bad movie"}'

[2020-08-04 12:54:48,270] INFO - Getting latest version KerasTextClassificationService:20200804125206_4639A5
2020-08-04 12:54:50.476102: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-08-04 12:54:50.491479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-04 12:54:50.491867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
2020-08-04 12:54:50.492035: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-08-04 12:54:50.493344: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2020-08-04 12:54:50.494672: 

# Deployment Options

If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:
- [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)
- [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)
- [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)

If the cloud platform you are working with is not on the list above, try out these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:
- [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)
- [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)
- [Azure container instance Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)
- [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)

Lastly, if you have a DevOps or ML Engineering team operating a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy:
- [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)
- [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)
- [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)
- [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)
- [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)