{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "j0a4mTk9o1Qg" }, "outputs": [], "source": [ "# Copyright 2019 Google Inc.\n", "\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dCpvgG0vwXAZ" }, "source": [ "# Predicting Movie Review Sentiment with [kpe/bert-for-tf2](https://github.com/kpe/bert-for-tf2)\n", "\n", "\n", "**BentoML makes moving trained ML models to production easy:**\n", "\n", "* Package models trained with **any ML framework** and reproduce them for model serving in production\n", "* **Deploy anywhere** for online API serving or offline batch serving\n", "* High-Performance API model server with *adaptive micro-batching* support\n", "* Central hub for managing models and deployment process via Web UI and APIs\n", "* Modular and flexible design making it *adaptable to your infrastrcuture*\n", "\n", "BentoML is a framework for serving, managing, and deploying machine learning models. It is aiming to bridge the gap between Data Science and DevOps, and enable teams to deliver prediction services in a fast, repeatable, and scalable way.\n", "\n", "Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML.\n", "\n", "\n", "A modification of https://github.com/kpe/bert-for-tf2/blob/master/examples/gpu_movie_reviews.ipynb,\n", "which is a modification of https://github/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb using the Tensorflow 2.0 Keras\n", "\n", "![Impression](https://www.google-analytics.com/collect?v=1&tid=UA-112879361-3&cid=555&t=event&ec=tensorflow&ea=tensorflow_2_bert_movie_review&dt=tensorflow_2_bert_movie_review)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "!pip install -q bentoml \"tqdm==4.32.2\" \"bert-for-tf2==0.14.5\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": {}, "colab_type": "code", "id": "hsZvic2YxnTz" }, "outputs": [], "source": [ "import os\n", "import sys\n", "import math\n", "import datetime\n", "from tqdm import tqdm\n", "import pandas as pd\n", "import numpy as np\n", "import tensorflow as tf\n", "\n", "# tf.config.set_visible_devices([], 'GPU') # disable GPU" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "Evlk1N78HIXM", "outputId": "9ada0ad2-3297-414d-b7bb-748063302382" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tensorflow: 2.1.0\n", "Python: 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 21:14:29) \n", "[GCC 7.3.0]\n", "WARNING:tensorflow:From :3: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a 
future version.\n", "Instructions for updating:\n", "Use `tf.config.list_physical_devices('GPU')` instead.\n", "GPU: True\n" ] } ], "source": [ "print(\"Tensorflow: \", tf.__version__)\n", "print(\"Python: \", sys.version)\n", "print(\"GPU: \", tf.test.is_gpu_available())\n", "assert sys.version_info.major == 3 and sys.version_info.minor == 6 # required by clipper benchmark" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": {}, "colab_type": "code", "id": "ZtI7cKWDbUVc" }, "outputs": [], "source": [ "import bert\n", "from bert import BertModelLayer\n", "from bert.loader import StockBertConfig, map_stock_config_to_params, load_stock_weights\n", "from bert.tokenization.bert_tokenization import FullTokenizer\n", "from tensorflow import keras\n", "import os\n", "import re" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": {}, "colab_type": "code", "id": "fom_ff20gyy6" }, "outputs": [], "source": [ "from tensorflow import keras\n", "import os\n", "import re\n", "\n", "# Load all files from a directory in a DataFrame.\n", "def load_directory_data(directory):\n", " data = {}\n", " data[\"sentence\"] = []\n", " data[\"sentiment\"] = []\n", " for file_path in tqdm(os.listdir(directory), desc=os.path.basename(directory)):\n", " with tf.io.gfile.GFile(os.path.join(directory, file_path), \"r\") as f:\n", " data[\"sentence\"].append(f.read())\n", " data[\"sentiment\"].append(re.match(\"\\d+_(\\d+)\\.txt\", file_path).group(1))\n", " return pd.DataFrame.from_dict(data)\n", "\n", "# Merge positive and negative examples, add a polarity column and shuffle.\n", "def load_dataset(directory):\n", " pos_df = load_directory_data(os.path.join(directory, \"pos\"))\n", " neg_df = load_directory_data(os.path.join(directory, \"neg\"))\n", " pos_df[\"polarity\"] = 1\n", " neg_df[\"polarity\"] = 0\n", " return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)\n", "\n", "# Download and process the dataset files.\n", "def download_and_load_datasets(force_download=False):\n", " dataset = tf.keras.utils.get_file(\n", " fname=\"aclImdb.tar.gz\", \n", " origin=\"http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz\", \n", " extract=True)\n", "\n", " train_df = load_dataset(os.path.join(os.path.dirname(dataset), \n", " \"aclImdb\", \"train\"))\n", " test_df = load_dataset(os.path.join(os.path.dirname(dataset), \n", " \"aclImdb\", \"test\"))\n", "\n", " return train_df, test_df\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "CaE2G_2DdzVg" }, "source": [ "Let's use the `MovieReviewData` class below, to prepare/encode \n", "the data for feeding into our BERT model, by:\n", " - tokenizing the text\n", " - trim or pad it to a `max_seq_len` length\n", " - append the special tokens `[CLS]` and `[SEP]`\n", " - convert the string tokens to numerical `ID`s using the original model's token encoding from `vocab.txt`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": {}, "colab_type": "code", "id": "2abfwdn-g135" }, "outputs": [], "source": [ "import bert\n", "from bert import BertModelLayer\n", "from bert.loader import StockBertConfig, map_stock_config_to_params, load_stock_weights\n", "\n", "\n", "class MovieReviewData:\n", " DATA_COLUMN = \"sentence\"\n", " LABEL_COLUMN = \"polarity\"\n", "\n", " def __init__(self, tokenizer: FullTokenizer, sample_size=None, max_seq_len=1024):\n", " self.tokenizer = tokenizer\n", " self.sample_size = sample_size\n", " self.max_seq_len = 0\n", " train, test = 
download_and_load_datasets()\n", " \n", " train, test = map(lambda df: df.reindex(df[MovieReviewData.DATA_COLUMN].str.len().sort_values().index), \n", " [train, test])\n", " \n", " if sample_size is not None:\n", " train, test = train.head(sample_size), test.head(sample_size)\n", " # train, test = map(lambda df: df.sample(sample_size), [train, test])\n", " \n", " ((self.train_x, self.train_y),\n", " (self.test_x, self.test_y)) = map(self._prepare, [train, test])\n", "\n", " print(\"max seq_len\", self.max_seq_len)\n", " self.max_seq_len = min(self.max_seq_len, max_seq_len)\n", " ((self.train_x, self.train_x_token_types),\n", " (self.test_x, self.test_x_token_types)) = map(self._pad, \n", " [self.train_x, self.test_x])\n", "\n", " def _prepare(self, df):\n", " x, y = [], []\n", " with tqdm(total=df.shape[0], unit_scale=True) as pbar:\n", " for ndx, row in df.iterrows():\n", " text, label = row[MovieReviewData.DATA_COLUMN], row[MovieReviewData.LABEL_COLUMN]\n", " tokens = self.tokenizer.tokenize(text)\n", " tokens = [\"[CLS]\"] + tokens + [\"[SEP]\"]\n", " token_ids = self.tokenizer.convert_tokens_to_ids(tokens)\n", " self.max_seq_len = max(self.max_seq_len, len(token_ids))\n", " x.append(token_ids)\n", " y.append(int(label))\n", " pbar.update()\n", " return np.array(x), np.array(y)\n", "\n", " def _pad(self, ids):\n", " x, t = [], []\n", " token_type_ids = [0] * self.max_seq_len\n", " for input_ids in ids:\n", " input_ids = input_ids[:min(len(input_ids), self.max_seq_len - 2)]\n", " input_ids = input_ids + [0] * (self.max_seq_len - len(input_ids))\n", " x.append(np.array(input_ids))\n", " t.append(token_type_ids)\n", " return np.array(x), np.array(t)\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "SGL0mEoNFGlP" }, "source": [ "## A tweak\n", "\n", "Because of a `tf.train.load_checkpoint` limitation requiring list permissions on the google storage bucket, we need to copy the pre-trained BERT weights locally." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": {}, "colab_type": "code", "id": "lw_F488eixTV" }, "outputs": [], "source": [ "asset_path = 'asset'\n", "bert_model_name = \"uncased_L-12_H-768_A-12\"\n", "bert_ckpt_dir = os.path.join(asset_path, bert_model_name)\n", "bert_ckpt_file = os.path.join(bert_ckpt_dir, \"bert_model.ckpt\")\n", "bert_config_file = os.path.join(bert_ckpt_dir, \"bert_config.json\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Archive: asset/uncased_L-12_H-768_A-12.zip\n", " creating: asset/uncased_L-12_H-768_A-12/\n", " inflating: asset/uncased_L-12_H-768_A-12/bert_model.ckpt.meta \n", " inflating: asset/uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001 \n", " inflating: asset/uncased_L-12_H-768_A-12/vocab.txt \n", " inflating: asset/uncased_L-12_H-768_A-12/bert_model.ckpt.index \n", " inflating: asset/uncased_L-12_H-768_A-12/bert_config.json \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 388M 100 388M 0 0 3564k 0 0:01:51 0:01:51 --:--:-- 3948k\n" ] } ], "source": [ "%%bash\n", "\n", "if [ ! -f asset/uncased_L-12_H-768_A-12.zip ]; then\n", " curl -o asset/uncased_L-12_H-768_A-12.zip --create-dirs https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", "fi\n", "if [ ! 
-d asset/uncased_L-12_H-768_A-12 ]; then\n", "    unzip asset/uncased_L-12_H-768_A-12.zip -d asset/\n", "fi" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "G4xPTleh2X2b" }, "source": [ "# Preparing the Data\n", "\n", "Now let's fetch and prepare the data by taking the first `max_seq_len` tokens after tokenizing with the BERT tokenizer, and use `sample_size` examples for both training and testing." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XA8WHJgzhIZf" }, "source": [ "To keep training fast, we'll take a sample of about 2500 train and test examples, respectively, and use the first 128 tokens only (transformer memory and computation requirements scale quadratically with the sequence length - so with a TPU you might use `max_seq_len=512`, but on a GPU this would be too slow, and you would have to use a very small `batch_size` to fit the model into GPU memory)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 171 }, "colab_type": "code", "id": "kF_3KhGQ0GTc", "outputId": "fd993d23-61ce-4aae-c992-5b5591123198" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "pos: 100%|██████████| 12500/12500 [00:00<00:00, 19259.77it/s]\n", "neg: 100%|██████████| 12500/12500 [00:00<00:00, 19675.69it/s]\n", "pos: 100%|██████████| 12500/12500 [00:00<00:00, 19420.16it/s]\n", "neg: 100%|██████████| 12500/12500 [00:00<00:00, 18046.21it/s]\n", "100%|██████████| 2.56k/2.56k [00:02<00:00, 879it/s] \n", "100%|██████████| 2.56k/2.56k [00:02<00:00, 866it/s] \n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "max seq_len 178\n", "CPU times: user 20 s, sys: 5.05 s, total: 25.1 s\n", "Wall time: 25.5 s\n" ] } ], "source": [ "%%time\n", "\n", "tokenizer = FullTokenizer(vocab_file=os.path.join(bert_ckpt_dir, \"vocab.txt\"))\n", "data = MovieReviewData(tokenizer, \n", "                       sample_size=10*128*2, #10*128*2\n", "                       max_seq_len=128)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 103 }, "colab_type": "code", "id": "prRQM8pDi8xI", "outputId": "b98433d4-c7e6-4bb5-af54-941f52e0b8e7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "            train_x (2560, 128)\n", "train_x_token_types (2560, 128)\n", "            train_y (2560,)\n", "             test_x (2560, 128)\n", "        max_seq_len 128\n" ] } ], "source": [ "print(\"            train_x\", data.train_x.shape)\n", "print(\"train_x_token_types\", data.train_x_token_types.shape)\n", "print(\"            train_y\", data.train_y.shape)\n", "\n", "print(\"             test_x\", data.test_x.shape)\n", "\n", "print(\"        max_seq_len\", data.max_seq_len)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "sfRnHSz3iSXz" }, "source": [ "## Adapter BERT\n", "\n", "If we decide to use [adapter-BERT](https://arxiv.org/abs/1902.00751), we need some helpers for freezing the original BERT layers."
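] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick illustration of what freezing means here, as a minimal sketch on a toy two-layer Keras model (a hypothetical stand-in, not the BERT model itself): setting `trainable = False` on a layer excludes its weights from gradient updates and from `model.trainable_weights`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal sketch (toy model, for illustration only): freezing a layer\n", "# removes its variables from the trainable set.\n", "toy = keras.Sequential([\n", "    keras.layers.Dense(4, input_shape=(8,), name=\"frozen_dense\"),\n", "    keras.layers.Dense(2, name=\"trainable_dense\"),\n", "])\n", "toy.layers[0].trainable = False  # freeze the first layer\n", "print(\"trainable:\", [w.name for w in toy.trainable_weights])\n", "print(\"frozen:\", [w.name for w in toy.non_trainable_weights])"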
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": {}, "colab_type": "code", "id": "IuMOGwFui4it" }, "outputs": [], "source": [ "def flatten_layers(root_layer):\n", " if isinstance(root_layer, keras.layers.Layer):\n", " yield root_layer\n", " for layer in root_layer._layers:\n", " for sub_layer in flatten_layers(layer):\n", " yield sub_layer\n", "\n", "\n", "def freeze_bert_layers(l_bert):\n", " \"\"\"\n", " Freezes all but LayerNorm and adapter layers - see arXiv:1902.00751.\n", " \"\"\"\n", " for layer in flatten_layers(l_bert):\n", " if layer.name in [\"LayerNorm\", \"adapter-down\", \"adapter-up\"]:\n", " layer.trainable = True\n", " elif len(layer._layers) == 0:\n", " layer.trainable = False\n", " l_bert.embeddings_layer.trainable = False\n", "\n", "\n", "def create_learning_rate_scheduler(max_learn_rate=5e-5,\n", " end_learn_rate=1e-7,\n", " warmup_epoch_count=10,\n", " total_epoch_count=90):\n", "\n", " def lr_scheduler(epoch):\n", " if epoch < warmup_epoch_count:\n", " res = (max_learn_rate/warmup_epoch_count) * (epoch + 1)\n", " else:\n", " res = max_learn_rate*math.exp(\n", " math.log(end_learn_rate/max_learn_rate)*(epoch-warmup_epoch_count+1)/(total_epoch_count-warmup_epoch_count+1))\n", " return float(res)\n", " learning_rate_scheduler = tf.keras.callbacks.LearningRateScheduler(lr_scheduler, verbose=1)\n", "\n", " return learning_rate_scheduler\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ccp5trMwRtmr" }, "source": [ "# Creating a model\n", "\n", "Now let's create a classification model using [adapter-BERT](https//arxiv.org/abs/1902.00751), which is clever way of reducing the trainable parameter count, by freezing the original BERT weights, and adapting them with two FFN bottlenecks (i.e. `adapter_size` bellow) in every BERT layer.\n", "\n", "**N.B.** The commented out code below show how to feed a `token_type_ids`/`segment_ids` sequence (which is not needed in our case)." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": {}, "colab_type": "code", "id": "6o2a5ZIvRcJq" }, "outputs": [], "source": [ "def create_model(max_seq_len, adapter_size=64):\n", " \"\"\"Creates a classification model.\"\"\"\n", "\n", " #adapter_size = 64 # see - arXiv:1902.00751\n", "\n", " # create the bert layer\n", " with tf.io.gfile.GFile(bert_config_file, \"r\") as reader:\n", " bc = StockBertConfig.from_json_string(reader.read())\n", " bert_params = map_stock_config_to_params(bc)\n", " bert_params.adapter_size = adapter_size\n", " bert = BertModelLayer.from_params(bert_params, name=\"bert\")\n", "\n", " input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name=\"input_ids\")\n", " # token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name=\"token_type_ids\")\n", " # output = bert([input_ids, token_type_ids])\n", " output = bert(input_ids)\n", "\n", " print(\"bert shape\", output.shape)\n", " cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)\n", " cls_out = keras.layers.Dropout(0.5)(cls_out)\n", " logits = keras.layers.Dense(units=768, activation=\"tanh\")(cls_out)\n", " logits = keras.layers.Dropout(0.5)(logits)\n", " logits = keras.layers.Dense(units=2, activation=\"softmax\")(logits)\n", "\n", " # model = keras.Model(inputs=[input_ids, token_type_ids], outputs=logits)\n", " # model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])\n", " model = keras.Model(inputs=input_ids, outputs=logits)\n", " model.build(input_shape=(None, max_seq_len))\n", "\n", " # load the pre-trained model weights\n", " load_stock_weights(bert, bert_ckpt_file)\n", "\n", " # freeze weights if adapter-BERT is used\n", " if adapter_size is not None:\n", " freeze_bert_layers(bert)\n", "\n", " model.compile(optimizer=keras.optimizers.Adam(),\n", " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", " metrics=[keras.metrics.SparseCategoricalAccuracy(name=\"acc\")])\n", "\n", " model.summary()\n", "\n", " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Train" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 517 }, "colab_type": "code", "id": "bZnmtDc7HlEm", "outputId": "fcd96c78-792c-4032-d188-73a9c21ec304" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bert shape (None, 128, 768)\n", "Done loading 196 BERT weights from: asset/uncased_L-12_H-768_A-12/bert_model.ckpt into (prefix:bert). Count of weights not found in the checkpoint was: [0]. 
Count of weights with mismatched shape: [0]\n", "Unused weights from checkpoint: \n", "\tbert/embeddings/token_type_embeddings\n", "\tbert/pooler/dense/bias\n", "\tbert/pooler/dense/kernel\n", "\tcls/predictions/output_bias\n", "\tcls/predictions/transform/LayerNorm/beta\n", "\tcls/predictions/transform/LayerNorm/gamma\n", "\tcls/predictions/transform/dense/bias\n", "\tcls/predictions/transform/dense/kernel\n", "\tcls/seq_relationship/output_bias\n", "\tcls/seq_relationship/output_weights\n", "Model: \"model\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "input_ids (InputLayer) [(None, 128)] 0 \n", "_________________________________________________________________\n", "bert (BertModelLayer) (None, 128, 768) 108890112 \n", "_________________________________________________________________\n", "lambda (Lambda) (None, 768) 0 \n", "_________________________________________________________________\n", "dropout (Dropout) (None, 768) 0 \n", "_________________________________________________________________\n", "dense (Dense) (None, 768) 590592 \n", "_________________________________________________________________\n", "dropout_1 (Dropout) (None, 768) 0 \n", "_________________________________________________________________\n", "dense_1 (Dense) (None, 2) 1538 \n", "=================================================================\n", "Total params: 109,482,242\n", "Trainable params: 109,482,242\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "adapter_size = None  # use None to fine-tune all of BERT\n", "model = create_model(data.max_seq_len, adapter_size=adapter_size)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "ZuLOkwonF-9S", "outputId": "ce1451d4-310c-41f1-ddd3-a2e0841d8528" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 2304 samples, validate on 256 samples\n", "\n", "Epoch 00001: LearningRateScheduler reducing learning rate to 5.000000000000001e-07.\n", "Epoch 1/2\n", "2304/2304 [==============================] - 121s 52ms/sample - loss: 0.4103 - acc: 0.9058 - val_loss: 0.4349 - val_acc: 0.8633\n", "\n", "Epoch 00002: LearningRateScheduler reducing learning rate to 1.0000000000000002e-06.\n", "Epoch 2/2\n", "2304/2304 [==============================] - 121s 53ms/sample - loss: 0.4053 - acc: 0.9054 - val_loss: 0.4415 - val_acc: 0.8594\n", "CPU times: user 4min 31s, sys: 43.2 s, total: 5min 14s\n", "Wall time: 4min 2s\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "\n", "log_dir = \".log/movie_reviews/\" + datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n", "tensorboard_callback = keras.callbacks.TensorBoard(log_dir=log_dir)\n", "\n", "total_epoch_count = 2\n", "model.fit(x=data.train_x, y=data.train_y,\n", "          validation_split=0.1,\n", "          batch_size=12,\n", "          shuffle=True,\n", "          epochs=total_epoch_count,\n", "          callbacks=[create_learning_rate_scheduler(max_learn_rate=1e-5,\n", "                                                    end_learn_rate=1e-7,\n", "                                                    warmup_epoch_count=20,\n", "                                                    total_epoch_count=total_epoch_count),\n", "                     keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True),\n", "                     tensorboard_callback])" ] }, { "cell_type": "code", "execution_count": 22, "metadata": 
{}, "outputs": [], "source": [ "model.save_weights('./movie_reviews.h5', overwrite=True)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 120 }, "colab_type": "code", "id": "BSqMu64oHzqy", "outputId": "95a8284b-b3b2-4a7d-f335-c4ff65f8f920" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2560/2560 [==============================] - 35s 14ms/sample - loss: 0.3835 - acc: 0.9270\n", "2560/2560 [==============================] - 33s 13ms/sample - loss: 0.4088 - acc: 0.8992\n", "train acc 0.92695314\n", " test acc 0.89921874\n", "CPU times: user 1min 7s, sys: 222 ms, total: 1min 8s\n", "Wall time: 1min 7s\n" ] } ], "source": [ "%%time\n", "\n", "_, train_acc = model.evaluate(data.train_x, data.train_y)\n", "_, test_acc = model.evaluate(data.test_x, data.test_y)\n", "\n", "print(\"train acc\", train_acc)\n", "print(\" test acc\", test_acc)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xSKDZEnVabnl" }, "source": [ "# Evaluation\n", "\n", "To evaluate the trained model, let's load the saved weights in a new model instance, and evaluate." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bert shape (None, 128, 768)\n", "Done loading 196 BERT weights from: asset/uncased_L-12_H-768_A-12/bert_model.ckpt into (prefix:bert). Count of weights not found in the checkpoint was: [0]. Count of weights with mismatched shape: [0]\n", "Unused weights from checkpoint: \n", "\tbert/embeddings/token_type_embeddings\n", "\tbert/pooler/dense/bias\n", "\tbert/pooler/dense/kernel\n", "\tcls/predictions/output_bias\n", "\tcls/predictions/transform/LayerNorm/beta\n", "\tcls/predictions/transform/LayerNorm/gamma\n", "\tcls/predictions/transform/dense/bias\n", "\tcls/predictions/transform/dense/kernel\n", "\tcls/seq_relationship/output_bias\n", "\tcls/seq_relationship/output_weights\n", "Model: \"model\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "input_ids (InputLayer) [(None, 128)] 0 \n", "_________________________________________________________________\n", "bert (BertModelLayer) (None, 128, 768) 108890112 \n", "_________________________________________________________________\n", "lambda (Lambda) (None, 768) 0 \n", "_________________________________________________________________\n", "dropout (Dropout) (None, 768) 0 \n", "_________________________________________________________________\n", "dense (Dense) (None, 768) 590592 \n", "_________________________________________________________________\n", "dropout_1 (Dropout) (None, 768) 0 \n", "_________________________________________________________________\n", "dense_1 (Dense) (None, 2) 1538 \n", "=================================================================\n", "Total params: 109,482,242\n", "Trainable params: 109,482,242\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "model = create_model(data.max_seq_len, adapter_size=None)\n", "model.load_weights(\"./movie_reviews.h5\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 531 }, "colab_type": "code", "id": "qCpabQ15WS3U", "outputId": "220b2b55-dd78-4795-d3b6-d364c8323734" }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "2560/2560 [==============================] - 35s 14ms/sample - loss: 0.3903 - acc: 0.9230\n", " test acc 0.9230469\n", "CPU times: user 34.6 s, sys: 113 ms, total: 34.7 s\n", "Wall time: 34.6 s\n" ] } ], "source": [ "%%time \n", "\n", "# _, train_acc = model.evaluate(data.train_x, data.train_y)\n", "_, test_acc = model.evaluate(data.test_x, data.test_y)\n", "\n", "# print(\"train acc\", train_acc)\n", "print(\" test acc\", test_acc)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5uzdOFQ5awM1" }, "source": [ "# Prediction\n", "\n", "For prediction, we need to prepare the input text the same way as we did for training - tokenize, adding the special `[CLS]` and `[SEP]` token at begin and end of the token sequence, and pad to match the model input shape." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 150 ms, sys: 7.46 ms, total: 158 ms\n", "Wall time: 177 ms\n" ] }, { "data": { "text/plain": [ "['negative', 'negative', 'positive', 'positive']" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "CLASSES = [\"negative\",\"positive\"]\n", "max_seq_len = 128\n", "pred_sentences = [\n", " \"That movie was absolutely awful\",\n", " \"The acting was a bit lacking\",\n", " \"The film was creative and surprising\",\n", " \"Absolutely fantastic!\",\n", "]\n", "\n", "inputs = pd.DataFrame(pred_sentences)\n", "\n", "pred_tokens = map(tokenizer.tokenize, inputs.to_numpy()[:, 0].tolist())\n", "pred_tokens = map(lambda tok: [\"[CLS]\"] + tok + [\"[SEP]\"], pred_tokens)\n", "pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))\n", "pred_token_ids = map(lambda tids: tids + [0] * (max_seq_len-len(tids)), pred_token_ids)\n", "pred_token_ids = np.array(list(pred_token_ids))\n", "\n", "res = model(pred_token_ids).numpy().argmax(axis=-1)\n", "[CLASSES[i] for i in res]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Build & Save bentoml service" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting bentoml_service.py\n" ] } ], "source": [ "%%writefile bentoml_service.py\n", "\n", "import bentoml\n", "import tensorflow as tf\n", "import numpy as np\n", "import pandas as pd\n", "from typing import List\n", "\n", "\n", "from bentoml.frameworks.tensorflow import TensorflowSavedModelArtifact\n", "from bentoml.service.artifacts.common import PickleArtifact\n", "from bentoml.adapters import DataframeInput\n", "\n", "\n", "CLASSES = [\"negative\",\"positive\"]\n", "max_seq_len = 128\n", "\n", "try:\n", " tf.config.set_visible_devices([], 'GPU') # disable GPU, required when served in docker\n", "except:\n", " pass\n", "\n", "\n", "@bentoml.env(pip_packages=['tensorflow', 'bert-for-tf2'])\n", "@bentoml.artifacts([TensorflowSavedModelArtifact('model'), PickleArtifact('tokenizer')])\n", "class BertService(bentoml.BentoService):\n", "\n", " def tokenize(self, inputs: pd.DataFrame):\n", " tokenizer = self.artifacts.tokenizer\n", " if isinstance(inputs, pd.DataFrame):\n", " inputs = inputs.to_numpy()[:, 0].tolist()\n", " else:\n", " inputs = inputs.tolist() # for predict_clipper\n", " pred_tokens = map(tokenizer.tokenize, inputs)\n", " pred_tokens = map(lambda tok: [\"[CLS]\"] + tok + [\"[SEP]\"], pred_tokens)\n", " pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, 
pred_tokens))\n", " pred_token_ids = map(lambda tids: tids + [0] * (max_seq_len - len(tids)), pred_token_ids)\n", " pred_token_ids = tf.constant(list(pred_token_ids), dtype=tf.int32)\n", " return pred_token_ids\n", "\n", " @bentoml.api(input=DataframeInput(), mb_max_latency=3000, mb_max_batch_size=20, batch=True)\n", " def predict(self, inputs: pd.DataFrame) -> List[str]:\n", " model = self.artifacts.model\n", " pred_token_ids = self.tokenize(inputs)\n", " res = model(pred_token_ids).numpy().argmax(axis=-1)\n", " return [CLASSES[i] for i in res]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2020-07-28 15:11:02,575] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle.\n", "WARNING:tensorflow:Skipping full serialization of Keras layer , because it is not built.\n", "WARNING:tensorflow:From /opt/anaconda3/envs/bentoml-dev-py36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "If using Keras pass *_constraint arguments to layers.\n", "INFO:tensorflow:Assets written to: /tmp/bentoml-temp-r9yxj9ku/Service/artifacts/model_saved_model/assets\n", "[2020-07-28 15:12:13,764] INFO - Detect BentoML installed in development model, copying local BentoML module file to target saved bundle path\n", "running sdist\n", "running egg_info\n", "writing BentoML.egg-info/PKG-INFO\n", "writing dependency_links to BentoML.egg-info/dependency_links.txt\n", "writing entry points to BentoML.egg-info/entry_points.txt\n", "writing requirements to BentoML.egg-info/requires.txt\n", "writing top-level names to BentoML.egg-info/top_level.txt\n", "reading manifest file 'BentoML.egg-info/SOURCES.txt'\n", "reading manifest template 'MANIFEST.in'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "warning: no previously-included files matching '*~' found anywhere in distribution\n", "warning: no previously-included files matching '*.pyo' found anywhere in distribution\n", "warning: no previously-included files matching '.git' found anywhere in distribution\n", "warning: no previously-included files matching '.ipynb_checkpoints' found anywhere in distribution\n", "warning: no previously-included files matching '__pycache__' found anywhere in distribution\n", "warning: no directories found matching 'bentoml/server/static'\n", "warning: no directories found matching 'bentoml/yatai/web/dist'\n", "no previously-included directories found matching 'e2e_tests'\n", "no previously-included directories found matching 'tests'\n", "no previously-included directories found matching 'benchmark'\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "writing manifest file 'BentoML.egg-info/SOURCES.txt'\n", "running check\n", "creating BentoML-0.8.3+42.gb8d36b6\n", "creating BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/clipper\n", "creating 
BentoML-0.8.3+42.gb8d36b6/bentoml/configuration\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/handlers\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/marshal\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/client\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/aws_lambda\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/__pycache__\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/versions\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/versions/__pycache__\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/validator\n", "copying files to BentoML-0.8.3+42.gb8d36b6...\n", "copying LICENSE -> BentoML-0.8.3+42.gb8d36b6\n", "copying MANIFEST.in -> BentoML-0.8.3+42.gb8d36b6\n", "copying README.md -> BentoML-0.8.3+42.gb8d36b6\n", "copying pyproject.toml -> BentoML-0.8.3+42.gb8d36b6\n", "copying setup.cfg -> BentoML-0.8.3+42.gb8d36b6\n", "copying setup.py -> BentoML-0.8.3+42.gb8d36b6\n", "copying versioneer.py -> BentoML-0.8.3+42.gb8d36b6\n", "copying BentoML.egg-info/PKG-INFO -> BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "copying BentoML.egg-info/SOURCES.txt -> BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "copying BentoML.egg-info/dependency_links.txt -> BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "copying BentoML.egg-info/entry_points.txt -> BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "copying BentoML.egg-info/requires.txt -> BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "copying BentoML.egg-info/top_level.txt -> BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info\n", "copying bentoml/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml\n", "copying bentoml/_version.py -> BentoML-0.8.3+42.gb8d36b6/bentoml\n", "copying bentoml/exceptions.py -> BentoML-0.8.3+42.gb8d36b6/bentoml\n", "copying bentoml/service.py -> BentoML-0.8.3+42.gb8d36b6/bentoml\n", "copying bentoml/service_env.py -> BentoML-0.8.3+42.gb8d36b6/bentoml\n", "copying bentoml/adapters/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/base_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/base_output.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/clipper_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/dataframe_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/dataframe_output.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/default_output.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/fastai_image_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/file_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/image_input.py -> 
BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/json_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/json_output.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/legacy_image_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/legacy_json_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/multi_image_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/pytorch_tensor_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/tensorflow_tensor_input.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/tensorflow_tensor_output.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/adapters/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/adapters\n", "copying bentoml/artifact/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/fastai2_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/fastai_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/fasttext_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/h2o_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/json_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/keras_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/lightgbm_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/onnx_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/pickle_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/pytorch_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/sklearn_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/spacy_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/text_file_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/tf_savedmodel_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/artifact/xgboost_model_artifact.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/artifact\n", "copying bentoml/cli/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/aws_lambda.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/aws_sagemaker.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/azure_functions.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/bento_management.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/bento_service.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/click_utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/config.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/deployment.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/cli/yatai_service.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/cli\n", "copying bentoml/clipper/__init__.py -> 
BentoML-0.8.3+42.gb8d36b6/bentoml/clipper\n", "copying bentoml/configuration/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration\n", "copying bentoml/configuration/configparser.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration\n", "copying bentoml/configuration/default_bentoml.cfg -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration\n", "copying bentoml/configuration/__pycache__/__init__.cpython-36.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "copying bentoml/configuration/__pycache__/__init__.cpython-37.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "copying bentoml/configuration/__pycache__/__init__.cpython-38.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "copying bentoml/configuration/__pycache__/configparser.cpython-36.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "copying bentoml/configuration/__pycache__/configparser.cpython-37.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "copying bentoml/configuration/__pycache__/configparser.cpython-38.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__\n", "copying bentoml/handlers/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/handlers\n", "copying bentoml/marshal/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/marshal\n", "copying bentoml/marshal/dispatcher.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/marshal\n", "copying bentoml/marshal/marshal.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/marshal\n", "copying bentoml/marshal/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/marshal\n", "copying bentoml/saved_bundle/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/bentoml-init.sh -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/bundler.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/config.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/docker-entrypoint.sh -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/loader.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/pip_pkg.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/py_module_utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/saved_bundle/templates.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle\n", "copying bentoml/server/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/api_server.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/gunicorn_config.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/gunicorn_server.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/instruments.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/marshal_server.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/open_api.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/trace.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/server/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/server\n", "copying bentoml/utils/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/alg.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/benchmark.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/cloudpickle.py -> 
BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/dataframe_util.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/flask_ngrok.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/hybridmethod.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/lazy_loader.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/log.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/s3.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/tempdir.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/utils/usage_stats.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/utils\n", "copying bentoml/yatai/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/alembic.ini -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/db.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/deployment_utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/status.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/yatai_service.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/yatai_service_impl.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai\n", "copying bentoml/yatai/client/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/client\n", "copying bentoml/yatai/client/bento_repository_api.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/client\n", "copying bentoml/yatai/client/deployment_api.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/client\n", "copying bentoml/yatai/deployment/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment\n", "copying bentoml/yatai/deployment/operator.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment\n", "copying bentoml/yatai/deployment/store.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment\n", "copying bentoml/yatai/deployment/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment\n", "copying bentoml/yatai/deployment/aws_lambda/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/aws_lambda\n", "copying bentoml/yatai/deployment/aws_lambda/download_extra_resources.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/aws_lambda\n", "copying bentoml/yatai/deployment/aws_lambda/lambda_app.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/aws_lambda\n", "copying bentoml/yatai/deployment/aws_lambda/operator.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/aws_lambda\n", "copying bentoml/yatai/deployment/aws_lambda/utils.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/aws_lambda\n", "copying bentoml/yatai/deployment/azure_functions/Dockerfile -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/app_init.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/constants.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/host.json -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/local.settings.json -> 
BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/operator.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/azure_functions/templates.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/azure_functions\n", "copying bentoml/yatai/deployment/sagemaker/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "copying bentoml/yatai/deployment/sagemaker/model_server.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "copying bentoml/yatai/deployment/sagemaker/nginx.conf -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "copying bentoml/yatai/deployment/sagemaker/operator.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "copying bentoml/yatai/deployment/sagemaker/serve -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "copying bentoml/yatai/deployment/sagemaker/wsgi.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment/sagemaker\n", "copying bentoml/yatai/migrations/README -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations\n", "copying bentoml/yatai/migrations/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations\n", "copying bentoml/yatai/migrations/env.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations\n", "copying bentoml/yatai/migrations/script.py.mako -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations\n", "copying bentoml/yatai/migrations/__pycache__/env.cpython-36.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/__pycache__\n", "copying bentoml/yatai/migrations/versions/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/versions\n", "copying bentoml/yatai/migrations/versions/a6b00ae45279_add_last_updated_at_for_deployments.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/versions\n", "copying bentoml/yatai/migrations/versions/__pycache__/a6b00ae45279_add_last_updated_at_for_deployments.cpython-36.pyc -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/migrations/versions/__pycache__\n", "copying bentoml/yatai/proto/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "copying bentoml/yatai/proto/deployment_pb2.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "copying bentoml/yatai/proto/repository_pb2.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "copying bentoml/yatai/proto/status_pb2.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "copying bentoml/yatai/proto/yatai_service_pb2.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "copying bentoml/yatai/proto/yatai_service_pb2_grpc.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/proto\n", "copying bentoml/yatai/repository/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "copying bentoml/yatai/repository/base_repository.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "copying bentoml/yatai/repository/local_repository.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "copying bentoml/yatai/repository/metadata_store.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "copying bentoml/yatai/repository/repository.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "copying bentoml/yatai/repository/s3_repository.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/repository\n", "copying bentoml/yatai/validator/__init__.py -> BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/validator\n", "copying bentoml/yatai/validator/deployment_pb_validator.py -> 
BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/validator\n", "Writing BentoML-0.8.3+42.gb8d36b6/setup.cfg\n", "UPDATING BentoML-0.8.3+42.gb8d36b6/bentoml/_version.py\n", "set BentoML-0.8.3+42.gb8d36b6/bentoml/_version.py to '0.8.3+42.gb8d36b6'\n", "Creating tar archive\n", "removing 'BentoML-0.8.3+42.gb8d36b6' (and everything under it)\n", "[2020-07-28 15:12:18,720] INFO - BentoService bundle 'Service:20200728151102_B3E065' saved to: /home/bentoml/bentoml/repository/Service/20200728151102_B3E065\n" ] } ], "source": [ "from bentoml_service import BertService\n", "\n", "bento_svc = BertService()\n", "bento_svc.pack(\"model\", model)\n", "bento_svc.pack(\"tokenizer\", tokenizer)\n", "saved_path = bento_svc.save()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/bentoml/bentoml/repository/Service/20200728151102_B3E065\n" ] } ], "source": [ "print(saved_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## REST API Model Serving\n", "\n", "\n", "To start a REST API model server with the BentoService saved above, use the `bentoml serve` command:\n", "\n", "*Since BERT is a large model, if you hit an OOM error below, \n", "you may need to restart this kernel to release the RAM/VRAM used by the trained model.*" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "bentoml_bundle_path = '/home/bentoml/bentoml/repository/Service/20200728151102_B3E065'  # saved_path" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bentoml serve-gunicorn /home/bentoml/bentoml/repository/Service/20200728151102_B3E065 --port 5000 --enable-microbatch --workers 1\n", "[2020-07-28 15:14:35,761] INFO - Starting BentoML API server in production mode..\n", "[2020-07-28 15:14:36,563] INFO - Running micro batch service on :5000\n", "[2020-07-28 15:14:36 +0800] [2953201] [INFO] Starting gunicorn 20.0.4\n", "[2020-07-28 15:14:36 +0800] [2952697] [INFO] Starting gunicorn 20.0.4\n", "[2020-07-28 15:14:36 +0800] [2952697] [INFO] Listening at: http://0.0.0.0:60577 (2952697)\n", "[2020-07-28 15:14:36 +0800] [2953201] [INFO] Listening at: http://0.0.0.0:5000 (2953201)\n", "[2020-07-28 15:14:36 +0800] [2952697] [INFO] Using worker: sync\n", "[2020-07-28 15:14:36 +0800] [2953201] [INFO] Using worker: aiohttp.worker.GunicornWebWorker\n", "[2020-07-28 15:14:36 +0800] [2953203] [INFO] Booting worker with pid: 2953203\n", "[2020-07-28 15:14:36 +0800] [2953202] [INFO] Booting worker with pid: 2953202\n", "[2020-07-28 15:14:36,613] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle.\n", "[2020-07-28 15:14:36,631] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.8.3, but loading from BentoML version 0.8.3+42.gb8d36b6\n", "[2020-07-28 15:14:36,874] INFO - Micro batch enabled for API `predict`\n", "[2020-07-28 15:14:36,874] INFO - Your system nofile limit is 10000, which means each instance of microbatch service is able to hold this number of connections at same time.
You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.\n", "[2020-07-28 15:14:37,659] WARNING - Using BentoML installed in `editable` model, the local BentoML repository including all code changes will be packaged together with saved bundle created, under the './bundled_pip_dependencies' directory of the saved bundle.\n", "[2020-07-28 15:14:37,697] WARNING - Saved BentoService bundle version mismatch: loading BentoService bundle create with BentoML version 0.8.3, but loading from BentoML version 0.8.3+42.gb8d36b6\n", "2020-07-28 15:14:39.628530: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1\n", "2020-07-28 15:14:39.636462: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2020-07-28 15:14:39.636718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: \n", "pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1\n", "coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s\n", "2020-07-28 15:14:39.637038: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1\n", "2020-07-28 15:14:39.639381: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10\n", "2020-07-28 15:14:39.641536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10\n", "2020-07-28 15:14:39.642546: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10\n", "2020-07-28 15:14:39.644930: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10\n", "2020-07-28 15:14:39.646255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10\n", "2020-07-28 15:14:39.650387: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7\n", "2020-07-28 15:14:39.650598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2020-07-28 15:14:39.651086: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2020-07-28 15:14:39.651293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0\n", "2020-07-28 15:14:40.990800: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA\n", "2020-07-28 15:14:41.013344: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2699905000 Hz\n", "2020-07-28 15:14:41.013808: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b080b80810 initialized for platform Host (this does not guarantee that XLA will be used). 
Devices:\n", "2020-07-28 15:14:41.013856: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n", "2020-07-28 15:14:41.013969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:\n", "2020-07-28 15:14:41.013981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] \n", "2020-07-28 15:14:41.170231: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2020-07-28 15:14:41.170580: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b080be4f10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n", "2020-07-28 15:14:41.170601: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1\n", "^C\n", "[2020-07-28 15:14:49 +0800] [2953201] [INFO] Handling signal: int\n", "[2020-07-28 15:14:49 +0800] [2952697] [INFO] Handling signal: int\n", "[2020-07-28 15:14:49 +0800] [2953202] [INFO] Worker exiting (pid: 2953202)\n", "[2020-07-28 15:14:49 +0800] [2953201] [INFO] Shutting down: Master\n", "[2020-07-28 15:14:49 +0800] [2953203] [INFO] Worker exiting (pid: 2953203)\n" ] } ], "source": [ "# Option 1: serve directly\n", "print(f\"bentoml serve-gunicorn {bentoml_bundle_path} --port 5000 --enable-microbatch --workers 1\")\n", "\n", "!bentoml serve-gunicorn {bentoml_bundle_path} --port 5000 --enable-microbatch --workers 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are running this notebook from Google Colab, you can start the dev server with `--run-with-ngrok` option, to gain acccess to the API endpoint via a public endpoint managed by [ngrok](https://ngrok.com/):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bentoml serve BertService --run-with-ngrok" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Open http://127.0.0.1:5000 to see more information about the REST APIs server in your\n", "browser.\n", "\n", "\n", "### Send prediction requeset to the REST API server\n", "\n", "Navigate to parent directory of the notebook(so you have reference to the `test.jpg` image), and run the following `curl` command to send the image to REST API server and get a prediction result:\n", "\n", "```bash\n", "curl -i \\\n", " --request POST \\\n", " --header \"Content-Type: application/json\" \\\n", " --data '{\"0\":{\"0\":\"The acting was a bit lacking.\"}}' \\\n", " localhost:5000/predict\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test the API with requests" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'[\"negative\"]'\n", "CPU times: user 2.36 ms, sys: 3.26 ms, total: 5.62 ms\n", "Wall time: 29.1 s\n" ] } ], "source": [ "%%time\n", "import requests\n", "import pandas as pd\n", "\n", "server_url = f\"http://127.0.0.1:5000/predict\"\n", "method = \"POST\"\n", "headers = {\"content-type\": \"application/json\"}\n", "pred_sentences = [\"The acting was a bit lacking.\"]\n", "data = pd.DataFrame(pred_sentences).to_json()\n", "\n", "r = requests.request(method, server_url, headers=headers, data=data)\n", "print(r.content)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Option 2: serve in docker\n", 
"!cd {bentoml_bundle_path}\n", "IMG_NAME = bentoml_bundle_path.split('/')[-1].lower()\n", "\n", "!docker build --quiet -t {IMG_NAME} {bentoml_bundle_path}\n", "# launch docker instances\n", "!docker run -itd -p 5000:5000 {IMG_NAME}:latest --workers 1 --enable-microbatch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Containerize model server with Docker\n", "\n", "\n", "One common way of distributing this model API server for production deployment, is via Docker containers. And BentoML provides a convenient way to do that.\n", "\n", "Note that docker is **not available in Google Colab**. You will need to download and run this notebook locally to try out this containerization with docker feature.\n", "\n", "If you already have docker configured, simply run the follow command to product a docker container serving the IrisClassifier prediction service created above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bentoml containerize BertService:latest" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!docker run -p 5000:5000 bertservice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load saved BentoService\n", "\n", "bentoml.load is the API for loading a BentoML packaged model in python:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bentoml import load\n", "\n", "service = load(saved_path)\n", "\n", "print(service.predict([[\"The acting was a bit lacking.\"]]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch inference job from CLI\n", "\n", "BentoML cli supports loading and running a packaged model from CLI. With the DataframeInput adapter, the CLI command supports reading input Dataframe data from CLI argument or local csv or json files:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bentoml run BertService:latest predict --input '{\"0\":{\"0\":\"The acting was a bit lacking.\"}}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Deployment Options\n", "\n", "If you are at a small team with limited engineering or DevOps resources, try out automated deployment with BentoML CLI, currently supporting AWS Lambda, AWS SageMaker, and Azure Functions:\n", "- [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)\n", "- [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)\n", "- [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)\n", "\n", "If the cloud platform you are working with is not on the list above, try out these step-by-step guide on manually deploying BentoML packaged model to cloud platforms:\n", "- [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)\n", "- [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)\n", "- [Azure container instance Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)\n", "- [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)\n", "\n", "Lastly, if you have a DevOps or ML Engineering team who's operating a Kubernetes or OpenShift cluster, use the following guides as references for implementating your deployment strategy:\n", "- [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)\n", 
"- [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)\n", "- [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)\n", "- [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)\n", "- [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "Movie Reviews with bert-for-tf2.ipynb", "provenance": [], "toc_visible": true, "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }