{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "44"
    }
   },
   "source": [
    "# TF/Keras BERT Baseline (Training/Inference)\n",
    "> A tutorial about how to train an NLP model with the huggingface's pretrained BERT in TF/Keras\n",
    "\n",
    "- toc: true \n",
    "- badges: true\n",
    "- comments: true\n",
    "- categories: [notebook, kaggle, nlp]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "1"
    }
   },
   "source": [
    "This notebook shows how to train a neural network model with pre-trained BERT in Tensorflow/Keras. It is based on @xhlulu's [Disaster NLP: Keras BERT using TFHub](https://www.kaggle.com/xhlulu/disaster-nlp-keras-bert-using-tfhub\n",
    ") notebook and [Text Extraction with BERT](https://keras.io/examples/nlp/text_extraction_with_bert/) example at Keras.\n",
    "\n",
    "This competition is a code competition without access to internet. So we add the `transformers` tokenizer and pre-trained BERT model through Kaggle Datasets instead.\n",
    "\n",
    "Hope it helps."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "28"
    }
   },
   "source": [
    "# Changelogs"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "29"
    }
   },
   "source": [
    "| Version  | CV Score | Public Score | Changes | Comment |\n",
    "|----------|----------|--------------|---------|---------|\n",
    "| v9 | to be updated | to be updated | use transformers' tokenizer |\n",
    "| v8 | 0.653635 | 0.606 | add 5-fold CV + early-stopping back. | |\n",
    "| v7 | N/A | 0.617 | fix the bug in learning rate scheduler | overfitting to train? (n=20) |\n",
    "| v6 | N/A | 0.566 | add the warm-up learning rate scheduler | **With a bug. Don't use it** |\n",
    "| v5 | N/A | 0.531 | roll back to v3 | |\n",
    "| v4 | N/A | 0.573 | add early-stopping | seemed to stop too early with `patience=1` (n=5) |\n",
    "| v3 | N/A | **0.530** | initial baseline | |"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "2"
    }
   },
   "source": [
    "# Load Libraries and Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:06.447697Z",
     "start_time": "2021-05-07T23:33:06.421563Z"
    },
    "nterop": {
     "id": "30"
    }
   },
   "outputs": [],
   "source": [
    "%reload_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-08T00:40:33.409410Z",
     "start_time": "2021-05-08T00:40:32.188406Z"
    },
    "_kg_hide-input": true,
    "nterop": {
     "id": "3"
    }
   },
   "outputs": [],
   "source": [
    "from copy import copy\n",
    "import joblib\n",
    "from matplotlib import pyplot as plt\n",
    "import numpy as np\n",
    "import os\n",
    "import pandas as pd\n",
    "from pathlib import Path\n",
    "import seaborn as sns\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.model_selection import KFold\n",
    "import sys\n",
    "from warnings import simplefilter\n",
    "\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras import Model, Input\n",
    "from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler\n",
    "from tensorflow.keras.initializers import Constant\n",
    "from tensorflow.keras.layers import Dense, Embedding, Bidirectional, LSTM, Dropout\n",
    "from tensorflow.keras.layers.experimental.preprocessing import TextVectorization\n",
    "from tensorflow.keras.metrics import RootMeanSquaredError\n",
    "from tensorflow.keras.utils import to_categorical\n",
    "from tensorflow.keras.optimizers import Adam\n",
    "from transformers import TFBertModel, BertConfig, BertTokenizerFast\n",
    "\n",
    "simplefilter('ignore')\n",
    "plt.style.use('fivethirtyeight')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:10.974271Z",
     "start_time": "2021-05-07T23:33:10.869360Z"
    },
    "nterop": {
     "id": "31"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num GPUs Available:  1\n"
     ]
    }
   ],
   "source": [
    "# limit the GPU memory growth\n",
    "gpu = tf.config.list_physical_devices('GPU')\n",
    "print(\"Num GPUs Available: \", len(gpu))\n",
    "if len(gpu) > 0:\n",
    "    tf.config.experimental.set_memory_growth(gpu[0], True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-08T00:20:53.916646Z",
     "start_time": "2021-05-08T00:20:53.681663Z"
    },
    "nterop": {
     "id": "6"
    }
   },
   "outputs": [],
   "source": [
    "model_name = 'bert_v9'\n",
    "\n",
    "data_dir = Path('../input/commonlitreadabilityprize')\n",
    "train_file = data_dir / 'train.csv'\n",
    "test_file = data_dir / 'test.csv'\n",
    "sample_file = data_dir / 'sample_submission.csv'\n",
    "\n",
    "build_dir = Path('../build/')\n",
    "output_dir = build_dir / model_name\n",
    "trn_encoded_file = output_dir / 'trn.enc.joblib'\n",
    "tokenizer_file = output_dir / 'tokenizer.joblib'\n",
    "val_predict_file = output_dir / f'{model_name}.val.txt'\n",
    "submission_file = 'submission.csv'\n",
    "\n",
    "module_url = \"../input/bert-en-uncased-l24-h1024-a16\"\n",
    "\n",
    "id_col = 'id'\n",
    "target_col = 'target'\n",
    "text_col = 'excerpt'\n",
    "\n",
    "max_len = 205\n",
    "n_fold = 5\n",
    "n_est = 2\n",
    "n_stop = 2\n",
    "batch_size = 8\n",
    "seed = 42"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:11.380920Z",
     "start_time": "2021-05-07T23:33:11.252872Z"
    },
    "nterop": {
     "id": "41"
    }
   },
   "outputs": [],
   "source": [
    "output_dir.mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:11.635430Z",
     "start_time": "2021-05-07T23:33:11.382137Z"
    },
    "nterop": {
     "id": "7"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2834, 5) (2834,) (7, 3)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>url_legal</th>\n",
       "      <th>license</th>\n",
       "      <th>excerpt</th>\n",
       "      <th>target</th>\n",
       "      <th>standard_error</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>c12129c31</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>When the young people returned to the ballroom...</td>\n",
       "      <td>-0.340259</td>\n",
       "      <td>0.464009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85aa80a4c</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>All through dinner time, Mrs. Fayre was somewh...</td>\n",
       "      <td>-0.315372</td>\n",
       "      <td>0.480805</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b69ac6792</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>As Roger had predicted, the snow departed as q...</td>\n",
       "      <td>-0.580118</td>\n",
       "      <td>0.476676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dd1000b26</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>And outside before the palace a great garden w...</td>\n",
       "      <td>-1.054013</td>\n",
       "      <td>0.450007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37c1b32fb</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Once upon a time there were Three Bears who li...</td>\n",
       "      <td>0.247197</td>\n",
       "      <td>0.510845</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          url_legal license  \\\n",
       "id                            \n",
       "c12129c31       NaN     NaN   \n",
       "85aa80a4c       NaN     NaN   \n",
       "b69ac6792       NaN     NaN   \n",
       "dd1000b26       NaN     NaN   \n",
       "37c1b32fb       NaN     NaN   \n",
       "\n",
       "                                                     excerpt    target  \\\n",
       "id                                                                       \n",
       "c12129c31  When the young people returned to the ballroom... -0.340259   \n",
       "85aa80a4c  All through dinner time, Mrs. Fayre was somewh... -0.315372   \n",
       "b69ac6792  As Roger had predicted, the snow departed as q... -0.580118   \n",
       "dd1000b26  And outside before the palace a great garden w... -1.054013   \n",
       "37c1b32fb  Once upon a time there were Three Bears who li...  0.247197   \n",
       "\n",
       "           standard_error  \n",
       "id                         \n",
       "c12129c31        0.464009  \n",
       "85aa80a4c        0.480805  \n",
       "b69ac6792        0.476676  \n",
       "dd1000b26        0.450007  \n",
       "37c1b32fb        0.510845  "
      ]
     },
     "execution_count": 6,
     "metadata": {
      "nterop": {
       "id": "42"
      }
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trn = pd.read_csv(train_file, index_col=id_col)\n",
    "tst = pd.read_csv(test_file, index_col=id_col)\n",
    "y = trn[target_col].values\n",
    "print(trn.shape, y.shape, tst.shape)\n",
    "trn.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "12"
    }
   },
   "source": [
    "# Tokenization Using `transformers`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:11.756625Z",
     "start_time": "2021-05-07T23:33:11.637470Z"
    },
    "nterop": {
     "id": "13"
    }
   },
   "outputs": [],
   "source": [
    "pretrained_dir = output_dir / \"bert_base_uncased/\"\n",
    "pretrained_dir.mkdir(exist_ok=True)\n",
    "\n",
    "def load_tokenizer():\n",
    "    if not os.path.exists(pretrained_dir / 'vocab.txt'):\n",
    "        tokenizer = BertTokenizerFast.from_pretrained(\"bert-base-uncased\")\n",
    "        tokenizer.save_pretrained(pretrained_dir)\n",
    "    else:\n",
    "        print('loading the saved pretrained tokenizer')\n",
    "        tokenizer = BertTokenizerFast.from_pretrained(str(pretrained_dir))\n",
    "        \n",
    "    model_config = BertConfig.from_pretrained(str(pretrained_dir))\n",
    "    model_config.output_hidden_states = True\n",
    "    return tokenizer, model_config\n",
    "\n",
    "def load_bert(config):\n",
    "    if not os.path.exists(pretrained_dir / 'tf_model.h5'):\n",
    "        bert_model = TFBertModel.from_pretrained(\"bert-base-uncased\", config=config)\n",
    "        bert_model.save_pretrained(pretrained_dir)\n",
    "    else:\n",
    "        print('loading the saved pretrained model')\n",
    "        bert_model = TFBertModel.from_pretrained(pretrained_dir, config=config)\n",
    "    return bert_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:37.616807Z",
     "start_time": "2021-05-07T23:33:37.437031Z"
    },
    "nterop": {
     "id": "14"
    }
   },
   "outputs": [],
   "source": [
    "def bert_encode(texts, tokenizer, max_len=max_len):\n",
    "    input_ids = []\n",
    "    token_type_ids = []\n",
    "    attention_mask = []\n",
    "    \n",
    "    for text in texts:\n",
    "        token = tokenizer(text, max_length=max_len, truncation=True, padding='max_length',\n",
    "                         add_special_tokens=True)\n",
    "        input_ids.append(token['input_ids'])\n",
    "        token_type_ids.append(token['token_type_ids'])\n",
    "        attention_mask.append(token['attention_mask'])\n",
    "    \n",
    "    return np.array(input_ids), np.array(token_type_ids), np.array(attention_mask)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:40.020617Z",
     "start_time": "2021-05-07T23:33:37.618391Z"
    },
    "nterop": {
     "id": "15"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loading the saved pretrained tokenizer\n",
      "(2834, 205) (7, 205) (2834,)\n"
     ]
    }
   ],
   "source": [
    "tokenizer, bert_config = load_tokenizer()\n",
    "\n",
    "X = bert_encode(trn[text_col].values, tokenizer, max_len=max_len)\n",
    "X_tst = bert_encode(tst[text_col].values, tokenizer, max_len=max_len)\n",
    "y = trn[target_col].values\n",
    "print(X[0].shape, X_tst[0].shape, y.shape)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "37"
    }
   },
   "source": [
    "## Save Tokenizer and Encoded Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:40.469594Z",
     "start_time": "2021-05-07T23:33:40.022369Z"
    },
    "nterop": {
     "id": "16"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['../build/bert_v9/tokenizer.joblib']"
      ]
     },
     "execution_count": 11,
     "metadata": {
      "nterop": {
       "id": "43"
      }
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joblib.dump(X, trn_encoded_file)\n",
    "joblib.dump(tokenizer, tokenizer_file)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "17"
    }
   },
   "source": [
    "# Model Training with Cross-Validation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "18"
    }
   },
   "source": [
    "Simple model with only an output dense layer added to the pre-trained BERT model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-08T00:40:34.054223Z",
     "start_time": "2021-05-08T00:40:33.411649Z"
    },
    "nterop": {
     "id": "19"
    }
   },
   "outputs": [],
   "source": [
    "def build_model(bert_model, max_len=max_len):    \n",
    "    input_ids = Input(shape=(max_len,), dtype=tf.int32, name=\"input_ids\")\n",
    "    token_type_ids = Input(shape=(max_len,), dtype=tf.int32, name=\"token_type_ids\")\n",
    "    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name=\"attention_mask\")\n",
    "\n",
    "    sequence_output = bert_model(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)[0]\n",
    "    clf_output = sequence_output[:, 0, :]\n",
    "    clf_output = Dropout(.1)(clf_output)\n",
    "    out = Dense(1, activation='linear')(clf_output)\n",
    "    \n",
    "    model = Model(inputs=[input_ids, token_type_ids, attention_mask], outputs=out)\n",
    "    model.compile(Adam(lr=1e-5), loss='mean_squared_error', metrics=[RootMeanSquaredError()])\n",
    "    \n",
    "    return model"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "20"
    }
   },
   "source": [
    "Training the model with early stopping and a learning-rate scheduler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-08T00:40:32.186896Z",
     "start_time": "2021-05-08T00:20:56.528875Z"
    },
    "nterop": {
     "id": "21"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "training CV #1:\n",
      "loading the saved pretrained model\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "All model checkpoint layers were used when initializing TFBertModel.\n",
      "\n",
      "All the layers of TFBertModel were initialized from the model checkpoint at ../build/bert_v9/bert_base_uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "Model: \"model_1\"\n",
      "__________________________________________________________________________________________________\n",
      "Layer (type)                    Output Shape         Param #     Connected to                     \n",
      "==================================================================================================\n",
      "input_ids (InputLayer)          [(None, 205)]        0                                            \n",
      "__________________________________________________________________________________________________\n",
      "attention_mask (InputLayer)     [(None, 205)]        0                                            \n",
      "__________________________________________________________________________________________________\n",
      "token_type_ids (InputLayer)     [(None, 205)]        0                                            \n",
      "__________________________________________________________________________________________________\n",
      "tf_bert_model_1 (TFBertModel)   TFBaseModelOutputWit 109482240   input_ids[0][0]                  \n",
      "                                                                 attention_mask[0][0]             \n",
      "                                                                 token_type_ids[0][0]             \n",
      "__________________________________________________________________________________________________\n",
      "tf.__operators__.getitem_1 (Sli (None, 768)          0           tf_bert_model_1[0][13]           \n",
      "__________________________________________________________________________________________________\n",
      "dropout_75 (Dropout)            (None, 768)          0           tf.__operators__.getitem_1[0][0] \n",
      "__________________________________________________________________________________________________\n",
      "dense_1 (Dense)                 (None, 1)            769         dropout_75[0][0]                 \n",
      "==================================================================================================\n",
      "Total params: 109,483,009\n",
      "Trainable params: 109,483,009\n",
      "Non-trainable params: 0\n",
      "__________________________________________________________________________________________________\n",
      "None\n",
      "Epoch 1/2\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_1/bert/pooler/dense/kernel:0', 'tf_bert_model_1/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_1/bert/pooler/dense/kernel:0', 'tf_bert_model_1/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "284/284 [==============================] - ETA: 0s - loss: 1.0671WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "284/284 [==============================] - 118s 366ms/step - loss: 1.0662 - val_loss: 0.4776\n",
      "Epoch 2/2\n",
      "284/284 [==============================] - 104s 367ms/step - loss: 0.5244 - val_loss: 0.4544\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "training CV #2:\n",
      "loading the saved pretrained model\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "All model checkpoint layers were used when initializing TFBertModel.\n",
      "\n",
      "All the layers of TFBertModel were initialized from the model checkpoint at ../build/bert_v9/bert_base_uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "Epoch 1/2\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_2/bert/pooler/dense/kernel:0', 'tf_bert_model_2/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_2/bert/pooler/dense/kernel:0', 'tf_bert_model_2/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "284/284 [==============================] - ETA: 0s - loss: 1.0675WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "284/284 [==============================] - 117s 365ms/step - loss: 1.0667 - val_loss: 0.5301\n",
      "Epoch 2/2\n",
      "284/284 [==============================] - 101s 356ms/step - loss: 0.5712 - val_loss: 0.4714\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "training CV #3:\n",
      "loading the saved pretrained model\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "All model checkpoint layers were used when initializing TFBertModel.\n",
      "\n",
      "All the layers of TFBertModel were initialized from the model checkpoint at ../build/bert_v9/bert_base_uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "Epoch 1/2\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_3/bert/pooler/dense/kernel:0', 'tf_bert_model_3/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_3/bert/pooler/dense/kernel:0', 'tf_bert_model_3/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "284/284 [==============================] - ETA: 0s - loss: 0.9928WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "284/284 [==============================] - 118s 365ms/step - loss: 0.9922 - val_loss: 0.5096\n",
      "Epoch 2/2\n",
      "284/284 [==============================] - 102s 358ms/step - loss: 0.5822 - val_loss: 0.5252\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "training CV #4:\n",
      "loading the saved pretrained model\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "All model checkpoint layers were used when initializing TFBertModel.\n",
      "\n",
      "All the layers of TFBertModel were initialized from the model checkpoint at ../build/bert_v9/bert_base_uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "Epoch 1/2\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_4/bert/pooler/dense/kernel:0', 'tf_bert_model_4/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_4/bert/pooler/dense/kernel:0', 'tf_bert_model_4/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "284/284 [==============================] - ETA: 0s - loss: 1.0345WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "284/284 [==============================] - 120s 375ms/step - loss: 1.0337 - val_loss: 0.5380\n",
      "Epoch 2/2\n",
      "284/284 [==============================] - 102s 358ms/step - loss: 0.5264 - val_loss: 0.4960\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "training CV #5:\n",
      "loading the saved pretrained model\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "All model checkpoint layers were used when initializing TFBertModel.\n",
      "\n",
      "All the layers of TFBertModel were initialized from the model checkpoint at ../build/bert_v9/bert_base_uncased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "Epoch 1/2\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_5/bert/pooler/dense/kernel:0', 'tf_bert_model_5/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model_5/bert/pooler/dense/kernel:0', 'tf_bert_model_5/bert/pooler/dense/bias:0'] when minimizing the loss.\n",
      "284/284 [==============================] - ETA: 0s - loss: 1.0071WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n",
      "284/284 [==============================] - 120s 371ms/step - loss: 1.0063 - val_loss: 0.5089\n",
      "Epoch 2/2\n",
      "284/284 [==============================] - 103s 363ms/step - loss: 0.4906 - val_loss: 0.5032\n",
      "WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).\n",
      "WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.\n"
     ]
    }
   ],
   "source": [
    "def scheduler(epoch, lr, warmup=5, decay_start=10):\n",
    "    if epoch <= warmup:\n",
    "        return lr / (warmup - epoch + 1)\n",
    "    elif warmup < epoch <= decay_start:\n",
    "        return lr\n",
    "    else:\n",
    "        return lr * tf.math.exp(-.1)\n",
    "\n",
    "ls = LearningRateScheduler(scheduler)\n",
    "es = EarlyStopping(patience=n_stop, restore_best_weights=True)\n",
    "\n",
    "cv = KFold(n_splits=n_fold, shuffle=True, random_state=seed)\n",
    "\n",
    "p = np.zeros_like(y, dtype=float)\n",
    "p_tst = np.zeros((X_tst[0].shape[0], ), dtype=float)\n",
    "for i, (i_trn, i_val) in enumerate(cv.split(X[0]), 1):\n",
    "    print(f'training CV #{i}:')\n",
    "    tf.random.set_seed(seed + i)\n",
    "    bert_model = load_bert(bert_config)\n",
    "    clf = build_model(bert_model, max_len=max_len)\n",
    "    if i == 1:\n",
    "        print(clf.summary())\n",
    "\n",
    "    history = clf.fit([x[i_trn] for x in X], y[i_trn],\n",
    "                      validation_data=([x[i_val] for x in X], y[i_val]),\n",
    "                      epochs=n_est,\n",
    "                      batch_size=batch_size,\n",
    "                      callbacks=[ls])\n",
    "    clf.save_weights(f'{model_name}_cv{i}.h5')\n",
    "    p[i_val] = clf.predict([x[i_val] for x in X]).flatten()\n",
    "    p_tst += clf.predict(X_tst).flatten() / n_fold"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "39"
    }
   },
   "source": [
    "## Print CV RMSE and Save CV Predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:40.734140Z",
     "start_time": "2021-05-07T23:33:37.410Z"
    },
    "nterop": {
     "id": "22"
    }
   },
   "outputs": [],
   "source": [
    "print(f'CV RMSE: {mean_squared_error(y, p, squared=False):.6f}')\n",
    "np.savetxt(val_predict_file, p, fmt='%.6f')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nterop": {
     "id": "23"
    }
   },
   "source": [
    "# Submission"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-07T23:33:40.734791Z",
     "start_time": "2021-05-07T23:33:37.414Z"
    },
    "nterop": {
     "id": "25"
    }
   },
   "outputs": [],
   "source": [
    "sub = pd.read_csv(sample_file, index_col=id_col)\n",
    "sub[target_col] = p_tst\n",
    "sub.to_csv(submission_file)\n",
    "sub.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nterop": {
     "id": "27"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3.7 (TF2.4)",
   "language": "python",
   "name": "tf2.4"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  },
  "nterop": {
   "seedId": "44"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}