{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "133"
}
},
"source": [
"# PyTorch Lightning RoBERTa Baseline (Training/Inference)\n",
"> A tutorial on training an NLP model with Hugging Face's pretrained RoBERTa in PyTorch Lightning\n",
"\n",
"- toc: true \n",
"- badges: true\n",
"- comments: true\n",
"- categories: [notebook, kaggle, nlp]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
"_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
"nterop": {
"id": "1"
}
},
"source": [
"This notebook shows how to train a neural network model with pre-trained RoBERTa in PyTorch Lightning.\n",
"\n",
"This is a code competition without internet access, so the pretrained model is loaded from @abhishek's [`roberta-base` Kaggle Dataset](https://www.kaggle.com/abhishek/roberta-base) instead of being downloaded.\n",
"\n",
"This notebook follows the same structure as [TF/Keras BERT Baseline (Training/Inference)](https://www.kaggle.com/jeongyoonlee/tf-keras-bert-baseline-training-inference) and builds on two other notebooks:\n",
"* [BERT & PyTorch [CommonLit Readability] Simple](https://www.kaggle.com/shivanandmn/bert-pytorch-commonlit-readability-simple) by @shivanandmn\n",
"* [RoBERTa meets TPUs](https://www.kaggle.com/yassinealouini/roberta-meets-tpus#Application:-Tweet-Sentiment-Extraction) by @yassinealouini\n",
"\n",
"Hope it helps."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "25"
}
},
"source": [
"# Changelog\n",
"\n",
"| Version | CV Score | Public Score | Changes | Comment |\n",
"|---------|----------|--------------|---------|---------|\n",
"| v1 | | to be updated | initial baseline | |"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "54"
}
},
"source": [
"# Load Libraries and Data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:24.567758Z",
"start_time": "2021-05-07T19:05:24.534817Z"
},
"nterop": {
"id": "18"
}
},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:27.193898Z",
"start_time": "2021-05-07T19:05:24.569274Z"
},
"nterop": {
"id": "2"
}
},
"outputs": [],
"source": [
"import joblib\n",
"import numpy as np\n",
"import os\n",
"import pandas as pd \n",
"from pathlib import Path\n",
"import random\n",
"from sklearn.model_selection import KFold\n",
"from sklearn.metrics import mean_squared_error\n",
"from tqdm import tqdm\n",
"from warnings import simplefilter\n",
"simplefilter('ignore')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:30.967346Z",
"start_time": "2021-05-07T19:05:27.195522Z"
},
"nterop": {
"id": "19"
}
},
"outputs": [],
"source": [
"from pytorch_lightning import LightningModule, Trainer, seed_everything\n",
"from pytorch_lightning.callbacks.early_stopping import EarlyStopping\n",
"import torch\n",
"from torch import nn\n",
"from torch.utils.data import DataLoader, Dataset\n",
"from transformers import (PreTrainedModel, RobertaModel, RobertaTokenizerFast, RobertaConfig,\n",
" get_constant_schedule_with_warmup, AdamW)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:31.046311Z",
"start_time": "2021-05-07T19:05:30.968875Z"
},
"nterop": {
"id": "20"
}
},
"outputs": [],
"source": [
"model_name = 'roberta_v1'\n",
"\n",
"data_dir = Path('../input/commonlitreadabilityprize')\n",
"train_file = data_dir / 'train.csv'\n",
"test_file = data_dir / 'test.csv'\n",
"sample_file = data_dir / 'sample_submission.csv'\n",
"\n",
"pretrained_path = '../input/roberta-base/'\n",
"\n",
"build_dir = Path('../build')\n",
"output_dir = build_dir / 'model' / model_name\n",
"\n",
"trn_encoded_file = output_dir / 'trn.enc.joblib'\n",
"tokenizer_file = output_dir / 'tokenizer.joblib'\n",
"val_predict_file = output_dir / f'{model_name}.val.txt'\n",
"submission_file = output_dir / 'submission.csv'\n",
"\n",
"id_col = 'id'\n",
"target_col = 'target'\n",
"text_col = 'excerpt'\n",
"\n",
"max_len = 200\n",
"n_fold = 5\n",
"n_est = 20\n",
"n_stop = 2\n",
"batch_size = 8\n",
"seed = 42"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:31.294397Z",
"start_time": "2021-05-07T19:05:31.047420Z"
},
"nterop": {
"id": "21"
}
},
"outputs": [],
"source": [
"output_dir.mkdir(parents=True, exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:31.420159Z",
"start_time": "2021-05-07T19:05:31.295486Z"
},
"nterop": {
"id": "3"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Global seed set to 42\n"
]
},
{
"data": {
"text/plain": [
"42"
]
},
"execution_count": 6,
"metadata": {
"nterop": {
"id": "55"
}
},
"output_type": "execute_result"
}
],
"source": [
"seed_everything(seed)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:31.673420Z",
"start_time": "2021-05-07T19:05:31.421280Z"
},
"nterop": {
"id": "4"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GPU is available\n"
]
}
],
"source": [
"if torch.cuda.is_available():\n",
"    device = torch.device(\"cuda\")\n",
"    print(\"GPU is available\")\n",
"else:\n",
"    device = torch.device(\"cpu\")\n",
"    print(\"GPU not available, CPU used\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:31.921938Z",
"start_time": "2021-05-07T19:05:31.675186Z"
},
"nterop": {
"id": "5"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2834, 5) (2834,)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>url_legal</th>\n",
"      <th>license</th>\n",
"      <th>excerpt</th>\n",
"      <th>target</th>\n",
"      <th>standard_error</th>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>id</th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>c12129c31</th>\n",
"      <td>NaN</td>\n",
"      <td>NaN</td>\n",
"      <td>When the young people returned to the ballroom...</td>\n",
"      <td>-0.340259</td>\n",
"      <td>0.464009</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>85aa80a4c</th>\n",
"      <td>NaN</td>\n",
"      <td>NaN</td>\n",
"      <td>All through dinner time, Mrs. Fayre was somewh...</td>\n",
"      <td>-0.315372</td>\n",
"      <td>0.480805</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>b69ac6792</th>\n",
"      <td>NaN</td>\n",
"      <td>NaN</td>\n",
"      <td>As Roger had predicted, the snow departed as q...</td>\n",
"      <td>-0.580118</td>\n",
"      <td>0.476676</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>dd1000b26</th>\n",
"      <td>NaN</td>\n",
"      <td>NaN</td>\n",
"      <td>And outside before the palace a great garden w...</td>\n",
"      <td>-1.054013</td>\n",
"      <td>0.450007</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>37c1b32fb</th>\n",
"      <td>NaN</td>\n",
"      <td>NaN</td>\n",
"      <td>Once upon a time there were Three Bears who li...</td>\n",
"      <td>0.247197</td>\n",
"      <td>0.510845</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"          url_legal license  \\\n",
"id                            \n",
"c12129c31       NaN     NaN   \n",
"85aa80a4c       NaN     NaN   \n",
"b69ac6792       NaN     NaN   \n",
"dd1000b26       NaN     NaN   \n",
"37c1b32fb       NaN     NaN   \n",
"\n",
"                                                     excerpt    target  \\\n",
"id                                                                       \n",
"c12129c31  When the young people returned to the ballroom... -0.340259   \n",
"85aa80a4c  All through dinner time, Mrs. Fayre was somewh... -0.315372   \n",
"b69ac6792  As Roger had predicted, the snow departed as q... -0.580118   \n",
"dd1000b26  And outside before the palace a great garden w... -1.054013   \n",
"37c1b32fb  Once upon a time there were Three Bears who li...  0.247197   \n",
"\n",
"           standard_error  \n",
"id                         \n",
"c12129c31        0.464009  \n",
"85aa80a4c        0.480805  \n",
"b69ac6792        0.476676  \n",
"dd1000b26        0.450007  \n",
"37c1b32fb        0.510845  "
]
},
"execution_count": 8,
"metadata": {
"nterop": {
"id": "56"
}
},
"output_type": "execute_result"
}
],
"source": [
"trn = pd.read_csv(train_file, index_col=id_col)\n",
"tst = pd.read_csv(test_file, index_col=id_col)\n",
"y = trn[target_col].values\n",
"print(trn.shape, y.shape)\n",
"trn.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "57"
}
},
"source": [
"# Tokenization Using RoBERTa"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:32.216036Z",
"start_time": "2021-05-07T19:05:31.923200Z"
},
"nterop": {
"id": "14"
}
},
"outputs": [],
"source": [
"# RoBERTa's byte-level BPE vocabulary is cased, so no lowercasing option is passed.\n",
"tokenizer = RobertaTokenizerFast.from_pretrained(pretrained_path)\n",
"model_config = RobertaConfig.from_pretrained(pretrained_path)\n",
"model_config.output_hidden_states = True"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:32.266862Z",
"start_time": "2021-05-07T19:05:32.217251Z"
},
"nterop": {
"id": "11"
}
},
"outputs": [],
"source": [
"class Data(Dataset):\n",
"    def __init__(self, df):\n",
"        super().__init__()\n",
"        self.df = df\n",
"        self.labeled = target_col in df\n",
"\n",
"    def __len__(self):\n",
"        return len(self.df)\n",
"\n",
"    def __getitem__(self, idx):\n",
"        # The frame is indexed by string ids, so look rows up by position.\n",
"        text = self.df[text_col].iloc[idx]\n",
"        token = tokenizer(text, max_length=max_len, truncation=True, padding='max_length',\n",
"                          return_tensors='pt', add_special_tokens=True)\n",
"        # return_tensors='pt' already yields tensors of shape (1, max_len); drop the batch dim.\n",
"        ids = token['input_ids'].squeeze(0)\n",
"        mask = token['attention_mask'].squeeze(0)\n",
"        if self.labeled:\n",
"            target = torch.tensor(self.df[target_col].iloc[idx], dtype=torch.float)\n",
"            return ids, mask, target\n",
"\n",
"        return ids, mask"
]
},
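{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Data` class above pads or truncates every excerpt to `max_len` token ids and returns a matching attention mask. A minimal pure-Python sketch of that bookkeeping, under stated assumptions: `pad_or_truncate` is a hypothetical helper rather than a tokenizer API, the token ids are made up, and `pad_id=1` is the `<pad>` id in `roberta-base`. The real tokenizer also keeps the closing `</s>` token when it truncates, which this sketch does not.\n",
"\n",
"```python\n",
"def pad_or_truncate(token_ids, max_len, pad_id=1):\n",
"    # Hypothetical helper: cut to max_len, then right-pad with pad_id.\n",
"    ids = list(token_ids[:max_len])\n",
"    mask = [1] * len(ids)             # 1 marks a real token\n",
"    n_pad = max_len - len(ids)\n",
"    ids += [pad_id] * n_pad\n",
"    mask += [0] * n_pad               # 0 marks padding\n",
"    return ids, mask\n",
"\n",
"\n",
"ids, mask = pad_or_truncate([0, 713, 16, 2], max_len=6)\n",
"print(ids)   # [0, 713, 16, 2, 1, 1]\n",
"print(mask)  # [1, 1, 1, 1, 0, 0]\n",
"```\n",
"\n",
"The model receives both lists so that attention can skip the padded positions."
]
},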
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "58"
}
},
"source": [
"# Model Training with Cross-Validation"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "59"
}
},
"source": [
"Simple model with only an output dense layer added to the pre-trained RoBERTa model."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T19:05:32.449179Z",
"start_time": "2021-05-07T19:05:32.268050Z"
},
"nterop": {
"id": "27"
}
},
"outputs": [],
"source": [
"class ReadabilityModel(LightningModule):\n",
"\n",
"    def __init__(self, conf):\n",
"        super().__init__()\n",
"        self.config = conf\n",
"        self.model = RobertaModel.from_pretrained(pretrained_path, config=self.config)\n",
"        self.dropout = nn.Dropout(0.1)\n",
"        self.num_targets = 1\n",
"        self.clf = nn.Linear(768, self.num_targets)\n",
"        torch.nn.init.normal_(self.clf.weight, std=0.02)\n",
"\n",
"    def forward(self, inputs):\n",
"        ids, mask = inputs\n",
"        out = self.model(ids, attention_mask=mask)\n",
"        out = out['hidden_states']\n",
"        x = out[-1]  # last-layer token embeddings: (batch, seq_len, 768)\n",
"        x = self.dropout(x)\n",
"        x = torch.mean(x, 1, True)  # average over all token positions\n",
"        preds = self.clf(x)\n",
"        preds = preds.squeeze(-1).squeeze(-1)\n",
"\n",
"        return preds\n",
"\n",
"    def training_step(self, batch, batch_idx):\n",
"        ids, mask, y = batch\n",
"        p = self([ids, mask])\n",
"        loss = self.loss_fn(p, y)\n",
"        self.log('train_loss', loss)\n",
"        return loss\n",
"\n",
"    def validation_step(self, batch, batch_idx):\n",
"        ids, mask, y = batch\n",
"        p = self([ids, mask])\n",
"        loss = self.loss_fn(p, y)\n",
"        self.log('val_loss', loss)\n",
"\n",
"    def configure_optimizers(self):\n",
"        # Use the module's own parameters rather than the global `model` variable.\n",
"        optimizer = AdamW(self.parameters(), lr=1e-5, weight_decay=0.01)\n",
"        lr_scheduler = get_constant_schedule_with_warmup(optimizer, 100)\n",
"        # Step the scheduler every batch so the 100 warm-up steps are optimizer steps, not epochs.\n",
"        return [optimizer], [{'scheduler': lr_scheduler, 'interval': 'step'}]\n",
"\n",
"    def loss_fn(self, p, y):\n",
"        # RMSE, the competition metric\n",
"        return torch.sqrt(nn.MSELoss()(p, y))"
]
},
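{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The head in `ReadabilityModel.forward` averages the last-layer token vectors and feeds the average to a single linear unit. A minimal pure-Python sketch of that arithmetic, with toy 3-dimensional vectors in place of 768-dimensional hidden states; `mean_pool` and `linear_head` are hypothetical names used only for illustration, and dropout is left out.\n",
"\n",
"```python\n",
"def mean_pool(token_vectors):\n",
"    # Average a list of equal-length token vectors into one vector.\n",
"    n_tokens = len(token_vectors)\n",
"    dim = len(token_vectors[0])\n",
"    return [sum(vec[d] for vec in token_vectors) / n_tokens for d in range(dim)]\n",
"\n",
"\n",
"def linear_head(pooled, weights, bias):\n",
"    # Single-output dense layer: dot(pooled, weights) + bias.\n",
"    return sum(x * w for x, w in zip(pooled, weights)) + bias\n",
"\n",
"\n",
"# Toy input: 4 tokens with 3-dimensional hidden states.\n",
"hidden = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]\n",
"pooled = mean_pool(hidden)\n",
"score = linear_head(pooled, [0.5, -0.25, 0.25], bias=0.1)\n",
"print(pooled, score)\n",
"```\n",
"\n",
"Like `torch.mean(x, 1, True)` in the model, this average counts every position, including padding. Weighting by the attention mask is a common alternative."
]
},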
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "60"
}
},
"source": [
"Train the model with early stopping and a learning-rate scheduler."
]
},
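{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal pure-Python sketch of the two mechanisms named above, under stated assumptions: `warmup_multiplier` follows the usual constant-with-warm-up rule (a linear ramp from 0 to 1, then flat), and `stopping_epoch` is a simplified patience rule. Both are hypothetical helpers for illustration, not the library implementations.\n",
"\n",
"```python\n",
"def warmup_multiplier(step, warmup_steps):\n",
"    # Learning-rate multiplier: linear ramp from 0 to 1, then constant at 1.\n",
"    if step < warmup_steps:\n",
"        return step / max(1, warmup_steps)\n",
"    return 1.0\n",
"\n",
"\n",
"def stopping_epoch(val_losses, patience):\n",
"    # Return the 0-based epoch at which training stops, or the last epoch.\n",
"    best, wait = float('inf'), 0\n",
"    for epoch, loss in enumerate(val_losses):\n",
"        if loss < best:\n",
"            best, wait = loss, 0      # an improvement resets the counter\n",
"        else:\n",
"            wait += 1\n",
"            if wait >= patience:      # too many checks in a row without improvement\n",
"                return epoch\n",
"    return len(val_losses) - 1\n",
"\n",
"\n",
"print([warmup_multiplier(s, 100) for s in (0, 50, 100, 250)])  # [0.0, 0.5, 1.0, 1.0]\n",
"print(stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.5], patience=2))  # 4\n",
"```\n",
"\n",
"With `patience=2`, the loss of 0.5 at epoch 5 is never reached, because epochs 3 and 4 both fail to beat 0.6."
]
},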
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T20:16:12.965274Z",
"start_time": "2021-05-07T19:05:32.450264Z"
},
"nterop": {
"id": "15"
},
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True, used: True\n",
"TPU available: False, using: 0 TPU cores\n",
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
"\n",
"  | Name    | Type         | Params\n",
"-----------------------------------------\n",
"0 | model   | RobertaModel | 124 M \n",
"1 | dropout | Dropout      | 0     \n",
"2 | clf     | Linear       | 769   \n",
"-----------------------------------------\n",
"124 M Trainable params\n",
"0 Non-trainable params\n",
"124 M Total params\n",
"498.586 Total estimated model params size (MB)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validation sanity check: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "61"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9c97c472c5e84aa98b0adcdb7745db21",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "62"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "63"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "64"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "65"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "66"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "67"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "68"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "69"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "70"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "71"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "72"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b6e21792eb42436abc7863e65a936e46",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "73"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3a275f3ef19941f2920604c402934947",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "74"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True, used: True\n",
"TPU available: False, using: 0 TPU cores\n",
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
"\n",
"  | Name    | Type         | Params\n",
"-----------------------------------------\n",
"0 | model   | RobertaModel | 124 M \n",
"1 | dropout | Dropout      | 0     \n",
"2 | clf     | Linear       | 769   \n",
"-----------------------------------------\n",
"124 M Trainable params\n",
"0 Non-trainable params\n",
"124 M Total params\n",
"498.586 Total estimated model params size (MB)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validation sanity check: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "75"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "327dc4acbc574d91a1d414c9b836d9fc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "76"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "77"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "78"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "79"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "80"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "81"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "82"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "83"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "84"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "85"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3b92cc41b133468cb9f829e20eacb664",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "86"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0ea706c071aa4cd19a844a5599ca771c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "87"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True, used: True\n",
"TPU available: False, using: 0 TPU cores\n",
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
"\n",
"  | Name    | Type         | Params\n",
"-----------------------------------------\n",
"0 | model   | RobertaModel | 124 M \n",
"1 | dropout | Dropout      | 0     \n",
"2 | clf     | Linear       | 769   \n",
"-----------------------------------------\n",
"124 M Trainable params\n",
"0 Non-trainable params\n",
"124 M Total params\n",
"498.586 Total estimated model params size (MB)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validation sanity check: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "88"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e5108dd92c8749f6880aa03300a297d8",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "89"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "90"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "91"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "92"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "93"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "94"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "95"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "96"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "97"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "98"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "99"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "100"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5f2318e538b14cb6af4a6ac712e57293",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "101"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "50f6b0260f5b4489a2001dda2db053d9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "102"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True, used: True\n",
"TPU available: False, using: 0 TPU cores\n",
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
"\n",
"  | Name    | Type         | Params\n",
"-----------------------------------------\n",
"0 | model   | RobertaModel | 124 M \n",
"1 | dropout | Dropout      | 0     \n",
"2 | clf     | Linear       | 769   \n",
"-----------------------------------------\n",
"124 M Trainable params\n",
"0 Non-trainable params\n",
"124 M Total params\n",
"498.586 Total estimated model params size (MB)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validation sanity check: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "103"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "23d7ffdd77e247239f8f5b97a2b78d0a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "104"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "105"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "106"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "107"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "108"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "109"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "110"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "111"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "112"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "113"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "89542174dd784d99a6f507c0fd48804d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "114"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a47b9204dec44a59a75d79d5d8df100d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "115"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"GPU available: True, used: True\n",
"TPU available: False, using: 0 TPU cores\n",
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
"\n",
"  | Name    | Type         | Params\n",
"-----------------------------------------\n",
"0 | model   | RobertaModel | 124 M \n",
"1 | dropout | Dropout      | 0     \n",
"2 | clf     | Linear       | 769   \n",
"-----------------------------------------\n",
"124 M Trainable params\n",
"0 Non-trainable params\n",
"124 M Total params\n",
"498.586 Total estimated model params size (MB)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validation sanity check: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "116"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c124407becd340b392d11b4a27b1fa77",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Training: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "117"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "118"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "119"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "120"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "121"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "122"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "123"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "124"
}
},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Validating: 0it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "125"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "04c788951bba4e74991b7d4277c6c6f3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "126"
}
},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "12d5f071397a42c7af1d649940d6e271",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Predicting: 284it [00:00, ?it/s]"
]
},
"metadata": {
"nterop": {
"id": "127"
}
},
"output_type": "display_data"
}
],
"source": [
"# K-fold CV: collect out-of-fold predictions for trn and average test predictions across folds\n",
"cv = KFold(n_splits=n_fold, shuffle=True, random_state=seed)\n",
"\n",
"p = np.zeros_like(y, dtype=float)                # out-of-fold predictions\n",
"p_tst = np.zeros((tst.shape[0],), dtype=float)   # fold-averaged test predictions\n",
"for i_cv, (i_trn, i_val) in enumerate(cv.split(trn), 1):\n",
"    model = ReadabilityModel(model_config)\n",
"    trn_loader = DataLoader(Data(trn.iloc[i_trn]), shuffle=True, batch_size=batch_size)\n",
"    val_loader = DataLoader(Data(trn.iloc[i_val]), shuffle=False, batch_size=batch_size * 8)\n",
"\n",
"    # Stop early once val_loss has not improved for n_stop validation checks\n",
"    trainer = Trainer(gpus=[0], max_epochs=n_est,\n",
"                      callbacks=[EarlyStopping(monitor='val_loss', mode='min', patience=n_stop)],\n",
"                      checkpoint_callback=False)\n",
"    trainer.fit(model, trn_loader, val_loader)\n",
"\n",
"    # Rebuild the validation loader without the target column, as for the test set\n",
"    val_loader = DataLoader(Data(trn.iloc[i_val].drop(target_col, axis=1)), shuffle=False,\n",
"                            batch_size=batch_size * 8)\n",
"    tst_loader = DataLoader(Data(tst), shuffle=False, batch_size=batch_size * 8)\n",
"    p[i_val] = np.concatenate(trainer.predict(model, val_loader))\n",
"    p_tst += np.concatenate(trainer.predict(model, tst_loader)) / n_fold\n",
"\n",
"    # Save the fold model for inference, then release it before the next fold\n",
"    trainer.save_checkpoint(f'{model_name}_cv{i_cv}.ckpt')\n",
"    del trainer, model"
]
},
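{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above fills `p` with out-of-fold predictions and averages the test predictions over the folds. The same bookkeeping can be sketched on toy data without a GPU. This is only an illustration: `Ridge` and the random arrays `X`, `y` and `X_tst` are stand-ins I made up, not the RoBERTa model or the competition data.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.model_selection import KFold\n",
"\n",
"# Toy regression data (hypothetical stand-in for trn / tst)\n",
"rng = np.random.default_rng(42)\n",
"X = rng.normal(size=(100, 5))\n",
"y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)\n",
"X_tst = rng.normal(size=(20, 5))\n",
"\n",
"n_fold = 5\n",
"cv = KFold(n_splits=n_fold, shuffle=True, random_state=42)\n",
"\n",
"p = np.zeros_like(y, dtype=float)   # out-of-fold predictions\n",
"p_tst = np.zeros(X_tst.shape[0])    # fold-averaged test predictions\n",
"for i_trn, i_val in cv.split(X):\n",
"    model = Ridge(alpha=1.0).fit(X[i_trn], y[i_trn])\n",
"    p[i_val] = model.predict(X[i_val])        # each training row is predicted exactly once\n",
"    p_tst += model.predict(X_tst) / n_fold    # running mean over the folds\n",
"```\n",
"\n",
"Because every row of `p` comes from a model that never saw that row, scoring `p` against `y` gives a CV estimate that is not inflated by training-set fit."
]
},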
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "128"
}
},
"source": [
"## Print CV RMSE and Save CV Predictions"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T20:16:15.068460Z",
"start_time": "2021-05-07T20:16:12.966707Z"
},
"nterop": {
"id": "17"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CV RMSE: 0.678173\n"
]
}
],
"source": [
"print(f'CV RMSE: {mean_squared_error(y, p, squared=False):.6f}')\n",
"np.savetxt(val_predict_file, p, fmt='%.6f')"
]
},
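{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"RMSE is the square root of the mean squared error. As far as I know, recent scikit-learn releases deprecate the `squared` argument of `mean_squared_error`, so a version-independent way to get the same number is to take the square root yourself. A minimal sketch, with made-up `y_true` and `y_pred` values for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.metrics import mean_squared_error\n",
"\n",
"# Hypothetical targets and predictions\n",
"y_true = np.array([0.0, -1.0, -2.0, 0.5])\n",
"y_pred = np.array([0.1, -0.8, -2.5, 0.0])\n",
"\n",
"rmse_np = np.sqrt(np.mean((y_true - y_pred) ** 2))       # plain NumPy\n",
"rmse_sk = np.sqrt(mean_squared_error(y_true, y_pred))    # sqrt of sklearn's MSE\n",
"print(f'RMSE: {rmse_np:.6f}')\n",
"```\n",
"\n",
"Both expressions compute the same quantity, so either can replace the `squared=False` call if it stops working in your environment."
]
},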
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "129"
}
},
"source": [
"# Submission"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2021-05-07T20:16:15.795823Z",
"start_time": "2021-05-07T20:16:15.069808Z"
},
"nterop": {
"id": "53"
}
},
"outputs": [
{
"data": {
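"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>target</th>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>id</th>\n",
"      <th></th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>c0f722661</th>\n",
"      <td>-0.087452</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>f0953f0a5</th>\n",
"      <td>-0.122406</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0df072751</th>\n",
"      <td>-0.165737</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>04caf4e0c</th>\n",
"      <td>-2.293868</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0e63f8bea</th>\n",
"      <td>-1.454202</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],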
"text/plain": [
" target\n",
"id \n",
"c0f722661 -0.087452\n",
"f0953f0a5 -0.122406\n",
"0df072751 -0.165737\n",
"04caf4e0c -2.293868\n",
"0e63f8bea -1.454202"
]
},
"execution_count": 14,
"metadata": {
"nterop": {
"id": "130"
}
},
"output_type": "execute_result"
}
],
"source": [
"sub = pd.read_csv(sample_file, index_col=id_col)\n",
"sub[target_col] = p_tst\n",
"sub.to_csv(submission_file)\n",
"sub.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"nterop": {
"id": "131"
}
},
"source": [
"If you find it helpful, please upvote the notebook. Also check out my other notebooks below:\n",
"\n",
"* [TF/Keras BERT Baseline (Training/Inference)](https://www.kaggle.com/jeongyoonlee/tf-keras-bert-baseline-training-inference): shares the TF/Keras BERT baseline with 5-fold CV\n",
"* [All Zero Submission](https://www.kaggle.com/jeongyoonlee/all-zero-submission): shows the public LB score for an all-zero submission\n",
"* [DAE with 2 Lines of Code with Kaggler](https://www.kaggle.com/jeongyoonlee/dae-with-2-lines-of-code-with-kaggler): shows how to generate Denoising AutoEncoder features using `Kaggler`\n",
"\n",
"Happy Kaggling~!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nterop": {
"id": "132"
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3.7 (TF2.4)",
"language": "python",
"name": "tf2.4"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
},
"nterop": {
"seedId": "133"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}