INFO:gluonnlp:06:13:55 Namespace(accumulate=8, batch_size=4, bert_dataset='book_corpus_wiki_en_uncased', bert_model='bert_24_1024_16', doc_stride=128, epochs=2, gpu=True, log_interval=50, lr=3e-05, max_answer_length=30, max_query_length=64, max_seq_length=384, model_parameters=None, n_best_size=20, null_score_diff_threshold=-2.0, only_predict=False, optimizer='adam', output_dir='./output_dir', pretrained_bert_parameters=None, test_batch_size=24, uncased=True, version_2=True, warmup_ratio=0.1)
INFO:gluonnlp:06:13:55 Using gradient accumulation. Effective batch size = 32
INFO:gluonnlp:06:14:02 Loader Train data...
INFO:gluonnlp:06:14:03 Number of records in Train data:130319
INFO:gluonnlp:06:16:51 The number of examples after preprocessing:131944
INFO:gluonnlp:06:16:51 Start Training
INFO:gluonnlp:06:19:17 Epoch: 0, Batch: 399/32986, Loss=0.7509, lr=0.0000018 Time cost=145.4 Thoughput=11.00 samples/s
INFO:gluonnlp:06:21:42 Epoch: 0, Batch: 799/32986, Loss=0.6331, lr=0.0000036 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:06:24:07 Epoch: 0, Batch: 1199/32986, Loss=0.4781, lr=0.0000055 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:06:26:32 Epoch: 0, Batch: 1599/32986, Loss=0.4334, lr=0.0000073 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:06:28:58 Epoch: 0, Batch: 1999/32986, Loss=0.3704, lr=0.0000091 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:06:31:23 Epoch: 0, Batch: 2399/32986, Loss=0.2948, lr=0.0000109 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:06:33:48 Epoch: 0, Batch: 2799/32986, Loss=0.2653, lr=0.0000127 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:06:36:13 Epoch: 0, Batch: 3199/32986, Loss=0.2367, lr=0.0000146 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:06:38:38 Epoch: 0, Batch: 3599/32986, Loss=0.2220, lr=0.0000164 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:06:41:03 Epoch: 0, Batch: 3999/32986, Loss=0.2160, lr=0.0000182 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:06:43:29 Epoch: 0, Batch: 4399/32986, Loss=0.2038, lr=0.0000200 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:06:45:54 Epoch: 0, Batch: 4799/32986, Loss=0.1956, lr=0.0000218 Time cost=145.4 Thoughput=11.00 samples/s
INFO:gluonnlp:06:48:19 Epoch: 0, Batch: 5199/32986, Loss=0.1951, lr=0.0000237 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:06:50:44 Epoch: 0, Batch: 5599/32986, Loss=0.1879, lr=0.0000255 Time cost=145.4 Thoughput=11.01 samples/s
INFO:gluonnlp:06:53:10 Epoch: 0, Batch: 5999/32986, Loss=0.1900, lr=0.0000273 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:06:55:35 Epoch: 0, Batch: 6399/32986, Loss=0.1806, lr=0.0000291 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:06:58:00 Epoch: 0, Batch: 6799/32986, Loss=0.1753, lr=0.0000299 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:07:00:24 Epoch: 0, Batch: 7199/32986, Loss=0.1723, lr=0.0000297 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:07:02:49 Epoch: 0, Batch: 7599/32986, Loss=0.1650, lr=0.0000295 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:07:05:15 Epoch: 0, Batch: 7999/32986, Loss=0.1706, lr=0.0000293 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:07:07:40 Epoch: 0, Batch: 8399/32986, Loss=0.1591, lr=0.0000291 Time cost=145.1 Thoughput=11.02 samples/s
INFO:gluonnlp:07:10:05 Epoch: 0, Batch: 8799/32986, Loss=0.1567, lr=0.0000289 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:07:12:30 Epoch: 0, Batch: 9199/32986, Loss=0.1619, lr=0.0000287 Time cost=144.7 Thoughput=11.05 samples/s
INFO:gluonnlp:07:14:55 Epoch: 0, Batch: 9599/32986, Loss=0.1467, lr=0.0000285 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:07:17:20 Epoch: 0, Batch: 9999/32986, Loss=0.1449, lr=0.0000283 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:07:19:45 Epoch: 0, Batch: 10399/32986, Loss=0.1478, lr=0.0000281 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:07:22:11 Epoch: 0, Batch: 10799/32986, Loss=0.1473, lr=0.0000279 Time cost=146.0 Thoughput=10.96 samples/s
INFO:gluonnlp:07:24:36 Epoch: 0, Batch: 11199/32986, Loss=0.1434, lr=0.0000277 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:07:27:01 Epoch: 0, Batch: 11599/32986, Loss=0.1492, lr=0.0000275 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:07:29:26 Epoch: 0, Batch: 11999/32986, Loss=0.1420, lr=0.0000273 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:07:31:51 Epoch: 0, Batch: 12399/32986, Loss=0.1414, lr=0.0000271 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:07:34:16 Epoch: 0, Batch: 12799/32986, Loss=0.1359, lr=0.0000269 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:07:36:41 Epoch: 0, Batch: 13199/32986, Loss=0.1378, lr=0.0000267 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:07:39:06 Epoch: 0, Batch: 13599/32986, Loss=0.1418, lr=0.0000265 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:07:41:31 Epoch: 0, Batch: 13999/32986, Loss=0.1313, lr=0.0000263 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:07:43:56 Epoch: 0, Batch: 14399/32986, Loss=0.1340, lr=0.0000261 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:07:46:21 Epoch: 0, Batch: 14799/32986, Loss=0.1472, lr=0.0000259 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:07:48:47 Epoch: 0, Batch: 15199/32986, Loss=0.1346, lr=0.0000257 Time cost=145.4 Thoughput=11.01 samples/s
INFO:gluonnlp:07:51:12 Epoch: 0, Batch: 15599/32986, Loss=0.1314, lr=0.0000254 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:07:53:37 Epoch: 0, Batch: 15999/32986, Loss=0.1270, lr=0.0000252 Time cost=144.9 Thoughput=11.05 samples/s
INFO:gluonnlp:07:56:02 Epoch: 0, Batch: 16399/32986, Loss=0.1299, lr=0.0000250 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:07:58:27 Epoch: 0, Batch: 16799/32986, Loss=0.1334, lr=0.0000248 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:08:00:52 Epoch: 0, Batch: 17199/32986, Loss=0.1337, lr=0.0000246 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:08:03:17 Epoch: 0, Batch: 17599/32986, Loss=0.1250, lr=0.0000244 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:08:05:42 Epoch: 0, Batch: 17999/32986, Loss=0.1175, lr=0.0000242 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:08:08:07 Epoch: 0, Batch: 18399/32986, Loss=0.1298, lr=0.0000240 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:08:10:32 Epoch: 0, Batch: 18799/32986, Loss=0.1276, lr=0.0000238 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:08:12:58 Epoch: 0, Batch: 19199/32986, Loss=0.1236, lr=0.0000236 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:08:15:23 Epoch: 0, Batch: 19599/32986, Loss=0.1188, lr=0.0000234 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:08:17:48 Epoch: 0, Batch: 19999/32986, Loss=0.1206, lr=0.0000232 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:08:20:13 Epoch: 0, Batch: 20399/32986, Loss=0.1284, lr=0.0000230 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:08:22:38 Epoch: 0, Batch: 20799/32986, Loss=0.1202, lr=0.0000228 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:08:25:03 Epoch: 0, Batch: 21199/32986, Loss=0.1190, lr=0.0000226 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:08:27:28 Epoch: 0, Batch: 21599/32986, Loss=0.1270, lr=0.0000224 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:08:29:53 Epoch: 0, Batch: 21999/32986, Loss=0.1121, lr=0.0000222 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:08:32:19 Epoch: 0, Batch: 22399/32986, Loss=0.1099, lr=0.0000220 Time cost=145.4 Thoughput=11.00 samples/s
INFO:gluonnlp:08:34:44 Epoch: 0, Batch: 22799/32986, Loss=0.1193, lr=0.0000218 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:08:37:09 Epoch: 0, Batch: 23199/32986, Loss=0.1131, lr=0.0000216 Time cost=145.4 Thoughput=11.00 samples/s
INFO:gluonnlp:08:39:35 Epoch: 0, Batch: 23599/32986, Loss=0.1183, lr=0.0000214 Time cost=145.4 Thoughput=11.01 samples/s
INFO:gluonnlp:08:42:00 Epoch: 0, Batch: 23999/32986, Loss=0.1135, lr=0.0000212 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:08:44:25 Epoch: 0, Batch: 24399/32986, Loss=0.1141, lr=0.0000210 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:08:46:50 Epoch: 0, Batch: 24799/32986, Loss=0.1109, lr=0.0000208 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:08:49:15 Epoch: 0, Batch: 25199/32986, Loss=0.1159, lr=0.0000206 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:08:51:40 Epoch: 0, Batch: 25599/32986, Loss=0.1191, lr=0.0000204 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:08:54:04 Epoch: 0, Batch: 25999/32986, Loss=0.1236, lr=0.0000202 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:08:56:29 Epoch: 0, Batch: 26399/32986, Loss=0.1044, lr=0.0000200 Time cost=144.5 Thoughput=11.08 samples/s
INFO:gluonnlp:08:58:53 Epoch: 0, Batch: 26799/32986, Loss=0.1148, lr=0.0000198 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:09:01:18 Epoch: 0, Batch: 27199/32986, Loss=0.1011, lr=0.0000196 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:09:03:42 Epoch: 0, Batch: 27599/32986, Loss=0.1167, lr=0.0000194 Time cost=144.2 Thoughput=11.09 samples/s
INFO:gluonnlp:09:06:07 Epoch: 0, Batch: 27999/32986, Loss=0.1172, lr=0.0000192 Time cost=144.2 Thoughput=11.09 samples/s
INFO:gluonnlp:09:08:31 Epoch: 0, Batch: 28399/32986, Loss=0.1127, lr=0.0000190 Time cost=144.2 Thoughput=11.09 samples/s
INFO:gluonnlp:09:10:55 Epoch: 0, Batch: 28799/32986, Loss=0.1052, lr=0.0000188 Time cost=144.1 Thoughput=11.10 samples/s
INFO:gluonnlp:09:13:19 Epoch: 0, Batch: 29199/32986, Loss=0.1061, lr=0.0000186 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:09:15:45 Epoch: 0, Batch: 29599/32986, Loss=0.1118, lr=0.0000184 Time cost=145.9 Thoughput=10.96 samples/s
INFO:gluonnlp:09:18:10 Epoch: 0, Batch: 29999/32986, Loss=0.1083, lr=0.0000182 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:09:20:36 Epoch: 0, Batch: 30399/32986, Loss=0.1105, lr=0.0000180 Time cost=145.5 Thoughput=11.00 samples/s
INFO:gluonnlp:09:23:01 Epoch: 0, Batch: 30799/32986, Loss=0.1125, lr=0.0000178 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:09:25:26 Epoch: 0, Batch: 31199/32986, Loss=0.1128, lr=0.0000176 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:09:27:51 Epoch: 0, Batch: 31599/32986, Loss=0.1044, lr=0.0000174 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:09:30:16 Epoch: 0, Batch: 31999/32986, Loss=0.1115, lr=0.0000172 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:09:32:41 Epoch: 0, Batch: 32399/32986, Loss=0.1051, lr=0.0000170 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:09:35:05 Epoch: 0, Batch: 32799/32986, Loss=0.1044, lr=0.0000168 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:09:36:13 Time cost=11961.13 s, Thoughput=11.03 samples/s
INFO:gluonnlp:09:38:38 Epoch: 1, Batch: 399/32986, Loss=0.0817, lr=0.0000165 Time cost=145.3 Thoughput=16.13 samples/s
INFO:gluonnlp:09:41:03 Epoch: 1, Batch: 799/32986, Loss=0.0797, lr=0.0000163 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:09:43:28 Epoch: 1, Batch: 1199/32986, Loss=0.0741, lr=0.0000161 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:09:45:53 Epoch: 1, Batch: 1599/32986, Loss=0.0820, lr=0.0000159 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:09:48:17 Epoch: 1, Batch: 1999/32986, Loss=0.0749, lr=0.0000157 Time cost=144.4 Thoughput=11.08 samples/s
INFO:gluonnlp:09:50:42 Epoch: 1, Batch: 2399/32986, Loss=0.0789, lr=0.0000154 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:09:53:06 Epoch: 1, Batch: 2799/32986, Loss=0.0760, lr=0.0000152 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:09:55:31 Epoch: 1, Batch: 3199/32986, Loss=0.0815, lr=0.0000150 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:09:57:56 Epoch: 1, Batch: 3599/32986, Loss=0.0795, lr=0.0000148 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:10:00:20 Epoch: 1, Batch: 3999/32986, Loss=0.0794, lr=0.0000146 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:10:02:45 Epoch: 1, Batch: 4399/32986, Loss=0.0777, lr=0.0000144 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:10:05:10 Epoch: 1, Batch: 4799/32986, Loss=0.0802, lr=0.0000142 Time cost=144.7 Thoughput=11.05 samples/s
INFO:gluonnlp:10:07:34 Epoch: 1, Batch: 5199/32986, Loss=0.0715, lr=0.0000140 Time cost=144.4 Thoughput=11.08 samples/s
INFO:gluonnlp:10:09:59 Epoch: 1, Batch: 5599/32986, Loss=0.0779, lr=0.0000138 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:10:12:24 Epoch: 1, Batch: 5999/32986, Loss=0.0763, lr=0.0000136 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:10:14:48 Epoch: 1, Batch: 6399/32986, Loss=0.0819, lr=0.0000134 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:10:17:13 Epoch: 1, Batch: 6799/32986, Loss=0.0787, lr=0.0000132 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:10:19:37 Epoch: 1, Batch: 7199/32986, Loss=0.0778, lr=0.0000130 Time cost=144.2 Thoughput=11.10 samples/s
INFO:gluonnlp:10:22:02 Epoch: 1, Batch: 7599/32986, Loss=0.0752, lr=0.0000128 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:10:24:27 Epoch: 1, Batch: 7999/32986, Loss=0.0730, lr=0.0000126 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:10:26:52 Epoch: 1, Batch: 8399/32986, Loss=0.0822, lr=0.0000124 Time cost=145.6 Thoughput=10.99 samples/s
INFO:gluonnlp:10:29:17 Epoch: 1, Batch: 8799/32986, Loss=0.0768, lr=0.0000122 Time cost=144.2 Thoughput=11.10 samples/s
INFO:gluonnlp:10:31:41 Epoch: 1, Batch: 9199/32986, Loss=0.0785, lr=0.0000120 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:10:34:05 Epoch: 1, Batch: 9599/32986, Loss=0.0769, lr=0.0000118 Time cost=144.4 Thoughput=11.08 samples/s
INFO:gluonnlp:10:36:30 Epoch: 1, Batch: 9999/32986, Loss=0.0762, lr=0.0000116 Time cost=144.3 Thoughput=11.09 samples/s
INFO:gluonnlp:10:38:54 Epoch: 1, Batch: 10399/32986, Loss=0.0785, lr=0.0000114 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:10:41:19 Epoch: 1, Batch: 10799/32986, Loss=0.0742, lr=0.0000112 Time cost=144.3 Thoughput=11.09 samples/s
INFO:gluonnlp:10:43:43 Epoch: 1, Batch: 11199/32986, Loss=0.0783, lr=0.0000110 Time cost=144.3 Thoughput=11.09 samples/s
INFO:gluonnlp:10:46:07 Epoch: 1, Batch: 11599/32986, Loss=0.0751, lr=0.0000108 Time cost=144.3 Thoughput=11.09 samples/s
INFO:gluonnlp:10:48:32 Epoch: 1, Batch: 11999/32986, Loss=0.0765, lr=0.0000106 Time cost=144.4 Thoughput=11.08 samples/s
INFO:gluonnlp:10:50:56 Epoch: 1, Batch: 12399/32986, Loss=0.0817, lr=0.0000104 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:10:53:21 Epoch: 1, Batch: 12799/32986, Loss=0.0798, lr=0.0000102 Time cost=144.3 Thoughput=11.09 samples/s
INFO:gluonnlp:10:55:45 Epoch: 1, Batch: 13199/32986, Loss=0.0786, lr=0.0000100 Time cost=144.3 Thoughput=11.09 samples/s
INFO:gluonnlp:10:58:10 Epoch: 1, Batch: 13599/32986, Loss=0.0747, lr=0.0000098 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:11:00:35 Epoch: 1, Batch: 13999/32986, Loss=0.0775, lr=0.0000096 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:11:02:59 Epoch: 1, Batch: 14399/32986, Loss=0.0684, lr=0.0000094 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:11:05:24 Epoch: 1, Batch: 14799/32986, Loss=0.0767, lr=0.0000092 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:11:07:48 Epoch: 1, Batch: 15199/32986, Loss=0.0729, lr=0.0000090 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:11:10:13 Epoch: 1, Batch: 15599/32986, Loss=0.0757, lr=0.0000088 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:11:12:38 Epoch: 1, Batch: 15999/32986, Loss=0.0730, lr=0.0000086 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:11:15:02 Epoch: 1, Batch: 16399/32986, Loss=0.0772, lr=0.0000084 Time cost=144.1 Thoughput=11.10 samples/s
INFO:gluonnlp:11:17:26 Epoch: 1, Batch: 16799/32986, Loss=0.0717, lr=0.0000082 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:11:19:51 Epoch: 1, Batch: 17199/32986, Loss=0.0681, lr=0.0000080 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:11:22:16 Epoch: 1, Batch: 17599/32986, Loss=0.0704, lr=0.0000078 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:11:24:40 Epoch: 1, Batch: 17999/32986, Loss=0.0739, lr=0.0000076 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:11:27:05 Epoch: 1, Batch: 18399/32986, Loss=0.0745, lr=0.0000074 Time cost=144.5 Thoughput=11.08 samples/s
INFO:gluonnlp:11:29:30 Epoch: 1, Batch: 18799/32986, Loss=0.0753, lr=0.0000072 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:11:31:55 Epoch: 1, Batch: 19199/32986, Loss=0.0723, lr=0.0000070 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:11:34:19 Epoch: 1, Batch: 19599/32986, Loss=0.0729, lr=0.0000068 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:11:36:44 Epoch: 1, Batch: 19999/32986, Loss=0.0732, lr=0.0000066 Time cost=144.9 Thoughput=11.05 samples/s
INFO:gluonnlp:11:39:08 Epoch: 1, Batch: 20399/32986, Loss=0.0736, lr=0.0000064 Time cost=144.4 Thoughput=11.08 samples/s
INFO:gluonnlp:11:41:33 Epoch: 1, Batch: 20799/32986, Loss=0.0715, lr=0.0000062 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:11:43:58 Epoch: 1, Batch: 21199/32986, Loss=0.0717, lr=0.0000059 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:11:46:23 Epoch: 1, Batch: 21599/32986, Loss=0.0779, lr=0.0000057 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:11:48:48 Epoch: 1, Batch: 21999/32986, Loss=0.0793, lr=0.0000055 Time cost=144.6 Thoughput=11.06 samples/s
INFO:gluonnlp:11:51:13 Epoch: 1, Batch: 22399/32986, Loss=0.0728, lr=0.0000053 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:11:53:37 Epoch: 1, Batch: 22799/32986, Loss=0.0714, lr=0.0000051 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:11:56:02 Epoch: 1, Batch: 23199/32986, Loss=0.0761, lr=0.0000049 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:11:58:27 Epoch: 1, Batch: 23599/32986, Loss=0.0696, lr=0.0000047 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:12:00:52 Epoch: 1, Batch: 23999/32986, Loss=0.0728, lr=0.0000045 Time cost=144.7 Thoughput=11.06 samples/s
INFO:gluonnlp:12:03:16 Epoch: 1, Batch: 24399/32986, Loss=0.0765, lr=0.0000043 Time cost=144.5 Thoughput=11.07 samples/s
INFO:gluonnlp:12:05:41 Epoch: 1, Batch: 24799/32986, Loss=0.0705, lr=0.0000041 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:12:08:06 Epoch: 1, Batch: 25199/32986, Loss=0.0776, lr=0.0000039 Time cost=144.6 Thoughput=11.07 samples/s
INFO:gluonnlp:12:10:31 Epoch: 1, Batch: 25599/32986, Loss=0.0743, lr=0.0000037 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:12:12:56 Epoch: 1, Batch: 25999/32986, Loss=0.0700, lr=0.0000035 Time cost=145.4 Thoughput=11.00 samples/s
INFO:gluonnlp:12:15:22 Epoch: 1, Batch: 26399/32986, Loss=0.0685, lr=0.0000033 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:12:17:47 Epoch: 1, Batch: 26799/32986, Loss=0.0769, lr=0.0000031 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:12:20:13 Epoch: 1, Batch: 27199/32986, Loss=0.0683, lr=0.0000029 Time cost=146.1 Thoughput=10.95 samples/s
INFO:gluonnlp:12:22:38 Epoch: 1, Batch: 27599/32986, Loss=0.0689, lr=0.0000027 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:12:25:03 Epoch: 1, Batch: 27999/32986, Loss=0.0726, lr=0.0000025 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:12:27:29 Epoch: 1, Batch: 28399/32986, Loss=0.0797, lr=0.0000023 Time cost=145.4 Thoughput=11.00 samples/s
INFO:gluonnlp:12:29:54 Epoch: 1, Batch: 28799/32986, Loss=0.0711, lr=0.0000021 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:12:32:19 Epoch: 1, Batch: 29199/32986, Loss=0.0723, lr=0.0000019 Time cost=144.8 Thoughput=11.05 samples/s
INFO:gluonnlp:12:34:44 Epoch: 1, Batch: 29599/32986, Loss=0.0663, lr=0.0000017 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:12:37:09 Epoch: 1, Batch: 29999/32986, Loss=0.0731, lr=0.0000015 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:12:39:34 Epoch: 1, Batch: 30399/32986, Loss=0.0641, lr=0.0000013 Time cost=145.2 Thoughput=11.02 samples/s
INFO:gluonnlp:12:41:59 Epoch: 1, Batch: 30799/32986, Loss=0.0757, lr=0.0000011 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:12:44:24 Epoch: 1, Batch: 31199/32986, Loss=0.0753, lr=0.0000009 Time cost=145.0 Thoughput=11.04 samples/s
INFO:gluonnlp:12:46:49 Epoch: 1, Batch: 31599/32986, Loss=0.0726, lr=0.0000007 Time cost=145.0 Thoughput=11.03 samples/s
INFO:gluonnlp:12:49:14 Epoch: 1, Batch: 31999/32986, Loss=0.0766, lr=0.0000005 Time cost=145.3 Thoughput=11.01 samples/s
INFO:gluonnlp:12:51:39 Epoch: 1, Batch: 32399/32986, Loss=0.0776, lr=0.0000003 Time cost=145.1 Thoughput=11.03 samples/s
INFO:gluonnlp:12:54:04 Epoch: 1, Batch: 32799/32986, Loss=0.0758, lr=0.0000001 Time cost=144.9 Thoughput=11.04 samples/s
INFO:gluonnlp:12:55:11 Time cost=23899.98 s, Thoughput=11.04 samples/s
INFO:gluonnlp:12:55:23 Loader dev data...
INFO:gluonnlp:12:55:23 Number of records in Train data:11873
INFO:gluonnlp:12:55:39 The number of examples after preprocessing:12232
INFO:gluonnlp:12:55:39 Start predict
INFO:gluonnlp:12:59:25 Time cost=225.17 s, Thoughput=54.32 samples/s
INFO:gluonnlp:12:59:25 Get prediction results...
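
Note on the lr column: the logged values look like a linear warmup over the first warmup_ratio=0.1 of the optimizer updates, followed by a linear decay to zero. The Python sketch below is a reconstruction from the Namespace values and record counts printed in this log; it is not taken from the training script. The names scheduled_lr and UPDATES_PER_EPOCH are hypothetical, and counting ceil(32986 / 8) updates per finished epoch is an assumption chosen only because it matches the logged epoch 1 values.

```python
import math

# Values copied from the Namespace(...) line and the training records above.
BASE_LR = 3e-5
BATCH_SIZE = 4
ACCUMULATE = 8
EPOCHS = 2
WARMUP_RATIO = 0.1
NUM_EXAMPLES = 131944        # "number of examples after preprocessing"
BATCHES_PER_EPOCH = 32986    # denominator of the "Batch: i/32986" field

# 4 * 8 = 32, matching "Effective batch size = 32" in the log.
EFFECTIVE_BATCH = BATCH_SIZE * ACCUMULATE
TOTAL_UPDATES = int(NUM_EXAMPLES / EFFECTIVE_BATCH * EPOCHS)
WARMUP_UPDATES = int(TOTAL_UPDATES * WARMUP_RATIO)

# Assumption inferred from the numbers, not stated in the log: a finished
# epoch counts as ceil(batches / accumulate) optimizer updates.
UPDATES_PER_EPOCH = math.ceil(BATCHES_PER_EPOCH / ACCUMULATE)


def scheduled_lr(epoch, batch):
    """Approximate lr for the record 'Epoch: epoch, Batch: batch/32986'."""
    update = epoch * UPDATES_PER_EPOCH + (batch + 1) // ACCUMULATE
    if update < WARMUP_UPDATES:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * update / WARMUP_UPDATES
    # Linear decay from BASE_LR to 0 over the remaining updates.
    decay = (update - WARMUP_UPDATES) / (TOTAL_UPDATES - WARMUP_UPDATES)
    return BASE_LR * (1.0 - decay)


if __name__ == "__main__":
    for epoch, batch in [(0, 399), (0, 6799), (1, 399), (1, 32799)]:
        print(epoch, batch, f"{scheduled_lr(epoch, batch):.7f}")
```

Formatting the result with seven decimals, as the log does, allows a record-by-record comparison against the lr field.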