Training: 2022-01-07 17:13:51,071-rank_id: 0 Training: 2022-01-07 17:14:17,828-: loss cosface Training: 2022-01-07 17:14:17,829-: network r100 Training: 2022-01-07 17:14:17,829-: resume False Training: 2022-01-07 18:37:51,584-: output work_dirs/webface42m_r100_lr01_pfc02_16gpus Training: 2022-01-07 18:37:51,584-: embedding_size 512 Training: 2022-01-07 18:37:51,584-: sample_rate 0.2 Training: 2022-01-07 18:37:51,584-: fp16 True Training: 2022-01-07 18:37:51,584-: momentum 0.9 Training: 2022-01-07 18:37:51,584-: weight_decay 0.0005 Training: 2022-01-07 18:37:51,584-: batch_size 256 Training: 2022-01-07 18:37:51,584-: lr 0.3 Training: 2022-01-07 18:37:51,584-: dali True Training: 2022-01-07 18:37:51,584-: verbose 2000 Training: 2022-01-07 18:37:51,584-: frequent 10 Training: 2022-01-07 18:37:51,584-: if_hard_scale False Training: 2022-01-07 18:37:51,585-: score None Training: 2022-01-07 18:37:51,585-: rec /train_tmp/WebFace42M Training: 2022-01-07 18:37:51,585-: num_classes 2059906 Training: 2022-01-07 18:37:51,585-: num_image 42474557 Training: 2022-01-07 18:37:51,585-: num_epoch 20 Training: 2022-01-07 18:37:51,585-: warmup_epoch 1 Training: 2022-01-07 18:37:51,585-: val_targets ['lfw', 'cfp_fp', 'agedb_30'] Training: 2022-01-07 18:37:51,585-: warmup_step 10369 Training: 2022-01-07 18:37:51,585-: total_step 207380 Training: 2022-01-07 18:38:05,243-Speed 5498.09 samples/sec Loss 39.6443 LearningRate 0.0159 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:38:12,670-Speed 5516.75 samples/sec Loss 39.6036 LearningRate 0.0162 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:38:20,106-Speed 5509.71 samples/sec Loss 39.5587 LearningRate 0.0165 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:38:27,549-Speed 5504.42 samples/sec Loss 39.5510 LearningRate 0.0168 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 
2022-01-07 18:38:35,042-Speed 5468.04 samples/sec Loss 39.5291 LearningRate 0.0171 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:38:42,644-Speed 5388.85 samples/sec Loss 39.4717 LearningRate 0.0174 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:38:50,096-Speed 5497.64 samples/sec Loss 39.4198 LearningRate 0.0176 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:38:57,510-Speed 5526.22 samples/sec Loss 39.4227 LearningRate 0.0179 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:39:04,968-Speed 5493.36 samples/sec Loss 39.3887 LearningRate 0.0182 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:39:12,403-Speed 5510.44 samples/sec Loss 39.3821 LearningRate 0.0185 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:39:19,944-Speed 5432.65 samples/sec Loss 39.3465 LearningRate 0.0188 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:39:27,369-Speed 5517.94 samples/sec Loss 39.3211 LearningRate 0.0191 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:39:34,955-Speed 5400.76 samples/sec Loss 39.3001 LearningRate 0.0194 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:39:42,357-Speed 5535.23 samples/sec Loss 39.2760 LearningRate 0.0197 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:39:49,724-Speed 5560.82 samples/sec Loss 39.2780 LearningRate 0.0200 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:39:51,469-Speed 5584.63 samples/sec Loss 42.4764 LearningRate 0.0023 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-01-07 18:39:57,092-Speed 5560.67 samples/sec Loss 39.2595 LearningRate 
0.0203 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:40:04,755-Speed 5350.07 samples/sec Loss 39.2531 LearningRate 0.0205 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:40:12,366-Speed 5383.32 samples/sec Loss 39.2174 LearningRate 0.0208 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:40:19,696-Speed 5589.52 samples/sec Loss 39.2185 LearningRate 0.0211 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:40:20,771-Speed 5479.16 samples/sec Loss 42.4339 LearningRate 0.0035 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:40:27,265-Speed 5411.99 samples/sec Loss 39.1960 LearningRate 0.0214 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:40:28,143-Speed 5557.25 samples/sec Loss 42.4347 LearningRate 0.0038 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:40:35,382-Speed 5661.93 samples/sec Loss 42.4128 LearningRate 0.0041 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:40:42,621-Speed 5658.90 samples/sec Loss 42.3886 LearningRate 0.0043 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:40:50,093-Speed 5483.51 samples/sec Loss 42.3571 LearningRate 0.0046 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:40:56,844-Speed 5497.71 samples/sec Loss 39.2052 LearningRate 0.0226 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:04,119-Speed 5633.38 samples/sec Loss 39.1676 LearningRate 0.0229 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:11,600-Speed 5476.66 samples/sec Loss 39.1966 LearningRate 0.0231 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 32768 Required: 43 
hours Training: 2022-01-07 18:41:19,205-Speed 5387.43 samples/sec Loss 39.2049 LearningRate 0.0234 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:26,630-Speed 5517.57 samples/sec Loss 39.1738 LearningRate 0.0237 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:34,039-Speed 5530.22 samples/sec Loss 39.1547 LearningRate 0.0240 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:41,538-Speed 5463.09 samples/sec Loss 39.2001 LearningRate 0.0243 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:49,165-Speed 5372.02 samples/sec Loss 39.1941 LearningRate 0.0246 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:41:56,819-Speed 5351.96 samples/sec Loss 39.1715 LearningRate 0.0249 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:42:04,290-Speed 5484.49 samples/sec Loss 39.1698 LearningRate 0.0252 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:42:11,790-Speed 5462.60 samples/sec Loss 39.1889 LearningRate 0.0255 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:42:19,050-Speed 5643.81 samples/sec Loss 39.1620 LearningRate 0.0257 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:42:26,244-Speed 5695.20 samples/sec Loss 39.1756 LearningRate 0.0260 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:42:33,393-Speed 5731.05 samples/sec Loss 39.1696 LearningRate 0.0263 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:42:40,833-Speed 5506.69 samples/sec Loss 39.1989 LearningRate 0.0266 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:42:48,272-Speed 5507.02 samples/sec 
Loss 39.1680 LearningRate 0.0269 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:42:55,811-Speed 5434.35 samples/sec Loss 39.1873 LearningRate 0.0272 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:43:03,286-Speed 5480.87 samples/sec Loss 39.1872 LearningRate 0.0275 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:43:10,743-Speed 5493.97 samples/sec Loss 39.1804 LearningRate 0.0278 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:43:18,192-Speed 5499.56 samples/sec Loss 39.1828 LearningRate 0.0281 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:43:24,241-Speed 5636.82 samples/sec Loss 40.6593 LearningRate 0.0107 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:43:25,627-Speed 5509.75 samples/sec Loss 39.2133 LearningRate 0.0284 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:43:33,069-Speed 5506.43 samples/sec Loss 39.2146 LearningRate 0.0286 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:43:40,603-Speed 5437.84 samples/sec Loss 39.2198 LearningRate 0.0289 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:43:48,022-Speed 5521.40 samples/sec Loss 39.2090 LearningRate 0.0292 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:43:53,343-Speed 5667.45 samples/sec Loss 40.3711 LearningRate 0.0119 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-01-07 18:43:55,480-Speed 5492.99 samples/sec Loss 39.2389 LearningRate 0.0295 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:02,954-Speed 5483.64 samples/sec Loss 39.2298 LearningRate 0.0298 Epoch: 0 Global Step: 1030 Fp16 
Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:10,375-Speed 5519.81 samples/sec Loss 39.2127 LearningRate 0.0301 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:17,820-Speed 5502.64 samples/sec Loss 39.2076 LearningRate 0.0304 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:22,696-Speed 5618.41 samples/sec Loss 40.0572 LearningRate 0.0130 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-01-07 18:44:25,302-Speed 5475.10 samples/sec Loss 39.2280 LearningRate 0.0307 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:32,915-Speed 5382.85 samples/sec Loss 39.2349 LearningRate 0.0310 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:40,527-Speed 5381.43 samples/sec Loss 39.2379 LearningRate 0.0312 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:44:47,933-Speed 5532.06 samples/sec Loss 39.2408 LearningRate 0.0315 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:44:52,197-Speed 5600.57 samples/sec Loss 39.8258 LearningRate 0.0142 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-01-07 18:44:55,373-Speed 5505.53 samples/sec Loss 39.2582 LearningRate 0.0318 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:02,819-Speed 5504.47 samples/sec Loss 39.2440 LearningRate 0.0321 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:10,302-Speed 5474.30 samples/sec Loss 39.2359 LearningRate 0.0324 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:17,720-Speed 5523.38 samples/sec Loss 39.2752 LearningRate 0.0327 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 
18:45:25,146-Speed 5516.10 samples/sec Loss 39.2780 LearningRate 0.0330 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:32,557-Speed 5527.83 samples/sec Loss 39.2744 LearningRate 0.0333 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:39,978-Speed 5520.37 samples/sec Loss 39.2987 LearningRate 0.0336 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:47,399-Speed 5520.66 samples/sec Loss 39.2949 LearningRate 0.0339 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:45:54,826-Speed 5515.07 samples/sec Loss 39.2948 LearningRate 0.0341 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:46:02,301-Speed 5481.11 samples/sec Loss 39.2977 LearningRate 0.0344 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:09,791-Speed 5469.36 samples/sec Loss 39.3209 LearningRate 0.0347 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:17,235-Speed 5502.73 samples/sec Loss 39.3473 LearningRate 0.0350 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:24,697-Speed 5489.61 samples/sec Loss 39.3534 LearningRate 0.0353 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:32,130-Speed 5512.19 samples/sec Loss 39.3334 LearningRate 0.0356 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:39,586-Speed 5494.28 samples/sec Loss 39.3281 LearningRate 0.0359 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:47,037-Speed 5497.47 samples/sec Loss 39.3505 LearningRate 0.0362 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:46:54,502-Speed 5487.73 samples/sec Loss 39.3721 
LearningRate 0.0365 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:47:01,957-Speed 5495.09 samples/sec Loss 39.3805 LearningRate 0.0367 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:47:06,580-Speed 5312.02 samples/sec Loss 39.2506 LearningRate 0.0194 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:47:14,394-Speed 5246.81 samples/sec Loss 39.2250 LearningRate 0.0197 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:47:22,178-Speed 5263.14 samples/sec Loss 39.2187 LearningRate 0.0200 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:47:29,897-Speed 5307.10 samples/sec Loss 39.2036 LearningRate 0.0203 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:47:37,637-Speed 5292.97 samples/sec Loss 39.1947 LearningRate 0.0205 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:47:45,365-Speed 5304.02 samples/sec Loss 39.1989 LearningRate 0.0208 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:47:53,036-Speed 5340.50 samples/sec Loss 39.1970 LearningRate 0.0211 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:48:00,835-Speed 5253.25 samples/sec Loss 39.2106 LearningRate 0.0214 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-01-07 18:48:08,541-Speed 5316.07 samples/sec Loss 39.1846 LearningRate 0.0217 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:48:16,117-Speed 5407.92 samples/sec Loss 39.1867 LearningRate 0.0220 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:48:23,462-Speed 5577.85 samples/sec Loss 39.1850 LearningRate 0.0223 Epoch: 0 Global Step: 770 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-01-07 18:48:31,006-Speed 5430.42 samples/sec Loss 39.2002 LearningRate 0.0226 Epoch:2 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:48:39,314-Speed 5483.07 samples/sec Loss 39.4691 LearningRate 0.0405 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:48:46,840-Speed 5443.01 samples/sec Loss 39.4508 LearningRate 0.0408 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:48:54,382-Speed 5431.75 samples/sec Loss 39.4505 LearningRate 0.0411 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:49:01,895-Speed 5453.39 samples/sec Loss 39.4577 LearningRate 0.0414 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:49:09,515-Speed 5375.74 samples/sec Loss 39.4651 LearningRate 0.0417 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:49:16,965-Speed 5498.71 samples/sec Loss 39.4912 LearningRate 0.0420 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:49:24,494-Speed 5441.04 samples/sec Loss 39.4540 LearningRate 0.0422 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:49:31,997-Speed 5459.90 samples/sec Loss 39.4867 LearningRate 0.0425 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:49:39,539-Speed 5431.80 samples/sec Loss 39.4755 LearningRate 0.0428 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:49:45,580-Speed 5559.86 samples/sec Loss 39.2483 LearningRate 0.0255 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:49:53,074-Speed 5469.29 samples/sec Loss 39.2726 LearningRate 0.0257 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 
2022-01-07 18:50:00,468-Speed 5540.56 samples/sec Loss 39.2569 LearningRate 0.0260 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:50:02,074-Speed 5436.83 samples/sec Loss 39.5017 LearningRate 0.0437 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:50:09,600-Speed 5448.27 samples/sec Loss 39.4714 LearningRate 0.0440 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:50:15,404-Speed 5526.08 samples/sec Loss 39.2703 LearningRate 0.0266 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:50:22,679-Speed 5633.52 samples/sec Loss 39.2809 LearningRate 0.0269 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:50:29,947-Speed 5636.98 samples/sec Loss 39.2958 LearningRate 0.0272 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:50:32,363-Speed 5266.49 samples/sec Loss 39.4985 LearningRate 0.0448 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:50:39,954-Speed 5397.81 samples/sec Loss 39.4815 LearningRate 0.0451 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:50:47,411-Speed 5493.68 samples/sec Loss 39.5012 LearningRate 0.0454 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:50:54,880-Speed 5484.84 samples/sec Loss 39.5174 LearningRate 0.0457 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:02,354-Speed 5481.27 samples/sec Loss 39.4911 LearningRate 0.0460 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:09,845-Speed 5468.51 samples/sec Loss 39.4959 LearningRate 0.0463 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:17,292-Speed 5501.29 samples/sec 
Loss 39.4971 LearningRate 0.0466 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:24,736-Speed 5502.73 samples/sec Loss 39.4911 LearningRate 0.0469 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:32,243-Speed 5457.25 samples/sec Loss 39.4939 LearningRate 0.0472 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:39,828-Speed 5401.39 samples/sec Loss 39.4735 LearningRate 0.0474 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:51:44,495-Speed 5509.43 samples/sec Loss 39.3612 LearningRate 0.0301 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:51:51,808-Speed 5607.29 samples/sec Loss 39.3666 LearningRate 0.0304 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:51:59,191-Speed 5548.64 samples/sec Loss 39.4078 LearningRate 0.0307 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:52:02,284-Speed 5506.25 samples/sec Loss 39.4541 LearningRate 0.0483 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:52:09,772-Speed 5473.73 samples/sec Loss 39.4923 LearningRate 0.0486 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:52:13,984-Speed 5592.76 samples/sec Loss 39.4264 LearningRate 0.0312 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:52:21,544-Speed 5428.66 samples/sec Loss 39.4239 LearningRate 0.0315 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:52:28,918-Speed 5557.64 samples/sec Loss 39.4139 LearningRate 0.0318 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:52:32,165-Speed 5473.31 samples/sec Loss 39.4546 LearningRate 0.0495 Epoch: 0 Global 
Step: 1710 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:52:39,659-Speed 5468.06 samples/sec Loss 39.4632 LearningRate 0.0498 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:52:43,782-Speed 5527.94 samples/sec Loss 39.4360 LearningRate 0.0324 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 18:52:51,273-Speed 5472.03 samples/sec Loss 39.4344 LearningRate 0.0327 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:52:58,753-Speed 5476.95 samples/sec Loss 39.4446 LearningRate 0.0330 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:53:02,048-Speed 5502.50 samples/sec Loss 39.4940 LearningRate 0.0506 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:53:09,508-Speed 5494.48 samples/sec Loss 39.4427 LearningRate 0.0509 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 18:53:13,785-Speed 5387.15 samples/sec Loss 39.4590 LearningRate 0.0336 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:53:21,155-Speed 5560.95 samples/sec Loss 39.4655 LearningRate 0.0339 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:53:28,563-Speed 5529.94 samples/sec Loss 39.4995 LearningRate 0.0341 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:53:35,996-Speed 5511.14 samples/sec Loss 39.4858 LearningRate 0.0344 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:53:39,501-Speed 5493.68 samples/sec Loss 39.4168 LearningRate 0.0521 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:53:47,000-Speed 5463.40 samples/sec Loss 39.4033 LearningRate 0.0524 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 43 
hours Training: 2022-01-07 18:53:54,441-Speed 5505.02 samples/sec Loss 39.4054 LearningRate 0.0527 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:01,906-Speed 5488.21 samples/sec Loss 39.3479 LearningRate 0.0529 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:09,343-Speed 5508.18 samples/sec Loss 39.3927 LearningRate 0.0532 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:13,212-Speed 5514.77 samples/sec Loss 39.5079 LearningRate 0.0359 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:20,628-Speed 5526.24 samples/sec Loss 39.5323 LearningRate 0.0362 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:28,080-Speed 5497.58 samples/sec Loss 39.5191 LearningRate 0.0365 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:31,748-Speed 5502.77 samples/sec Loss 39.3351 LearningRate 0.0541 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:39,189-Speed 5507.69 samples/sec Loss 39.3343 LearningRate 0.0544 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:42,724-Speed 5572.05 samples/sec Loss 39.5339 LearningRate 0.0370 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:54:50,043-Speed 5599.64 samples/sec Loss 39.5414 LearningRate 0.0373 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:54:57,481-Speed 5508.26 samples/sec Loss 39.5520 LearningRate 0.0376 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:55:01,710-Speed 5463.32 samples/sec Loss 39.2786 LearningRate 0.0553 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:55:09,173-Speed 
5491.74 samples/sec Loss 39.2420 LearningRate 0.0556 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:55:12,156-Speed 5587.76 samples/sec Loss 39.5262 LearningRate 0.0382 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:55:19,559-Speed 5538.23 samples/sec Loss 39.5398 LearningRate 0.0385 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:55:26,930-Speed 5557.66 samples/sec Loss 39.5331 LearningRate 0.0388 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:55:31,479-Speed 5497.32 samples/sec Loss 39.1896 LearningRate 0.0564 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:55:38,912-Speed 5521.93 samples/sec Loss 39.1933 LearningRate 0.0567 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:55:41,810-Speed 5475.10 samples/sec Loss 39.5673 LearningRate 0.0393 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:55:49,938-Speed 5042.59 samples/sec Loss 39.5428 LearningRate 0.0396 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:55:58,007-Speed 5077.20 samples/sec Loss 39.5418 LearningRate 0.0399 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 18:56:06,002-Speed 5124.11 samples/sec Loss 39.5541 LearningRate 0.0402 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:56:14,077-Speed 5073.76 samples/sec Loss 39.5674 LearningRate 0.0405 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:56:22,072-Speed 5125.88 samples/sec Loss 39.5585 LearningRate 0.0408 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:56:30,071-Speed 5121.24 samples/sec Loss 39.5631 LearningRate 
0.0411 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:56:38,180-Speed 5051.72 samples/sec Loss 39.5640 LearningRate 0.0414 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:56:46,397-Speed 4986.23 samples/sec Loss 39.5618 LearningRate 0.0417 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:56:54,672-Speed 4950.82 samples/sec Loss 39.5726 LearningRate 0.0420 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:02,694-Speed 5106.31 samples/sec Loss 39.5725 LearningRate 0.0422 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:10,719-Speed 5105.00 samples/sec Loss 39.5873 LearningRate 0.0425 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:18,721-Speed 5119.05 samples/sec Loss 39.5725 LearningRate 0.0428 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:26,706-Speed 5130.66 samples/sec Loss 39.5576 LearningRate 0.0431 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:34,675-Speed 5140.35 samples/sec Loss 39.5729 LearningRate 0.0434 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:42,687-Speed 5113.73 samples/sec Loss 39.5956 LearningRate 0.0437 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:50,721-Speed 5099.38 samples/sec Loss 39.5641 LearningRate 0.0440 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:57:58,728-Speed 5116.38 samples/sec Loss 39.5526 LearningRate 0.0443 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:58:06,886-Speed 5021.98 samples/sec Loss 39.5827 LearningRate 0.0446 Epoch: 0 Global Step: 1540 Fp16 Grad 
Scale: 131072 Required: 43 hours Training: 2022-01-07 18:58:14,910-Speed 5106.56 samples/sec Loss 39.5695 LearningRate 0.0448 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:58:23,145-Speed 4975.02 samples/sec Loss 39.5463 LearningRate 0.0451 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:58:31,579-Speed 4857.21 samples/sec Loss 39.5315 LearningRate 0.0454 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:58:39,590-Speed 5113.74 samples/sec Loss 39.5589 LearningRate 0.0457 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:58:47,272-Speed 5526.44 samples/sec Loss 39.0886 LearningRate 0.0584 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:58:54,714-Speed 5507.43 samples/sec Loss 39.0395 LearningRate 0.0587 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:02,109-Speed 5540.12 samples/sec Loss 39.0065 LearningRate 0.0590 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:09,513-Speed 5533.45 samples/sec Loss 38.9746 LearningRate 0.0593 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:11,868-Speed 5079.67 samples/sec Loss 39.6042 LearningRate 0.0469 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 18:59:16,916-Speed 5534.65 samples/sec Loss 38.9635 LearningRate 0.0596 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:24,311-Speed 5540.22 samples/sec Loss 38.9349 LearningRate 0.0599 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:31,757-Speed 5502.44 samples/sec Loss 38.9255 LearningRate 0.0602 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 47 hours Training: 
2022-01-07 18:59:39,188-Speed 5512.93 samples/sec Loss 38.8944 LearningRate 0.0605 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:46,593-Speed 5532.63 samples/sec Loss 38.8868 LearningRate 0.0608 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 18:59:54,000-Speed 5531.40 samples/sec Loss 38.8464 LearningRate 0.0610 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 19:00:01,407-Speed 5531.46 samples/sec Loss 38.8233 LearningRate 0.0613 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 19:00:08,816-Speed 5529.28 samples/sec Loss 38.8056 LearningRate 0.0616 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 47 hours Training: 2022-01-07 19:00:16,255-Speed 5507.87 samples/sec Loss 38.7604 LearningRate 0.0619 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:00:23,748-Speed 5467.40 samples/sec Loss 38.7654 LearningRate 0.0622 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:00:31,042-Speed 5617.18 samples/sec Loss 38.7448 LearningRate 0.0625 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:00:38,242-Speed 5690.54 samples/sec Loss 38.7204 LearningRate 0.0628 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:00:45,364-Speed 5752.83 samples/sec Loss 38.6850 LearningRate 0.0631 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:00:52,481-Speed 5756.39 samples/sec Loss 38.6535 LearningRate 0.0634 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:00:59,840-Speed 5567.14 samples/sec Loss 38.5824 LearningRate 0.0637 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:07,254-Speed 5526.24 
samples/sec Loss 38.5746 LearningRate 0.0639 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:14,668-Speed 5525.70 samples/sec Loss 38.5995 LearningRate 0.0642 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:22,061-Speed 5541.77 samples/sec Loss 38.5608 LearningRate 0.0645 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:29,482-Speed 5520.74 samples/sec Loss 38.5198 LearningRate 0.0648 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:36,859-Speed 5553.80 samples/sec Loss 38.4989 LearningRate 0.0651 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:44,102-Speed 5656.58 samples/sec Loss 38.4296 LearningRate 0.0654 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:51,567-Speed 5488.55 samples/sec Loss 38.4549 LearningRate 0.0657 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:01:58,965-Speed 5537.68 samples/sec Loss 38.4344 LearningRate 0.0660 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:06,254-Speed 5620.31 samples/sec Loss 38.3981 LearningRate 0.0663 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:02:13,682-Speed 5516.07 samples/sec Loss 38.3636 LearningRate 0.0665 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:21,055-Speed 5556.83 samples/sec Loss 38.3443 LearningRate 0.0668 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:28,392-Speed 5584.11 samples/sec Loss 38.3023 LearningRate 0.0671 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:35,759-Speed 5560.87 samples/sec Loss 38.2830 LearningRate 
0.0674 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:43,148-Speed 5544.14 samples/sec Loss 38.2760 LearningRate 0.0677 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:50,576-Speed 5515.95 samples/sec Loss 38.2683 LearningRate 0.0680 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:02:58,019-Speed 5505.20 samples/sec Loss 38.2192 LearningRate 0.0683 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:05,411-Speed 5542.19 samples/sec Loss 38.1707 LearningRate 0.0686 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:12,855-Speed 5503.93 samples/sec Loss 38.1459 LearningRate 0.0689 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:20,307-Speed 5497.33 samples/sec Loss 38.0994 LearningRate 0.0691 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:27,358-Speed 5811.01 samples/sec Loss 38.1053 LearningRate 0.0694 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:34,710-Speed 5572.06 samples/sec Loss 38.0341 LearningRate 0.0697 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:42,282-Speed 5410.20 samples/sec Loss 38.0479 LearningRate 0.0700 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:49,671-Speed 5546.11 samples/sec Loss 38.0104 LearningRate 0.0703 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:03:57,116-Speed 5502.83 samples/sec Loss 37.9638 LearningRate 0.0706 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:04,675-Speed 5419.68 samples/sec Loss 37.9485 LearningRate 0.0709 Epoch: 0 Global Step: 2450 Fp16 Grad 
Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:12,111-Speed 5509.68 samples/sec Loss 37.9332 LearningRate 0.0712 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:19,668-Speed 5421.04 samples/sec Loss 37.8870 LearningRate 0.0715 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:27,168-Speed 5462.02 samples/sec Loss 37.8213 LearningRate 0.0718 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:34,606-Speed 5508.40 samples/sec Loss 37.8112 LearningRate 0.0720 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:42,007-Speed 5535.20 samples/sec Loss 37.7860 LearningRate 0.0723 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:49,453-Speed 5501.29 samples/sec Loss 37.7495 LearningRate 0.0726 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:04:56,891-Speed 5508.10 samples/sec Loss 37.7184 LearningRate 0.0729 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:04,400-Speed 5455.68 samples/sec Loss 37.6822 LearningRate 0.0732 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:11,888-Speed 5470.80 samples/sec Loss 37.6603 LearningRate 0.0735 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:19,383-Speed 5465.92 samples/sec Loss 37.6211 LearningRate 0.0738 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:26,824-Speed 5505.38 samples/sec Loss 37.5768 LearningRate 0.0741 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:34,343-Speed 5449.15 samples/sec Loss 37.5733 LearningRate 0.0744 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 46 hours Training: 
2022-01-07 19:05:41,946-Speed 5387.91 samples/sec Loss 37.5189 LearningRate 0.0746 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:49,575-Speed 5369.94 samples/sec Loss 37.4435 LearningRate 0.0749 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:05:57,133-Speed 5420.42 samples/sec Loss 37.4136 LearningRate 0.0752 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:06:04,637-Speed 5459.53 samples/sec Loss 37.3582 LearningRate 0.0755 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:06:12,048-Speed 5527.80 samples/sec Loss 37.3890 LearningRate 0.0758 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:06:19,489-Speed 5505.76 samples/sec Loss 37.3441 LearningRate 0.0761 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:06:26,896-Speed 5531.04 samples/sec Loss 37.2965 LearningRate 0.0764 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:06:34,350-Speed 5495.62 samples/sec Loss 37.2542 LearningRate 0.0767 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:06:41,770-Speed 5521.03 samples/sec Loss 37.1895 LearningRate 0.0770 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:06:49,200-Speed 5513.27 samples/sec Loss 37.1949 LearningRate 0.0772 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:06:56,648-Speed 5500.81 samples/sec Loss 37.1469 LearningRate 0.0775 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:07:04,084-Speed 5509.21 samples/sec Loss 37.1170 LearningRate 0.0778 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:07:11,576-Speed 5467.77 
samples/sec Loss 37.1162 LearningRate 0.0781 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:07:19,084-Speed 5456.59 samples/sec Loss 37.0394 LearningRate 0.0784 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:07:26,624-Speed 5433.14 samples/sec Loss 36.9839 LearningRate 0.0787 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:07:34,102-Speed 5478.61 samples/sec Loss 36.9716 LearningRate 0.0790 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:07:41,562-Speed 5490.65 samples/sec Loss 36.8934 LearningRate 0.0793 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:07:49,011-Speed 5500.03 samples/sec Loss 36.8685 LearningRate 0.0796 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:07:56,470-Speed 5492.65 samples/sec Loss 36.8238 LearningRate 0.0799 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:04,032-Speed 5416.97 samples/sec Loss 36.8370 LearningRate 0.0801 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:11,613-Speed 5404.25 samples/sec Loss 36.7488 LearningRate 0.0804 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:19,072-Speed 5491.93 samples/sec Loss 36.7479 LearningRate 0.0807 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:26,596-Speed 5444.62 samples/sec Loss 36.6917 LearningRate 0.0810 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:34,058-Speed 5490.32 samples/sec Loss 36.6001 LearningRate 0.0813 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:41,491-Speed 5510.82 samples/sec Loss 36.6034 LearningRate 
0.0816 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:08:48,924-Speed 5511.61 samples/sec Loss 36.5371 LearningRate 0.0819 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:08:56,368-Speed 5503.80 samples/sec Loss 36.5431 LearningRate 0.0822 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:03,813-Speed 5502.40 samples/sec Loss 36.5060 LearningRate 0.0825 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:11,277-Speed 5487.80 samples/sec Loss 36.4598 LearningRate 0.0827 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:18,713-Speed 5509.56 samples/sec Loss 36.3733 LearningRate 0.0830 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:26,142-Speed 5514.26 samples/sec Loss 36.3937 LearningRate 0.0833 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:33,579-Speed 5508.35 samples/sec Loss 36.3008 LearningRate 0.0836 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:41,027-Speed 5500.03 samples/sec Loss 36.2913 LearningRate 0.0839 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:48,682-Speed 5352.09 samples/sec Loss 36.2412 LearningRate 0.0842 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:09:56,151-Speed 5484.51 samples/sec Loss 36.2182 LearningRate 0.0845 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:03,621-Speed 5484.03 samples/sec Loss 36.1121 LearningRate 0.0848 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:11,114-Speed 5467.37 samples/sec Loss 36.1283 LearningRate 0.0851 Epoch: 0 Global Step: 2940 Fp16 Grad 
Scale: 262144 Required: 45 hours Training: 2022-01-07 19:10:18,598-Speed 5473.95 samples/sec Loss 36.0817 LearningRate 0.0854 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:26,075-Speed 5479.08 samples/sec Loss 36.0565 LearningRate 0.0856 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:33,513-Speed 5507.23 samples/sec Loss 35.9862 LearningRate 0.0859 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:40,962-Speed 5499.84 samples/sec Loss 35.9798 LearningRate 0.0862 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:48,406-Speed 5503.45 samples/sec Loss 35.9151 LearningRate 0.0865 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:10:55,864-Speed 5493.26 samples/sec Loss 35.8793 LearningRate 0.0868 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:11:03,309-Speed 5502.21 samples/sec Loss 35.8429 LearningRate 0.0871 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:11:10,747-Speed 5507.89 samples/sec Loss 35.7457 LearningRate 0.0874 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:11:18,207-Speed 5491.48 samples/sec Loss 35.6966 LearningRate 0.0877 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:11:25,861-Speed 5351.85 samples/sec Loss 35.6121 LearningRate 0.0880 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:11:33,525-Speed 5345.35 samples/sec Loss 35.6466 LearningRate 0.0882 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:11:40,987-Speed 5490.67 samples/sec Loss 35.6335 LearningRate 0.0885 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 45 hours Training: 
2022-01-07 19:11:48,418-Speed 5512.76 samples/sec Loss 35.5558 LearningRate 0.0888 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:11:55,860-Speed 5504.62 samples/sec Loss 35.4720 LearningRate 0.0891 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:03,332-Speed 5482.63 samples/sec Loss 35.4817 LearningRate 0.0894 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:10,869-Speed 5435.32 samples/sec Loss 35.4046 LearningRate 0.0897 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:18,427-Speed 5420.01 samples/sec Loss 35.3655 LearningRate 0.0900 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:25,842-Speed 5524.89 samples/sec Loss 35.3294 LearningRate 0.0903 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:33,271-Speed 5514.59 samples/sec Loss 35.2389 LearningRate 0.0906 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:40,763-Speed 5468.32 samples/sec Loss 35.2263 LearningRate 0.0908 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:48,220-Speed 5493.44 samples/sec Loss 35.1906 LearningRate 0.0911 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:12:55,669-Speed 5499.35 samples/sec Loss 35.1434 LearningRate 0.0914 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:13:03,168-Speed 5462.80 samples/sec Loss 35.1307 LearningRate 0.0917 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:13:10,617-Speed 5499.84 samples/sec Loss 35.0310 LearningRate 0.0920 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:13:18,060-Speed 5503.68 
samples/sec Loss 35.0358 LearningRate 0.0923 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:13:25,494-Speed 5511.27 samples/sec Loss 34.9901 LearningRate 0.0926 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:13:33,063-Speed 5412.11 samples/sec Loss 34.9040 LearningRate 0.0929 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:13:40,564-Speed 5461.82 samples/sec Loss 34.8330 LearningRate 0.0932 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:13:48,010-Speed 5501.23 samples/sec Loss 34.8000 LearningRate 0.0935 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:13:55,462-Speed 5497.22 samples/sec Loss 34.7616 LearningRate 0.0937 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:02,955-Speed 5467.33 samples/sec Loss 34.7479 LearningRate 0.0940 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:10,434-Speed 5477.64 samples/sec Loss 34.7198 LearningRate 0.0943 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:17,884-Speed 5498.93 samples/sec Loss 34.6152 LearningRate 0.0946 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:25,310-Speed 5516.44 samples/sec Loss 34.6095 LearningRate 0.0949 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:32,818-Speed 5456.11 samples/sec Loss 34.5498 LearningRate 0.0952 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:40,228-Speed 5529.12 samples/sec Loss 34.5093 LearningRate 0.0955 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:47,654-Speed 5516.61 samples/sec Loss 34.4279 LearningRate 
0.0958 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:14:55,128-Speed 5480.54 samples/sec Loss 34.4010 LearningRate 0.0961 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:02,575-Speed 5501.42 samples/sec Loss 34.3092 LearningRate 0.0963 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:10,074-Speed 5463.15 samples/sec Loss 34.2828 LearningRate 0.0966 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:17,491-Speed 5523.00 samples/sec Loss 34.2441 LearningRate 0.0969 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:24,950-Speed 5491.83 samples/sec Loss 34.2252 LearningRate 0.0972 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:32,393-Speed 5504.20 samples/sec Loss 34.0922 LearningRate 0.0975 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:39,808-Speed 5524.98 samples/sec Loss 34.1155 LearningRate 0.0978 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:47,240-Speed 5512.04 samples/sec Loss 34.0180 LearningRate 0.0981 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:15:54,694-Speed 5495.31 samples/sec Loss 34.0093 LearningRate 0.0984 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:02,122-Speed 5515.13 samples/sec Loss 33.9363 LearningRate 0.0987 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:09,619-Speed 5464.61 samples/sec Loss 33.8485 LearningRate 0.0989 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:17,118-Speed 5463.26 samples/sec Loss 33.8903 LearningRate 0.0992 Epoch: 0 Global Step: 3430 Fp16 Grad 
Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:24,565-Speed 5500.55 samples/sec Loss 33.7892 LearningRate 0.0995 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:32,090-Speed 5444.43 samples/sec Loss 33.7509 LearningRate 0.0998 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:39,624-Speed 5437.62 samples/sec Loss 33.7078 LearningRate 0.1001 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:47,079-Speed 5495.38 samples/sec Loss 33.5981 LearningRate 0.1004 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:16:54,564-Speed 5472.44 samples/sec Loss 33.6528 LearningRate 0.1007 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:17:01,982-Speed 5522.76 samples/sec Loss 33.5801 LearningRate 0.1010 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:17:09,417-Speed 5509.46 samples/sec Loss 33.5091 LearningRate 0.1013 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:17:16,912-Speed 5466.34 samples/sec Loss 33.4475 LearningRate 0.1016 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:17:24,359-Speed 5501.06 samples/sec Loss 33.3904 LearningRate 0.1018 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:17:31,845-Speed 5472.21 samples/sec Loss 33.3961 LearningRate 0.1021 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:17:39,334-Speed 5470.17 samples/sec Loss 33.3168 LearningRate 0.1024 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:17:46,788-Speed 5496.10 samples/sec Loss 33.2825 LearningRate 0.1027 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 45 hours Training: 
2022-01-07 19:17:54,278-Speed 5469.09 samples/sec Loss 33.1769 LearningRate 0.1030 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:01,713-Speed 5510.12 samples/sec Loss 33.1615 LearningRate 0.1033 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:09,162-Speed 5499.77 samples/sec Loss 33.1202 LearningRate 0.1036 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:16,595-Speed 5511.70 samples/sec Loss 33.1008 LearningRate 0.1039 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:18:24,015-Speed 5520.54 samples/sec Loss 33.0179 LearningRate 0.1042 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:31,529-Speed 5451.86 samples/sec Loss 32.9805 LearningRate 0.1044 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:38,967-Speed 5508.12 samples/sec Loss 32.8369 LearningRate 0.1047 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:46,434-Speed 5486.54 samples/sec Loss 32.8570 LearningRate 0.1050 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:18:53,946-Speed 5453.36 samples/sec Loss 32.7292 LearningRate 0.1053 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:19:01,407-Speed 5490.06 samples/sec Loss 32.6648 LearningRate 0.1056 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:19:08,898-Speed 5469.14 samples/sec Loss 32.6007 LearningRate 0.1059 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:19:16,371-Speed 5482.03 samples/sec Loss 32.6322 LearningRate 0.1062 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:19:23,918-Speed 5428.32 
samples/sec Loss 32.5113 LearningRate 0.1065 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:19:31,423-Speed 5458.40 samples/sec Loss 32.5448 LearningRate 0.1068 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:19:38,888-Speed 5487.71 samples/sec Loss 32.4174 LearningRate 0.1070 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:19:46,346-Speed 5493.36 samples/sec Loss 32.4315 LearningRate 0.1073 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:19:53,826-Speed 5476.25 samples/sec Loss 32.3280 LearningRate 0.1076 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:01,268-Speed 5504.63 samples/sec Loss 32.3105 LearningRate 0.1079 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:08,704-Speed 5509.49 samples/sec Loss 32.2432 LearningRate 0.1082 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:16,185-Speed 5475.74 samples/sec Loss 32.2148 LearningRate 0.1085 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:23,637-Speed 5497.36 samples/sec Loss 32.1930 LearningRate 0.1088 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:31,078-Speed 5506.15 samples/sec Loss 32.0994 LearningRate 0.1091 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:38,584-Speed 5456.96 samples/sec Loss 31.9849 LearningRate 0.1094 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:46,011-Speed 5516.46 samples/sec Loss 32.0466 LearningRate 0.1097 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:20:53,474-Speed 5489.11 samples/sec Loss 31.9820 LearningRate 
0.1099 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 19:21:00,918-Speed 5503.70 samples/sec Loss 31.9008 LearningRate 0.1102 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:08,339-Speed 5519.47 samples/sec Loss 31.8188 LearningRate 0.1105 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:15,775-Speed 5509.80 samples/sec Loss 31.7723 LearningRate 0.1108 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:23,247-Speed 5482.67 samples/sec Loss 31.6902 LearningRate 0.1111 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:30,744-Speed 5464.18 samples/sec Loss 31.6909 LearningRate 0.1114 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:38,237-Speed 5467.44 samples/sec Loss 31.6343 LearningRate 0.1117 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:45,694-Speed 5493.64 samples/sec Loss 31.5717 LearningRate 0.1120 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:21:53,184-Speed 5469.14 samples/sec Loss 31.5208 LearningRate 0.1123 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:00,692-Speed 5456.52 samples/sec Loss 31.4977 LearningRate 0.1125 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:08,161-Speed 5485.38 samples/sec Loss 31.3727 LearningRate 0.1128 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:15,613-Speed 5497.08 samples/sec Loss 31.3758 LearningRate 0.1131 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 19:22:23,045-Speed 5512.22 samples/sec Loss 31.3269 LearningRate 0.1134 Epoch: 0 Global Step: 3920 Fp16 Grad 
Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:30,566-Speed 5446.70 samples/sec Loss 31.2029 LearningRate 0.1137 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:38,023-Speed 5493.86 samples/sec Loss 31.1777 LearningRate 0.1140 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:45,512-Speed 5470.27 samples/sec Loss 31.0566 LearningRate 0.1143 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:22:52,951-Speed 5507.32 samples/sec Loss 31.1066 LearningRate 0.1146 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:23:00,384-Speed 5512.65 samples/sec Loss 30.9683 LearningRate 0.1149 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:23:07,832-Speed 5499.91 samples/sec Loss 31.0086 LearningRate 0.1152 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:23:15,265-Speed 5511.23 samples/sec Loss 30.9280 LearningRate 0.1154 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:23:22,714-Speed 5500.25 samples/sec Loss 30.7091 LearningRate 0.1157 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:24:08,011-[lfw][4000]XNorm: 21.241078 Training: 2022-01-07 19:24:08,012-[lfw][4000]Accuracy-Flip: 0.98033+-0.00785 Training: 2022-01-07 19:24:08,013-[lfw][4000]Accuracy-Highest: 0.98033 Training: 2022-01-07 19:25:00,678-[cfp_fp][4000]XNorm: 18.989971 Training: 2022-01-07 19:25:00,679-[cfp_fp][4000]Accuracy-Flip: 0.88014+-0.01442 Training: 2022-01-07 19:25:00,680-[cfp_fp][4000]Accuracy-Highest: 0.88014 Training: 2022-01-07 19:25:46,348-[agedb_30][4000]XNorm: 20.819124 Training: 2022-01-07 19:25:46,349-[agedb_30][4000]Accuracy-Flip: 0.84617+-0.01346 Training: 2022-01-07 19:25:46,349-[agedb_30][4000]Accuracy-Highest: 0.84617 
Training: 2022-01-07 19:25:53,839-Speed 271.04 samples/sec Loss 30.8200 LearningRate 0.1160 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:01,244-Speed 5533.24 samples/sec Loss 30.7546 LearningRate 0.1163 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:08,818-Speed 5409.12 samples/sec Loss 30.5659 LearningRate 0.1166 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:16,320-Speed 5460.91 samples/sec Loss 30.6485 LearningRate 0.1169 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:23,849-Speed 5441.48 samples/sec Loss 30.5623 LearningRate 0.1172 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:31,227-Speed 5553.28 samples/sec Loss 30.4052 LearningRate 0.1175 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:38,631-Speed 5532.84 samples/sec Loss 30.4505 LearningRate 0.1178 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:46,054-Speed 5519.22 samples/sec Loss 30.3762 LearningRate 0.1180 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:26:53,569-Speed 5451.84 samples/sec Loss 30.3233 LearningRate 0.1183 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:00,985-Speed 5524.72 samples/sec Loss 30.3058 LearningRate 0.1186 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:08,249-Speed 5641.32 samples/sec Loss 30.1089 LearningRate 0.1189 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:15,540-Speed 5619.35 samples/sec Loss 30.1502 LearningRate 0.1192 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:27:22,482-Speed 
5901.51 samples/sec Loss 30.1601 LearningRate 0.1195 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:29,841-Speed 5566.48 samples/sec Loss 30.0609 LearningRate 0.1198 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:37,335-Speed 5466.84 samples/sec Loss 29.9687 LearningRate 0.1201 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:44,850-Speed 5452.25 samples/sec Loss 29.9292 LearningRate 0.1204 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:52,255-Speed 5532.43 samples/sec Loss 29.8077 LearningRate 0.1206 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:27:59,743-Speed 5471.68 samples/sec Loss 29.7437 LearningRate 0.1209 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:07,408-Speed 5344.90 samples/sec Loss 29.7582 LearningRate 0.1212 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:14,798-Speed 5544.10 samples/sec Loss 29.6465 LearningRate 0.1215 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:22,077-Speed 5627.96 samples/sec Loss 29.7081 LearningRate 0.1218 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:29,469-Speed 5542.53 samples/sec Loss 29.6010 LearningRate 0.1221 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:36,851-Speed 5550.32 samples/sec Loss 29.4316 LearningRate 0.1224 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:44,219-Speed 5560.86 samples/sec Loss 29.4878 LearningRate 0.1227 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:51,598-Speed 5551.92 samples/sec Loss 29.4699 
LearningRate 0.1230 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:28:59,300-Speed 5318.55 samples/sec Loss 29.3212 LearningRate 0.1233 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:06,724-Speed 5519.52 samples/sec Loss 29.2851 LearningRate 0.1235 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:14,210-Speed 5473.14 samples/sec Loss 29.2386 LearningRate 0.1238 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:21,630-Speed 5521.33 samples/sec Loss 29.1283 LearningRate 0.1241 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:29,102-Speed 5482.72 samples/sec Loss 29.0931 LearningRate 0.1244 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:36,483-Speed 5551.36 samples/sec Loss 29.0149 LearningRate 0.1247 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:43,866-Speed 5550.07 samples/sec Loss 28.9880 LearningRate 0.1250 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:29:51,286-Speed 5521.54 samples/sec Loss 29.0315 LearningRate 0.1253 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:29:58,687-Speed 5535.97 samples/sec Loss 28.8204 LearningRate 0.1256 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:30:06,126-Speed 5506.95 samples/sec Loss 28.8564 LearningRate 0.1259 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:13,480-Speed 5571.57 samples/sec Loss 28.7699 LearningRate 0.1261 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:20,900-Speed 5521.68 samples/sec Loss 28.7069 LearningRate 0.1264 Epoch: 0 Global Step: 
4370 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:28,275-Speed 5554.79 samples/sec Loss 28.6134 LearningRate 0.1267 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:35,496-Speed 5673.77 samples/sec Loss 28.5316 LearningRate 0.1270 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:42,658-Speed 5720.42 samples/sec Loss 28.5494 LearningRate 0.1273 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:50,046-Speed 5545.52 samples/sec Loss 28.4340 LearningRate 0.1276 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:30:57,489-Speed 5504.53 samples/sec Loss 28.3495 LearningRate 0.1279 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:31:04,875-Speed 5547.06 samples/sec Loss 28.2765 LearningRate 0.1282 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:31:12,283-Speed 5530.57 samples/sec Loss 28.3250 LearningRate 0.1285 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:31:19,701-Speed 5522.67 samples/sec Loss 28.3031 LearningRate 0.1287 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:31:27,124-Speed 5519.19 samples/sec Loss 28.2639 LearningRate 0.1290 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:31:34,304-Speed 5705.68 samples/sec Loss 28.1209 LearningRate 0.1293 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:31:41,613-Speed 5605.45 samples/sec Loss 28.0575 LearningRate 0.1296 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:31:49,111-Speed 5464.67 samples/sec Loss 27.9526 LearningRate 0.1299 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 46 
hours Training: 2022-01-07 19:31:56,579-Speed 5487.40 samples/sec Loss 27.9264 LearningRate 0.1302 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:32:04,070-Speed 5469.12 samples/sec Loss 27.8773 LearningRate 0.1305 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:32:11,499-Speed 5515.52 samples/sec Loss 27.8037 LearningRate 0.1308 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:32:18,983-Speed 5474.95 samples/sec Loss 27.7692 LearningRate 0.1311 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:32:26,411-Speed 5517.41 samples/sec Loss 27.7005 LearningRate 0.1314 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:32:33,851-Speed 5506.45 samples/sec Loss 27.5878 LearningRate 0.1316 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:32:41,288-Speed 5508.62 samples/sec Loss 27.6981 LearningRate 0.1319 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:32:48,442-Speed 5726.58 samples/sec Loss 27.5588 LearningRate 0.1322 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:32:55,805-Speed 5564.56 samples/sec Loss 27.5906 LearningRate 0.1325 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:03,140-Speed 5585.75 samples/sec Loss 27.4110 LearningRate 0.1328 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:10,673-Speed 5438.48 samples/sec Loss 27.3393 LearningRate 0.1331 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:18,176-Speed 5460.72 samples/sec Loss 27.2801 LearningRate 0.1334 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:25,832-Speed 5351.51 
samples/sec Loss 27.2408 LearningRate 0.1337 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:33,428-Speed 5393.18 samples/sec Loss 27.3004 LearningRate 0.1340 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:40,957-Speed 5441.70 samples/sec Loss 27.2225 LearningRate 0.1342 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:48,364-Speed 5531.45 samples/sec Loss 27.0071 LearningRate 0.1345 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:33:55,834-Speed 5484.33 samples/sec Loss 27.0637 LearningRate 0.1348 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:34:03,486-Speed 5353.48 samples/sec Loss 26.9365 LearningRate 0.1351 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:34:11,028-Speed 5432.13 samples/sec Loss 26.9100 LearningRate 0.1354 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:34:18,428-Speed 5536.45 samples/sec Loss 26.7849 LearningRate 0.1357 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:34:25,561-Speed 5744.20 samples/sec Loss 26.8040 LearningRate 0.1360 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:34:32,633-Speed 5793.46 samples/sec Loss 26.8127 LearningRate 0.1363 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:34:40,089-Speed 5494.35 samples/sec Loss 26.7107 LearningRate 0.1366 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:34:47,479-Speed 5543.85 samples/sec Loss 26.5709 LearningRate 0.1369 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:34:54,710-Speed 5665.89 samples/sec Loss 26.5512 LearningRate 0.1371 Epoch: 0 
Global Step: 4740 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:02,133-Speed 5519.06 samples/sec Loss 26.5511 LearningRate 0.1374 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:09,625-Speed 5468.53 samples/sec Loss 26.4141 LearningRate 0.1377 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:17,021-Speed 5538.88 samples/sec Loss 26.3589 LearningRate 0.1380 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:24,532-Speed 5455.00 samples/sec Loss 26.2682 LearningRate 0.1383 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:32,098-Speed 5414.98 samples/sec Loss 26.2487 LearningRate 0.1386 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:39,656-Speed 5420.30 samples/sec Loss 26.1727 LearningRate 0.1389 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:47,225-Speed 5412.61 samples/sec Loss 26.2012 LearningRate 0.1392 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:35:54,687-Speed 5490.69 samples/sec Loss 26.1295 LearningRate 0.1395 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:02,211-Speed 5445.69 samples/sec Loss 25.9380 LearningRate 0.1397 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:09,674-Speed 5489.20 samples/sec Loss 25.9406 LearningRate 0.1400 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:17,028-Speed 5571.45 samples/sec Loss 25.9077 LearningRate 0.1403 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:24,607-Speed 5405.74 samples/sec Loss 25.8182 LearningRate 0.1406 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 
Required: 45 hours Training: 2022-01-07 19:36:32,010-Speed 5533.58 samples/sec Loss 25.8517 LearningRate 0.1409 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:39,409-Speed 5537.17 samples/sec Loss 25.6565 LearningRate 0.1412 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:46,815-Speed 5533.09 samples/sec Loss 25.6534 LearningRate 0.1415 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:36:54,259-Speed 5503.49 samples/sec Loss 25.6398 LearningRate 0.1418 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:01,646-Speed 5546.72 samples/sec Loss 25.5083 LearningRate 0.1421 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:09,064-Speed 5522.84 samples/sec Loss 25.4993 LearningRate 0.1423 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:37:16,458-Speed 5540.99 samples/sec Loss 25.3682 LearningRate 0.1426 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:23,963-Speed 5459.06 samples/sec Loss 25.3217 LearningRate 0.1429 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:31,385-Speed 5520.12 samples/sec Loss 25.3297 LearningRate 0.1432 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:38,849-Speed 5489.03 samples/sec Loss 25.2991 LearningRate 0.1435 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:46,273-Speed 5517.62 samples/sec Loss 25.2393 LearningRate 0.1438 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:37:53,675-Speed 5535.76 samples/sec Loss 25.2393 LearningRate 0.1441 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 
19:38:01,171-Speed 5465.85 samples/sec Loss 25.1180 LearningRate 0.1444 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:38:08,552-Speed 5550.03 samples/sec Loss 25.0682 LearningRate 0.1447 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:38:15,893-Speed 5580.86 samples/sec Loss 24.9828 LearningRate 0.1450 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:38:23,362-Speed 5485.44 samples/sec Loss 24.8543 LearningRate 0.1452 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:38:30,757-Speed 5540.19 samples/sec Loss 24.8137 LearningRate 0.1455 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:38:38,505-Speed 5287.88 samples/sec Loss 24.7575 LearningRate 0.1458 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:38:46,013-Speed 5456.35 samples/sec Loss 24.6868 LearningRate 0.1461 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:38:53,330-Speed 5598.75 samples/sec Loss 24.6542 LearningRate 0.1464 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:00,591-Speed 5642.72 samples/sec Loss 24.5479 LearningRate 0.1467 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:07,885-Speed 5618.63 samples/sec Loss 24.5193 LearningRate 0.1470 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:15,433-Speed 5427.36 samples/sec Loss 24.5077 LearningRate 0.1473 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:22,822-Speed 5544.02 samples/sec Loss 24.4262 LearningRate 0.1476 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:30,263-Speed 5506.49 samples/sec Loss 24.3795 
LearningRate 0.1478 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:37,665-Speed 5534.55 samples/sec Loss 24.2670 LearningRate 0.1481 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:39:45,113-Speed 5501.61 samples/sec Loss 24.1709 LearningRate 0.1484 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:39:52,542-Speed 5514.73 samples/sec Loss 24.2716 LearningRate 0.1487 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:39:59,908-Speed 5562.20 samples/sec Loss 24.1719 LearningRate 0.1490 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:07,308-Speed 5536.33 samples/sec Loss 24.1014 LearningRate 0.1493 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:14,729-Speed 5520.84 samples/sec Loss 23.9848 LearningRate 0.1496 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:22,127-Speed 5537.75 samples/sec Loss 24.0386 LearningRate 0.1499 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:29,526-Speed 5537.25 samples/sec Loss 23.9327 LearningRate 0.1502 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:37,007-Speed 5476.74 samples/sec Loss 23.9090 LearningRate 0.1504 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:44,445-Speed 5508.20 samples/sec Loss 23.7552 LearningRate 0.1507 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:51,863-Speed 5522.80 samples/sec Loss 23.7235 LearningRate 0.1510 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:40:59,335-Speed 5483.25 samples/sec Loss 23.7381 LearningRate 0.1513 Epoch: 0 Global Step: 5230 Fp16 
Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:41:06,939-Speed 5388.41 samples/sec Loss 23.5935 LearningRate 0.1516 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:14,069-Speed 5745.48 samples/sec Loss 23.6460 LearningRate 0.1519 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:21,057-Speed 5862.88 samples/sec Loss 23.5765 LearningRate 0.1522 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:28,446-Speed 5544.82 samples/sec Loss 23.4556 LearningRate 0.1525 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:35,870-Speed 5519.23 samples/sec Loss 23.4387 LearningRate 0.1528 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:43,312-Speed 5504.90 samples/sec Loss 23.3230 LearningRate 0.1531 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:50,727-Speed 5525.47 samples/sec Loss 23.3808 LearningRate 0.1533 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:41:58,153-Speed 5517.09 samples/sec Loss 23.3308 LearningRate 0.1536 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:05,585-Speed 5512.53 samples/sec Loss 23.3077 LearningRate 0.1539 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:13,065-Speed 5477.38 samples/sec Loss 23.1330 LearningRate 0.1542 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:20,625-Speed 5419.06 samples/sec Loss 23.0965 LearningRate 0.1545 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:27,921-Speed 5615.21 samples/sec Loss 23.1013 LearningRate 0.1548 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 45 hours 
Training: 2022-01-07 19:42:35,309-Speed 5545.43 samples/sec Loss 23.0804 LearningRate 0.1551 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:42,806-Speed 5465.21 samples/sec Loss 22.8829 LearningRate 0.1554 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:50,327-Speed 5446.26 samples/sec Loss 22.8602 LearningRate 0.1557 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:42:57,860-Speed 5438.86 samples/sec Loss 22.8181 LearningRate 0.1559 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:05,281-Speed 5520.73 samples/sec Loss 22.7181 LearningRate 0.1562 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:12,745-Speed 5489.17 samples/sec Loss 22.6841 LearningRate 0.1565 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:20,275-Speed 5440.74 samples/sec Loss 22.6541 LearningRate 0.1568 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:27,848-Speed 5410.18 samples/sec Loss 22.6078 LearningRate 0.1571 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:35,289-Speed 5505.38 samples/sec Loss 22.6401 LearningRate 0.1574 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:43:42,701-Speed 5527.59 samples/sec Loss 22.6112 LearningRate 0.1577 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:50,153-Speed 5497.48 samples/sec Loss 22.4965 LearningRate 0.1580 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:43:57,587-Speed 5511.09 samples/sec Loss 22.4275 LearningRate 0.1583 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:05,276-Speed 
5327.69 samples/sec Loss 22.3580 LearningRate 0.1585 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:12,761-Speed 5474.35 samples/sec Loss 22.3150 LearningRate 0.1588 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:20,216-Speed 5494.80 samples/sec Loss 22.3175 LearningRate 0.1591 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:27,717-Speed 5462.19 samples/sec Loss 22.2088 LearningRate 0.1594 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:35,185-Speed 5485.85 samples/sec Loss 22.2189 LearningRate 0.1597 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:42,593-Speed 5530.62 samples/sec Loss 22.1567 LearningRate 0.1600 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:50,026-Speed 5511.83 samples/sec Loss 22.0710 LearningRate 0.1603 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:44:57,471-Speed 5502.73 samples/sec Loss 21.9900 LearningRate 0.1606 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:05,117-Speed 5358.04 samples/sec Loss 21.9654 LearningRate 0.1609 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:12,773-Speed 5351.36 samples/sec Loss 21.9420 LearningRate 0.1612 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:20,401-Speed 5371.07 samples/sec Loss 21.8924 LearningRate 0.1614 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:28,145-Speed 5290.01 samples/sec Loss 21.8744 LearningRate 0.1617 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:35,693-Speed 5427.32 samples/sec Loss 21.8290 
LearningRate 0.1620 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:43,234-Speed 5432.68 samples/sec Loss 21.7355 LearningRate 0.1623 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:50,788-Speed 5423.82 samples/sec Loss 21.6304 LearningRate 0.1626 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:45:58,302-Speed 5452.49 samples/sec Loss 21.6079 LearningRate 0.1629 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:05,884-Speed 5403.08 samples/sec Loss 21.5606 LearningRate 0.1632 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:13,535-Speed 5355.15 samples/sec Loss 21.5759 LearningRate 0.1635 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:46:21,072-Speed 5435.87 samples/sec Loss 21.4573 LearningRate 0.1638 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:28,572-Speed 5462.84 samples/sec Loss 21.5045 LearningRate 0.1640 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:36,083-Speed 5454.32 samples/sec Loss 21.4279 LearningRate 0.1643 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:43,599-Speed 5452.77 samples/sec Loss 21.3697 LearningRate 0.1646 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:51,095-Speed 5465.96 samples/sec Loss 21.1908 LearningRate 0.1649 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:46:58,451-Speed 5570.01 samples/sec Loss 21.1978 LearningRate 0.1652 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:05,924-Speed 5482.41 samples/sec Loss 21.2516 LearningRate 0.1655 Epoch: 0 Global Step: 
5720 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:13,497-Speed 5410.23 samples/sec Loss 21.2353 LearningRate 0.1658 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:21,089-Speed 5395.58 samples/sec Loss 21.0360 LearningRate 0.1661 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:29,014-Speed 5169.55 samples/sec Loss 21.0284 LearningRate 0.1664 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:36,573-Speed 5420.33 samples/sec Loss 21.0027 LearningRate 0.1667 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:44,130-Speed 5421.73 samples/sec Loss 21.0548 LearningRate 0.1669 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:51,708-Speed 5405.96 samples/sec Loss 20.9004 LearningRate 0.1672 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:47:59,261-Speed 5424.70 samples/sec Loss 20.8589 LearningRate 0.1675 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:48:06,858-Speed 5393.32 samples/sec Loss 20.8797 LearningRate 0.1678 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:48:14,484-Speed 5372.62 samples/sec Loss 20.8691 LearningRate 0.1681 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:48:22,161-Speed 5336.18 samples/sec Loss 20.7739 LearningRate 0.1684 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:48:29,850-Speed 5328.24 samples/sec Loss 20.9255 LearningRate 0.1687 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:48:37,446-Speed 5393.18 samples/sec Loss 20.7084 LearningRate 0.1690 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 45 
hours Training: 2022-01-07 19:48:44,956-Speed 5455.30 samples/sec Loss 20.6774 LearningRate 0.1693 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:48:52,435-Speed 5477.85 samples/sec Loss 20.6532 LearningRate 0.1695 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:48:59,884-Speed 5498.96 samples/sec Loss 20.6166 LearningRate 0.1698 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:49:07,386-Speed 5460.25 samples/sec Loss 20.4948 LearningRate 0.1701 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:49:14,872-Speed 5473.09 samples/sec Loss 20.5017 LearningRate 0.1704 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:49:22,336-Speed 5488.52 samples/sec Loss 20.5409 LearningRate 0.1707 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:49:29,790-Speed 5495.21 samples/sec Loss 20.3403 LearningRate 0.1710 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:49:37,244-Speed 5495.28 samples/sec Loss 20.3292 LearningRate 0.1713 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:49:44,805-Speed 5418.88 samples/sec Loss 20.2235 LearningRate 0.1716 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:49:52,361-Speed 5421.42 samples/sec Loss 20.1557 LearningRate 0.1719 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:49:59,827-Speed 5486.74 samples/sec Loss 20.2184 LearningRate 0.1721 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:50:07,295-Speed 5485.10 samples/sec Loss 20.1251 LearningRate 0.1724 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 
19:50:14,744-Speed 5499.61 samples/sec Loss 20.0960 LearningRate 0.1727 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:50:22,182-Speed 5507.92 samples/sec Loss 20.1102 LearningRate 0.1730 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:50:29,649-Speed 5486.20 samples/sec Loss 20.0471 LearningRate 0.1733 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 19:50:37,077-Speed 5514.49 samples/sec Loss 20.0268 LearningRate 0.1736 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 19:51:22,551-[lfw][6000]XNorm: 24.317117 Training: 2022-01-07 19:51:22,552-[lfw][6000]Accuracy-Flip: 0.99367+-0.00407 Training: 2022-01-07 19:51:22,552-[lfw][6000]Accuracy-Highest: 0.99367 Training: 2022-01-07 19:52:15,361-[cfp_fp][6000]XNorm: 22.341087 Training: 2022-01-07 19:52:15,362-[cfp_fp][6000]Accuracy-Flip: 0.94243+-0.00780 Training: 2022-01-07 19:52:15,363-[cfp_fp][6000]Accuracy-Highest: 0.94243 Training: 2022-01-07 19:53:00,944-[agedb_30][6000]XNorm: 23.721091 Training: 2022-01-07 19:53:00,945-[agedb_30][6000]Accuracy-Flip: 0.92217+-0.01402 Training: 2022-01-07 19:53:00,945-[agedb_30][6000]Accuracy-Highest: 0.92217 Training: 2022-01-07 19:53:08,512-Speed 270.48 samples/sec Loss 20.0161 LearningRate 0.1739 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:53:16,052-Speed 5433.82 samples/sec Loss 19.9685 LearningRate 0.1742 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:53:23,504-Speed 5497.55 samples/sec Loss 19.9108 LearningRate 0.1745 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:53:31,027-Speed 5446.30 samples/sec Loss 19.8129 LearningRate 0.1748 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:53:38,469-Speed 5505.22 samples/sec 
Loss 19.8113 LearningRate 0.1750 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:53:45,946-Speed 5479.55 samples/sec Loss 19.7808 LearningRate 0.1753 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:53:53,489-Speed 5430.89 samples/sec Loss 19.7104 LearningRate 0.1756 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:54:01,007-Speed 5449.36 samples/sec Loss 19.6467 LearningRate 0.1759 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:54:08,500-Speed 5467.45 samples/sec Loss 19.7205 LearningRate 0.1762 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:54:15,958-Speed 5493.27 samples/sec Loss 19.6130 LearningRate 0.1765 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:54:23,420-Speed 5490.52 samples/sec Loss 19.5520 LearningRate 0.1768 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:54:30,882-Speed 5490.34 samples/sec Loss 19.4433 LearningRate 0.1771 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:54:38,075-Speed 5696.53 samples/sec Loss 19.4297 LearningRate 0.1774 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:54:45,565-Speed 5469.56 samples/sec Loss 19.4453 LearningRate 0.1776 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:54:53,112-Speed 5429.36 samples/sec Loss 19.4192 LearningRate 0.1779 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:00,630-Speed 5448.98 samples/sec Loss 19.4534 LearningRate 0.1782 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:08,063-Speed 5512.02 samples/sec Loss 19.3592 LearningRate 0.1785 Epoch: 0 
Global Step: 6170 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:15,520-Speed 5493.98 samples/sec Loss 19.3500 LearningRate 0.1788 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:23,046-Speed 5443.93 samples/sec Loss 19.3020 LearningRate 0.1791 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:30,549-Speed 5460.38 samples/sec Loss 19.2346 LearningRate 0.1794 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:38,119-Speed 5412.24 samples/sec Loss 19.1614 LearningRate 0.1797 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:45,592-Speed 5481.86 samples/sec Loss 19.1560 LearningRate 0.1800 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:55:53,006-Speed 5526.01 samples/sec Loss 19.1178 LearningRate 0.1802 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:56:00,468-Speed 5491.14 samples/sec Loss 19.0824 LearningRate 0.1805 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:56:07,912-Speed 5503.58 samples/sec Loss 19.1459 LearningRate 0.1808 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:56:15,323-Speed 5528.24 samples/sec Loss 18.9952 LearningRate 0.1811 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 262144 Required: 46 hours Training: 2022-01-07 19:56:22,782-Speed 5492.51 samples/sec Loss 19.0514 LearningRate 0.1814 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 46 hours Training: 2022-01-07 19:56:30,286-Speed 5459.46 samples/sec Loss 18.9426 LearningRate 0.1817 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:56:37,774-Speed 5471.94 samples/sec Loss 18.9009 LearningRate 0.1820 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 65536 
Required: 46 hours Training: 2022-01-07 19:56:45,315-Speed 5432.68 samples/sec Loss 18.8152 LearningRate 0.1823 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:56:52,953-Speed 5363.83 samples/sec Loss 18.8247 LearningRate 0.1826 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:57:00,321-Speed 5560.76 samples/sec Loss 18.8242 LearningRate 0.1829 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-01-07 19:57:07,811-Speed 5469.71 samples/sec Loss 18.8048 LearningRate 0.1831 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:57:15,290-Speed 5477.33 samples/sec Loss 18.7400 LearningRate 0.1834 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:57:22,969-Speed 5334.86 samples/sec Loss 18.6827 LearningRate 0.1837 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:57:30,598-Speed 5370.14 samples/sec Loss 18.6426 LearningRate 0.1840 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:57:38,031-Speed 5512.68 samples/sec Loss 18.6247 LearningRate 0.1843 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 19:57:45,173-Speed 5736.11 samples/sec Loss 18.5976 LearningRate 0.1846 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:57:52,617-Speed 5503.44 samples/sec Loss 18.4729 LearningRate 0.1849 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:00,120-Speed 5460.47 samples/sec Loss 18.5630 LearningRate 0.1852 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:07,565-Speed 5502.31 samples/sec Loss 18.3917 LearningRate 0.1855 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 
19:58:15,041-Speed 5480.32 samples/sec Loss 18.5088 LearningRate 0.1857 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:22,538-Speed 5464.53 samples/sec Loss 18.3972 LearningRate 0.1860 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:30,002-Speed 5488.76 samples/sec Loss 18.3619 LearningRate 0.1863 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:37,479-Speed 5479.33 samples/sec Loss 18.4116 LearningRate 0.1866 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:44,791-Speed 5603.31 samples/sec Loss 18.3470 LearningRate 0.1869 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:51,949-Speed 5723.78 samples/sec Loss 18.3289 LearningRate 0.1872 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:58:59,425-Speed 5479.90 samples/sec Loss 18.2687 LearningRate 0.1875 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 19:59:06,848-Speed 5518.99 samples/sec Loss 18.2842 LearningRate 0.1878 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:14,009-Speed 5721.27 samples/sec Loss 18.2267 LearningRate 0.1881 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:21,414-Speed 5532.41 samples/sec Loss 18.0968 LearningRate 0.1883 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:28,986-Speed 5410.99 samples/sec Loss 18.2099 LearningRate 0.1886 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:36,525-Speed 5434.11 samples/sec Loss 18.0042 LearningRate 0.1889 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:44,002-Speed 5479.72 samples/sec Loss 
17.9923 LearningRate 0.1892 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:51,450-Speed 5500.26 samples/sec Loss 18.0607 LearningRate 0.1895 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 19:59:58,908-Speed 5493.45 samples/sec Loss 18.0829 LearningRate 0.1898 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:00:06,397-Speed 5470.55 samples/sec Loss 17.9495 LearningRate 0.1901 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:00:13,861-Speed 5489.01 samples/sec Loss 17.9123 LearningRate 0.1904 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:00:21,419-Speed 5420.15 samples/sec Loss 17.8779 LearningRate 0.1907 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:00:28,905-Speed 5472.57 samples/sec Loss 17.8210 LearningRate 0.1910 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:00:36,428-Speed 5446.14 samples/sec Loss 17.8541 LearningRate 0.1912 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:00:43,921-Speed 5467.86 samples/sec Loss 17.9578 LearningRate 0.1915 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:00:51,543-Speed 5375.32 samples/sec Loss 17.8066 LearningRate 0.1918 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:00:59,173-Speed 5368.87 samples/sec Loss 17.8634 LearningRate 0.1921 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:06,549-Speed 5554.44 samples/sec Loss 17.8018 LearningRate 0.1924 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:14,042-Speed 5468.07 samples/sec Loss 17.6944 LearningRate 0.1927 Epoch: 0 Global 
Step: 6660 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:21,416-Speed 5557.06 samples/sec Loss 17.7702 LearningRate 0.1930 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:28,819-Speed 5533.69 samples/sec Loss 17.6261 LearningRate 0.1933 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:36,285-Speed 5487.61 samples/sec Loss 17.5719 LearningRate 0.1936 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:43,637-Speed 5572.87 samples/sec Loss 17.5273 LearningRate 0.1938 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:50,951-Speed 5600.96 samples/sec Loss 17.6439 LearningRate 0.1941 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:01:58,365-Speed 5525.59 samples/sec Loss 17.5789 LearningRate 0.1944 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:05,775-Speed 5528.71 samples/sec Loss 17.5115 LearningRate 0.1947 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:13,195-Speed 5521.24 samples/sec Loss 17.5467 LearningRate 0.1950 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:20,671-Speed 5480.59 samples/sec Loss 17.5204 LearningRate 0.1953 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:28,080-Speed 5529.14 samples/sec Loss 17.5821 LearningRate 0.1956 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:35,502-Speed 5520.55 samples/sec Loss 17.4434 LearningRate 0.1959 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:42,935-Speed 5511.43 samples/sec Loss 17.4371 LearningRate 0.1962 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 131072 
Required: 45 hours Training: 2022-01-07 20:02:50,421-Speed 5473.16 samples/sec Loss 17.3750 LearningRate 0.1965 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:02:57,878-Speed 5493.33 samples/sec Loss 17.2812 LearningRate 0.1967 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:05,470-Speed 5396.57 samples/sec Loss 17.2558 LearningRate 0.1970 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:12,876-Speed 5532.02 samples/sec Loss 17.3264 LearningRate 0.1973 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:03:20,359-Speed 5474.41 samples/sec Loss 17.3109 LearningRate 0.1976 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:27,820-Speed 5491.46 samples/sec Loss 17.2885 LearningRate 0.1979 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:35,424-Speed 5387.96 samples/sec Loss 17.2359 LearningRate 0.1982 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:42,959-Speed 5436.59 samples/sec Loss 17.1097 LearningRate 0.1985 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:50,460-Speed 5461.90 samples/sec Loss 17.0890 LearningRate 0.1988 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:03:57,938-Speed 5478.73 samples/sec Loss 17.1852 LearningRate 0.1991 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:04:05,542-Speed 5387.47 samples/sec Loss 17.1903 LearningRate 0.1993 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:04:13,136-Speed 5395.42 samples/sec Loss 17.0419 LearningRate 0.1996 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 
20:04:20,596-Speed 5492.05 samples/sec Loss 17.0394 LearningRate 0.1999 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:04:28,021-Speed 5517.17 samples/sec Loss 16.9957 LearningRate 0.2002 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:04:35,465-Speed 5503.79 samples/sec Loss 16.9898 LearningRate 0.2005 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:04:42,960-Speed 5466.26 samples/sec Loss 17.0505 LearningRate 0.2008 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:04:50,352-Speed 5542.08 samples/sec Loss 16.8923 LearningRate 0.2011 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:04:57,861-Speed 5455.49 samples/sec Loss 17.0299 LearningRate 0.2014 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:05,290-Speed 5514.40 samples/sec Loss 16.9096 LearningRate 0.2017 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:12,513-Speed 5672.02 samples/sec Loss 16.9186 LearningRate 0.2019 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:19,829-Speed 5600.53 samples/sec Loss 16.7981 LearningRate 0.2022 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:27,273-Speed 5503.69 samples/sec Loss 16.8215 LearningRate 0.2025 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:34,691-Speed 5522.37 samples/sec Loss 16.8175 LearningRate 0.2028 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:42,135-Speed 5504.06 samples/sec Loss 16.7989 LearningRate 0.2031 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:49,663-Speed 5442.08 samples/sec Loss 
16.7380 LearningRate 0.2034 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:05:57,067-Speed 5533.09 samples/sec Loss 16.7160 LearningRate 0.2037 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:04,803-Speed 5296.18 samples/sec Loss 16.7093 LearningRate 0.2040 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:06:12,201-Speed 5537.45 samples/sec Loss 16.6190 LearningRate 0.2043 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:19,869-Speed 5342.51 samples/sec Loss 16.6783 LearningRate 0.2046 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:27,298-Speed 5515.06 samples/sec Loss 16.7052 LearningRate 0.2048 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:34,730-Speed 5512.39 samples/sec Loss 16.6336 LearningRate 0.2051 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:42,257-Speed 5442.47 samples/sec Loss 16.6597 LearningRate 0.2054 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:49,877-Speed 5376.93 samples/sec Loss 16.6475 LearningRate 0.2057 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:06:57,267-Speed 5543.45 samples/sec Loss 16.4910 LearningRate 0.2060 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:04,726-Speed 5492.99 samples/sec Loss 16.5009 LearningRate 0.2063 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:11,782-Speed 5805.82 samples/sec Loss 16.4465 LearningRate 0.2066 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:18,904-Speed 5752.30 samples/sec Loss 16.5654 LearningRate 0.2069 Epoch: 0 Global 
Step: 7150 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:26,447-Speed 5431.79 samples/sec Loss 16.4001 LearningRate 0.2072 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:07:33,857-Speed 5528.79 samples/sec Loss 16.4053 LearningRate 0.2074 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:41,308-Speed 5498.47 samples/sec Loss 16.4448 LearningRate 0.2077 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:48,735-Speed 5515.68 samples/sec Loss 16.4301 LearningRate 0.2080 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:07:56,135-Speed 5536.88 samples/sec Loss 16.3543 LearningRate 0.2083 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:03,582-Speed 5501.74 samples/sec Loss 16.4164 LearningRate 0.2086 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:11,084-Speed 5461.05 samples/sec Loss 16.3323 LearningRate 0.2089 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:18,598-Speed 5452.38 samples/sec Loss 16.3451 LearningRate 0.2092 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:26,144-Speed 5429.17 samples/sec Loss 16.2984 LearningRate 0.2095 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:33,604-Speed 5492.55 samples/sec Loss 16.2480 LearningRate 0.2098 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:41,090-Speed 5472.88 samples/sec Loss 16.2566 LearningRate 0.2100 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:08:48,595-Speed 5458.32 samples/sec Loss 16.3576 LearningRate 0.2103 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 262144 
Required: 45 hours Training: 2022-01-07 20:08:56,549-Speed 5151.16 samples/sec Loss 16.2728 LearningRate 0.2106 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:09:04,077-Speed 5442.78 samples/sec Loss 16.2869 LearningRate 0.2109 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:09:11,501-Speed 5517.70 samples/sec Loss 16.2606 LearningRate 0.2112 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:09:19,059-Speed 5420.56 samples/sec Loss 16.2300 LearningRate 0.2115 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:09:26,802-Speed 5292.04 samples/sec Loss 16.1754 LearningRate 0.2118 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:09:34,422-Speed 5376.85 samples/sec Loss 16.1130 LearningRate 0.2121 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:09:41,596-Speed 5710.28 samples/sec Loss 16.2005 LearningRate 0.2124 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:09:49,189-Speed 5395.55 samples/sec Loss 16.1524 LearningRate 0.2127 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:09:56,594-Speed 5532.96 samples/sec Loss 16.0834 LearningRate 0.2129 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:10:04,149-Speed 5422.95 samples/sec Loss 16.0488 LearningRate 0.2132 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:10:11,587-Speed 5507.26 samples/sec Loss 16.0058 LearningRate 0.2135 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:10:19,010-Speed 5519.79 samples/sec Loss 16.0627 LearningRate 0.2138 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 
20:10:26,466-Speed 5494.91 samples/sec Loss 16.0630 LearningRate 0.2141 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:10:33,850-Speed 5548.77 samples/sec Loss 16.0269 LearningRate 0.2144 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:10:41,415-Speed 5414.84 samples/sec Loss 16.0382 LearningRate 0.2147 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:10:49,014-Speed 5391.83 samples/sec Loss 15.9221 LearningRate 0.2150 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:10:56,723-Speed 5315.10 samples/sec Loss 16.0316 LearningRate 0.2153 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:04,290-Speed 5413.77 samples/sec Loss 15.9327 LearningRate 0.2155 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:11,845-Speed 5422.77 samples/sec Loss 15.8868 LearningRate 0.2158 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:19,344-Speed 5463.14 samples/sec Loss 15.8266 LearningRate 0.2161 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:26,893-Speed 5427.47 samples/sec Loss 15.7827 LearningRate 0.2164 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:34,385-Speed 5468.68 samples/sec Loss 15.7641 LearningRate 0.2167 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:41,919-Speed 5437.35 samples/sec Loss 15.8835 LearningRate 0.2170 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:11:49,450-Speed 5439.96 samples/sec Loss 15.8785 LearningRate 0.2173 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:11:56,793-Speed 5580.09 samples/sec Loss 
15.7159 LearningRate 0.2176 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:04,406-Speed 5381.65 samples/sec Loss 15.8402 LearningRate 0.2179 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:11,885-Speed 5477.65 samples/sec Loss 15.7591 LearningRate 0.2182 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:19,295-Speed 5529.59 samples/sec Loss 15.7267 LearningRate 0.2184 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:26,693-Speed 5537.96 samples/sec Loss 15.8257 LearningRate 0.2187 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:34,156-Speed 5490.20 samples/sec Loss 15.6745 LearningRate 0.2190 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:41,637-Speed 5475.71 samples/sec Loss 15.7864 LearningRate 0.2193 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:49,094-Speed 5494.09 samples/sec Loss 15.7427 LearningRate 0.2196 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:12:56,551-Speed 5494.82 samples/sec Loss 15.6878 LearningRate 0.2199 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:13:03,991-Speed 5507.02 samples/sec Loss 15.6863 LearningRate 0.2202 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:13:11,416-Speed 5517.06 samples/sec Loss 15.6490 LearningRate 0.2205 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:13:18,874-Speed 5493.15 samples/sec Loss 15.6776 LearningRate 0.2208 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:13:26,439-Speed 5415.28 samples/sec Loss 15.6791 LearningRate 0.2210 Epoch: 0 Global 
Step: 7640 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:13:34,134-Speed 5324.08 samples/sec Loss 15.6689 LearningRate 0.2213 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:13:41,582-Speed 5501.28 samples/sec Loss 15.5300 LearningRate 0.2216 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:13:49,199-Speed 5377.78 samples/sec Loss 15.5036 LearningRate 0.2219 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:13:56,731-Speed 5439.24 samples/sec Loss 15.5316 LearningRate 0.2222 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:14:04,142-Speed 5527.90 samples/sec Loss 15.5402 LearningRate 0.2225 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:14:11,568-Speed 5517.20 samples/sec Loss 15.4393 LearningRate 0.2228 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:14:19,038-Speed 5484.03 samples/sec Loss 15.4199 LearningRate 0.2231 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:14:26,455-Speed 5523.46 samples/sec Loss 15.5028 LearningRate 0.2234 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:14:33,970-Speed 5452.08 samples/sec Loss 15.4933 LearningRate 0.2236 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:14:41,015-Speed 5815.72 samples/sec Loss 15.4717 LearningRate 0.2239 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:14:48,384-Speed 5559.31 samples/sec Loss 15.4054 LearningRate 0.2242 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:14:55,930-Speed 5429.04 samples/sec Loss 15.4853 LearningRate 0.2245 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 131072 
Required: 44 hours Training: 2022-01-07 20:15:03,357-Speed 5516.68 samples/sec Loss 15.3310 LearningRate 0.2248 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:10,803-Speed 5502.29 samples/sec Loss 15.4989 LearningRate 0.2251 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:18,269-Speed 5487.19 samples/sec Loss 15.3973 LearningRate 0.2254 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:25,774-Speed 5459.62 samples/sec Loss 15.4000 LearningRate 0.2257 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:33,301-Speed 5442.96 samples/sec Loss 15.3901 LearningRate 0.2260 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:40,787-Speed 5473.34 samples/sec Loss 15.3886 LearningRate 0.2263 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:48,209-Speed 5519.77 samples/sec Loss 15.3312 LearningRate 0.2265 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:15:55,658-Speed 5499.69 samples/sec Loss 15.2532 LearningRate 0.2268 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:16:03,098-Speed 5506.93 samples/sec Loss 15.3900 LearningRate 0.2271 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:16:10,562-Speed 5489.07 samples/sec Loss 15.3200 LearningRate 0.2274 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:16:18,132-Speed 5411.31 samples/sec Loss 15.2019 LearningRate 0.2277 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:16:25,686-Speed 5423.50 samples/sec Loss 15.2642 LearningRate 0.2280 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 
20:16:33,247-Speed 5418.46 samples/sec Loss 15.2952 LearningRate 0.2283 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-01-07 20:16:40,707-Speed 5492.23 samples/sec Loss 15.2963 LearningRate 0.2286 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-01-07 20:16:48,304-Speed 5392.87 samples/sec Loss 15.2850 LearningRate 0.2289 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-01-07 20:16:55,740-Speed 5509.28 samples/sec Loss 15.3123 LearningRate 0.2291 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-01-07 20:17:03,307-Speed 5413.89 samples/sec Loss 15.2716 LearningRate 0.2294 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-01-07 20:17:10,767-Speed 5491.60 samples/sec Loss 15.2446 LearningRate 0.2297 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-01-07 20:17:18,248-Speed 5476.91 samples/sec Loss 15.2185 LearningRate 0.2300 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:17:26,673-Speed 4861.99 samples/sec Loss 15.1207 LearningRate 0.2303 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:17:34,130-Speed 5493.80 samples/sec Loss 15.1824 LearningRate 0.2306 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:17:41,657-Speed 5443.52 samples/sec Loss 15.1881 LearningRate 0.2309 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:17:49,200-Speed 5431.22 samples/sec Loss 15.2180 LearningRate 0.2312 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:17:56,728-Speed 5442.08 samples/sec Loss 15.1600 LearningRate 0.2315 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:18:42,198-[lfw][8000]XNorm: 22.129730
Training: 2022-01-07 20:18:42,199-[lfw][8000]Accuracy-Flip: 0.99450+-0.00395
Training: 2022-01-07 20:18:42,200-[lfw][8000]Accuracy-Highest: 0.99450
Training: 2022-01-07 20:19:34,979-[cfp_fp][8000]XNorm: 20.460830
Training: 2022-01-07 20:19:34,980-[cfp_fp][8000]Accuracy-Flip: 0.96786+-0.01033
Training: 2022-01-07 20:19:34,981-[cfp_fp][8000]Accuracy-Highest: 0.96786
Training: 2022-01-07 20:20:20,531-[agedb_30][8000]XNorm: 21.737388
Training: 2022-01-07 20:20:20,533-[agedb_30][8000]Accuracy-Flip: 0.94783+-0.01088
Training: 2022-01-07 20:20:20,533-[agedb_30][8000]Accuracy-Highest: 0.94783
Training: 2022-01-07 20:20:28,244-Speed 270.34 samples/sec Loss 15.1302 LearningRate 0.2317 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:20:35,683-Speed 5510.00 samples/sec Loss 15.0885 LearningRate 0.2320 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:20:43,223-Speed 5433.50 samples/sec Loss 15.0030 LearningRate 0.2323 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:20:50,680-Speed 5494.63 samples/sec Loss 15.0732 LearningRate 0.2326 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:20:58,161-Speed 5476.49 samples/sec Loss 15.0429 LearningRate 0.2329 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-01-07 20:21:05,669-Speed 5456.27 samples/sec Loss 15.1155 LearningRate 0.2332 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:21:13,206-Speed 5435.36 samples/sec Loss 15.0025 LearningRate 0.2335 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:21:20,363-Speed 5724.83 samples/sec Loss 15.1174 LearningRate 0.2338 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 20:21:27,484-Speed 5752.59 samples/sec Loss 15.0260 LearningRate 0.2341 Epoch: 0
Global Step: 8090 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:21:35,072-Speed 5399.07 samples/sec Loss 15.0343 LearningRate 0.2344 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:21:42,675-Speed 5388.77 samples/sec Loss 15.0548 LearningRate 0.2346 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:21:50,195-Speed 5448.95 samples/sec Loss 15.0523 LearningRate 0.2349 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:21:57,705-Speed 5455.30 samples/sec Loss 14.9050 LearningRate 0.2352 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:22:05,239-Speed 5438.39 samples/sec Loss 14.9479 LearningRate 0.2355 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:22:12,697-Speed 5492.78 samples/sec Loss 14.9536 LearningRate 0.2358 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:22:20,190-Speed 5467.46 samples/sec Loss 14.8065 LearningRate 0.2361 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:22:27,724-Speed 5437.54 samples/sec Loss 14.9941 LearningRate 0.2364 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:22:35,196-Speed 5483.15 samples/sec Loss 14.9332 LearningRate 0.2367 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:22:42,616-Speed 5521.59 samples/sec Loss 14.9612 LearningRate 0.2370 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:22:49,700-Speed 5783.23 samples/sec Loss 14.8972 LearningRate 0.2372 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:22:56,863-Speed 5719.41 samples/sec Loss 14.9157 LearningRate 0.2375 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 32768 
Required: 45 hours Training: 2022-01-07 20:23:03,958-Speed 5774.11 samples/sec Loss 14.8167 LearningRate 0.2378 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:23:10,956-Speed 5854.80 samples/sec Loss 14.8535 LearningRate 0.2381 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:23:17,923-Speed 5879.74 samples/sec Loss 14.8961 LearningRate 0.2384 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:23:25,150-Speed 5668.79 samples/sec Loss 14.8553 LearningRate 0.2387 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:23:32,180-Speed 5828.02 samples/sec Loss 14.9242 LearningRate 0.2390 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:23:39,202-Speed 5834.29 samples/sec Loss 14.8096 LearningRate 0.2393 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:23:46,447-Speed 5655.13 samples/sec Loss 14.8255 LearningRate 0.2396 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:23:53,993-Speed 5429.05 samples/sec Loss 14.9068 LearningRate 0.2398 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:01,557-Speed 5416.62 samples/sec Loss 14.7583 LearningRate 0.2401 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:09,104-Speed 5428.73 samples/sec Loss 14.7785 LearningRate 0.2404 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:16,621-Speed 5450.17 samples/sec Loss 14.6930 LearningRate 0.2407 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:24,298-Speed 5336.09 samples/sec Loss 14.7619 LearningRate 0.2410 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 
20:24:31,707-Speed 5529.37 samples/sec Loss 14.8096 LearningRate 0.2413 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:39,207-Speed 5462.71 samples/sec Loss 14.7541 LearningRate 0.2416 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:46,675-Speed 5485.44 samples/sec Loss 14.7991 LearningRate 0.2419 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:24:54,095-Speed 5520.46 samples/sec Loss 14.6666 LearningRate 0.2422 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:25:01,552-Speed 5493.86 samples/sec Loss 14.7610 LearningRate 0.2425 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:09,102-Speed 5425.92 samples/sec Loss 14.6799 LearningRate 0.2427 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:16,590-Speed 5470.92 samples/sec Loss 14.7241 LearningRate 0.2430 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:24,019-Speed 5514.01 samples/sec Loss 14.6770 LearningRate 0.2433 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:31,485-Speed 5487.50 samples/sec Loss 14.6275 LearningRate 0.2436 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:38,975-Speed 5469.55 samples/sec Loss 14.6758 LearningRate 0.2439 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:46,514-Speed 5433.52 samples/sec Loss 14.6971 LearningRate 0.2442 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:25:53,962-Speed 5500.09 samples/sec Loss 14.7048 LearningRate 0.2445 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:01,407-Speed 5502.54 samples/sec Loss 
14.5848 LearningRate 0.2448 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:08,932-Speed 5443.81 samples/sec Loss 14.6470 LearningRate 0.2451 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:16,425-Speed 5467.59 samples/sec Loss 14.6031 LearningRate 0.2453 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:26:23,979-Speed 5422.46 samples/sec Loss 14.7359 LearningRate 0.2456 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:31,536-Speed 5421.23 samples/sec Loss 14.5745 LearningRate 0.2459 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:39,089-Speed 5424.22 samples/sec Loss 14.6010 LearningRate 0.2462 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:46,593-Speed 5459.25 samples/sec Loss 14.5508 LearningRate 0.2465 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:26:54,181-Speed 5398.33 samples/sec Loss 14.6168 LearningRate 0.2468 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:01,678-Speed 5464.72 samples/sec Loss 14.5784 LearningRate 0.2471 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:09,236-Speed 5420.06 samples/sec Loss 14.5925 LearningRate 0.2474 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:16,821-Speed 5400.63 samples/sec Loss 14.5534 LearningRate 0.2477 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:24,235-Speed 5525.45 samples/sec Loss 14.6015 LearningRate 0.2480 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:31,681-Speed 5501.99 samples/sec Loss 14.5618 LearningRate 0.2482 Epoch: 0 Global 
Step: 8580 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:39,148-Speed 5485.62 samples/sec Loss 14.5738 LearningRate 0.2485 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:27:46,552-Speed 5532.78 samples/sec Loss 14.4994 LearningRate 0.2488 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:27:54,048-Speed 5465.59 samples/sec Loss 14.4922 LearningRate 0.2491 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:01,551-Speed 5459.85 samples/sec Loss 14.5864 LearningRate 0.2494 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:09,064-Speed 5452.81 samples/sec Loss 14.5305 LearningRate 0.2497 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:16,677-Speed 5380.64 samples/sec Loss 14.5211 LearningRate 0.2500 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:24,250-Speed 5409.91 samples/sec Loss 14.5503 LearningRate 0.2503 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:31,759-Speed 5455.09 samples/sec Loss 14.5495 LearningRate 0.2506 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:39,308-Speed 5426.88 samples/sec Loss 14.4733 LearningRate 0.2508 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:46,767-Speed 5491.78 samples/sec Loss 14.5276 LearningRate 0.2511 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:28:54,220-Speed 5496.76 samples/sec Loss 14.4284 LearningRate 0.2514 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:01,729-Speed 5455.42 samples/sec Loss 14.4725 LearningRate 0.2517 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 262144 
Required: 45 hours Training: 2022-01-07 20:29:09,262-Speed 5438.03 samples/sec Loss 14.4623 LearningRate 0.2520 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:16,712-Speed 5499.22 samples/sec Loss 14.4117 LearningRate 0.2523 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:24,244-Speed 5438.88 samples/sec Loss 14.3941 LearningRate 0.2526 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:31,771-Speed 5442.69 samples/sec Loss 14.4603 LearningRate 0.2529 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:39,250-Speed 5477.16 samples/sec Loss 14.4350 LearningRate 0.2532 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:46,693-Speed 5504.09 samples/sec Loss 14.3575 LearningRate 0.2534 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:29:54,142-Speed 5499.16 samples/sec Loss 14.4979 LearningRate 0.2537 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:30:01,691-Speed 5426.73 samples/sec Loss 14.4839 LearningRate 0.2540 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:30:09,219-Speed 5441.93 samples/sec Loss 14.4145 LearningRate 0.2543 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:30:16,721-Speed 5460.39 samples/sec Loss 14.3597 LearningRate 0.2546 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:30:24,206-Speed 5473.44 samples/sec Loss 14.4121 LearningRate 0.2549 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:30:31,867-Speed 5346.75 samples/sec Loss 14.3764 LearningRate 0.2552 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 
20:30:39,427-Speed 5419.20 samples/sec Loss 14.2898 LearningRate 0.2555 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:30:46,922-Speed 5465.45 samples/sec Loss 14.3456 LearningRate 0.2558 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:30:54,383-Speed 5490.41 samples/sec Loss 14.4902 LearningRate 0.2561 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:01,930-Speed 5428.51 samples/sec Loss 14.3578 LearningRate 0.2563 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:09,433-Speed 5459.00 samples/sec Loss 14.3215 LearningRate 0.2566 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:16,952-Speed 5448.81 samples/sec Loss 14.3375 LearningRate 0.2569 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:24,432-Speed 5476.21 samples/sec Loss 14.3980 LearningRate 0.2572 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:31,952-Speed 5447.54 samples/sec Loss 14.2738 LearningRate 0.2575 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:39,402-Speed 5498.61 samples/sec Loss 14.3431 LearningRate 0.2578 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:46,916-Speed 5451.97 samples/sec Loss 14.2911 LearningRate 0.2581 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:31:54,528-Speed 5381.86 samples/sec Loss 14.4411 LearningRate 0.2584 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:32:02,104-Speed 5407.23 samples/sec Loss 14.2622 LearningRate 0.2587 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:32:09,564-Speed 5491.21 samples/sec Loss 
14.1865 LearningRate 0.2589 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:32:17,160-Speed 5393.25 samples/sec Loss 14.3095 LearningRate 0.2592 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:32:24,577-Speed 5523.46 samples/sec Loss 14.3218 LearningRate 0.2595 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:32:31,977-Speed 5535.48 samples/sec Loss 14.2826 LearningRate 0.2598 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:32:39,461-Speed 5473.69 samples/sec Loss 14.2258 LearningRate 0.2601 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:32:46,876-Speed 5524.93 samples/sec Loss 14.2779 LearningRate 0.2604 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:32:54,374-Speed 5463.63 samples/sec Loss 14.1928 LearningRate 0.2607 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:01,905-Speed 5439.56 samples/sec Loss 14.2994 LearningRate 0.2610 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:09,389-Speed 5473.12 samples/sec Loss 14.2448 LearningRate 0.2613 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:16,875-Speed 5472.97 samples/sec Loss 14.1702 LearningRate 0.2615 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:24,305-Speed 5513.55 samples/sec Loss 14.2905 LearningRate 0.2618 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:31,783-Speed 5477.81 samples/sec Loss 14.2198 LearningRate 0.2621 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:39,282-Speed 5462.79 samples/sec Loss 14.2746 LearningRate 0.2624 Epoch: 0 Global Step: 
9070 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:33:46,777-Speed 5466.15 samples/sec Loss 14.2273 LearningRate 0.2627 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:33:54,188-Speed 5528.20 samples/sec Loss 14.1881 LearningRate 0.2630 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:01,683-Speed 5465.87 samples/sec Loss 14.2516 LearningRate 0.2633 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:09,227-Speed 5430.06 samples/sec Loss 14.2326 LearningRate 0.2636 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:16,624-Speed 5538.34 samples/sec Loss 14.2858 LearningRate 0.2639 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:24,043-Speed 5521.58 samples/sec Loss 14.2020 LearningRate 0.2642 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:31,626-Speed 5402.13 samples/sec Loss 14.2066 LearningRate 0.2644 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:39,084-Speed 5493.03 samples/sec Loss 14.2058 LearningRate 0.2647 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:46,540-Speed 5493.95 samples/sec Loss 14.2304 LearningRate 0.2650 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:34:54,103-Speed 5416.75 samples/sec Loss 14.1849 LearningRate 0.2653 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:35:01,605-Speed 5460.75 samples/sec Loss 14.1222 LearningRate 0.2656 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:35:09,107-Speed 5460.49 samples/sec Loss 14.1990 LearningRate 0.2659 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 45 
hours Training: 2022-01-07 20:35:16,737-Speed 5368.78 samples/sec Loss 14.1229 LearningRate 0.2662 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:35:24,269-Speed 5439.05 samples/sec Loss 14.0223 LearningRate 0.2665 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:35:31,715-Speed 5501.87 samples/sec Loss 14.1704 LearningRate 0.2668 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:35:39,214-Speed 5463.10 samples/sec Loss 14.2064 LearningRate 0.2670 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:35:46,768-Speed 5423.09 samples/sec Loss 14.1549 LearningRate 0.2673 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:35:54,321-Speed 5423.12 samples/sec Loss 14.1625 LearningRate 0.2676 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:36:01,752-Speed 5513.52 samples/sec Loss 14.1597 LearningRate 0.2679 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:36:09,209-Speed 5493.35 samples/sec Loss 14.0770 LearningRate 0.2682 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:36:16,615-Speed 5531.57 samples/sec Loss 14.0218 LearningRate 0.2685 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:36:24,142-Speed 5442.13 samples/sec Loss 14.1522 LearningRate 0.2688 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:36:31,595-Speed 5496.60 samples/sec Loss 14.1133 LearningRate 0.2691 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:36:39,064-Speed 5485.53 samples/sec Loss 14.1543 LearningRate 0.2694 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 
20:36:46,525-Speed 5489.92 samples/sec Loss 14.2882 LearningRate 0.2696 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:36:53,976-Speed 5498.27 samples/sec Loss 14.1360 LearningRate 0.2699 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:37:01,497-Speed 5446.93 samples/sec Loss 14.1478 LearningRate 0.2702 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:37:08,927-Speed 5514.06 samples/sec Loss 14.1237 LearningRate 0.2705 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:37:16,365-Speed 5506.99 samples/sec Loss 14.2116 LearningRate 0.2708 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:37:23,823-Speed 5492.89 samples/sec Loss 14.1346 LearningRate 0.2711 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:37:31,306-Speed 5474.64 samples/sec Loss 14.1033 LearningRate 0.2714 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:37:38,751-Speed 5502.50 samples/sec Loss 14.0806 LearningRate 0.2717 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:37:46,247-Speed 5465.03 samples/sec Loss 14.0646 LearningRate 0.2720 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:37:53,738-Speed 5468.28 samples/sec Loss 14.1265 LearningRate 0.2723 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:38:01,187-Speed 5500.42 samples/sec Loss 14.0264 LearningRate 0.2725 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:38:08,627-Speed 5505.85 samples/sec Loss 14.0642 LearningRate 0.2728 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:38:16,083-Speed 5494.25 samples/sec Loss 
14.0569 LearningRate 0.2731 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:38:23,506-Speed 5518.40 samples/sec Loss 14.0781 LearningRate 0.2734 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:38:30,954-Speed 5500.30 samples/sec Loss 14.0731 LearningRate 0.2737 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:38:38,605-Speed 5354.69 samples/sec Loss 14.0206 LearningRate 0.2740 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:38:46,041-Speed 5509.26 samples/sec Loss 14.0869 LearningRate 0.2743 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:38:53,482-Speed 5504.66 samples/sec Loss 14.0266 LearningRate 0.2746 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:39:01,032-Speed 5426.19 samples/sec Loss 14.0427 LearningRate 0.2749 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:39:08,487-Speed 5495.46 samples/sec Loss 13.9802 LearningRate 0.2751 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:39:15,981-Speed 5466.23 samples/sec Loss 13.9596 LearningRate 0.2754 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:39:23,509-Speed 5441.97 samples/sec Loss 14.1009 LearningRate 0.2757 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:39:30,924-Speed 5524.46 samples/sec Loss 13.9731 LearningRate 0.2760 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:39:38,362-Speed 5507.73 samples/sec Loss 13.9817 LearningRate 0.2763 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:39:45,801-Speed 5506.77 samples/sec Loss 13.9634 LearningRate 0.2766 Epoch: 0 Global 
Step: 9560 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:39:53,413-Speed 5381.46 samples/sec Loss 14.1002 LearningRate 0.2769 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:40:00,878-Speed 5488.57 samples/sec Loss 13.9308 LearningRate 0.2772 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:40:08,292-Speed 5524.96 samples/sec Loss 14.4544 LearningRate 0.2775 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:40:15,712-Speed 5521.61 samples/sec Loss 14.4927 LearningRate 0.2778 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:40:23,211-Speed 5462.05 samples/sec Loss 14.2164 LearningRate 0.2780 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:40:30,807-Speed 5393.32 samples/sec Loss 14.0396 LearningRate 0.2783 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:40:38,233-Speed 5516.38 samples/sec Loss 14.0909 LearningRate 0.2786 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:40:45,684-Speed 5498.16 samples/sec Loss 13.9813 LearningRate 0.2789 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:40:53,153-Speed 5485.05 samples/sec Loss 13.9863 LearningRate 0.2792 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:41:00,604-Speed 5497.55 samples/sec Loss 14.0429 LearningRate 0.2795 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:41:08,028-Speed 5518.75 samples/sec Loss 14.0537 LearningRate 0.2798 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:41:15,445-Speed 5523.23 samples/sec Loss 13.9689 LearningRate 0.2801 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-01-07 20:41:22,855-Speed 5527.71 samples/sec Loss 13.9905 LearningRate 0.2804 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 20:41:30,309-Speed 5496.28 samples/sec Loss 14.0016 LearningRate 0.2806 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:41:37,810-Speed 5461.23 samples/sec Loss 13.9536 LearningRate 0.2809 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:41:45,278-Speed 5485.69 samples/sec Loss 13.9510 LearningRate 0.2812 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:41:52,700-Speed 5519.74 samples/sec Loss 13.9670 LearningRate 0.2815 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:00,169-Speed 5484.86 samples/sec Loss 13.9725 LearningRate 0.2818 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:07,614-Speed 5502.39 samples/sec Loss 13.9324 LearningRate 0.2821 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:15,045-Speed 5513.36 samples/sec Loss 13.9971 LearningRate 0.2824 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:22,485-Speed 5505.99 samples/sec Loss 14.0568 LearningRate 0.2827 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:29,926-Speed 5505.34 samples/sec Loss 14.0047 LearningRate 0.2830 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:37,426-Speed 5462.42 samples/sec Loss 13.9211 LearningRate 0.2832 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 20:42:44,867-Speed 5505.02 samples/sec Loss 13.9178 LearningRate 0.2835 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:42:52,328-Speed 5490.98 
samples/sec Loss 13.9492 LearningRate 0.2838 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:42:59,759-Speed 5512.38 samples/sec Loss 13.9100 LearningRate 0.2841 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:07,316-Speed 5421.14 samples/sec Loss 14.0000 LearningRate 0.2844 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:14,740-Speed 5518.59 samples/sec Loss 13.8934 LearningRate 0.2847 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:22,169-Speed 5513.89 samples/sec Loss 13.9084 LearningRate 0.2850 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:29,600-Speed 5512.73 samples/sec Loss 13.8353 LearningRate 0.2853 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:37,054-Speed 5496.17 samples/sec Loss 13.9456 LearningRate 0.2856 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:44,514-Speed 5491.07 samples/sec Loss 13.9694 LearningRate 0.2859 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:51,992-Speed 5478.44 samples/sec Loss 13.9083 LearningRate 0.2861 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:43:59,426-Speed 5510.45 samples/sec Loss 13.9515 LearningRate 0.2864 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:44:06,871-Speed 5502.48 samples/sec Loss 13.9572 LearningRate 0.2867 Epoch: 0 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:44:14,348-Speed 5478.70 samples/sec Loss 13.9070 LearningRate 0.2870 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:44:21,771-Speed 5518.68 samples/sec Loss 13.8922 LearningRate 
0.2873 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:44:29,250-Speed 5477.61 samples/sec Loss 13.8573 LearningRate 0.2876 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:44:36,742-Speed 5467.43 samples/sec Loss 13.9540 LearningRate 0.2879 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:44:44,199-Speed 5493.95 samples/sec Loss 13.8472 LearningRate 0.2882 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:44:51,635-Speed 5508.90 samples/sec Loss 13.9780 LearningRate 0.2885 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:44:59,020-Speed 5547.70 samples/sec Loss 13.8697 LearningRate 0.2887 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:45:06,473-Speed 5496.36 samples/sec Loss 13.9913 LearningRate 0.2890 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 20:45:13,926-Speed 5496.59 samples/sec Loss 13.9089 LearningRate 0.2893 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 262144 Required: 44 hours
Training: 2022-01-07 20:45:58,511-[lfw][10000]XNorm: 23.119454
Training: 2022-01-07 20:45:58,512-[lfw][10000]Accuracy-Flip: 0.99567+-0.00327
Training: 2022-01-07 20:45:58,513-[lfw][10000]Accuracy-Highest: 0.99567
Training: 2022-01-07 20:46:51,699-[cfp_fp][10000]XNorm: 21.330772
Training: 2022-01-07 20:46:51,700-[cfp_fp][10000]Accuracy-Flip: 0.97143+-0.00740
Training: 2022-01-07 20:46:51,701-[cfp_fp][10000]Accuracy-Highest: 0.97143
Training: 2022-01-07 20:47:37,423-[agedb_30][10000]XNorm: 23.151996
Training: 2022-01-07 20:47:37,424-[agedb_30][10000]Accuracy-Flip: 0.95500+-0.00869
Training: 2022-01-07 20:47:37,425-[agedb_30][10000]Accuracy-Highest: 0.95500
Training: 2022-01-07 20:47:45,028-Speed 271.08 samples/sec Loss 13.8010 LearningRate 0.2896 Epoch: 0 Global Step: 
10010 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:47:52,651-Speed 5374.29 samples/sec Loss 13.9753 LearningRate 0.2899 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:00,252-Speed 5390.08 samples/sec Loss 13.9044 LearningRate 0.2902 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:07,725-Speed 5482.56 samples/sec Loss 13.8992 LearningRate 0.2905 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:15,164-Speed 5507.72 samples/sec Loss 13.8962 LearningRate 0.2908 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:22,674-Speed 5455.18 samples/sec Loss 13.8347 LearningRate 0.2911 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:30,257-Speed 5403.37 samples/sec Loss 13.8300 LearningRate 0.2913 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:37,852-Speed 5394.21 samples/sec Loss 13.8547 LearningRate 0.2916 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:45,290-Speed 5507.43 samples/sec Loss 13.9728 LearningRate 0.2919 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:48:52,816-Speed 5444.12 samples/sec Loss 13.9322 LearningRate 0.2922 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:00,484-Speed 5342.66 samples/sec Loss 13.8914 LearningRate 0.2925 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:07,753-Speed 5636.14 samples/sec Loss 13.9653 LearningRate 0.2928 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:15,263-Speed 5455.07 samples/sec Loss 13.9788 LearningRate 0.2931 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 131072 
Required: 45 hours Training: 2022-01-07 20:49:22,716-Speed 5497.09 samples/sec Loss 13.8336 LearningRate 0.2934 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:30,280-Speed 5416.80 samples/sec Loss 13.9003 LearningRate 0.2937 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:37,809-Speed 5440.91 samples/sec Loss 13.9053 LearningRate 0.2940 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:45,234-Speed 5518.29 samples/sec Loss 13.8760 LearningRate 0.2942 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:49:52,701-Speed 5486.60 samples/sec Loss 13.8570 LearningRate 0.2945 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:50:00,044-Speed 5579.55 samples/sec Loss 13.9280 LearningRate 0.2948 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:50:07,494-Speed 5499.27 samples/sec Loss 13.8596 LearningRate 0.2951 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:50:14,906-Speed 5528.11 samples/sec Loss 13.8332 LearningRate 0.2954 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:50:22,586-Speed 5334.23 samples/sec Loss 13.8616 LearningRate 0.2957 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:50:30,107-Speed 5447.23 samples/sec Loss 13.8835 LearningRate 0.2960 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:50:37,193-Speed 5781.84 samples/sec Loss 13.7577 LearningRate 0.2963 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:50:44,518-Speed 5592.89 samples/sec Loss 13.8189 LearningRate 0.2966 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 262144 Required: 45 hours Training: 
2022-01-07 20:50:52,057-Speed 5434.15 samples/sec Loss 13.7958 LearningRate 0.2968 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:50:59,221-Speed 5719.24 samples/sec Loss 13.7974 LearningRate 0.2971 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:06,792-Speed 5411.02 samples/sec Loss 13.9565 LearningRate 0.2974 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:14,549-Speed 5281.15 samples/sec Loss 13.7485 LearningRate 0.2977 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:22,206-Speed 5350.67 samples/sec Loss 13.8137 LearningRate 0.2980 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:29,789-Speed 5403.05 samples/sec Loss 13.8153 LearningRate 0.2983 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:37,344-Speed 5422.51 samples/sec Loss 13.7642 LearningRate 0.2986 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:45,027-Speed 5332.59 samples/sec Loss 13.8676 LearningRate 0.2989 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:51:52,727-Speed 5319.98 samples/sec Loss 13.8651 LearningRate 0.2992 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:52:00,269-Speed 5432.38 samples/sec Loss 13.8303 LearningRate 0.2995 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:52:07,783-Speed 5452.27 samples/sec Loss 13.8039 LearningRate 0.2997 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:52:15,342-Speed 5420.19 samples/sec Loss 13.8142 LearningRate 0.3000 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:52:38,276-Speed 
1786.27 samples/sec Loss 13.7174 LearningRate 0.3000 Epoch: 1 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:52:45,669-Speed 5541.46 samples/sec Loss 13.7615 LearningRate 0.2999 Epoch: 1 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:52:53,047-Speed 5552.55 samples/sec Loss 13.7743 LearningRate 0.2999 Epoch: 1 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:53:00,468-Speed 5521.14 samples/sec Loss 13.8331 LearningRate 0.2999 Epoch: 1 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:53:07,958-Speed 5470.15 samples/sec Loss 13.8493 LearningRate 0.2998 Epoch: 1 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:53:15,362-Speed 5534.05 samples/sec Loss 13.8542 LearningRate 0.2998 Epoch: 1 Global Step: 10430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:53:22,921-Speed 5419.70 samples/sec Loss 13.7681 LearningRate 0.2998 Epoch: 1 Global Step: 10440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:53:30,398-Speed 5479.17 samples/sec Loss 13.7879 LearningRate 0.2998 Epoch: 1 Global Step: 10450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:53:37,791-Speed 5542.09 samples/sec Loss 13.8541 LearningRate 0.2997 Epoch: 1 Global Step: 10460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:53:45,397-Speed 5385.80 samples/sec Loss 13.7615 LearningRate 0.2997 Epoch: 1 Global Step: 10470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:53:52,800-Speed 5534.17 samples/sec Loss 13.8087 LearningRate 0.2997 Epoch: 1 Global Step: 10480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:53:59,883-Speed 5784.60 samples/sec Loss 13.7360 LearningRate 0.2996 Epoch: 1 Global Step: 10490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:54:07,167-Speed 5624.73 samples/sec Loss 13.8070 
LearningRate 0.2996 Epoch: 1 Global Step: 10500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:54:14,398-Speed 5666.58 samples/sec Loss 13.7994 LearningRate 0.2996 Epoch: 1 Global Step: 10510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:54:21,678-Speed 5627.51 samples/sec Loss 13.7025 LearningRate 0.2995 Epoch: 1 Global Step: 10520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-01-07 20:54:29,112-Speed 5511.26 samples/sec Loss 13.7201 LearningRate 0.2995 Epoch: 1 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:54:36,314-Speed 5688.94 samples/sec Loss 13.7639 LearningRate 0.2995 Epoch: 1 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:54:43,544-Speed 5667.15 samples/sec Loss 13.7682 LearningRate 0.2994 Epoch: 1 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:54:51,058-Speed 5451.98 samples/sec Loss 13.7215 LearningRate 0.2994 Epoch: 1 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:54:58,525-Speed 5486.55 samples/sec Loss 13.7943 LearningRate 0.2994 Epoch: 1 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:55:06,031-Speed 5458.12 samples/sec Loss 13.7260 LearningRate 0.2994 Epoch: 1 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:55:13,455-Speed 5518.84 samples/sec Loss 13.6908 LearningRate 0.2993 Epoch: 1 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:55:20,609-Speed 5726.39 samples/sec Loss 13.7780 LearningRate 0.2993 Epoch: 1 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:55:27,585-Speed 5873.12 samples/sec Loss 13.7498 LearningRate 0.2993 Epoch: 1 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:55:34,932-Speed 5575.88 samples/sec Loss 13.7494 LearningRate 0.2992 Epoch: 1 Global Step: 
10620 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-01-07 20:55:42,391-Speed 5491.92 samples/sec Loss 13.7498 LearningRate 0.2992 Epoch: 1 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:55:49,816-Speed 5518.23 samples/sec Loss 13.7834 LearningRate 0.2992 Epoch: 1 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:55:57,109-Speed 5617.25 samples/sec Loss 13.7155 LearningRate 0.2991 Epoch: 1 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:04,247-Speed 5739.62 samples/sec Loss 13.7926 LearningRate 0.2991 Epoch: 1 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:11,639-Speed 5541.97 samples/sec Loss 13.6340 LearningRate 0.2991 Epoch: 1 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:19,054-Speed 5525.29 samples/sec Loss 13.6930 LearningRate 0.2991 Epoch: 1 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:26,405-Speed 5573.65 samples/sec Loss 13.7176 LearningRate 0.2990 Epoch: 1 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:33,868-Speed 5489.23 samples/sec Loss 13.6654 LearningRate 0.2990 Epoch: 1 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:41,217-Speed 5574.99 samples/sec Loss 13.6777 LearningRate 0.2990 Epoch: 1 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:48,286-Speed 5795.69 samples/sec Loss 13.6046 LearningRate 0.2989 Epoch: 1 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:56:55,897-Speed 5382.72 samples/sec Loss 13.6660 LearningRate 0.2989 Epoch: 1 Global Step: 10730 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 20:57:03,369-Speed 5482.65 samples/sec Loss 13.6465 LearningRate 0.2989 Epoch: 1 Global Step: 10740 Fp16 Grad Scale: 131072 
Required: 45 hours Training: 2022-01-07 20:57:10,897-Speed 5443.42 samples/sec Loss 13.6251 LearningRate 0.2988 Epoch: 1 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:57:18,600-Speed 5318.62 samples/sec Loss 13.6724 LearningRate 0.2988 Epoch: 1 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 20:57:26,196-Speed 5393.99 samples/sec Loss 13.6349 LearningRate 0.2988 Epoch: 1 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:57:33,747-Speed 5425.23 samples/sec Loss 13.6547 LearningRate 0.2987 Epoch: 1 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:57:41,207-Speed 5492.38 samples/sec Loss 13.6461 LearningRate 0.2987 Epoch: 1 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:57:48,609-Speed 5534.89 samples/sec Loss 13.6551 LearningRate 0.2987 Epoch: 1 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:57:56,029-Speed 5521.45 samples/sec Loss 13.6284 LearningRate 0.2987 Epoch: 1 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:58:03,567-Speed 5434.96 samples/sec Loss 13.5989 LearningRate 0.2986 Epoch: 1 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:58:11,028-Speed 5491.28 samples/sec Loss 13.6195 LearningRate 0.2986 Epoch: 1 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:58:18,491-Speed 5489.69 samples/sec Loss 13.5961 LearningRate 0.2986 Epoch: 1 Global Step: 10840 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:58:25,898-Speed 5533.22 samples/sec Loss 13.5596 LearningRate 0.2985 Epoch: 1 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:58:33,306-Speed 5530.26 samples/sec Loss 13.6033 LearningRate 0.2985 Epoch: 1 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 
2022-01-07 20:58:40,725-Speed 5522.13 samples/sec Loss 13.5722 LearningRate 0.2985 Epoch: 1 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:58:48,190-Speed 5488.17 samples/sec Loss 13.6217 LearningRate 0.2984 Epoch: 1 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:58:55,703-Speed 5453.44 samples/sec Loss 13.5656 LearningRate 0.2984 Epoch: 1 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:59:03,211-Speed 5455.65 samples/sec Loss 13.5524 LearningRate 0.2984 Epoch: 1 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:59:10,721-Speed 5455.58 samples/sec Loss 13.5366 LearningRate 0.2984 Epoch: 1 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:59:18,233-Speed 5453.04 samples/sec Loss 13.5964 LearningRate 0.2983 Epoch: 1 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:59:25,726-Speed 5467.64 samples/sec Loss 13.6064 LearningRate 0.2983 Epoch: 1 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:59:33,343-Speed 5377.65 samples/sec Loss 13.5003 LearningRate 0.2983 Epoch: 1 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 20:59:40,802-Speed 5492.44 samples/sec Loss 13.5403 LearningRate 0.2982 Epoch: 1 Global Step: 10950 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:59:48,264-Speed 5490.38 samples/sec Loss 13.5805 LearningRate 0.2982 Epoch: 1 Global Step: 10960 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 20:59:55,886-Speed 5374.49 samples/sec Loss 13.5662 LearningRate 0.2982 Epoch: 1 Global Step: 10970 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:03,482-Speed 5392.75 samples/sec Loss 13.6030 LearningRate 0.2981 Epoch: 1 Global Step: 10980 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:11,086-Speed 
5387.04 samples/sec Loss 13.5207 LearningRate 0.2981 Epoch: 1 Global Step: 10990 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:18,674-Speed 5399.11 samples/sec Loss 13.6135 LearningRate 0.2981 Epoch: 1 Global Step: 11000 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:26,309-Speed 5365.29 samples/sec Loss 13.4981 LearningRate 0.2981 Epoch: 1 Global Step: 11010 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:33,974-Speed 5344.60 samples/sec Loss 13.4796 LearningRate 0.2980 Epoch: 1 Global Step: 11020 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:41,594-Speed 5376.15 samples/sec Loss 13.5199 LearningRate 0.2980 Epoch: 1 Global Step: 11030 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:49,177-Speed 5402.71 samples/sec Loss 13.4643 LearningRate 0.2980 Epoch: 1 Global Step: 11040 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:00:56,935-Speed 5280.49 samples/sec Loss 13.3979 LearningRate 0.2979 Epoch: 1 Global Step: 11050 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:01:04,610-Speed 5336.99 samples/sec Loss 13.4865 LearningRate 0.2979 Epoch: 1 Global Step: 11060 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:01:12,399-Speed 5259.37 samples/sec Loss 13.5801 LearningRate 0.2979 Epoch: 1 Global Step: 11070 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:01:20,089-Speed 5326.90 samples/sec Loss 13.4871 LearningRate 0.2978 Epoch: 1 Global Step: 11080 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:01:27,856-Speed 5274.62 samples/sec Loss 13.4998 LearningRate 0.2978 Epoch: 1 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:01:35,550-Speed 5324.48 samples/sec Loss 13.5059 LearningRate 0.2978 Epoch: 1 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:01:43,251-Speed 5319.45 samples/sec Loss 
13.5007 LearningRate 0.2977 Epoch: 1 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:01:50,941-Speed 5326.57 samples/sec Loss 13.4613 LearningRate 0.2977 Epoch: 1 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:01:58,620-Speed 5334.81 samples/sec Loss 13.4809 LearningRate 0.2977 Epoch: 1 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:02:06,330-Speed 5313.56 samples/sec Loss 13.4768 LearningRate 0.2977 Epoch: 1 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:02:14,044-Speed 5310.57 samples/sec Loss 13.4295 LearningRate 0.2976 Epoch: 1 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:02:21,693-Speed 5355.67 samples/sec Loss 13.4623 LearningRate 0.2976 Epoch: 1 Global Step: 11160 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:02:29,375-Speed 5332.03 samples/sec Loss 13.4085 LearningRate 0.2976 Epoch: 1 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:02:37,162-Speed 5261.03 samples/sec Loss 13.6190 LearningRate 0.2975 Epoch: 1 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:02:44,935-Speed 5270.11 samples/sec Loss 13.4382 LearningRate 0.2975 Epoch: 1 Global Step: 11190 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:02:52,683-Speed 5287.12 samples/sec Loss 13.4068 LearningRate 0.2975 Epoch: 1 Global Step: 11200 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:00,406-Speed 5304.45 samples/sec Loss 13.4025 LearningRate 0.2974 Epoch: 1 Global Step: 11210 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:08,176-Speed 5272.83 samples/sec Loss 13.4626 LearningRate 0.2974 Epoch: 1 Global Step: 11220 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:15,888-Speed 5311.71 samples/sec Loss 13.4027 LearningRate 0.2974 
Epoch: 1 Global Step: 11230 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:23,669-Speed 5264.89 samples/sec Loss 13.4063 LearningRate 0.2974 Epoch: 1 Global Step: 11240 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:31,414-Speed 5288.94 samples/sec Loss 13.4272 LearningRate 0.2973 Epoch: 1 Global Step: 11250 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:39,162-Speed 5287.31 samples/sec Loss 13.4636 LearningRate 0.2973 Epoch: 1 Global Step: 11260 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:46,897-Speed 5296.58 samples/sec Loss 13.3905 LearningRate 0.2973 Epoch: 1 Global Step: 11270 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:03:54,624-Speed 5301.19 samples/sec Loss 13.3754 LearningRate 0.2972 Epoch: 1 Global Step: 11280 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:02,377-Speed 5284.07 samples/sec Loss 13.3913 LearningRate 0.2972 Epoch: 1 Global Step: 11290 Fp16 Grad Scale: 524288 Required: 44 hours Training: 2022-01-07 21:04:10,143-Speed 5274.60 samples/sec Loss 13.4145 LearningRate 0.2972 Epoch: 1 Global Step: 11300 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:17,989-Speed 5221.49 samples/sec Loss 13.3946 LearningRate 0.2971 Epoch: 1 Global Step: 11310 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:25,762-Speed 5269.73 samples/sec Loss 13.4338 LearningRate 0.2971 Epoch: 1 Global Step: 11320 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:33,553-Speed 5258.46 samples/sec Loss 13.3936 LearningRate 0.2971 Epoch: 1 Global Step: 11330 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:41,332-Speed 5266.39 samples/sec Loss 13.3635 LearningRate 0.2971 Epoch: 1 Global Step: 11340 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:49,121-Speed 5258.81 samples/sec Loss 13.4264 LearningRate 0.2970 Epoch: 1 Global Step: 11350 
Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:04:56,912-Speed 5258.10 samples/sec Loss 13.3395 LearningRate 0.2970 Epoch: 1 Global Step: 11360 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:05:04,706-Speed 5256.00 samples/sec Loss 13.3704 LearningRate 0.2970 Epoch: 1 Global Step: 11370 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:05:12,484-Speed 5267.55 samples/sec Loss 13.3746 LearningRate 0.2969 Epoch: 1 Global Step: 11380 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:05:20,267-Speed 5262.92 samples/sec Loss 13.4048 LearningRate 0.2969 Epoch: 1 Global Step: 11390 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:05:28,159-Speed 5190.33 samples/sec Loss 13.4194 LearningRate 0.2969 Epoch: 1 Global Step: 11400 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:05:35,888-Speed 5301.04 samples/sec Loss 13.4849 LearningRate 0.2968 Epoch: 1 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:05:43,646-Speed 5279.99 samples/sec Loss 13.3605 LearningRate 0.2968 Epoch: 1 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:05:51,260-Speed 5380.50 samples/sec Loss 13.2974 LearningRate 0.2968 Epoch: 1 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:05:58,933-Speed 5338.67 samples/sec Loss 13.3457 LearningRate 0.2967 Epoch: 1 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:06:06,589-Speed 5350.97 samples/sec Loss 13.2438 LearningRate 0.2967 Epoch: 1 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:06:14,284-Speed 5323.85 samples/sec Loss 13.1299 LearningRate 0.2967 Epoch: 1 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:06:22,008-Speed 5303.41 samples/sec Loss 13.3141 LearningRate 0.2967 Epoch: 1 Global Step: 11470 Fp16 Grad Scale: 131072 
Required: 44 hours Training: 2022-01-07 21:06:29,818-Speed 5245.47 samples/sec Loss 13.2452 LearningRate 0.2966 Epoch: 1 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:06:37,602-Speed 5262.49 samples/sec Loss 13.2575 LearningRate 0.2966 Epoch: 1 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:06:45,377-Speed 5269.30 samples/sec Loss 13.2523 LearningRate 0.2966 Epoch: 1 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:06:53,142-Speed 5275.54 samples/sec Loss 13.3322 LearningRate 0.2965 Epoch: 1 Global Step: 11510 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:07:00,879-Speed 5297.15 samples/sec Loss 13.2028 LearningRate 0.2965 Epoch: 1 Global Step: 11520 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:07:08,740-Speed 5211.54 samples/sec Loss 13.2290 LearningRate 0.2965 Epoch: 1 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:07:16,469-Speed 5300.26 samples/sec Loss 13.2599 LearningRate 0.2964 Epoch: 1 Global Step: 11540 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:07:24,102-Speed 5367.18 samples/sec Loss 13.2839 LearningRate 0.2964 Epoch: 1 Global Step: 11550 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:07:31,694-Speed 5396.16 samples/sec Loss 13.2566 LearningRate 0.2964 Epoch: 1 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:07:39,352-Speed 5348.90 samples/sec Loss 13.2513 LearningRate 0.2964 Epoch: 1 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:07:46,983-Speed 5368.06 samples/sec Loss 13.2679 LearningRate 0.2963 Epoch: 1 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:07:54,631-Speed 5356.26 samples/sec Loss 13.2683 LearningRate 0.2963 Epoch: 1 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 44 hours Training: 
2022-01-07 21:08:02,311-Speed 5334.90 samples/sec Loss 13.2650 LearningRate 0.2963 Epoch: 1 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:08:09,885-Speed 5408.48 samples/sec Loss 13.2123 LearningRate 0.2962 Epoch: 1 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:08:17,512-Speed 5371.02 samples/sec Loss 13.2267 LearningRate 0.2962 Epoch: 1 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:08:25,127-Speed 5379.54 samples/sec Loss 13.3253 LearningRate 0.2962 Epoch: 1 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:08:32,733-Speed 5385.97 samples/sec Loss 13.1986 LearningRate 0.2961 Epoch: 1 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:08:40,338-Speed 5386.75 samples/sec Loss 13.1991 LearningRate 0.2961 Epoch: 1 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:08:47,831-Speed 5466.96 samples/sec Loss 13.2160 LearningRate 0.2961 Epoch: 1 Global Step: 11660 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:08:55,384-Speed 5423.91 samples/sec Loss 13.0999 LearningRate 0.2961 Epoch: 1 Global Step: 11670 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:02,921-Speed 5435.53 samples/sec Loss 13.1795 LearningRate 0.2960 Epoch: 1 Global Step: 11680 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:10,465-Speed 5429.88 samples/sec Loss 13.1979 LearningRate 0.2960 Epoch: 1 Global Step: 11690 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:17,917-Speed 5497.10 samples/sec Loss 13.2174 LearningRate 0.2960 Epoch: 1 Global Step: 11700 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:25,468-Speed 5425.64 samples/sec Loss 13.2076 LearningRate 0.2959 Epoch: 1 Global Step: 11710 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:32,978-Speed 
5454.18 samples/sec Loss 13.1710 LearningRate 0.2959 Epoch: 1 Global Step: 11720 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:40,511-Speed 5438.64 samples/sec Loss 13.1435 LearningRate 0.2959 Epoch: 1 Global Step: 11730 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:48,012-Speed 5461.72 samples/sec Loss 13.0338 LearningRate 0.2958 Epoch: 1 Global Step: 11740 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:09:55,599-Speed 5398.87 samples/sec Loss 13.1484 LearningRate 0.2958 Epoch: 1 Global Step: 11750 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:03,196-Speed 5392.68 samples/sec Loss 13.1845 LearningRate 0.2958 Epoch: 1 Global Step: 11760 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:10,802-Speed 5385.81 samples/sec Loss 13.1962 LearningRate 0.2957 Epoch: 1 Global Step: 11770 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:18,364-Speed 5416.90 samples/sec Loss 13.2717 LearningRate 0.2957 Epoch: 1 Global Step: 11780 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:25,884-Speed 5447.57 samples/sec Loss 13.1031 LearningRate 0.2957 Epoch: 1 Global Step: 11790 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:33,491-Speed 5385.50 samples/sec Loss 13.1908 LearningRate 0.2957 Epoch: 1 Global Step: 11800 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:41,035-Speed 5430.31 samples/sec Loss 13.1675 LearningRate 0.2956 Epoch: 1 Global Step: 11810 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:10:48,475-Speed 5505.90 samples/sec Loss 13.1607 LearningRate 0.2956 Epoch: 1 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:10:56,054-Speed 5404.83 samples/sec Loss 13.0563 LearningRate 0.2956 Epoch: 1 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:11:03,603-Speed 5426.97 samples/sec Loss 
13.1686 LearningRate 0.2955 Epoch: 1 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:11:11,100-Speed 5463.94 samples/sec Loss 13.1308 LearningRate 0.2955 Epoch: 1 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:11:18,607-Speed 5456.95 samples/sec Loss 13.2911 LearningRate 0.2955 Epoch: 1 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:11:26,245-Speed 5363.84 samples/sec Loss 13.1577 LearningRate 0.2954 Epoch: 1 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:11:33,845-Speed 5389.99 samples/sec Loss 13.0823 LearningRate 0.2954 Epoch: 1 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:11:41,644-Speed 5252.82 samples/sec Loss 13.0828 LearningRate 0.2954 Epoch: 1 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:11:49,354-Speed 5313.15 samples/sec Loss 13.1124 LearningRate 0.2954 Epoch: 1 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:11:56,809-Speed 5495.51 samples/sec Loss 13.1026 LearningRate 0.2953 Epoch: 1 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:12:04,358-Speed 5426.17 samples/sec Loss 13.1142 LearningRate 0.2953 Epoch: 1 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:12:11,865-Speed 5456.84 samples/sec Loss 13.0623 LearningRate 0.2953 Epoch: 1 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:12:19,415-Speed 5426.14 samples/sec Loss 13.0022 LearningRate 0.2952 Epoch: 1 Global Step: 11940 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:12:26,869-Speed 5496.41 samples/sec Loss 13.0730 LearningRate 0.2952 Epoch: 1 Global Step: 11950 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:12:34,417-Speed 5426.63 samples/sec Loss 13.0846 LearningRate 0.2952 Epoch: 1 
Global Step: 11960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:12:41,921-Speed 5459.79 samples/sec Loss 12.9961 LearningRate 0.2951 Epoch: 1 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:12:49,562-Speed 5360.64 samples/sec Loss 13.1302 LearningRate 0.2951 Epoch: 1 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:12:57,078-Speed 5451.04 samples/sec Loss 13.0519 LearningRate 0.2951 Epoch: 1 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:13:04,622-Speed 5430.38 samples/sec Loss 13.1204 LearningRate 0.2951 Epoch: 1 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:13:49,845-[lfw][12000]XNorm: 23.169394
Training: 2022-01-07 21:13:49,846-[lfw][12000]Accuracy-Flip: 0.99567+-0.00291
Training: 2022-01-07 21:13:49,847-[lfw][12000]Accuracy-Highest: 0.99567
Training: 2022-01-07 21:14:42,993-[cfp_fp][12000]XNorm: 20.977489
Training: 2022-01-07 21:14:42,995-[cfp_fp][12000]Accuracy-Flip: 0.97486+-0.00756
Training: 2022-01-07 21:14:42,995-[cfp_fp][12000]Accuracy-Highest: 0.97486
Training: 2022-01-07 21:15:28,466-[agedb_30][12000]XNorm: 22.756491
Training: 2022-01-07 21:15:28,467-[agedb_30][12000]Accuracy-Flip: 0.95033+-0.00856
Training: 2022-01-07 21:15:28,468-[agedb_30][12000]Accuracy-Highest: 0.95500
Training: 2022-01-07 21:15:35,913-Speed 270.74 samples/sec Loss 13.0954 LearningRate 0.2950 Epoch: 1 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 21:15:43,457-Speed 5431.47 samples/sec Loss 12.9869 LearningRate 0.2950 Epoch: 1 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 21:15:50,971-Speed 5452.39 samples/sec Loss 13.0627 LearningRate 0.2950 Epoch: 1 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-01-07 21:15:58,555-Speed 5401.61 samples/sec Loss 13.0714 LearningRate 0.2949 Epoch: 1 Global Step: 12040 Fp16
Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:16:06,050-Speed 5466.31 samples/sec Loss 12.9489 LearningRate 0.2949 Epoch: 1 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:16:13,504-Speed 5495.28 samples/sec Loss 13.0250 LearningRate 0.2949 Epoch: 1 Global Step: 12060 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 21:16:20,990-Speed 5473.00 samples/sec Loss 13.0391 LearningRate 0.2948 Epoch: 1 Global Step: 12070 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 21:16:28,604-Speed 5380.25 samples/sec Loss 12.9319 LearningRate 0.2948 Epoch: 1 Global Step: 12080 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 21:16:36,167-Speed 5416.41 samples/sec Loss 13.0468 LearningRate 0.2948 Epoch: 1 Global Step: 12090 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 21:16:43,687-Speed 5447.49 samples/sec Loss 13.0873 LearningRate 0.2948 Epoch: 1 Global Step: 12100 Fp16 Grad Scale: 262144 Required: 45 hours Training: 2022-01-07 21:16:51,284-Speed 5392.56 samples/sec Loss 13.0246 LearningRate 0.2947 Epoch: 1 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:16:58,835-Speed 5424.87 samples/sec Loss 12.9656 LearningRate 0.2947 Epoch: 1 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:17:06,416-Speed 5403.58 samples/sec Loss 12.9858 LearningRate 0.2947 Epoch: 1 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:17:14,008-Speed 5395.61 samples/sec Loss 13.0218 LearningRate 0.2946 Epoch: 1 Global Step: 12140 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:17:21,576-Speed 5413.09 samples/sec Loss 13.0186 LearningRate 0.2946 Epoch: 1 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 45 hours Training: 2022-01-07 21:17:29,217-Speed 5361.09 samples/sec Loss 12.9985 LearningRate 0.2946 Epoch: 1 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 44 
hours Training: 2022-01-07 21:17:36,767-Speed 5426.00 samples/sec Loss 12.9943 LearningRate 0.2945 Epoch: 1 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:17:44,331-Speed 5415.65 samples/sec Loss 12.9677 LearningRate 0.2945 Epoch: 1 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:17:51,815-Speed 5474.15 samples/sec Loss 12.9440 LearningRate 0.2945 Epoch: 1 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:17:59,377-Speed 5417.13 samples/sec Loss 12.9398 LearningRate 0.2944 Epoch: 1 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:18:06,956-Speed 5405.03 samples/sec Loss 12.9105 LearningRate 0.2944 Epoch: 1 Global Step: 12210 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:14,500-Speed 5429.99 samples/sec Loss 12.8647 LearningRate 0.2944 Epoch: 1 Global Step: 12220 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:22,166-Speed 5344.60 samples/sec Loss 13.0062 LearningRate 0.2944 Epoch: 1 Global Step: 12230 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:29,683-Speed 5449.61 samples/sec Loss 13.0078 LearningRate 0.2943 Epoch: 1 Global Step: 12240 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:37,289-Speed 5385.24 samples/sec Loss 12.9056 LearningRate 0.2943 Epoch: 1 Global Step: 12250 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:44,801-Speed 5453.68 samples/sec Loss 12.9464 LearningRate 0.2943 Epoch: 1 Global Step: 12260 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:52,354-Speed 5424.23 samples/sec Loss 12.8854 LearningRate 0.2942 Epoch: 1 Global Step: 12270 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:18:59,896-Speed 5431.11 samples/sec Loss 12.9327 LearningRate 0.2942 Epoch: 1 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 
21:19:07,562-Speed 5343.86 samples/sec Loss 12.8387 LearningRate 0.2942 Epoch: 1 Global Step: 12290 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:19:15,102-Speed 5432.50 samples/sec Loss 12.8677 LearningRate 0.2941 Epoch: 1 Global Step: 12300 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:19:22,561-Speed 5492.98 samples/sec Loss 12.8989 LearningRate 0.2941 Epoch: 1 Global Step: 12310 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:19:30,052-Speed 5468.51 samples/sec Loss 12.9954 LearningRate 0.2941 Epoch: 1 Global Step: 12320 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:19:37,655-Speed 5387.35 samples/sec Loss 12.8717 LearningRate 0.2941 Epoch: 1 Global Step: 12330 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:19:45,197-Speed 5431.68 samples/sec Loss 12.9747 LearningRate 0.2940 Epoch: 1 Global Step: 12340 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:19:52,792-Speed 5394.14 samples/sec Loss 12.8236 LearningRate 0.2940 Epoch: 1 Global Step: 12350 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:20:00,312-Speed 5447.39 samples/sec Loss 12.8641 LearningRate 0.2940 Epoch: 1 Global Step: 12360 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:20:07,982-Speed 5340.63 samples/sec Loss 12.9403 LearningRate 0.2939 Epoch: 1 Global Step: 12370 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:20:15,527-Speed 5429.57 samples/sec Loss 12.9187 LearningRate 0.2939 Epoch: 1 Global Step: 12380 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:20:23,159-Speed 5368.13 samples/sec Loss 12.8235 LearningRate 0.2939 Epoch: 1 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:20:30,782-Speed 5373.62 samples/sec Loss 12.7978 LearningRate 0.2938 Epoch: 1 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:20:38,410-Speed 5370.27 
samples/sec Loss 12.8439 LearningRate 0.2938 Epoch: 1 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:20:46,056-Speed 5358.10 samples/sec Loss 12.9074 LearningRate 0.2938 Epoch: 1 Global Step: 12420 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:20:53,661-Speed 5386.51 samples/sec Loss 12.8756 LearningRate 0.2938 Epoch: 1 Global Step: 12430 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:21:01,312-Speed 5354.85 samples/sec Loss 12.8546 LearningRate 0.2937 Epoch: 1 Global Step: 12440 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:21:08,786-Speed 5480.30 samples/sec Loss 12.8946 LearningRate 0.2937 Epoch: 1 Global Step: 12450 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:21:16,355-Speed 5412.96 samples/sec Loss 12.9205 LearningRate 0.2937 Epoch: 1 Global Step: 12460 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:21:23,853-Speed 5463.06 samples/sec Loss 12.8130 LearningRate 0.2936 Epoch: 1 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:21:31,389-Speed 5436.18 samples/sec Loss 12.7886 LearningRate 0.2936 Epoch: 1 Global Step: 12480 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:21:38,852-Speed 5488.82 samples/sec Loss 12.8410 LearningRate 0.2936 Epoch: 1 Global Step: 12490 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:21:46,497-Speed 5358.16 samples/sec Loss 12.8369 LearningRate 0.2935 Epoch: 1 Global Step: 12500 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:21:54,060-Speed 5416.54 samples/sec Loss 12.8686 LearningRate 0.2935 Epoch: 1 Global Step: 12510 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:01,586-Speed 5443.70 samples/sec Loss 12.7894 LearningRate 0.2935 Epoch: 1 Global Step: 12520 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:09,066-Speed 5476.22 samples/sec Loss 12.7894 LearningRate 
0.2935 Epoch: 1 Global Step: 12530 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:16,674-Speed 5384.76 samples/sec Loss 12.8669 LearningRate 0.2934 Epoch: 1 Global Step: 12540 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:24,165-Speed 5468.80 samples/sec Loss 12.7800 LearningRate 0.2934 Epoch: 1 Global Step: 12550 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:31,783-Speed 5377.21 samples/sec Loss 12.8728 LearningRate 0.2934 Epoch: 1 Global Step: 12560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:39,363-Speed 5404.65 samples/sec Loss 12.8529 LearningRate 0.2933 Epoch: 1 Global Step: 12570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-01-07 21:22:47,048-Speed 5330.25 samples/sec Loss 12.7943 LearningRate 0.2933 Epoch: 1 Global Step: 12580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:22:54,521-Speed 5481.46 samples/sec Loss 12.7103 LearningRate 0.2933 Epoch: 1 Global Step: 12590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:02,089-Speed 5413.32 samples/sec Loss 12.8623 LearningRate 0.2932 Epoch: 1 Global Step: 12600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:09,577-Speed 5470.78 samples/sec Loss 12.8043 LearningRate 0.2932 Epoch: 1 Global Step: 12610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:17,162-Speed 5400.36 samples/sec Loss 12.8633 LearningRate 0.2932 Epoch: 1 Global Step: 12620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:24,652-Speed 5469.17 samples/sec Loss 12.7585 LearningRate 0.2932 Epoch: 1 Global Step: 12630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:32,276-Speed 5373.37 samples/sec Loss 12.7518 LearningRate 0.2931 Epoch: 1 Global Step: 12640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:39,920-Speed 5359.56 samples/sec Loss 12.7895 LearningRate 0.2931 Epoch: 1 Global Step: 12650 Fp16 
Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:47,604-Speed 5331.36 samples/sec Loss 12.8228 LearningRate 0.2931 Epoch: 1 Global Step: 12660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:23:55,097-Speed 5466.74 samples/sec Loss 12.8635 LearningRate 0.2930 Epoch: 1 Global Step: 12670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-01-07 21:24:02,671-Speed 5408.21 samples/sec Loss 12.8678 LearningRate 0.2930 Epoch: 1 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:10,180-Speed 5456.21 samples/sec Loss 12.9057 LearningRate 0.2930 Epoch: 1 Global Step: 12690 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:17,759-Speed 5404.80 samples/sec Loss 12.8906 LearningRate 0.2929 Epoch: 1 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:25,301-Speed 5431.26 samples/sec Loss 12.7420 LearningRate 0.2929 Epoch: 1 Global Step: 12710 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:32,885-Speed 5401.96 samples/sec Loss 12.7021 LearningRate 0.2929 Epoch: 1 Global Step: 12720 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:40,461-Speed 5407.23 samples/sec Loss 12.8497 LearningRate 0.2929 Epoch: 1 Global Step: 12730 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:47,939-Speed 5478.06 samples/sec Loss 12.8073 LearningRate 0.2928 Epoch: 1 Global Step: 12740 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:24:55,467-Speed 5441.48 samples/sec Loss 12.6738 LearningRate 0.2928 Epoch: 1 Global Step: 12750 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:25:03,074-Speed 5385.00 samples/sec Loss 12.7060 LearningRate 0.2928 Epoch: 1 Global Step: 12760 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:25:10,658-Speed 5402.52 samples/sec Loss 12.7301 LearningRate 0.2927 Epoch: 1 Global Step: 12770 Fp16 Grad Scale: 65536 Required: 44 hours 
Training: 2022-01-07 21:25:18,133-Speed 5479.84 samples/sec Loss 12.7431 LearningRate 0.2927 Epoch: 1 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:25:25,678-Speed 5428.87 samples/sec Loss 12.8561 LearningRate 0.2927 Epoch: 1 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:25:33,284-Speed 5386.28 samples/sec Loss 12.7123 LearningRate 0.2926 Epoch: 1 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:25:40,866-Speed 5416.22 samples/sec Loss 12.6940 LearningRate 0.2926 Epoch: 1 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:25:48,343-Speed 5479.14 samples/sec Loss 12.6908 LearningRate 0.2926 Epoch: 1 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:25:55,833-Speed 5468.79 samples/sec Loss 12.6478 LearningRate 0.2926 Epoch: 1 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:26:03,336-Speed 5459.82 samples/sec Loss 12.7603 LearningRate 0.2925 Epoch: 1 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:26:10,789-Speed 5496.66 samples/sec Loss 12.6845 LearningRate 0.2925 Epoch: 1 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:26:18,351-Speed 5417.51 samples/sec Loss 12.6644 LearningRate 0.2925 Epoch: 1 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:26:25,881-Speed 5439.75 samples/sec Loss 12.7377 LearningRate 0.2924 Epoch: 1 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:26:33,562-Speed 5333.10 samples/sec Loss 12.6519 LearningRate 0.2924 Epoch: 1 Global Step: 12880 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:26:41,063-Speed 5462.07 samples/sec Loss 12.7072 LearningRate 0.2924 Epoch: 1 Global Step: 12890 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 
21:26:48,628-Speed 5414.64 samples/sec Loss 12.6974 LearningRate 0.2923 Epoch: 1 Global Step: 12900 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:26:56,180-Speed 5424.29 samples/sec Loss 12.7156 LearningRate 0.2923 Epoch: 1 Global Step: 12910 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:03,993-Speed 5244.05 samples/sec Loss 12.6229 LearningRate 0.2923 Epoch: 1 Global Step: 12920 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:11,561-Speed 5413.08 samples/sec Loss 12.6430 LearningRate 0.2923 Epoch: 1 Global Step: 12930 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:19,290-Speed 5300.19 samples/sec Loss 12.7075 LearningRate 0.2922 Epoch: 1 Global Step: 12940 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:27,043-Speed 5283.61 samples/sec Loss 12.7042 LearningRate 0.2922 Epoch: 1 Global Step: 12950 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:34,759-Speed 5309.20 samples/sec Loss 12.6862 LearningRate 0.2922 Epoch: 1 Global Step: 12960 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:42,385-Speed 5372.54 samples/sec Loss 12.6520 LearningRate 0.2921 Epoch: 1 Global Step: 12970 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:50,034-Speed 5355.09 samples/sec Loss 12.7256 LearningRate 0.2921 Epoch: 1 Global Step: 12980 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:27:57,536-Speed 5460.28 samples/sec Loss 12.6721 LearningRate 0.2921 Epoch: 1 Global Step: 12990 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:28:05,145-Speed 5384.02 samples/sec Loss 12.7526 LearningRate 0.2920 Epoch: 1 Global Step: 13000 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:28:12,714-Speed 5412.61 samples/sec Loss 12.7668 LearningRate 0.2920 Epoch: 1 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:28:20,405-Speed 5326.38 
samples/sec Loss 12.6150 LearningRate 0.2920 Epoch: 1 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:28:27,984-Speed 5405.37 samples/sec Loss 12.6571 LearningRate 0.2920 Epoch: 1 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:28:35,512-Speed 5441.77 samples/sec Loss 12.6693 LearningRate 0.2919 Epoch: 1 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:28:43,059-Speed 5428.38 samples/sec Loss 12.6376 LearningRate 0.2919 Epoch: 1 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:28:50,716-Speed 5349.68 samples/sec Loss 12.6570 LearningRate 0.2919 Epoch: 1 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:28:58,179-Speed 5488.96 samples/sec Loss 12.7047 LearningRate 0.2918 Epoch: 1 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:29:05,890-Speed 5312.89 samples/sec Loss 12.5767 LearningRate 0.2918 Epoch: 1 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:29:13,471-Speed 5403.89 samples/sec Loss 12.7304 LearningRate 0.2918 Epoch: 1 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:29:21,086-Speed 5379.12 samples/sec Loss 12.6837 LearningRate 0.2917 Epoch: 1 Global Step: 13100 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:29:28,787-Speed 5319.87 samples/sec Loss 12.5683 LearningRate 0.2917 Epoch: 1 Global Step: 13110 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:29:36,317-Speed 5440.31 samples/sec Loss 12.7078 LearningRate 0.2917 Epoch: 1 Global Step: 13120 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:29:43,981-Speed 5345.32 samples/sec Loss 12.6879 LearningRate 0.2917 Epoch: 1 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:29:51,547-Speed 5413.92 samples/sec Loss 12.7269 LearningRate 
0.2916 Epoch: 1 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:29:59,101-Speed 5423.27 samples/sec Loss 12.6655 LearningRate 0.2916 Epoch: 1 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:06,641-Speed 5433.52 samples/sec Loss 12.6647 LearningRate 0.2916 Epoch: 1 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:14,224-Speed 5402.49 samples/sec Loss 12.6477 LearningRate 0.2915 Epoch: 1 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:21,758-Speed 5437.00 samples/sec Loss 12.6472 LearningRate 0.2915 Epoch: 1 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:29,283-Speed 5443.66 samples/sec Loss 12.6025 LearningRate 0.2915 Epoch: 1 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:36,807-Speed 5444.64 samples/sec Loss 12.6491 LearningRate 0.2914 Epoch: 1 Global Step: 13200 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:44,350-Speed 5431.49 samples/sec Loss 12.5566 LearningRate 0.2914 Epoch: 1 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:51,967-Speed 5378.12 samples/sec Loss 12.5478 LearningRate 0.2914 Epoch: 1 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:30:59,496-Speed 5440.52 samples/sec Loss 12.6399 LearningRate 0.2914 Epoch: 1 Global Step: 13230 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:31:07,166-Speed 5340.85 samples/sec Loss 12.5809 LearningRate 0.2913 Epoch: 1 Global Step: 13240 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:31:14,675-Speed 5455.92 samples/sec Loss 12.5658 LearningRate 0.2913 Epoch: 1 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:31:22,282-Speed 5385.36 samples/sec Loss 12.6075 LearningRate 0.2913 Epoch: 1 Global Step: 
13260 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:31:29,783-Speed 5460.84 samples/sec Loss 12.6126 LearningRate 0.2912 Epoch: 1 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:31:37,354-Speed 5411.00 samples/sec Loss 12.6383 LearningRate 0.2912 Epoch: 1 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:31:44,882-Speed 5441.63 samples/sec Loss 12.5946 LearningRate 0.2912 Epoch: 1 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:31:52,499-Speed 5378.19 samples/sec Loss 12.5775 LearningRate 0.2911 Epoch: 1 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:32:00,033-Speed 5437.43 samples/sec Loss 12.5519 LearningRate 0.2911 Epoch: 1 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:32:07,625-Speed 5395.43 samples/sec Loss 12.5724 LearningRate 0.2911 Epoch: 1 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:32:15,168-Speed 5431.16 samples/sec Loss 12.5032 LearningRate 0.2910 Epoch: 1 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:32:22,717-Speed 5426.59 samples/sec Loss 12.6105 LearningRate 0.2910 Epoch: 1 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:32:30,523-Speed 5248.19 samples/sec Loss 12.5599 LearningRate 0.2910 Epoch: 1 Global Step: 13350 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:32:38,245-Speed 5304.72 samples/sec Loss 12.5549 LearningRate 0.2910 Epoch: 1 Global Step: 13360 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:32:45,869-Speed 5373.12 samples/sec Loss 12.5501 LearningRate 0.2909 Epoch: 1 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:32:53,510-Speed 5361.50 samples/sec Loss 12.4573 LearningRate 0.2909 Epoch: 1 Global Step: 13380 Fp16 Grad Scale: 131072 
Required: 44 hours Training: 2022-01-07 21:33:01,063-Speed 5423.36 samples/sec Loss 12.4904 LearningRate 0.2909 Epoch: 1 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:33:08,633-Speed 5411.42 samples/sec Loss 12.5328 LearningRate 0.2908 Epoch: 1 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:33:16,174-Speed 5432.75 samples/sec Loss 12.5116 LearningRate 0.2908 Epoch: 1 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:33:23,758-Speed 5401.14 samples/sec Loss 12.5449 LearningRate 0.2908 Epoch: 1 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:33:31,347-Speed 5397.89 samples/sec Loss 12.5721 LearningRate 0.2908 Epoch: 1 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:33:38,790-Speed 5504.42 samples/sec Loss 12.5708 LearningRate 0.2907 Epoch: 1 Global Step: 13440 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:33:46,337-Speed 5427.69 samples/sec Loss 12.5217 LearningRate 0.2907 Epoch: 1 Global Step: 13450 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:33:53,904-Speed 5413.67 samples/sec Loss 12.4674 LearningRate 0.2907 Epoch: 1 Global Step: 13460 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:01,540-Speed 5365.21 samples/sec Loss 12.5462 LearningRate 0.2906 Epoch: 1 Global Step: 13470 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:09,005-Speed 5487.92 samples/sec Loss 12.5960 LearningRate 0.2906 Epoch: 1 Global Step: 13480 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:16,595-Speed 5396.90 samples/sec Loss 12.5483 LearningRate 0.2906 Epoch: 1 Global Step: 13490 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:24,073-Speed 5478.38 samples/sec Loss 12.4156 LearningRate 0.2905 Epoch: 1 Global Step: 13500 Fp16 Grad Scale: 65536 Required: 44 hours Training: 
2022-01-07 21:34:31,800-Speed 5301.92 samples/sec Loss 12.5294 LearningRate 0.2905 Epoch: 1 Global Step: 13510 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:39,354-Speed 5423.18 samples/sec Loss 12.4290 LearningRate 0.2905 Epoch: 1 Global Step: 13520 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:46,781-Speed 5515.63 samples/sec Loss 12.5417 LearningRate 0.2905 Epoch: 1 Global Step: 13530 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:34:54,307-Speed 5442.80 samples/sec Loss 12.5317 LearningRate 0.2904 Epoch: 1 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:01,766-Speed 5492.61 samples/sec Loss 12.4753 LearningRate 0.2904 Epoch: 1 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:09,253-Speed 5471.85 samples/sec Loss 12.4583 LearningRate 0.2904 Epoch: 1 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:16,705-Speed 5497.08 samples/sec Loss 12.4278 LearningRate 0.2903 Epoch: 1 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:24,203-Speed 5463.12 samples/sec Loss 12.4822 LearningRate 0.2903 Epoch: 1 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:31,749-Speed 5428.94 samples/sec Loss 12.4341 LearningRate 0.2903 Epoch: 1 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:39,224-Speed 5480.21 samples/sec Loss 12.5106 LearningRate 0.2902 Epoch: 1 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:46,710-Speed 5472.45 samples/sec Loss 12.4517 LearningRate 0.2902 Epoch: 1 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:35:54,213-Speed 5459.69 samples/sec Loss 12.5062 LearningRate 0.2902 Epoch: 1 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:01,769-Speed 
5421.50 samples/sec Loss 12.4933 LearningRate 0.2902 Epoch: 1 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:09,282-Speed 5453.04 samples/sec Loss 12.5672 LearningRate 0.2901 Epoch: 1 Global Step: 13640 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:36:16,815-Speed 5437.64 samples/sec Loss 12.5784 LearningRate 0.2901 Epoch: 1 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:24,380-Speed 5415.14 samples/sec Loss 12.4710 LearningRate 0.2901 Epoch: 1 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:31,954-Speed 5409.27 samples/sec Loss 12.4979 LearningRate 0.2900 Epoch: 1 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:39,434-Speed 5476.71 samples/sec Loss 12.4648 LearningRate 0.2900 Epoch: 1 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:46,871-Speed 5508.05 samples/sec Loss 12.3590 LearningRate 0.2900 Epoch: 1 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:36:54,479-Speed 5384.45 samples/sec Loss 12.4690 LearningRate 0.2899 Epoch: 1 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:02,005-Speed 5442.81 samples/sec Loss 12.4087 LearningRate 0.2899 Epoch: 1 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:09,543-Speed 5434.76 samples/sec Loss 12.5434 LearningRate 0.2899 Epoch: 1 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:16,990-Speed 5500.71 samples/sec Loss 12.4477 LearningRate 0.2899 Epoch: 1 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:24,461-Speed 5483.52 samples/sec Loss 12.4172 LearningRate 0.2898 Epoch: 1 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:31,912-Speed 5497.93 samples/sec Loss 
12.4540 LearningRate 0.2898 Epoch: 1 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:39,499-Speed 5399.46 samples/sec Loss 12.3987 LearningRate 0.2898 Epoch: 1 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:46,993-Speed 5466.38 samples/sec Loss 12.3342 LearningRate 0.2897 Epoch: 1 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:37:54,591-Speed 5391.60 samples/sec Loss 12.4145 LearningRate 0.2897 Epoch: 1 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:38:02,076-Speed 5472.88 samples/sec Loss 12.4494 LearningRate 0.2897 Epoch: 1 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:38:09,603-Speed 5443.10 samples/sec Loss 12.4702 LearningRate 0.2896 Epoch: 1 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:38:17,112-Speed 5455.05 samples/sec Loss 12.4359 LearningRate 0.2896 Epoch: 1 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:38:24,655-Speed 5430.76 samples/sec Loss 12.4807 LearningRate 0.2896 Epoch: 1 Global Step: 13820 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:38:32,197-Speed 5431.41 samples/sec Loss 12.3940 LearningRate 0.2896 Epoch: 1 Global Step: 13830 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:38:39,730-Speed 5438.86 samples/sec Loss 12.3387 LearningRate 0.2895 Epoch: 1 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:38:47,199-Speed 5484.15 samples/sec Loss 12.4800 LearningRate 0.2895 Epoch: 1 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:38:54,783-Speed 5401.33 samples/sec Loss 12.2948 LearningRate 0.2895 Epoch: 1 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:39:02,402-Speed 5376.99 samples/sec Loss 12.3457 LearningRate 0.2894 Epoch: 1 
Global Step: 13870 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:39:09,969-Speed 5413.96 samples/sec Loss 12.4804 LearningRate 0.2894 Epoch: 1 Global Step: 13880 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:39:17,449-Speed 5476.40 samples/sec Loss 12.3843 LearningRate 0.2894 Epoch: 1 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:39:24,906-Speed 5493.43 samples/sec Loss 12.3827 LearningRate 0.2893 Epoch: 1 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:39:32,442-Speed 5436.21 samples/sec Loss 12.5234 LearningRate 0.2893 Epoch: 1 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:39:39,868-Speed 5516.39 samples/sec Loss 12.4804 LearningRate 0.2893 Epoch: 1 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:39:47,318-Speed 5498.71 samples/sec Loss 12.4513 LearningRate 0.2893 Epoch: 1 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:39:54,838-Speed 5447.42 samples/sec Loss 12.3627 LearningRate 0.2892 Epoch: 1 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:40:02,385-Speed 5428.14 samples/sec Loss 12.3609 LearningRate 0.2892 Epoch: 1 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:40:09,852-Speed 5486.62 samples/sec Loss 12.3971 LearningRate 0.2892 Epoch: 1 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:40:17,438-Speed 5399.98 samples/sec Loss 12.3756 LearningRate 0.2891 Epoch: 1 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:40:25,016-Speed 5405.77 samples/sec Loss 12.3415 LearningRate 0.2891 Epoch: 1 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:40:32,507-Speed 5469.02 samples/sec Loss 12.3365 LearningRate 0.2891 Epoch: 1 Global Step: 13990 Fp16 Grad Scale: 
131072 Required: 44 hours
Training: 2022-01-07 21:40:40,066-Speed 5419.14 samples/sec Loss 12.4009 LearningRate 0.2890 Epoch: 1 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:41:24,458-[lfw][14000]XNorm: 22.441222
Training: 2022-01-07 21:41:24,459-[lfw][14000]Accuracy-Flip: 0.99583+-0.00271
Training: 2022-01-07 21:41:24,460-[lfw][14000]Accuracy-Highest: 0.99583
Training: 2022-01-07 21:42:17,480-[cfp_fp][14000]XNorm: 20.112162
Training: 2022-01-07 21:42:17,482-[cfp_fp][14000]Accuracy-Flip: 0.97571+-0.00723
Training: 2022-01-07 21:42:17,482-[cfp_fp][14000]Accuracy-Highest: 0.97571
Training: 2022-01-07 21:43:02,910-[agedb_30][14000]XNorm: 21.935689
Training: 2022-01-07 21:43:02,911-[agedb_30][14000]Accuracy-Flip: 0.96083+-0.01023
Training: 2022-01-07 21:43:02,911-[agedb_30][14000]Accuracy-Highest: 0.96083
Training: 2022-01-07 21:43:10,388-Speed 272.49 samples/sec Loss 12.4301 LearningRate 0.2890 Epoch: 1 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:43:17,823-Speed 5510.53 samples/sec Loss 12.3692 LearningRate 0.2890 Epoch: 1 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:43:25,363-Speed 5433.47 samples/sec Loss 12.3483 LearningRate 0.2890 Epoch: 1 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:43:32,835-Speed 5482.86 samples/sec Loss 12.4098 LearningRate 0.2889 Epoch: 1 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:43:40,328-Speed 5467.42 samples/sec Loss 12.3178 LearningRate 0.2889 Epoch: 1 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:43:47,849-Speed 5447.12 samples/sec Loss 12.3039 LearningRate 0.2889 Epoch: 1 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:43:55,289-Speed 5505.65 samples/sec Loss 12.4757 LearningRate 0.2888 Epoch: 1 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-01-07 21:44:02,766-Speed 5478.66 samples/sec Loss 12.4649 LearningRate 0.2888 Epoch: 1 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:44:10,344-Speed 5406.28 samples/sec Loss 12.3674 LearningRate 0.2888 Epoch: 1 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:44:17,887-Speed 5431.08 samples/sec Loss 12.3085 LearningRate 0.2887 Epoch: 1 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:44:25,404-Speed 5449.04 samples/sec Loss 12.3516 LearningRate 0.2887 Epoch: 1 Global Step: 14110 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:44:32,970-Speed 5414.21 samples/sec Loss 12.2761 LearningRate 0.2887 Epoch: 1 Global Step: 14120 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:44:40,378-Speed 5530.47 samples/sec Loss 12.3513 LearningRate 0.2887 Epoch: 1 Global Step: 14130 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:44:48,069-Speed 5326.45 samples/sec Loss 12.3753 LearningRate 0.2886 Epoch: 1 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:44:55,667-Speed 5391.47 samples/sec Loss 12.3664 LearningRate 0.2886 Epoch: 1 Global Step: 14150 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:03,179-Speed 5452.95 samples/sec Loss 12.3876 LearningRate 0.2886 Epoch: 1 Global Step: 14160 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:10,853-Speed 5338.08 samples/sec Loss 12.3115 LearningRate 0.2885 Epoch: 1 Global Step: 14170 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:18,404-Speed 5425.62 samples/sec Loss 12.3335 LearningRate 0.2885 Epoch: 1 Global Step: 14180 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:25,834-Speed 5512.83 samples/sec Loss 12.2460 LearningRate 0.2885 Epoch: 1 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 
21:45:33,277-Speed 5504.09 samples/sec Loss 12.3597 LearningRate 0.2884 Epoch: 1 Global Step: 14200 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:40,880-Speed 5388.71 samples/sec Loss 12.3711 LearningRate 0.2884 Epoch: 1 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:48,391-Speed 5453.88 samples/sec Loss 12.2943 LearningRate 0.2884 Epoch: 1 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:45:55,956-Speed 5415.29 samples/sec Loss 12.2129 LearningRate 0.2884 Epoch: 1 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:03,370-Speed 5525.21 samples/sec Loss 12.3628 LearningRate 0.2883 Epoch: 1 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:10,906-Speed 5436.10 samples/sec Loss 12.3240 LearningRate 0.2883 Epoch: 1 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:18,415-Speed 5455.60 samples/sec Loss 12.2284 LearningRate 0.2883 Epoch: 1 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:25,960-Speed 5429.45 samples/sec Loss 12.3104 LearningRate 0.2882 Epoch: 1 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:33,480-Speed 5447.29 samples/sec Loss 12.3352 LearningRate 0.2882 Epoch: 1 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:40,953-Speed 5481.74 samples/sec Loss 12.2764 LearningRate 0.2882 Epoch: 1 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:48,459-Speed 5457.09 samples/sec Loss 12.2269 LearningRate 0.2881 Epoch: 1 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:46:55,861-Speed 5534.53 samples/sec Loss 12.3147 LearningRate 0.2881 Epoch: 1 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:47:03,322-Speed 5490.99 
samples/sec Loss 12.3252 LearningRate 0.2881 Epoch: 1 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:47:10,729-Speed 5530.34 samples/sec Loss 12.2893 LearningRate 0.2881 Epoch: 1 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:47:18,203-Speed 5480.85 samples/sec Loss 12.2000 LearningRate 0.2880 Epoch: 1 Global Step: 14340 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:47:25,652-Speed 5499.94 samples/sec Loss 12.3291 LearningRate 0.2880 Epoch: 1 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:47:33,126-Speed 5481.05 samples/sec Loss 12.3541 LearningRate 0.2880 Epoch: 1 Global Step: 14360 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:47:40,554-Speed 5514.68 samples/sec Loss 12.3236 LearningRate 0.2879 Epoch: 1 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:47:47,973-Speed 5522.25 samples/sec Loss 12.3368 LearningRate 0.2879 Epoch: 1 Global Step: 14380 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:47:55,428-Speed 5494.72 samples/sec Loss 12.2552 LearningRate 0.2879 Epoch: 1 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:02,858-Speed 5513.96 samples/sec Loss 12.2552 LearningRate 0.2878 Epoch: 1 Global Step: 14400 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:10,290-Speed 5511.81 samples/sec Loss 12.2727 LearningRate 0.2878 Epoch: 1 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:17,664-Speed 5555.32 samples/sec Loss 12.2455 LearningRate 0.2878 Epoch: 1 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:25,069-Speed 5532.64 samples/sec Loss 12.2629 LearningRate 0.2878 Epoch: 1 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:32,520-Speed 5497.80 samples/sec Loss 12.2335 
LearningRate 0.2877 Epoch: 1 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:39,970-Speed 5498.69 samples/sec Loss 12.2537 LearningRate 0.2877 Epoch: 1 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:47,472-Speed 5461.01 samples/sec Loss 12.2840 LearningRate 0.2877 Epoch: 1 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:48:54,946-Speed 5481.15 samples/sec Loss 12.2940 LearningRate 0.2876 Epoch: 1 Global Step: 14470 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:49:02,455-Speed 5455.29 samples/sec Loss 12.2502 LearningRate 0.2876 Epoch: 1 Global Step: 14480 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:49:09,929-Speed 5481.20 samples/sec Loss 12.1756 LearningRate 0.2876 Epoch: 1 Global Step: 14490 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:49:17,412-Speed 5474.52 samples/sec Loss 12.3102 LearningRate 0.2876 Epoch: 1 Global Step: 14500 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:49:24,905-Speed 5466.52 samples/sec Loss 12.2763 LearningRate 0.2875 Epoch: 1 Global Step: 14510 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:49:32,334-Speed 5514.65 samples/sec Loss 12.1500 LearningRate 0.2875 Epoch: 1 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:49:39,787-Speed 5496.56 samples/sec Loss 12.1094 LearningRate 0.2875 Epoch: 1 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:49:47,341-Speed 5422.90 samples/sec Loss 12.1229 LearningRate 0.2874 Epoch: 1 Global Step: 14540 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:49:55,093-Speed 5284.06 samples/sec Loss 12.2309 LearningRate 0.2874 Epoch: 1 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:02,567-Speed 5481.49 samples/sec Loss 12.2540 LearningRate 0.2874 Epoch: 1 
Global Step: 14560 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:10,028-Speed 5490.69 samples/sec Loss 12.2431 LearningRate 0.2873 Epoch: 1 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:17,480-Speed 5497.49 samples/sec Loss 12.2370 LearningRate 0.2873 Epoch: 1 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:25,078-Speed 5391.15 samples/sec Loss 12.3006 LearningRate 0.2873 Epoch: 1 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:32,556-Speed 5478.45 samples/sec Loss 12.2801 LearningRate 0.2873 Epoch: 1 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:40,021-Speed 5487.88 samples/sec Loss 12.1951 LearningRate 0.2872 Epoch: 1 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:50:47,438-Speed 5523.50 samples/sec Loss 12.2061 LearningRate 0.2872 Epoch: 1 Global Step: 14620 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:50:54,944-Speed 5457.25 samples/sec Loss 12.1984 LearningRate 0.2872 Epoch: 1 Global Step: 14630 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:51:02,470-Speed 5443.16 samples/sec Loss 12.1669 LearningRate 0.2871 Epoch: 1 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:09,944-Speed 5481.15 samples/sec Loss 12.1411 LearningRate 0.2871 Epoch: 1 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:17,533-Speed 5398.02 samples/sec Loss 12.2352 LearningRate 0.2871 Epoch: 1 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:25,135-Speed 5388.73 samples/sec Loss 12.2891 LearningRate 0.2870 Epoch: 1 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:32,641-Speed 5457.93 samples/sec Loss 12.2388 LearningRate 0.2870 Epoch: 1 Global Step: 14680 Fp16 Grad 
Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:40,177-Speed 5436.58 samples/sec Loss 12.3763 LearningRate 0.2870 Epoch: 1 Global Step: 14690 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:47,668-Speed 5468.32 samples/sec Loss 12.2010 LearningRate 0.2870 Epoch: 1 Global Step: 14700 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:51:55,143-Speed 5480.79 samples/sec Loss 12.2460 LearningRate 0.2869 Epoch: 1 Global Step: 14710 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:52:02,659-Speed 5450.41 samples/sec Loss 12.2294 LearningRate 0.2869 Epoch: 1 Global Step: 14720 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:52:10,197-Speed 5433.99 samples/sec Loss 12.2499 LearningRate 0.2869 Epoch: 1 Global Step: 14730 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:52:17,649-Speed 5497.84 samples/sec Loss 12.2377 LearningRate 0.2868 Epoch: 1 Global Step: 14740 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:52:25,107-Speed 5493.33 samples/sec Loss 12.2317 LearningRate 0.2868 Epoch: 1 Global Step: 14750 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:52:32,554-Speed 5500.82 samples/sec Loss 12.2076 LearningRate 0.2868 Epoch: 1 Global Step: 14760 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:52:40,050-Speed 5465.14 samples/sec Loss 12.1738 LearningRate 0.2867 Epoch: 1 Global Step: 14770 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:52:47,469-Speed 5521.28 samples/sec Loss 12.1298 LearningRate 0.2867 Epoch: 1 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:52:54,919-Speed 5499.30 samples/sec Loss 12.2354 LearningRate 0.2867 Epoch: 1 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:02,468-Speed 5426.25 samples/sec Loss 12.2319 LearningRate 0.2867 Epoch: 1 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 44 
hours Training: 2022-01-07 21:53:09,955-Speed 5471.43 samples/sec Loss 12.2164 LearningRate 0.2866 Epoch: 1 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:17,367-Speed 5527.41 samples/sec Loss 12.2844 LearningRate 0.2866 Epoch: 1 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:24,805-Speed 5507.47 samples/sec Loss 12.2772 LearningRate 0.2866 Epoch: 1 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:32,238-Speed 5511.54 samples/sec Loss 12.1402 LearningRate 0.2865 Epoch: 1 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:39,721-Speed 5474.01 samples/sec Loss 12.1795 LearningRate 0.2865 Epoch: 1 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:47,270-Speed 5427.15 samples/sec Loss 12.1402 LearningRate 0.2865 Epoch: 1 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:53:54,890-Speed 5375.67 samples/sec Loss 12.2010 LearningRate 0.2864 Epoch: 1 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:54:02,400-Speed 5455.14 samples/sec Loss 12.2071 LearningRate 0.2864 Epoch: 1 Global Step: 14880 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:54:09,898-Speed 5463.66 samples/sec Loss 12.0670 LearningRate 0.2864 Epoch: 1 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:54:17,423-Speed 5443.33 samples/sec Loss 12.1757 LearningRate 0.2864 Epoch: 1 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:54:25,047-Speed 5374.06 samples/sec Loss 12.1603 LearningRate 0.2863 Epoch: 1 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:54:32,526-Speed 5477.01 samples/sec Loss 12.1607 LearningRate 0.2863 Epoch: 1 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 
21:54:40,011-Speed 5473.32 samples/sec Loss 12.1211 LearningRate 0.2863 Epoch: 1 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:54:47,511-Speed 5461.88 samples/sec Loss 12.0481 LearningRate 0.2862 Epoch: 1 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:54:55,127-Speed 5379.22 samples/sec Loss 12.2598 LearningRate 0.2862 Epoch: 1 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:55:02,649-Speed 5445.51 samples/sec Loss 12.1879 LearningRate 0.2862 Epoch: 1 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:55:10,200-Speed 5425.65 samples/sec Loss 12.1482 LearningRate 0.2862 Epoch: 1 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:55:17,772-Speed 5410.16 samples/sec Loss 12.1169 LearningRate 0.2861 Epoch: 1 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:55:25,308-Speed 5435.82 samples/sec Loss 12.1402 LearningRate 0.2861 Epoch: 1 Global Step: 14990 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:55:32,922-Speed 5380.65 samples/sec Loss 12.2179 LearningRate 0.2861 Epoch: 1 Global Step: 15000 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:55:40,484-Speed 5416.40 samples/sec Loss 12.1599 LearningRate 0.2860 Epoch: 1 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:55:47,975-Speed 5469.22 samples/sec Loss 12.1988 LearningRate 0.2860 Epoch: 1 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:55:55,486-Speed 5454.27 samples/sec Loss 12.1944 LearningRate 0.2860 Epoch: 1 Global Step: 15030 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:03,143-Speed 5349.71 samples/sec Loss 12.1404 LearningRate 0.2859 Epoch: 1 Global Step: 15040 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:10,746-Speed 5387.99 
samples/sec Loss 12.0876 LearningRate 0.2859 Epoch: 1 Global Step: 15050 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:18,416-Speed 5340.59 samples/sec Loss 12.1564 LearningRate 0.2859 Epoch: 1 Global Step: 15060 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:26,032-Speed 5379.52 samples/sec Loss 12.1160 LearningRate 0.2859 Epoch: 1 Global Step: 15070 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:33,596-Speed 5415.50 samples/sec Loss 12.0452 LearningRate 0.2858 Epoch: 1 Global Step: 15080 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:41,047-Speed 5497.88 samples/sec Loss 12.2411 LearningRate 0.2858 Epoch: 1 Global Step: 15090 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:48,567-Speed 5447.68 samples/sec Loss 12.1706 LearningRate 0.2858 Epoch: 1 Global Step: 15100 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 21:56:56,134-Speed 5413.70 samples/sec Loss 12.2155 LearningRate 0.2857 Epoch: 1 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:03,690-Speed 5422.02 samples/sec Loss 12.1229 LearningRate 0.2857 Epoch: 1 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:11,347-Speed 5350.17 samples/sec Loss 12.2142 LearningRate 0.2857 Epoch: 1 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:18,880-Speed 5438.37 samples/sec Loss 12.1915 LearningRate 0.2856 Epoch: 1 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:26,479-Speed 5390.44 samples/sec Loss 12.1047 LearningRate 0.2856 Epoch: 1 Global Step: 15150 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:33,988-Speed 5455.58 samples/sec Loss 12.1248 LearningRate 0.2856 Epoch: 1 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:41,499-Speed 5453.98 samples/sec Loss 12.1798 LearningRate 
0.2856 Epoch: 1 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:49,030-Speed 5439.93 samples/sec Loss 12.1065 LearningRate 0.2855 Epoch: 1 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:57:56,615-Speed 5400.95 samples/sec Loss 12.0946 LearningRate 0.2855 Epoch: 1 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:04,153-Speed 5434.01 samples/sec Loss 12.0681 LearningRate 0.2855 Epoch: 1 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:11,722-Speed 5412.29 samples/sec Loss 12.0872 LearningRate 0.2854 Epoch: 1 Global Step: 15210 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 21:58:19,373-Speed 5354.74 samples/sec Loss 12.1810 LearningRate 0.2854 Epoch: 1 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:26,832-Speed 5491.43 samples/sec Loss 12.1379 LearningRate 0.2854 Epoch: 1 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:34,469-Speed 5364.53 samples/sec Loss 12.1496 LearningRate 0.2853 Epoch: 1 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:42,161-Speed 5325.15 samples/sec Loss 12.0580 LearningRate 0.2853 Epoch: 1 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:49,676-Speed 5451.54 samples/sec Loss 12.0333 LearningRate 0.2853 Epoch: 1 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:58:57,379-Speed 5317.88 samples/sec Loss 12.1592 LearningRate 0.2853 Epoch: 1 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:04,911-Speed 5438.97 samples/sec Loss 12.0997 LearningRate 0.2852 Epoch: 1 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:12,416-Speed 5458.25 samples/sec Loss 12.1326 LearningRate 0.2852 Epoch: 1 Global Step: 
15290 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:20,103-Speed 5329.20 samples/sec Loss 12.0860 LearningRate 0.2852 Epoch: 1 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:27,517-Speed 5525.48 samples/sec Loss 12.0671 LearningRate 0.2851 Epoch: 1 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:35,069-Speed 5424.54 samples/sec Loss 12.1547 LearningRate 0.2851 Epoch: 1 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:42,546-Speed 5478.35 samples/sec Loss 12.0429 LearningRate 0.2851 Epoch: 1 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:50,046-Speed 5461.93 samples/sec Loss 12.0837 LearningRate 0.2851 Epoch: 1 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 21:59:57,525-Speed 5477.69 samples/sec Loss 12.0140 LearningRate 0.2850 Epoch: 1 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:00:05,171-Speed 5357.91 samples/sec Loss 12.1251 LearningRate 0.2850 Epoch: 1 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:00:12,812-Speed 5360.90 samples/sec Loss 12.0295 LearningRate 0.2850 Epoch: 1 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:00:20,340-Speed 5442.10 samples/sec Loss 12.1015 LearningRate 0.2849 Epoch: 1 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:00:27,873-Speed 5437.65 samples/sec Loss 12.1639 LearningRate 0.2849 Epoch: 1 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:00:35,488-Speed 5380.10 samples/sec Loss 12.0662 LearningRate 0.2849 Epoch: 1 Global Step: 15400 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:00:43,102-Speed 5380.02 samples/sec Loss 12.0423 LearningRate 0.2848 Epoch: 1 Global Step: 15410 Fp16 Grad Scale: 131072 
Required: 44 hours Training: 2022-01-07 22:00:50,591-Speed 5470.23 samples/sec Loss 12.0610 LearningRate 0.2848 Epoch: 1 Global Step: 15420 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 22:00:58,101-Speed 5454.83 samples/sec Loss 12.1212 LearningRate 0.2848 Epoch: 1 Global Step: 15430 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:01:05,648-Speed 5427.88 samples/sec Loss 12.0466 LearningRate 0.2848 Epoch: 1 Global Step: 15440 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:01:13,201-Speed 5424.08 samples/sec Loss 11.9914 LearningRate 0.2847 Epoch: 1 Global Step: 15450 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:01:20,754-Speed 5423.72 samples/sec Loss 12.1407 LearningRate 0.2847 Epoch: 1 Global Step: 15460 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:01:28,382-Speed 5370.06 samples/sec Loss 12.1449 LearningRate 0.2847 Epoch: 1 Global Step: 15470 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:01:35,887-Speed 5458.68 samples/sec Loss 12.1123 LearningRate 0.2846 Epoch: 1 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:01:43,332-Speed 5502.25 samples/sec Loss 12.1090 LearningRate 0.2846 Epoch: 1 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:01:50,856-Speed 5445.04 samples/sec Loss 12.0826 LearningRate 0.2846 Epoch: 1 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:01:58,537-Speed 5332.86 samples/sec Loss 12.0643 LearningRate 0.2845 Epoch: 1 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:02:06,236-Speed 5320.73 samples/sec Loss 11.9840 LearningRate 0.2845 Epoch: 1 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:02:13,836-Speed 5390.70 samples/sec Loss 12.0048 LearningRate 0.2845 Epoch: 1 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 43 hours Training: 
2022-01-07 22:02:21,538-Speed 5318.34 samples/sec Loss 12.0624 LearningRate 0.2845 Epoch: 1 Global Step: 15540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:02:29,458-Speed 5172.29 samples/sec Loss 12.0722 LearningRate 0.2844 Epoch: 1 Global Step: 15550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:02:37,066-Speed 5384.95 samples/sec Loss 11.9984 LearningRate 0.2844 Epoch: 1 Global Step: 15560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:02:44,641-Speed 5407.82 samples/sec Loss 12.1318 LearningRate 0.2844 Epoch: 1 Global Step: 15570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:02:52,132-Speed 5468.64 samples/sec Loss 12.0888 LearningRate 0.2843 Epoch: 1 Global Step: 15580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:02:59,669-Speed 5434.63 samples/sec Loss 12.0377 LearningRate 0.2843 Epoch: 1 Global Step: 15590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:03:07,191-Speed 5446.15 samples/sec Loss 12.0455 LearningRate 0.2843 Epoch: 1 Global Step: 15600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:03:14,766-Speed 5408.31 samples/sec Loss 12.0350 LearningRate 0.2843 Epoch: 1 Global Step: 15610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:03:22,402-Speed 5364.63 samples/sec Loss 12.1025 LearningRate 0.2842 Epoch: 1 Global Step: 15620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:03:29,908-Speed 5457.56 samples/sec Loss 12.0606 LearningRate 0.2842 Epoch: 1 Global Step: 15630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:03:37,464-Speed 5421.47 samples/sec Loss 12.0020 LearningRate 0.2842 Epoch: 1 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:03:45,015-Speed 5425.40 samples/sec Loss 12.0277 LearningRate 0.2841 Epoch: 1 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:03:52,469-Speed 5496.06 
samples/sec Loss 12.0875 LearningRate 0.2841 Epoch: 1 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:03:59,974-Speed 5458.14 samples/sec Loss 12.0739 LearningRate 0.2841 Epoch: 1 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:07,471-Speed 5464.08 samples/sec Loss 11.9298 LearningRate 0.2840 Epoch: 1 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:14,966-Speed 5467.51 samples/sec Loss 12.1075 LearningRate 0.2840 Epoch: 1 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:22,418-Speed 5496.85 samples/sec Loss 12.0433 LearningRate 0.2840 Epoch: 1 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:29,987-Speed 5412.73 samples/sec Loss 11.9214 LearningRate 0.2840 Epoch: 1 Global Step: 15710 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:37,529-Speed 5431.12 samples/sec Loss 11.9440 LearningRate 0.2839 Epoch: 1 Global Step: 15720 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:45,024-Speed 5466.17 samples/sec Loss 12.0464 LearningRate 0.2839 Epoch: 1 Global Step: 15730 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:04:52,522-Speed 5463.38 samples/sec Loss 12.0619 LearningRate 0.2839 Epoch: 1 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:00,010-Speed 5470.88 samples/sec Loss 12.0800 LearningRate 0.2838 Epoch: 1 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:07,604-Speed 5394.80 samples/sec Loss 12.0668 LearningRate 0.2838 Epoch: 1 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:15,091-Speed 5471.56 samples/sec Loss 12.0013 LearningRate 0.2838 Epoch: 1 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:22,670-Speed 5405.36 samples/sec Loss 11.9690 LearningRate 
0.2837 Epoch: 1 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:30,270-Speed 5389.76 samples/sec Loss 11.9724 LearningRate 0.2837 Epoch: 1 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:37,828-Speed 5420.15 samples/sec Loss 11.9900 LearningRate 0.2837 Epoch: 1 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:45,346-Speed 5449.41 samples/sec Loss 11.9519 LearningRate 0.2837 Epoch: 1 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:05:52,875-Speed 5440.79 samples/sec Loss 12.0236 LearningRate 0.2836 Epoch: 1 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:06:00,379-Speed 5459.36 samples/sec Loss 11.9751 LearningRate 0.2836 Epoch: 1 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:06:07,872-Speed 5467.33 samples/sec Loss 12.0442 LearningRate 0.2836 Epoch: 1 Global Step: 15840 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:06:15,334-Speed 5489.21 samples/sec Loss 12.0976 LearningRate 0.2835 Epoch: 1 Global Step: 15850 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:06:22,763-Speed 5514.94 samples/sec Loss 11.9989 LearningRate 0.2835 Epoch: 1 Global Step: 15860 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:06:30,252-Speed 5469.88 samples/sec Loss 11.9487 LearningRate 0.2835 Epoch: 1 Global Step: 15870 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:06:37,740-Speed 5470.65 samples/sec Loss 12.0144 LearningRate 0.2835 Epoch: 1 Global Step: 15880 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:06:45,164-Speed 5518.48 samples/sec Loss 12.0102 LearningRate 0.2834 Epoch: 1 Global Step: 15890 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:06:52,640-Speed 5479.70 samples/sec Loss 11.9485 LearningRate 0.2834 Epoch: 1 Global Step: 
15900 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:00,139-Speed 5463.12 samples/sec Loss 11.9680 LearningRate 0.2834 Epoch: 1 Global Step: 15910 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:07,727-Speed 5398.07 samples/sec Loss 12.0135 LearningRate 0.2833 Epoch: 1 Global Step: 15920 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:15,225-Speed 5463.60 samples/sec Loss 11.9721 LearningRate 0.2833 Epoch: 1 Global Step: 15930 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:22,626-Speed 5535.29 samples/sec Loss 11.9583 LearningRate 0.2833 Epoch: 1 Global Step: 15940 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:30,098-Speed 5482.97 samples/sec Loss 12.0381 LearningRate 0.2832 Epoch: 1 Global Step: 15950 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:37,644-Speed 5428.16 samples/sec Loss 11.9562 LearningRate 0.2832 Epoch: 1 Global Step: 15960 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:45,186-Speed 5432.29 samples/sec Loss 11.9560 LearningRate 0.2832 Epoch: 1 Global Step: 15970 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:07:52,745-Speed 5419.50 samples/sec Loss 12.0759 LearningRate 0.2832 Epoch: 1 Global Step: 15980 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:08:00,226-Speed 5475.87 samples/sec Loss 11.9686 LearningRate 0.2831 Epoch: 1 Global Step: 15990 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:08:07,733-Speed 5456.95 samples/sec Loss 12.0538 LearningRate 0.2831 Epoch: 1 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:08:52,706-[lfw][16000]XNorm: 21.761526 Training: 2022-01-07 22:08:52,707-[lfw][16000]Accuracy-Flip: 0.99650+-0.00320 Training: 2022-01-07 22:08:52,707-[lfw][16000]Accuracy-Highest: 0.99650 Training: 2022-01-07 22:09:45,165-[cfp_fp][16000]XNorm: 19.362699 Training: 2022-01-07 
22:09:45,168-[cfp_fp][16000]Accuracy-Flip: 0.97871+-0.00891 Training: 2022-01-07 22:09:45,168-[cfp_fp][16000]Accuracy-Highest: 0.97871 Training: 2022-01-07 22:10:30,781-[agedb_30][16000]XNorm: 21.484574 Training: 2022-01-07 22:10:30,782-[agedb_30][16000]Accuracy-Flip: 0.96367+-0.00942 Training: 2022-01-07 22:10:30,783-[agedb_30][16000]Accuracy-Highest: 0.96367 Training: 2022-01-07 22:10:38,427-Speed 271.81 samples/sec Loss 11.9727 LearningRate 0.2831 Epoch: 1 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:10:45,922-Speed 5466.15 samples/sec Loss 12.0170 LearningRate 0.2830 Epoch: 1 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:10:53,428-Speed 5457.66 samples/sec Loss 11.8415 LearningRate 0.2830 Epoch: 1 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:00,996-Speed 5413.80 samples/sec Loss 11.9885 LearningRate 0.2830 Epoch: 1 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:08,496-Speed 5462.99 samples/sec Loss 11.9564 LearningRate 0.2829 Epoch: 1 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:15,997-Speed 5461.22 samples/sec Loss 11.9800 LearningRate 0.2829 Epoch: 1 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:23,461-Speed 5488.89 samples/sec Loss 11.9398 LearningRate 0.2829 Epoch: 1 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:31,383-Speed 5170.81 samples/sec Loss 11.9938 LearningRate 0.2829 Epoch: 1 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:38,932-Speed 5426.14 samples/sec Loss 11.9241 LearningRate 0.2828 Epoch: 1 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:11:46,409-Speed 5479.00 samples/sec Loss 11.9227 LearningRate 0.2828 Epoch: 1 Global Step: 16100 Fp16 Grad Scale: 262144 Required: 44 
hours Training: 2022-01-07 22:11:53,912-Speed 5460.03 samples/sec Loss 11.9694 LearningRate 0.2828 Epoch: 1 Global Step: 16110 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 22:12:01,408-Speed 5464.80 samples/sec Loss 11.8951 LearningRate 0.2827 Epoch: 1 Global Step: 16120 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 22:12:08,924-Speed 5449.95 samples/sec Loss 12.1204 LearningRate 0.2827 Epoch: 1 Global Step: 16130 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 22:12:16,417-Speed 5467.63 samples/sec Loss 11.9497 LearningRate 0.2827 Epoch: 1 Global Step: 16140 Fp16 Grad Scale: 262144 Required: 44 hours Training: 2022-01-07 22:12:23,863-Speed 5501.47 samples/sec Loss 12.0267 LearningRate 0.2827 Epoch: 1 Global Step: 16150 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:12:31,368-Speed 5458.60 samples/sec Loss 11.9111 LearningRate 0.2826 Epoch: 1 Global Step: 16160 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:12:38,962-Speed 5394.27 samples/sec Loss 11.9392 LearningRate 0.2826 Epoch: 1 Global Step: 16170 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:12:46,565-Speed 5388.55 samples/sec Loss 11.9020 LearningRate 0.2826 Epoch: 1 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:12:54,084-Speed 5448.39 samples/sec Loss 11.9080 LearningRate 0.2825 Epoch: 1 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:13:01,544-Speed 5490.87 samples/sec Loss 11.9647 LearningRate 0.2825 Epoch: 1 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:13:09,059-Speed 5451.24 samples/sec Loss 11.9416 LearningRate 0.2825 Epoch: 1 Global Step: 16210 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:13:16,515-Speed 5494.22 samples/sec Loss 12.0265 LearningRate 0.2824 Epoch: 1 Global Step: 16220 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 
22:13:23,964-Speed 5499.84 samples/sec Loss 11.8712 LearningRate 0.2824 Epoch: 1 Global Step: 16230 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:13:31,424-Speed 5490.64 samples/sec Loss 11.8683 LearningRate 0.2824 Epoch: 1 Global Step: 16240 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:13:38,914-Speed 5469.56 samples/sec Loss 11.9481 LearningRate 0.2824 Epoch: 1 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:13:46,362-Speed 5500.06 samples/sec Loss 11.8664 LearningRate 0.2823 Epoch: 1 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:13:53,973-Speed 5382.15 samples/sec Loss 11.8610 LearningRate 0.2823 Epoch: 1 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:14:01,473-Speed 5462.43 samples/sec Loss 11.9150 LearningRate 0.2823 Epoch: 1 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:14:08,899-Speed 5516.48 samples/sec Loss 11.8809 LearningRate 0.2822 Epoch: 1 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:14:16,469-Speed 5411.49 samples/sec Loss 11.8689 LearningRate 0.2822 Epoch: 1 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:14:23,945-Speed 5478.95 samples/sec Loss 11.8531 LearningRate 0.2822 Epoch: 1 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:14:31,540-Speed 5394.16 samples/sec Loss 12.0001 LearningRate 0.2821 Epoch: 1 Global Step: 16320 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:14:39,032-Speed 5467.45 samples/sec Loss 11.9334 LearningRate 0.2821 Epoch: 1 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:14:46,545-Speed 5453.09 samples/sec Loss 11.9738 LearningRate 0.2821 Epoch: 1 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:14:54,031-Speed 5472.08 samples/sec 
Loss 11.8681 LearningRate 0.2821 Epoch: 1 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:15:01,643-Speed 5381.72 samples/sec Loss 11.8746 LearningRate 0.2820 Epoch: 1 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:15:09,141-Speed 5463.98 samples/sec Loss 11.9397 LearningRate 0.2820 Epoch: 1 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:15:16,645-Speed 5458.89 samples/sec Loss 11.9901 LearningRate 0.2820 Epoch: 1 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:15:24,219-Speed 5408.54 samples/sec Loss 11.8824 LearningRate 0.2819 Epoch: 1 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:15:31,897-Speed 5335.76 samples/sec Loss 11.8579 LearningRate 0.2819 Epoch: 1 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:15:39,392-Speed 5465.62 samples/sec Loss 11.9369 LearningRate 0.2819 Epoch: 1 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:15:47,090-Speed 5321.62 samples/sec Loss 11.8939 LearningRate 0.2819 Epoch: 1 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:15:54,566-Speed 5479.91 samples/sec Loss 11.9192 LearningRate 0.2818 Epoch: 1 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:16:02,033-Speed 5485.91 samples/sec Loss 11.9696 LearningRate 0.2818 Epoch: 1 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 44 hours Training: 2022-01-07 22:16:09,582-Speed 5426.34 samples/sec Loss 11.8898 LearningRate 0.2818 Epoch: 1 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:16:17,039-Speed 5493.71 samples/sec Loss 11.9455 LearningRate 0.2817 Epoch: 1 Global Step: 16460 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:16:24,537-Speed 5463.82 samples/sec Loss 11.9215 LearningRate 0.2817 
Epoch: 1 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:16:32,018-Speed 5476.14 samples/sec Loss 11.9503 LearningRate 0.2817 Epoch: 1 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:16:39,643-Speed 5372.24 samples/sec Loss 11.7922 LearningRate 0.2816 Epoch: 1 Global Step: 16490 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-01-07 22:16:47,134-Speed 5468.79 samples/sec Loss 11.9215 LearningRate 0.2816 Epoch: 1 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:16:54,674-Speed 5432.86 samples/sec Loss 11.8768 LearningRate 0.2816 Epoch: 1 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:17:02,187-Speed 5453.17 samples/sec Loss 11.9417 LearningRate 0.2816 Epoch: 1 Global Step: 16520 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:17:09,766-Speed 5405.13 samples/sec Loss 11.9013 LearningRate 0.2815 Epoch: 1 Global Step: 16530 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:17:17,358-Speed 5396.26 samples/sec Loss 11.8647 LearningRate 0.2815 Epoch: 1 Global Step: 16540 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:17:24,850-Speed 5467.70 samples/sec Loss 11.8029 LearningRate 0.2815 Epoch: 1 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:17:32,327-Speed 5478.98 samples/sec Loss 11.9163 LearningRate 0.2814 Epoch: 1 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:17:39,861-Speed 5437.48 samples/sec Loss 11.9220 LearningRate 0.2814 Epoch: 1 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:17:47,310-Speed 5499.24 samples/sec Loss 11.8637 LearningRate 0.2814 Epoch: 1 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:17:54,829-Speed 5448.60 samples/sec Loss 11.8443 LearningRate 0.2814 Epoch: 1 Global Step: 16590 Fp16 Grad 
Scale: 131072 Required: 43 hours Training: 2022-01-07 22:18:02,321-Speed 5467.60 samples/sec Loss 11.9191 LearningRate 0.2813 Epoch: 1 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:18:09,916-Speed 5393.68 samples/sec Loss 11.9200 LearningRate 0.2813 Epoch: 1 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:18:17,428-Speed 5453.99 samples/sec Loss 11.9130 LearningRate 0.2813 Epoch: 1 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:18:24,921-Speed 5467.03 samples/sec Loss 11.8297 LearningRate 0.2812 Epoch: 1 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:18:32,409-Speed 5470.63 samples/sec Loss 11.8171 LearningRate 0.2812 Epoch: 1 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:18:39,971-Speed 5417.28 samples/sec Loss 11.8007 LearningRate 0.2812 Epoch: 1 Global Step: 16650 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:18:47,457-Speed 5472.34 samples/sec Loss 11.8565 LearningRate 0.2811 Epoch: 1 Global Step: 16660 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:18:54,915-Speed 5493.02 samples/sec Loss 11.8022 LearningRate 0.2811 Epoch: 1 Global Step: 16670 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:19:02,451-Speed 5435.92 samples/sec Loss 11.8642 LearningRate 0.2811 Epoch: 1 Global Step: 16680 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:19:09,956-Speed 5458.24 samples/sec Loss 11.8389 LearningRate 0.2811 Epoch: 1 Global Step: 16690 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:19:17,452-Speed 5465.43 samples/sec Loss 11.8730 LearningRate 0.2810 Epoch: 1 Global Step: 16700 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:19:24,891-Speed 5506.59 samples/sec Loss 11.9409 LearningRate 0.2810 Epoch: 1 Global Step: 16710 Fp16 Grad Scale: 262144 Required: 43 
hours Training: 2022-01-07 22:19:32,352-Speed 5490.90 samples/sec Loss 11.8849 LearningRate 0.2810 Epoch: 1 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:19:39,803-Speed 5497.69 samples/sec Loss 11.8794 LearningRate 0.2809 Epoch: 1 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:19:47,392-Speed 5398.40 samples/sec Loss 11.8603 LearningRate 0.2809 Epoch: 1 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:19:54,960-Speed 5412.62 samples/sec Loss 11.8432 LearningRate 0.2809 Epoch: 1 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:02,484-Speed 5445.15 samples/sec Loss 11.8204 LearningRate 0.2809 Epoch: 1 Global Step: 16760 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:09,971-Speed 5472.04 samples/sec Loss 11.8541 LearningRate 0.2808 Epoch: 1 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:17,424-Speed 5496.43 samples/sec Loss 11.7822 LearningRate 0.2808 Epoch: 1 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:24,983-Speed 5419.25 samples/sec Loss 11.8950 LearningRate 0.2808 Epoch: 1 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:32,563-Speed 5405.25 samples/sec Loss 11.8269 LearningRate 0.2807 Epoch: 1 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:40,119-Speed 5421.64 samples/sec Loss 11.8345 LearningRate 0.2807 Epoch: 1 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:20:47,659-Speed 5432.91 samples/sec Loss 11.8710 LearningRate 0.2807 Epoch: 1 Global Step: 16820 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:20:55,086-Speed 5515.62 samples/sec Loss 11.8998 LearningRate 0.2806 Epoch: 1 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 
22:21:02,670-Speed 5401.91 samples/sec Loss 11.8243 LearningRate 0.2806 Epoch: 1 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:10,160-Speed 5469.34 samples/sec Loss 11.8116 LearningRate 0.2806 Epoch: 1 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:17,569-Speed 5528.69 samples/sec Loss 11.8418 LearningRate 0.2806 Epoch: 1 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:25,036-Speed 5486.52 samples/sec Loss 11.8848 LearningRate 0.2805 Epoch: 1 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:32,592-Speed 5421.64 samples/sec Loss 11.7758 LearningRate 0.2805 Epoch: 1 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:40,143-Speed 5425.20 samples/sec Loss 11.8796 LearningRate 0.2805 Epoch: 1 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:47,645-Speed 5460.54 samples/sec Loss 11.8587 LearningRate 0.2804 Epoch: 1 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:21:55,075-Speed 5512.82 samples/sec Loss 11.8005 LearningRate 0.2804 Epoch: 1 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:22:02,771-Speed 5323.29 samples/sec Loss 11.7888 LearningRate 0.2804 Epoch: 1 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:22:10,483-Speed 5311.83 samples/sec Loss 11.9554 LearningRate 0.2804 Epoch: 1 Global Step: 16930 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:22:17,930-Speed 5501.09 samples/sec Loss 11.7596 LearningRate 0.2803 Epoch: 1 Global Step: 16940 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:22:25,405-Speed 5480.33 samples/sec Loss 11.8170 LearningRate 0.2803 Epoch: 1 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:22:32,888-Speed 5474.12 
samples/sec Loss 11.7901 LearningRate 0.2803 Epoch: 1 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:22:40,380-Speed 5467.95 samples/sec Loss 11.8136 LearningRate 0.2802 Epoch: 1 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:22:47,915-Speed 5436.67 samples/sec Loss 11.7885 LearningRate 0.2802 Epoch: 1 Global Step: 16980 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:22:55,416-Speed 5461.21 samples/sec Loss 11.8070 LearningRate 0.2802 Epoch: 1 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:23:02,890-Speed 5480.84 samples/sec Loss 11.7453 LearningRate 0.2801 Epoch: 1 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:23:10,350-Speed 5491.27 samples/sec Loss 11.8360 LearningRate 0.2801 Epoch: 1 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:23:17,929-Speed 5405.06 samples/sec Loss 11.7457 LearningRate 0.2801 Epoch: 1 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:23:25,468-Speed 5433.98 samples/sec Loss 11.8287 LearningRate 0.2801 Epoch: 1 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:23:33,068-Speed 5390.39 samples/sec Loss 11.8810 LearningRate 0.2800 Epoch: 1 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:23:40,675-Speed 5385.04 samples/sec Loss 11.8252 LearningRate 0.2800 Epoch: 1 Global Step: 17050 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:23:48,292-Speed 5378.49 samples/sec Loss 11.7770 LearningRate 0.2800 Epoch: 1 Global Step: 17060 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:23:55,775-Speed 5474.33 samples/sec Loss 11.7805 LearningRate 0.2799 Epoch: 1 Global Step: 17070 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:03,387-Speed 5381.56 samples/sec Loss 11.7377 LearningRate 
0.2799 Epoch: 1 Global Step: 17080 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:10,846-Speed 5492.04 samples/sec Loss 11.8516 LearningRate 0.2799 Epoch: 1 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:18,317-Speed 5484.01 samples/sec Loss 11.7803 LearningRate 0.2799 Epoch: 1 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:25,789-Speed 5482.06 samples/sec Loss 11.7730 LearningRate 0.2798 Epoch: 1 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:33,318-Speed 5440.58 samples/sec Loss 11.7852 LearningRate 0.2798 Epoch: 1 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:40,821-Speed 5460.86 samples/sec Loss 11.7631 LearningRate 0.2798 Epoch: 1 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:48,546-Speed 5303.10 samples/sec Loss 11.7879 LearningRate 0.2797 Epoch: 1 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:24:56,270-Speed 5302.75 samples/sec Loss 11.8560 LearningRate 0.2797 Epoch: 1 Global Step: 17150 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:25:03,979-Speed 5314.55 samples/sec Loss 11.8337 LearningRate 0.2797 Epoch: 1 Global Step: 17160 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:25:11,669-Speed 5327.17 samples/sec Loss 11.7590 LearningRate 0.2796 Epoch: 1 Global Step: 17170 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:25:19,348-Speed 5334.55 samples/sec Loss 11.7980 LearningRate 0.2796 Epoch: 1 Global Step: 17180 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:25:26,928-Speed 5404.54 samples/sec Loss 11.7557 LearningRate 0.2796 Epoch: 1 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:25:34,429-Speed 5461.18 samples/sec Loss 11.7251 LearningRate 0.2796 Epoch: 1 Global Step: 
17200 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:25:41,885-Speed 5494.25 samples/sec Loss 11.7396 LearningRate 0.2795 Epoch: 1 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:25:49,348-Speed 5489.67 samples/sec Loss 11.7647 LearningRate 0.2795 Epoch: 1 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:25:56,785-Speed 5507.96 samples/sec Loss 11.7440 LearningRate 0.2795 Epoch: 1 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:26:04,288-Speed 5460.22 samples/sec Loss 11.6905 LearningRate 0.2794 Epoch: 1 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:26:11,768-Speed 5476.57 samples/sec Loss 11.7251 LearningRate 0.2794 Epoch: 1 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:26:19,293-Speed 5444.51 samples/sec Loss 11.7953 LearningRate 0.2794 Epoch: 1 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:26:26,808-Speed 5450.64 samples/sec Loss 11.7203 LearningRate 0.2794 Epoch: 1 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:26:34,325-Speed 5449.57 samples/sec Loss 11.6925 LearningRate 0.2793 Epoch: 1 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:26:41,780-Speed 5495.61 samples/sec Loss 11.6674 LearningRate 0.2793 Epoch: 1 Global Step: 17290 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:26:49,319-Speed 5433.24 samples/sec Loss 11.8251 LearningRate 0.2793 Epoch: 1 Global Step: 17300 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:26:56,976-Speed 5350.33 samples/sec Loss 11.7053 LearningRate 0.2792 Epoch: 1 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:04,465-Speed 5469.96 samples/sec Loss 11.7708 LearningRate 0.2792 Epoch: 1 Global Step: 17320 Fp16 Grad Scale: 131072 
Required: 43 hours Training: 2022-01-07 22:27:11,936-Speed 5483.64 samples/sec Loss 11.7606 LearningRate 0.2792 Epoch: 1 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:19,379-Speed 5503.55 samples/sec Loss 11.7131 LearningRate 0.2791 Epoch: 1 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:26,991-Speed 5381.48 samples/sec Loss 11.8009 LearningRate 0.2791 Epoch: 1 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:34,487-Speed 5465.82 samples/sec Loss 11.7101 LearningRate 0.2791 Epoch: 1 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:42,156-Speed 5341.79 samples/sec Loss 11.7035 LearningRate 0.2791 Epoch: 1 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:49,907-Speed 5285.26 samples/sec Loss 11.7666 LearningRate 0.2790 Epoch: 1 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:27:57,627-Speed 5305.76 samples/sec Loss 11.6701 LearningRate 0.2790 Epoch: 1 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:28:05,231-Speed 5387.94 samples/sec Loss 11.7415 LearningRate 0.2790 Epoch: 1 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:28:12,745-Speed 5451.94 samples/sec Loss 11.8421 LearningRate 0.2789 Epoch: 1 Global Step: 17410 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:28:20,366-Speed 5375.23 samples/sec Loss 11.7053 LearningRate 0.2789 Epoch: 1 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:28:27,897-Speed 5439.01 samples/sec Loss 11.5990 LearningRate 0.2789 Epoch: 1 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:28:35,322-Speed 5517.99 samples/sec Loss 11.7936 LearningRate 0.2789 Epoch: 1 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 43 hours Training: 
2022-01-07 22:28:42,799-Speed 5478.28 samples/sec Loss 11.6749 LearningRate 0.2788 Epoch: 1 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:28:50,281-Speed 5475.19 samples/sec Loss 11.7122 LearningRate 0.2788 Epoch: 1 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:28:57,963-Speed 5333.01 samples/sec Loss 11.6976 LearningRate 0.2788 Epoch: 1 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:29:05,451-Speed 5470.50 samples/sec Loss 11.7080 LearningRate 0.2787 Epoch: 1 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:29:12,878-Speed 5515.84 samples/sec Loss 11.7594 LearningRate 0.2787 Epoch: 1 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:29:20,297-Speed 5521.35 samples/sec Loss 11.7628 LearningRate 0.2787 Epoch: 1 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:29:27,802-Speed 5458.66 samples/sec Loss 11.7693 LearningRate 0.2786 Epoch: 1 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:29:35,251-Speed 5498.99 samples/sec Loss 11.8484 LearningRate 0.2786 Epoch: 1 Global Step: 17520 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:29:42,788-Speed 5435.57 samples/sec Loss 11.7212 LearningRate 0.2786 Epoch: 1 Global Step: 17530 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:29:50,279-Speed 5469.06 samples/sec Loss 11.6231 LearningRate 0.2786 Epoch: 1 Global Step: 17540 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:29:57,708-Speed 5514.22 samples/sec Loss 11.7298 LearningRate 0.2785 Epoch: 1 Global Step: 17550 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:30:05,183-Speed 5479.75 samples/sec Loss 11.6099 LearningRate 0.2785 Epoch: 1 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:30:12,601-Speed 
5522.61 samples/sec Loss 11.7400 LearningRate 0.2785 Epoch: 1 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:30:20,133-Speed 5439.16 samples/sec Loss 11.7175 LearningRate 0.2784 Epoch: 1 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:30:27,593-Speed 5491.35 samples/sec Loss 11.7141 LearningRate 0.2784 Epoch: 1 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:30:35,037-Speed 5503.14 samples/sec Loss 11.6389 LearningRate 0.2784 Epoch: 1 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:30:42,586-Speed 5426.94 samples/sec Loss 11.6348 LearningRate 0.2784 Epoch: 1 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:30:50,042-Speed 5493.85 samples/sec Loss 11.7030 LearningRate 0.2783 Epoch: 1 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:30:57,542-Speed 5462.18 samples/sec Loss 11.6822 LearningRate 0.2783 Epoch: 1 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:05,039-Speed 5464.01 samples/sec Loss 11.7790 LearningRate 0.2783 Epoch: 1 Global Step: 17640 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:12,524-Speed 5472.98 samples/sec Loss 11.7627 LearningRate 0.2782 Epoch: 1 Global Step: 17650 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:19,988-Speed 5489.03 samples/sec Loss 11.7763 LearningRate 0.2782 Epoch: 1 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:27,473-Speed 5472.58 samples/sec Loss 11.7053 LearningRate 0.2782 Epoch: 1 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:34,953-Speed 5476.82 samples/sec Loss 11.8049 LearningRate 0.2781 Epoch: 1 Global Step: 17680 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:42,423-Speed 5484.18 samples/sec Loss 11.7423 
LearningRate 0.2781 Epoch: 1 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:49,883-Speed 5491.77 samples/sec Loss 11.6406 LearningRate 0.2781 Epoch: 1 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:31:57,499-Speed 5378.65 samples/sec Loss 11.6358 LearningRate 0.2781 Epoch: 1 Global Step: 17710 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:32:05,028-Speed 5441.24 samples/sec Loss 11.7090 LearningRate 0.2780 Epoch: 1 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:12,483-Speed 5495.14 samples/sec Loss 11.7663 LearningRate 0.2780 Epoch: 1 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:20,026-Speed 5431.07 samples/sec Loss 11.6907 LearningRate 0.2780 Epoch: 1 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:27,574-Speed 5427.36 samples/sec Loss 11.7093 LearningRate 0.2779 Epoch: 1 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:35,103-Speed 5440.92 samples/sec Loss 11.5441 LearningRate 0.2779 Epoch: 1 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:42,578-Speed 5480.80 samples/sec Loss 11.6094 LearningRate 0.2779 Epoch: 1 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:50,143-Speed 5415.11 samples/sec Loss 11.6666 LearningRate 0.2779 Epoch: 1 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:32:57,603-Speed 5491.11 samples/sec Loss 11.6631 LearningRate 0.2778 Epoch: 1 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:33:05,067-Speed 5488.43 samples/sec Loss 11.6854 LearningRate 0.2778 Epoch: 1 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:33:12,605-Speed 5434.67 samples/sec Loss 11.7289 LearningRate 0.2778 Epoch: 1 
Global Step: 17810 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:33:20,151-Speed 5428.67 samples/sec Loss 11.7011 LearningRate 0.2777 Epoch: 1 Global Step: 17820 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:33:27,854-Speed 5318.32 samples/sec Loss 11.6350 LearningRate 0.2777 Epoch: 1 Global Step: 17830 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:33:35,418-Speed 5415.00 samples/sec Loss 11.7098 LearningRate 0.2777 Epoch: 1 Global Step: 17840 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:33:43,038-Speed 5376.66 samples/sec Loss 11.7254 LearningRate 0.2776 Epoch: 1 Global Step: 17850 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:33:50,708-Speed 5341.43 samples/sec Loss 11.5757 LearningRate 0.2776 Epoch: 1 Global Step: 17860 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:33:58,285-Speed 5405.95 samples/sec Loss 11.7531 LearningRate 0.2776 Epoch: 1 Global Step: 17870 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:34:05,814-Speed 5441.01 samples/sec Loss 11.6541 LearningRate 0.2776 Epoch: 1 Global Step: 17880 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:34:13,404-Speed 5397.81 samples/sec Loss 11.6639 LearningRate 0.2775 Epoch: 1 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:34:21,009-Speed 5386.12 samples/sec Loss 11.7013 LearningRate 0.2775 Epoch: 1 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:34:28,460-Speed 5498.68 samples/sec Loss 11.6154 LearningRate 0.2775 Epoch: 1 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:34:35,896-Speed 5508.19 samples/sec Loss 11.6060 LearningRate 0.2774 Epoch: 1 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:34:43,437-Speed 5432.77 samples/sec Loss 11.6715 LearningRate 0.2774 Epoch: 1 Global Step: 17930 Fp16 Grad 
Scale: 131072 Required: 43 hours Training: 2022-01-07 22:34:50,935-Speed 5463.68 samples/sec Loss 11.6054 LearningRate 0.2774 Epoch: 1 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:34:58,413-Speed 5478.49 samples/sec Loss 11.6115 LearningRate 0.2774 Epoch: 1 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:35:05,897-Speed 5473.35 samples/sec Loss 11.6630 LearningRate 0.2773 Epoch: 1 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:35:13,363-Speed 5487.35 samples/sec Loss 11.5352 LearningRate 0.2773 Epoch: 1 Global Step: 17970 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:35:20,923-Speed 5418.74 samples/sec Loss 11.6419 LearningRate 0.2773 Epoch: 1 Global Step: 17980 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:35:28,342-Speed 5521.41 samples/sec Loss 11.5856 LearningRate 0.2772 Epoch: 1 Global Step: 17990 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:35:35,799-Speed 5493.57 samples/sec Loss 11.6032 LearningRate 0.2772 Epoch: 1 Global Step: 18000 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:36:20,900-[lfw][18000]XNorm: 22.731087 Training: 2022-01-07 22:36:20,901-[lfw][18000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-01-07 22:36:20,902-[lfw][18000]Accuracy-Highest: 0.99700 Training: 2022-01-07 22:37:13,876-[cfp_fp][18000]XNorm: 20.796297 Training: 2022-01-07 22:37:13,887-[cfp_fp][18000]Accuracy-Flip: 0.97357+-0.00668 Training: 2022-01-07 22:37:13,888-[cfp_fp][18000]Accuracy-Highest: 0.97871 Training: 2022-01-07 22:37:59,194-[agedb_30][18000]XNorm: 22.436626 Training: 2022-01-07 22:37:59,196-[agedb_30][18000]Accuracy-Flip: 0.95167+-0.01080 Training: 2022-01-07 22:37:59,196-[agedb_30][18000]Accuracy-Highest: 0.96367 Training: 2022-01-07 22:38:06,735-Speed 271.38 samples/sec Loss 11.6687 LearningRate 0.2772 Epoch: 1 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 
43 hours Training: 2022-01-07 22:38:14,169-Speed 5510.91 samples/sec Loss 11.5959 LearningRate 0.2772 Epoch: 1 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:38:21,790-Speed 5376.08 samples/sec Loss 11.5058 LearningRate 0.2771 Epoch: 1 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:38:29,250-Speed 5492.74 samples/sec Loss 11.6279 LearningRate 0.2771 Epoch: 1 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:38:36,745-Speed 5465.58 samples/sec Loss 11.6413 LearningRate 0.2771 Epoch: 1 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:38:44,305-Speed 5418.91 samples/sec Loss 11.7038 LearningRate 0.2770 Epoch: 1 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:38:51,790-Speed 5473.45 samples/sec Loss 11.6345 LearningRate 0.2770 Epoch: 1 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:38:59,294-Speed 5458.79 samples/sec Loss 11.7005 LearningRate 0.2770 Epoch: 1 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:06,785-Speed 5468.74 samples/sec Loss 11.6670 LearningRate 0.2769 Epoch: 1 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:14,323-Speed 5434.67 samples/sec Loss 11.5878 LearningRate 0.2769 Epoch: 1 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:21,773-Speed 5499.02 samples/sec Loss 11.5690 LearningRate 0.2769 Epoch: 1 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:29,250-Speed 5478.21 samples/sec Loss 11.6409 LearningRate 0.2769 Epoch: 1 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:36,680-Speed 5513.90 samples/sec Loss 11.6096 LearningRate 0.2768 Epoch: 1 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 
22:39:44,173-Speed 5467.34 samples/sec Loss 11.6669 LearningRate 0.2768 Epoch: 1 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:51,628-Speed 5495.18 samples/sec Loss 11.5295 LearningRate 0.2768 Epoch: 1 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:39:59,160-Speed 5438.33 samples/sec Loss 11.6197 LearningRate 0.2767 Epoch: 1 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:40:06,700-Speed 5433.26 samples/sec Loss 11.5632 LearningRate 0.2767 Epoch: 1 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:40:14,378-Speed 5335.66 samples/sec Loss 11.6391 LearningRate 0.2767 Epoch: 1 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:40:21,818-Speed 5506.42 samples/sec Loss 11.6029 LearningRate 0.2767 Epoch: 1 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:40:29,413-Speed 5393.51 samples/sec Loss 11.6464 LearningRate 0.2766 Epoch: 1 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:40:36,887-Speed 5480.76 samples/sec Loss 11.5001 LearningRate 0.2766 Epoch: 1 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:40:44,338-Speed 5498.07 samples/sec Loss 11.5276 LearningRate 0.2766 Epoch: 1 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:40:51,940-Speed 5389.39 samples/sec Loss 11.6805 LearningRate 0.2765 Epoch: 1 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:40:59,399-Speed 5491.49 samples/sec Loss 11.5713 LearningRate 0.2765 Epoch: 1 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:41:06,861-Speed 5490.19 samples/sec Loss 11.5620 LearningRate 0.2765 Epoch: 1 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:41:14,326-Speed 5487.57 
samples/sec Loss 11.5733 LearningRate 0.2764 Epoch: 1 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:41:21,769-Speed 5504.23 samples/sec Loss 11.7044 LearningRate 0.2764 Epoch: 1 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:41:29,207-Speed 5507.57 samples/sec Loss 11.6645 LearningRate 0.2764 Epoch: 1 Global Step: 18280 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:41:36,756-Speed 5426.60 samples/sec Loss 11.5610 LearningRate 0.2764 Epoch: 1 Global Step: 18290 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:41:44,225-Speed 5484.89 samples/sec Loss 11.5453 LearningRate 0.2763 Epoch: 1 Global Step: 18300 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:41:51,854-Speed 5369.95 samples/sec Loss 11.6351 LearningRate 0.2763 Epoch: 1 Global Step: 18310 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:41:59,399-Speed 5429.33 samples/sec Loss 11.4960 LearningRate 0.2763 Epoch: 1 Global Step: 18320 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:42:06,941-Speed 5431.74 samples/sec Loss 11.5510 LearningRate 0.2762 Epoch: 1 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:42:14,439-Speed 5462.97 samples/sec Loss 11.6441 LearningRate 0.2762 Epoch: 1 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:42:21,910-Speed 5484.08 samples/sec Loss 11.4726 LearningRate 0.2762 Epoch: 1 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:42:29,397-Speed 5471.19 samples/sec Loss 11.5416 LearningRate 0.2762 Epoch: 1 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:42:36,893-Speed 5464.53 samples/sec Loss 11.6207 LearningRate 0.2761 Epoch: 1 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:42:44,404-Speed 5453.99 samples/sec Loss 11.6269 LearningRate 
0.2761 Epoch: 1 Global Step: 18380 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:42:51,928-Speed 5444.84 samples/sec Loss 11.6673 LearningRate 0.2761 Epoch: 1 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:42:59,326-Speed 5537.94 samples/sec Loss 11.6287 LearningRate 0.2760 Epoch: 1 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:06,857-Speed 5439.41 samples/sec Loss 11.5275 LearningRate 0.2760 Epoch: 1 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:14,297-Speed 5506.22 samples/sec Loss 11.4513 LearningRate 0.2760 Epoch: 1 Global Step: 18420 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:21,791-Speed 5466.16 samples/sec Loss 11.5601 LearningRate 0.2760 Epoch: 1 Global Step: 18430 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:29,276-Speed 5473.51 samples/sec Loss 11.5288 LearningRate 0.2759 Epoch: 1 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:36,771-Speed 5465.24 samples/sec Loss 11.6645 LearningRate 0.2759 Epoch: 1 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:44,366-Speed 5393.54 samples/sec Loss 11.5353 LearningRate 0.2759 Epoch: 1 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:51,970-Speed 5387.71 samples/sec Loss 11.5234 LearningRate 0.2758 Epoch: 1 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:43:59,523-Speed 5424.04 samples/sec Loss 11.6624 LearningRate 0.2758 Epoch: 1 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:07,072-Speed 5425.93 samples/sec Loss 11.5955 LearningRate 0.2758 Epoch: 1 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:14,579-Speed 5457.10 samples/sec Loss 11.5886 LearningRate 0.2757 Epoch: 1 Global Step: 
18500 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:22,090-Speed 5454.91 samples/sec Loss 11.5309 LearningRate 0.2757 Epoch: 1 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:29,589-Speed 5462.81 samples/sec Loss 11.5553 LearningRate 0.2757 Epoch: 1 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:37,076-Speed 5470.83 samples/sec Loss 11.5161 LearningRate 0.2757 Epoch: 1 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:44,642-Speed 5415.32 samples/sec Loss 11.6038 LearningRate 0.2756 Epoch: 1 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:52,059-Speed 5522.88 samples/sec Loss 11.5728 LearningRate 0.2756 Epoch: 1 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:44:59,564-Speed 5459.26 samples/sec Loss 11.5534 LearningRate 0.2756 Epoch: 1 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:45:07,147-Speed 5401.97 samples/sec Loss 11.5615 LearningRate 0.2755 Epoch: 1 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:45:14,801-Speed 5352.44 samples/sec Loss 11.5857 LearningRate 0.2755 Epoch: 1 Global Step: 18580 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:45:22,419-Speed 5377.36 samples/sec Loss 11.5504 LearningRate 0.2755 Epoch: 1 Global Step: 18590 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:45:29,939-Speed 5447.23 samples/sec Loss 11.5385 LearningRate 0.2755 Epoch: 1 Global Step: 18600 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:45:37,379-Speed 5506.40 samples/sec Loss 11.5004 LearningRate 0.2754 Epoch: 1 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:45:44,810-Speed 5513.04 samples/sec Loss 11.5655 LearningRate 0.2754 Epoch: 1 Global Step: 18620 Fp16 Grad Scale: 131072 
Required: 43 hours Training: 2022-01-07 22:45:52,248-Speed 5507.22 samples/sec Loss 11.4887 LearningRate 0.2754 Epoch: 1 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:45:59,685-Speed 5508.15 samples/sec Loss 11.5201 LearningRate 0.2753 Epoch: 1 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:46:07,224-Speed 5434.73 samples/sec Loss 11.4626 LearningRate 0.2753 Epoch: 1 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:14,682-Speed 5492.50 samples/sec Loss 11.5354 LearningRate 0.2753 Epoch: 1 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:22,247-Speed 5414.94 samples/sec Loss 11.5782 LearningRate 0.2753 Epoch: 1 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:29,783-Speed 5436.23 samples/sec Loss 11.4759 LearningRate 0.2752 Epoch: 1 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:37,230-Speed 5501.08 samples/sec Loss 11.5291 LearningRate 0.2752 Epoch: 1 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:44,727-Speed 5464.10 samples/sec Loss 11.5255 LearningRate 0.2752 Epoch: 1 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:52,254-Speed 5442.54 samples/sec Loss 11.4855 LearningRate 0.2751 Epoch: 1 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:46:59,841-Speed 5399.42 samples/sec Loss 11.4101 LearningRate 0.2751 Epoch: 1 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:47:07,415-Speed 5409.15 samples/sec Loss 11.5460 LearningRate 0.2751 Epoch: 1 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:47:14,931-Speed 5449.65 samples/sec Loss 11.4671 LearningRate 0.2750 Epoch: 1 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 
22:47:22,403-Speed 5483.16 samples/sec Loss 11.5094 LearningRate 0.2750 Epoch: 1 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:47:29,891-Speed 5470.83 samples/sec Loss 11.5649 LearningRate 0.2750 Epoch: 1 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:47:37,501-Speed 5382.87 samples/sec Loss 11.5317 LearningRate 0.2750 Epoch: 1 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:47:45,053-Speed 5424.22 samples/sec Loss 11.5455 LearningRate 0.2749 Epoch: 1 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:47:52,648-Speed 5393.98 samples/sec Loss 11.4671 LearningRate 0.2749 Epoch: 1 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:48:00,109-Speed 5490.50 samples/sec Loss 11.4732 LearningRate 0.2749 Epoch: 1 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:48:07,608-Speed 5463.46 samples/sec Loss 11.6280 LearningRate 0.2748 Epoch: 1 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:48:15,096-Speed 5470.50 samples/sec Loss 11.4898 LearningRate 0.2748 Epoch: 1 Global Step: 18820 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:48:22,635-Speed 5433.45 samples/sec Loss 11.4922 LearningRate 0.2748 Epoch: 1 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:48:30,161-Speed 5442.87 samples/sec Loss 11.6050 LearningRate 0.2748 Epoch: 1 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:48:37,709-Speed 5427.62 samples/sec Loss 11.4882 LearningRate 0.2747 Epoch: 1 Global Step: 18850 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:48:45,305-Speed 5392.47 samples/sec Loss 11.4735 LearningRate 0.2747 Epoch: 1 Global Step: 18860 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:48:52,825-Speed 5447.95 
samples/sec Loss 11.4748 LearningRate 0.2747 Epoch: 1 Global Step: 18870 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:49:00,349-Speed 5444.63 samples/sec Loss 11.5225 LearningRate 0.2746 Epoch: 1 Global Step: 18880 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:49:07,913-Speed 5415.78 samples/sec Loss 11.5361 LearningRate 0.2746 Epoch: 1 Global Step: 18890 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:49:15,477-Speed 5415.98 samples/sec Loss 11.4828 LearningRate 0.2746 Epoch: 1 Global Step: 18900 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:49:23,059-Speed 5402.13 samples/sec Loss 11.4557 LearningRate 0.2746 Epoch: 1 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:49:30,595-Speed 5436.34 samples/sec Loss 11.5184 LearningRate 0.2745 Epoch: 1 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:49:38,092-Speed 5464.56 samples/sec Loss 11.3805 LearningRate 0.2745 Epoch: 1 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:49:45,596-Speed 5459.15 samples/sec Loss 11.4810 LearningRate 0.2745 Epoch: 1 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:49:53,127-Speed 5439.04 samples/sec Loss 11.5068 LearningRate 0.2744 Epoch: 1 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:50:00,615-Speed 5471.42 samples/sec Loss 11.4278 LearningRate 0.2744 Epoch: 1 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:50:08,125-Speed 5454.69 samples/sec Loss 11.5305 LearningRate 0.2744 Epoch: 1 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:50:15,559-Speed 5510.76 samples/sec Loss 11.4851 LearningRate 0.2743 Epoch: 1 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:50:23,058-Speed 5462.29 samples/sec Loss 11.4366 
LearningRate 0.2743 Epoch: 1 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:50:30,530-Speed 5482.92 samples/sec Loss 11.4488 LearningRate 0.2743 Epoch: 1 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:50:38,057-Speed 5441.94 samples/sec Loss 11.4344 LearningRate 0.2743 Epoch: 1 Global Step: 19010 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:50:45,530-Speed 5482.32 samples/sec Loss 11.4165 LearningRate 0.2742 Epoch: 1 Global Step: 19020 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:50:53,013-Speed 5474.33 samples/sec Loss 11.5135 LearningRate 0.2742 Epoch: 1 Global Step: 19030 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:51:00,467-Speed 5496.16 samples/sec Loss 11.4055 LearningRate 0.2742 Epoch: 1 Global Step: 19040 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:51:07,948-Speed 5475.59 samples/sec Loss 11.4678 LearningRate 0.2741 Epoch: 1 Global Step: 19050 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:51:15,511-Speed 5416.56 samples/sec Loss 11.4741 LearningRate 0.2741 Epoch: 1 Global Step: 19060 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:51:23,121-Speed 5383.64 samples/sec Loss 11.4649 LearningRate 0.2741 Epoch: 1 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:51:30,599-Speed 5477.51 samples/sec Loss 11.4486 LearningRate 0.2741 Epoch: 1 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:51:38,095-Speed 5465.01 samples/sec Loss 11.4361 LearningRate 0.2740 Epoch: 1 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:51:45,611-Speed 5450.79 samples/sec Loss 11.4602 LearningRate 0.2740 Epoch: 1 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:51:53,083-Speed 5482.69 samples/sec Loss 11.4139 LearningRate 0.2740 Epoch: 1 
Global Step: 19110 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:00,605-Speed 5446.40 samples/sec Loss 11.4746 LearningRate 0.2739 Epoch: 1 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:08,196-Speed 5396.48 samples/sec Loss 11.5037 LearningRate 0.2739 Epoch: 1 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:15,738-Speed 5431.40 samples/sec Loss 11.4211 LearningRate 0.2739 Epoch: 1 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:23,234-Speed 5465.82 samples/sec Loss 11.4602 LearningRate 0.2739 Epoch: 1 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:30,892-Speed 5348.86 samples/sec Loss 11.4973 LearningRate 0.2738 Epoch: 1 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:38,366-Speed 5480.96 samples/sec Loss 11.5375 LearningRate 0.2738 Epoch: 1 Global Step: 19170 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:45,865-Speed 5462.53 samples/sec Loss 11.4603 LearningRate 0.2738 Epoch: 1 Global Step: 19180 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:52:53,530-Speed 5344.77 samples/sec Loss 11.4924 LearningRate 0.2737 Epoch: 1 Global Step: 19190 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:53:00,974-Speed 5503.12 samples/sec Loss 11.3894 LearningRate 0.2737 Epoch: 1 Global Step: 19200 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:53:08,469-Speed 5465.44 samples/sec Loss 11.4909 LearningRate 0.2737 Epoch: 1 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:53:16,103-Speed 5366.02 samples/sec Loss 11.4342 LearningRate 0.2736 Epoch: 1 Global Step: 19220 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:53:23,591-Speed 5470.95 samples/sec Loss 11.4039 LearningRate 0.2736 Epoch: 1 Global Step: 19230 Fp16 Grad 
Scale: 131072 Required: 43 hours Training: 2022-01-07 22:53:31,088-Speed 5464.73 samples/sec Loss 11.4607 LearningRate 0.2736 Epoch: 1 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:53:38,550-Speed 5489.75 samples/sec Loss 11.4941 LearningRate 0.2736 Epoch: 1 Global Step: 19250 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:53:45,985-Speed 5508.92 samples/sec Loss 11.3961 LearningRate 0.2735 Epoch: 1 Global Step: 19260 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:53:53,470-Speed 5473.28 samples/sec Loss 11.4488 LearningRate 0.2735 Epoch: 1 Global Step: 19270 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:54:00,948-Speed 5478.50 samples/sec Loss 11.3641 LearningRate 0.2735 Epoch: 1 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:08,402-Speed 5495.87 samples/sec Loss 11.4348 LearningRate 0.2734 Epoch: 1 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:15,885-Speed 5474.12 samples/sec Loss 11.3983 LearningRate 0.2734 Epoch: 1 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:23,356-Speed 5483.53 samples/sec Loss 11.4374 LearningRate 0.2734 Epoch: 1 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:30,806-Speed 5498.90 samples/sec Loss 11.3678 LearningRate 0.2734 Epoch: 1 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:38,352-Speed 5428.89 samples/sec Loss 11.4100 LearningRate 0.2733 Epoch: 1 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:45,844-Speed 5467.74 samples/sec Loss 11.4880 LearningRate 0.2733 Epoch: 1 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:54:53,352-Speed 5455.93 samples/sec Loss 11.4288 LearningRate 0.2733 Epoch: 1 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 43 
hours Training: 2022-01-07 22:55:00,894-Speed 5431.74 samples/sec Loss 11.3825 LearningRate 0.2732 Epoch: 1 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:55:08,493-Speed 5391.49 samples/sec Loss 11.3700 LearningRate 0.2732 Epoch: 1 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:55:15,940-Speed 5500.67 samples/sec Loss 11.4200 LearningRate 0.2732 Epoch: 1 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:55:23,363-Speed 5518.34 samples/sec Loss 11.4518 LearningRate 0.2732 Epoch: 1 Global Step: 19390 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:55:30,904-Speed 5432.44 samples/sec Loss 11.4555 LearningRate 0.2731 Epoch: 1 Global Step: 19400 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:55:38,400-Speed 5465.40 samples/sec Loss 11.3698 LearningRate 0.2731 Epoch: 1 Global Step: 19410 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:55:45,886-Speed 5471.81 samples/sec Loss 11.4789 LearningRate 0.2731 Epoch: 1 Global Step: 19420 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:55:53,573-Speed 5329.30 samples/sec Loss 11.3872 LearningRate 0.2730 Epoch: 1 Global Step: 19430 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:56:01,180-Speed 5385.26 samples/sec Loss 11.4274 LearningRate 0.2730 Epoch: 1 Global Step: 19440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:56:08,806-Speed 5372.53 samples/sec Loss 11.3972 LearningRate 0.2730 Epoch: 1 Global Step: 19450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:56:16,234-Speed 5514.58 samples/sec Loss 11.3956 LearningRate 0.2730 Epoch: 1 Global Step: 19460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:56:23,752-Speed 5448.81 samples/sec Loss 11.4424 LearningRate 0.2729 Epoch: 1 Global Step: 19470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 
22:56:31,290-Speed 5434.50 samples/sec Loss 11.3618 LearningRate 0.2729 Epoch: 1 Global Step: 19480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-01-07 22:56:38,821-Speed 5439.46 samples/sec Loss 11.4350 LearningRate 0.2729 Epoch: 1 Global Step: 19490 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:56:46,364-Speed 5431.33 samples/sec Loss 11.4035 LearningRate 0.2728 Epoch: 1 Global Step: 19500 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:56:53,843-Speed 5477.02 samples/sec Loss 11.3216 LearningRate 0.2728 Epoch: 1 Global Step: 19510 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:01,413-Speed 5411.72 samples/sec Loss 11.3007 LearningRate 0.2728 Epoch: 1 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:08,841-Speed 5514.67 samples/sec Loss 11.3491 LearningRate 0.2727 Epoch: 1 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:16,327-Speed 5472.72 samples/sec Loss 11.3888 LearningRate 0.2727 Epoch: 1 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:23,786-Speed 5491.77 samples/sec Loss 11.3088 LearningRate 0.2727 Epoch: 1 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:31,232-Speed 5501.27 samples/sec Loss 11.3392 LearningRate 0.2727 Epoch: 1 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:38,756-Speed 5444.75 samples/sec Loss 11.4129 LearningRate 0.2726 Epoch: 1 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:46,205-Speed 5500.01 samples/sec Loss 11.4412 LearningRate 0.2726 Epoch: 1 Global Step: 19580 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 22:57:53,724-Speed 5448.04 samples/sec Loss 11.3442 LearningRate 0.2726 Epoch: 1 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:01,220-Speed 5464.89 samples/sec 
Loss 11.3118 LearningRate 0.2725 Epoch: 1 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:08,781-Speed 5418.50 samples/sec Loss 11.4212 LearningRate 0.2725 Epoch: 1 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:16,277-Speed 5464.89 samples/sec Loss 11.2983 LearningRate 0.2725 Epoch: 1 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:23,774-Speed 5464.34 samples/sec Loss 11.4236 LearningRate 0.2725 Epoch: 1 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:31,301-Speed 5442.04 samples/sec Loss 11.3561 LearningRate 0.2724 Epoch: 1 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:38,776-Speed 5480.62 samples/sec Loss 11.3336 LearningRate 0.2724 Epoch: 1 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:46,248-Speed 5482.43 samples/sec Loss 11.4114 LearningRate 0.2724 Epoch: 1 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:58:53,753-Speed 5458.93 samples/sec Loss 11.3962 LearningRate 0.2723 Epoch: 1 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:59:01,280-Speed 5442.38 samples/sec Loss 11.3670 LearningRate 0.2723 Epoch: 1 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:59:08,802-Speed 5445.58 samples/sec Loss 11.4137 LearningRate 0.2723 Epoch: 1 Global Step: 19690 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:59:16,329-Speed 5442.56 samples/sec Loss 11.3418 LearningRate 0.2723 Epoch: 1 Global Step: 19700 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:59:23,866-Speed 5435.65 samples/sec Loss 11.3499 LearningRate 0.2722 Epoch: 1 Global Step: 19710 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 22:59:31,392-Speed 5442.87 samples/sec Loss 11.3578 LearningRate 
0.2722 Epoch: 1 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:59:39,017-Speed 5372.39 samples/sec Loss 11.3200 LearningRate 0.2722 Epoch: 1 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:59:46,577-Speed 5418.98 samples/sec Loss 11.4350 LearningRate 0.2721 Epoch: 1 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 22:59:54,176-Speed 5390.77 samples/sec Loss 11.3699 LearningRate 0.2721 Epoch: 1 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:00:01,884-Speed 5314.41 samples/sec Loss 11.4183 LearningRate 0.2721 Epoch: 1 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:00:09,453-Speed 5412.05 samples/sec Loss 11.3284 LearningRate 0.2721 Epoch: 1 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:00:17,001-Speed 5428.04 samples/sec Loss 11.3513 LearningRate 0.2720 Epoch: 1 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:00:24,558-Speed 5420.75 samples/sec Loss 11.3301 LearningRate 0.2720 Epoch: 1 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:00:32,097-Speed 5433.06 samples/sec Loss 11.3647 LearningRate 0.2720 Epoch: 1 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:00:39,525-Speed 5515.23 samples/sec Loss 11.4014 LearningRate 0.2719 Epoch: 1 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:00:47,105-Speed 5404.72 samples/sec Loss 11.3541 LearningRate 0.2719 Epoch: 1 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:00:54,658-Speed 5423.47 samples/sec Loss 11.3811 LearningRate 0.2719 Epoch: 1 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:01:02,155-Speed 5463.90 samples/sec Loss 11.4022 LearningRate 0.2718 Epoch: 1 Global Step: 19840 
Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:01:09,652-Speed 5464.92 samples/sec Loss 11.2866 LearningRate 0.2718 Epoch: 1 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:01:17,143-Speed 5468.87 samples/sec Loss 11.3362 LearningRate 0.2718 Epoch: 1 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:01:24,563-Speed 5520.79 samples/sec Loss 11.3508 LearningRate 0.2718 Epoch: 1 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:01:32,042-Speed 5476.63 samples/sec Loss 11.2402 LearningRate 0.2717 Epoch: 1 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:01:39,631-Speed 5398.12 samples/sec Loss 11.2721 LearningRate 0.2717 Epoch: 1 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:01:47,107-Speed 5479.88 samples/sec Loss 11.3614 LearningRate 0.2717 Epoch: 1 Global Step: 19900 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:01:54,648-Speed 5432.52 samples/sec Loss 11.3585 LearningRate 0.2716 Epoch: 1 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:02:02,157-Speed 5454.96 samples/sec Loss 11.3435 LearningRate 0.2716 Epoch: 1 Global Step: 19920 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:02:09,635-Speed 5479.43 samples/sec Loss 11.4228 LearningRate 0.2716 Epoch: 1 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:02:17,144-Speed 5455.13 samples/sec Loss 11.3351 LearningRate 0.2716 Epoch: 1 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:02:24,637-Speed 5467.37 samples/sec Loss 11.3418 LearningRate 0.2715 Epoch: 1 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:02:32,153-Speed 5449.73 samples/sec Loss 11.3319 LearningRate 0.2715 Epoch: 1 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 
42 hours Training: 2022-01-07 23:02:39,624-Speed 5484.09 samples/sec Loss 11.3341 LearningRate 0.2715 Epoch: 1 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:02:47,086-Speed 5490.03 samples/sec Loss 11.3460 LearningRate 0.2714 Epoch: 1 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:02:54,575-Speed 5469.83 samples/sec Loss 11.3015 LearningRate 0.2714 Epoch: 1 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:03:02,047-Speed 5481.58 samples/sec Loss 11.3117 LearningRate 0.2714 Epoch: 1 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:03:47,077-[lfw][20000]XNorm: 22.719998 Training: 2022-01-07 23:03:47,078-[lfw][20000]Accuracy-Flip: 0.99617+-0.00299 Training: 2022-01-07 23:03:47,078-[lfw][20000]Accuracy-Highest: 0.99700 Training: 2022-01-07 23:04:39,975-[cfp_fp][20000]XNorm: 20.118125 Training: 2022-01-07 23:04:39,976-[cfp_fp][20000]Accuracy-Flip: 0.97986+-0.00501 Training: 2022-01-07 23:04:39,977-[cfp_fp][20000]Accuracy-Highest: 0.97986 Training: 2022-01-07 23:05:25,332-[agedb_30][20000]XNorm: 22.496124 Training: 2022-01-07 23:05:25,333-[agedb_30][20000]Accuracy-Flip: 0.96583+-0.00761 Training: 2022-01-07 23:05:25,334-[agedb_30][20000]Accuracy-Highest: 0.96583 Training: 2022-01-07 23:05:32,906-Speed 271.52 samples/sec Loss 11.3580 LearningRate 0.2714 Epoch: 1 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:05:40,405-Speed 5464.26 samples/sec Loss 11.3398 LearningRate 0.2713 Epoch: 1 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:05:48,023-Speed 5377.30 samples/sec Loss 11.3315 LearningRate 0.2713 Epoch: 1 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:05:55,575-Speed 5425.39 samples/sec Loss 11.2960 LearningRate 0.2713 Epoch: 1 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 
23:06:03,165-Speed 5397.27 samples/sec Loss 11.2791 LearningRate 0.2712 Epoch: 1 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:06:10,621-Speed 5495.30 samples/sec Loss 11.2852 LearningRate 0.2712 Epoch: 1 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:06:18,182-Speed 5418.45 samples/sec Loss 11.2641 LearningRate 0.2712 Epoch: 1 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:06:25,745-Speed 5416.47 samples/sec Loss 11.3550 LearningRate 0.2712 Epoch: 1 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:06:33,260-Speed 5451.72 samples/sec Loss 11.2663 LearningRate 0.2711 Epoch: 1 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:06:40,834-Speed 5408.95 samples/sec Loss 11.3102 LearningRate 0.2711 Epoch: 1 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:06:48,295-Speed 5491.72 samples/sec Loss 11.2876 LearningRate 0.2711 Epoch: 1 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:06:55,785-Speed 5469.26 samples/sec Loss 11.3519 LearningRate 0.2710 Epoch: 1 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:03,334-Speed 5426.91 samples/sec Loss 11.2656 LearningRate 0.2710 Epoch: 1 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:10,800-Speed 5487.15 samples/sec Loss 11.3329 LearningRate 0.2710 Epoch: 1 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:18,423-Speed 5374.77 samples/sec Loss 11.2630 LearningRate 0.2710 Epoch: 1 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:25,919-Speed 5465.43 samples/sec Loss 11.3111 LearningRate 0.2709 Epoch: 1 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:33,474-Speed 5422.83 
samples/sec Loss 11.3854 LearningRate 0.2709 Epoch: 1 Global Step: 20170 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 23:07:41,142-Speed 5342.97 samples/sec Loss 11.2381 LearningRate 0.2709 Epoch: 1 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:48,714-Speed 5410.12 samples/sec Loss 11.3156 LearningRate 0.2708 Epoch: 1 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:07:56,327-Speed 5381.31 samples/sec Loss 11.3037 LearningRate 0.2708 Epoch: 1 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:03,884-Speed 5421.21 samples/sec Loss 11.3119 LearningRate 0.2708 Epoch: 1 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:11,030-Speed 5733.07 samples/sec Loss 11.2783 LearningRate 0.2707 Epoch: 1 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:18,436-Speed 5532.37 samples/sec Loss 11.2766 LearningRate 0.2707 Epoch: 1 Global Step: 20230 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:25,926-Speed 5469.05 samples/sec Loss 11.3240 LearningRate 0.2707 Epoch: 1 Global Step: 20240 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:33,467-Speed 5432.89 samples/sec Loss 11.3496 LearningRate 0.2707 Epoch: 1 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:41,027-Speed 5418.70 samples/sec Loss 11.3046 LearningRate 0.2706 Epoch: 1 Global Step: 20260 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:48,433-Speed 5532.17 samples/sec Loss 11.3640 LearningRate 0.2706 Epoch: 1 Global Step: 20270 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:08:55,698-Speed 5639.59 samples/sec Loss 11.2480 LearningRate 0.2706 Epoch: 1 Global Step: 20280 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 23:09:03,148-Speed 5499.32 samples/sec Loss 11.2973 
LearningRate 0.2705 Epoch: 1 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:10,621-Speed 5481.72 samples/sec Loss 11.2669 LearningRate 0.2705 Epoch: 1 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:18,153-Speed 5439.45 samples/sec Loss 11.2557 LearningRate 0.2705 Epoch: 1 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:25,681-Speed 5442.87 samples/sec Loss 11.2432 LearningRate 0.2705 Epoch: 1 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:33,137-Speed 5494.26 samples/sec Loss 11.2446 LearningRate 0.2704 Epoch: 1 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:40,694-Speed 5421.70 samples/sec Loss 11.2161 LearningRate 0.2704 Epoch: 1 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:48,127-Speed 5511.01 samples/sec Loss 11.2089 LearningRate 0.2704 Epoch: 1 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:09:55,494-Speed 5561.70 samples/sec Loss 11.2207 LearningRate 0.2703 Epoch: 1 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:10:02,996-Speed 5460.78 samples/sec Loss 11.2980 LearningRate 0.2703 Epoch: 1 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:10:10,444-Speed 5500.95 samples/sec Loss 11.2715 LearningRate 0.2703 Epoch: 1 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:10:17,853-Speed 5529.62 samples/sec Loss 11.3481 LearningRate 0.2703 Epoch: 1 Global Step: 20390 Fp16 Grad Scale: 262144 Required: 43 hours Training: 2022-01-07 23:10:25,308-Speed 5495.78 samples/sec Loss 11.3243 LearningRate 0.2702 Epoch: 1 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:10:32,763-Speed 5495.94 samples/sec Loss 11.2270 LearningRate 0.2702 Epoch: 1 
Global Step: 20410 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:10:40,340-Speed 5406.62 samples/sec Loss 11.2360 LearningRate 0.2702 Epoch: 1 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:10:47,826-Speed 5472.54 samples/sec Loss 11.2690 LearningRate 0.2701 Epoch: 1 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:10:55,262-Speed 5509.57 samples/sec Loss 11.3234 LearningRate 0.2701 Epoch: 1 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:02,804-Speed 5432.24 samples/sec Loss 11.2100 LearningRate 0.2701 Epoch: 1 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:10,510-Speed 5316.33 samples/sec Loss 11.2470 LearningRate 0.2701 Epoch: 1 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:18,118-Speed 5384.80 samples/sec Loss 11.2144 LearningRate 0.2700 Epoch: 1 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:25,658-Speed 5433.32 samples/sec Loss 11.2968 LearningRate 0.2700 Epoch: 1 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:33,215-Speed 5421.32 samples/sec Loss 11.2519 LearningRate 0.2700 Epoch: 1 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:40,728-Speed 5453.40 samples/sec Loss 11.2700 LearningRate 0.2699 Epoch: 1 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:48,355-Speed 5371.22 samples/sec Loss 11.3111 LearningRate 0.2699 Epoch: 1 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:11:56,027-Speed 5339.93 samples/sec Loss 11.2574 LearningRate 0.2699 Epoch: 1 Global Step: 20520 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:12:03,533-Speed 5458.14 samples/sec Loss 11.2552 LearningRate 0.2699 Epoch: 1 Global Step: 20530 Fp16 Grad Scale: 
131072 Required: 43 hours Training: 2022-01-07 23:12:10,980-Speed 5501.73 samples/sec Loss 11.2817 LearningRate 0.2698 Epoch: 1 Global Step: 20540 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:12:18,606-Speed 5372.25 samples/sec Loss 11.1893 LearningRate 0.2698 Epoch: 1 Global Step: 20550 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:12:26,099-Speed 5467.41 samples/sec Loss 11.2846 LearningRate 0.2698 Epoch: 1 Global Step: 20560 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:12:33,588-Speed 5470.68 samples/sec Loss 11.2810 LearningRate 0.2697 Epoch: 1 Global Step: 20570 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:12:41,005-Speed 5523.31 samples/sec Loss 11.3462 LearningRate 0.2697 Epoch: 1 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:12:48,454-Speed 5500.21 samples/sec Loss 11.1859 LearningRate 0.2697 Epoch: 1 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:12:55,914-Speed 5491.91 samples/sec Loss 11.2664 LearningRate 0.2697 Epoch: 1 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:03,363-Speed 5499.14 samples/sec Loss 11.2259 LearningRate 0.2696 Epoch: 1 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:10,895-Speed 5439.65 samples/sec Loss 11.2159 LearningRate 0.2696 Epoch: 1 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:18,541-Speed 5358.23 samples/sec Loss 11.2825 LearningRate 0.2696 Epoch: 1 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:26,011-Speed 5484.11 samples/sec Loss 11.1912 LearningRate 0.2695 Epoch: 1 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:33,481-Speed 5484.42 samples/sec Loss 11.1943 LearningRate 0.2695 Epoch: 1 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 43 hours Training: 
2022-01-07 23:13:40,934-Speed 5497.00 samples/sec Loss 11.2139 LearningRate 0.2695 Epoch: 1 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:48,600-Speed 5344.66 samples/sec Loss 11.2116 LearningRate 0.2694 Epoch: 1 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:13:56,092-Speed 5468.14 samples/sec Loss 11.1590 LearningRate 0.2694 Epoch: 1 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:14:03,559-Speed 5486.86 samples/sec Loss 11.2240 LearningRate 0.2694 Epoch: 1 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:14:11,120-Speed 5418.30 samples/sec Loss 11.1120 LearningRate 0.2694 Epoch: 1 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:14:18,565-Speed 5503.06 samples/sec Loss 11.1850 LearningRate 0.2693 Epoch: 1 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:14:26,080-Speed 5451.24 samples/sec Loss 11.1698 LearningRate 0.2693 Epoch: 1 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:14:33,568-Speed 5471.19 samples/sec Loss 11.2548 LearningRate 0.2693 Epoch: 1 Global Step: 20730 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:14:40,985-Speed 5523.92 samples/sec Loss 11.1771 LearningRate 0.2692 Epoch: 1 Global Step: 20740 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:15:04,074-Speed 1774.14 samples/sec Loss 11.2344 LearningRate 0.2692 Epoch: 2 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:15:11,537-Speed 5490.17 samples/sec Loss 11.1909 LearningRate 0.2692 Epoch: 2 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:15:19,074-Speed 5435.66 samples/sec Loss 11.2265 LearningRate 0.2692 Epoch: 2 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:15:26,759-Speed 5331.02 
samples/sec Loss 11.2276 LearningRate 0.2691 Epoch: 2 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:15:34,359-Speed 5390.40 samples/sec Loss 11.2022 LearningRate 0.2691 Epoch: 2 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-01-07 23:15:41,821-Speed 5490.39 samples/sec Loss 11.1431 LearningRate 0.2691 Epoch: 2 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:15:49,211-Speed 5543.47 samples/sec Loss 11.2018 LearningRate 0.2690 Epoch: 2 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:15:56,652-Speed 5506.02 samples/sec Loss 11.2193 LearningRate 0.2690 Epoch: 2 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:04,108-Speed 5495.58 samples/sec Loss 11.1743 LearningRate 0.2690 Epoch: 2 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:11,585-Speed 5479.37 samples/sec Loss 11.1841 LearningRate 0.2690 Epoch: 2 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:19,052-Speed 5486.06 samples/sec Loss 11.1986 LearningRate 0.2689 Epoch: 2 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:26,500-Speed 5499.93 samples/sec Loss 11.1687 LearningRate 0.2689 Epoch: 2 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:33,924-Speed 5517.90 samples/sec Loss 11.1680 LearningRate 0.2689 Epoch: 2 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:41,366-Speed 5504.88 samples/sec Loss 11.1903 LearningRate 0.2688 Epoch: 2 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 43 hours Training: 2022-01-07 23:16:48,779-Speed 5526.28 samples/sec Loss 11.1951 LearningRate 0.2688 Epoch: 2 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:16:56,270-Speed 5468.66 samples/sec Loss 11.1634 
LearningRate 0.2688 Epoch: 2 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:17:03,836-Speed 5414.32 samples/sec Loss 11.1071 LearningRate 0.2688 Epoch: 2 Global Step: 20910 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:11,401-Speed 5415.75 samples/sec Loss 11.1754 LearningRate 0.2687 Epoch: 2 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:18,943-Speed 5431.37 samples/sec Loss 11.2178 LearningRate 0.2687 Epoch: 2 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:26,409-Speed 5486.86 samples/sec Loss 11.1054 LearningRate 0.2687 Epoch: 2 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:33,926-Speed 5449.98 samples/sec Loss 11.1838 LearningRate 0.2686 Epoch: 2 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:41,458-Speed 5438.63 samples/sec Loss 11.2068 LearningRate 0.2686 Epoch: 2 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:48,955-Speed 5464.18 samples/sec Loss 11.1927 LearningRate 0.2686 Epoch: 2 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:17:56,431-Speed 5479.50 samples/sec Loss 11.2208 LearningRate 0.2686 Epoch: 2 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:18:04,041-Speed 5383.24 samples/sec Loss 11.1227 LearningRate 0.2685 Epoch: 2 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:18:11,561-Speed 5447.41 samples/sec Loss 11.2397 LearningRate 0.2685 Epoch: 2 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:18:19,119-Speed 5420.61 samples/sec Loss 11.1767 LearningRate 0.2685 Epoch: 2 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:18:26,632-Speed 5452.78 samples/sec Loss 11.2045 LearningRate 0.2684 Epoch: 2 Global Step: 
21020 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:18:34,190-Speed 5419.94 samples/sec Loss 11.2293 LearningRate 0.2684 Epoch: 2 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:18:41,709-Speed 5448.62 samples/sec Loss 11.2026 LearningRate 0.2684 Epoch: 2 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:18:49,185-Speed 5479.43 samples/sec Loss 11.1103 LearningRate 0.2684 Epoch: 2 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:18:56,652-Speed 5486.31 samples/sec Loss 11.2185 LearningRate 0.2683 Epoch: 2 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:04,108-Speed 5494.54 samples/sec Loss 11.1168 LearningRate 0.2683 Epoch: 2 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:11,582-Speed 5480.84 samples/sec Loss 11.1028 LearningRate 0.2683 Epoch: 2 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:19,109-Speed 5442.65 samples/sec Loss 11.1265 LearningRate 0.2682 Epoch: 2 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:26,562-Speed 5496.16 samples/sec Loss 11.1532 LearningRate 0.2682 Epoch: 2 Global Step: 21100 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:34,019-Speed 5493.88 samples/sec Loss 11.0977 LearningRate 0.2682 Epoch: 2 Global Step: 21110 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:41,497-Speed 5477.77 samples/sec Loss 11.0692 LearningRate 0.2682 Epoch: 2 Global Step: 21120 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:48,996-Speed 5462.78 samples/sec Loss 11.0748 LearningRate 0.2681 Epoch: 2 Global Step: 21130 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:19:56,444-Speed 5500.67 samples/sec Loss 11.1107 LearningRate 0.2681 Epoch: 2 Global Step: 21140 Fp16 Grad Scale: 131072 
Required: 42 hours Training: 2022-01-07 23:20:03,868-Speed 5517.53 samples/sec Loss 11.1992 LearningRate 0.2681 Epoch: 2 Global Step: 21150 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:20:11,300-Speed 5512.45 samples/sec Loss 11.0954 LearningRate 0.2680 Epoch: 2 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:20:18,756-Speed 5494.18 samples/sec Loss 11.1986 LearningRate 0.2680 Epoch: 2 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:20:26,242-Speed 5473.02 samples/sec Loss 11.1239 LearningRate 0.2680 Epoch: 2 Global Step: 21180 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:20:33,732-Speed 5468.84 samples/sec Loss 11.1371 LearningRate 0.2679 Epoch: 2 Global Step: 21190 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:20:41,163-Speed 5512.81 samples/sec Loss 11.1509 LearningRate 0.2679 Epoch: 2 Global Step: 21200 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:20:48,585-Speed 5519.91 samples/sec Loss 11.1505 LearningRate 0.2679 Epoch: 2 Global Step: 21210 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:20:56,136-Speed 5425.03 samples/sec Loss 11.1121 LearningRate 0.2679 Epoch: 2 Global Step: 21220 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:21:03,556-Speed 5521.03 samples/sec Loss 11.1761 LearningRate 0.2678 Epoch: 2 Global Step: 21230 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:21:10,985-Speed 5514.14 samples/sec Loss 11.1769 LearningRate 0.2678 Epoch: 2 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:21:18,530-Speed 5430.04 samples/sec Loss 11.1851 LearningRate 0.2678 Epoch: 2 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:21:25,965-Speed 5509.58 samples/sec Loss 11.1297 LearningRate 0.2677 Epoch: 2 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 
23:21:33,383-Speed 5522.42 samples/sec Loss 11.1595 LearningRate 0.2677 Epoch: 2 Global Step: 21270 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:21:40,796-Speed 5525.83 samples/sec Loss 11.1465 LearningRate 0.2677 Epoch: 2 Global Step: 21280 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:21:48,227-Speed 5513.37 samples/sec Loss 11.1575 LearningRate 0.2677 Epoch: 2 Global Step: 21290 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:21:55,710-Speed 5474.64 samples/sec Loss 11.1527 LearningRate 0.2676 Epoch: 2 Global Step: 21300 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:22:03,309-Speed 5390.07 samples/sec Loss 11.1026 LearningRate 0.2676 Epoch: 2 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:22:10,798-Speed 5470.32 samples/sec Loss 11.1308 LearningRate 0.2676 Epoch: 2 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:22:18,329-Speed 5439.74 samples/sec Loss 11.0686 LearningRate 0.2675 Epoch: 2 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:22:25,779-Speed 5498.74 samples/sec Loss 11.1765 LearningRate 0.2675 Epoch: 2 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:22:33,207-Speed 5514.98 samples/sec Loss 11.0577 LearningRate 0.2675 Epoch: 2 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:22:40,687-Speed 5476.30 samples/sec Loss 11.0631 LearningRate 0.2675 Epoch: 2 Global Step: 21360 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:22:48,085-Speed 5537.54 samples/sec Loss 11.1136 LearningRate 0.2674 Epoch: 2 Global Step: 21370 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:22:55,525-Speed 5506.17 samples/sec Loss 11.1260 LearningRate 0.2674 Epoch: 2 Global Step: 21380 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:23:02,924-Speed 5536.69 
samples/sec Loss 11.0848 LearningRate 0.2674 Epoch: 2 Global Step: 21390 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:23:10,408-Speed 5473.19 samples/sec Loss 11.1314 LearningRate 0.2673 Epoch: 2 Global Step: 21400 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:23:17,839-Speed 5513.54 samples/sec Loss 11.1276 LearningRate 0.2673 Epoch: 2 Global Step: 21410 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:23:25,398-Speed 5419.62 samples/sec Loss 11.1029 LearningRate 0.2673 Epoch: 2 Global Step: 21420 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:23:32,859-Speed 5490.45 samples/sec Loss 11.1373 LearningRate 0.2673 Epoch: 2 Global Step: 21430 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:23:40,272-Speed 5525.82 samples/sec Loss 11.1771 LearningRate 0.2672 Epoch: 2 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:23:47,814-Speed 5431.91 samples/sec Loss 11.0597 LearningRate 0.2672 Epoch: 2 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:23:55,245-Speed 5513.01 samples/sec Loss 11.1414 LearningRate 0.2672 Epoch: 2 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:02,647-Speed 5534.03 samples/sec Loss 11.0790 LearningRate 0.2671 Epoch: 2 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:10,108-Speed 5490.58 samples/sec Loss 11.0864 LearningRate 0.2671 Epoch: 2 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:17,539-Speed 5513.00 samples/sec Loss 11.1491 LearningRate 0.2671 Epoch: 2 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:25,013-Speed 5481.67 samples/sec Loss 11.1665 LearningRate 0.2671 Epoch: 2 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:32,490-Speed 5478.32 samples/sec Loss 11.1332 
LearningRate 0.2670 Epoch: 2 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:39,970-Speed 5476.78 samples/sec Loss 11.0830 LearningRate 0.2670 Epoch: 2 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:47,376-Speed 5531.39 samples/sec Loss 11.1303 LearningRate 0.2670 Epoch: 2 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:24:55,864-Speed 4826.44 samples/sec Loss 11.1614 LearningRate 0.2669 Epoch: 2 Global Step: 21540 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:25:03,313-Speed 5499.20 samples/sec Loss 11.1160 LearningRate 0.2669 Epoch: 2 Global Step: 21550 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:25:10,753-Speed 5506.53 samples/sec Loss 11.0824 LearningRate 0.2669 Epoch: 2 Global Step: 21560 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:25:18,190-Speed 5508.31 samples/sec Loss 11.1342 LearningRate 0.2669 Epoch: 2 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:25:25,585-Speed 5539.37 samples/sec Loss 11.0231 LearningRate 0.2668 Epoch: 2 Global Step: 21580 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:25:33,044-Speed 5492.52 samples/sec Loss 11.1483 LearningRate 0.2668 Epoch: 2 Global Step: 21590 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:25:40,473-Speed 5513.82 samples/sec Loss 11.0752 LearningRate 0.2668 Epoch: 2 Global Step: 21600 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:25:47,961-Speed 5471.14 samples/sec Loss 11.1516 LearningRate 0.2667 Epoch: 2 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:25:55,404-Speed 5503.52 samples/sec Loss 11.1113 LearningRate 0.2667 Epoch: 2 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:02,838-Speed 5510.54 samples/sec Loss 11.1186 LearningRate 0.2667 Epoch: 2 
Global Step: 21630 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:10,254-Speed 5524.70 samples/sec Loss 11.1081 LearningRate 0.2667 Epoch: 2 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:17,772-Speed 5448.96 samples/sec Loss 11.1352 LearningRate 0.2666 Epoch: 2 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:25,209-Speed 5507.98 samples/sec Loss 11.1775 LearningRate 0.2666 Epoch: 2 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:32,678-Speed 5484.87 samples/sec Loss 11.1357 LearningRate 0.2666 Epoch: 2 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:40,118-Speed 5506.64 samples/sec Loss 11.1749 LearningRate 0.2665 Epoch: 2 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:47,671-Speed 5423.21 samples/sec Loss 11.0567 LearningRate 0.2665 Epoch: 2 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:26:55,296-Speed 5372.75 samples/sec Loss 11.1511 LearningRate 0.2665 Epoch: 2 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:02,864-Speed 5412.85 samples/sec Loss 11.1755 LearningRate 0.2665 Epoch: 2 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:10,445-Speed 5404.64 samples/sec Loss 11.1895 LearningRate 0.2664 Epoch: 2 Global Step: 21720 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:17,873-Speed 5514.43 samples/sec Loss 11.0780 LearningRate 0.2664 Epoch: 2 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:25,328-Speed 5494.85 samples/sec Loss 11.0504 LearningRate 0.2664 Epoch: 2 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:32,810-Speed 5475.32 samples/sec Loss 11.0496 LearningRate 0.2663 Epoch: 2 Global Step: 21750 Fp16 Grad 
Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:40,356-Speed 5429.26 samples/sec Loss 11.0780 LearningRate 0.2663 Epoch: 2 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:47,874-Speed 5448.75 samples/sec Loss 11.1490 LearningRate 0.2663 Epoch: 2 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:27:55,340-Speed 5486.92 samples/sec Loss 11.0733 LearningRate 0.2663 Epoch: 2 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:28:02,762-Speed 5519.28 samples/sec Loss 11.1463 LearningRate 0.2662 Epoch: 2 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:28:10,176-Speed 5526.22 samples/sec Loss 11.0484 LearningRate 0.2662 Epoch: 2 Global Step: 21800 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:28:17,589-Speed 5525.79 samples/sec Loss 11.0198 LearningRate 0.2662 Epoch: 2 Global Step: 21810 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:28:25,204-Speed 5379.74 samples/sec Loss 11.0803 LearningRate 0.2661 Epoch: 2 Global Step: 21820 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:28:32,718-Speed 5451.25 samples/sec Loss 11.0680 LearningRate 0.2661 Epoch: 2 Global Step: 21830 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:28:40,184-Speed 5487.42 samples/sec Loss 11.1270 LearningRate 0.2661 Epoch: 2 Global Step: 21840 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:28:47,606-Speed 5519.32 samples/sec Loss 11.1066 LearningRate 0.2661 Epoch: 2 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:28:55,018-Speed 5527.33 samples/sec Loss 11.0477 LearningRate 0.2660 Epoch: 2 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:02,462-Speed 5502.68 samples/sec Loss 11.0503 LearningRate 0.2660 Epoch: 2 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 42 hours 
Training: 2022-01-07 23:29:09,983-Speed 5447.76 samples/sec Loss 11.0110 LearningRate 0.2660 Epoch: 2 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:17,433-Speed 5498.04 samples/sec Loss 11.0111 LearningRate 0.2659 Epoch: 2 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:24,908-Speed 5480.59 samples/sec Loss 11.0166 LearningRate 0.2659 Epoch: 2 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:32,353-Speed 5502.14 samples/sec Loss 11.0615 LearningRate 0.2659 Epoch: 2 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:39,805-Speed 5498.00 samples/sec Loss 11.0283 LearningRate 0.2659 Epoch: 2 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:47,219-Speed 5525.37 samples/sec Loss 11.0350 LearningRate 0.2658 Epoch: 2 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:29:54,666-Speed 5500.41 samples/sec Loss 11.0288 LearningRate 0.2658 Epoch: 2 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:30:02,154-Speed 5471.57 samples/sec Loss 11.0077 LearningRate 0.2658 Epoch: 2 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:30:09,562-Speed 5529.84 samples/sec Loss 11.0629 LearningRate 0.2657 Epoch: 2 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:30:16,961-Speed 5536.65 samples/sec Loss 11.0466 LearningRate 0.2657 Epoch: 2 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:30:24,384-Speed 5518.71 samples/sec Loss 10.9941 LearningRate 0.2657 Epoch: 2 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:30:31,795-Speed 5527.30 samples/sec Loss 11.0680 LearningRate 0.2657 Epoch: 2 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 
23:30:39,231-Speed 5510.05 samples/sec Loss 11.1854 LearningRate 0.2656 Epoch: 2 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:31:23,985-[lfw][22000]XNorm: 23.268626
Training: 2022-01-07 23:31:23,986-[lfw][22000]Accuracy-Flip: 0.99667+-0.00307
Training: 2022-01-07 23:31:23,987-[lfw][22000]Accuracy-Highest: 0.99700
Training: 2022-01-07 23:32:17,662-[cfp_fp][22000]XNorm: 20.725033
Training: 2022-01-07 23:32:17,663-[cfp_fp][22000]Accuracy-Flip: 0.97686+-0.01000
Training: 2022-01-07 23:32:17,664-[cfp_fp][22000]Accuracy-Highest: 0.97986
Training: 2022-01-07 23:33:03,455-[agedb_30][22000]XNorm: 22.970369
Training: 2022-01-07 23:33:03,456-[agedb_30][22000]Accuracy-Flip: 0.96883+-0.00658
Training: 2022-01-07 23:33:03,457-[agedb_30][22000]Accuracy-Highest: 0.96883
Training: 2022-01-07 23:33:10,883-Speed 270.09 samples/sec Loss 10.9078 LearningRate 0.2656 Epoch: 2 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:33:18,289-Speed 5532.68 samples/sec Loss 11.0790 LearningRate 0.2656 Epoch: 2 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:33:25,812-Speed 5446.27 samples/sec Loss 11.0612 LearningRate 0.2655 Epoch: 2 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:33:33,235-Speed 5519.43 samples/sec Loss 11.0414 LearningRate 0.2655 Epoch: 2 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:33:40,749-Speed 5452.34 samples/sec Loss 11.0148 LearningRate 0.2655 Epoch: 2 Global Step: 22050 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-01-07 23:33:48,176-Speed 5516.06 samples/sec Loss 11.0341 LearningRate 0.2655 Epoch: 2 Global Step: 22060 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-01-07 23:33:55,621-Speed 5503.04 samples/sec Loss 11.0944 LearningRate 0.2654 Epoch: 2 Global Step: 22070 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-01-07 23:34:03,128-Speed
5457.06 samples/sec Loss 11.0128 LearningRate 0.2654 Epoch: 2 Global Step: 22080 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:34:10,643-Speed 5451.15 samples/sec Loss 10.9913 LearningRate 0.2654 Epoch: 2 Global Step: 22090 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:34:18,238-Speed 5393.15 samples/sec Loss 10.9683 LearningRate 0.2653 Epoch: 2 Global Step: 22100 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:34:25,742-Speed 5459.45 samples/sec Loss 10.9578 LearningRate 0.2653 Epoch: 2 Global Step: 22110 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:34:33,325-Speed 5402.11 samples/sec Loss 10.9648 LearningRate 0.2653 Epoch: 2 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:34:40,863-Speed 5434.57 samples/sec Loss 10.9495 LearningRate 0.2653 Epoch: 2 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:34:48,296-Speed 5511.98 samples/sec Loss 11.0282 LearningRate 0.2652 Epoch: 2 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:34:55,812-Speed 5450.36 samples/sec Loss 11.0063 LearningRate 0.2652 Epoch: 2 Global Step: 22150 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:03,521-Speed 5314.15 samples/sec Loss 11.0643 LearningRate 0.2652 Epoch: 2 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:11,023-Speed 5460.53 samples/sec Loss 11.0666 LearningRate 0.2651 Epoch: 2 Global Step: 22170 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:18,451-Speed 5514.58 samples/sec Loss 11.0221 LearningRate 0.2651 Epoch: 2 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:26,003-Speed 5424.61 samples/sec Loss 11.0047 LearningRate 0.2651 Epoch: 2 Global Step: 22190 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:33,490-Speed 5471.51 samples/sec Loss 
11.0080 LearningRate 0.2651 Epoch: 2 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:40,893-Speed 5534.15 samples/sec Loss 10.9454 LearningRate 0.2650 Epoch: 2 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:35:48,343-Speed 5498.08 samples/sec Loss 10.9685 LearningRate 0.2650 Epoch: 2 Global Step: 22220 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:35:55,787-Speed 5503.78 samples/sec Loss 10.9773 LearningRate 0.2650 Epoch: 2 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:03,282-Speed 5465.30 samples/sec Loss 11.0890 LearningRate 0.2649 Epoch: 2 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:10,923-Speed 5361.19 samples/sec Loss 11.1079 LearningRate 0.2649 Epoch: 2 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:18,337-Speed 5525.52 samples/sec Loss 11.0209 LearningRate 0.2649 Epoch: 2 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:25,750-Speed 5527.06 samples/sec Loss 11.0060 LearningRate 0.2649 Epoch: 2 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:33,228-Speed 5477.42 samples/sec Loss 10.9853 LearningRate 0.2648 Epoch: 2 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:40,722-Speed 5466.33 samples/sec Loss 11.0261 LearningRate 0.2648 Epoch: 2 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:48,324-Speed 5389.48 samples/sec Loss 10.9754 LearningRate 0.2648 Epoch: 2 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:36:55,759-Speed 5509.63 samples/sec Loss 11.0742 LearningRate 0.2647 Epoch: 2 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:37:03,245-Speed 5472.25 samples/sec Loss 11.0062 LearningRate 0.2647 
Epoch: 2 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:37:10,726-Speed 5475.58 samples/sec Loss 11.0283 LearningRate 0.2647 Epoch: 2 Global Step: 22330 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:37:18,172-Speed 5501.98 samples/sec Loss 11.0350 LearningRate 0.2646 Epoch: 2 Global Step: 22340 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:37:25,650-Speed 5478.17 samples/sec Loss 11.0606 LearningRate 0.2646 Epoch: 2 Global Step: 22350 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:37:33,150-Speed 5461.78 samples/sec Loss 11.0135 LearningRate 0.2646 Epoch: 2 Global Step: 22360 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:37:40,600-Speed 5499.35 samples/sec Loss 11.0078 LearningRate 0.2646 Epoch: 2 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:37:48,081-Speed 5475.70 samples/sec Loss 10.9225 LearningRate 0.2645 Epoch: 2 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:37:55,576-Speed 5465.99 samples/sec Loss 10.9720 LearningRate 0.2645 Epoch: 2 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:03,010-Speed 5510.59 samples/sec Loss 11.0134 LearningRate 0.2645 Epoch: 2 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:10,474-Speed 5488.31 samples/sec Loss 10.9657 LearningRate 0.2644 Epoch: 2 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:17,949-Speed 5480.12 samples/sec Loss 11.0306 LearningRate 0.2644 Epoch: 2 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:25,407-Speed 5492.97 samples/sec Loss 11.0131 LearningRate 0.2644 Epoch: 2 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:32,992-Speed 5401.04 samples/sec Loss 11.0757 LearningRate 0.2644 Epoch: 2 Global Step: 22440 
Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:40,439-Speed 5501.06 samples/sec Loss 10.9950 LearningRate 0.2643 Epoch: 2 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:47,889-Speed 5498.95 samples/sec Loss 10.9954 LearningRate 0.2643 Epoch: 2 Global Step: 22460 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:38:55,460-Speed 5410.90 samples/sec Loss 11.0387 LearningRate 0.2643 Epoch: 2 Global Step: 22470 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:39:02,834-Speed 5554.82 samples/sec Loss 11.0630 LearningRate 0.2642 Epoch: 2 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:10,315-Speed 5476.25 samples/sec Loss 11.0609 LearningRate 0.2642 Epoch: 2 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:17,817-Speed 5460.79 samples/sec Loss 11.0442 LearningRate 0.2642 Epoch: 2 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:25,281-Speed 5488.10 samples/sec Loss 11.0258 LearningRate 0.2642 Epoch: 2 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:32,744-Speed 5489.53 samples/sec Loss 11.0174 LearningRate 0.2641 Epoch: 2 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:40,186-Speed 5504.34 samples/sec Loss 11.0272 LearningRate 0.2641 Epoch: 2 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:47,632-Speed 5501.76 samples/sec Loss 11.0278 LearningRate 0.2641 Epoch: 2 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:39:55,201-Speed 5412.37 samples/sec Loss 11.0016 LearningRate 0.2640 Epoch: 2 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:02,695-Speed 5466.56 samples/sec Loss 11.0108 LearningRate 0.2640 Epoch: 2 Global Step: 22560 Fp16 Grad Scale: 131072 
Required: 42 hours Training: 2022-01-07 23:40:10,248-Speed 5423.48 samples/sec Loss 10.9866 LearningRate 0.2640 Epoch: 2 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:17,777-Speed 5441.36 samples/sec Loss 10.9914 LearningRate 0.2640 Epoch: 2 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:25,194-Speed 5522.97 samples/sec Loss 11.0742 LearningRate 0.2639 Epoch: 2 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:32,788-Speed 5394.63 samples/sec Loss 10.9671 LearningRate 0.2639 Epoch: 2 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:40,224-Speed 5508.56 samples/sec Loss 10.9751 LearningRate 0.2639 Epoch: 2 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:47,771-Speed 5428.04 samples/sec Loss 10.9171 LearningRate 0.2638 Epoch: 2 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:40:55,393-Speed 5375.30 samples/sec Loss 11.0116 LearningRate 0.2638 Epoch: 2 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:02,926-Speed 5437.90 samples/sec Loss 10.8609 LearningRate 0.2638 Epoch: 2 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:10,383-Speed 5493.24 samples/sec Loss 10.9325 LearningRate 0.2638 Epoch: 2 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:17,833-Speed 5499.32 samples/sec Loss 10.9016 LearningRate 0.2637 Epoch: 2 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:25,378-Speed 5429.83 samples/sec Loss 10.9801 LearningRate 0.2637 Epoch: 2 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:32,828-Speed 5498.41 samples/sec Loss 10.9169 LearningRate 0.2637 Epoch: 2 Global Step: 22680 Fp16 Grad Scale: 262144 Required: 42 hours Training: 
2022-01-07 23:41:40,384-Speed 5421.22 samples/sec Loss 10.9366 LearningRate 0.2636 Epoch: 2 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:47,898-Speed 5452.43 samples/sec Loss 10.9330 LearningRate 0.2636 Epoch: 2 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:41:55,572-Speed 5338.49 samples/sec Loss 10.9402 LearningRate 0.2636 Epoch: 2 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:03,194-Speed 5374.89 samples/sec Loss 11.0319 LearningRate 0.2636 Epoch: 2 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:10,890-Speed 5322.04 samples/sec Loss 10.9252 LearningRate 0.2635 Epoch: 2 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:18,379-Speed 5470.74 samples/sec Loss 10.9631 LearningRate 0.2635 Epoch: 2 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:26,049-Speed 5340.66 samples/sec Loss 10.9774 LearningRate 0.2635 Epoch: 2 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:33,771-Speed 5305.17 samples/sec Loss 11.0116 LearningRate 0.2634 Epoch: 2 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:41,294-Speed 5445.06 samples/sec Loss 10.9469 LearningRate 0.2634 Epoch: 2 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:48,917-Speed 5373.85 samples/sec Loss 10.9240 LearningRate 0.2634 Epoch: 2 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:42:56,563-Speed 5357.71 samples/sec Loss 10.9211 LearningRate 0.2634 Epoch: 2 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:04,053-Speed 5469.39 samples/sec Loss 11.0130 LearningRate 0.2633 Epoch: 2 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:11,715-Speed 
5346.44 samples/sec Loss 10.9615 LearningRate 0.2633 Epoch: 2 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:19,190-Speed 5480.19 samples/sec Loss 10.9356 LearningRate 0.2633 Epoch: 2 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:26,722-Speed 5438.69 samples/sec Loss 10.9259 LearningRate 0.2633 Epoch: 2 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:34,276-Speed 5423.57 samples/sec Loss 10.9806 LearningRate 0.2632 Epoch: 2 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:41,783-Speed 5457.04 samples/sec Loss 10.9104 LearningRate 0.2632 Epoch: 2 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:49,368-Speed 5400.36 samples/sec Loss 10.9297 LearningRate 0.2632 Epoch: 2 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:43:56,904-Speed 5435.81 samples/sec Loss 10.9661 LearningRate 0.2631 Epoch: 2 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:04,614-Speed 5313.26 samples/sec Loss 10.9679 LearningRate 0.2631 Epoch: 2 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:12,101-Speed 5471.46 samples/sec Loss 10.9290 LearningRate 0.2631 Epoch: 2 Global Step: 22890 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:44:19,631-Speed 5440.63 samples/sec Loss 10.9289 LearningRate 0.2631 Epoch: 2 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:27,136-Speed 5457.88 samples/sec Loss 10.8642 LearningRate 0.2630 Epoch: 2 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:34,585-Speed 5499.68 samples/sec Loss 10.8623 LearningRate 0.2630 Epoch: 2 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:42,364-Speed 5265.94 samples/sec Loss 
10.8893 LearningRate 0.2630 Epoch: 2 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:49,906-Speed 5432.04 samples/sec Loss 10.9993 LearningRate 0.2629 Epoch: 2 Global Step: 22940 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:44:57,376-Speed 5483.77 samples/sec Loss 10.8969 LearningRate 0.2629 Epoch: 2 Global Step: 22950 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:04,949-Speed 5409.45 samples/sec Loss 10.9269 LearningRate 0.2629 Epoch: 2 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:12,462-Speed 5452.36 samples/sec Loss 10.9379 LearningRate 0.2629 Epoch: 2 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:19,926-Speed 5488.09 samples/sec Loss 10.9238 LearningRate 0.2628 Epoch: 2 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:27,407-Speed 5476.04 samples/sec Loss 10.9306 LearningRate 0.2628 Epoch: 2 Global Step: 22990 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:34,927-Speed 5447.20 samples/sec Loss 10.8749 LearningRate 0.2628 Epoch: 2 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:42,493-Speed 5414.66 samples/sec Loss 10.9072 LearningRate 0.2627 Epoch: 2 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:50,060-Speed 5413.65 samples/sec Loss 10.9318 LearningRate 0.2627 Epoch: 2 Global Step: 23020 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:45:57,563-Speed 5459.68 samples/sec Loss 10.9286 LearningRate 0.2627 Epoch: 2 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:05,107-Speed 5430.12 samples/sec Loss 10.9701 LearningRate 0.2627 Epoch: 2 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:12,590-Speed 5474.88 samples/sec Loss 10.9153 LearningRate 0.2626 
Epoch: 2 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:20,064-Speed 5481.07 samples/sec Loss 10.8952 LearningRate 0.2626 Epoch: 2 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:27,587-Speed 5445.18 samples/sec Loss 10.8905 LearningRate 0.2626 Epoch: 2 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:35,141-Speed 5422.80 samples/sec Loss 10.8937 LearningRate 0.2625 Epoch: 2 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:42,836-Speed 5324.21 samples/sec Loss 10.8919 LearningRate 0.2625 Epoch: 2 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:46:50,679-Speed 5222.85 samples/sec Loss 10.9573 LearningRate 0.2625 Epoch: 2 Global Step: 23100 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:46:58,316-Speed 5363.92 samples/sec Loss 10.9971 LearningRate 0.2625 Epoch: 2 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:06,022-Speed 5315.55 samples/sec Loss 10.8642 LearningRate 0.2624 Epoch: 2 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:13,470-Speed 5500.59 samples/sec Loss 10.9687 LearningRate 0.2624 Epoch: 2 Global Step: 23130 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:20,999-Speed 5440.80 samples/sec Loss 10.8887 LearningRate 0.2624 Epoch: 2 Global Step: 23140 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:28,487-Speed 5470.61 samples/sec Loss 10.9554 LearningRate 0.2623 Epoch: 2 Global Step: 23150 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:36,212-Speed 5305.72 samples/sec Loss 10.9755 LearningRate 0.2623 Epoch: 2 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:43,871-Speed 5348.97 samples/sec Loss 10.9947 LearningRate 0.2623 Epoch: 2 Global Step: 23170 
Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:51,427-Speed 5421.59 samples/sec Loss 10.9189 LearningRate 0.2623 Epoch: 2 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:47:59,013-Speed 5400.13 samples/sec Loss 10.8826 LearningRate 0.2622 Epoch: 2 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:06,691-Speed 5335.17 samples/sec Loss 10.9500 LearningRate 0.2622 Epoch: 2 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:14,351-Speed 5348.30 samples/sec Loss 10.9164 LearningRate 0.2622 Epoch: 2 Global Step: 23210 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:48:21,786-Speed 5510.14 samples/sec Loss 10.9691 LearningRate 0.2621 Epoch: 2 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:29,224-Speed 5507.07 samples/sec Loss 10.9575 LearningRate 0.2621 Epoch: 2 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:36,718-Speed 5467.03 samples/sec Loss 10.9581 LearningRate 0.2621 Epoch: 2 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:44,304-Speed 5399.63 samples/sec Loss 10.8268 LearningRate 0.2621 Epoch: 2 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:51,750-Speed 5502.25 samples/sec Loss 10.9373 LearningRate 0.2620 Epoch: 2 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:48:59,326-Speed 5407.38 samples/sec Loss 10.8933 LearningRate 0.2620 Epoch: 2 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:49:06,872-Speed 5428.27 samples/sec Loss 10.8756 LearningRate 0.2620 Epoch: 2 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:49:14,355-Speed 5475.05 samples/sec Loss 10.9227 LearningRate 0.2619 Epoch: 2 Global Step: 23290 Fp16 Grad Scale: 131072 
Required: 42 hours Training: 2022-01-07 23:49:21,880-Speed 5443.38 samples/sec Loss 10.9125 LearningRate 0.2619 Epoch: 2 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:49:29,355-Speed 5480.46 samples/sec Loss 10.8577 LearningRate 0.2619 Epoch: 2 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:49:36,940-Speed 5400.69 samples/sec Loss 10.8912 LearningRate 0.2619 Epoch: 2 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:49:44,439-Speed 5463.21 samples/sec Loss 10.9078 LearningRate 0.2618 Epoch: 2 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:49:51,975-Speed 5435.51 samples/sec Loss 10.8832 LearningRate 0.2618 Epoch: 2 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:49:59,516-Speed 5433.03 samples/sec Loss 10.9118 LearningRate 0.2618 Epoch: 2 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:07,027-Speed 5453.86 samples/sec Loss 10.8319 LearningRate 0.2617 Epoch: 2 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:14,540-Speed 5452.89 samples/sec Loss 10.9091 LearningRate 0.2617 Epoch: 2 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:22,045-Speed 5458.53 samples/sec Loss 10.8906 LearningRate 0.2617 Epoch: 2 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:29,563-Speed 5449.00 samples/sec Loss 10.9110 LearningRate 0.2617 Epoch: 2 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:37,088-Speed 5444.10 samples/sec Loss 10.8160 LearningRate 0.2616 Epoch: 2 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:44,641-Speed 5423.85 samples/sec Loss 10.8590 LearningRate 0.2616 Epoch: 2 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 
23:50:52,127-Speed 5472.44 samples/sec Loss 10.8333 LearningRate 0.2616 Epoch: 2 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:50:59,663-Speed 5435.60 samples/sec Loss 10.8470 LearningRate 0.2615 Epoch: 2 Global Step: 23430 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:51:07,213-Speed 5426.03 samples/sec Loss 10.8837 LearningRate 0.2615 Epoch: 2 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:51:14,725-Speed 5453.17 samples/sec Loss 10.8957 LearningRate 0.2615 Epoch: 2 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:51:22,271-Speed 5429.32 samples/sec Loss 10.8990 LearningRate 0.2615 Epoch: 2 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:51:29,791-Speed 5447.73 samples/sec Loss 10.8892 LearningRate 0.2614 Epoch: 2 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:51:37,294-Speed 5460.07 samples/sec Loss 10.8617 LearningRate 0.2614 Epoch: 2 Global Step: 23480 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:51:44,831-Speed 5434.70 samples/sec Loss 10.9155 LearningRate 0.2614 Epoch: 2 Global Step: 23490 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:51:52,383-Speed 5424.30 samples/sec Loss 10.9616 LearningRate 0.2613 Epoch: 2 Global Step: 23500 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:51:59,932-Speed 5427.24 samples/sec Loss 10.9183 LearningRate 0.2613 Epoch: 2 Global Step: 23510 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:52:07,443-Speed 5453.73 samples/sec Loss 10.9456 LearningRate 0.2613 Epoch: 2 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:52:15,007-Speed 5415.50 samples/sec Loss 10.9071 LearningRate 0.2613 Epoch: 2 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:52:22,568-Speed 5418.10 samples/sec 
Loss 10.9275 LearningRate 0.2612 Epoch: 2 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:52:30,111-Speed 5430.98 samples/sec Loss 10.8874 LearningRate 0.2612 Epoch: 2 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:52:37,548-Speed 5508.76 samples/sec Loss 10.9061 LearningRate 0.2612 Epoch: 2 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:52:45,159-Speed 5382.24 samples/sec Loss 10.9090 LearningRate 0.2611 Epoch: 2 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:52:52,599-Speed 5506.34 samples/sec Loss 10.9133 LearningRate 0.2611 Epoch: 2 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:00,245-Speed 5357.76 samples/sec Loss 10.9079 LearningRate 0.2611 Epoch: 2 Global Step: 23590 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:07,760-Speed 5450.96 samples/sec Loss 10.8680 LearningRate 0.2611 Epoch: 2 Global Step: 23600 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:15,316-Speed 5421.84 samples/sec Loss 10.9124 LearningRate 0.2610 Epoch: 2 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:22,876-Speed 5418.87 samples/sec Loss 10.8319 LearningRate 0.2610 Epoch: 2 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:30,438-Speed 5417.22 samples/sec Loss 10.8862 LearningRate 0.2610 Epoch: 2 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:38,054-Speed 5378.92 samples/sec Loss 10.8212 LearningRate 0.2609 Epoch: 2 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:45,573-Speed 5448.38 samples/sec Loss 10.7935 LearningRate 0.2609 Epoch: 2 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:53:53,057-Speed 5473.69 samples/sec Loss 10.8826 LearningRate 0.2609 
Epoch: 2 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:54:00,686-Speed 5369.82 samples/sec Loss 10.8299 LearningRate 0.2609 Epoch: 2 Global Step: 23670 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:08,119-Speed 5511.83 samples/sec Loss 10.8574 LearningRate 0.2608 Epoch: 2 Global Step: 23680 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:15,729-Speed 5382.97 samples/sec Loss 10.8556 LearningRate 0.2608 Epoch: 2 Global Step: 23690 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:23,379-Speed 5354.35 samples/sec Loss 10.8858 LearningRate 0.2608 Epoch: 2 Global Step: 23700 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:30,847-Speed 5486.08 samples/sec Loss 10.8630 LearningRate 0.2607 Epoch: 2 Global Step: 23710 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:38,448-Speed 5389.53 samples/sec Loss 10.9198 LearningRate 0.2607 Epoch: 2 Global Step: 23720 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:46,003-Speed 5422.27 samples/sec Loss 10.9055 LearningRate 0.2607 Epoch: 2 Global Step: 23730 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:54:53,649-Speed 5357.66 samples/sec Loss 10.8114 LearningRate 0.2607 Epoch: 2 Global Step: 23740 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:01,140-Speed 5469.51 samples/sec Loss 10.9123 LearningRate 0.2606 Epoch: 2 Global Step: 23750 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:08,709-Speed 5412.30 samples/sec Loss 10.8722 LearningRate 0.2606 Epoch: 2 Global Step: 23760 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:16,150-Speed 5505.44 samples/sec Loss 10.8524 LearningRate 0.2606 Epoch: 2 Global Step: 23770 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:23,801-Speed 5353.97 samples/sec Loss 10.8601 LearningRate 0.2605 Epoch: 2 Global Step: 23780 
Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:31,272-Speed 5483.67 samples/sec Loss 10.9125 LearningRate 0.2605 Epoch: 2 Global Step: 23790 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:38,869-Speed 5392.56 samples/sec Loss 10.8283 LearningRate 0.2605 Epoch: 2 Global Step: 23800 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:46,470-Speed 5389.24 samples/sec Loss 10.8082 LearningRate 0.2605 Epoch: 2 Global Step: 23810 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-07 23:55:53,973-Speed 5459.43 samples/sec Loss 10.7511 LearningRate 0.2604 Epoch: 2 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:56:01,546-Speed 5409.69 samples/sec Loss 10.8506 LearningRate 0.2604 Epoch: 2 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:56:09,117-Speed 5411.21 samples/sec Loss 10.8297 LearningRate 0.2604 Epoch: 2 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:56:16,752-Speed 5365.87 samples/sec Loss 10.8390 LearningRate 0.2603 Epoch: 2 Global Step: 23850 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-07 23:56:24,226-Speed 5480.26 samples/sec Loss 10.7964 LearningRate 0.2603 Epoch: 2 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:56:31,758-Speed 5439.52 samples/sec Loss 10.8455 LearningRate 0.2603 Epoch: 2 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:56:39,377-Speed 5376.70 samples/sec Loss 10.7930 LearningRate 0.2603 Epoch: 2 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:56:46,961-Speed 5401.34 samples/sec Loss 10.7879 LearningRate 0.2602 Epoch: 2 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-07 23:56:54,538-Speed 5406.22 samples/sec Loss 10.7936 LearningRate 0.2602 Epoch: 2 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 42 
hours
Training: 2022-01-07 23:57:02,176-Speed 5363.73 samples/sec Loss 10.7967 LearningRate 0.2602 Epoch: 2 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 23:57:09,630-Speed 5495.69 samples/sec Loss 10.8692 LearningRate 0.2601 Epoch: 2 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 23:57:17,080-Speed 5498.83 samples/sec Loss 10.7783 LearningRate 0.2601 Epoch: 2 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 23:57:24,619-Speed 5433.59 samples/sec Loss 10.8510 LearningRate 0.2601 Epoch: 2 Global Step: 23940 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 23:57:32,108-Speed 5470.34 samples/sec Loss 10.8989 LearningRate 0.2601 Epoch: 2 Global Step: 23950 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-01-07 23:57:39,580-Speed 5482.38 samples/sec Loss 10.8338 LearningRate 0.2600 Epoch: 2 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:57:47,360-Speed 5265.78 samples/sec Loss 10.8868 LearningRate 0.2600 Epoch: 2 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:57:54,801-Speed 5505.02 samples/sec Loss 10.8580 LearningRate 0.2600 Epoch: 2 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:58:02,308-Speed 5457.15 samples/sec Loss 10.8328 LearningRate 0.2600 Epoch: 2 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:58:09,872-Speed 5415.64 samples/sec Loss 10.8889 LearningRate 0.2599 Epoch: 2 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-07 23:58:54,009-[lfw][24000]XNorm: 22.602196
Training: 2022-01-07 23:58:54,010-[lfw][24000]Accuracy-Flip: 0.99767+-0.00291
Training: 2022-01-07 23:58:54,010-[lfw][24000]Accuracy-Highest: 0.99767
Training: 2022-01-07 23:59:46,833-[cfp_fp][24000]XNorm: 20.366085
Training: 2022-01-07 23:59:46,834-[cfp_fp][24000]Accuracy-Flip: 0.97686+-0.00758
Training: 2022-01-07 23:59:46,835-[cfp_fp][24000]Accuracy-Highest: 0.97986
Training: 2022-01-08 00:00:32,440-[agedb_30][24000]XNorm: 22.227586
Training: 2022-01-08 00:00:32,442-[agedb_30][24000]Accuracy-Flip: 0.96317+-0.00899
Training: 2022-01-08 00:00:32,442-[agedb_30][24000]Accuracy-Highest: 0.96883
Training: 2022-01-08 00:00:40,114-Speed 272.63 samples/sec Loss 11.0215 LearningRate 0.2599 Epoch: 2 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:00:47,580-Speed 5488.21 samples/sec Loss 10.8469 LearningRate 0.2599 Epoch: 2 Global Step: 24020 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:00:55,178-Speed 5392.04 samples/sec Loss 10.8725 LearningRate 0.2598 Epoch: 2 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:02,867-Speed 5327.87 samples/sec Loss 10.8117 LearningRate 0.2598 Epoch: 2 Global Step: 24040 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:10,354-Speed 5472.29 samples/sec Loss 10.8697 LearningRate 0.2598 Epoch: 2 Global Step: 24050 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:18,099-Speed 5289.30 samples/sec Loss 10.8008 LearningRate 0.2598 Epoch: 2 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:25,570-Speed 5483.07 samples/sec Loss 10.8267 LearningRate 0.2597 Epoch: 2 Global Step: 24070 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:33,214-Speed 5359.05 samples/sec Loss 10.7373 LearningRate 0.2597 Epoch: 2 Global Step: 24080 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:40,728-Speed 5452.08 samples/sec Loss 10.8084 LearningRate 0.2597 Epoch: 2 Global Step: 24090 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:48,401-Speed 5338.65 samples/sec Loss 10.8917 LearningRate 0.2596 Epoch: 2 Global Step: 24100 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-01-08 00:01:56,016-Speed
5379.92 samples/sec Loss 10.7756 LearningRate 0.2596 Epoch: 2 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:02:03,624-Speed 5384.28 samples/sec Loss 10.7695 LearningRate 0.2596 Epoch: 2 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:02:11,174-Speed 5425.96 samples/sec Loss 10.7851 LearningRate 0.2596 Epoch: 2 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:02:18,655-Speed 5475.67 samples/sec Loss 10.9079 LearningRate 0.2595 Epoch: 2 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:02:26,239-Speed 5401.93 samples/sec Loss 10.7184 LearningRate 0.2595 Epoch: 2 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:02:33,631-Speed 5541.90 samples/sec Loss 10.7375 LearningRate 0.2595 Epoch: 2 Global Step: 24160 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:02:41,146-Speed 5450.97 samples/sec Loss 10.7992 LearningRate 0.2594 Epoch: 2 Global Step: 24170 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:02:48,571-Speed 5516.94 samples/sec Loss 10.8045 LearningRate 0.2594 Epoch: 2 Global Step: 24180 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:02:56,131-Speed 5418.45 samples/sec Loss 10.6832 LearningRate 0.2594 Epoch: 2 Global Step: 24190 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:03:03,626-Speed 5465.91 samples/sec Loss 10.7711 LearningRate 0.2594 Epoch: 2 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:11,239-Speed 5380.89 samples/sec Loss 10.8733 LearningRate 0.2593 Epoch: 2 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:18,706-Speed 5486.01 samples/sec Loss 10.7693 LearningRate 0.2593 Epoch: 2 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:26,252-Speed 5428.71 samples/sec Loss 
10.7619 LearningRate 0.2593 Epoch: 2 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:33,817-Speed 5415.21 samples/sec Loss 10.7570 LearningRate 0.2592 Epoch: 2 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:41,358-Speed 5432.12 samples/sec Loss 10.7545 LearningRate 0.2592 Epoch: 2 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:48,957-Speed 5391.10 samples/sec Loss 10.8631 LearningRate 0.2592 Epoch: 2 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:03:56,526-Speed 5412.64 samples/sec Loss 10.7618 LearningRate 0.2592 Epoch: 2 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:04:04,083-Speed 5420.67 samples/sec Loss 10.7847 LearningRate 0.2591 Epoch: 2 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:04:11,587-Speed 5459.22 samples/sec Loss 10.7763 LearningRate 0.2591 Epoch: 2 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:04:19,135-Speed 5427.25 samples/sec Loss 10.6942 LearningRate 0.2591 Epoch: 2 Global Step: 24300 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:04:26,682-Speed 5428.39 samples/sec Loss 10.8350 LearningRate 0.2590 Epoch: 2 Global Step: 24310 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:04:34,221-Speed 5433.92 samples/sec Loss 10.7808 LearningRate 0.2590 Epoch: 2 Global Step: 24320 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:04:41,831-Speed 5383.22 samples/sec Loss 10.7812 LearningRate 0.2590 Epoch: 2 Global Step: 24330 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:04:49,555-Speed 5303.71 samples/sec Loss 10.7787 LearningRate 0.2590 Epoch: 2 Global Step: 24340 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:04:56,966-Speed 5527.60 samples/sec Loss 10.7380 LearningRate 0.2589 
Epoch: 2 Global Step: 24350 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:04,425-Speed 5492.63 samples/sec Loss 10.7653 LearningRate 0.2589 Epoch: 2 Global Step: 24360 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:11,906-Speed 5475.15 samples/sec Loss 10.8423 LearningRate 0.2589 Epoch: 2 Global Step: 24370 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:19,366-Speed 5491.64 samples/sec Loss 10.8156 LearningRate 0.2588 Epoch: 2 Global Step: 24380 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:26,961-Speed 5394.32 samples/sec Loss 10.8187 LearningRate 0.2588 Epoch: 2 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:34,712-Speed 5285.38 samples/sec Loss 10.7216 LearningRate 0.2588 Epoch: 2 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:42,151-Speed 5506.64 samples/sec Loss 10.8223 LearningRate 0.2588 Epoch: 2 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:49,663-Speed 5453.10 samples/sec Loss 10.8936 LearningRate 0.2587 Epoch: 2 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:05:57,256-Speed 5395.51 samples/sec Loss 10.7867 LearningRate 0.2587 Epoch: 2 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:06:04,809-Speed 5423.42 samples/sec Loss 10.7342 LearningRate 0.2587 Epoch: 2 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:06:12,278-Speed 5484.79 samples/sec Loss 10.8070 LearningRate 0.2586 Epoch: 2 Global Step: 24450 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:06:19,870-Speed 5395.72 samples/sec Loss 10.7298 LearningRate 0.2586 Epoch: 2 Global Step: 24460 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:06:27,383-Speed 5453.34 samples/sec Loss 10.8079 LearningRate 0.2586 Epoch: 2 Global Step: 24470 
Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:06:34,901-Speed 5448.79 samples/sec Loss 10.7596 LearningRate 0.2586 Epoch: 2 Global Step: 24480 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:06:42,328-Speed 5515.12 samples/sec Loss 10.7342 LearningRate 0.2585 Epoch: 2 Global Step: 24490 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:06:49,868-Speed 5433.69 samples/sec Loss 10.7985 LearningRate 0.2585 Epoch: 2 Global Step: 24500 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:06:57,333-Speed 5487.78 samples/sec Loss 10.7809 LearningRate 0.2585 Epoch: 2 Global Step: 24510 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:07:04,810-Speed 5478.65 samples/sec Loss 10.7132 LearningRate 0.2585 Epoch: 2 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:12,395-Speed 5400.87 samples/sec Loss 10.7175 LearningRate 0.2584 Epoch: 2 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:20,072-Speed 5336.45 samples/sec Loss 10.7267 LearningRate 0.2584 Epoch: 2 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:27,585-Speed 5452.42 samples/sec Loss 10.7782 LearningRate 0.2584 Epoch: 2 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:35,047-Speed 5489.78 samples/sec Loss 10.8256 LearningRate 0.2583 Epoch: 2 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:42,541-Speed 5466.48 samples/sec Loss 10.7784 LearningRate 0.2583 Epoch: 2 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:50,021-Speed 5476.85 samples/sec Loss 10.7798 LearningRate 0.2583 Epoch: 2 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:07:57,510-Speed 5470.36 samples/sec Loss 10.8037 LearningRate 0.2583 Epoch: 2 Global Step: 24590 Fp16 Grad Scale: 131072 
Required: 42 hours Training: 2022-01-08 00:08:04,953-Speed 5503.72 samples/sec Loss 10.7041 LearningRate 0.2582 Epoch: 2 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:08:12,396-Speed 5504.03 samples/sec Loss 10.8055 LearningRate 0.2582 Epoch: 2 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:08:19,852-Speed 5494.07 samples/sec Loss 10.7371 LearningRate 0.2582 Epoch: 2 Global Step: 24620 Fp16 Grad Scale: 262144 Required: 42 hours Training: 2022-01-08 00:08:27,543-Speed 5326.84 samples/sec Loss 10.7543 LearningRate 0.2581 Epoch: 2 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:08:35,153-Speed 5383.50 samples/sec Loss 10.7778 LearningRate 0.2581 Epoch: 2 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:08:42,567-Speed 5525.20 samples/sec Loss 10.7110 LearningRate 0.2581 Epoch: 2 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:08:49,976-Speed 5529.13 samples/sec Loss 10.8396 LearningRate 0.2581 Epoch: 2 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:08:57,546-Speed 5411.63 samples/sec Loss 10.6818 LearningRate 0.2580 Epoch: 2 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:09:05,060-Speed 5451.73 samples/sec Loss 10.7934 LearningRate 0.2580 Epoch: 2 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:09:12,574-Speed 5451.46 samples/sec Loss 10.7502 LearningRate 0.2580 Epoch: 2 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:09:20,055-Speed 5476.49 samples/sec Loss 10.7813 LearningRate 0.2579 Epoch: 2 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:09:27,619-Speed 5415.92 samples/sec Loss 10.7045 LearningRate 0.2579 Epoch: 2 Global Step: 24710 Fp16 Grad Scale: 65536 Required: 42 hours Training: 
2022-01-08 00:09:35,173-Speed 5422.79 samples/sec Loss 10.6669 LearningRate 0.2579 Epoch: 2 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:09:42,651-Speed 5477.69 samples/sec Loss 10.7079 LearningRate 0.2579 Epoch: 2 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:09:50,180-Speed 5441.84 samples/sec Loss 10.7638 LearningRate 0.2578 Epoch: 2 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:09:57,756-Speed 5406.96 samples/sec Loss 10.7256 LearningRate 0.2578 Epoch: 2 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:10:05,198-Speed 5505.28 samples/sec Loss 10.7087 LearningRate 0.2578 Epoch: 2 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:10:12,716-Speed 5448.61 samples/sec Loss 10.7508 LearningRate 0.2577 Epoch: 2 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:10:20,192-Speed 5480.16 samples/sec Loss 10.6798 LearningRate 0.2577 Epoch: 2 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:10:28,007-Speed 5241.80 samples/sec Loss 10.7618 LearningRate 0.2577 Epoch: 2 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:10:35,697-Speed 5326.96 samples/sec Loss 10.7420 LearningRate 0.2577 Epoch: 2 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:10:43,202-Speed 5458.72 samples/sec Loss 10.8296 LearningRate 0.2576 Epoch: 2 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:10:50,682-Speed 5476.63 samples/sec Loss 10.7879 LearningRate 0.2576 Epoch: 2 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:10:58,261-Speed 5404.92 samples/sec Loss 10.8035 LearningRate 0.2576 Epoch: 2 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:11:05,787-Speed 5443.55 
samples/sec Loss 10.7567 LearningRate 0.2575 Epoch: 2 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:11:13,296-Speed 5454.96 samples/sec Loss 10.7395 LearningRate 0.2575 Epoch: 2 Global Step: 24850 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:11:20,921-Speed 5372.87 samples/sec Loss 10.7476 LearningRate 0.2575 Epoch: 2 Global Step: 24860 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:11:28,420-Speed 5462.68 samples/sec Loss 10.7198 LearningRate 0.2575 Epoch: 2 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:11:35,995-Speed 5407.95 samples/sec Loss 10.7279 LearningRate 0.2574 Epoch: 2 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:11:43,478-Speed 5474.16 samples/sec Loss 10.7504 LearningRate 0.2574 Epoch: 2 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:11:51,087-Speed 5384.38 samples/sec Loss 10.7668 LearningRate 0.2574 Epoch: 2 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:11:58,588-Speed 5460.64 samples/sec Loss 10.6671 LearningRate 0.2573 Epoch: 2 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:12:06,147-Speed 5419.57 samples/sec Loss 10.7265 LearningRate 0.2573 Epoch: 2 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:12:13,594-Speed 5501.45 samples/sec Loss 10.7083 LearningRate 0.2573 Epoch: 2 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:12:21,106-Speed 5452.88 samples/sec Loss 10.5801 LearningRate 0.2573 Epoch: 2 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:12:28,615-Speed 5455.54 samples/sec Loss 10.7061 LearningRate 0.2572 Epoch: 2 Global Step: 24950 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:12:36,199-Speed 5401.61 samples/sec Loss 10.6993 LearningRate 
0.2572 Epoch: 2 Global Step: 24960 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:12:43,681-Speed 5475.39 samples/sec Loss 10.7028 LearningRate 0.2572 Epoch: 2 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:12:51,162-Speed 5475.65 samples/sec Loss 10.7615 LearningRate 0.2572 Epoch: 2 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:12:58,607-Speed 5502.82 samples/sec Loss 10.6828 LearningRate 0.2571 Epoch: 2 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:06,224-Speed 5377.54 samples/sec Loss 10.6865 LearningRate 0.2571 Epoch: 2 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:13,712-Speed 5471.48 samples/sec Loss 10.7378 LearningRate 0.2571 Epoch: 2 Global Step: 25010 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:21,243-Speed 5439.91 samples/sec Loss 10.7041 LearningRate 0.2570 Epoch: 2 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:28,717-Speed 5480.52 samples/sec Loss 10.7854 LearningRate 0.2570 Epoch: 2 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:36,257-Speed 5433.19 samples/sec Loss 10.7143 LearningRate 0.2570 Epoch: 2 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:43,716-Speed 5491.92 samples/sec Loss 10.6866 LearningRate 0.2570 Epoch: 2 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:51,265-Speed 5427.29 samples/sec Loss 10.7107 LearningRate 0.2569 Epoch: 2 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:13:58,695-Speed 5512.78 samples/sec Loss 10.7264 LearningRate 0.2569 Epoch: 2 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:14:06,145-Speed 5499.01 samples/sec Loss 10.6787 LearningRate 0.2569 Epoch: 2 Global Step: 
25080 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:14:13,669-Speed 5444.65 samples/sec Loss 10.7204 LearningRate 0.2568 Epoch: 2 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:14:21,193-Speed 5444.88 samples/sec Loss 10.7606 LearningRate 0.2568 Epoch: 2 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:14:28,743-Speed 5425.72 samples/sec Loss 10.7079 LearningRate 0.2568 Epoch: 2 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:14:36,155-Speed 5526.80 samples/sec Loss 10.6998 LearningRate 0.2568 Epoch: 2 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:14:43,626-Speed 5483.39 samples/sec Loss 10.6160 LearningRate 0.2567 Epoch: 2 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:14:51,226-Speed 5390.82 samples/sec Loss 10.6978 LearningRate 0.2567 Epoch: 2 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:14:58,726-Speed 5462.05 samples/sec Loss 10.6868 LearningRate 0.2567 Epoch: 2 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:15:06,263-Speed 5434.93 samples/sec Loss 10.7819 LearningRate 0.2566 Epoch: 2 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:15:13,758-Speed 5466.34 samples/sec Loss 10.6963 LearningRate 0.2566 Epoch: 2 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:15:21,246-Speed 5470.74 samples/sec Loss 10.7403 LearningRate 0.2566 Epoch: 2 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-01-08 00:15:28,718-Speed 5482.01 samples/sec Loss 10.7132 LearningRate 0.2566 Epoch: 2 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 42 hours Training: 2022-01-08 00:15:36,222-Speed 5459.27 samples/sec Loss 10.6238 LearningRate 0.2565 Epoch: 2 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 
41 hours Training: 2022-01-08 00:15:43,814-Speed 5396.36 samples/sec Loss 10.7492 LearningRate 0.2565 Epoch: 2 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:15:51,211-Speed 5537.88 samples/sec Loss 10.7011 LearningRate 0.2565 Epoch: 2 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:15:58,773-Speed 5417.42 samples/sec Loss 10.6968 LearningRate 0.2564 Epoch: 2 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:06,277-Speed 5458.91 samples/sec Loss 10.6760 LearningRate 0.2564 Epoch: 2 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:13,850-Speed 5410.06 samples/sec Loss 10.6837 LearningRate 0.2564 Epoch: 2 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:21,420-Speed 5411.71 samples/sec Loss 10.7154 LearningRate 0.2564 Epoch: 2 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:28,986-Speed 5414.23 samples/sec Loss 10.6453 LearningRate 0.2563 Epoch: 2 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:36,691-Speed 5316.87 samples/sec Loss 10.6472 LearningRate 0.2563 Epoch: 2 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:44,269-Speed 5405.67 samples/sec Loss 10.6872 LearningRate 0.2563 Epoch: 2 Global Step: 25290 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:16:51,708-Speed 5507.05 samples/sec Loss 10.7322 LearningRate 0.2563 Epoch: 2 Global Step: 25300 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:16:59,125-Speed 5522.88 samples/sec Loss 10.6430 LearningRate 0.2562 Epoch: 2 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:17:06,615-Speed 5469.69 samples/sec Loss 10.6846 LearningRate 0.2562 Epoch: 2 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 
00:17:14,183-Speed 5413.24 samples/sec Loss 10.7901 LearningRate 0.2562 Epoch: 2 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:17:21,739-Speed 5421.70 samples/sec Loss 10.7179 LearningRate 0.2561 Epoch: 2 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:17:29,203-Speed 5488.04 samples/sec Loss 10.6702 LearningRate 0.2561 Epoch: 2 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:17:36,702-Speed 5462.86 samples/sec Loss 10.6793 LearningRate 0.2561 Epoch: 2 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:17:44,243-Speed 5432.49 samples/sec Loss 10.6377 LearningRate 0.2561 Epoch: 2 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:17:51,849-Speed 5385.98 samples/sec Loss 10.7458 LearningRate 0.2560 Epoch: 2 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:17:59,319-Speed 5484.14 samples/sec Loss 10.6427 LearningRate 0.2560 Epoch: 2 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:06,943-Speed 5372.84 samples/sec Loss 10.7258 LearningRate 0.2560 Epoch: 2 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:14,438-Speed 5466.60 samples/sec Loss 10.6991 LearningRate 0.2559 Epoch: 2 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:22,031-Speed 5394.77 samples/sec Loss 10.6707 LearningRate 0.2559 Epoch: 2 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:29,572-Speed 5432.42 samples/sec Loss 10.6800 LearningRate 0.2559 Epoch: 2 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:37,243-Speed 5340.07 samples/sec Loss 10.7246 LearningRate 0.2559 Epoch: 2 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:44,832-Speed 5398.72 samples/sec 
Loss 10.7045 LearningRate 0.2558 Epoch: 2 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:18:52,467-Speed 5365.31 samples/sec Loss 10.6916 LearningRate 0.2558 Epoch: 2 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:19:00,133-Speed 5343.41 samples/sec Loss 10.6828 LearningRate 0.2558 Epoch: 2 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:19:07,612-Speed 5477.73 samples/sec Loss 10.6938 LearningRate 0.2557 Epoch: 2 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:19:15,202-Speed 5397.00 samples/sec Loss 10.7097 LearningRate 0.2557 Epoch: 2 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:19:22,762-Speed 5418.69 samples/sec Loss 10.6338 LearningRate 0.2557 Epoch: 2 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:19:30,166-Speed 5532.93 samples/sec Loss 10.6595 LearningRate 0.2557 Epoch: 2 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:19:37,883-Speed 5308.20 samples/sec Loss 10.6618 LearningRate 0.2556 Epoch: 2 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:19:45,325-Speed 5505.24 samples/sec Loss 10.6594 LearningRate 0.2556 Epoch: 2 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:19:52,958-Speed 5366.85 samples/sec Loss 10.6499 LearningRate 0.2556 Epoch: 2 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:20:00,582-Speed 5373.51 samples/sec Loss 10.7151 LearningRate 0.2555 Epoch: 2 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:20:08,187-Speed 5386.03 samples/sec Loss 10.6072 LearningRate 0.2555 Epoch: 2 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:20:15,698-Speed 5454.51 samples/sec Loss 10.6024 LearningRate 0.2555 
Epoch: 2 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:20:23,107-Speed 5528.62 samples/sec Loss 10.6243 LearningRate 0.2555 Epoch: 2 Global Step: 25580 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:20:30,702-Speed 5393.96 samples/sec Loss 10.6280 LearningRate 0.2554 Epoch: 2 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:20:38,195-Speed 5467.20 samples/sec Loss 10.7198 LearningRate 0.2554 Epoch: 2 Global Step: 25600 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:20:45,687-Speed 5467.61 samples/sec Loss 10.6364 LearningRate 0.2554 Epoch: 2 Global Step: 25610 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:20:53,129-Speed 5505.14 samples/sec Loss 10.6645 LearningRate 0.2554 Epoch: 2 Global Step: 25620 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:00,741-Speed 5381.67 samples/sec Loss 10.6576 LearningRate 0.2553 Epoch: 2 Global Step: 25630 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:08,240-Speed 5462.63 samples/sec Loss 10.6173 LearningRate 0.2553 Epoch: 2 Global Step: 25640 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:15,743-Speed 5459.33 samples/sec Loss 10.7016 LearningRate 0.2553 Epoch: 2 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:23,346-Speed 5388.60 samples/sec Loss 10.7357 LearningRate 0.2552 Epoch: 2 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:30,891-Speed 5428.90 samples/sec Loss 10.6539 LearningRate 0.2552 Epoch: 2 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:38,460-Speed 5412.78 samples/sec Loss 10.5640 LearningRate 0.2552 Epoch: 2 Global Step: 25680 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:21:45,984-Speed 5444.57 samples/sec Loss 10.6049 LearningRate 0.2552 Epoch: 2 Global Step: 25690 Fp16 Grad 
Scale: 131072 Required: 41 hours Training: 2022-01-08 00:21:53,625-Speed 5360.86 samples/sec Loss 10.6537 LearningRate 0.2551 Epoch: 2 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:01,145-Speed 5447.40 samples/sec Loss 10.7018 LearningRate 0.2551 Epoch: 2 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:08,566-Speed 5520.48 samples/sec Loss 10.6724 LearningRate 0.2551 Epoch: 2 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:16,086-Speed 5447.66 samples/sec Loss 10.6482 LearningRate 0.2550 Epoch: 2 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:23,624-Speed 5434.85 samples/sec Loss 10.7936 LearningRate 0.2550 Epoch: 2 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:31,090-Speed 5486.98 samples/sec Loss 10.5410 LearningRate 0.2550 Epoch: 2 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:38,697-Speed 5384.79 samples/sec Loss 10.6784 LearningRate 0.2550 Epoch: 2 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:46,239-Speed 5431.65 samples/sec Loss 10.6224 LearningRate 0.2549 Epoch: 2 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:22:53,903-Speed 5345.27 samples/sec Loss 10.6640 LearningRate 0.2549 Epoch: 2 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:23:01,427-Speed 5444.71 samples/sec Loss 10.6100 LearningRate 0.2549 Epoch: 2 Global Step: 25790 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:23:08,922-Speed 5465.95 samples/sec Loss 10.7088 LearningRate 0.2548 Epoch: 2 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:23:16,377-Speed 5494.83 samples/sec Loss 10.6523 LearningRate 0.2548 Epoch: 2 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 41 
hours Training: 2022-01-08 00:23:23,844-Speed 5486.60 samples/sec Loss 10.6108 LearningRate 0.2548 Epoch: 2 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:23:31,299-Speed 5495.13 samples/sec Loss 10.6528 LearningRate 0.2548 Epoch: 2 Global Step: 25830 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:23:38,754-Speed 5494.83 samples/sec Loss 10.5999 LearningRate 0.2547 Epoch: 2 Global Step: 25840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:23:46,189-Speed 5509.80 samples/sec Loss 10.5700 LearningRate 0.2547 Epoch: 2 Global Step: 25850 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:23:53,649-Speed 5491.45 samples/sec Loss 10.6332 LearningRate 0.2547 Epoch: 2 Global Step: 25860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:01,135-Speed 5472.51 samples/sec Loss 10.5829 LearningRate 0.2546 Epoch: 2 Global Step: 25870 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:08,805-Speed 5340.98 samples/sec Loss 10.5342 LearningRate 0.2546 Epoch: 2 Global Step: 25880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:16,315-Speed 5454.66 samples/sec Loss 10.6541 LearningRate 0.2546 Epoch: 2 Global Step: 25890 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:23,835-Speed 5447.14 samples/sec Loss 10.7082 LearningRate 0.2546 Epoch: 2 Global Step: 25900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:31,363-Speed 5442.02 samples/sec Loss 10.6719 LearningRate 0.2545 Epoch: 2 Global Step: 25910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:38,788-Speed 5517.14 samples/sec Loss 10.5661 LearningRate 0.2545 Epoch: 2 Global Step: 25920 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-01-08 00:24:46,300-Speed 5453.49 samples/sec Loss 10.7053 LearningRate 0.2545 Epoch: 2 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 
00:24:53,822-Speed 5446.43 samples/sec Loss 10.6990 LearningRate 0.2545 Epoch: 2 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:25:01,352-Speed 5440.90 samples/sec Loss 10.6254 LearningRate 0.2544 Epoch: 2 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:25:08,812-Speed 5490.81 samples/sec Loss 10.6073 LearningRate 0.2544 Epoch: 2 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:25:16,278-Speed 5487.48 samples/sec Loss 10.6186 LearningRate 0.2544 Epoch: 2 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:25:23,728-Speed 5498.15 samples/sec Loss 10.6067 LearningRate 0.2543 Epoch: 2 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:25:31,256-Speed 5442.35 samples/sec Loss 10.5888 LearningRate 0.2543 Epoch: 2 Global Step: 25990 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:25:38,658-Speed 5533.80 samples/sec Loss 10.6389 LearningRate 0.2543 Epoch: 2 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:26:22,415-[lfw][26000]XNorm: 22.938673 Training: 2022-01-08 00:26:22,416-[lfw][26000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-01-08 00:26:22,416-[lfw][26000]Accuracy-Highest: 0.99767 Training: 2022-01-08 00:27:14,920-[cfp_fp][26000]XNorm: 20.785323 Training: 2022-01-08 00:27:14,921-[cfp_fp][26000]Accuracy-Flip: 0.98271+-0.00449 Training: 2022-01-08 00:27:14,922-[cfp_fp][26000]Accuracy-Highest: 0.98271 Training: 2022-01-08 00:28:00,104-[agedb_30][26000]XNorm: 22.997359 Training: 2022-01-08 00:28:00,105-[agedb_30][26000]Accuracy-Flip: 0.97167+-0.00707 Training: 2022-01-08 00:28:00,106-[agedb_30][26000]Accuracy-Highest: 0.97167 Training: 2022-01-08 00:28:07,601-Speed 275.01 samples/sec Loss 10.6200 LearningRate 0.2543 Epoch: 2 Global Step: 26010 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:28:15,134-Speed 5440.20 
samples/sec Loss 10.6133 LearningRate 0.2542 Epoch: 2 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:28:22,599-Speed 5488.36 samples/sec Loss 10.4990 LearningRate 0.2542 Epoch: 2 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:28:30,057-Speed 5493.27 samples/sec Loss 10.6111 LearningRate 0.2542 Epoch: 2 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:28:37,539-Speed 5476.13 samples/sec Loss 10.5320 LearningRate 0.2541 Epoch: 2 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:28:44,993-Speed 5495.81 samples/sec Loss 10.6225 LearningRate 0.2541 Epoch: 2 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:28:52,585-Speed 5396.42 samples/sec Loss 10.6535 LearningRate 0.2541 Epoch: 2 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:00,023-Speed 5507.56 samples/sec Loss 10.5186 LearningRate 0.2541 Epoch: 2 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:07,623-Speed 5390.26 samples/sec Loss 10.6725 LearningRate 0.2540 Epoch: 2 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:15,056-Speed 5510.99 samples/sec Loss 10.6794 LearningRate 0.2540 Epoch: 2 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:22,518-Speed 5490.01 samples/sec Loss 10.6845 LearningRate 0.2540 Epoch: 2 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:29,923-Speed 5532.21 samples/sec Loss 10.6610 LearningRate 0.2539 Epoch: 2 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:37,390-Speed 5486.23 samples/sec Loss 10.5646 LearningRate 0.2539 Epoch: 2 Global Step: 26130 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:29:44,858-Speed 5486.03 samples/sec Loss 10.5695 
LearningRate 0.2539 Epoch: 2 Global Step: 26140 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:29:52,269-Speed 5527.87 samples/sec Loss 10.6339 LearningRate 0.2539 Epoch: 2 Global Step: 26150 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:29:59,766-Speed 5463.82 samples/sec Loss 10.5690 LearningRate 0.2538 Epoch: 2 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:07,231-Speed 5488.14 samples/sec Loss 10.6050 LearningRate 0.2538 Epoch: 2 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:14,735-Speed 5459.23 samples/sec Loss 10.6500 LearningRate 0.2538 Epoch: 2 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:22,152-Speed 5523.22 samples/sec Loss 10.5737 LearningRate 0.2538 Epoch: 2 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:29,577-Speed 5517.10 samples/sec Loss 10.6521 LearningRate 0.2537 Epoch: 2 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:36,965-Speed 5545.01 samples/sec Loss 10.5724 LearningRate 0.2537 Epoch: 2 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:44,402-Speed 5508.05 samples/sec Loss 10.5461 LearningRate 0.2537 Epoch: 2 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:51,833-Speed 5513.63 samples/sec Loss 10.5715 LearningRate 0.2536 Epoch: 2 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:30:59,255-Speed 5518.85 samples/sec Loss 10.6742 LearningRate 0.2536 Epoch: 2 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:31:06,767-Speed 5453.71 samples/sec Loss 10.5695 LearningRate 0.2536 Epoch: 2 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:31:14,239-Speed 5482.43 samples/sec Loss 10.5942 LearningRate 0.2536 Epoch: 2 Global Step: 
26260 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:31:21,780-Speed 5432.95 samples/sec Loss 10.6411 LearningRate 0.2535 Epoch: 2 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:31:29,260-Speed 5475.99 samples/sec Loss 10.5993 LearningRate 0.2535 Epoch: 2 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:31:36,823-Speed 5416.86 samples/sec Loss 10.6140 LearningRate 0.2535 Epoch: 2 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:31:44,394-Speed 5410.65 samples/sec Loss 10.5374 LearningRate 0.2534 Epoch: 2 Global Step: 26300 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:31:51,840-Speed 5501.77 samples/sec Loss 10.6753 LearningRate 0.2534 Epoch: 2 Global Step: 26310 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:31:59,266-Speed 5516.69 samples/sec Loss 10.5809 LearningRate 0.2534 Epoch: 2 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:32:06,800-Speed 5437.36 samples/sec Loss 10.5773 LearningRate 0.2534 Epoch: 2 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:32:14,418-Speed 5377.49 samples/sec Loss 10.5177 LearningRate 0.2533 Epoch: 2 Global Step: 26340 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:32:21,817-Speed 5536.87 samples/sec Loss 10.6495 LearningRate 0.2533 Epoch: 2 Global Step: 26350 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:32:29,278-Speed 5490.22 samples/sec Loss 10.6163 LearningRate 0.2533 Epoch: 2 Global Step: 26360 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:32:36,712-Speed 5510.97 samples/sec Loss 10.6889 LearningRate 0.2532 Epoch: 2 Global Step: 26370 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:32:44,216-Speed 5458.94 samples/sec Loss 10.5277 LearningRate 0.2532 Epoch: 2 Global Step: 26380 Fp16 Grad Scale: 262144 
Required: 41 hours Training: 2022-01-08 00:32:51,598-Speed 5549.36 samples/sec Loss 10.5246 LearningRate 0.2532 Epoch: 2 Global Step: 26390 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:32:59,162-Speed 5416.32 samples/sec Loss 10.5339 LearningRate 0.2532 Epoch: 2 Global Step: 26400 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:33:06,756-Speed 5393.84 samples/sec Loss 10.6275 LearningRate 0.2531 Epoch: 2 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:14,196-Speed 5506.40 samples/sec Loss 10.7076 LearningRate 0.2531 Epoch: 2 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:21,694-Speed 5464.31 samples/sec Loss 10.5289 LearningRate 0.2531 Epoch: 2 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:29,175-Speed 5475.64 samples/sec Loss 10.5547 LearningRate 0.2531 Epoch: 2 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:36,679-Speed 5459.10 samples/sec Loss 10.5893 LearningRate 0.2530 Epoch: 2 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:44,093-Speed 5525.28 samples/sec Loss 10.5474 LearningRate 0.2530 Epoch: 2 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:51,536-Speed 5504.29 samples/sec Loss 10.6719 LearningRate 0.2530 Epoch: 2 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:33:59,041-Speed 5458.35 samples/sec Loss 10.5745 LearningRate 0.2529 Epoch: 2 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:34:06,474-Speed 5511.36 samples/sec Loss 10.4960 LearningRate 0.2529 Epoch: 2 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:34:13,939-Speed 5487.34 samples/sec Loss 10.5843 LearningRate 0.2529 Epoch: 2 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 41 hours Training: 
2022-01-08 00:34:21,361-Speed 5519.89 samples/sec Loss 10.5340 LearningRate 0.2529 Epoch: 2 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:34:28,787-Speed 5516.46 samples/sec Loss 10.5969 LearningRate 0.2528 Epoch: 2 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:34:36,181-Speed 5540.69 samples/sec Loss 10.5198 LearningRate 0.2528 Epoch: 2 Global Step: 26530 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:34:44,005-Speed 5236.05 samples/sec Loss 10.5814 LearningRate 0.2528 Epoch: 2 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:34:51,395-Speed 5543.58 samples/sec Loss 10.6045 LearningRate 0.2527 Epoch: 2 Global Step: 26550 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:34:58,844-Speed 5499.17 samples/sec Loss 10.6798 LearningRate 0.2527 Epoch: 2 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:06,339-Speed 5465.92 samples/sec Loss 10.5708 LearningRate 0.2527 Epoch: 2 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:13,902-Speed 5416.54 samples/sec Loss 10.6048 LearningRate 0.2527 Epoch: 2 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:21,404-Speed 5460.26 samples/sec Loss 10.5526 LearningRate 0.2526 Epoch: 2 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:28,880-Speed 5479.75 samples/sec Loss 10.5951 LearningRate 0.2526 Epoch: 2 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:36,413-Speed 5438.42 samples/sec Loss 10.5071 LearningRate 0.2526 Epoch: 2 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:43,882-Speed 5484.52 samples/sec Loss 10.6041 LearningRate 0.2525 Epoch: 2 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:35:51,314-Speed 5511.94 
samples/sec Loss 10.6027 LearningRate 0.2525 Epoch: 2 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:35:58,792-Speed 5478.57 samples/sec Loss 10.5820 LearningRate 0.2525 Epoch: 2 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:06,277-Speed 5473.13 samples/sec Loss 10.5129 LearningRate 0.2525 Epoch: 2 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:13,774-Speed 5464.38 samples/sec Loss 10.5248 LearningRate 0.2524 Epoch: 2 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:21,315-Speed 5432.47 samples/sec Loss 10.5508 LearningRate 0.2524 Epoch: 2 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:28,955-Speed 5361.87 samples/sec Loss 10.5053 LearningRate 0.2524 Epoch: 2 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:36,427-Speed 5483.22 samples/sec Loss 10.5487 LearningRate 0.2524 Epoch: 2 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:43,854-Speed 5515.71 samples/sec Loss 10.5268 LearningRate 0.2523 Epoch: 2 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:51,282-Speed 5514.94 samples/sec Loss 10.5386 LearningRate 0.2523 Epoch: 2 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:36:58,867-Speed 5400.58 samples/sec Loss 10.5799 LearningRate 0.2523 Epoch: 2 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:06,499-Speed 5367.94 samples/sec Loss 10.5412 LearningRate 0.2522 Epoch: 2 Global Step: 26730 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:37:13,969-Speed 5484.07 samples/sec Loss 10.5259 LearningRate 0.2522 Epoch: 2 Global Step: 26740 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:21,458-Speed 5470.23 samples/sec Loss 10.4796 
LearningRate 0.2522 Epoch: 2 Global Step: 26750 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:28,942-Speed 5473.58 samples/sec Loss 10.5694 LearningRate 0.2522 Epoch: 2 Global Step: 26760 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:36,394-Speed 5497.42 samples/sec Loss 10.5124 LearningRate 0.2521 Epoch: 2 Global Step: 26770 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:43,842-Speed 5499.80 samples/sec Loss 10.4956 LearningRate 0.2521 Epoch: 2 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:51,278-Speed 5509.41 samples/sec Loss 10.5226 LearningRate 0.2521 Epoch: 2 Global Step: 26790 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:37:58,684-Speed 5531.79 samples/sec Loss 10.5860 LearningRate 0.2520 Epoch: 2 Global Step: 26800 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:38:06,255-Speed 5410.77 samples/sec Loss 10.4967 LearningRate 0.2520 Epoch: 2 Global Step: 26810 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:38:13,674-Speed 5521.94 samples/sec Loss 10.5633 LearningRate 0.2520 Epoch: 2 Global Step: 26820 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:38:21,118-Speed 5503.20 samples/sec Loss 10.5948 LearningRate 0.2520 Epoch: 2 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:38:28,487-Speed 5558.27 samples/sec Loss 10.5507 LearningRate 0.2519 Epoch: 2 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:38:36,011-Speed 5445.19 samples/sec Loss 10.5054 LearningRate 0.2519 Epoch: 2 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:38:43,462-Speed 5498.05 samples/sec Loss 10.5903 LearningRate 0.2519 Epoch: 2 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:38:50,911-Speed 5498.94 samples/sec Loss 10.5514 LearningRate 0.2519 Epoch: 2 
Global Step: 26870 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:38:58,363-Speed 5497.43 samples/sec Loss 10.5842 LearningRate 0.2518 Epoch: 2 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:39:05,922-Speed 5420.02 samples/sec Loss 10.4907 LearningRate 0.2518 Epoch: 2 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:39:13,433-Speed 5453.93 samples/sec Loss 10.5713 LearningRate 0.2518 Epoch: 2 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:39:21,004-Speed 5410.22 samples/sec Loss 10.5670 LearningRate 0.2517 Epoch: 2 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:39:28,593-Speed 5398.33 samples/sec Loss 10.4797 LearningRate 0.2517 Epoch: 2 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:39:36,196-Speed 5388.34 samples/sec Loss 10.5068 LearningRate 0.2517 Epoch: 2 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:39:43,702-Speed 5458.01 samples/sec Loss 10.4315 LearningRate 0.2517 Epoch: 2 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:39:51,200-Speed 5462.99 samples/sec Loss 10.5293 LearningRate 0.2516 Epoch: 2 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:39:58,791-Speed 5397.02 samples/sec Loss 10.5528 LearningRate 0.2516 Epoch: 2 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:06,319-Speed 5441.39 samples/sec Loss 10.5348 LearningRate 0.2516 Epoch: 2 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:13,765-Speed 5502.36 samples/sec Loss 10.5467 LearningRate 0.2515 Epoch: 2 Global Step: 26980 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:21,286-Speed 5446.23 samples/sec Loss 10.5505 LearningRate 0.2515 Epoch: 2 Global Step: 26990 Fp16 Grad Scale: 65536 
Required: 41 hours Training: 2022-01-08 00:40:28,783-Speed 5464.60 samples/sec Loss 10.5834 LearningRate 0.2515 Epoch: 2 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:36,311-Speed 5441.85 samples/sec Loss 10.4927 LearningRate 0.2515 Epoch: 2 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:43,772-Speed 5490.64 samples/sec Loss 10.4729 LearningRate 0.2514 Epoch: 2 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:51,362-Speed 5396.68 samples/sec Loss 10.6080 LearningRate 0.2514 Epoch: 2 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:40:58,863-Speed 5462.24 samples/sec Loss 10.5608 LearningRate 0.2514 Epoch: 2 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:41:06,330-Speed 5486.36 samples/sec Loss 10.5548 LearningRate 0.2513 Epoch: 2 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:41:13,835-Speed 5458.05 samples/sec Loss 10.5945 LearningRate 0.2513 Epoch: 2 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:41:21,362-Speed 5441.84 samples/sec Loss 10.5720 LearningRate 0.2513 Epoch: 2 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:41:28,840-Speed 5478.48 samples/sec Loss 10.4637 LearningRate 0.2513 Epoch: 2 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:41:36,261-Speed 5520.52 samples/sec Loss 10.4734 LearningRate 0.2512 Epoch: 2 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:41:43,708-Speed 5500.98 samples/sec Loss 10.4882 LearningRate 0.2512 Epoch: 2 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:41:51,136-Speed 5514.68 samples/sec Loss 10.4582 LearningRate 0.2512 Epoch: 2 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 41 hours Training: 
2022-01-08 00:41:58,669-Speed 5438.44 samples/sec Loss 10.4255 LearningRate 0.2512 Epoch: 2 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:42:06,150-Speed 5475.66 samples/sec Loss 10.4959 LearningRate 0.2511 Epoch: 2 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:42:13,657-Speed 5457.55 samples/sec Loss 10.4280 LearningRate 0.2511 Epoch: 2 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:42:21,245-Speed 5398.25 samples/sec Loss 10.4869 LearningRate 0.2511 Epoch: 2 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:42:28,757-Speed 5453.44 samples/sec Loss 10.5265 LearningRate 0.2510 Epoch: 2 Global Step: 27160 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:42:36,327-Speed 5412.20 samples/sec Loss 10.5606 LearningRate 0.2510 Epoch: 2 Global Step: 27170 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:42:43,752-Speed 5517.19 samples/sec Loss 10.4586 LearningRate 0.2510 Epoch: 2 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:42:51,305-Speed 5423.47 samples/sec Loss 10.5306 LearningRate 0.2510 Epoch: 2 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:42:58,702-Speed 5538.14 samples/sec Loss 10.4885 LearningRate 0.2509 Epoch: 2 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:06,186-Speed 5474.26 samples/sec Loss 10.5213 LearningRate 0.2509 Epoch: 2 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:13,643-Speed 5493.57 samples/sec Loss 10.5542 LearningRate 0.2509 Epoch: 2 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:21,134-Speed 5468.46 samples/sec Loss 10.4913 LearningRate 0.2508 Epoch: 2 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:28,604-Speed 
5483.74 samples/sec Loss 10.5371 LearningRate 0.2508 Epoch: 2 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:36,076-Speed 5483.23 samples/sec Loss 10.4507 LearningRate 0.2508 Epoch: 2 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:43,569-Speed 5466.64 samples/sec Loss 10.4947 LearningRate 0.2508 Epoch: 2 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:51,078-Speed 5455.73 samples/sec Loss 10.4542 LearningRate 0.2507 Epoch: 2 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:43:58,499-Speed 5519.92 samples/sec Loss 10.4319 LearningRate 0.2507 Epoch: 2 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:44:05,970-Speed 5483.86 samples/sec Loss 10.4967 LearningRate 0.2507 Epoch: 2 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:44:13,459-Speed 5469.53 samples/sec Loss 10.4996 LearningRate 0.2507 Epoch: 2 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:44:20,921-Speed 5490.23 samples/sec Loss 10.4375 LearningRate 0.2506 Epoch: 2 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:44:28,393-Speed 5482.34 samples/sec Loss 10.5282 LearningRate 0.2506 Epoch: 2 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:44:35,860-Speed 5486.50 samples/sec Loss 10.4534 LearningRate 0.2506 Epoch: 2 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:44:43,309-Speed 5499.13 samples/sec Loss 10.5456 LearningRate 0.2505 Epoch: 2 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:44:50,733-Speed 5518.00 samples/sec Loss 10.5418 LearningRate 0.2505 Epoch: 2 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:44:58,269-Speed 5436.42 samples/sec Loss 10.4756 
LearningRate 0.2505 Epoch: 2 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:45:05,714-Speed 5502.15 samples/sec Loss 10.5471 LearningRate 0.2505 Epoch: 2 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:45:13,225-Speed 5454.31 samples/sec Loss 10.4331 LearningRate 0.2504 Epoch: 2 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:45:20,665-Speed 5505.78 samples/sec Loss 10.4107 LearningRate 0.2504 Epoch: 2 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:45:28,204-Speed 5434.19 samples/sec Loss 10.4377 LearningRate 0.2504 Epoch: 2 Global Step: 27400 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:45:35,687-Speed 5474.78 samples/sec Loss 10.4937 LearningRate 0.2503 Epoch: 2 Global Step: 27410 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:45:43,195-Speed 5455.94 samples/sec Loss 10.4291 LearningRate 0.2503 Epoch: 2 Global Step: 27420 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:45:50,671-Speed 5479.95 samples/sec Loss 10.4927 LearningRate 0.2503 Epoch: 2 Global Step: 27430 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:45:58,121-Speed 5498.41 samples/sec Loss 10.4566 LearningRate 0.2503 Epoch: 2 Global Step: 27440 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:46:05,551-Speed 5513.30 samples/sec Loss 10.4387 LearningRate 0.2502 Epoch: 2 Global Step: 27450 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:13,019-Speed 5485.88 samples/sec Loss 10.4782 LearningRate 0.2502 Epoch: 2 Global Step: 27460 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:20,458-Speed 5507.21 samples/sec Loss 10.4457 LearningRate 0.2502 Epoch: 2 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:27,889-Speed 5512.09 samples/sec Loss 10.4601 LearningRate 0.2502 Epoch: 2 
Global Step: 27480 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:35,364-Speed 5480.78 samples/sec Loss 10.4484 LearningRate 0.2501 Epoch: 2 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:42,922-Speed 5419.98 samples/sec Loss 10.5676 LearningRate 0.2501 Epoch: 2 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:50,405-Speed 5474.32 samples/sec Loss 10.5531 LearningRate 0.2501 Epoch: 2 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:46:57,858-Speed 5497.00 samples/sec Loss 10.4972 LearningRate 0.2500 Epoch: 2 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:05,315-Speed 5493.46 samples/sec Loss 10.4574 LearningRate 0.2500 Epoch: 2 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:12,766-Speed 5497.79 samples/sec Loss 10.4305 LearningRate 0.2500 Epoch: 2 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:20,201-Speed 5510.39 samples/sec Loss 10.4303 LearningRate 0.2500 Epoch: 2 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:27,805-Speed 5386.94 samples/sec Loss 10.4221 LearningRate 0.2499 Epoch: 2 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:35,359-Speed 5423.42 samples/sec Loss 10.4998 LearningRate 0.2499 Epoch: 2 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:42,872-Speed 5452.28 samples/sec Loss 10.4969 LearningRate 0.2499 Epoch: 2 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:50,289-Speed 5523.31 samples/sec Loss 10.4409 LearningRate 0.2498 Epoch: 2 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:47:57,775-Speed 5472.61 samples/sec Loss 10.3868 LearningRate 0.2498 Epoch: 2 Global Step: 27600 Fp16 Grad 
Scale: 131072 Required: 41 hours Training: 2022-01-08 00:48:05,358-Speed 5401.87 samples/sec Loss 10.4148 LearningRate 0.2498 Epoch: 2 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:48:12,964-Speed 5386.30 samples/sec Loss 10.3910 LearningRate 0.2498 Epoch: 2 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:48:20,432-Speed 5485.53 samples/sec Loss 10.4679 LearningRate 0.2497 Epoch: 2 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:48:27,865-Speed 5510.80 samples/sec Loss 10.4525 LearningRate 0.2497 Epoch: 2 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:48:35,307-Speed 5505.27 samples/sec Loss 10.5495 LearningRate 0.2497 Epoch: 2 Global Step: 27650 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:48:42,767-Speed 5491.11 samples/sec Loss 10.3910 LearningRate 0.2497 Epoch: 2 Global Step: 27660 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:48:50,335-Speed 5413.06 samples/sec Loss 10.4578 LearningRate 0.2496 Epoch: 2 Global Step: 27670 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:48:57,869-Speed 5437.81 samples/sec Loss 10.4693 LearningRate 0.2496 Epoch: 2 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:05,337-Speed 5485.31 samples/sec Loss 10.4345 LearningRate 0.2496 Epoch: 2 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:12,832-Speed 5465.72 samples/sec Loss 10.4535 LearningRate 0.2495 Epoch: 2 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:20,292-Speed 5491.79 samples/sec Loss 10.5049 LearningRate 0.2495 Epoch: 2 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:27,756-Speed 5487.98 samples/sec Loss 10.3838 LearningRate 0.2495 Epoch: 2 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 41 
hours Training: 2022-01-08 00:49:35,212-Speed 5494.51 samples/sec Loss 10.4627 LearningRate 0.2495 Epoch: 2 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:42,834-Speed 5375.11 samples/sec Loss 10.4530 LearningRate 0.2494 Epoch: 2 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:50,575-Speed 5292.65 samples/sec Loss 10.4341 LearningRate 0.2494 Epoch: 2 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:49:58,166-Speed 5396.80 samples/sec Loss 10.3959 LearningRate 0.2494 Epoch: 2 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:50:05,698-Speed 5438.81 samples/sec Loss 10.4524 LearningRate 0.2493 Epoch: 2 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:50:13,126-Speed 5514.84 samples/sec Loss 10.4337 LearningRate 0.2493 Epoch: 2 Global Step: 27780 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:50:20,714-Speed 5398.95 samples/sec Loss 10.4901 LearningRate 0.2493 Epoch: 2 Global Step: 27790 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:50:28,187-Speed 5481.73 samples/sec Loss 10.3677 LearningRate 0.2493 Epoch: 2 Global Step: 27800 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:50:35,596-Speed 5529.41 samples/sec Loss 10.4672 LearningRate 0.2492 Epoch: 2 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:50:43,120-Speed 5444.63 samples/sec Loss 10.4447 LearningRate 0.2492 Epoch: 2 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:50:50,610-Speed 5468.98 samples/sec Loss 10.4941 LearningRate 0.2492 Epoch: 2 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:50:58,094-Speed 5473.61 samples/sec Loss 10.4840 LearningRate 0.2492 Epoch: 2 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 
00:51:05,533-Speed 5507.23 samples/sec Loss 10.4458 LearningRate 0.2491 Epoch: 2 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:51:12,975-Speed 5504.07 samples/sec Loss 10.4249 LearningRate 0.2491 Epoch: 2 Global Step: 27860 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:51:20,415-Speed 5506.25 samples/sec Loss 10.3856 LearningRate 0.2491 Epoch: 2 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:51:27,919-Speed 5459.05 samples/sec Loss 10.4247 LearningRate 0.2490 Epoch: 2 Global Step: 27880 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:51:35,345-Speed 5516.92 samples/sec Loss 10.4667 LearningRate 0.2490 Epoch: 2 Global Step: 27890 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:51:42,789-Speed 5502.79 samples/sec Loss 10.4385 LearningRate 0.2490 Epoch: 2 Global Step: 27900 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:51:50,204-Speed 5525.15 samples/sec Loss 10.4063 LearningRate 0.2490 Epoch: 2 Global Step: 27910 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:51:57,599-Speed 5539.46 samples/sec Loss 10.4412 LearningRate 0.2489 Epoch: 2 Global Step: 27920 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:52:05,065-Speed 5487.28 samples/sec Loss 10.5103 LearningRate 0.2489 Epoch: 2 Global Step: 27930 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:52:12,574-Speed 5455.30 samples/sec Loss 10.4153 LearningRate 0.2489 Epoch: 2 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:52:20,144-Speed 5411.24 samples/sec Loss 10.4006 LearningRate 0.2488 Epoch: 2 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:52:27,579-Speed 5510.32 samples/sec Loss 10.3880 LearningRate 0.2488 Epoch: 2 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:52:35,159-Speed 5403.95 
samples/sec Loss 10.4063 LearningRate 0.2488 Epoch: 2 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:52:42,629-Speed 5484.45 samples/sec Loss 10.4581 LearningRate 0.2488 Epoch: 2 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:52:50,104-Speed 5480.70 samples/sec Loss 10.4315 LearningRate 0.2487 Epoch: 2 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:52:57,541-Speed 5508.12 samples/sec Loss 10.3888 LearningRate 0.2487 Epoch: 2 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:53:42,316-[lfw][28000]XNorm: 23.403026
Training: 2022-01-08 00:53:42,316-[lfw][28000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-01-08 00:53:42,317-[lfw][28000]Accuracy-Highest: 0.99767
Training: 2022-01-08 00:54:35,138-[cfp_fp][28000]XNorm: 21.135900
Training: 2022-01-08 00:54:35,139-[cfp_fp][28000]Accuracy-Flip: 0.98043+-0.00626
Training: 2022-01-08 00:54:35,140-[cfp_fp][28000]Accuracy-Highest: 0.98271
Training: 2022-01-08 00:55:21,509-[agedb_30][28000]XNorm: 23.027044
Training: 2022-01-08 00:55:21,510-[agedb_30][28000]Accuracy-Flip: 0.96683+-0.00858
Training: 2022-01-08 00:55:21,511-[agedb_30][28000]Accuracy-Highest: 0.97167
Training: 2022-01-08 00:55:29,080-Speed 270.30 samples/sec Loss 10.5031 LearningRate 0.2487 Epoch: 2 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:55:36,606-Speed 5443.70 samples/sec Loss 10.5074 LearningRate 0.2487 Epoch: 2 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:55:44,069-Speed 5489.82 samples/sec Loss 10.3859 LearningRate 0.2486 Epoch: 2 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-01-08 00:55:51,510-Speed 5506.41 samples/sec Loss 10.4960 LearningRate 0.2486 Epoch: 2 Global Step: 28040 Fp16 Grad Scale: 262144 Required: 41 hours
Training: 2022-01-08 00:55:58,941-Speed 5513.54 samples/sec Loss 10.3912
LearningRate 0.2486 Epoch: 2 Global Step: 28050 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:56:06,544-Speed 5388.82 samples/sec Loss 10.4438 LearningRate 0.2485 Epoch: 2 Global Step: 28060 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:56:14,034-Speed 5468.61 samples/sec Loss 10.4406 LearningRate 0.2485 Epoch: 2 Global Step: 28070 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:56:21,516-Speed 5475.75 samples/sec Loss 10.4261 LearningRate 0.2485 Epoch: 2 Global Step: 28080 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:56:28,972-Speed 5494.68 samples/sec Loss 10.3799 LearningRate 0.2485 Epoch: 2 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:56:36,470-Speed 5463.32 samples/sec Loss 10.3119 LearningRate 0.2484 Epoch: 2 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:56:44,028-Speed 5420.01 samples/sec Loss 10.4507 LearningRate 0.2484 Epoch: 2 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:56:51,432-Speed 5533.04 samples/sec Loss 10.5030 LearningRate 0.2484 Epoch: 2 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:56:58,839-Speed 5530.61 samples/sec Loss 10.5120 LearningRate 0.2483 Epoch: 2 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:06,303-Speed 5487.89 samples/sec Loss 10.4313 LearningRate 0.2483 Epoch: 2 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:13,750-Speed 5501.17 samples/sec Loss 10.4042 LearningRate 0.2483 Epoch: 2 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:21,281-Speed 5439.57 samples/sec Loss 10.4462 LearningRate 0.2483 Epoch: 2 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:28,704-Speed 5518.68 samples/sec Loss 10.3460 LearningRate 0.2482 Epoch: 2 
Global Step: 28170 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:36,410-Speed 5315.66 samples/sec Loss 10.3556 LearningRate 0.2482 Epoch: 2 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:43,908-Speed 5463.79 samples/sec Loss 10.4144 LearningRate 0.2482 Epoch: 2 Global Step: 28190 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 00:57:51,350-Speed 5504.69 samples/sec Loss 10.4097 LearningRate 0.2482 Epoch: 2 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:57:58,807-Speed 5494.17 samples/sec Loss 10.4463 LearningRate 0.2481 Epoch: 2 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:06,478-Speed 5339.47 samples/sec Loss 10.4182 LearningRate 0.2481 Epoch: 2 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:13,932-Speed 5496.10 samples/sec Loss 10.4328 LearningRate 0.2481 Epoch: 2 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:21,513-Speed 5404.06 samples/sec Loss 10.4360 LearningRate 0.2480 Epoch: 2 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:28,954-Speed 5504.68 samples/sec Loss 10.3564 LearningRate 0.2480 Epoch: 2 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:36,367-Speed 5526.00 samples/sec Loss 10.4399 LearningRate 0.2480 Epoch: 2 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:43,821-Speed 5496.24 samples/sec Loss 10.3692 LearningRate 0.2480 Epoch: 2 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:51,319-Speed 5463.45 samples/sec Loss 10.3885 LearningRate 0.2479 Epoch: 2 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 00:58:58,729-Speed 5528.51 samples/sec Loss 10.3948 LearningRate 0.2479 Epoch: 2 Global Step: 28290 Fp16 Grad 
Scale: 131072 Required: 41 hours Training: 2022-01-08 00:59:06,306-Speed 5406.11 samples/sec Loss 10.3503 LearningRate 0.2479 Epoch: 2 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:13,804-Speed 5463.49 samples/sec Loss 10.4399 LearningRate 0.2478 Epoch: 2 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:21,286-Speed 5486.02 samples/sec Loss 10.4010 LearningRate 0.2478 Epoch: 2 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:28,845-Speed 5418.85 samples/sec Loss 10.2974 LearningRate 0.2478 Epoch: 2 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:36,279-Speed 5510.82 samples/sec Loss 10.3884 LearningRate 0.2478 Epoch: 2 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:43,728-Speed 5499.57 samples/sec Loss 10.3930 LearningRate 0.2477 Epoch: 2 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:51,132-Speed 5533.15 samples/sec Loss 10.4189 LearningRate 0.2477 Epoch: 2 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 00:59:58,592-Speed 5491.24 samples/sec Loss 10.3602 LearningRate 0.2477 Epoch: 2 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:00:06,093-Speed 5461.65 samples/sec Loss 10.3796 LearningRate 0.2477 Epoch: 2 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:00:13,582-Speed 5469.70 samples/sec Loss 10.4058 LearningRate 0.2476 Epoch: 2 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:00:21,047-Speed 5488.33 samples/sec Loss 10.4145 LearningRate 0.2476 Epoch: 2 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:00:28,584-Speed 5435.22 samples/sec Loss 10.3931 LearningRate 0.2476 Epoch: 2 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 41 hours 
Training: 2022-01-08 01:00:36,028-Speed 5502.41 samples/sec Loss 10.3513 LearningRate 0.2475 Epoch: 2 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:00:43,518-Speed 5469.76 samples/sec Loss 10.3876 LearningRate 0.2475 Epoch: 2 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:00:50,924-Speed 5531.77 samples/sec Loss 10.4254 LearningRate 0.2475 Epoch: 2 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:00:58,376-Speed 5496.35 samples/sec Loss 10.3749 LearningRate 0.2475 Epoch: 2 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:05,861-Speed 5473.32 samples/sec Loss 10.3242 LearningRate 0.2474 Epoch: 2 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:13,320-Speed 5492.67 samples/sec Loss 10.3717 LearningRate 0.2474 Epoch: 2 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:20,833-Speed 5452.30 samples/sec Loss 10.3794 LearningRate 0.2474 Epoch: 2 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:28,336-Speed 5460.08 samples/sec Loss 10.4189 LearningRate 0.2474 Epoch: 2 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:35,783-Speed 5500.97 samples/sec Loss 10.2567 LearningRate 0.2473 Epoch: 2 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:43,208-Speed 5517.48 samples/sec Loss 10.3149 LearningRate 0.2473 Epoch: 2 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:01:50,632-Speed 5517.79 samples/sec Loss 10.3176 LearningRate 0.2473 Epoch: 2 Global Step: 28520 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:01:58,138-Speed 5457.61 samples/sec Loss 10.3304 LearningRate 0.2472 Epoch: 2 Global Step: 28530 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:02:05,644-Speed 
5457.98 samples/sec Loss 10.3127 LearningRate 0.2472 Epoch: 2 Global Step: 28540 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:02:13,156-Speed 5453.41 samples/sec Loss 10.3147 LearningRate 0.2472 Epoch: 2 Global Step: 28550 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:02:20,644-Speed 5470.36 samples/sec Loss 10.5062 LearningRate 0.2472 Epoch: 2 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:02:28,088-Speed 5503.59 samples/sec Loss 10.3970 LearningRate 0.2471 Epoch: 2 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:02:35,593-Speed 5458.32 samples/sec Loss 10.2842 LearningRate 0.2471 Epoch: 2 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:02:43,015-Speed 5519.69 samples/sec Loss 10.3775 LearningRate 0.2471 Epoch: 2 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:02:50,476-Speed 5490.14 samples/sec Loss 10.3674 LearningRate 0.2470 Epoch: 2 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:02:57,958-Speed 5475.94 samples/sec Loss 10.3852 LearningRate 0.2470 Epoch: 2 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:03:05,459-Speed 5460.93 samples/sec Loss 10.3477 LearningRate 0.2470 Epoch: 2 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:03:13,058-Speed 5391.16 samples/sec Loss 10.3486 LearningRate 0.2470 Epoch: 2 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:03:20,624-Speed 5414.15 samples/sec Loss 10.4293 LearningRate 0.2469 Epoch: 2 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:03:28,079-Speed 5495.26 samples/sec Loss 10.4544 LearningRate 0.2469 Epoch: 2 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:03:35,556-Speed 5479.13 samples/sec Loss 10.3045 
LearningRate 0.2469 Epoch: 2 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:03:43,025-Speed 5485.14 samples/sec Loss 10.3365 LearningRate 0.2469 Epoch: 2 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:03:50,411-Speed 5545.43 samples/sec Loss 10.2640 LearningRate 0.2468 Epoch: 2 Global Step: 28680 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:03:57,875-Speed 5488.78 samples/sec Loss 10.2456 LearningRate 0.2468 Epoch: 2 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:05,574-Speed 5321.22 samples/sec Loss 10.3326 LearningRate 0.2468 Epoch: 2 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:12,992-Speed 5522.50 samples/sec Loss 10.3368 LearningRate 0.2467 Epoch: 2 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:20,546-Speed 5422.74 samples/sec Loss 10.4483 LearningRate 0.2467 Epoch: 2 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:27,981-Speed 5509.50 samples/sec Loss 10.3580 LearningRate 0.2467 Epoch: 2 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:35,377-Speed 5538.97 samples/sec Loss 10.4111 LearningRate 0.2467 Epoch: 2 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:42,780-Speed 5533.80 samples/sec Loss 10.4250 LearningRate 0.2466 Epoch: 2 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:50,218-Speed 5507.72 samples/sec Loss 10.3333 LearningRate 0.2466 Epoch: 2 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:04:57,675-Speed 5493.69 samples/sec Loss 10.3811 LearningRate 0.2466 Epoch: 2 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:05,154-Speed 5477.05 samples/sec Loss 10.3315 LearningRate 0.2465 Epoch: 2 
Global Step: 28780 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:12,643-Speed 5470.47 samples/sec Loss 10.3571 LearningRate 0.2465 Epoch: 2 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:20,125-Speed 5475.27 samples/sec Loss 10.3486 LearningRate 0.2465 Epoch: 2 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:27,613-Speed 5470.80 samples/sec Loss 10.3085 LearningRate 0.2465 Epoch: 2 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:35,153-Speed 5432.97 samples/sec Loss 10.3022 LearningRate 0.2464 Epoch: 2 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:42,740-Speed 5400.02 samples/sec Loss 10.3579 LearningRate 0.2464 Epoch: 2 Global Step: 28830 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:50,303-Speed 5416.45 samples/sec Loss 10.2850 LearningRate 0.2464 Epoch: 2 Global Step: 28840 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:05:57,684-Speed 5549.80 samples/sec Loss 10.4319 LearningRate 0.2464 Epoch: 2 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:06:05,233-Speed 5426.96 samples/sec Loss 10.4070 LearningRate 0.2463 Epoch: 2 Global Step: 28860 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:06:12,653-Speed 5520.46 samples/sec Loss 10.3923 LearningRate 0.2463 Epoch: 2 Global Step: 28870 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:06:20,132-Speed 5477.38 samples/sec Loss 10.3170 LearningRate 0.2463 Epoch: 2 Global Step: 28880 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:06:27,576-Speed 5503.54 samples/sec Loss 10.3453 LearningRate 0.2462 Epoch: 2 Global Step: 28890 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:06:34,993-Speed 5522.92 samples/sec Loss 10.3159 LearningRate 0.2462 Epoch: 2 Global Step: 28900 Fp16 Grad 
Scale: 131072 Required: 41 hours Training: 2022-01-08 01:06:42,434-Speed 5505.73 samples/sec Loss 10.3516 LearningRate 0.2462 Epoch: 2 Global Step: 28910 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:06:49,880-Speed 5501.52 samples/sec Loss 10.3344 LearningRate 0.2462 Epoch: 2 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:06:57,335-Speed 5495.06 samples/sec Loss 10.3175 LearningRate 0.2461 Epoch: 2 Global Step: 28930 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:04,873-Speed 5434.71 samples/sec Loss 10.3049 LearningRate 0.2461 Epoch: 2 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:12,349-Speed 5480.21 samples/sec Loss 10.3365 LearningRate 0.2461 Epoch: 2 Global Step: 28950 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:19,845-Speed 5464.63 samples/sec Loss 10.3849 LearningRate 0.2461 Epoch: 2 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:27,365-Speed 5447.75 samples/sec Loss 10.3234 LearningRate 0.2460 Epoch: 2 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:34,846-Speed 5475.86 samples/sec Loss 10.3206 LearningRate 0.2460 Epoch: 2 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:42,265-Speed 5522.13 samples/sec Loss 10.3125 LearningRate 0.2460 Epoch: 2 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:07:49,707-Speed 5504.43 samples/sec Loss 10.3222 LearningRate 0.2459 Epoch: 2 Global Step: 29000 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:07:57,134-Speed 5515.29 samples/sec Loss 10.3220 LearningRate 0.2459 Epoch: 2 Global Step: 29010 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:08:04,630-Speed 5465.31 samples/sec Loss 10.2858 LearningRate 0.2459 Epoch: 2 Global Step: 29020 Fp16 Grad Scale: 262144 Required: 41 
hours Training: 2022-01-08 01:08:12,014-Speed 5548.02 samples/sec Loss 10.3506 LearningRate 0.2459 Epoch: 2 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:08:19,485-Speed 5483.02 samples/sec Loss 10.3723 LearningRate 0.2458 Epoch: 2 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:08:26,919-Speed 5510.60 samples/sec Loss 10.3503 LearningRate 0.2458 Epoch: 2 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:08:34,454-Speed 5437.18 samples/sec Loss 10.3091 LearningRate 0.2458 Epoch: 2 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:08:41,857-Speed 5533.19 samples/sec Loss 10.2872 LearningRate 0.2457 Epoch: 2 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:08:49,286-Speed 5514.45 samples/sec Loss 10.3007 LearningRate 0.2457 Epoch: 2 Global Step: 29080 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:08:56,673-Speed 5545.77 samples/sec Loss 10.3141 LearningRate 0.2457 Epoch: 2 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:09:04,245-Speed 5410.11 samples/sec Loss 10.3507 LearningRate 0.2457 Epoch: 2 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:09:11,753-Speed 5456.41 samples/sec Loss 10.2348 LearningRate 0.2456 Epoch: 2 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:09:19,231-Speed 5478.18 samples/sec Loss 10.2997 LearningRate 0.2456 Epoch: 2 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:09:26,707-Speed 5479.80 samples/sec Loss 10.3492 LearningRate 0.2456 Epoch: 2 Global Step: 29130 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:09:34,174-Speed 5487.56 samples/sec Loss 10.3503 LearningRate 0.2456 Epoch: 2 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 
01:09:41,641-Speed 5485.51 samples/sec Loss 10.2459 LearningRate 0.2455 Epoch: 2 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:09:49,128-Speed 5471.97 samples/sec Loss 10.3205 LearningRate 0.2455 Epoch: 2 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:09:56,558-Speed 5513.83 samples/sec Loss 10.3077 LearningRate 0.2455 Epoch: 2 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:04,017-Speed 5492.19 samples/sec Loss 10.3087 LearningRate 0.2454 Epoch: 2 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:11,562-Speed 5429.55 samples/sec Loss 10.3342 LearningRate 0.2454 Epoch: 2 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:19,118-Speed 5421.61 samples/sec Loss 10.2842 LearningRate 0.2454 Epoch: 2 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:26,636-Speed 5449.43 samples/sec Loss 10.2910 LearningRate 0.2454 Epoch: 2 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:34,188-Speed 5423.76 samples/sec Loss 10.3422 LearningRate 0.2453 Epoch: 2 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:41,678-Speed 5469.83 samples/sec Loss 10.3917 LearningRate 0.2453 Epoch: 2 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:10:49,161-Speed 5474.54 samples/sec Loss 10.3027 LearningRate 0.2453 Epoch: 2 Global Step: 29240 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:10:56,627-Speed 5487.32 samples/sec Loss 10.2499 LearningRate 0.2453 Epoch: 2 Global Step: 29250 Fp16 Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:11:04,103-Speed 5479.03 samples/sec Loss 10.3177 LearningRate 0.2452 Epoch: 2 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:11:11,533-Speed 5513.81 
samples/sec Loss 10.2570 LearningRate 0.2452 Epoch: 2 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:11:19,012-Speed 5477.54 samples/sec Loss 10.2773 LearningRate 0.2452 Epoch: 2 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:11:26,435-Speed 5518.98 samples/sec Loss 10.3114 LearningRate 0.2451 Epoch: 2 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:11:33,896-Speed 5490.77 samples/sec Loss 10.2382 LearningRate 0.2451 Epoch: 2 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:11:41,296-Speed 5535.92 samples/sec Loss 10.2538 LearningRate 0.2451 Epoch: 2 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:11:48,640-Speed 5577.49 samples/sec Loss 10.3292 LearningRate 0.2451 Epoch: 2 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:11:56,081-Speed 5506.11 samples/sec Loss 10.2393 LearningRate 0.2450 Epoch: 2 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:12:03,544-Speed 5488.54 samples/sec Loss 10.3252 LearningRate 0.2450 Epoch: 2 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:12:11,001-Speed 5494.06 samples/sec Loss 10.3402 LearningRate 0.2450 Epoch: 2 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:12:18,518-Speed 5449.82 samples/sec Loss 10.4045 LearningRate 0.2450 Epoch: 2 Global Step: 29360 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:12:26,091-Speed 5409.03 samples/sec Loss 10.2446 LearningRate 0.2449 Epoch: 2 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:12:33,694-Speed 5388.19 samples/sec Loss 10.3315 LearningRate 0.2449 Epoch: 2 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:12:41,279-Speed 5400.61 samples/sec Loss 10.3108 LearningRate 
0.2449 Epoch: 2 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:12:48,810-Speed 5439.38 samples/sec Loss 10.3771 LearningRate 0.2448 Epoch: 2 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:12:56,222-Speed 5527.14 samples/sec Loss 10.2629 LearningRate 0.2448 Epoch: 2 Global Step: 29410 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:13:03,766-Speed 5430.78 samples/sec Loss 10.3255 LearningRate 0.2448 Epoch: 2 Global Step: 29420 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:13:11,284-Speed 5448.49 samples/sec Loss 10.2531 LearningRate 0.2448 Epoch: 2 Global Step: 29430 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:13:18,818-Speed 5437.49 samples/sec Loss 10.2690 LearningRate 0.2447 Epoch: 2 Global Step: 29440 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:13:26,301-Speed 5474.31 samples/sec Loss 10.2624 LearningRate 0.2447 Epoch: 2 Global Step: 29450 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:13:33,755-Speed 5495.88 samples/sec Loss 10.2667 LearningRate 0.2447 Epoch: 2 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-01-08 01:13:41,251-Speed 5464.83 samples/sec Loss 10.2334 LearningRate 0.2446 Epoch: 2 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:13:48,747-Speed 5464.76 samples/sec Loss 10.3193 LearningRate 0.2446 Epoch: 2 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:13:56,262-Speed 5451.73 samples/sec Loss 10.2841 LearningRate 0.2446 Epoch: 2 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:14:03,865-Speed 5387.84 samples/sec Loss 10.2516 LearningRate 0.2446 Epoch: 2 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:14:11,469-Speed 5387.16 samples/sec Loss 10.2915 LearningRate 0.2445 Epoch: 2 Global Step: 29510 Fp16 
Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:14:19,115-Speed 5357.87 samples/sec Loss 10.2808 LearningRate 0.2445 Epoch: 2 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:14:26,722-Speed 5384.94 samples/sec Loss 10.3182 LearningRate 0.2445 Epoch: 2 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:14:34,309-Speed 5399.99 samples/sec Loss 10.2275 LearningRate 0.2445 Epoch: 2 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:14:41,789-Speed 5476.38 samples/sec Loss 10.2742 LearningRate 0.2444 Epoch: 2 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:14:49,458-Speed 5341.24 samples/sec Loss 10.2990 LearningRate 0.2444 Epoch: 2 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:14:56,902-Speed 5503.77 samples/sec Loss 10.3243 LearningRate 0.2444 Epoch: 2 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:15:04,386-Speed 5473.74 samples/sec Loss 10.3139 LearningRate 0.2443 Epoch: 2 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:15:11,979-Speed 5394.45 samples/sec Loss 10.2460 LearningRate 0.2443 Epoch: 2 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:15:19,561-Speed 5403.08 samples/sec Loss 10.3109 LearningRate 0.2443 Epoch: 2 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:15:27,049-Speed 5471.27 samples/sec Loss 10.2915 LearningRate 0.2443 Epoch: 2 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:15:34,592-Speed 5431.04 samples/sec Loss 10.3366 LearningRate 0.2442 Epoch: 2 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:15:42,142-Speed 5425.25 samples/sec Loss 10.3138 LearningRate 0.2442 Epoch: 2 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 40 
hours Training: 2022-01-08 01:15:49,583-Speed 5504.89 samples/sec Loss 10.2604 LearningRate 0.2442 Epoch: 2 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:15:57,069-Speed 5472.96 samples/sec Loss 10.2588 LearningRate 0.2442 Epoch: 2 Global Step: 29650 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:16:04,646-Speed 5406.91 samples/sec Loss 10.3317 LearningRate 0.2441 Epoch: 2 Global Step: 29660 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:16:12,061-Speed 5524.24 samples/sec Loss 10.2724 LearningRate 0.2441 Epoch: 2 Global Step: 29670 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:16:19,586-Speed 5443.63 samples/sec Loss 10.1892 LearningRate 0.2441 Epoch: 2 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:16:27,088-Speed 5460.51 samples/sec Loss 10.2100 LearningRate 0.2440 Epoch: 2 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:16:34,550-Speed 5490.48 samples/sec Loss 10.2372 LearningRate 0.2440 Epoch: 2 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:16:42,137-Speed 5398.73 samples/sec Loss 10.2258 LearningRate 0.2440 Epoch: 2 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:16:49,755-Speed 5377.27 samples/sec Loss 10.2832 LearningRate 0.2440 Epoch: 2 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:16:57,335-Speed 5405.03 samples/sec Loss 10.3056 LearningRate 0.2439 Epoch: 2 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:17:04,825-Speed 5469.54 samples/sec Loss 10.2010 LearningRate 0.2439 Epoch: 2 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:17:12,276-Speed 5497.38 samples/sec Loss 10.3197 LearningRate 0.2439 Epoch: 2 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 
01:17:19,796-Speed 5447.52 samples/sec Loss 10.2634 LearningRate 0.2439 Epoch: 2 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:17:27,394-Speed 5391.60 samples/sec Loss 10.2787 LearningRate 0.2438 Epoch: 2 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:17:34,830-Speed 5509.92 samples/sec Loss 10.3610 LearningRate 0.2438 Epoch: 2 Global Step: 29780 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:17:42,355-Speed 5443.24 samples/sec Loss 10.3312 LearningRate 0.2438 Epoch: 2 Global Step: 29790 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:17:49,851-Speed 5465.02 samples/sec Loss 10.3473 LearningRate 0.2437 Epoch: 2 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:17:57,466-Speed 5379.94 samples/sec Loss 10.2791 LearningRate 0.2437 Epoch: 2 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:05,000-Speed 5437.44 samples/sec Loss 10.2852 LearningRate 0.2437 Epoch: 2 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:12,443-Speed 5503.50 samples/sec Loss 10.2395 LearningRate 0.2437 Epoch: 2 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:19,929-Speed 5471.86 samples/sec Loss 10.2930 LearningRate 0.2436 Epoch: 2 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:27,386-Speed 5494.66 samples/sec Loss 10.2769 LearningRate 0.2436 Epoch: 2 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:35,022-Speed 5364.26 samples/sec Loss 10.3143 LearningRate 0.2436 Epoch: 2 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:42,489-Speed 5486.45 samples/sec Loss 10.2773 LearningRate 0.2435 Epoch: 2 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:49,887-Speed 5536.88 samples/sec 
Loss 10.2634 LearningRate 0.2435 Epoch: 2 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:18:57,385-Speed 5464.24 samples/sec Loss 10.3211 LearningRate 0.2435 Epoch: 2 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:04,851-Speed 5486.81 samples/sec Loss 10.2523 LearningRate 0.2435 Epoch: 2 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:12,313-Speed 5489.80 samples/sec Loss 10.2686 LearningRate 0.2434 Epoch: 2 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:19,781-Speed 5485.42 samples/sec Loss 10.2136 LearningRate 0.2434 Epoch: 2 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:27,251-Speed 5484.78 samples/sec Loss 10.2265 LearningRate 0.2434 Epoch: 2 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:34,728-Speed 5479.08 samples/sec Loss 10.2984 LearningRate 0.2434 Epoch: 2 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:42,315-Speed 5399.39 samples/sec Loss 10.2010 LearningRate 0.2433 Epoch: 2 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:49,777-Speed 5489.03 samples/sec Loss 10.2699 LearningRate 0.2433 Epoch: 2 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:19:57,337-Speed 5419.71 samples/sec Loss 10.2691 LearningRate 0.2433 Epoch: 2 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:20:04,833-Speed 5464.45 samples/sec Loss 10.2596 LearningRate 0.2432 Epoch: 2 Global Step: 29980 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:20:12,289-Speed 5494.88 samples/sec Loss 10.2622 LearningRate 0.2432 Epoch: 2 Global Step: 29990 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:20:19,783-Speed 5466.01 samples/sec Loss 10.1201 LearningRate 0.2432 Epoch: 2 
Global Step: 30000 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:21:03,965-[lfw][30000]XNorm: 23.945636 Training: 2022-01-08 01:21:03,966-[lfw][30000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-01-08 01:21:03,966-[lfw][30000]Accuracy-Highest: 0.99767 Training: 2022-01-08 01:21:56,813-[cfp_fp][30000]XNorm: 21.433830 Training: 2022-01-08 01:21:56,814-[cfp_fp][30000]Accuracy-Flip: 0.98114+-0.00486 Training: 2022-01-08 01:21:56,815-[cfp_fp][30000]Accuracy-Highest: 0.98271 Training: 2022-01-08 01:22:42,750-[agedb_30][30000]XNorm: 23.247524 Training: 2022-01-08 01:22:42,751-[agedb_30][30000]Accuracy-Flip: 0.96900+-0.00821 Training: 2022-01-08 01:22:42,752-[agedb_30][30000]Accuracy-Highest: 0.97167 Training: 2022-01-08 01:22:50,397-Speed 271.96 samples/sec Loss 10.2570 LearningRate 0.2432 Epoch: 2 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:22:57,869-Speed 5484.01 samples/sec Loss 10.2458 LearningRate 0.2431 Epoch: 2 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:05,272-Speed 5533.96 samples/sec Loss 10.2764 LearningRate 0.2431 Epoch: 2 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:12,766-Speed 5467.11 samples/sec Loss 10.2532 LearningRate 0.2431 Epoch: 2 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:20,246-Speed 5477.20 samples/sec Loss 10.1908 LearningRate 0.2431 Epoch: 2 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:27,740-Speed 5466.85 samples/sec Loss 10.2252 LearningRate 0.2430 Epoch: 2 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:35,232-Speed 5467.56 samples/sec Loss 10.2872 LearningRate 0.2430 Epoch: 2 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:42,691-Speed 5492.05 samples/sec Loss 10.2503 LearningRate 0.2430 Epoch: 2 Global Step: 30080 Fp16 
Grad Scale: 262144 Required: 41 hours Training: 2022-01-08 01:23:50,102-Speed 5527.71 samples/sec Loss 10.2443 LearningRate 0.2429 Epoch: 2 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:23:57,536-Speed 5510.89 samples/sec Loss 10.2122 LearningRate 0.2429 Epoch: 2 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:24:05,056-Speed 5447.88 samples/sec Loss 10.2406 LearningRate 0.2429 Epoch: 2 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:24:12,509-Speed 5495.85 samples/sec Loss 10.2939 LearningRate 0.2429 Epoch: 2 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:24:20,022-Speed 5452.87 samples/sec Loss 10.2938 LearningRate 0.2428 Epoch: 2 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:24:27,755-Speed 5297.53 samples/sec Loss 10.1883 LearningRate 0.2428 Epoch: 2 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:24:35,313-Speed 5420.77 samples/sec Loss 10.2200 LearningRate 0.2428 Epoch: 2 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 41 hours Training: 2022-01-08 01:24:42,824-Speed 5453.49 samples/sec Loss 10.2913 LearningRate 0.2428 Epoch: 2 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:24:50,377-Speed 5424.67 samples/sec Loss 10.1439 LearningRate 0.2427 Epoch: 2 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:24:57,853-Speed 5479.03 samples/sec Loss 10.2378 LearningRate 0.2427 Epoch: 2 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:25:05,455-Speed 5388.86 samples/sec Loss 10.2509 LearningRate 0.2427 Epoch: 2 Global Step: 30190 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:25:12,950-Speed 5465.33 samples/sec Loss 10.2110 LearningRate 0.2426 Epoch: 2 Global Step: 30200 Fp16 Grad Scale: 262144 Required: 40 
hours Training: 2022-01-08 01:25:20,402-Speed 5497.58 samples/sec Loss 10.2067 LearningRate 0.2426 Epoch: 2 Global Step: 30210 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:25:27,999-Speed 5392.83 samples/sec Loss 10.2545 LearningRate 0.2426 Epoch: 2 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:25:35,533-Speed 5437.43 samples/sec Loss 10.2147 LearningRate 0.2426 Epoch: 2 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:25:43,000-Speed 5485.86 samples/sec Loss 10.1414 LearningRate 0.2425 Epoch: 2 Global Step: 30240 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:25:50,581-Speed 5404.32 samples/sec Loss 10.2103 LearningRate 0.2425 Epoch: 2 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:25:58,066-Speed 5472.85 samples/sec Loss 10.1257 LearningRate 0.2425 Epoch: 2 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:05,612-Speed 5429.01 samples/sec Loss 10.2372 LearningRate 0.2425 Epoch: 2 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:13,105-Speed 5466.76 samples/sec Loss 10.2472 LearningRate 0.2424 Epoch: 2 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:20,492-Speed 5545.51 samples/sec Loss 10.2132 LearningRate 0.2424 Epoch: 2 Global Step: 30290 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:27,966-Speed 5480.94 samples/sec Loss 10.2122 LearningRate 0.2424 Epoch: 2 Global Step: 30300 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:35,605-Speed 5362.80 samples/sec Loss 10.2806 LearningRate 0.2423 Epoch: 2 Global Step: 30310 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:43,106-Speed 5461.49 samples/sec Loss 10.2228 LearningRate 0.2423 Epoch: 2 Global Step: 30320 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 
01:26:50,630-Speed 5444.41 samples/sec Loss 10.2625 LearningRate 0.2423 Epoch: 2 Global Step: 30330 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:26:58,132-Speed 5460.81 samples/sec Loss 10.2799 LearningRate 0.2423 Epoch: 2 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:05,622-Speed 5469.03 samples/sec Loss 10.2530 LearningRate 0.2422 Epoch: 2 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:13,145-Speed 5445.50 samples/sec Loss 10.1739 LearningRate 0.2422 Epoch: 2 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:20,642-Speed 5464.29 samples/sec Loss 10.2621 LearningRate 0.2422 Epoch: 2 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:28,158-Speed 5450.47 samples/sec Loss 10.2198 LearningRate 0.2422 Epoch: 2 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:35,640-Speed 5475.29 samples/sec Loss 10.1946 LearningRate 0.2421 Epoch: 2 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:43,164-Speed 5444.86 samples/sec Loss 10.2206 LearningRate 0.2421 Epoch: 2 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:27:50,714-Speed 5425.86 samples/sec Loss 10.1876 LearningRate 0.2421 Epoch: 2 Global Step: 30410 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:27:58,251-Speed 5435.38 samples/sec Loss 10.2875 LearningRate 0.2420 Epoch: 2 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:28:05,707-Speed 5494.45 samples/sec Loss 10.2226 LearningRate 0.2420 Epoch: 2 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:28:13,155-Speed 5500.36 samples/sec Loss 10.2727 LearningRate 0.2420 Epoch: 2 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:28:20,594-Speed 5507.19 
samples/sec Loss 10.3310 LearningRate 0.2420 Epoch: 2 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:28:28,142-Speed 5427.08 samples/sec Loss 10.1740 LearningRate 0.2419 Epoch: 2 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:28:35,613-Speed 5483.46 samples/sec Loss 10.2985 LearningRate 0.2419 Epoch: 2 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:28:43,090-Speed 5478.85 samples/sec Loss 10.2149 LearningRate 0.2419 Epoch: 2 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:28:50,604-Speed 5452.19 samples/sec Loss 10.2228 LearningRate 0.2419 Epoch: 2 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:28:58,097-Speed 5467.24 samples/sec Loss 10.2362 LearningRate 0.2418 Epoch: 2 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:29:05,525-Speed 5514.45 samples/sec Loss 10.2009 LearningRate 0.2418 Epoch: 2 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:29:12,989-Speed 5488.95 samples/sec Loss 10.2079 LearningRate 0.2418 Epoch: 2 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:29:20,534-Speed 5429.16 samples/sec Loss 10.1341 LearningRate 0.2417 Epoch: 2 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:29:28,017-Speed 5474.54 samples/sec Loss 10.2132 LearningRate 0.2417 Epoch: 2 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:29:35,499-Speed 5475.25 samples/sec Loss 10.1608 LearningRate 0.2417 Epoch: 2 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:29:43,021-Speed 5446.80 samples/sec Loss 10.1846 LearningRate 0.2417 Epoch: 2 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:29:50,540-Speed 5448.00 samples/sec Loss 10.1903 LearningRate 
0.2416 Epoch: 2 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:29:58,071-Speed 5439.60 samples/sec Loss 10.1898 LearningRate 0.2416 Epoch: 2 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:05,847-Speed 5267.87 samples/sec Loss 10.2268 LearningRate 0.2416 Epoch: 2 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:13,399-Speed 5425.27 samples/sec Loss 10.2415 LearningRate 0.2415 Epoch: 2 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:20,882-Speed 5474.21 samples/sec Loss 10.2345 LearningRate 0.2415 Epoch: 2 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:28,417-Speed 5436.31 samples/sec Loss 10.2394 LearningRate 0.2415 Epoch: 2 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:35,984-Speed 5414.08 samples/sec Loss 10.1574 LearningRate 0.2415 Epoch: 2 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:43,542-Speed 5420.52 samples/sec Loss 10.1695 LearningRate 0.2414 Epoch: 2 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:30:51,059-Speed 5449.42 samples/sec Loss 10.1797 LearningRate 0.2414 Epoch: 2 Global Step: 30650 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:30:58,610-Speed 5425.67 samples/sec Loss 10.2594 LearningRate 0.2414 Epoch: 2 Global Step: 30660 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:31:06,167-Speed 5420.76 samples/sec Loss 10.2759 LearningRate 0.2414 Epoch: 2 Global Step: 30670 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:31:13,833-Speed 5344.16 samples/sec Loss 10.2991 LearningRate 0.2413 Epoch: 2 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:31:21,398-Speed 5414.35 samples/sec Loss 10.1658 LearningRate 0.2413 Epoch: 2 Global Step: 
30690 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:31:28,914-Speed 5450.65 samples/sec Loss 10.1837 LearningRate 0.2413 Epoch: 2 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:31:36,393-Speed 5477.20 samples/sec Loss 10.2508 LearningRate 0.2412 Epoch: 2 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:31:43,915-Speed 5446.74 samples/sec Loss 10.1608 LearningRate 0.2412 Epoch: 2 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:31:51,405-Speed 5468.65 samples/sec Loss 10.1822 LearningRate 0.2412 Epoch: 2 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:31:58,973-Speed 5412.92 samples/sec Loss 10.2029 LearningRate 0.2412 Epoch: 2 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:06,492-Speed 5448.49 samples/sec Loss 10.1755 LearningRate 0.2411 Epoch: 2 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:14,021-Speed 5441.69 samples/sec Loss 10.1655 LearningRate 0.2411 Epoch: 2 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:21,460-Speed 5506.90 samples/sec Loss 10.2053 LearningRate 0.2411 Epoch: 2 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:28,926-Speed 5486.54 samples/sec Loss 10.2118 LearningRate 0.2411 Epoch: 2 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:36,436-Speed 5455.42 samples/sec Loss 10.1548 LearningRate 0.2410 Epoch: 2 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:44,107-Speed 5340.02 samples/sec Loss 10.1806 LearningRate 0.2410 Epoch: 2 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:32:51,655-Speed 5427.49 samples/sec Loss 10.1894 LearningRate 0.2410 Epoch: 2 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 
40 hours Training: 2022-01-08 01:32:59,165-Speed 5454.75 samples/sec Loss 10.2402 LearningRate 0.2409 Epoch: 2 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:33:06,691-Speed 5443.04 samples/sec Loss 10.2144 LearningRate 0.2409 Epoch: 2 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:14,292-Speed 5390.28 samples/sec Loss 10.1687 LearningRate 0.2409 Epoch: 2 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:21,908-Speed 5378.83 samples/sec Loss 10.1515 LearningRate 0.2409 Epoch: 2 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:29,585-Speed 5335.72 samples/sec Loss 10.2455 LearningRate 0.2408 Epoch: 2 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:37,165-Speed 5404.54 samples/sec Loss 10.1495 LearningRate 0.2408 Epoch: 2 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:44,748-Speed 5402.58 samples/sec Loss 10.2167 LearningRate 0.2408 Epoch: 2 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:52,285-Speed 5435.27 samples/sec Loss 10.1523 LearningRate 0.2408 Epoch: 2 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:33:59,796-Speed 5453.56 samples/sec Loss 10.1648 LearningRate 0.2407 Epoch: 2 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:34:07,421-Speed 5372.76 samples/sec Loss 10.1320 LearningRate 0.2407 Epoch: 2 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:34:14,960-Speed 5433.93 samples/sec Loss 10.2209 LearningRate 0.2407 Epoch: 2 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:34:22,473-Speed 5452.99 samples/sec Loss 10.2218 LearningRate 0.2406 Epoch: 2 Global Step: 30930 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 
01:34:30,128-Speed 5351.18 samples/sec Loss 10.1819 LearningRate 0.2406 Epoch: 2 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:34:37,697-Speed 5412.43 samples/sec Loss 10.1772 LearningRate 0.2406 Epoch: 2 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:34:45,271-Speed 5408.65 samples/sec Loss 10.1928 LearningRate 0.2406 Epoch: 2 Global Step: 30960 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:34:52,825-Speed 5423.05 samples/sec Loss 10.1878 LearningRate 0.2405 Epoch: 2 Global Step: 30970 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:35:00,439-Speed 5380.44 samples/sec Loss 10.1686 LearningRate 0.2405 Epoch: 2 Global Step: 30980 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:35:08,012-Speed 5409.53 samples/sec Loss 10.1709 LearningRate 0.2405 Epoch: 2 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:35:15,587-Speed 5408.19 samples/sec Loss 10.2147 LearningRate 0.2405 Epoch: 2 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:35:23,188-Speed 5389.61 samples/sec Loss 10.1931 LearningRate 0.2404 Epoch: 2 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:35:30,789-Speed 5388.75 samples/sec Loss 10.2284 LearningRate 0.2404 Epoch: 2 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:35:38,349-Speed 5419.26 samples/sec Loss 10.2146 LearningRate 0.2404 Epoch: 2 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:35:45,951-Speed 5388.44 samples/sec Loss 10.1716 LearningRate 0.2403 Epoch: 2 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:35:53,480-Speed 5441.14 samples/sec Loss 10.1989 LearningRate 0.2403 Epoch: 2 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:36:01,062-Speed 5403.56 samples/sec 
Loss 10.1838 LearningRate 0.2403 Epoch: 2 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:36:08,613-Speed 5424.83 samples/sec Loss 10.2357 LearningRate 0.2403 Epoch: 2 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:36:16,155-Speed 5431.50 samples/sec Loss 10.1753 LearningRate 0.2402 Epoch: 2 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:36:23,736-Speed 5403.62 samples/sec Loss 10.2294 LearningRate 0.2402 Epoch: 2 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:36:31,231-Speed 5465.67 samples/sec Loss 10.1361 LearningRate 0.2402 Epoch: 2 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:36:38,771-Speed 5433.09 samples/sec Loss 10.2006 LearningRate 0.2402 Epoch: 2 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:01,123-Speed 1832.58 samples/sec Loss 10.2077 LearningRate 0.2401 Epoch: 3 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:08,568-Speed 5502.39 samples/sec Loss 10.1981 LearningRate 0.2401 Epoch: 3 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:16,001-Speed 5511.30 samples/sec Loss 10.0911 LearningRate 0.2401 Epoch: 3 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:23,527-Speed 5442.99 samples/sec Loss 10.2123 LearningRate 0.2400 Epoch: 3 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:30,967-Speed 5506.28 samples/sec Loss 10.0856 LearningRate 0.2400 Epoch: 3 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:38,462-Speed 5465.94 samples/sec Loss 10.1207 LearningRate 0.2400 Epoch: 3 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:45,873-Speed 5527.62 samples/sec Loss 10.1434 LearningRate 0.2400 
Epoch: 3 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:37:53,285-Speed 5526.63 samples/sec Loss 10.1184 LearningRate 0.2399 Epoch: 3 Global Step: 31190 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:38:00,729-Speed 5503.10 samples/sec Loss 10.1059 LearningRate 0.2399 Epoch: 3 Global Step: 31200 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:38:08,142-Speed 5525.87 samples/sec Loss 10.1938 LearningRate 0.2399 Epoch: 3 Global Step: 31210 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:38:15,603-Speed 5490.94 samples/sec Loss 10.1486 LearningRate 0.2399 Epoch: 3 Global Step: 31220 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:38:23,071-Speed 5485.84 samples/sec Loss 10.1654 LearningRate 0.2398 Epoch: 3 Global Step: 31230 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:38:30,458-Speed 5545.69 samples/sec Loss 10.1614 LearningRate 0.2398 Epoch: 3 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:38:37,861-Speed 5533.16 samples/sec Loss 10.1508 LearningRate 0.2398 Epoch: 3 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:38:45,270-Speed 5530.00 samples/sec Loss 10.1904 LearningRate 0.2397 Epoch: 3 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:38:52,656-Speed 5546.25 samples/sec Loss 10.2749 LearningRate 0.2397 Epoch: 3 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:00,070-Speed 5525.57 samples/sec Loss 10.1225 LearningRate 0.2397 Epoch: 3 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:07,518-Speed 5499.90 samples/sec Loss 10.2402 LearningRate 0.2397 Epoch: 3 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:15,060-Speed 5431.49 samples/sec Loss 10.1060 LearningRate 0.2396 Epoch: 3 Global Step: 31300 
Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:22,582-Speed 5446.36 samples/sec Loss 10.1889 LearningRate 0.2396 Epoch: 3 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:30,148-Speed 5414.68 samples/sec Loss 10.1241 LearningRate 0.2396 Epoch: 3 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:37,618-Speed 5483.52 samples/sec Loss 10.1325 LearningRate 0.2396 Epoch: 3 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:45,071-Speed 5497.21 samples/sec Loss 10.0320 LearningRate 0.2395 Epoch: 3 Global Step: 31340 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:39:52,514-Speed 5503.91 samples/sec Loss 10.1441 LearningRate 0.2395 Epoch: 3 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:39:59,975-Speed 5490.62 samples/sec Loss 10.0893 LearningRate 0.2395 Epoch: 3 Global Step: 31360 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:07,513-Speed 5434.25 samples/sec Loss 10.1467 LearningRate 0.2395 Epoch: 3 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:14,996-Speed 5474.51 samples/sec Loss 10.2295 LearningRate 0.2394 Epoch: 3 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:22,539-Speed 5431.08 samples/sec Loss 10.1172 LearningRate 0.2394 Epoch: 3 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:30,027-Speed 5471.08 samples/sec Loss 10.0866 LearningRate 0.2394 Epoch: 3 Global Step: 31400 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:37,516-Speed 5469.70 samples/sec Loss 10.0640 LearningRate 0.2393 Epoch: 3 Global Step: 31410 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:44,980-Speed 5488.72 samples/sec Loss 10.1037 LearningRate 0.2393 Epoch: 3 Global Step: 31420 Fp16 Grad Scale: 131072 
Required: 40 hours Training: 2022-01-08 01:40:52,468-Speed 5470.66 samples/sec Loss 10.0996 LearningRate 0.2393 Epoch: 3 Global Step: 31430 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:40:59,921-Speed 5496.45 samples/sec Loss 10.0757 LearningRate 0.2393 Epoch: 3 Global Step: 31440 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:07,378-Speed 5493.88 samples/sec Loss 10.1320 LearningRate 0.2392 Epoch: 3 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:14,840-Speed 5489.88 samples/sec Loss 10.1364 LearningRate 0.2392 Epoch: 3 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:22,295-Speed 5495.04 samples/sec Loss 10.1531 LearningRate 0.2392 Epoch: 3 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:29,770-Speed 5480.36 samples/sec Loss 10.0786 LearningRate 0.2392 Epoch: 3 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:37,227-Speed 5493.39 samples/sec Loss 10.0862 LearningRate 0.2391 Epoch: 3 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:44,724-Speed 5464.00 samples/sec Loss 10.1303 LearningRate 0.2391 Epoch: 3 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:52,225-Speed 5461.79 samples/sec Loss 10.1205 LearningRate 0.2391 Epoch: 3 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:41:59,882-Speed 5350.43 samples/sec Loss 10.2172 LearningRate 0.2390 Epoch: 3 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:42:07,798-Speed 5175.16 samples/sec Loss 10.1789 LearningRate 0.2390 Epoch: 3 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:42:15,280-Speed 5474.81 samples/sec Loss 10.1080 LearningRate 0.2390 Epoch: 3 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 40 hours Training: 
2022-01-08 01:42:22,756-Speed 5479.22 samples/sec Loss 10.2140 LearningRate 0.2390 Epoch: 3 Global Step: 31550 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:42:30,228-Speed 5482.57 samples/sec Loss 10.1563 LearningRate 0.2389 Epoch: 3 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:42:37,722-Speed 5466.97 samples/sec Loss 10.0950 LearningRate 0.2389 Epoch: 3 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:42:45,215-Speed 5466.72 samples/sec Loss 10.1808 LearningRate 0.2389 Epoch: 3 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:42:52,733-Speed 5448.73 samples/sec Loss 10.1594 LearningRate 0.2389 Epoch: 3 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:00,234-Speed 5461.37 samples/sec Loss 10.1424 LearningRate 0.2388 Epoch: 3 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:07,698-Speed 5489.06 samples/sec Loss 10.0872 LearningRate 0.2388 Epoch: 3 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:15,169-Speed 5482.53 samples/sec Loss 10.0942 LearningRate 0.2388 Epoch: 3 Global Step: 31620 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:22,639-Speed 5484.76 samples/sec Loss 10.1171 LearningRate 0.2387 Epoch: 3 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:30,120-Speed 5476.13 samples/sec Loss 10.1651 LearningRate 0.2387 Epoch: 3 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:37,629-Speed 5455.41 samples/sec Loss 10.1185 LearningRate 0.2387 Epoch: 3 Global Step: 31650 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:43:45,067-Speed 5506.81 samples/sec Loss 10.1869 LearningRate 0.2387 Epoch: 3 Global Step: 31660 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:43:52,562-Speed 
5466.13 samples/sec Loss 10.1458 LearningRate 0.2386 Epoch: 3 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:00,008-Speed 5502.09 samples/sec Loss 10.1836 LearningRate 0.2386 Epoch: 3 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:07,571-Speed 5416.54 samples/sec Loss 10.1250 LearningRate 0.2386 Epoch: 3 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:15,224-Speed 5352.61 samples/sec Loss 10.2027 LearningRate 0.2386 Epoch: 3 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:22,823-Speed 5391.53 samples/sec Loss 10.1238 LearningRate 0.2385 Epoch: 3 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:30,364-Speed 5432.44 samples/sec Loss 10.1427 LearningRate 0.2385 Epoch: 3 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:37,942-Speed 5405.80 samples/sec Loss 10.1069 LearningRate 0.2385 Epoch: 3 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:45,453-Speed 5453.38 samples/sec Loss 10.2108 LearningRate 0.2384 Epoch: 3 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:44:52,977-Speed 5445.42 samples/sec Loss 10.1311 LearningRate 0.2384 Epoch: 3 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:45:00,487-Speed 5454.05 samples/sec Loss 10.1181 LearningRate 0.2384 Epoch: 3 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:45:07,930-Speed 5504.13 samples/sec Loss 10.1758 LearningRate 0.2384 Epoch: 3 Global Step: 31770 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:45:15,425-Speed 5466.01 samples/sec Loss 10.1717 LearningRate 0.2383 Epoch: 3 Global Step: 31780 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:45:22,855-Speed 5513.58 samples/sec Loss 
10.1480 LearningRate 0.2383 Epoch: 3 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:45:30,389-Speed 5437.12 samples/sec Loss 10.1622 LearningRate 0.2383 Epoch: 3 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:45:37,889-Speed 5462.74 samples/sec Loss 10.1585 LearningRate 0.2383 Epoch: 3 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:45:45,320-Speed 5512.65 samples/sec Loss 10.1218 LearningRate 0.2382 Epoch: 3 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:45:52,773-Speed 5496.24 samples/sec Loss 10.1191 LearningRate 0.2382 Epoch: 3 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:00,246-Speed 5482.16 samples/sec Loss 10.1262 LearningRate 0.2382 Epoch: 3 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:07,709-Speed 5488.69 samples/sec Loss 10.1097 LearningRate 0.2381 Epoch: 3 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:15,185-Speed 5480.21 samples/sec Loss 10.0386 LearningRate 0.2381 Epoch: 3 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:22,628-Speed 5503.11 samples/sec Loss 10.1194 LearningRate 0.2381 Epoch: 3 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:30,108-Speed 5476.97 samples/sec Loss 10.1471 LearningRate 0.2381 Epoch: 3 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:37,602-Speed 5466.48 samples/sec Loss 10.1436 LearningRate 0.2380 Epoch: 3 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:46:45,092-Speed 5469.76 samples/sec Loss 10.0723 LearningRate 0.2380 Epoch: 3 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:46:52,579-Speed 5471.25 samples/sec Loss 10.1850 LearningRate 0.2380 Epoch: 3 
Global Step: 31910 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:47:00,094-Speed 5451.19 samples/sec Loss 10.1108 LearningRate 0.2380 Epoch: 3 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:47:07,616-Speed 5446.57 samples/sec Loss 10.0974 LearningRate 0.2379 Epoch: 3 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:47:15,112-Speed 5465.22 samples/sec Loss 10.0581 LearningRate 0.2379 Epoch: 3 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:47:22,623-Speed 5453.31 samples/sec Loss 10.0951 LearningRate 0.2379 Epoch: 3 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:47:30,030-Speed 5531.44 samples/sec Loss 10.0871 LearningRate 0.2378 Epoch: 3 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:47:37,573-Speed 5430.73 samples/sec Loss 10.0873 LearningRate 0.2378 Epoch: 3 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:47:45,098-Speed 5443.94 samples/sec Loss 10.1855 LearningRate 0.2378 Epoch: 3 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:47:52,595-Speed 5463.59 samples/sec Loss 10.1402 LearningRate 0.2378 Epoch: 3 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:48:00,104-Speed 5455.93 samples/sec Loss 10.0961 LearningRate 0.2377 Epoch: 3 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:48:44,548-[lfw][32000]XNorm: 21.934491 Training: 2022-01-08 01:48:44,548-[lfw][32000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-01-08 01:48:44,549-[lfw][32000]Accuracy-Highest: 0.99800 Training: 2022-01-08 01:49:37,890-[cfp_fp][32000]XNorm: 20.134749 Training: 2022-01-08 01:49:37,891-[cfp_fp][32000]Accuracy-Flip: 0.98457+-0.00514 Training: 2022-01-08 01:49:37,892-[cfp_fp][32000]Accuracy-Highest: 0.98457 Training: 2022-01-08 
01:50:23,995-[agedb_30][32000]XNorm: 21.555493 Training: 2022-01-08 01:50:23,997-[agedb_30][32000]Accuracy-Flip: 0.96650+-0.00973 Training: 2022-01-08 01:50:23,997-[agedb_30][32000]Accuracy-Highest: 0.97167 Training: 2022-01-08 01:50:31,505-Speed 270.54 samples/sec Loss 10.1102 LearningRate 0.2377 Epoch: 3 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:50:38,937-Speed 5513.67 samples/sec Loss 10.1682 LearningRate 0.2377 Epoch: 3 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:50:46,494-Speed 5421.55 samples/sec Loss 10.1179 LearningRate 0.2377 Epoch: 3 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:50:53,914-Speed 5521.65 samples/sec Loss 10.1721 LearningRate 0.2376 Epoch: 3 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:01,358-Speed 5503.54 samples/sec Loss 10.2088 LearningRate 0.2376 Epoch: 3 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:08,912-Speed 5423.73 samples/sec Loss 10.1278 LearningRate 0.2376 Epoch: 3 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:16,461-Speed 5426.20 samples/sec Loss 10.1301 LearningRate 0.2375 Epoch: 3 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:23,984-Speed 5446.23 samples/sec Loss 10.0819 LearningRate 0.2375 Epoch: 3 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:31,443-Speed 5492.98 samples/sec Loss 10.0694 LearningRate 0.2375 Epoch: 3 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:38,897-Speed 5496.30 samples/sec Loss 10.1681 LearningRate 0.2375 Epoch: 3 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:51:46,375-Speed 5478.28 samples/sec Loss 10.1062 LearningRate 0.2374 Epoch: 3 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 
40 hours Training: 2022-01-08 01:51:53,837-Speed 5490.21 samples/sec Loss 10.0788 LearningRate 0.2374 Epoch: 3 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:52:01,285-Speed 5500.56 samples/sec Loss 10.1160 LearningRate 0.2374 Epoch: 3 Global Step: 32130 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:08,802-Speed 5449.55 samples/sec Loss 10.1092 LearningRate 0.2374 Epoch: 3 Global Step: 32140 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:16,252-Speed 5498.63 samples/sec Loss 10.1718 LearningRate 0.2373 Epoch: 3 Global Step: 32150 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:23,699-Speed 5500.82 samples/sec Loss 10.0776 LearningRate 0.2373 Epoch: 3 Global Step: 32160 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:31,271-Speed 5410.65 samples/sec Loss 10.0758 LearningRate 0.2373 Epoch: 3 Global Step: 32170 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:38,767-Speed 5464.74 samples/sec Loss 10.0858 LearningRate 0.2373 Epoch: 3 Global Step: 32180 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:46,225-Speed 5492.54 samples/sec Loss 10.0583 LearningRate 0.2372 Epoch: 3 Global Step: 32190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:52:53,640-Speed 5525.07 samples/sec Loss 10.0702 LearningRate 0.2372 Epoch: 3 Global Step: 32200 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:53:01,140-Speed 5462.21 samples/sec Loss 10.0861 LearningRate 0.2372 Epoch: 3 Global Step: 32210 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:53:08,543-Speed 5533.17 samples/sec Loss 10.0896 LearningRate 0.2371 Epoch: 3 Global Step: 32220 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:53:16,046-Speed 5460.10 samples/sec Loss 10.1656 LearningRate 0.2371 Epoch: 3 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 
01:53:23,521-Speed 5480.17 samples/sec Loss 10.1078 LearningRate 0.2371 Epoch: 3 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:53:31,139-Speed 5377.51 samples/sec Loss 10.0845 LearningRate 0.2371 Epoch: 3 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:53:38,600-Speed 5490.47 samples/sec Loss 10.1310 LearningRate 0.2370 Epoch: 3 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:53:46,099-Speed 5462.29 samples/sec Loss 10.0681 LearningRate 0.2370 Epoch: 3 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:53:53,625-Speed 5443.51 samples/sec Loss 10.0916 LearningRate 0.2370 Epoch: 3 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:54:01,260-Speed 5365.73 samples/sec Loss 10.1138 LearningRate 0.2370 Epoch: 3 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:54:08,765-Speed 5458.41 samples/sec Loss 10.0926 LearningRate 0.2369 Epoch: 3 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:54:16,270-Speed 5458.45 samples/sec Loss 10.0529 LearningRate 0.2369 Epoch: 3 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:54:23,747-Speed 5478.35 samples/sec Loss 10.0992 LearningRate 0.2369 Epoch: 3 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:54:31,205-Speed 5493.54 samples/sec Loss 9.9902 LearningRate 0.2368 Epoch: 3 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:54:38,627-Speed 5519.50 samples/sec Loss 10.0997 LearningRate 0.2368 Epoch: 3 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:54:46,163-Speed 5435.66 samples/sec Loss 10.0236 LearningRate 0.2368 Epoch: 3 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:54:53,579-Speed 5523.80 samples/sec 
Loss 10.1284 LearningRate 0.2368 Epoch: 3 Global Step: 32360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:01,053-Speed 5480.85 samples/sec Loss 10.0605 LearningRate 0.2367 Epoch: 3 Global Step: 32370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:08,501-Speed 5500.71 samples/sec Loss 10.1065 LearningRate 0.2367 Epoch: 3 Global Step: 32380 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:15,999-Speed 5462.86 samples/sec Loss 10.0655 LearningRate 0.2367 Epoch: 3 Global Step: 32390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:23,452-Speed 5496.00 samples/sec Loss 10.1331 LearningRate 0.2367 Epoch: 3 Global Step: 32400 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:30,919-Speed 5486.83 samples/sec Loss 10.1795 LearningRate 0.2366 Epoch: 3 Global Step: 32410 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:38,360-Speed 5505.78 samples/sec Loss 10.1632 LearningRate 0.2366 Epoch: 3 Global Step: 32420 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:45,861-Speed 5460.72 samples/sec Loss 10.0153 LearningRate 0.2366 Epoch: 3 Global Step: 32430 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:55:53,364-Speed 5459.84 samples/sec Loss 10.0520 LearningRate 0.2365 Epoch: 3 Global Step: 32440 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:56:00,853-Speed 5469.76 samples/sec Loss 10.0059 LearningRate 0.2365 Epoch: 3 Global Step: 32450 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 01:56:08,372-Speed 5448.59 samples/sec Loss 10.0906 LearningRate 0.2365 Epoch: 3 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:56:15,881-Speed 5455.44 samples/sec Loss 10.1085 LearningRate 0.2365 Epoch: 3 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:56:23,278-Speed 5537.80 samples/sec Loss 10.0273 LearningRate 0.2364 Epoch: 3 
Global Step: 32480 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:56:30,721-Speed 5504.31 samples/sec Loss 10.0369 LearningRate 0.2364 Epoch: 3 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:56:38,157-Speed 5509.32 samples/sec Loss 10.0953 LearningRate 0.2364 Epoch: 3 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:56:45,576-Speed 5521.65 samples/sec Loss 10.0111 LearningRate 0.2364 Epoch: 3 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:56:53,079-Speed 5459.63 samples/sec Loss 10.0245 LearningRate 0.2363 Epoch: 3 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:57:00,555-Speed 5479.29 samples/sec Loss 10.0155 LearningRate 0.2363 Epoch: 3 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:57:08,006-Speed 5498.44 samples/sec Loss 10.2048 LearningRate 0.2363 Epoch: 3 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:57:15,538-Speed 5438.48 samples/sec Loss 10.0579 LearningRate 0.2363 Epoch: 3 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 01:57:23,099-Speed 5418.16 samples/sec Loss 10.0210 LearningRate 0.2362 Epoch: 3 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:57:30,650-Speed 5425.02 samples/sec Loss 10.0458 LearningRate 0.2362 Epoch: 3 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:57:38,211-Speed 5418.43 samples/sec Loss 10.0509 LearningRate 0.2362 Epoch: 3 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:57:45,658-Speed 5500.36 samples/sec Loss 10.1048 LearningRate 0.2361 Epoch: 3 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:57:53,096-Speed 5507.43 samples/sec Loss 10.0083 LearningRate 0.2361 Epoch: 3 Global Step: 32600 Fp16 Grad Scale: 
131072 Required: 40 hours Training: 2022-01-08 01:58:00,721-Speed 5372.55 samples/sec Loss 10.0561 LearningRate 0.2361 Epoch: 3 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:58:08,169-Speed 5500.83 samples/sec Loss 10.1262 LearningRate 0.2361 Epoch: 3 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:58:15,795-Speed 5371.49 samples/sec Loss 10.0304 LearningRate 0.2360 Epoch: 3 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:58:23,311-Speed 5450.17 samples/sec Loss 10.0466 LearningRate 0.2360 Epoch: 3 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:58:30,797-Speed 5472.26 samples/sec Loss 10.0973 LearningRate 0.2360 Epoch: 3 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:58:38,312-Speed 5451.27 samples/sec Loss 9.9806 LearningRate 0.2360 Epoch: 3 Global Step: 32660 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:58:45,822-Speed 5455.03 samples/sec Loss 10.0499 LearningRate 0.2359 Epoch: 3 Global Step: 32670 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 01:58:53,324-Speed 5460.38 samples/sec Loss 10.1019 LearningRate 0.2359 Epoch: 3 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:00,924-Speed 5390.39 samples/sec Loss 10.1044 LearningRate 0.2359 Epoch: 3 Global Step: 32690 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:08,446-Speed 5445.87 samples/sec Loss 9.9945 LearningRate 0.2358 Epoch: 3 Global Step: 32700 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:16,137-Speed 5326.25 samples/sec Loss 10.0562 LearningRate 0.2358 Epoch: 3 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:23,679-Speed 5431.80 samples/sec Loss 10.0634 LearningRate 0.2358 Epoch: 3 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 40 hours 
Training: 2022-01-08 01:59:31,360-Speed 5333.03 samples/sec Loss 10.1010 LearningRate 0.2358 Epoch: 3 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:39,002-Speed 5361.21 samples/sec Loss 10.0693 LearningRate 0.2357 Epoch: 3 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:46,512-Speed 5454.31 samples/sec Loss 10.0690 LearningRate 0.2357 Epoch: 3 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 01:59:54,073-Speed 5418.27 samples/sec Loss 10.0301 LearningRate 0.2357 Epoch: 3 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:00:01,576-Speed 5459.59 samples/sec Loss 10.0865 LearningRate 0.2357 Epoch: 3 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:00:09,091-Speed 5451.18 samples/sec Loss 10.0910 LearningRate 0.2356 Epoch: 3 Global Step: 32780 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:00:16,610-Speed 5448.42 samples/sec Loss 10.0830 LearningRate 0.2356 Epoch: 3 Global Step: 32790 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:00:24,136-Speed 5442.95 samples/sec Loss 10.0553 LearningRate 0.2356 Epoch: 3 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:00:31,644-Speed 5456.71 samples/sec Loss 10.0137 LearningRate 0.2355 Epoch: 3 Global Step: 32810 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:00:39,287-Speed 5359.68 samples/sec Loss 10.1181 LearningRate 0.2355 Epoch: 3 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:00:46,798-Speed 5454.23 samples/sec Loss 10.0892 LearningRate 0.2355 Epoch: 3 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:00:54,756-Speed 5147.61 samples/sec Loss 10.0233 LearningRate 0.2355 Epoch: 3 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 
02:01:02,236-Speed 5476.79 samples/sec Loss 10.0461 LearningRate 0.2354 Epoch: 3 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:01:09,772-Speed 5435.94 samples/sec Loss 10.0587 LearningRate 0.2354 Epoch: 3 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:01:17,256-Speed 5474.14 samples/sec Loss 9.9419 LearningRate 0.2354 Epoch: 3 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:01:24,704-Speed 5499.78 samples/sec Loss 10.0314 LearningRate 0.2354 Epoch: 3 Global Step: 32880 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:01:32,204-Speed 5462.00 samples/sec Loss 10.1129 LearningRate 0.2353 Epoch: 3 Global Step: 32890 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:01:39,778-Speed 5408.90 samples/sec Loss 10.0286 LearningRate 0.2353 Epoch: 3 Global Step: 32900 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:01:47,302-Speed 5444.86 samples/sec Loss 10.0909 LearningRate 0.2353 Epoch: 3 Global Step: 32910 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:01:54,861-Speed 5419.19 samples/sec Loss 10.0550 LearningRate 0.2353 Epoch: 3 Global Step: 32920 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:02:02,547-Speed 5329.88 samples/sec Loss 10.0885 LearningRate 0.2352 Epoch: 3 Global Step: 32930 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:02:10,013-Speed 5487.34 samples/sec Loss 10.0163 LearningRate 0.2352 Epoch: 3 Global Step: 32940 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:02:17,462-Speed 5499.14 samples/sec Loss 10.1539 LearningRate 0.2352 Epoch: 3 Global Step: 32950 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:02:25,002-Speed 5433.70 samples/sec Loss 10.1263 LearningRate 0.2351 Epoch: 3 Global Step: 32960 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:02:32,605-Speed 5387.88 samples/sec Loss 
9.9483 LearningRate 0.2351 Epoch: 3 Global Step: 32970 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:02:40,293-Speed 5328.22 samples/sec Loss 10.0537 LearningRate 0.2351 Epoch: 3 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:02:47,873-Speed 5404.65 samples/sec Loss 10.0501 LearningRate 0.2351 Epoch: 3 Global Step: 32990 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:02:55,323-Speed 5498.76 samples/sec Loss 10.0927 LearningRate 0.2350 Epoch: 3 Global Step: 33000 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:02,941-Speed 5377.30 samples/sec Loss 10.0579 LearningRate 0.2350 Epoch: 3 Global Step: 33010 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:10,498-Speed 5421.04 samples/sec Loss 10.0043 LearningRate 0.2350 Epoch: 3 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:18,156-Speed 5349.51 samples/sec Loss 9.9707 LearningRate 0.2350 Epoch: 3 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:25,766-Speed 5383.08 samples/sec Loss 9.9482 LearningRate 0.2349 Epoch: 3 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:33,283-Speed 5450.14 samples/sec Loss 9.9846 LearningRate 0.2349 Epoch: 3 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:40,901-Speed 5377.46 samples/sec Loss 10.1659 LearningRate 0.2349 Epoch: 3 Global Step: 33060 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:48,396-Speed 5465.43 samples/sec Loss 10.0646 LearningRate 0.2348 Epoch: 3 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:03:55,970-Speed 5408.67 samples/sec Loss 9.9310 LearningRate 0.2348 Epoch: 3 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:03,610-Speed 5361.53 samples/sec Loss 10.0066 LearningRate 0.2348 Epoch: 3 Global 
Step: 33090 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:11,059-Speed 5499.72 samples/sec Loss 10.0170 LearningRate 0.2348 Epoch: 3 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:18,604-Speed 5430.01 samples/sec Loss 10.0116 LearningRate 0.2347 Epoch: 3 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:26,228-Speed 5373.28 samples/sec Loss 9.9787 LearningRate 0.2347 Epoch: 3 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:33,822-Speed 5394.31 samples/sec Loss 9.9524 LearningRate 0.2347 Epoch: 3 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:41,482-Speed 5348.20 samples/sec Loss 9.9953 LearningRate 0.2347 Epoch: 3 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:49,059-Speed 5406.45 samples/sec Loss 10.0294 LearningRate 0.2346 Epoch: 3 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:04:56,690-Speed 5368.44 samples/sec Loss 10.0198 LearningRate 0.2346 Epoch: 3 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:05:04,282-Speed 5395.51 samples/sec Loss 10.0718 LearningRate 0.2346 Epoch: 3 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:05:12,017-Speed 5295.97 samples/sec Loss 10.0067 LearningRate 0.2346 Epoch: 3 Global Step: 33180 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:05:19,579-Speed 5417.79 samples/sec Loss 10.0881 LearningRate 0.2345 Epoch: 3 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:05:27,167-Speed 5398.56 samples/sec Loss 9.9114 LearningRate 0.2345 Epoch: 3 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:05:34,868-Speed 5318.69 samples/sec Loss 10.0349 LearningRate 0.2345 Epoch: 3 Global Step: 33210 Fp16 Grad Scale: 131072 
Required: 40 hours Training: 2022-01-08 02:05:42,456-Speed 5399.32 samples/sec Loss 10.0119 LearningRate 0.2344 Epoch: 3 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:05:49,997-Speed 5432.49 samples/sec Loss 9.9967 LearningRate 0.2344 Epoch: 3 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:05:57,572-Speed 5407.59 samples/sec Loss 10.0592 LearningRate 0.2344 Epoch: 3 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:05,197-Speed 5372.36 samples/sec Loss 10.0038 LearningRate 0.2344 Epoch: 3 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:12,863-Speed 5344.07 samples/sec Loss 10.0490 LearningRate 0.2343 Epoch: 3 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:20,536-Speed 5338.71 samples/sec Loss 10.0212 LearningRate 0.2343 Epoch: 3 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:28,108-Speed 5410.35 samples/sec Loss 9.9788 LearningRate 0.2343 Epoch: 3 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:35,723-Speed 5379.35 samples/sec Loss 10.0443 LearningRate 0.2343 Epoch: 3 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:43,495-Speed 5270.42 samples/sec Loss 10.0429 LearningRate 0.2342 Epoch: 3 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:51,241-Speed 5289.06 samples/sec Loss 10.0617 LearningRate 0.2342 Epoch: 3 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:06:58,929-Speed 5327.90 samples/sec Loss 9.9376 LearningRate 0.2342 Epoch: 3 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:07:06,437-Speed 5456.91 samples/sec Loss 9.9592 LearningRate 0.2341 Epoch: 3 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 40 hours Training: 
2022-01-08 02:07:13,972-Speed 5436.62 samples/sec Loss 10.0130 LearningRate 0.2341 Epoch: 3 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:07:21,524-Speed 5424.90 samples/sec Loss 10.0456 LearningRate 0.2341 Epoch: 3 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:07:29,024-Speed 5462.06 samples/sec Loss 10.0292 LearningRate 0.2341 Epoch: 3 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:07:36,515-Speed 5468.42 samples/sec Loss 10.0561 LearningRate 0.2340 Epoch: 3 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:07:44,224-Speed 5313.71 samples/sec Loss 10.0302 LearningRate 0.2340 Epoch: 3 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:07:51,814-Speed 5397.78 samples/sec Loss 10.0122 LearningRate 0.2340 Epoch: 3 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:07:59,407-Speed 5395.25 samples/sec Loss 10.0663 LearningRate 0.2340 Epoch: 3 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:06,948-Speed 5432.49 samples/sec Loss 9.9899 LearningRate 0.2339 Epoch: 3 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:14,557-Speed 5383.16 samples/sec Loss 10.0010 LearningRate 0.2339 Epoch: 3 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:22,038-Speed 5475.84 samples/sec Loss 10.0142 LearningRate 0.2339 Epoch: 3 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:29,660-Speed 5374.83 samples/sec Loss 9.9937 LearningRate 0.2339 Epoch: 3 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:37,241-Speed 5403.97 samples/sec Loss 10.0433 LearningRate 0.2338 Epoch: 3 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:44,873-Speed 5366.90 
samples/sec Loss 10.0101 LearningRate 0.2338 Epoch: 3 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:08:52,531-Speed 5349.96 samples/sec Loss 9.9678 LearningRate 0.2338 Epoch: 3 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:09:00,134-Speed 5388.07 samples/sec Loss 9.9623 LearningRate 0.2337 Epoch: 3 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:07,660-Speed 5442.86 samples/sec Loss 10.0714 LearningRate 0.2337 Epoch: 3 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:15,081-Speed 5520.47 samples/sec Loss 10.0701 LearningRate 0.2337 Epoch: 3 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:22,696-Speed 5379.92 samples/sec Loss 10.0268 LearningRate 0.2337 Epoch: 3 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:30,332-Speed 5365.03 samples/sec Loss 9.9400 LearningRate 0.2336 Epoch: 3 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:37,853-Speed 5445.93 samples/sec Loss 9.9743 LearningRate 0.2336 Epoch: 3 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:45,509-Speed 5350.90 samples/sec Loss 9.9741 LearningRate 0.2336 Epoch: 3 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:09:53,236-Speed 5314.52 samples/sec Loss 10.0249 LearningRate 0.2336 Epoch: 3 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:10:00,696-Speed 5491.51 samples/sec Loss 9.9028 LearningRate 0.2335 Epoch: 3 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:10:08,294-Speed 5391.12 samples/sec Loss 9.9942 LearningRate 0.2335 Epoch: 3 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:10:15,900-Speed 5386.31 samples/sec Loss 9.9948 LearningRate 
0.2335 Epoch: 3 Global Step: 33580 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:10:23,400-Speed 5462.37 samples/sec Loss 9.9872 LearningRate 0.2334 Epoch: 3 Global Step: 33590 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:10:30,933-Speed 5438.35 samples/sec Loss 9.9684 LearningRate 0.2334 Epoch: 3 Global Step: 33600 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:10:38,537-Speed 5386.79 samples/sec Loss 9.9523 LearningRate 0.2334 Epoch: 3 Global Step: 33610 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:10:46,177-Speed 5361.90 samples/sec Loss 10.0828 LearningRate 0.2334 Epoch: 3 Global Step: 33620 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:10:53,768-Speed 5397.07 samples/sec Loss 10.0853 LearningRate 0.2333 Epoch: 3 Global Step: 33630 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:11:01,443-Speed 5337.75 samples/sec Loss 10.0548 LearningRate 0.2333 Epoch: 3 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:09,040-Speed 5391.61 samples/sec Loss 9.9700 LearningRate 0.2333 Epoch: 3 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:16,619-Speed 5405.31 samples/sec Loss 9.9994 LearningRate 0.2333 Epoch: 3 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:24,143-Speed 5444.78 samples/sec Loss 10.0432 LearningRate 0.2332 Epoch: 3 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:31,642-Speed 5462.83 samples/sec Loss 9.9439 LearningRate 0.2332 Epoch: 3 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:39,198-Speed 5421.70 samples/sec Loss 10.0093 LearningRate 0.2332 Epoch: 3 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:46,811-Speed 5380.55 samples/sec Loss 9.9549 LearningRate 0.2332 Epoch: 3 Global Step: 33700 
Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:11:54,445-Speed 5366.82 samples/sec Loss 9.9280 LearningRate 0.2331 Epoch: 3 Global Step: 33710 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:02,085-Speed 5362.15 samples/sec Loss 9.9468 LearningRate 0.2331 Epoch: 3 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:09,649-Speed 5415.64 samples/sec Loss 9.9632 LearningRate 0.2331 Epoch: 3 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:17,255-Speed 5386.03 samples/sec Loss 9.9579 LearningRate 0.2330 Epoch: 3 Global Step: 33740 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:12:24,811-Speed 5421.46 samples/sec Loss 9.9607 LearningRate 0.2330 Epoch: 3 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:32,320-Speed 5455.37 samples/sec Loss 9.9677 LearningRate 0.2330 Epoch: 3 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:39,882-Speed 5417.45 samples/sec Loss 10.1327 LearningRate 0.2330 Epoch: 3 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:47,423-Speed 5432.68 samples/sec Loss 10.0378 LearningRate 0.2329 Epoch: 3 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:12:54,978-Speed 5422.26 samples/sec Loss 10.0696 LearningRate 0.2329 Epoch: 3 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:13:02,664-Speed 5329.53 samples/sec Loss 9.9348 LearningRate 0.2329 Epoch: 3 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:13:10,147-Speed 5474.01 samples/sec Loss 9.9598 LearningRate 0.2329 Epoch: 3 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:13:17,722-Speed 5408.26 samples/sec Loss 9.9437 LearningRate 0.2328 Epoch: 3 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 40 
hours Training: 2022-01-08 02:13:25,310-Speed 5398.49 samples/sec Loss 9.9847 LearningRate 0.2328 Epoch: 3 Global Step: 33830 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:13:32,916-Speed 5385.92 samples/sec Loss 10.0147 LearningRate 0.2328 Epoch: 3 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:13:40,583-Speed 5343.35 samples/sec Loss 9.9893 LearningRate 0.2327 Epoch: 3 Global Step: 33850 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:13:48,161-Speed 5405.92 samples/sec Loss 9.9783 LearningRate 0.2327 Epoch: 3 Global Step: 33860 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:13:55,730-Speed 5412.20 samples/sec Loss 9.9234 LearningRate 0.2327 Epoch: 3 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:14:03,383-Speed 5352.33 samples/sec Loss 9.9890 LearningRate 0.2327 Epoch: 3 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:14:11,033-Speed 5354.91 samples/sec Loss 9.9720 LearningRate 0.2326 Epoch: 3 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:14:18,637-Speed 5387.70 samples/sec Loss 9.9883 LearningRate 0.2326 Epoch: 3 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:14:26,180-Speed 5430.46 samples/sec Loss 9.9901 LearningRate 0.2326 Epoch: 3 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:14:33,735-Speed 5422.36 samples/sec Loss 10.0201 LearningRate 0.2326 Epoch: 3 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:14:41,294-Speed 5419.80 samples/sec Loss 10.0044 LearningRate 0.2325 Epoch: 3 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:14:48,886-Speed 5396.09 samples/sec Loss 9.9784 LearningRate 0.2325 Epoch: 3 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 
02:14:56,435-Speed 5426.48 samples/sec Loss 10.0272 LearningRate 0.2325 Epoch: 3 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-08 02:15:03,942-Speed 5457.06 samples/sec Loss 9.9716 LearningRate 0.2325 Epoch: 3 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-01-08 02:15:11,480-Speed 5434.31 samples/sec Loss 9.9826 LearningRate 0.2324 Epoch: 3 Global Step: 33970 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-08 02:15:19,037-Speed 5420.68 samples/sec Loss 9.9900 LearningRate 0.2324 Epoch: 3 Global Step: 33980 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-08 02:15:26,635-Speed 5391.79 samples/sec Loss 9.9179 LearningRate 0.2324 Epoch: 3 Global Step: 33990 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-08 02:15:34,216-Speed 5404.03 samples/sec Loss 9.9214 LearningRate 0.2323 Epoch: 3 Global Step: 34000 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-01-08 02:16:18,581-[lfw][34000]XNorm: 20.941988
Training: 2022-01-08 02:16:18,582-[lfw][34000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-01-08 02:16:18,582-[lfw][34000]Accuracy-Highest: 0.99800
Training: 2022-01-08 02:17:10,601-[cfp_fp][34000]XNorm: 18.630953
Training: 2022-01-08 02:17:10,603-[cfp_fp][34000]Accuracy-Flip: 0.98371+-0.00496
Training: 2022-01-08 02:17:10,603-[cfp_fp][34000]Accuracy-Highest: 0.98457
Training: 2022-01-08 02:17:56,228-[agedb_30][34000]XNorm: 21.064765
Training: 2022-01-08 02:17:56,229-[agedb_30][34000]Accuracy-Flip: 0.96700+-0.00945
Training: 2022-01-08 02:17:56,230-[agedb_30][34000]Accuracy-Highest: 0.97167
Training: 2022-01-08 02:18:03,765-Speed 273.89 samples/sec Loss 9.9618 LearningRate 0.2323 Epoch: 3 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-08 02:18:11,294-Speed 5441.36 samples/sec Loss 9.9410 LearningRate 0.2323 Epoch: 3 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-01-08 02:18:18,775-Speed 5476.55
samples/sec Loss 9.9420 LearningRate 0.2323 Epoch: 3 Global Step: 34030 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:18:26,266-Speed 5469.29 samples/sec Loss 10.0507 LearningRate 0.2322 Epoch: 3 Global Step: 34040 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:18:33,898-Speed 5368.00 samples/sec Loss 9.9681 LearningRate 0.2322 Epoch: 3 Global Step: 34050 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:18:41,587-Speed 5328.02 samples/sec Loss 9.9864 LearningRate 0.2322 Epoch: 3 Global Step: 34060 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:18:49,099-Speed 5453.70 samples/sec Loss 9.9463 LearningRate 0.2322 Epoch: 3 Global Step: 34070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:18:56,646-Speed 5428.03 samples/sec Loss 9.9710 LearningRate 0.2321 Epoch: 3 Global Step: 34080 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:19:06,423-Speed 5478.85 samples/sec Loss 9.9550 LearningRate 0.2321 Epoch: 3 Global Step: 34090 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:19:14,005-Speed 5402.41 samples/sec Loss 9.9057 LearningRate 0.2321 Epoch: 3 Global Step: 34100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:19:21,595-Speed 5397.99 samples/sec Loss 10.0501 LearningRate 0.2321 Epoch: 3 Global Step: 34110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:19:29,225-Speed 5368.65 samples/sec Loss 9.9946 LearningRate 0.2320 Epoch: 3 Global Step: 34120 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-01-08 02:19:36,854-Speed 5369.40 samples/sec Loss 9.9690 LearningRate 0.2320 Epoch: 3 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:19:44,667-Speed 5243.97 samples/sec Loss 9.9672 LearningRate 0.2320 Epoch: 3 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:19:52,226-Speed 5419.52 samples/sec Loss 9.9579 LearningRate 0.2319 Epoch: 3 
Global Step: 34150 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:19:59,830-Speed 5387.55 samples/sec Loss 9.9156 LearningRate 0.2319 Epoch: 3 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:07,403-Speed 5408.71 samples/sec Loss 9.9308 LearningRate 0.2319 Epoch: 3 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:15,132-Speed 5300.51 samples/sec Loss 9.9804 LearningRate 0.2319 Epoch: 3 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:22,744-Speed 5381.77 samples/sec Loss 9.9489 LearningRate 0.2318 Epoch: 3 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:30,282-Speed 5434.99 samples/sec Loss 9.9507 LearningRate 0.2318 Epoch: 3 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:37,703-Speed 5519.72 samples/sec Loss 10.0102 LearningRate 0.2318 Epoch: 3 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:45,213-Speed 5455.19 samples/sec Loss 9.8880 LearningRate 0.2318 Epoch: 3 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-01-08 02:20:52,720-Speed 5457.58 samples/sec Loss 9.9156 LearningRate 0.2317 Epoch: 3 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:00,290-Speed 5411.50 samples/sec Loss 9.9556 LearningRate 0.2317 Epoch: 3 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:07,906-Speed 5378.39 samples/sec Loss 9.9805 LearningRate 0.2317 Epoch: 3 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:15,598-Speed 5325.95 samples/sec Loss 9.9332 LearningRate 0.2317 Epoch: 3 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:23,200-Speed 5388.91 samples/sec Loss 9.9373 LearningRate 0.2316 Epoch: 3 Global Step: 34270 Fp16 Grad Scale: 131072 
Required: 40 hours Training: 2022-01-08 02:21:30,881-Speed 5333.55 samples/sec Loss 9.9993 LearningRate 0.2316 Epoch: 3 Global Step: 34280 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:38,527-Speed 5357.79 samples/sec Loss 9.8383 LearningRate 0.2316 Epoch: 3 Global Step: 34290 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:46,087-Speed 5418.51 samples/sec Loss 9.9966 LearningRate 0.2315 Epoch: 3 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:21:53,668-Speed 5403.71 samples/sec Loss 9.9714 LearningRate 0.2315 Epoch: 3 Global Step: 34310 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:01,281-Speed 5381.17 samples/sec Loss 9.9635 LearningRate 0.2315 Epoch: 3 Global Step: 34320 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:08,863-Speed 5402.64 samples/sec Loss 9.8914 LearningRate 0.2315 Epoch: 3 Global Step: 34330 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:16,530-Speed 5343.23 samples/sec Loss 9.9540 LearningRate 0.2314 Epoch: 3 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:23,973-Speed 5504.76 samples/sec Loss 9.9509 LearningRate 0.2314 Epoch: 3 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:31,549-Speed 5407.57 samples/sec Loss 9.9799 LearningRate 0.2314 Epoch: 3 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:39,080-Speed 5439.22 samples/sec Loss 9.9146 LearningRate 0.2314 Epoch: 3 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:46,606-Speed 5443.38 samples/sec Loss 9.9770 LearningRate 0.2313 Epoch: 3 Global Step: 34380 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:22:54,171-Speed 5414.61 samples/sec Loss 9.9043 LearningRate 0.2313 Epoch: 3 Global Step: 34390 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 
02:23:01,725-Speed 5423.18 samples/sec Loss 9.9732 LearningRate 0.2313 Epoch: 3 Global Step: 34400 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:23:09,255-Speed 5440.23 samples/sec Loss 9.9264 LearningRate 0.2313 Epoch: 3 Global Step: 34410 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:23:16,731-Speed 5479.56 samples/sec Loss 9.9553 LearningRate 0.2312 Epoch: 3 Global Step: 34420 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:23:24,343-Speed 5381.57 samples/sec Loss 9.8893 LearningRate 0.2312 Epoch: 3 Global Step: 34430 Fp16 Grad Scale: 262144 Required: 40 hours Training: 2022-01-08 02:23:31,861-Speed 5448.97 samples/sec Loss 9.8879 LearningRate 0.2312 Epoch: 3 Global Step: 34440 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:23:39,418-Speed 5420.61 samples/sec Loss 9.8826 LearningRate 0.2311 Epoch: 3 Global Step: 34450 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:23:46,911-Speed 5467.47 samples/sec Loss 10.0466 LearningRate 0.2311 Epoch: 3 Global Step: 34460 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:23:54,821-Speed 5178.70 samples/sec Loss 10.0013 LearningRate 0.2311 Epoch: 3 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:02,371-Speed 5426.06 samples/sec Loss 10.0077 LearningRate 0.2311 Epoch: 3 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:09,858-Speed 5471.37 samples/sec Loss 9.8854 LearningRate 0.2310 Epoch: 3 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:17,293-Speed 5510.05 samples/sec Loss 9.9039 LearningRate 0.2310 Epoch: 3 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:24,925-Speed 5367.82 samples/sec Loss 9.9512 LearningRate 0.2310 Epoch: 3 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:32,502-Speed 5406.14 samples/sec 
Loss 9.9634 LearningRate 0.2310 Epoch: 3 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:39,918-Speed 5524.21 samples/sec Loss 9.9096 LearningRate 0.2309 Epoch: 3 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 40 hours Training: 2022-01-08 02:24:47,498-Speed 5404.08 samples/sec Loss 9.9632 LearningRate 0.2309 Epoch: 3 Global Step: 34540 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:24:55,058-Speed 5418.83 samples/sec Loss 9.8873 LearningRate 0.2309 Epoch: 3 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:02,663-Speed 5386.99 samples/sec Loss 9.8897 LearningRate 0.2308 Epoch: 3 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:10,194-Speed 5439.15 samples/sec Loss 9.9076 LearningRate 0.2308 Epoch: 3 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:17,778-Speed 5401.73 samples/sec Loss 9.9617 LearningRate 0.2308 Epoch: 3 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:25,256-Speed 5477.91 samples/sec Loss 9.9015 LearningRate 0.2308 Epoch: 3 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:32,803-Speed 5428.32 samples/sec Loss 9.8990 LearningRate 0.2307 Epoch: 3 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:40,286-Speed 5474.23 samples/sec Loss 9.9715 LearningRate 0.2307 Epoch: 3 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:47,902-Speed 5379.51 samples/sec Loss 9.9915 LearningRate 0.2307 Epoch: 3 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:25:55,686-Speed 5262.22 samples/sec Loss 9.8830 LearningRate 0.2307 Epoch: 3 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:26:03,383-Speed 5322.88 samples/sec Loss 9.9067 LearningRate 0.2306 Epoch: 3 
Global Step: 34640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:26:10,980-Speed 5391.98 samples/sec Loss 9.8664 LearningRate 0.2306 Epoch: 3 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:26:18,526-Speed 5428.71 samples/sec Loss 9.8511 LearningRate 0.2306 Epoch: 3 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:26:26,171-Speed 5358.60 samples/sec Loss 10.0037 LearningRate 0.2306 Epoch: 3 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:26:33,655-Speed 5473.71 samples/sec Loss 9.9402 LearningRate 0.2305 Epoch: 3 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:26:41,138-Speed 5474.14 samples/sec Loss 9.9560 LearningRate 0.2305 Epoch: 3 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:26:48,850-Speed 5312.27 samples/sec Loss 9.9131 LearningRate 0.2305 Epoch: 3 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:26:56,455-Speed 5386.43 samples/sec Loss 9.9355 LearningRate 0.2304 Epoch: 3 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:27:04,020-Speed 5415.39 samples/sec Loss 9.8645 LearningRate 0.2304 Epoch: 3 Global Step: 34720 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:27:11,442-Speed 5519.00 samples/sec Loss 9.8986 LearningRate 0.2304 Epoch: 3 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:27:18,989-Speed 5427.92 samples/sec Loss 9.9602 LearningRate 0.2304 Epoch: 3 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:27:26,552-Speed 5417.00 samples/sec Loss 9.9289 LearningRate 0.2303 Epoch: 3 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:27:34,099-Speed 5427.84 samples/sec Loss 9.8856 LearningRate 0.2303 Epoch: 3 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 
39 hours Training: 2022-01-08 02:27:41,629-Speed 5439.97 samples/sec Loss 9.9187 LearningRate 0.2303 Epoch: 3 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:27:49,149-Speed 5448.18 samples/sec Loss 9.9196 LearningRate 0.2303 Epoch: 3 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:27:56,656-Speed 5457.16 samples/sec Loss 9.8447 LearningRate 0.2302 Epoch: 3 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:04,183-Speed 5442.62 samples/sec Loss 9.9367 LearningRate 0.2302 Epoch: 3 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:11,608-Speed 5516.87 samples/sec Loss 9.9594 LearningRate 0.2302 Epoch: 3 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:19,079-Speed 5483.29 samples/sec Loss 9.8860 LearningRate 0.2302 Epoch: 3 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:26,719-Speed 5362.18 samples/sec Loss 9.8929 LearningRate 0.2301 Epoch: 3 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:34,370-Speed 5354.59 samples/sec Loss 9.8905 LearningRate 0.2301 Epoch: 3 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:41,793-Speed 5519.04 samples/sec Loss 9.9431 LearningRate 0.2301 Epoch: 3 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:28:49,418-Speed 5372.12 samples/sec Loss 9.8397 LearningRate 0.2300 Epoch: 3 Global Step: 34860 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:28:57,047-Speed 5370.17 samples/sec Loss 9.9237 LearningRate 0.2300 Epoch: 3 Global Step: 34870 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:29:04,763-Speed 5309.34 samples/sec Loss 9.9727 LearningRate 0.2300 Epoch: 3 Global Step: 34880 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 
02:29:12,386-Speed 5373.95 samples/sec Loss 9.9250 LearningRate 0.2300 Epoch: 3 Global Step: 34890 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:29:20,024-Speed 5363.25 samples/sec Loss 9.8776 LearningRate 0.2299 Epoch: 3 Global Step: 34900 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:29:27,595-Speed 5410.93 samples/sec Loss 9.9222 LearningRate 0.2299 Epoch: 3 Global Step: 34910 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:29:35,050-Speed 5495.02 samples/sec Loss 9.9223 LearningRate 0.2299 Epoch: 3 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:29:42,607-Speed 5421.37 samples/sec Loss 9.9176 LearningRate 0.2299 Epoch: 3 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:29:50,259-Speed 5353.22 samples/sec Loss 9.8753 LearningRate 0.2298 Epoch: 3 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:29:57,890-Speed 5367.94 samples/sec Loss 9.8837 LearningRate 0.2298 Epoch: 3 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:05,456-Speed 5415.21 samples/sec Loss 9.9112 LearningRate 0.2298 Epoch: 3 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:12,882-Speed 5516.90 samples/sec Loss 9.8994 LearningRate 0.2298 Epoch: 3 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:20,432-Speed 5425.94 samples/sec Loss 9.8468 LearningRate 0.2297 Epoch: 3 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:28,000-Speed 5412.44 samples/sec Loss 9.8698 LearningRate 0.2297 Epoch: 3 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:35,518-Speed 5449.26 samples/sec Loss 9.8289 LearningRate 0.2297 Epoch: 3 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:43,071-Speed 5423.95 samples/sec Loss 
9.9002 LearningRate 0.2296 Epoch: 3 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:30:50,767-Speed 5322.10 samples/sec Loss 9.8205 LearningRate 0.2296 Epoch: 3 Global Step: 35020 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:30:58,271-Speed 5458.88 samples/sec Loss 9.8815 LearningRate 0.2296 Epoch: 3 Global Step: 35030 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:31:05,789-Speed 5449.38 samples/sec Loss 9.8650 LearningRate 0.2296 Epoch: 3 Global Step: 35040 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:31:13,376-Speed 5400.13 samples/sec Loss 9.9243 LearningRate 0.2295 Epoch: 3 Global Step: 35050 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:31:20,997-Speed 5374.86 samples/sec Loss 9.8387 LearningRate 0.2295 Epoch: 3 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:31:28,627-Speed 5368.80 samples/sec Loss 9.8735 LearningRate 0.2295 Epoch: 3 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:31:36,217-Speed 5397.10 samples/sec Loss 9.8058 LearningRate 0.2295 Epoch: 3 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:31:43,828-Speed 5383.31 samples/sec Loss 9.8342 LearningRate 0.2294 Epoch: 3 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:31:51,410-Speed 5402.15 samples/sec Loss 9.8853 LearningRate 0.2294 Epoch: 3 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:31:58,994-Speed 5401.92 samples/sec Loss 9.8612 LearningRate 0.2294 Epoch: 3 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:06,528-Speed 5437.61 samples/sec Loss 9.9682 LearningRate 0.2294 Epoch: 3 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:14,073-Speed 5429.32 samples/sec Loss 9.9459 LearningRate 0.2293 Epoch: 3 Global 
Step: 35130 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:21,644-Speed 5411.05 samples/sec Loss 9.8663 LearningRate 0.2293 Epoch: 3 Global Step: 35140 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:29,220-Speed 5407.12 samples/sec Loss 9.9129 LearningRate 0.2293 Epoch: 3 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:36,830-Speed 5382.93 samples/sec Loss 9.9718 LearningRate 0.2292 Epoch: 3 Global Step: 35160 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:32:44,362-Speed 5439.00 samples/sec Loss 9.8694 LearningRate 0.2292 Epoch: 3 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:51,879-Speed 5449.53 samples/sec Loss 9.8639 LearningRate 0.2292 Epoch: 3 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:32:59,467-Speed 5399.02 samples/sec Loss 9.9330 LearningRate 0.2292 Epoch: 3 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:33:06,975-Speed 5458.83 samples/sec Loss 9.9524 LearningRate 0.2291 Epoch: 3 Global Step: 35200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:33:14,448-Speed 5482.25 samples/sec Loss 9.8203 LearningRate 0.2291 Epoch: 3 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:33:21,920-Speed 5482.26 samples/sec Loss 9.8969 LearningRate 0.2291 Epoch: 3 Global Step: 35220 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:33:29,341-Speed 5520.31 samples/sec Loss 9.8771 LearningRate 0.2291 Epoch: 3 Global Step: 35230 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:33:37,013-Speed 5339.81 samples/sec Loss 9.8617 LearningRate 0.2290 Epoch: 3 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:33:44,575-Speed 5417.54 samples/sec Loss 9.8491 LearningRate 0.2290 Epoch: 3 Global Step: 35250 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-08 02:33:52,251-Speed 5336.17 samples/sec Loss 9.8338 LearningRate 0.2290 Epoch: 3 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:00,071-Speed 5238.59 samples/sec Loss 9.9262 LearningRate 0.2290 Epoch: 3 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:07,612-Speed 5432.18 samples/sec Loss 9.7823 LearningRate 0.2289 Epoch: 3 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:15,191-Speed 5405.51 samples/sec Loss 9.8737 LearningRate 0.2289 Epoch: 3 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:22,740-Speed 5426.32 samples/sec Loss 9.8264 LearningRate 0.2289 Epoch: 3 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:30,233-Speed 5466.94 samples/sec Loss 9.8567 LearningRate 0.2288 Epoch: 3 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:37,770-Speed 5435.76 samples/sec Loss 9.8626 LearningRate 0.2288 Epoch: 3 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:45,289-Speed 5447.95 samples/sec Loss 9.8311 LearningRate 0.2288 Epoch: 3 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:34:52,748-Speed 5491.89 samples/sec Loss 9.8003 LearningRate 0.2288 Epoch: 3 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:35:00,314-Speed 5414.17 samples/sec Loss 9.8944 LearningRate 0.2287 Epoch: 3 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:35:07,818-Speed 5459.30 samples/sec Loss 9.8213 LearningRate 0.2287 Epoch: 3 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:35:15,292-Speed 5481.46 samples/sec Loss 9.8794 LearningRate 0.2287 Epoch: 3 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 
02:35:22,835-Speed 5430.79 samples/sec Loss 9.8256 LearningRate 0.2287 Epoch: 3 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:35:30,405-Speed 5411.61 samples/sec Loss 9.8548 LearningRate 0.2286 Epoch: 3 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:35:37,871-Speed 5486.89 samples/sec Loss 9.8583 LearningRate 0.2286 Epoch: 3 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:35:45,436-Speed 5415.26 samples/sec Loss 9.9108 LearningRate 0.2286 Epoch: 3 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:35:53,050-Speed 5380.53 samples/sec Loss 9.8586 LearningRate 0.2286 Epoch: 3 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:00,499-Speed 5499.33 samples/sec Loss 9.8800 LearningRate 0.2285 Epoch: 3 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:07,918-Speed 5521.88 samples/sec Loss 9.8317 LearningRate 0.2285 Epoch: 3 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:15,379-Speed 5490.73 samples/sec Loss 9.8411 LearningRate 0.2285 Epoch: 3 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:22,905-Speed 5442.54 samples/sec Loss 9.9178 LearningRate 0.2285 Epoch: 3 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:30,475-Speed 5411.94 samples/sec Loss 9.8951 LearningRate 0.2284 Epoch: 3 Global Step: 35470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:38,030-Speed 5422.35 samples/sec Loss 9.8829 LearningRate 0.2284 Epoch: 3 Global Step: 35480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:45,599-Speed 5412.03 samples/sec Loss 9.7892 LearningRate 0.2284 Epoch: 3 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:36:53,159-Speed 5418.59 samples/sec Loss 
9.8547 LearningRate 0.2283 Epoch: 3 Global Step: 35500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:00,772-Speed 5381.33 samples/sec Loss 9.9035 LearningRate 0.2283 Epoch: 3 Global Step: 35510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:08,309-Speed 5435.69 samples/sec Loss 9.8185 LearningRate 0.2283 Epoch: 3 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:15,728-Speed 5521.32 samples/sec Loss 9.8354 LearningRate 0.2283 Epoch: 3 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:23,215-Speed 5471.61 samples/sec Loss 9.9463 LearningRate 0.2282 Epoch: 3 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:30,763-Speed 5427.51 samples/sec Loss 9.8896 LearningRate 0.2282 Epoch: 3 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:38,321-Speed 5420.06 samples/sec Loss 9.9118 LearningRate 0.2282 Epoch: 3 Global Step: 35560 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:37:45,802-Speed 5475.86 samples/sec Loss 9.8834 LearningRate 0.2282 Epoch: 3 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:37:53,265-Speed 5488.92 samples/sec Loss 9.8972 LearningRate 0.2281 Epoch: 3 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:00,793-Speed 5442.14 samples/sec Loss 9.8915 LearningRate 0.2281 Epoch: 3 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:08,441-Speed 5356.65 samples/sec Loss 9.8591 LearningRate 0.2281 Epoch: 3 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:15,889-Speed 5499.78 samples/sec Loss 9.8205 LearningRate 0.2281 Epoch: 3 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:23,360-Speed 5483.24 samples/sec Loss 9.8567 LearningRate 0.2280 Epoch: 3 Global 
Step: 35620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:30,804-Speed 5503.91 samples/sec Loss 9.9124 LearningRate 0.2280 Epoch: 3 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:38,246-Speed 5503.69 samples/sec Loss 9.8072 LearningRate 0.2280 Epoch: 3 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:45,818-Speed 5410.18 samples/sec Loss 9.8617 LearningRate 0.2279 Epoch: 3 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:38:53,449-Speed 5369.06 samples/sec Loss 9.7957 LearningRate 0.2279 Epoch: 3 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:39:01,022-Speed 5409.50 samples/sec Loss 9.8985 LearningRate 0.2279 Epoch: 3 Global Step: 35670 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:39:08,763-Speed 5292.05 samples/sec Loss 9.8622 LearningRate 0.2279 Epoch: 3 Global Step: 35680 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:39:16,314-Speed 5424.55 samples/sec Loss 9.7981 LearningRate 0.2278 Epoch: 3 Global Step: 35690 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:39:23,901-Speed 5400.20 samples/sec Loss 9.8515 LearningRate 0.2278 Epoch: 3 Global Step: 35700 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:39:31,332-Speed 5512.52 samples/sec Loss 9.8615 LearningRate 0.2278 Epoch: 3 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:39:38,973-Speed 5360.78 samples/sec Loss 9.8863 LearningRate 0.2278 Epoch: 3 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:39:46,737-Speed 5276.45 samples/sec Loss 9.8877 LearningRate 0.2277 Epoch: 3 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:39:54,198-Speed 5490.45 samples/sec Loss 9.8300 LearningRate 0.2277 Epoch: 3 Global Step: 35740 Fp16 Grad Scale: 131072 
Required: 39 hours Training: 2022-01-08 02:40:01,723-Speed 5444.42 samples/sec Loss 9.8290 LearningRate 0.2277 Epoch: 3 Global Step: 35750 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:09,203-Speed 5475.94 samples/sec Loss 9.7677 LearningRate 0.2277 Epoch: 3 Global Step: 35760 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:16,696-Speed 5467.39 samples/sec Loss 9.8382 LearningRate 0.2276 Epoch: 3 Global Step: 35770 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:24,101-Speed 5532.64 samples/sec Loss 9.8209 LearningRate 0.2276 Epoch: 3 Global Step: 35780 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:31,500-Speed 5536.05 samples/sec Loss 9.9046 LearningRate 0.2276 Epoch: 3 Global Step: 35790 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:39,063-Speed 5416.66 samples/sec Loss 9.8389 LearningRate 0.2275 Epoch: 3 Global Step: 35800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:46,682-Speed 5377.10 samples/sec Loss 9.8414 LearningRate 0.2275 Epoch: 3 Global Step: 35810 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:40:54,172-Speed 5469.18 samples/sec Loss 9.8221 LearningRate 0.2275 Epoch: 3 Global Step: 35820 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:41:01,809-Speed 5364.09 samples/sec Loss 9.9626 LearningRate 0.2275 Epoch: 3 Global Step: 35830 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:41:09,378-Speed 5412.22 samples/sec Loss 9.8227 LearningRate 0.2274 Epoch: 3 Global Step: 35840 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 02:41:17,065-Speed 5329.23 samples/sec Loss 9.8580 LearningRate 0.2274 Epoch: 3 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:41:24,774-Speed 5314.50 samples/sec Loss 9.8128 LearningRate 0.2274 Epoch: 3 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 
02:41:32,309-Speed 5436.16 samples/sec Loss 9.8419 LearningRate 0.2274 Epoch: 3 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:41:39,881-Speed 5409.79 samples/sec Loss 9.8262 LearningRate 0.2273 Epoch: 3 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:41:47,369-Speed 5471.19 samples/sec Loss 9.8019 LearningRate 0.2273 Epoch: 3 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:41:54,950-Speed 5403.57 samples/sec Loss 9.8333 LearningRate 0.2273 Epoch: 3 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:42:02,379-Speed 5514.05 samples/sec Loss 9.9363 LearningRate 0.2273 Epoch: 3 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:42:09,861-Speed 5475.36 samples/sec Loss 9.8533 LearningRate 0.2272 Epoch: 3 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:42:17,374-Speed 5452.65 samples/sec Loss 9.8731 LearningRate 0.2272 Epoch: 3 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:42:24,926-Speed 5424.17 samples/sec Loss 9.8314 LearningRate 0.2272 Epoch: 3 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:42:32,461-Speed 5436.80 samples/sec Loss 9.9162 LearningRate 0.2272 Epoch: 3 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:42:39,889-Speed 5514.58 samples/sec Loss 9.8785 LearningRate 0.2271 Epoch: 3 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:42:47,450-Speed 5417.84 samples/sec Loss 9.8195 LearningRate 0.2271 Epoch: 3 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:42:55,036-Speed 5400.46 samples/sec Loss 9.7869 LearningRate 0.2271 Epoch: 3 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:43:02,537-Speed 5461.67 samples/sec Loss 9.7629 
LearningRate 0.2270 Epoch: 3 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:43:10,078-Speed 5431.58 samples/sec Loss 9.7796 LearningRate 0.2270 Epoch: 3 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:43:54,071-[lfw][36000]XNorm: 21.759819 Training: 2022-01-08 02:43:54,072-[lfw][36000]Accuracy-Flip: 0.99750+-0.00318 Training: 2022-01-08 02:43:54,073-[lfw][36000]Accuracy-Highest: 0.99800 Training: 2022-01-08 02:44:46,041-[cfp_fp][36000]XNorm: 19.605702 Training: 2022-01-08 02:44:46,042-[cfp_fp][36000]Accuracy-Flip: 0.97829+-0.00758 Training: 2022-01-08 02:44:46,043-[cfp_fp][36000]Accuracy-Highest: 0.98457 Training: 2022-01-08 02:45:31,662-[agedb_30][36000]XNorm: 21.594704 Training: 2022-01-08 02:45:31,663-[agedb_30][36000]Accuracy-Flip: 0.97250+-0.00772 Training: 2022-01-08 02:45:31,664-[agedb_30][36000]Accuracy-Highest: 0.97250 Training: 2022-01-08 02:45:39,279-Speed 274.53 samples/sec Loss 9.8549 LearningRate 0.2270 Epoch: 3 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:45:46,713-Speed 5511.85 samples/sec Loss 9.7913 LearningRate 0.2270 Epoch: 3 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:45:54,192-Speed 5478.45 samples/sec Loss 9.8088 LearningRate 0.2269 Epoch: 3 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:46:01,839-Speed 5357.30 samples/sec Loss 9.8020 LearningRate 0.2269 Epoch: 3 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:46:09,619-Speed 5266.10 samples/sec Loss 9.8769 LearningRate 0.2269 Epoch: 3 Global Step: 36050 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:46:17,295-Speed 5337.03 samples/sec Loss 9.8396 LearningRate 0.2269 Epoch: 3 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:46:24,802-Speed 5457.31 samples/sec Loss 9.8483 LearningRate 0.2268 Epoch: 3 Global Step: 
36070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:46:32,326-Speed 5444.94 samples/sec Loss 9.8590 LearningRate 0.2268 Epoch: 3 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:46:39,929-Speed 5388.35 samples/sec Loss 9.8844 LearningRate 0.2268 Epoch: 3 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:46:47,777-Speed 5220.68 samples/sec Loss 9.8534 LearningRate 0.2268 Epoch: 3 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:46:55,350-Speed 5409.68 samples/sec Loss 9.7857 LearningRate 0.2267 Epoch: 3 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:47:02,866-Speed 5451.09 samples/sec Loss 9.8005 LearningRate 0.2267 Epoch: 3 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:47:10,390-Speed 5444.86 samples/sec Loss 9.9096 LearningRate 0.2267 Epoch: 3 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:47:17,910-Speed 5447.92 samples/sec Loss 9.8090 LearningRate 0.2266 Epoch: 3 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:47:25,438-Speed 5443.44 samples/sec Loss 9.7908 LearningRate 0.2266 Epoch: 3 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:47:33,124-Speed 5330.38 samples/sec Loss 9.7375 LearningRate 0.2266 Epoch: 3 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:47:40,810-Speed 5329.94 samples/sec Loss 9.8530 LearningRate 0.2266 Epoch: 3 Global Step: 36170 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:47:48,347-Speed 5435.79 samples/sec Loss 9.8141 LearningRate 0.2265 Epoch: 3 Global Step: 36180 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:47:55,854-Speed 5457.78 samples/sec Loss 9.7381 LearningRate 0.2265 Epoch: 3 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 39 
hours Training: 2022-01-08 02:48:03,323-Speed 5485.16 samples/sec Loss 9.7251 LearningRate 0.2265 Epoch: 3 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:48:11,154-Speed 5231.83 samples/sec Loss 9.8164 LearningRate 0.2265 Epoch: 3 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:48:18,776-Speed 5374.88 samples/sec Loss 9.8319 LearningRate 0.2264 Epoch: 3 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:48:26,393-Speed 5378.95 samples/sec Loss 9.7496 LearningRate 0.2264 Epoch: 3 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:48:33,848-Speed 5496.00 samples/sec Loss 9.8062 LearningRate 0.2264 Epoch: 3 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:48:41,229-Speed 5550.44 samples/sec Loss 9.8920 LearningRate 0.2264 Epoch: 3 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:48:48,704-Speed 5481.01 samples/sec Loss 9.8203 LearningRate 0.2263 Epoch: 3 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:48:56,290-Speed 5400.19 samples/sec Loss 9.7937 LearningRate 0.2263 Epoch: 3 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:49:03,934-Speed 5359.60 samples/sec Loss 9.9010 LearningRate 0.2263 Epoch: 3 Global Step: 36280 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:49:11,428-Speed 5466.92 samples/sec Loss 9.8206 LearningRate 0.2263 Epoch: 3 Global Step: 36290 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:49:18,921-Speed 5467.31 samples/sec Loss 9.8234 LearningRate 0.2262 Epoch: 3 Global Step: 36300 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:49:26,471-Speed 5426.50 samples/sec Loss 9.7776 LearningRate 0.2262 Epoch: 3 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:49:34,034-Speed 5416.07 
samples/sec Loss 9.8210 LearningRate 0.2262 Epoch: 3 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:49:41,538-Speed 5459.11 samples/sec Loss 9.8357 LearningRate 0.2261 Epoch: 3 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:49:49,115-Speed 5406.55 samples/sec Loss 9.8293 LearningRate 0.2261 Epoch: 3 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:49:56,600-Speed 5472.92 samples/sec Loss 9.7646 LearningRate 0.2261 Epoch: 3 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:50:04,060-Speed 5491.14 samples/sec Loss 9.7958 LearningRate 0.2261 Epoch: 3 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:50:11,526-Speed 5487.44 samples/sec Loss 9.7663 LearningRate 0.2260 Epoch: 3 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:50:18,963-Speed 5508.35 samples/sec Loss 9.8196 LearningRate 0.2260 Epoch: 3 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:50:26,412-Speed 5499.86 samples/sec Loss 9.7516 LearningRate 0.2260 Epoch: 3 Global Step: 36390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:50:33,886-Speed 5480.50 samples/sec Loss 9.8119 LearningRate 0.2260 Epoch: 3 Global Step: 36400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:50:41,399-Speed 5452.47 samples/sec Loss 9.7193 LearningRate 0.2259 Epoch: 3 Global Step: 36410 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:50:48,886-Speed 5472.11 samples/sec Loss 9.8062 LearningRate 0.2259 Epoch: 3 Global Step: 36420 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:50:56,260-Speed 5554.94 samples/sec Loss 9.7149 LearningRate 0.2259 Epoch: 3 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:03,754-Speed 5466.88 samples/sec Loss 9.7663 LearningRate 0.2259 
Epoch: 3 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:11,295-Speed 5432.68 samples/sec Loss 9.7326 LearningRate 0.2258 Epoch: 3 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:18,863-Speed 5413.10 samples/sec Loss 9.8331 LearningRate 0.2258 Epoch: 3 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:26,419-Speed 5421.71 samples/sec Loss 9.7593 LearningRate 0.2258 Epoch: 3 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:34,048-Speed 5369.62 samples/sec Loss 9.7456 LearningRate 0.2257 Epoch: 3 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:41,707-Speed 5348.85 samples/sec Loss 9.7169 LearningRate 0.2257 Epoch: 3 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:49,120-Speed 5526.49 samples/sec Loss 9.8569 LearningRate 0.2257 Epoch: 3 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:51:56,602-Speed 5475.26 samples/sec Loss 9.8231 LearningRate 0.2257 Epoch: 3 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:52:04,081-Speed 5477.05 samples/sec Loss 9.7225 LearningRate 0.2256 Epoch: 3 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:52:11,623-Speed 5431.64 samples/sec Loss 9.8483 LearningRate 0.2256 Epoch: 3 Global Step: 36530 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:52:19,022-Speed 5537.13 samples/sec Loss 9.8420 LearningRate 0.2256 Epoch: 3 Global Step: 36540 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:52:26,626-Speed 5387.26 samples/sec Loss 9.7690 LearningRate 0.2256 Epoch: 3 Global Step: 36550 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:52:34,196-Speed 5412.02 samples/sec Loss 9.7522 LearningRate 0.2255 Epoch: 3 Global Step: 36560 Fp16 Grad Scale: 
65536 Required: 39 hours Training: 2022-01-08 02:52:41,696-Speed 5462.06 samples/sec Loss 9.8408 LearningRate 0.2255 Epoch: 3 Global Step: 36570 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:52:49,303-Speed 5385.49 samples/sec Loss 9.7215 LearningRate 0.2255 Epoch: 3 Global Step: 36580 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:52:56,789-Speed 5472.22 samples/sec Loss 9.7522 LearningRate 0.2255 Epoch: 3 Global Step: 36590 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:53:04,297-Speed 5455.96 samples/sec Loss 9.8004 LearningRate 0.2254 Epoch: 3 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:53:11,829-Speed 5439.04 samples/sec Loss 9.7891 LearningRate 0.2254 Epoch: 3 Global Step: 36610 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:53:19,406-Speed 5406.73 samples/sec Loss 9.8100 LearningRate 0.2254 Epoch: 3 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:53:26,819-Speed 5526.44 samples/sec Loss 9.8197 LearningRate 0.2254 Epoch: 3 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:53:34,284-Speed 5487.38 samples/sec Loss 9.7699 LearningRate 0.2253 Epoch: 3 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:53:41,748-Speed 5488.04 samples/sec Loss 9.8327 LearningRate 0.2253 Epoch: 3 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:53:49,293-Speed 5429.52 samples/sec Loss 9.7373 LearningRate 0.2253 Epoch: 3 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:53:56,794-Speed 5461.83 samples/sec Loss 9.7029 LearningRate 0.2252 Epoch: 3 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:54:04,265-Speed 5482.96 samples/sec Loss 9.8087 LearningRate 0.2252 Epoch: 3 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 
02:54:11,723-Speed 5493.12 samples/sec Loss 9.6878 LearningRate 0.2252 Epoch: 3 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:54:19,261-Speed 5434.25 samples/sec Loss 9.7194 LearningRate 0.2252 Epoch: 3 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:54:26,792-Speed 5440.29 samples/sec Loss 9.7509 LearningRate 0.2251 Epoch: 3 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:54:34,298-Speed 5457.33 samples/sec Loss 9.7913 LearningRate 0.2251 Epoch: 3 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:54:41,719-Speed 5520.69 samples/sec Loss 9.7651 LearningRate 0.2251 Epoch: 3 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:54:49,209-Speed 5469.09 samples/sec Loss 9.7861 LearningRate 0.2251 Epoch: 3 Global Step: 36740 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:54:56,742-Speed 5438.26 samples/sec Loss 9.7785 LearningRate 0.2250 Epoch: 3 Global Step: 36750 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:55:04,211-Speed 5484.93 samples/sec Loss 9.7762 LearningRate 0.2250 Epoch: 3 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:55:11,655-Speed 5503.11 samples/sec Loss 9.7801 LearningRate 0.2250 Epoch: 3 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:55:19,372-Speed 5308.21 samples/sec Loss 9.8219 LearningRate 0.2250 Epoch: 3 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:55:26,946-Speed 5409.22 samples/sec Loss 9.8356 LearningRate 0.2249 Epoch: 3 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:55:34,431-Speed 5472.44 samples/sec Loss 9.7523 LearningRate 0.2249 Epoch: 3 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:55:42,006-Speed 5408.40 samples/sec Loss 
9.8835 LearningRate 0.2249 Epoch: 3 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:55:49,533-Speed 5442.46 samples/sec Loss 9.6753 LearningRate 0.2249 Epoch: 3 Global Step: 36820 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:55:56,982-Speed 5499.45 samples/sec Loss 9.7816 LearningRate 0.2248 Epoch: 3 Global Step: 36830 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:04,468-Speed 5472.23 samples/sec Loss 9.7933 LearningRate 0.2248 Epoch: 3 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:11,878-Speed 5528.87 samples/sec Loss 9.7397 LearningRate 0.2248 Epoch: 3 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:19,313-Speed 5509.58 samples/sec Loss 9.7686 LearningRate 0.2247 Epoch: 3 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:26,743-Speed 5513.33 samples/sec Loss 9.7602 LearningRate 0.2247 Epoch: 3 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:34,256-Speed 5452.78 samples/sec Loss 9.6757 LearningRate 0.2247 Epoch: 3 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:41,705-Speed 5499.60 samples/sec Loss 9.6953 LearningRate 0.2247 Epoch: 3 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:49,128-Speed 5518.65 samples/sec Loss 9.7797 LearningRate 0.2246 Epoch: 3 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:56:56,585-Speed 5493.66 samples/sec Loss 9.7704 LearningRate 0.2246 Epoch: 3 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 02:57:03,975-Speed 5543.62 samples/sec Loss 9.8386 LearningRate 0.2246 Epoch: 3 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:11,507-Speed 5439.25 samples/sec Loss 9.7969 LearningRate 0.2246 Epoch: 3 Global Step: 
36930 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:18,964-Speed 5493.34 samples/sec Loss 9.7051 LearningRate 0.2245 Epoch: 3 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:26,403-Speed 5507.00 samples/sec Loss 9.7980 LearningRate 0.2245 Epoch: 3 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:33,867-Speed 5488.75 samples/sec Loss 9.7665 LearningRate 0.2245 Epoch: 3 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:41,348-Speed 5475.59 samples/sec Loss 9.8191 LearningRate 0.2245 Epoch: 3 Global Step: 36970 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:48,845-Speed 5464.19 samples/sec Loss 9.7403 LearningRate 0.2244 Epoch: 3 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:57:56,485-Speed 5361.75 samples/sec Loss 9.7464 LearningRate 0.2244 Epoch: 3 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:58:03,987-Speed 5461.06 samples/sec Loss 9.6354 LearningRate 0.2244 Epoch: 3 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:58:11,569-Speed 5402.84 samples/sec Loss 9.7771 LearningRate 0.2244 Epoch: 3 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:58:19,036-Speed 5486.16 samples/sec Loss 9.8232 LearningRate 0.2243 Epoch: 3 Global Step: 37020 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:58:26,427-Speed 5542.62 samples/sec Loss 9.7779 LearningRate 0.2243 Epoch: 3 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:58:33,974-Speed 5428.69 samples/sec Loss 9.8186 LearningRate 0.2243 Epoch: 3 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:58:41,427-Speed 5496.52 samples/sec Loss 9.8089 LearningRate 0.2242 Epoch: 3 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 39 
hours Training: 2022-01-08 02:58:48,949-Speed 5445.65 samples/sec Loss 9.7854 LearningRate 0.2242 Epoch: 3 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:58:56,412-Speed 5488.89 samples/sec Loss 9.7379 LearningRate 0.2242 Epoch: 3 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:59:03,902-Speed 5469.95 samples/sec Loss 9.7121 LearningRate 0.2242 Epoch: 3 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:59:11,449-Speed 5428.66 samples/sec Loss 9.7707 LearningRate 0.2241 Epoch: 3 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:59:18,879-Speed 5513.21 samples/sec Loss 9.7841 LearningRate 0.2241 Epoch: 3 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:59:26,375-Speed 5464.37 samples/sec Loss 9.7377 LearningRate 0.2241 Epoch: 3 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:59:33,903-Speed 5442.29 samples/sec Loss 9.7085 LearningRate 0.2241 Epoch: 3 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 02:59:41,396-Speed 5467.36 samples/sec Loss 9.7015 LearningRate 0.2240 Epoch: 3 Global Step: 37130 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:59:48,867-Speed 5482.72 samples/sec Loss 9.7315 LearningRate 0.2240 Epoch: 3 Global Step: 37140 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 02:59:56,240-Speed 5556.42 samples/sec Loss 9.7265 LearningRate 0.2240 Epoch: 3 Global Step: 37150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:03,626-Speed 5546.77 samples/sec Loss 9.6412 LearningRate 0.2240 Epoch: 3 Global Step: 37160 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:11,131-Speed 5458.72 samples/sec Loss 9.7965 LearningRate 0.2239 Epoch: 3 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 
03:00:18,545-Speed 5525.08 samples/sec Loss 9.6863 LearningRate 0.2239 Epoch: 3 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:26,014-Speed 5484.59 samples/sec Loss 9.7755 LearningRate 0.2239 Epoch: 3 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:33,473-Speed 5492.15 samples/sec Loss 9.8328 LearningRate 0.2239 Epoch: 3 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:40,880-Speed 5530.84 samples/sec Loss 9.7114 LearningRate 0.2238 Epoch: 3 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:48,322-Speed 5505.01 samples/sec Loss 9.7512 LearningRate 0.2238 Epoch: 3 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:00:55,901-Speed 5405.18 samples/sec Loss 9.7631 LearningRate 0.2238 Epoch: 3 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:01:03,492-Speed 5396.14 samples/sec Loss 9.7875 LearningRate 0.2237 Epoch: 3 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:01:11,043-Speed 5425.54 samples/sec Loss 9.7949 LearningRate 0.2237 Epoch: 3 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:01:18,534-Speed 5468.88 samples/sec Loss 9.7578 LearningRate 0.2237 Epoch: 3 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:01:26,140-Speed 5385.26 samples/sec Loss 9.6971 LearningRate 0.2237 Epoch: 3 Global Step: 37270 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:01:33,623-Speed 5474.58 samples/sec Loss 9.7094 LearningRate 0.2236 Epoch: 3 Global Step: 37280 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:01:41,122-Speed 5463.10 samples/sec Loss 9.6871 LearningRate 0.2236 Epoch: 3 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:01:48,539-Speed 5522.75 samples/sec Loss 
9.6385 LearningRate 0.2236 Epoch: 3 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:01:55,962-Speed 5519.08 samples/sec Loss 9.6887 LearningRate 0.2236 Epoch: 3 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:03,561-Speed 5390.79 samples/sec Loss 9.7406 LearningRate 0.2235 Epoch: 3 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:11,120-Speed 5419.56 samples/sec Loss 9.7200 LearningRate 0.2235 Epoch: 3 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:18,649-Speed 5441.08 samples/sec Loss 9.7775 LearningRate 0.2235 Epoch: 3 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:26,192-Speed 5430.86 samples/sec Loss 9.6944 LearningRate 0.2235 Epoch: 3 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:33,680-Speed 5471.04 samples/sec Loss 9.6820 LearningRate 0.2234 Epoch: 3 Global Step: 37360 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:41,146-Speed 5486.84 samples/sec Loss 9.7545 LearningRate 0.2234 Epoch: 3 Global Step: 37370 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:48,713-Speed 5413.71 samples/sec Loss 9.6856 LearningRate 0.2234 Epoch: 3 Global Step: 37380 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:02:56,130-Speed 5522.88 samples/sec Loss 9.7305 LearningRate 0.2234 Epoch: 3 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:03,614-Speed 5473.76 samples/sec Loss 9.7446 LearningRate 0.2233 Epoch: 3 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:11,107-Speed 5467.91 samples/sec Loss 9.7150 LearningRate 0.2233 Epoch: 3 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:18,625-Speed 5448.81 samples/sec Loss 9.6741 LearningRate 0.2233 Epoch: 3 Global Step: 
37420 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:26,120-Speed 5465.24 samples/sec Loss 9.7818 LearningRate 0.2232 Epoch: 3 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:33,665-Speed 5430.08 samples/sec Loss 9.7575 LearningRate 0.2232 Epoch: 3 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:41,193-Speed 5442.01 samples/sec Loss 9.7513 LearningRate 0.2232 Epoch: 3 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:48,742-Speed 5426.34 samples/sec Loss 9.7478 LearningRate 0.2232 Epoch: 3 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:03:56,327-Speed 5400.45 samples/sec Loss 9.7135 LearningRate 0.2231 Epoch: 3 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:04:03,882-Speed 5422.31 samples/sec Loss 9.7241 LearningRate 0.2231 Epoch: 3 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:04:11,421-Speed 5434.11 samples/sec Loss 9.6786 LearningRate 0.2231 Epoch: 3 Global Step: 37490 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:04:18,845-Speed 5518.35 samples/sec Loss 9.7306 LearningRate 0.2231 Epoch: 3 Global Step: 37500 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:04:26,337-Speed 5467.43 samples/sec Loss 9.6946 LearningRate 0.2230 Epoch: 3 Global Step: 37510 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:04:33,876-Speed 5433.63 samples/sec Loss 9.7028 LearningRate 0.2230 Epoch: 3 Global Step: 37520 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:04:41,345-Speed 5484.46 samples/sec Loss 9.7858 LearningRate 0.2230 Epoch: 3 Global Step: 37530 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:04:48,901-Speed 5421.93 samples/sec Loss 9.6434 LearningRate 0.2230 Epoch: 3 Global Step: 37540 Fp16 Grad Scale: 262144 Required: 39 
hours Training: 2022-01-08 03:04:56,341-Speed 5506.14 samples/sec Loss 9.7284 LearningRate 0.2229 Epoch: 3 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:03,903-Speed 5417.13 samples/sec Loss 9.7261 LearningRate 0.2229 Epoch: 3 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:11,380-Speed 5478.63 samples/sec Loss 9.7194 LearningRate 0.2229 Epoch: 3 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:18,843-Speed 5489.14 samples/sec Loss 9.7507 LearningRate 0.2229 Epoch: 3 Global Step: 37580 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:26,322-Speed 5477.72 samples/sec Loss 9.7292 LearningRate 0.2228 Epoch: 3 Global Step: 37590 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:33,850-Speed 5441.57 samples/sec Loss 9.6985 LearningRate 0.2228 Epoch: 3 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:41,325-Speed 5480.68 samples/sec Loss 9.6587 LearningRate 0.2228 Epoch: 3 Global Step: 37610 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:48,785-Speed 5491.75 samples/sec Loss 9.6157 LearningRate 0.2227 Epoch: 3 Global Step: 37620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:05:56,260-Speed 5479.88 samples/sec Loss 9.6791 LearningRate 0.2227 Epoch: 3 Global Step: 37630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:06:03,746-Speed 5472.52 samples/sec Loss 9.7069 LearningRate 0.2227 Epoch: 3 Global Step: 37640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:06:11,207-Speed 5490.55 samples/sec Loss 9.6824 LearningRate 0.2227 Epoch: 3 Global Step: 37650 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:06:18,729-Speed 5446.12 samples/sec Loss 9.7799 LearningRate 0.2226 Epoch: 3 Global Step: 37660 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 
03:06:26,352-Speed 5374.55 samples/sec Loss 9.7650 LearningRate 0.2226 Epoch: 3 Global Step: 37670 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:06:33,955-Speed 5387.74 samples/sec Loss 9.7437 LearningRate 0.2226 Epoch: 3 Global Step: 37680 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:06:41,388-Speed 5511.02 samples/sec Loss 9.6782 LearningRate 0.2226 Epoch: 3 Global Step: 37690 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:06:48,830-Speed 5504.61 samples/sec Loss 9.7257 LearningRate 0.2225 Epoch: 3 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:06:56,332-Speed 5460.54 samples/sec Loss 9.7522 LearningRate 0.2225 Epoch: 3 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:07:03,854-Speed 5446.63 samples/sec Loss 9.7220 LearningRate 0.2225 Epoch: 3 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:07:11,482-Speed 5370.42 samples/sec Loss 9.6062 LearningRate 0.2225 Epoch: 3 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:07:19,057-Speed 5407.64 samples/sec Loss 9.6851 LearningRate 0.2224 Epoch: 3 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:07:26,543-Speed 5472.56 samples/sec Loss 9.6453 LearningRate 0.2224 Epoch: 3 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:07:34,049-Speed 5457.54 samples/sec Loss 9.6422 LearningRate 0.2224 Epoch: 3 Global Step: 37760 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:07:41,591-Speed 5431.24 samples/sec Loss 9.7162 LearningRate 0.2224 Epoch: 3 Global Step: 37770 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:07:49,089-Speed 5464.45 samples/sec Loss 9.6426 LearningRate 0.2223 Epoch: 3 Global Step: 37780 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:07:56,517-Speed 5515.11 samples/sec Loss 
9.6978 LearningRate 0.2223 Epoch: 3 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:04,004-Speed 5470.91 samples/sec Loss 9.6977 LearningRate 0.2223 Epoch: 3 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:11,433-Speed 5514.27 samples/sec Loss 9.6903 LearningRate 0.2222 Epoch: 3 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:18,884-Speed 5498.07 samples/sec Loss 9.6920 LearningRate 0.2222 Epoch: 3 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:26,349-Speed 5487.89 samples/sec Loss 9.7067 LearningRate 0.2222 Epoch: 3 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:33,865-Speed 5450.68 samples/sec Loss 9.6161 LearningRate 0.2222 Epoch: 3 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:41,341-Speed 5479.49 samples/sec Loss 9.6974 LearningRate 0.2221 Epoch: 3 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:48,771-Speed 5513.34 samples/sec Loss 9.7122 LearningRate 0.2221 Epoch: 3 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:08:56,243-Speed 5482.49 samples/sec Loss 9.6823 LearningRate 0.2221 Epoch: 3 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:09:03,693-Speed 5498.86 samples/sec Loss 9.7116 LearningRate 0.2221 Epoch: 3 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:09:11,156-Speed 5489.28 samples/sec Loss 9.6608 LearningRate 0.2220 Epoch: 3 Global Step: 37890 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:09:18,629-Speed 5481.47 samples/sec Loss 9.7114 LearningRate 0.2220 Epoch: 3 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:09:26,183-Speed 5423.81 samples/sec Loss 9.7551 LearningRate 0.2220 Epoch: 3 Global 
Step: 37910 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:09:33,607-Speed 5518.01 samples/sec Loss 9.6747 LearningRate 0.2220 Epoch: 3 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:09:41,119-Speed 5453.43 samples/sec Loss 9.7449 LearningRate 0.2219 Epoch: 3 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:09:48,671-Speed 5424.10 samples/sec Loss 9.6223 LearningRate 0.2219 Epoch: 3 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:09:56,130-Speed 5492.78 samples/sec Loss 9.6173 LearningRate 0.2219 Epoch: 3 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:10:03,609-Speed 5476.93 samples/sec Loss 9.6246 LearningRate 0.2219 Epoch: 3 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:10:11,053-Speed 5503.60 samples/sec Loss 9.6665 LearningRate 0.2218 Epoch: 3 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:10:18,550-Speed 5463.33 samples/sec Loss 9.7733 LearningRate 0.2218 Epoch: 3 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:10:26,037-Speed 5471.81 samples/sec Loss 9.6893 LearningRate 0.2218 Epoch: 3 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:10:33,484-Speed 5501.15 samples/sec Loss 9.6567 LearningRate 0.2218 Epoch: 3 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:11:17,672-[lfw][38000]XNorm: 21.775363 Training: 2022-01-08 03:11:17,672-[lfw][38000]Accuracy-Flip: 0.99700+-0.00323 Training: 2022-01-08 03:11:17,673-[lfw][38000]Accuracy-Highest: 0.99800 Training: 2022-01-08 03:12:09,739-[cfp_fp][38000]XNorm: 19.744928 Training: 2022-01-08 03:12:09,740-[cfp_fp][38000]Accuracy-Flip: 0.98200+-0.00524 Training: 2022-01-08 03:12:09,740-[cfp_fp][38000]Accuracy-Highest: 0.98457 Training: 2022-01-08 
03:12:55,301-[agedb_30][38000]XNorm: 21.892830 Training: 2022-01-08 03:12:55,302-[agedb_30][38000]Accuracy-Flip: 0.96900+-0.00646 Training: 2022-01-08 03:12:55,303-[agedb_30][38000]Accuracy-Highest: 0.97250 Training: 2022-01-08 03:13:02,859-Speed 274.21 samples/sec Loss 9.6749 LearningRate 0.2217 Epoch: 3 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:10,320-Speed 5493.54 samples/sec Loss 9.6767 LearningRate 0.2217 Epoch: 3 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:18,027-Speed 5315.96 samples/sec Loss 9.7095 LearningRate 0.2217 Epoch: 3 Global Step: 38030 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:25,581-Speed 5423.85 samples/sec Loss 9.5988 LearningRate 0.2216 Epoch: 3 Global Step: 38040 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:33,094-Speed 5452.89 samples/sec Loss 9.6701 LearningRate 0.2216 Epoch: 3 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:40,648-Speed 5424.16 samples/sec Loss 9.6103 LearningRate 0.2216 Epoch: 3 Global Step: 38060 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:48,166-Speed 5449.52 samples/sec Loss 9.6957 LearningRate 0.2216 Epoch: 3 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:13:55,621-Speed 5494.70 samples/sec Loss 9.5678 LearningRate 0.2215 Epoch: 3 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:14:03,078-Speed 5493.23 samples/sec Loss 9.6757 LearningRate 0.2215 Epoch: 3 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:14:10,559-Speed 5476.14 samples/sec Loss 9.6200 LearningRate 0.2215 Epoch: 3 Global Step: 38100 Fp16 Grad Scale: 262144 Required: 39 hours Training: 2022-01-08 03:14:18,043-Speed 5473.99 samples/sec Loss 9.6311 LearningRate 0.2215 Epoch: 3 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 39 hours 
Training: 2022-01-08 03:14:25,576-Speed 5437.97 samples/sec Loss 9.6768 LearningRate 0.2214 Epoch: 3 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:14:33,110-Speed 5437.59 samples/sec Loss 9.6331 LearningRate 0.2214 Epoch: 3 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:14:40,689-Speed 5405.14 samples/sec Loss 9.7253 LearningRate 0.2214 Epoch: 3 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:14:48,137-Speed 5499.98 samples/sec Loss 9.7965 LearningRate 0.2214 Epoch: 3 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:14:55,628-Speed 5468.28 samples/sec Loss 9.6743 LearningRate 0.2213 Epoch: 3 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:15:03,419-Speed 5258.03 samples/sec Loss 9.6632 LearningRate 0.2213 Epoch: 3 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:15:11,104-Speed 5331.12 samples/sec Loss 9.5775 LearningRate 0.2213 Epoch: 3 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:15:18,637-Speed 5437.69 samples/sec Loss 9.6725 LearningRate 0.2213 Epoch: 3 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:15:26,194-Speed 5421.00 samples/sec Loss 9.6731 LearningRate 0.2212 Epoch: 3 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:15:33,647-Speed 5496.77 samples/sec Loss 9.6465 LearningRate 0.2212 Epoch: 3 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:15:41,207-Speed 5418.40 samples/sec Loss 9.7432 LearningRate 0.2212 Epoch: 3 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:15:48,809-Speed 5388.99 samples/sec Loss 9.5943 LearningRate 0.2211 Epoch: 3 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:15:56,295-Speed 
5472.23 samples/sec Loss 9.6908 LearningRate 0.2211 Epoch: 3 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:03,783-Speed 5470.49 samples/sec Loss 9.6048 LearningRate 0.2211 Epoch: 3 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:11,244-Speed 5490.42 samples/sec Loss 9.5951 LearningRate 0.2211 Epoch: 3 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:18,682-Speed 5508.10 samples/sec Loss 9.6857 LearningRate 0.2210 Epoch: 3 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:26,117-Speed 5509.16 samples/sec Loss 9.6615 LearningRate 0.2210 Epoch: 3 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:33,595-Speed 5478.36 samples/sec Loss 9.6661 LearningRate 0.2210 Epoch: 3 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:41,022-Speed 5515.96 samples/sec Loss 9.6606 LearningRate 0.2210 Epoch: 3 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:16:48,747-Speed 5302.96 samples/sec Loss 9.6069 LearningRate 0.2209 Epoch: 3 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:16:56,177-Speed 5513.44 samples/sec Loss 9.6188 LearningRate 0.2209 Epoch: 3 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:17:03,663-Speed 5472.43 samples/sec Loss 9.6406 LearningRate 0.2209 Epoch: 3 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:17:11,239-Speed 5407.76 samples/sec Loss 9.6441 LearningRate 0.2209 Epoch: 3 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:17:18,679-Speed 5506.38 samples/sec Loss 9.7444 LearningRate 0.2208 Epoch: 3 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:17:26,175-Speed 5464.80 samples/sec Loss 9.6747 LearningRate 
0.2208 Epoch: 3 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:17:33,714-Speed 5433.50 samples/sec Loss 9.6918 LearningRate 0.2208 Epoch: 3 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:17:41,211-Speed 5464.94 samples/sec Loss 9.6792 LearningRate 0.2208 Epoch: 3 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:17:48,641-Speed 5513.50 samples/sec Loss 9.6130 LearningRate 0.2207 Epoch: 3 Global Step: 38390 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:17:56,139-Speed 5463.20 samples/sec Loss 9.6159 LearningRate 0.2207 Epoch: 3 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:03,561-Speed 5520.00 samples/sec Loss 9.6470 LearningRate 0.2207 Epoch: 3 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:10,991-Speed 5513.55 samples/sec Loss 9.6076 LearningRate 0.2207 Epoch: 3 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:18,451-Speed 5490.86 samples/sec Loss 9.7143 LearningRate 0.2206 Epoch: 3 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:25,857-Speed 5531.87 samples/sec Loss 9.7228 LearningRate 0.2206 Epoch: 3 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:33,337-Speed 5476.64 samples/sec Loss 9.6814 LearningRate 0.2206 Epoch: 3 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:40,757-Speed 5520.95 samples/sec Loss 9.6563 LearningRate 0.2205 Epoch: 3 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:48,156-Speed 5536.41 samples/sec Loss 9.6360 LearningRate 0.2205 Epoch: 3 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:18:55,569-Speed 5526.73 samples/sec Loss 9.6754 LearningRate 0.2205 Epoch: 3 Global Step: 38480 Fp16 Grad Scale: 
131072 Required: 39 hours Training: 2022-01-08 03:19:03,018-Speed 5499.31 samples/sec Loss 9.5939 LearningRate 0.2205 Epoch: 3 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:19:10,537-Speed 5448.66 samples/sec Loss 9.6627 LearningRate 0.2204 Epoch: 3 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:19:18,042-Speed 5458.33 samples/sec Loss 9.5707 LearningRate 0.2204 Epoch: 3 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:19:25,590-Speed 5427.50 samples/sec Loss 9.6391 LearningRate 0.2204 Epoch: 3 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:19:33,093-Speed 5459.88 samples/sec Loss 9.5539 LearningRate 0.2204 Epoch: 3 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:19:40,701-Speed 5385.12 samples/sec Loss 9.5774 LearningRate 0.2203 Epoch: 3 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:19:48,163-Speed 5489.62 samples/sec Loss 9.7196 LearningRate 0.2203 Epoch: 3 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:19:55,655-Speed 5467.76 samples/sec Loss 9.6626 LearningRate 0.2203 Epoch: 3 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:20:03,190-Speed 5436.43 samples/sec Loss 9.5950 LearningRate 0.2203 Epoch: 3 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:20:10,714-Speed 5444.91 samples/sec Loss 9.5883 LearningRate 0.2202 Epoch: 3 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:20:18,228-Speed 5451.83 samples/sec Loss 9.6453 LearningRate 0.2202 Epoch: 3 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:20:25,825-Speed 5392.63 samples/sec Loss 9.5949 LearningRate 0.2202 Epoch: 3 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 
03:20:33,465-Speed 5361.76 samples/sec Loss 9.6772 LearningRate 0.2202 Epoch: 3 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:20:41,029-Speed 5416.52 samples/sec Loss 9.7232 LearningRate 0.2201 Epoch: 3 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:20:48,644-Speed 5379.41 samples/sec Loss 9.6763 LearningRate 0.2201 Epoch: 3 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:20:56,192-Speed 5427.45 samples/sec Loss 9.6690 LearningRate 0.2201 Epoch: 3 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 39 hours Training: 2022-01-08 03:21:03,711-Speed 5448.63 samples/sec Loss 9.5790 LearningRate 0.2201 Epoch: 3 Global Step: 38650 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:11,230-Speed 5448.28 samples/sec Loss 9.6759 LearningRate 0.2200 Epoch: 3 Global Step: 38660 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:18,793-Speed 5416.79 samples/sec Loss 9.5815 LearningRate 0.2200 Epoch: 3 Global Step: 38670 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:26,209-Speed 5524.41 samples/sec Loss 9.6645 LearningRate 0.2200 Epoch: 3 Global Step: 38680 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:33,647-Speed 5506.81 samples/sec Loss 9.6734 LearningRate 0.2199 Epoch: 3 Global Step: 38690 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:41,159-Speed 5473.77 samples/sec Loss 9.5790 LearningRate 0.2199 Epoch: 3 Global Step: 38700 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:48,633-Speed 5480.95 samples/sec Loss 9.4748 LearningRate 0.2199 Epoch: 3 Global Step: 38710 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:21:56,048-Speed 5525.01 samples/sec Loss 9.6001 LearningRate 0.2199 Epoch: 3 Global Step: 38720 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:22:03,501-Speed 5496.18 samples/sec Loss 9.5904 
LearningRate 0.2198 Epoch: 3 Global Step: 38730 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:22:10,953-Speed 5497.48 samples/sec Loss 9.6718 LearningRate 0.2198 Epoch: 3 Global Step: 38740 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-01-08 03:22:18,435-Speed 5475.87 samples/sec Loss 9.5120 LearningRate 0.2198 Epoch: 3 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:22:25,970-Speed 5436.59 samples/sec Loss 9.5840 LearningRate 0.2198 Epoch: 3 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:22:33,439-Speed 5484.59 samples/sec Loss 9.6120 LearningRate 0.2197 Epoch: 3 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:22:40,894-Speed 5495.02 samples/sec Loss 9.5792 LearningRate 0.2197 Epoch: 3 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:22:48,522-Speed 5370.76 samples/sec Loss 9.6213 LearningRate 0.2197 Epoch: 3 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:22:56,154-Speed 5367.80 samples/sec Loss 9.5373 LearningRate 0.2197 Epoch: 3 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:23:03,670-Speed 5450.73 samples/sec Loss 9.5263 LearningRate 0.2196 Epoch: 3 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 39 hours Training: 2022-01-08 03:23:11,225-Speed 5421.79 samples/sec Loss 9.6755 LearningRate 0.2196 Epoch: 3 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:23:18,688-Speed 5489.08 samples/sec Loss 9.6769 LearningRate 0.2196 Epoch: 3 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:23:26,180-Speed 5468.42 samples/sec Loss 9.5622 LearningRate 0.2196 Epoch: 3 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:23:33,776-Speed 5393.34 samples/sec Loss 9.5418 LearningRate 0.2195 Epoch: 3 Global Step: 38850 Fp16 
Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:23:41,225-Speed 5498.80 samples/sec Loss 9.5629 LearningRate 0.2195 Epoch: 3 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:23:48,658-Speed 5511.67 samples/sec Loss 9.6210 LearningRate 0.2195 Epoch: 3 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:23:56,167-Speed 5455.37 samples/sec Loss 9.5639 LearningRate 0.2195 Epoch: 3 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:03,631-Speed 5488.60 samples/sec Loss 9.5893 LearningRate 0.2194 Epoch: 3 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:11,136-Speed 5458.35 samples/sec Loss 9.5966 LearningRate 0.2194 Epoch: 3 Global Step: 38900 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:18,720-Speed 5401.77 samples/sec Loss 9.6422 LearningRate 0.2194 Epoch: 3 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:26,388-Speed 5342.39 samples/sec Loss 9.6176 LearningRate 0.2193 Epoch: 3 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:33,931-Speed 5431.28 samples/sec Loss 9.4917 LearningRate 0.2193 Epoch: 3 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:41,377-Speed 5501.41 samples/sec Loss 9.5864 LearningRate 0.2193 Epoch: 3 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:48,805-Speed 5515.07 samples/sec Loss 9.5807 LearningRate 0.2193 Epoch: 3 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:24:56,237-Speed 5512.49 samples/sec Loss 9.5781 LearningRate 0.2192 Epoch: 3 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:03,707-Speed 5483.66 samples/sec Loss 9.6029 LearningRate 0.2192 Epoch: 3 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 38 hours 
Training: 2022-01-08 03:25:11,209-Speed 5460.85 samples/sec Loss 9.5713 LearningRate 0.2192 Epoch: 3 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:18,730-Speed 5447.04 samples/sec Loss 9.5742 LearningRate 0.2192 Epoch: 3 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:26,191-Speed 5490.37 samples/sec Loss 9.6085 LearningRate 0.2191 Epoch: 3 Global Step: 39000 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:33,660-Speed 5485.37 samples/sec Loss 9.6009 LearningRate 0.2191 Epoch: 3 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:41,148-Speed 5470.35 samples/sec Loss 9.6080 LearningRate 0.2191 Epoch: 3 Global Step: 39020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:48,610-Speed 5489.62 samples/sec Loss 9.5985 LearningRate 0.2191 Epoch: 3 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:25:56,019-Speed 5529.90 samples/sec Loss 9.5608 LearningRate 0.2190 Epoch: 3 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:26:03,438-Speed 5521.67 samples/sec Loss 9.5292 LearningRate 0.2190 Epoch: 3 Global Step: 39050 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:26:10,928-Speed 5469.07 samples/sec Loss 9.6002 LearningRate 0.2190 Epoch: 3 Global Step: 39060 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:26:18,352-Speed 5518.40 samples/sec Loss 9.6562 LearningRate 0.2190 Epoch: 3 Global Step: 39070 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:26:25,818-Speed 5486.69 samples/sec Loss 9.5163 LearningRate 0.2189 Epoch: 3 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:26:33,288-Speed 5484.35 samples/sec Loss 9.6872 LearningRate 0.2189 Epoch: 3 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:26:41,099-Speed 
5244.61 samples/sec Loss 9.7123 LearningRate 0.2189 Epoch: 3 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:26:48,536-Speed 5508.33 samples/sec Loss 9.5637 LearningRate 0.2189 Epoch: 3 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:26:56,034-Speed 5463.98 samples/sec Loss 9.5643 LearningRate 0.2188 Epoch: 3 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:27:03,485-Speed 5497.99 samples/sec Loss 9.6186 LearningRate 0.2188 Epoch: 3 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:27:10,919-Speed 5510.26 samples/sec Loss 9.5822 LearningRate 0.2188 Epoch: 3 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:27:18,386-Speed 5485.99 samples/sec Loss 9.5505 LearningRate 0.2187 Epoch: 3 Global Step: 39150 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:27:25,792-Speed 5540.90 samples/sec Loss 9.5984 LearningRate 0.2187 Epoch: 3 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:27:33,300-Speed 5456.22 samples/sec Loss 9.5342 LearningRate 0.2187 Epoch: 3 Global Step: 39170 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:27:40,702-Speed 5534.06 samples/sec Loss 9.5272 LearningRate 0.2187 Epoch: 3 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:27:48,131-Speed 5514.15 samples/sec Loss 9.5724 LearningRate 0.2186 Epoch: 3 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:27:55,618-Speed 5471.77 samples/sec Loss 9.5365 LearningRate 0.2186 Epoch: 3 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:03,122-Speed 5459.36 samples/sec Loss 9.6270 LearningRate 0.2186 Epoch: 3 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:10,591-Speed 5484.22 samples/sec Loss 9.6275 LearningRate 
0.2186 Epoch: 3 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:18,021-Speed 5513.65 samples/sec Loss 9.6059 LearningRate 0.2185 Epoch: 3 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:25,507-Speed 5472.44 samples/sec Loss 9.5955 LearningRate 0.2185 Epoch: 3 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:32,937-Speed 5513.59 samples/sec Loss 9.5966 LearningRate 0.2185 Epoch: 3 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:40,372-Speed 5509.87 samples/sec Loss 9.5989 LearningRate 0.2185 Epoch: 3 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:47,778-Speed 5531.35 samples/sec Loss 9.5483 LearningRate 0.2184 Epoch: 3 Global Step: 39270 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:28:55,279-Speed 5461.14 samples/sec Loss 9.6636 LearningRate 0.2184 Epoch: 3 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:02,787-Speed 5456.88 samples/sec Loss 9.6115 LearningRate 0.2184 Epoch: 3 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:10,241-Speed 5495.41 samples/sec Loss 9.5559 LearningRate 0.2184 Epoch: 3 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:17,683-Speed 5504.19 samples/sec Loss 9.6031 LearningRate 0.2183 Epoch: 3 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:25,116-Speed 5511.71 samples/sec Loss 9.6240 LearningRate 0.2183 Epoch: 3 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:32,555-Speed 5506.92 samples/sec Loss 9.5858 LearningRate 0.2183 Epoch: 3 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:40,200-Speed 5358.59 samples/sec Loss 9.5304 LearningRate 0.2183 Epoch: 3 Global Step: 39340 Fp16 Grad 
Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:47,667-Speed 5486.23 samples/sec Loss 9.5828 LearningRate 0.2182 Epoch: 3 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:29:55,110-Speed 5503.44 samples/sec Loss 9.5545 LearningRate 0.2182 Epoch: 3 Global Step: 39360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:02,565-Speed 5495.50 samples/sec Loss 9.4744 LearningRate 0.2182 Epoch: 3 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:10,013-Speed 5500.22 samples/sec Loss 9.6191 LearningRate 0.2182 Epoch: 3 Global Step: 39380 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:30:17,396-Speed 5548.25 samples/sec Loss 9.5862 LearningRate 0.2181 Epoch: 3 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:24,846-Speed 5499.09 samples/sec Loss 9.5493 LearningRate 0.2181 Epoch: 3 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:32,288-Speed 5504.96 samples/sec Loss 9.5816 LearningRate 0.2181 Epoch: 3 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:39,659-Speed 5557.72 samples/sec Loss 9.5333 LearningRate 0.2180 Epoch: 3 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:47,107-Speed 5499.53 samples/sec Loss 9.5478 LearningRate 0.2180 Epoch: 3 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:30:54,576-Speed 5484.86 samples/sec Loss 9.5861 LearningRate 0.2180 Epoch: 3 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:31:02,033-Speed 5494.06 samples/sec Loss 9.5444 LearningRate 0.2180 Epoch: 3 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:31:09,654-Speed 5375.53 samples/sec Loss 9.5536 LearningRate 0.2179 Epoch: 3 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 38 hours Training: 
2022-01-08 03:31:17,171-Speed 5449.27 samples/sec Loss 9.5719 LearningRate 0.2179 Epoch: 3 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:31:24,620-Speed 5499.63 samples/sec Loss 9.5535 LearningRate 0.2179 Epoch: 3 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:31:32,100-Speed 5476.20 samples/sec Loss 9.5520 LearningRate 0.2179 Epoch: 3 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:31:39,572-Speed 5483.02 samples/sec Loss 9.5464 LearningRate 0.2178 Epoch: 3 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:31:46,986-Speed 5525.30 samples/sec Loss 9.5648 LearningRate 0.2178 Epoch: 3 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:31:54,415-Speed 5514.30 samples/sec Loss 9.4516 LearningRate 0.2178 Epoch: 3 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:32:01,875-Speed 5491.31 samples/sec Loss 9.4982 LearningRate 0.2178 Epoch: 3 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:32:09,403-Speed 5442.01 samples/sec Loss 9.5197 LearningRate 0.2177 Epoch: 3 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:32:16,851-Speed 5500.41 samples/sec Loss 9.5483 LearningRate 0.2177 Epoch: 3 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:32:24,409-Speed 5419.45 samples/sec Loss 9.5110 LearningRate 0.2177 Epoch: 3 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:32:31,908-Speed 5463.18 samples/sec Loss 9.5274 LearningRate 0.2177 Epoch: 3 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:32:39,375-Speed 5486.20 samples/sec Loss 9.4924 LearningRate 0.2176 Epoch: 3 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:32:46,863-Speed 5471.16 samples/sec 
Loss 9.5408 LearningRate 0.2176 Epoch: 3 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:32:54,305-Speed 5504.09 samples/sec Loss 9.4801 LearningRate 0.2176 Epoch: 3 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:01,793-Speed 5470.46 samples/sec Loss 9.5960 LearningRate 0.2176 Epoch: 3 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:09,280-Speed 5472.43 samples/sec Loss 9.5077 LearningRate 0.2175 Epoch: 3 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:16,772-Speed 5467.41 samples/sec Loss 9.5001 LearningRate 0.2175 Epoch: 3 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:24,224-Speed 5497.36 samples/sec Loss 9.6830 LearningRate 0.2175 Epoch: 3 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:31,719-Speed 5465.48 samples/sec Loss 9.5622 LearningRate 0.2175 Epoch: 3 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:39,347-Speed 5371.31 samples/sec Loss 9.5404 LearningRate 0.2174 Epoch: 3 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:46,800-Speed 5496.08 samples/sec Loss 9.5348 LearningRate 0.2174 Epoch: 3 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:33:54,323-Speed 5445.46 samples/sec Loss 9.5337 LearningRate 0.2174 Epoch: 3 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:01,934-Speed 5382.26 samples/sec Loss 9.5682 LearningRate 0.2173 Epoch: 3 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:09,455-Speed 5447.00 samples/sec Loss 9.4712 LearningRate 0.2173 Epoch: 3 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:16,914-Speed 5492.46 samples/sec Loss 9.5049 LearningRate 0.2173 Epoch: 3 
Global Step: 39710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:24,363-Speed 5499.54 samples/sec Loss 9.5858 LearningRate 0.2173 Epoch: 3 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:31,933-Speed 5411.24 samples/sec Loss 9.6154 LearningRate 0.2172 Epoch: 3 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:39,666-Speed 5297.66 samples/sec Loss 9.5433 LearningRate 0.2172 Epoch: 3 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:47,276-Speed 5382.97 samples/sec Loss 9.5553 LearningRate 0.2172 Epoch: 3 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:34:54,904-Speed 5370.61 samples/sec Loss 9.5290 LearningRate 0.2172 Epoch: 3 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:35:02,514-Speed 5383.06 samples/sec Loss 9.3961 LearningRate 0.2171 Epoch: 3 Global Step: 39770 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:35:10,112-Speed 5391.36 samples/sec Loss 9.4996 LearningRate 0.2171 Epoch: 3 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:35:17,814-Speed 5319.07 samples/sec Loss 9.4786 LearningRate 0.2171 Epoch: 3 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:35:25,457-Speed 5359.88 samples/sec Loss 9.5836 LearningRate 0.2171 Epoch: 3 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:35:33,073-Speed 5379.29 samples/sec Loss 9.5483 LearningRate 0.2170 Epoch: 3 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:35:40,690-Speed 5377.97 samples/sec Loss 9.5189 LearningRate 0.2170 Epoch: 3 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:35:48,173-Speed 5474.86 samples/sec Loss 9.6523 LearningRate 0.2170 Epoch: 3 Global Step: 39830 Fp16 Grad Scale: 131072 
Required: 38 hours Training: 2022-01-08 03:35:55,683-Speed 5454.50 samples/sec Loss 9.5607 LearningRate 0.2170 Epoch: 3 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:36:03,177-Speed 5466.14 samples/sec Loss 9.4478 LearningRate 0.2169 Epoch: 3 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:36:10,767-Speed 5397.26 samples/sec Loss 9.5621 LearningRate 0.2169 Epoch: 3 Global Step: 39860 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:36:18,221-Speed 5495.97 samples/sec Loss 9.4756 LearningRate 0.2169 Epoch: 3 Global Step: 39870 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:36:25,771-Speed 5426.56 samples/sec Loss 9.5452 LearningRate 0.2169 Epoch: 3 Global Step: 39880 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:36:33,210-Speed 5506.43 samples/sec Loss 9.4514 LearningRate 0.2168 Epoch: 3 Global Step: 39890 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:36:40,692-Speed 5475.14 samples/sec Loss 9.5465 LearningRate 0.2168 Epoch: 3 Global Step: 39900 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:36:48,149-Speed 5493.66 samples/sec Loss 9.5617 LearningRate 0.2168 Epoch: 3 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:36:55,577-Speed 5514.75 samples/sec Loss 9.5444 LearningRate 0.2168 Epoch: 3 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:37:03,135-Speed 5420.40 samples/sec Loss 9.5247 LearningRate 0.2167 Epoch: 3 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:37:10,674-Speed 5433.73 samples/sec Loss 9.5834 LearningRate 0.2167 Epoch: 3 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 03:37:18,201-Speed 5442.71 samples/sec Loss 9.5478 LearningRate 0.2167 Epoch: 3 Global Step: 39950 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 
03:37:25,655-Speed 5496.30 samples/sec Loss 9.4854 LearningRate 0.2166 Epoch: 3 Global Step: 39960 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:37:33,271-Speed 5378.36 samples/sec Loss 9.5712 LearningRate 0.2166 Epoch: 3 Global Step: 39970 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:37:40,781-Speed 5455.66 samples/sec Loss 9.4593 LearningRate 0.2166 Epoch: 3 Global Step: 39980 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:37:48,452-Speed 5340.18 samples/sec Loss 9.4943 LearningRate 0.2166 Epoch: 3 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:37:56,044-Speed 5395.01 samples/sec Loss 9.5001 LearningRate 0.2165 Epoch: 3 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:38:40,076-[lfw][40000]XNorm: 23.917961 Training: 2022-01-08 03:38:40,077-[lfw][40000]Accuracy-Flip: 0.99717+-0.00236 Training: 2022-01-08 03:38:40,078-[lfw][40000]Accuracy-Highest: 0.99800 Training: 2022-01-08 03:39:32,410-[cfp_fp][40000]XNorm: 21.545600 Training: 2022-01-08 03:39:32,411-[cfp_fp][40000]Accuracy-Flip: 0.98600+-0.00534 Training: 2022-01-08 03:39:32,412-[cfp_fp][40000]Accuracy-Highest: 0.98600 Training: 2022-01-08 03:40:17,829-[agedb_30][40000]XNorm: 24.016595 Training: 2022-01-08 03:40:17,831-[agedb_30][40000]Accuracy-Flip: 0.96867+-0.00609 Training: 2022-01-08 03:40:17,831-[agedb_30][40000]Accuracy-Highest: 0.97250 Training: 2022-01-08 03:40:25,573-Speed 273.93 samples/sec Loss 9.5309 LearningRate 0.2165 Epoch: 3 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:40:33,116-Speed 5432.05 samples/sec Loss 9.5041 LearningRate 0.2165 Epoch: 3 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:40:40,620-Speed 5459.38 samples/sec Loss 9.5758 LearningRate 0.2165 Epoch: 3 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:40:48,167-Speed 5428.45 samples/sec 
Loss 9.4909 LearningRate 0.2164 Epoch: 3 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:40:55,651-Speed 5473.43 samples/sec Loss 9.5943 LearningRate 0.2164 Epoch: 3 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:41:03,099-Speed 5501.72 samples/sec Loss 9.4999 LearningRate 0.2164 Epoch: 3 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:10,546-Speed 5501.12 samples/sec Loss 9.4775 LearningRate 0.2164 Epoch: 3 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:17,988-Speed 5504.19 samples/sec Loss 9.4907 LearningRate 0.2163 Epoch: 3 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:25,484-Speed 5465.33 samples/sec Loss 9.5811 LearningRate 0.2163 Epoch: 3 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:32,991-Speed 5456.97 samples/sec Loss 9.5988 LearningRate 0.2163 Epoch: 3 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:40,463-Speed 5482.55 samples/sec Loss 9.5369 LearningRate 0.2163 Epoch: 3 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:48,020-Speed 5420.60 samples/sec Loss 9.5154 LearningRate 0.2162 Epoch: 3 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:41:55,591-Speed 5410.66 samples/sec Loss 9.4905 LearningRate 0.2162 Epoch: 3 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:03,138-Speed 5428.33 samples/sec Loss 9.4981 LearningRate 0.2162 Epoch: 3 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:10,692-Speed 5423.27 samples/sec Loss 9.4135 LearningRate 0.2162 Epoch: 3 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:18,237-Speed 5428.89 samples/sec Loss 9.4765 LearningRate 0.2161 Epoch: 3 
Global Step: 40160 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:25,845-Speed 5384.80 samples/sec Loss 9.4482 LearningRate 0.2161 Epoch: 3 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:33,357-Speed 5453.07 samples/sec Loss 9.4868 LearningRate 0.2161 Epoch: 3 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:40,858-Speed 5461.40 samples/sec Loss 9.5101 LearningRate 0.2161 Epoch: 3 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:48,351-Speed 5467.37 samples/sec Loss 9.4372 LearningRate 0.2160 Epoch: 3 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:42:55,825-Speed 5480.96 samples/sec Loss 9.4576 LearningRate 0.2160 Epoch: 3 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:43:03,298-Speed 5481.57 samples/sec Loss 9.4635 LearningRate 0.2160 Epoch: 3 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:43:10,784-Speed 5472.82 samples/sec Loss 9.4368 LearningRate 0.2159 Epoch: 3 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:43:18,357-Speed 5409.06 samples/sec Loss 9.5058 LearningRate 0.2159 Epoch: 3 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:43:25,798-Speed 5505.05 samples/sec Loss 9.5039 LearningRate 0.2159 Epoch: 3 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:43:33,282-Speed 5474.19 samples/sec Loss 9.5320 LearningRate 0.2159 Epoch: 3 Global Step: 40260 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:43:40,760-Speed 5477.84 samples/sec Loss 9.5404 LearningRate 0.2158 Epoch: 3 Global Step: 40270 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:43:48,284-Speed 5444.71 samples/sec Loss 9.5365 LearningRate 0.2158 Epoch: 3 Global Step: 40280 Fp16 Grad Scale: 131072 
Required: 38 hours Training: 2022-01-08 03:43:55,850-Speed 5414.29 samples/sec Loss 9.4290 LearningRate 0.2158 Epoch: 3 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:03,416-Speed 5414.28 samples/sec Loss 9.4932 LearningRate 0.2158 Epoch: 3 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:10,962-Speed 5428.85 samples/sec Loss 9.5371 LearningRate 0.2157 Epoch: 3 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:18,555-Speed 5395.42 samples/sec Loss 9.4763 LearningRate 0.2157 Epoch: 3 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:26,115-Speed 5418.21 samples/sec Loss 9.6025 LearningRate 0.2157 Epoch: 3 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:33,584-Speed 5485.47 samples/sec Loss 9.4785 LearningRate 0.2157 Epoch: 3 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:41,102-Speed 5449.34 samples/sec Loss 9.4881 LearningRate 0.2156 Epoch: 3 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:48,714-Speed 5381.53 samples/sec Loss 9.4655 LearningRate 0.2156 Epoch: 3 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:44:56,184-Speed 5483.56 samples/sec Loss 9.5061 LearningRate 0.2156 Epoch: 3 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:45:03,722-Speed 5434.86 samples/sec Loss 9.4551 LearningRate 0.2156 Epoch: 3 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:45:11,163-Speed 5505.50 samples/sec Loss 9.4006 LearningRate 0.2155 Epoch: 3 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:45:18,649-Speed 5472.16 samples/sec Loss 9.5283 LearningRate 0.2155 Epoch: 3 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 
03:45:26,118-Speed 5484.55 samples/sec Loss 9.5147 LearningRate 0.2155 Epoch: 3 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:45:33,623-Speed 5458.24 samples/sec Loss 9.5150 LearningRate 0.2155 Epoch: 3 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:45:41,071-Speed 5500.71 samples/sec Loss 9.4443 LearningRate 0.2154 Epoch: 3 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:45:48,537-Speed 5486.72 samples/sec Loss 9.5214 LearningRate 0.2154 Epoch: 3 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:45:55,988-Speed 5498.31 samples/sec Loss 9.5048 LearningRate 0.2154 Epoch: 3 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:46:03,524-Speed 5436.16 samples/sec Loss 9.5186 LearningRate 0.2154 Epoch: 3 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:46:10,976-Speed 5496.65 samples/sec Loss 9.5234 LearningRate 0.2153 Epoch: 3 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:46:18,480-Speed 5459.81 samples/sec Loss 9.4069 LearningRate 0.2153 Epoch: 3 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:46:26,021-Speed 5433.35 samples/sec Loss 9.5059 LearningRate 0.2153 Epoch: 3 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:46:33,517-Speed 5464.99 samples/sec Loss 9.5115 LearningRate 0.2153 Epoch: 3 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:46:41,085-Speed 5413.05 samples/sec Loss 9.5540 LearningRate 0.2152 Epoch: 3 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:46:48,659-Speed 5409.07 samples/sec Loss 9.4905 LearningRate 0.2152 Epoch: 3 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:46:56,147-Speed 5470.62 samples/sec Loss 9.4873 
LearningRate 0.2152 Epoch: 3 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:47:03,648-Speed 5461.43 samples/sec Loss 9.4356 LearningRate 0.2151 Epoch: 3 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:47:11,182-Speed 5437.50 samples/sec Loss 9.5063 LearningRate 0.2151 Epoch: 3 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:47:18,640-Speed 5492.55 samples/sec Loss 9.4211 LearningRate 0.2151 Epoch: 3 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:47:26,174-Speed 5437.55 samples/sec Loss 9.4399 LearningRate 0.2151 Epoch: 3 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:47:33,736-Speed 5417.11 samples/sec Loss 9.4664 LearningRate 0.2150 Epoch: 3 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:47:41,336-Speed 5390.38 samples/sec Loss 9.4100 LearningRate 0.2150 Epoch: 3 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:47:48,832-Speed 5465.56 samples/sec Loss 9.4548 LearningRate 0.2150 Epoch: 3 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:47:56,288-Speed 5494.14 samples/sec Loss 9.4809 LearningRate 0.2150 Epoch: 3 Global Step: 40610 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:03,801-Speed 5452.78 samples/sec Loss 9.4359 LearningRate 0.2149 Epoch: 3 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:11,285-Speed 5473.33 samples/sec Loss 9.4619 LearningRate 0.2149 Epoch: 3 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:18,738-Speed 5496.97 samples/sec Loss 9.4197 LearningRate 0.2149 Epoch: 3 Global Step: 40640 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:26,255-Speed 5449.52 samples/sec Loss 9.5120 LearningRate 0.2149 Epoch: 3 Global Step: 40650 
Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:33,856-Speed 5389.38 samples/sec Loss 9.5001 LearningRate 0.2148 Epoch: 3 Global Step: 40660 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:41,441-Speed 5400.79 samples/sec Loss 9.4530 LearningRate 0.2148 Epoch: 3 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:49,016-Speed 5407.90 samples/sec Loss 9.4467 LearningRate 0.2148 Epoch: 3 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:48:56,513-Speed 5464.38 samples/sec Loss 9.4436 LearningRate 0.2148 Epoch: 3 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:49:03,975-Speed 5489.99 samples/sec Loss 9.5441 LearningRate 0.2147 Epoch: 3 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:11,474-Speed 5463.13 samples/sec Loss 9.4182 LearningRate 0.2147 Epoch: 3 Global Step: 40710 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:18,943-Speed 5484.44 samples/sec Loss 9.4648 LearningRate 0.2147 Epoch: 3 Global Step: 40720 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:26,396-Speed 5496.73 samples/sec Loss 9.4892 LearningRate 0.2147 Epoch: 3 Global Step: 40730 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:33,966-Speed 5411.36 samples/sec Loss 9.5027 LearningRate 0.2146 Epoch: 3 Global Step: 40740 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:41,488-Speed 5445.86 samples/sec Loss 9.3349 LearningRate 0.2146 Epoch: 3 Global Step: 40750 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:49,035-Speed 5428.50 samples/sec Loss 9.3308 LearningRate 0.2146 Epoch: 3 Global Step: 40760 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:49:56,693-Speed 5348.95 samples/sec Loss 9.4055 LearningRate 0.2146 Epoch: 3 Global Step: 40770 Fp16 Grad Scale: 65536 Required: 38 hours Training: 
2022-01-08 03:50:04,255-Speed 5417.71 samples/sec Loss 9.4122 LearningRate 0.2145 Epoch: 3 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:50:11,864-Speed 5383.34 samples/sec Loss 9.4219 LearningRate 0.2145 Epoch: 3 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:50:19,291-Speed 5515.85 samples/sec Loss 9.4259 LearningRate 0.2145 Epoch: 3 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:50:26,708-Speed 5523.18 samples/sec Loss 9.4763 LearningRate 0.2145 Epoch: 3 Global Step: 40810 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:50:34,187-Speed 5477.57 samples/sec Loss 9.4293 LearningRate 0.2144 Epoch: 3 Global Step: 40820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:50:41,644-Speed 5493.39 samples/sec Loss 9.5547 LearningRate 0.2144 Epoch: 3 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:50:49,127-Speed 5474.47 samples/sec Loss 9.4571 LearningRate 0.2144 Epoch: 3 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:50:56,583-Speed 5494.20 samples/sec Loss 9.4309 LearningRate 0.2144 Epoch: 3 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:04,115-Speed 5438.52 samples/sec Loss 9.4304 LearningRate 0.2143 Epoch: 3 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:11,588-Speed 5482.47 samples/sec Loss 9.4186 LearningRate 0.2143 Epoch: 3 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:19,085-Speed 5464.31 samples/sec Loss 9.4139 LearningRate 0.2143 Epoch: 3 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:26,652-Speed 5413.28 samples/sec Loss 9.3807 LearningRate 0.2142 Epoch: 3 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:34,138-Speed 5472.43 
samples/sec Loss 9.5008 LearningRate 0.2142 Epoch: 3 Global Step: 40900 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:51:41,601-Speed 5489.46 samples/sec Loss 9.4685 LearningRate 0.2142 Epoch: 3 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:49,414-Speed 5243.06 samples/sec Loss 9.4031 LearningRate 0.2142 Epoch: 3 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:51:57,096-Speed 5333.24 samples/sec Loss 9.4028 LearningRate 0.2141 Epoch: 3 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:04,609-Speed 5452.05 samples/sec Loss 9.4423 LearningRate 0.2141 Epoch: 3 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:12,111-Speed 5460.99 samples/sec Loss 9.4767 LearningRate 0.2141 Epoch: 3 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:19,686-Speed 5408.06 samples/sec Loss 9.4696 LearningRate 0.2141 Epoch: 3 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:27,261-Speed 5408.14 samples/sec Loss 9.3910 LearningRate 0.2140 Epoch: 3 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:34,864-Speed 5387.79 samples/sec Loss 9.4336 LearningRate 0.2140 Epoch: 3 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:42,353-Speed 5469.99 samples/sec Loss 9.3582 LearningRate 0.2140 Epoch: 3 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:49,952-Speed 5390.98 samples/sec Loss 9.5125 LearningRate 0.2140 Epoch: 3 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:52:57,458-Speed 5457.57 samples/sec Loss 9.5007 LearningRate 0.2139 Epoch: 3 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:04,995-Speed 5435.23 samples/sec Loss 9.3641 LearningRate 0.2139 
Epoch: 3 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:12,541-Speed 5428.78 samples/sec Loss 9.5485 LearningRate 0.2139 Epoch: 3 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:20,029-Speed 5470.83 samples/sec Loss 9.4829 LearningRate 0.2139 Epoch: 3 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:27,718-Speed 5328.15 samples/sec Loss 9.4446 LearningRate 0.2138 Epoch: 3 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:35,308-Speed 5397.04 samples/sec Loss 9.3896 LearningRate 0.2138 Epoch: 3 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:42,909-Speed 5389.35 samples/sec Loss 9.3834 LearningRate 0.2138 Epoch: 3 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:53:50,492-Speed 5402.48 samples/sec Loss 9.4624 LearningRate 0.2138 Epoch: 3 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:53:58,067-Speed 5408.10 samples/sec Loss 9.4209 LearningRate 0.2137 Epoch: 3 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:05,644-Speed 5406.88 samples/sec Loss 9.3973 LearningRate 0.2137 Epoch: 3 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:13,157-Speed 5452.37 samples/sec Loss 9.4778 LearningRate 0.2137 Epoch: 3 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:20,754-Speed 5392.50 samples/sec Loss 9.4507 LearningRate 0.2137 Epoch: 3 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:28,265-Speed 5454.03 samples/sec Loss 9.4742 LearningRate 0.2136 Epoch: 3 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:35,863-Speed 5391.47 samples/sec Loss 9.4667 LearningRate 0.2136 Epoch: 3 Global Step: 41140 Fp16 Grad Scale: 
65536 Required: 38 hours Training: 2022-01-08 03:54:43,403-Speed 5433.18 samples/sec Loss 9.4411 LearningRate 0.2136 Epoch: 3 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:50,903-Speed 5462.01 samples/sec Loss 9.3390 LearningRate 0.2136 Epoch: 3 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:54:58,429-Speed 5443.16 samples/sec Loss 9.4436 LearningRate 0.2135 Epoch: 3 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:55:05,893-Speed 5488.96 samples/sec Loss 9.4257 LearningRate 0.2135 Epoch: 3 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:13,366-Speed 5481.60 samples/sec Loss 9.4193 LearningRate 0.2135 Epoch: 3 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:20,914-Speed 5427.65 samples/sec Loss 9.4209 LearningRate 0.2135 Epoch: 3 Global Step: 41200 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:28,522-Speed 5384.28 samples/sec Loss 9.4238 LearningRate 0.2134 Epoch: 3 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:36,127-Speed 5387.15 samples/sec Loss 9.4930 LearningRate 0.2134 Epoch: 3 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:43,697-Speed 5411.62 samples/sec Loss 9.4318 LearningRate 0.2134 Epoch: 3 Global Step: 41230 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:51,356-Speed 5348.51 samples/sec Loss 9.4280 LearningRate 0.2133 Epoch: 3 Global Step: 41240 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:55:58,915-Speed 5419.26 samples/sec Loss 9.3531 LearningRate 0.2133 Epoch: 3 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:06,483-Speed 5413.24 samples/sec Loss 9.3934 LearningRate 0.2133 Epoch: 3 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 
03:56:14,116-Speed 5366.70 samples/sec Loss 9.3768 LearningRate 0.2133 Epoch: 3 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:21,629-Speed 5453.12 samples/sec Loss 9.4261 LearningRate 0.2132 Epoch: 3 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:29,087-Speed 5492.40 samples/sec Loss 9.4223 LearningRate 0.2132 Epoch: 3 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:36,626-Speed 5433.75 samples/sec Loss 9.4170 LearningRate 0.2132 Epoch: 3 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:44,209-Speed 5402.38 samples/sec Loss 9.3723 LearningRate 0.2132 Epoch: 3 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:51,677-Speed 5485.91 samples/sec Loss 9.4880 LearningRate 0.2131 Epoch: 3 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:56:59,094-Speed 5523.02 samples/sec Loss 9.4186 LearningRate 0.2131 Epoch: 3 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:06,615-Speed 5446.56 samples/sec Loss 9.3466 LearningRate 0.2131 Epoch: 3 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:14,329-Speed 5310.61 samples/sec Loss 9.5012 LearningRate 0.2131 Epoch: 3 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:21,982-Speed 5353.22 samples/sec Loss 9.4677 LearningRate 0.2130 Epoch: 3 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:29,734-Speed 5284.77 samples/sec Loss 9.5325 LearningRate 0.2130 Epoch: 3 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:37,219-Speed 5472.47 samples/sec Loss 9.4341 LearningRate 0.2130 Epoch: 3 Global Step: 41380 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:44,762-Speed 5430.60 samples/sec Loss 
9.4390 LearningRate 0.2130 Epoch: 3 Global Step: 41390 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:52,264-Speed 5460.77 samples/sec Loss 9.4303 LearningRate 0.2129 Epoch: 3 Global Step: 41400 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:57:59,818-Speed 5423.52 samples/sec Loss 9.4220 LearningRate 0.2129 Epoch: 3 Global Step: 41410 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:58:07,359-Speed 5432.27 samples/sec Loss 9.3565 LearningRate 0.2129 Epoch: 3 Global Step: 41420 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 03:58:14,849-Speed 5468.72 samples/sec Loss 9.4357 LearningRate 0.2129 Epoch: 3 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:58:22,457-Speed 5384.92 samples/sec Loss 9.4195 LearningRate 0.2128 Epoch: 3 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:58:30,037-Speed 5404.63 samples/sec Loss 9.3710 LearningRate 0.2128 Epoch: 3 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:58:37,725-Speed 5328.71 samples/sec Loss 9.3551 LearningRate 0.2128 Epoch: 3 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:58:45,258-Speed 5437.52 samples/sec Loss 9.3857 LearningRate 0.2128 Epoch: 3 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:58:52,905-Speed 5357.84 samples/sec Loss 9.4178 LearningRate 0.2127 Epoch: 3 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:59:16,076-Speed 1767.79 samples/sec Loss 9.3659 LearningRate 0.2127 Epoch: 4 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:59:23,558-Speed 5474.96 samples/sec Loss 9.4160 LearningRate 0.2127 Epoch: 4 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:59:31,142-Speed 5401.87 samples/sec Loss 9.3544 LearningRate 0.2127 Epoch: 4 Global 
Step: 41510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:59:38,559-Speed 5522.98 samples/sec Loss 9.3627 LearningRate 0.2126 Epoch: 4 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 03:59:46,030-Speed 5483.24 samples/sec Loss 9.3927 LearningRate 0.2126 Epoch: 4 Global Step: 41530 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 03:59:53,465-Speed 5510.13 samples/sec Loss 9.3771 LearningRate 0.2126 Epoch: 4 Global Step: 41540 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:00:00,886-Speed 5520.15 samples/sec Loss 9.3237 LearningRate 0.2126 Epoch: 4 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:08,303-Speed 5523.90 samples/sec Loss 9.4898 LearningRate 0.2125 Epoch: 4 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:15,760-Speed 5493.24 samples/sec Loss 9.3922 LearningRate 0.2125 Epoch: 4 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:23,211-Speed 5498.03 samples/sec Loss 9.4494 LearningRate 0.2125 Epoch: 4 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:30,652-Speed 5505.25 samples/sec Loss 9.4020 LearningRate 0.2125 Epoch: 4 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:38,083-Speed 5512.93 samples/sec Loss 9.3436 LearningRate 0.2124 Epoch: 4 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:45,499-Speed 5524.14 samples/sec Loss 9.3315 LearningRate 0.2124 Epoch: 4 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:00:52,915-Speed 5523.59 samples/sec Loss 9.4045 LearningRate 0.2124 Epoch: 4 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:01:00,363-Speed 5499.63 samples/sec Loss 9.3997 LearningRate 0.2123 Epoch: 4 Global Step: 41630 Fp16 Grad Scale: 131072 
Required: 38 hours Training: 2022-01-08 04:01:07,796-Speed 5511.94 samples/sec Loss 9.3349 LearningRate 0.2123 Epoch: 4 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:01:15,282-Speed 5472.15 samples/sec Loss 9.3566 LearningRate 0.2123 Epoch: 4 Global Step: 41650 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:01:22,869-Speed 5399.52 samples/sec Loss 9.4461 LearningRate 0.2123 Epoch: 4 Global Step: 41660 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:01:30,316-Speed 5500.51 samples/sec Loss 9.3578 LearningRate 0.2122 Epoch: 4 Global Step: 41670 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:01:37,791-Speed 5480.56 samples/sec Loss 9.3483 LearningRate 0.2122 Epoch: 4 Global Step: 41680 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:01:45,253-Speed 5489.96 samples/sec Loss 9.3850 LearningRate 0.2122 Epoch: 4 Global Step: 41690 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:01:52,768-Speed 5450.85 samples/sec Loss 9.3532 LearningRate 0.2122 Epoch: 4 Global Step: 41700 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:02:00,278-Speed 5454.66 samples/sec Loss 9.4923 LearningRate 0.2121 Epoch: 4 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:02:07,745-Speed 5486.20 samples/sec Loss 9.3780 LearningRate 0.2121 Epoch: 4 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:02:15,151-Speed 5531.55 samples/sec Loss 9.3611 LearningRate 0.2121 Epoch: 4 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:02:22,610-Speed 5491.90 samples/sec Loss 9.3877 LearningRate 0.2121 Epoch: 4 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:02:29,991-Speed 5549.51 samples/sec Loss 9.4035 LearningRate 0.2120 Epoch: 4 Global Step: 41750 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 
04:02:37,454-Speed 5489.61 samples/sec Loss 9.3687 LearningRate 0.2120 Epoch: 4 Global Step: 41760 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:02:44,879-Speed 5517.71 samples/sec Loss 9.3683 LearningRate 0.2120 Epoch: 4 Global Step: 41770 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:02:52,321-Speed 5503.90 samples/sec Loss 9.2806 LearningRate 0.2120 Epoch: 4 Global Step: 41780 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:02:59,772-Speed 5498.20 samples/sec Loss 9.3185 LearningRate 0.2119 Epoch: 4 Global Step: 41790 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:03:07,236-Speed 5488.77 samples/sec Loss 9.3295 LearningRate 0.2119 Epoch: 4 Global Step: 41800 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:03:14,723-Speed 5471.29 samples/sec Loss 9.3160 LearningRate 0.2119 Epoch: 4 Global Step: 41810 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:03:22,182-Speed 5491.92 samples/sec Loss 9.3291 LearningRate 0.2119 Epoch: 4 Global Step: 41820 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:03:29,676-Speed 5466.16 samples/sec Loss 9.2931 LearningRate 0.2118 Epoch: 4 Global Step: 41830 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:03:37,129-Speed 5497.20 samples/sec Loss 9.3075 LearningRate 0.2118 Epoch: 4 Global Step: 41840 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-01-08 04:03:44,783-Speed 5352.01 samples/sec Loss 9.3181 LearningRate 0.2118 Epoch: 4 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:03:52,301-Speed 5448.48 samples/sec Loss 9.3635 LearningRate 0.2118 Epoch: 4 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:03:59,956-Speed 5351.16 samples/sec Loss 9.3643 LearningRate 0.2117 Epoch: 4 Global Step: 41870 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:07,490-Speed 5437.33 samples/sec Loss 9.3875 
LearningRate 0.2117 Epoch: 4 Global Step: 41880 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:15,129-Speed 5363.64 samples/sec Loss 9.3959 LearningRate 0.2117 Epoch: 4 Global Step: 41890 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:22,582-Speed 5496.27 samples/sec Loss 9.3558 LearningRate 0.2117 Epoch: 4 Global Step: 41900 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:30,103-Speed 5446.63 samples/sec Loss 9.3633 LearningRate 0.2116 Epoch: 4 Global Step: 41910 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:37,639-Speed 5435.36 samples/sec Loss 9.3913 LearningRate 0.2116 Epoch: 4 Global Step: 41920 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:45,162-Speed 5445.95 samples/sec Loss 9.3657 LearningRate 0.2116 Epoch: 4 Global Step: 41930 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:04:52,641-Speed 5477.46 samples/sec Loss 9.3685 LearningRate 0.2116 Epoch: 4 Global Step: 41940 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:05:00,213-Speed 5409.73 samples/sec Loss 9.3834 LearningRate 0.2115 Epoch: 4 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:05:07,663-Speed 5498.70 samples/sec Loss 9.4267 LearningRate 0.2115 Epoch: 4 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:05:15,145-Speed 5476.09 samples/sec Loss 9.2825 LearningRate 0.2115 Epoch: 4 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:05:22,691-Speed 5428.82 samples/sec Loss 9.3149 LearningRate 0.2115 Epoch: 4 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:05:30,270-Speed 5404.83 samples/sec Loss 9.3627 LearningRate 0.2114 Epoch: 4 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:05:37,758-Speed 5470.55 samples/sec Loss 9.2753 LearningRate 0.2114 Epoch: 4 Global Step: 42000 
Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-08 04:06:22,275-[lfw][42000]XNorm: 23.390701
Training: 2022-01-08 04:06:22,276-[lfw][42000]Accuracy-Flip: 0.99717+-0.00317
Training: 2022-01-08 04:06:22,276-[lfw][42000]Accuracy-Highest: 0.99800
Training: 2022-01-08 04:07:14,015-[cfp_fp][42000]XNorm: 21.024187
Training: 2022-01-08 04:07:14,016-[cfp_fp][42000]Accuracy-Flip: 0.98086+-0.00583
Training: 2022-01-08 04:07:14,017-[cfp_fp][42000]Accuracy-Highest: 0.98600
Training: 2022-01-08 04:07:59,728-[agedb_30][42000]XNorm: 23.162721
Training: 2022-01-08 04:07:59,729-[agedb_30][42000]Accuracy-Flip: 0.96750+-0.00880
Training: 2022-01-08 04:07:59,730-[agedb_30][42000]Accuracy-Highest: 0.97250
Training: 2022-01-08 04:08:07,536-Speed 273.47 samples/sec Loss 9.3577 LearningRate 0.2114 Epoch: 4 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-08 04:08:15,210-Speed 5339.04 samples/sec Loss 9.4016 LearningRate 0.2113 Epoch: 4 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-08 04:08:22,918-Speed 5315.14 samples/sec Loss 9.3557 LearningRate 0.2113 Epoch: 4 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-08 04:08:30,551-Speed 5367.80 samples/sec Loss 9.3102 LearningRate 0.2113 Epoch: 4 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-01-08 04:08:38,110-Speed 5420.01 samples/sec Loss 9.3468 LearningRate 0.2113 Epoch: 4 Global Step: 42050 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-01-08 04:08:45,615-Speed 5457.82 samples/sec Loss 9.3971 LearningRate 0.2112 Epoch: 4 Global Step: 42060 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-01-08 04:08:53,115-Speed 5462.36 samples/sec Loss 9.3715 LearningRate 0.2112 Epoch: 4 Global Step: 42070 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-01-08 04:09:00,729-Speed 5380.11 samples/sec Loss 9.3778 LearningRate 0.2112 Epoch: 4 Global Step: 42080 Fp16 Grad Scale: 262144 Required:
38 hours Training: 2022-01-08 04:09:08,292-Speed 5417.05 samples/sec Loss 9.3661 LearningRate 0.2112 Epoch: 4 Global Step: 42090 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:09:15,816-Speed 5444.48 samples/sec Loss 9.4113 LearningRate 0.2111 Epoch: 4 Global Step: 42100 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:09:23,389-Speed 5409.66 samples/sec Loss 9.3112 LearningRate 0.2111 Epoch: 4 Global Step: 42110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:09:30,827-Speed 5507.21 samples/sec Loss 9.3730 LearningRate 0.2111 Epoch: 4 Global Step: 42120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:09:38,304-Speed 5479.17 samples/sec Loss 9.4089 LearningRate 0.2111 Epoch: 4 Global Step: 42130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:09:45,844-Speed 5432.71 samples/sec Loss 9.3491 LearningRate 0.2110 Epoch: 4 Global Step: 42140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:09:53,373-Speed 5441.42 samples/sec Loss 9.3291 LearningRate 0.2110 Epoch: 4 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:00,851-Speed 5478.23 samples/sec Loss 9.3479 LearningRate 0.2110 Epoch: 4 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:08,407-Speed 5421.68 samples/sec Loss 9.3034 LearningRate 0.2110 Epoch: 4 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:15,992-Speed 5400.76 samples/sec Loss 9.3203 LearningRate 0.2109 Epoch: 4 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:23,479-Speed 5471.55 samples/sec Loss 9.2430 LearningRate 0.2109 Epoch: 4 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:30,996-Speed 5449.75 samples/sec Loss 9.3029 LearningRate 0.2109 Epoch: 4 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 
04:10:38,489-Speed 5466.72 samples/sec Loss 9.3816 LearningRate 0.2109 Epoch: 4 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:46,034-Speed 5429.31 samples/sec Loss 9.3285 LearningRate 0.2108 Epoch: 4 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:10:53,578-Speed 5430.77 samples/sec Loss 9.3246 LearningRate 0.2108 Epoch: 4 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:01,050-Speed 5482.06 samples/sec Loss 9.3156 LearningRate 0.2108 Epoch: 4 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:08,622-Speed 5410.01 samples/sec Loss 9.4037 LearningRate 0.2108 Epoch: 4 Global Step: 42250 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:11:16,157-Speed 5437.10 samples/sec Loss 9.4282 LearningRate 0.2107 Epoch: 4 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:23,675-Speed 5448.75 samples/sec Loss 9.3602 LearningRate 0.2107 Epoch: 4 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:31,341-Speed 5344.05 samples/sec Loss 9.3592 LearningRate 0.2107 Epoch: 4 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:38,888-Speed 5428.02 samples/sec Loss 9.3488 LearningRate 0.2107 Epoch: 4 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:46,479-Speed 5395.92 samples/sec Loss 9.3451 LearningRate 0.2106 Epoch: 4 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:11:54,045-Speed 5414.81 samples/sec Loss 9.3458 LearningRate 0.2106 Epoch: 4 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:12:01,632-Speed 5399.49 samples/sec Loss 9.3081 LearningRate 0.2106 Epoch: 4 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:12:09,267-Speed 5366.96 samples/sec Loss 
9.3011 LearningRate 0.2106 Epoch: 4 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:12:16,869-Speed 5388.69 samples/sec Loss 9.3562 LearningRate 0.2105 Epoch: 4 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:12:24,413-Speed 5430.07 samples/sec Loss 9.3106 LearningRate 0.2105 Epoch: 4 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:12:32,075-Speed 5347.33 samples/sec Loss 9.3018 LearningRate 0.2105 Epoch: 4 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:12:39,619-Speed 5430.28 samples/sec Loss 9.3288 LearningRate 0.2105 Epoch: 4 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:12:47,140-Speed 5446.99 samples/sec Loss 9.2851 LearningRate 0.2104 Epoch: 4 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:12:54,646-Speed 5457.96 samples/sec Loss 9.3833 LearningRate 0.2104 Epoch: 4 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:13:02,165-Speed 5447.72 samples/sec Loss 9.3624 LearningRate 0.2104 Epoch: 4 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:13:09,721-Speed 5421.12 samples/sec Loss 9.2986 LearningRate 0.2104 Epoch: 4 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:13:17,294-Speed 5409.53 samples/sec Loss 9.3102 LearningRate 0.2103 Epoch: 4 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:13:24,785-Speed 5469.29 samples/sec Loss 9.3331 LearningRate 0.2103 Epoch: 4 Global Step: 42430 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:13:32,326-Speed 5432.05 samples/sec Loss 9.3349 LearningRate 0.2103 Epoch: 4 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:13:39,858-Speed 5438.84 samples/sec Loss 9.3244 LearningRate 0.2103 Epoch: 4 Global Step: 
42450 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:13:47,389-Speed 5439.59 samples/sec Loss 9.3432 LearningRate 0.2102 Epoch: 4 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:13:54,965-Speed 5408.46 samples/sec Loss 9.4136 LearningRate 0.2102 Epoch: 4 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:02,581-Speed 5378.31 samples/sec Loss 9.3457 LearningRate 0.2102 Epoch: 4 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:10,183-Speed 5389.01 samples/sec Loss 9.3476 LearningRate 0.2101 Epoch: 4 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:17,846-Speed 5345.69 samples/sec Loss 9.3498 LearningRate 0.2101 Epoch: 4 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:25,365-Speed 5449.00 samples/sec Loss 9.3022 LearningRate 0.2101 Epoch: 4 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:32,842-Speed 5478.71 samples/sec Loss 9.3533 LearningRate 0.2101 Epoch: 4 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:40,485-Speed 5359.78 samples/sec Loss 9.2067 LearningRate 0.2100 Epoch: 4 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:48,026-Speed 5431.89 samples/sec Loss 9.3262 LearningRate 0.2100 Epoch: 4 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:14:55,527-Speed 5461.81 samples/sec Loss 9.2507 LearningRate 0.2100 Epoch: 4 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:02,994-Speed 5486.45 samples/sec Loss 9.3675 LearningRate 0.2100 Epoch: 4 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:10,563-Speed 5411.91 samples/sec Loss 9.2152 LearningRate 0.2099 Epoch: 4 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 38 
hours Training: 2022-01-08 04:15:18,091-Speed 5442.23 samples/sec Loss 9.3118 LearningRate 0.2099 Epoch: 4 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:25,649-Speed 5420.53 samples/sec Loss 9.3144 LearningRate 0.2099 Epoch: 4 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:33,135-Speed 5471.98 samples/sec Loss 9.2659 LearningRate 0.2099 Epoch: 4 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:40,769-Speed 5366.47 samples/sec Loss 9.3136 LearningRate 0.2098 Epoch: 4 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:48,447-Speed 5335.52 samples/sec Loss 9.3428 LearningRate 0.2098 Epoch: 4 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:15:55,979-Speed 5438.70 samples/sec Loss 9.3469 LearningRate 0.2098 Epoch: 4 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:16:03,612-Speed 5367.12 samples/sec Loss 9.3447 LearningRate 0.2098 Epoch: 4 Global Step: 42640 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:16:11,210-Speed 5391.75 samples/sec Loss 9.2877 LearningRate 0.2097 Epoch: 4 Global Step: 42650 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:16:18,682-Speed 5482.64 samples/sec Loss 9.2621 LearningRate 0.2097 Epoch: 4 Global Step: 42660 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:16:26,187-Speed 5458.24 samples/sec Loss 9.2098 LearningRate 0.2097 Epoch: 4 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:16:33,702-Speed 5450.88 samples/sec Loss 9.2515 LearningRate 0.2097 Epoch: 4 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:16:41,419-Speed 5309.22 samples/sec Loss 9.3625 LearningRate 0.2096 Epoch: 4 Global Step: 42690 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 
04:16:48,940-Speed 5446.92 samples/sec Loss 9.2873 LearningRate 0.2096 Epoch: 4 Global Step: 42700 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:16:56,450-Speed 5454.56 samples/sec Loss 9.3077 LearningRate 0.2096 Epoch: 4 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:04,025-Speed 5408.38 samples/sec Loss 9.2293 LearningRate 0.2096 Epoch: 4 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:11,572-Speed 5428.03 samples/sec Loss 9.3012 LearningRate 0.2095 Epoch: 4 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:19,038-Speed 5486.76 samples/sec Loss 9.3078 LearningRate 0.2095 Epoch: 4 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:26,523-Speed 5472.96 samples/sec Loss 9.2668 LearningRate 0.2095 Epoch: 4 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:33,994-Speed 5483.72 samples/sec Loss 9.3140 LearningRate 0.2095 Epoch: 4 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:41,522-Speed 5441.94 samples/sec Loss 9.3400 LearningRate 0.2094 Epoch: 4 Global Step: 42770 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:17:49,062-Speed 5432.93 samples/sec Loss 9.2410 LearningRate 0.2094 Epoch: 4 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:17:56,685-Speed 5373.80 samples/sec Loss 9.2912 LearningRate 0.2094 Epoch: 4 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:04,289-Speed 5387.45 samples/sec Loss 9.2539 LearningRate 0.2094 Epoch: 4 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:11,832-Speed 5431.24 samples/sec Loss 9.2886 LearningRate 0.2093 Epoch: 4 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:19,316-Speed 5473.59 samples/sec Loss 
9.2016 LearningRate 0.2093 Epoch: 4 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:26,788-Speed 5482.52 samples/sec Loss 9.2130 LearningRate 0.2093 Epoch: 4 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:34,237-Speed 5499.71 samples/sec Loss 9.3460 LearningRate 0.2093 Epoch: 4 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:41,814-Speed 5406.67 samples/sec Loss 9.3177 LearningRate 0.2092 Epoch: 4 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:49,376-Speed 5416.93 samples/sec Loss 9.3307 LearningRate 0.2092 Epoch: 4 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:18:56,935-Speed 5419.46 samples/sec Loss 9.3003 LearningRate 0.2092 Epoch: 4 Global Step: 42870 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:19:04,342-Speed 5530.46 samples/sec Loss 9.2814 LearningRate 0.2092 Epoch: 4 Global Step: 42880 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:19:11,987-Speed 5358.60 samples/sec Loss 9.2315 LearningRate 0.2091 Epoch: 4 Global Step: 42890 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:19:19,412-Speed 5517.81 samples/sec Loss 9.3113 LearningRate 0.2091 Epoch: 4 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:19:26,940-Speed 5441.09 samples/sec Loss 9.3388 LearningRate 0.2091 Epoch: 4 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:19:34,514-Speed 5409.09 samples/sec Loss 9.2730 LearningRate 0.2091 Epoch: 4 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:19:42,032-Speed 5448.97 samples/sec Loss 9.2639 LearningRate 0.2090 Epoch: 4 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:19:49,558-Speed 5442.89 samples/sec Loss 9.3397 LearningRate 0.2090 Epoch: 4 Global 
Step: 42940 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:19:57,079-Speed 5446.87 samples/sec Loss 9.2083 LearningRate 0.2090 Epoch: 4 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:20:04,591-Speed 5453.37 samples/sec Loss 9.2107 LearningRate 0.2090 Epoch: 4 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:20:12,105-Speed 5452.16 samples/sec Loss 9.2313 LearningRate 0.2089 Epoch: 4 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:20:19,555-Speed 5498.86 samples/sec Loss 9.3115 LearningRate 0.2089 Epoch: 4 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:20:27,047-Speed 5467.47 samples/sec Loss 9.2993 LearningRate 0.2089 Epoch: 4 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:20:34,634-Speed 5399.26 samples/sec Loss 9.2418 LearningRate 0.2089 Epoch: 4 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:20:42,098-Speed 5488.43 samples/sec Loss 9.2398 LearningRate 0.2088 Epoch: 4 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:20:49,599-Speed 5461.71 samples/sec Loss 9.1914 LearningRate 0.2088 Epoch: 4 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:20:57,072-Speed 5482.19 samples/sec Loss 9.2625 LearningRate 0.2088 Epoch: 4 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:21:04,596-Speed 5444.12 samples/sec Loss 9.1684 LearningRate 0.2088 Epoch: 4 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:21:12,146-Speed 5426.22 samples/sec Loss 9.2882 LearningRate 0.2087 Epoch: 4 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:21:19,653-Speed 5457.63 samples/sec Loss 9.3283 LearningRate 0.2087 Epoch: 4 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 38 
hours Training: 2022-01-08 04:21:27,122-Speed 5484.63 samples/sec Loss 9.2995 LearningRate 0.2087 Epoch: 4 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:21:34,608-Speed 5472.10 samples/sec Loss 9.2162 LearningRate 0.2086 Epoch: 4 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:21:42,217-Speed 5383.96 samples/sec Loss 9.2559 LearningRate 0.2086 Epoch: 4 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:21:49,649-Speed 5511.86 samples/sec Loss 9.3275 LearningRate 0.2086 Epoch: 4 Global Step: 43100 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:21:57,183-Speed 5437.50 samples/sec Loss 9.2082 LearningRate 0.2086 Epoch: 4 Global Step: 43110 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:22:04,820-Speed 5364.45 samples/sec Loss 9.2602 LearningRate 0.2085 Epoch: 4 Global Step: 43120 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:22:12,394-Speed 5408.45 samples/sec Loss 9.3123 LearningRate 0.2085 Epoch: 4 Global Step: 43130 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:22:19,959-Speed 5415.14 samples/sec Loss 9.2292 LearningRate 0.2085 Epoch: 4 Global Step: 43140 Fp16 Grad Scale: 262144 Required: 38 hours Training: 2022-01-08 04:22:27,460-Speed 5461.31 samples/sec Loss 9.3010 LearningRate 0.2085 Epoch: 4 Global Step: 43150 Fp16 Grad Scale: 131072 Required: 38 hours Training: 2022-01-08 04:22:34,946-Speed 5472.75 samples/sec Loss 9.2648 LearningRate 0.2084 Epoch: 4 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:22:42,455-Speed 5455.27 samples/sec Loss 9.2716 LearningRate 0.2084 Epoch: 4 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 38 hours Training: 2022-01-08 04:22:50,030-Speed 5407.55 samples/sec Loss 9.2025 LearningRate 0.2084 Epoch: 4 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:22:57,569-Speed 
5434.07 samples/sec Loss 9.2713 LearningRate 0.2084 Epoch: 4 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:05,131-Speed 5416.94 samples/sec Loss 9.2203 LearningRate 0.2083 Epoch: 4 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:12,707-Speed 5407.87 samples/sec Loss 9.2329 LearningRate 0.2083 Epoch: 4 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:20,251-Speed 5430.01 samples/sec Loss 9.3008 LearningRate 0.2083 Epoch: 4 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:27,736-Speed 5472.43 samples/sec Loss 9.2895 LearningRate 0.2083 Epoch: 4 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:35,274-Speed 5434.57 samples/sec Loss 9.2694 LearningRate 0.2082 Epoch: 4 Global Step: 43240 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:42,876-Speed 5389.00 samples/sec Loss 9.2296 LearningRate 0.2082 Epoch: 4 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:23:50,412-Speed 5435.59 samples/sec Loss 9.2310 LearningRate 0.2082 Epoch: 4 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:23:58,906-Speed 5514.76 samples/sec Loss 9.1479 LearningRate 0.2082 Epoch: 4 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:06,460-Speed 5422.82 samples/sec Loss 9.2896 LearningRate 0.2081 Epoch: 4 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:13,997-Speed 5435.31 samples/sec Loss 9.3310 LearningRate 0.2081 Epoch: 4 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:21,477-Speed 5477.32 samples/sec Loss 9.2363 LearningRate 0.2081 Epoch: 4 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:28,982-Speed 5458.01 samples/sec Loss 9.2667 LearningRate 
0.2081 Epoch: 4 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:36,536-Speed 5423.15 samples/sec Loss 9.2889 LearningRate 0.2080 Epoch: 4 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:44,020-Speed 5473.68 samples/sec Loss 9.1796 LearningRate 0.2080 Epoch: 4 Global Step: 43330 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:51,558-Speed 5434.85 samples/sec Loss 9.2422 LearningRate 0.2080 Epoch: 4 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:24:59,164-Speed 5386.22 samples/sec Loss 9.2196 LearningRate 0.2080 Epoch: 4 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:06,729-Speed 5414.64 samples/sec Loss 9.2528 LearningRate 0.2079 Epoch: 4 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:14,243-Speed 5451.81 samples/sec Loss 9.2098 LearningRate 0.2079 Epoch: 4 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:21,910-Speed 5344.13 samples/sec Loss 9.3001 LearningRate 0.2079 Epoch: 4 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:29,445-Speed 5436.31 samples/sec Loss 9.3410 LearningRate 0.2079 Epoch: 4 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:36,954-Speed 5455.21 samples/sec Loss 9.2262 LearningRate 0.2078 Epoch: 4 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:44,513-Speed 5419.09 samples/sec Loss 9.1748 LearningRate 0.2078 Epoch: 4 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:52,029-Speed 5450.97 samples/sec Loss 9.1102 LearningRate 0.2078 Epoch: 4 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:25:59,578-Speed 5426.40 samples/sec Loss 9.2119 LearningRate 0.2078 Epoch: 4 Global Step: 43430 Fp16 
Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:26:07,077-Speed 5462.59 samples/sec Loss 9.2914 LearningRate 0.2077 Epoch: 4 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:26:14,695-Speed 5377.50 samples/sec Loss 9.2907 LearningRate 0.2077 Epoch: 4 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:26:22,310-Speed 5379.82 samples/sec Loss 9.2142 LearningRate 0.2077 Epoch: 4 Global Step: 43460 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:26:29,869-Speed 5419.28 samples/sec Loss 9.2320 LearningRate 0.2077 Epoch: 4 Global Step: 43470 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:26:37,398-Speed 5440.91 samples/sec Loss 9.2045 LearningRate 0.2076 Epoch: 4 Global Step: 43480 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:26:44,866-Speed 5485.63 samples/sec Loss 9.1607 LearningRate 0.2076 Epoch: 4 Global Step: 43490 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:26:52,366-Speed 5462.43 samples/sec Loss 9.2223 LearningRate 0.2076 Epoch: 4 Global Step: 43500 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:26:59,929-Speed 5416.29 samples/sec Loss 9.2849 LearningRate 0.2076 Epoch: 4 Global Step: 43510 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:27:07,421-Speed 5467.95 samples/sec Loss 9.2050 LearningRate 0.2075 Epoch: 4 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:27:15,063-Speed 5360.65 samples/sec Loss 9.2247 LearningRate 0.2075 Epoch: 4 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:27:22,583-Speed 5447.66 samples/sec Loss 9.2090 LearningRate 0.2075 Epoch: 4 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:27:30,100-Speed 5449.63 samples/sec Loss 9.2329 LearningRate 0.2075 Epoch: 4 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 37 hours 
Training: 2022-01-08 04:27:37,636-Speed 5435.58 samples/sec Loss 9.2068 LearningRate 0.2074 Epoch: 4 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:27:45,139-Speed 5460.26 samples/sec Loss 9.1638 LearningRate 0.2074 Epoch: 4 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:27:52,669-Speed 5440.65 samples/sec Loss 9.2132 LearningRate 0.2074 Epoch: 4 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:28:00,120-Speed 5497.60 samples/sec Loss 9.2550 LearningRate 0.2074 Epoch: 4 Global Step: 43590 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:28:07,591-Speed 5483.52 samples/sec Loss 9.2359 LearningRate 0.2073 Epoch: 4 Global Step: 43600 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:28:15,137-Speed 5428.62 samples/sec Loss 9.1240 LearningRate 0.2073 Epoch: 4 Global Step: 43610 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:28:22,656-Speed 5448.43 samples/sec Loss 9.2587 LearningRate 0.2073 Epoch: 4 Global Step: 43620 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:28:30,125-Speed 5485.19 samples/sec Loss 9.1839 LearningRate 0.2073 Epoch: 4 Global Step: 43630 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:28:37,568-Speed 5503.11 samples/sec Loss 9.2305 LearningRate 0.2072 Epoch: 4 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:28:45,044-Speed 5479.47 samples/sec Loss 9.1825 LearningRate 0.2072 Epoch: 4 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:28:52,546-Speed 5461.51 samples/sec Loss 9.2399 LearningRate 0.2072 Epoch: 4 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:29:00,024-Speed 5478.13 samples/sec Loss 9.1764 LearningRate 0.2072 Epoch: 4 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:29:07,524-Speed 
5461.69 samples/sec Loss 9.2480 LearningRate 0.2071 Epoch: 4 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:29:15,020-Speed 5464.73 samples/sec Loss 9.1352 LearningRate 0.2071 Epoch: 4 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:29:22,547-Speed 5443.01 samples/sec Loss 9.1267 LearningRate 0.2071 Epoch: 4 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:29:30,022-Speed 5480.37 samples/sec Loss 9.2407 LearningRate 0.2071 Epoch: 4 Global Step: 43710 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:29:37,472-Speed 5498.35 samples/sec Loss 9.2858 LearningRate 0.2070 Epoch: 4 Global Step: 43720 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:29:44,988-Speed 5450.52 samples/sec Loss 9.1372 LearningRate 0.2070 Epoch: 4 Global Step: 43730 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:29:52,573-Speed 5400.89 samples/sec Loss 9.2368 LearningRate 0.2070 Epoch: 4 Global Step: 43740 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:30:00,054-Speed 5476.29 samples/sec Loss 9.2904 LearningRate 0.2070 Epoch: 4 Global Step: 43750 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:30:07,539-Speed 5473.19 samples/sec Loss 9.2981 LearningRate 0.2069 Epoch: 4 Global Step: 43760 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:30:14,982-Speed 5503.59 samples/sec Loss 9.1991 LearningRate 0.2069 Epoch: 4 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:30:22,459-Speed 5479.30 samples/sec Loss 9.2620 LearningRate 0.2069 Epoch: 4 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:30:29,905-Speed 5501.34 samples/sec Loss 9.2067 LearningRate 0.2068 Epoch: 4 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:30:37,386-Speed 5476.13 samples/sec Loss 9.2453 
LearningRate 0.2068 Epoch: 4 Global Step: 43800 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:30:44,937-Speed 5424.86 samples/sec Loss 9.2299 LearningRate 0.2068 Epoch: 4 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:30:52,495-Speed 5420.41 samples/sec Loss 9.1861 LearningRate 0.2068 Epoch: 4 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:31:00,114-Speed 5376.63 samples/sec Loss 9.2230 LearningRate 0.2067 Epoch: 4 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:31:07,647-Speed 5438.12 samples/sec Loss 9.2127 LearningRate 0.2067 Epoch: 4 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:31:15,173-Speed 5443.15 samples/sec Loss 9.2208 LearningRate 0.2067 Epoch: 4 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:31:22,714-Speed 5432.56 samples/sec Loss 9.2282 LearningRate 0.2067 Epoch: 4 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:31:30,226-Speed 5453.09 samples/sec Loss 9.2208 LearningRate 0.2066 Epoch: 4 Global Step: 43870 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:31:37,674-Speed 5500.05 samples/sec Loss 9.1704 LearningRate 0.2066 Epoch: 4 Global Step: 43880 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:31:45,316-Speed 5360.79 samples/sec Loss 9.2046 LearningRate 0.2066 Epoch: 4 Global Step: 43890 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:31:52,771-Speed 5495.20 samples/sec Loss 9.1914 LearningRate 0.2066 Epoch: 4 Global Step: 43900 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:32:00,281-Speed 5455.25 samples/sec Loss 9.1410 LearningRate 0.2065 Epoch: 4 Global Step: 43910 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:32:07,768-Speed 5471.33 samples/sec Loss 9.1724 LearningRate 0.2065 Epoch: 4 Global Step: 
43920 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:32:15,356-Speed 5398.86 samples/sec Loss 9.2606 LearningRate 0.2065 Epoch: 4 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:32:22,797-Speed 5504.76 samples/sec Loss 9.1921 LearningRate 0.2065 Epoch: 4 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:32:30,337-Speed 5433.40 samples/sec Loss 9.1999 LearningRate 0.2064 Epoch: 4 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:32:37,888-Speed 5425.16 samples/sec Loss 9.2180 LearningRate 0.2064 Epoch: 4 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:32:45,363-Speed 5480.35 samples/sec Loss 9.1248 LearningRate 0.2064 Epoch: 4 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:32:53,017-Speed 5352.75 samples/sec Loss 9.2272 LearningRate 0.2064 Epoch: 4 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:33:00,542-Speed 5443.66 samples/sec Loss 9.2201 LearningRate 0.2063 Epoch: 4 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:33:08,052-Speed 5455.05 samples/sec Loss 9.2303 LearningRate 0.2063 Epoch: 4 Global Step: 44000 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:33:52,652-[lfw][44000]XNorm: 23.556365 Training: 2022-01-08 04:33:52,653-[lfw][44000]Accuracy-Flip: 0.99700+-0.00314 Training: 2022-01-08 04:33:52,653-[lfw][44000]Accuracy-Highest: 0.99800 Training: 2022-01-08 04:34:45,178-[cfp_fp][44000]XNorm: 21.248845 Training: 2022-01-08 04:34:45,179-[cfp_fp][44000]Accuracy-Flip: 0.98329+-0.00515 Training: 2022-01-08 04:34:45,180-[cfp_fp][44000]Accuracy-Highest: 0.98600 Training: 2022-01-08 04:35:30,701-[agedb_30][44000]XNorm: 23.516151 Training: 2022-01-08 04:35:30,701-[agedb_30][44000]Accuracy-Flip: 0.96950+-0.00723 Training: 2022-01-08 04:35:30,702-[agedb_30][44000]Accuracy-Highest: 
0.97250 Training: 2022-01-08 04:35:38,319-Speed 272.58 samples/sec Loss 9.1919 LearningRate 0.2063 Epoch: 4 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:35:45,831-Speed 5454.26 samples/sec Loss 9.1334 LearningRate 0.2063 Epoch: 4 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:35:53,319-Speed 5471.62 samples/sec Loss 9.1505 LearningRate 0.2062 Epoch: 4 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:00,757-Speed 5508.37 samples/sec Loss 9.1529 LearningRate 0.2062 Epoch: 4 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:08,320-Speed 5416.34 samples/sec Loss 9.2418 LearningRate 0.2062 Epoch: 4 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:15,836-Speed 5450.77 samples/sec Loss 9.2004 LearningRate 0.2062 Epoch: 4 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:23,291-Speed 5495.16 samples/sec Loss 9.1994 LearningRate 0.2061 Epoch: 4 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:30,851-Speed 5418.41 samples/sec Loss 9.1670 LearningRate 0.2061 Epoch: 4 Global Step: 44080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:38,418-Speed 5413.73 samples/sec Loss 9.2242 LearningRate 0.2061 Epoch: 4 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:45,893-Speed 5480.62 samples/sec Loss 9.2312 LearningRate 0.2061 Epoch: 4 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:36:53,366-Speed 5482.07 samples/sec Loss 9.2209 LearningRate 0.2060 Epoch: 4 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:37:00,893-Speed 5442.34 samples/sec Loss 9.1944 LearningRate 0.2060 Epoch: 4 Global Step: 44120 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 
04:37:08,411-Speed 5448.99 samples/sec Loss 9.1691 LearningRate 0.2060 Epoch: 4 Global Step: 44130 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:37:15,858-Speed 5500.43 samples/sec Loss 9.1351 LearningRate 0.2060 Epoch: 4 Global Step: 44140 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:37:23,266-Speed 5530.05 samples/sec Loss 9.1640 LearningRate 0.2059 Epoch: 4 Global Step: 44150 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:37:30,834-Speed 5412.81 samples/sec Loss 9.1094 LearningRate 0.2059 Epoch: 4 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:37:38,430-Speed 5393.04 samples/sec Loss 9.2770 LearningRate 0.2059 Epoch: 4 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:37:46,064-Speed 5366.51 samples/sec Loss 9.2047 LearningRate 0.2059 Epoch: 4 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:37:53,561-Speed 5464.70 samples/sec Loss 9.1667 LearningRate 0.2058 Epoch: 4 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:38:01,094-Speed 5437.60 samples/sec Loss 9.1590 LearningRate 0.2058 Epoch: 4 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:38:08,811-Speed 5308.63 samples/sec Loss 9.1940 LearningRate 0.2058 Epoch: 4 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:38:16,372-Speed 5418.16 samples/sec Loss 9.1606 LearningRate 0.2058 Epoch: 4 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:38:23,910-Speed 5434.14 samples/sec Loss 9.1498 LearningRate 0.2057 Epoch: 4 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:38:31,443-Speed 5438.35 samples/sec Loss 9.2469 LearningRate 0.2057 Epoch: 4 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:38:38,991-Speed 5426.94 samples/sec Loss 
9.1852 LearningRate 0.2057 Epoch: 4 Global Step: 44250 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:38:46,526-Speed 5436.82 samples/sec Loss 9.2092 LearningRate 0.2057 Epoch: 4 Global Step: 44260 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:38:54,099-Speed 5409.32 samples/sec Loss 9.1451 LearningRate 0.2056 Epoch: 4 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:01,603-Speed 5459.34 samples/sec Loss 9.2355 LearningRate 0.2056 Epoch: 4 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:09,119-Speed 5450.66 samples/sec Loss 9.2110 LearningRate 0.2056 Epoch: 4 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:16,659-Speed 5433.17 samples/sec Loss 9.1066 LearningRate 0.2056 Epoch: 4 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:24,312-Speed 5352.68 samples/sec Loss 9.1866 LearningRate 0.2055 Epoch: 4 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:31,794-Speed 5475.00 samples/sec Loss 9.2165 LearningRate 0.2055 Epoch: 4 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:39,264-Speed 5483.82 samples/sec Loss 9.1346 LearningRate 0.2055 Epoch: 4 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:46,759-Speed 5465.56 samples/sec Loss 9.1866 LearningRate 0.2055 Epoch: 4 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:39:54,311-Speed 5424.71 samples/sec Loss 9.1343 LearningRate 0.2054 Epoch: 4 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:02,163-Speed 5217.27 samples/sec Loss 9.2331 LearningRate 0.2054 Epoch: 4 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:09,757-Speed 5394.33 samples/sec Loss 9.1492 LearningRate 0.2054 Epoch: 4 Global 
Step: 44370 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:40:17,330-Speed 5409.31 samples/sec Loss 9.1538 LearningRate 0.2054 Epoch: 4 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:24,846-Speed 5450.08 samples/sec Loss 9.0995 LearningRate 0.2053 Epoch: 4 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:32,379-Speed 5438.72 samples/sec Loss 9.1511 LearningRate 0.2053 Epoch: 4 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:39,916-Speed 5435.37 samples/sec Loss 9.1611 LearningRate 0.2053 Epoch: 4 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:47,536-Speed 5375.78 samples/sec Loss 9.1303 LearningRate 0.2053 Epoch: 4 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:40:55,005-Speed 5484.49 samples/sec Loss 9.1399 LearningRate 0.2052 Epoch: 4 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:41:02,478-Speed 5482.30 samples/sec Loss 9.1359 LearningRate 0.2052 Epoch: 4 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:41:09,924-Speed 5501.88 samples/sec Loss 9.1355 LearningRate 0.2052 Epoch: 4 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:41:17,510-Speed 5399.74 samples/sec Loss 9.1860 LearningRate 0.2052 Epoch: 4 Global Step: 44460 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:41:25,090-Speed 5404.59 samples/sec Loss 9.1421 LearningRate 0.2051 Epoch: 4 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:41:32,617-Speed 5442.18 samples/sec Loss 9.1992 LearningRate 0.2051 Epoch: 4 Global Step: 44480 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:41:40,142-Speed 5444.27 samples/sec Loss 9.2101 LearningRate 0.2051 Epoch: 4 Global Step: 44490 Fp16 Grad Scale: 131072 
Required: 37 hours Training: 2022-01-08 04:41:47,646-Speed 5459.44 samples/sec Loss 9.1643 LearningRate 0.2051 Epoch: 4 Global Step: 44500 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:41:55,178-Speed 5438.87 samples/sec Loss 9.1807 LearningRate 0.2050 Epoch: 4 Global Step: 44510 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:02,654-Speed 5479.32 samples/sec Loss 9.1981 LearningRate 0.2050 Epoch: 4 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:10,165-Speed 5453.89 samples/sec Loss 9.0902 LearningRate 0.2050 Epoch: 4 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:17,784-Speed 5377.18 samples/sec Loss 9.1291 LearningRate 0.2050 Epoch: 4 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:25,310-Speed 5442.84 samples/sec Loss 9.1503 LearningRate 0.2049 Epoch: 4 Global Step: 44550 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:32,887-Speed 5407.00 samples/sec Loss 9.0983 LearningRate 0.2049 Epoch: 4 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:40,503-Speed 5378.70 samples/sec Loss 9.1275 LearningRate 0.2049 Epoch: 4 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:48,061-Speed 5420.44 samples/sec Loss 9.1182 LearningRate 0.2049 Epoch: 4 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:42:55,640-Speed 5404.95 samples/sec Loss 9.2183 LearningRate 0.2048 Epoch: 4 Global Step: 44590 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:43:03,258-Speed 5377.51 samples/sec Loss 9.2029 LearningRate 0.2048 Epoch: 4 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:43:10,870-Speed 5381.35 samples/sec Loss 9.1309 LearningRate 0.2048 Epoch: 4 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 
04:43:18,595-Speed 5302.91 samples/sec Loss 9.0653 LearningRate 0.2048 Epoch: 4 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:43:26,401-Speed 5248.30 samples/sec Loss 9.1558 LearningRate 0.2047 Epoch: 4 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:43:33,979-Speed 5405.16 samples/sec Loss 9.1104 LearningRate 0.2047 Epoch: 4 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:43:41,529-Speed 5426.47 samples/sec Loss 9.0717 LearningRate 0.2047 Epoch: 4 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:43:48,988-Speed 5491.68 samples/sec Loss 9.1783 LearningRate 0.2047 Epoch: 4 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:43:56,575-Speed 5399.90 samples/sec Loss 9.1826 LearningRate 0.2046 Epoch: 4 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:04,192-Speed 5377.71 samples/sec Loss 9.1236 LearningRate 0.2046 Epoch: 4 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:11,809-Speed 5378.32 samples/sec Loss 9.1477 LearningRate 0.2046 Epoch: 4 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:19,355-Speed 5428.83 samples/sec Loss 9.1653 LearningRate 0.2046 Epoch: 4 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:27,024-Speed 5341.94 samples/sec Loss 9.1223 LearningRate 0.2045 Epoch: 4 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:34,613-Speed 5397.64 samples/sec Loss 9.1097 LearningRate 0.2045 Epoch: 4 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:42,154-Speed 5432.03 samples/sec Loss 9.2089 LearningRate 0.2045 Epoch: 4 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:49,681-Speed 5442.86 samples/sec Loss 
9.2182 LearningRate 0.2045 Epoch: 4 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:44:57,187-Speed 5457.98 samples/sec Loss 9.1381 LearningRate 0.2044 Epoch: 4 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:45:04,755-Speed 5412.69 samples/sec Loss 9.2367 LearningRate 0.2044 Epoch: 4 Global Step: 44760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:45:12,319-Speed 5415.76 samples/sec Loss 9.1035 LearningRate 0.2044 Epoch: 4 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:45:19,915-Speed 5393.09 samples/sec Loss 9.1226 LearningRate 0.2044 Epoch: 4 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:45:27,492-Speed 5406.24 samples/sec Loss 9.0920 LearningRate 0.2043 Epoch: 4 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:45:35,047-Speed 5422.21 samples/sec Loss 9.1605 LearningRate 0.2043 Epoch: 4 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:45:42,581-Speed 5437.43 samples/sec Loss 9.1145 LearningRate 0.2043 Epoch: 4 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:45:50,111-Speed 5440.15 samples/sec Loss 9.0923 LearningRate 0.2043 Epoch: 4 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:45:57,747-Speed 5364.74 samples/sec Loss 9.0807 LearningRate 0.2042 Epoch: 4 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:46:05,296-Speed 5426.86 samples/sec Loss 9.1551 LearningRate 0.2042 Epoch: 4 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:46:12,898-Speed 5388.49 samples/sec Loss 9.2039 LearningRate 0.2042 Epoch: 4 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:46:20,510-Speed 5381.77 samples/sec Loss 9.0664 LearningRate 0.2042 Epoch: 4 Global Step: 
44860 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:46:28,022-Speed 5453.81 samples/sec Loss 9.1088 LearningRate 0.2041 Epoch: 4 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:46:35,591-Speed 5411.94 samples/sec Loss 9.0809 LearningRate 0.2041 Epoch: 4 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:46:43,097-Speed 5457.15 samples/sec Loss 9.0930 LearningRate 0.2041 Epoch: 4 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:46:50,684-Speed 5399.64 samples/sec Loss 9.1272 LearningRate 0.2041 Epoch: 4 Global Step: 44900 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:46:58,302-Speed 5377.83 samples/sec Loss 9.0943 LearningRate 0.2040 Epoch: 4 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:06,090-Speed 5260.11 samples/sec Loss 9.0692 LearningRate 0.2040 Epoch: 4 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:13,736-Speed 5357.13 samples/sec Loss 9.1256 LearningRate 0.2040 Epoch: 4 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:21,363-Speed 5371.71 samples/sec Loss 9.0408 LearningRate 0.2040 Epoch: 4 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:28,987-Speed 5373.51 samples/sec Loss 9.1082 LearningRate 0.2039 Epoch: 4 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:36,637-Speed 5354.24 samples/sec Loss 9.1129 LearningRate 0.2039 Epoch: 4 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:44,146-Speed 5455.50 samples/sec Loss 9.0748 LearningRate 0.2039 Epoch: 4 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:47:51,607-Speed 5490.80 samples/sec Loss 9.0761 LearningRate 0.2039 Epoch: 4 Global Step: 44980 Fp16 Grad Scale: 262144 Required: 37 
hours Training: 2022-01-08 04:47:59,193-Speed 5400.68 samples/sec Loss 9.1650 LearningRate 0.2038 Epoch: 4 Global Step: 44990 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:48:06,844-Speed 5354.10 samples/sec Loss 9.0382 LearningRate 0.2038 Epoch: 4 Global Step: 45000 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:48:14,482-Speed 5363.26 samples/sec Loss 9.0349 LearningRate 0.2038 Epoch: 4 Global Step: 45010 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:48:22,036-Speed 5423.03 samples/sec Loss 9.1199 LearningRate 0.2038 Epoch: 4 Global Step: 45020 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:48:29,747-Speed 5313.34 samples/sec Loss 9.0434 LearningRate 0.2037 Epoch: 4 Global Step: 45030 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:48:37,414-Speed 5342.71 samples/sec Loss 9.1012 LearningRate 0.2037 Epoch: 4 Global Step: 45040 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:48:44,986-Speed 5410.00 samples/sec Loss 9.1504 LearningRate 0.2037 Epoch: 4 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:48:52,575-Speed 5397.84 samples/sec Loss 9.2329 LearningRate 0.2036 Epoch: 4 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:00,205-Speed 5369.39 samples/sec Loss 9.1457 LearningRate 0.2036 Epoch: 4 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:07,904-Speed 5320.54 samples/sec Loss 9.0674 LearningRate 0.2036 Epoch: 4 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:15,572-Speed 5342.32 samples/sec Loss 9.1551 LearningRate 0.2036 Epoch: 4 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:23,138-Speed 5414.92 samples/sec Loss 9.0531 LearningRate 0.2035 Epoch: 4 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 
04:49:30,818-Speed 5334.14 samples/sec Loss 9.1882 LearningRate 0.2035 Epoch: 4 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:38,542-Speed 5303.65 samples/sec Loss 9.0677 LearningRate 0.2035 Epoch: 4 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:46,249-Speed 5314.82 samples/sec Loss 9.1532 LearningRate 0.2035 Epoch: 4 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:49:53,873-Speed 5373.56 samples/sec Loss 9.1603 LearningRate 0.2034 Epoch: 4 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:50:01,463-Speed 5397.50 samples/sec Loss 9.0949 LearningRate 0.2034 Epoch: 4 Global Step: 45150 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:50:08,953-Speed 5469.36 samples/sec Loss 9.2165 LearningRate 0.2034 Epoch: 4 Global Step: 45160 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:50:16,396-Speed 5503.22 samples/sec Loss 9.1235 LearningRate 0.2034 Epoch: 4 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:50:23,966-Speed 5411.92 samples/sec Loss 8.9951 LearningRate 0.2033 Epoch: 4 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:50:31,515-Speed 5427.00 samples/sec Loss 9.1611 LearningRate 0.2033 Epoch: 4 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:50:39,155-Speed 5361.94 samples/sec Loss 9.1031 LearningRate 0.2033 Epoch: 4 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:50:46,947-Speed 5257.12 samples/sec Loss 9.0869 LearningRate 0.2033 Epoch: 4 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:50:54,629-Speed 5332.88 samples/sec Loss 9.1016 LearningRate 0.2032 Epoch: 4 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:51:02,314-Speed 5330.82 samples/sec Loss 
9.0952 LearningRate 0.2032 Epoch: 4 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:51:09,902-Speed 5398.14 samples/sec Loss 9.1727 LearningRate 0.2032 Epoch: 4 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:51:17,560-Speed 5349.62 samples/sec Loss 9.0298 LearningRate 0.2032 Epoch: 4 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:51:25,134-Speed 5408.52 samples/sec Loss 9.1311 LearningRate 0.2031 Epoch: 4 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:51:32,730-Speed 5392.95 samples/sec Loss 9.0658 LearningRate 0.2031 Epoch: 4 Global Step: 45270 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:51:40,339-Speed 5383.75 samples/sec Loss 9.2031 LearningRate 0.2031 Epoch: 4 Global Step: 45280 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:51:47,884-Speed 5429.00 samples/sec Loss 9.0990 LearningRate 0.2031 Epoch: 4 Global Step: 45290 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:51:55,502-Speed 5378.28 samples/sec Loss 9.0985 LearningRate 0.2030 Epoch: 4 Global Step: 45300 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:52:03,166-Speed 5345.10 samples/sec Loss 9.0893 LearningRate 0.2030 Epoch: 4 Global Step: 45310 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:52:10,847-Speed 5333.04 samples/sec Loss 9.0842 LearningRate 0.2030 Epoch: 4 Global Step: 45320 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:52:18,537-Speed 5327.11 samples/sec Loss 9.0493 LearningRate 0.2030 Epoch: 4 Global Step: 45330 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:52:26,159-Speed 5374.55 samples/sec Loss 9.0814 LearningRate 0.2029 Epoch: 4 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:52:33,863-Speed 5317.66 samples/sec Loss 9.1562 LearningRate 0.2029 Epoch: 4 Global 
Step: 45350 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:52:41,412-Speed 5426.18 samples/sec Loss 9.1083 LearningRate 0.2029 Epoch: 4 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:52:49,010-Speed 5391.36 samples/sec Loss 9.0291 LearningRate 0.2029 Epoch: 4 Global Step: 45370 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:52:56,547-Speed 5436.50 samples/sec Loss 9.0492 LearningRate 0.2028 Epoch: 4 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:53:04,064-Speed 5449.50 samples/sec Loss 9.0832 LearningRate 0.2028 Epoch: 4 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:53:11,657-Speed 5395.43 samples/sec Loss 9.1098 LearningRate 0.2028 Epoch: 4 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:53:19,272-Speed 5378.77 samples/sec Loss 9.0880 LearningRate 0.2028 Epoch: 4 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:53:26,875-Speed 5388.70 samples/sec Loss 9.1028 LearningRate 0.2027 Epoch: 4 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:53:34,506-Speed 5368.12 samples/sec Loss 9.0750 LearningRate 0.2027 Epoch: 4 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 04:53:42,081-Speed 5408.34 samples/sec Loss 9.0618 LearningRate 0.2027 Epoch: 4 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:53:49,694-Speed 5380.33 samples/sec Loss 9.0574 LearningRate 0.2027 Epoch: 4 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:53:57,188-Speed 5467.11 samples/sec Loss 9.1189 LearningRate 0.2026 Epoch: 4 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:04,762-Speed 5408.68 samples/sec Loss 9.1120 LearningRate 0.2026 Epoch: 4 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 37 
hours Training: 2022-01-08 04:54:12,366-Speed 5386.90 samples/sec Loss 9.1014 LearningRate 0.2026 Epoch: 4 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:20,034-Speed 5342.50 samples/sec Loss 9.0141 LearningRate 0.2026 Epoch: 4 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:27,603-Speed 5412.65 samples/sec Loss 9.0875 LearningRate 0.2025 Epoch: 4 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:35,185-Speed 5402.96 samples/sec Loss 9.0815 LearningRate 0.2025 Epoch: 4 Global Step: 45510 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:42,783-Speed 5391.30 samples/sec Loss 9.0477 LearningRate 0.2025 Epoch: 4 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:50,392-Speed 5383.82 samples/sec Loss 9.0992 LearningRate 0.2025 Epoch: 4 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:54:57,951-Speed 5419.67 samples/sec Loss 9.0869 LearningRate 0.2024 Epoch: 4 Global Step: 45540 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:55:05,681-Speed 5299.47 samples/sec Loss 9.0563 LearningRate 0.2024 Epoch: 4 Global Step: 45550 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:55:13,337-Speed 5351.13 samples/sec Loss 8.9586 LearningRate 0.2024 Epoch: 4 Global Step: 45560 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:55:20,976-Speed 5362.17 samples/sec Loss 9.0385 LearningRate 0.2024 Epoch: 4 Global Step: 45570 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:55:28,478-Speed 5460.15 samples/sec Loss 9.0400 LearningRate 0.2023 Epoch: 4 Global Step: 45580 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:55:36,194-Speed 5309.81 samples/sec Loss 9.0145 LearningRate 0.2023 Epoch: 4 Global Step: 45590 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 
04:55:43,751-Speed 5420.83 samples/sec Loss 9.1040 LearningRate 0.2023 Epoch: 4 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:55:51,316-Speed 5414.62 samples/sec Loss 9.0446 LearningRate 0.2023 Epoch: 4 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:55:58,877-Speed 5417.87 samples/sec Loss 9.1034 LearningRate 0.2022 Epoch: 4 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:56:06,404-Speed 5442.24 samples/sec Loss 9.0270 LearningRate 0.2022 Epoch: 4 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:56:13,997-Speed 5395.48 samples/sec Loss 8.9813 LearningRate 0.2022 Epoch: 4 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:56:21,526-Speed 5441.05 samples/sec Loss 9.0291 LearningRate 0.2022 Epoch: 4 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:56:29,127-Speed 5389.42 samples/sec Loss 9.0210 LearningRate 0.2021 Epoch: 4 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:56:36,708-Speed 5403.39 samples/sec Loss 9.0721 LearningRate 0.2021 Epoch: 4 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:56:44,286-Speed 5406.19 samples/sec Loss 9.0194 LearningRate 0.2021 Epoch: 4 Global Step: 45680 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:56:51,933-Speed 5356.99 samples/sec Loss 9.0608 LearningRate 0.2021 Epoch: 4 Global Step: 45690 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:56:59,525-Speed 5395.75 samples/sec Loss 9.0508 LearningRate 0.2020 Epoch: 4 Global Step: 45700 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:57:07,271-Speed 5288.75 samples/sec Loss 9.0129 LearningRate 0.2020 Epoch: 4 Global Step: 45710 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:57:14,863-Speed 5395.67 samples/sec Loss 
9.0631 LearningRate 0.2020 Epoch: 4 Global Step: 45720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:57:22,421-Speed 5419.97 samples/sec Loss 9.1064 LearningRate 0.2020 Epoch: 4 Global Step: 45730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:57:29,904-Speed 5474.39 samples/sec Loss 9.0210 LearningRate 0.2019 Epoch: 4 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:57:37,501-Speed 5392.21 samples/sec Loss 9.0894 LearningRate 0.2019 Epoch: 4 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:57:45,012-Speed 5454.50 samples/sec Loss 9.0906 LearningRate 0.2019 Epoch: 4 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:57:52,564-Speed 5424.42 samples/sec Loss 9.0327 LearningRate 0.2019 Epoch: 4 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:00,101-Speed 5434.96 samples/sec Loss 9.0164 LearningRate 0.2018 Epoch: 4 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:07,637-Speed 5436.11 samples/sec Loss 8.9897 LearningRate 0.2018 Epoch: 4 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:15,238-Speed 5389.24 samples/sec Loss 9.0828 LearningRate 0.2018 Epoch: 4 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:22,757-Speed 5448.24 samples/sec Loss 9.0358 LearningRate 0.2018 Epoch: 4 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:30,366-Speed 5383.95 samples/sec Loss 9.0962 LearningRate 0.2017 Epoch: 4 Global Step: 45820 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:37,905-Speed 5433.74 samples/sec Loss 9.0303 LearningRate 0.2017 Epoch: 4 Global Step: 45830 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:45,422-Speed 5449.63 samples/sec Loss 9.0441 LearningRate 0.2017 Epoch: 4 Global 
Step: 45840 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:58:53,092-Speed 5341.12 samples/sec Loss 9.0731 LearningRate 0.2017 Epoch: 4 Global Step: 45850 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:00,597-Speed 5458.45 samples/sec Loss 9.1010 LearningRate 0.2016 Epoch: 4 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:08,248-Speed 5354.06 samples/sec Loss 9.0678 LearningRate 0.2016 Epoch: 4 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:16,005-Speed 5281.18 samples/sec Loss 9.0874 LearningRate 0.2016 Epoch: 4 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:23,599-Speed 5394.07 samples/sec Loss 9.1006 LearningRate 0.2016 Epoch: 4 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:31,204-Speed 5387.40 samples/sec Loss 9.0273 LearningRate 0.2015 Epoch: 4 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:38,757-Speed 5423.46 samples/sec Loss 9.1019 LearningRate 0.2015 Epoch: 4 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 04:59:46,268-Speed 5453.92 samples/sec Loss 9.0680 LearningRate 0.2015 Epoch: 4 Global Step: 45920 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 04:59:53,832-Speed 5416.09 samples/sec Loss 9.0567 LearningRate 0.2015 Epoch: 4 Global Step: 45930 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:00:01,373-Speed 5432.64 samples/sec Loss 9.1061 LearningRate 0.2014 Epoch: 4 Global Step: 45940 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:00:08,846-Speed 5481.44 samples/sec Loss 9.0480 LearningRate 0.2014 Epoch: 4 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:00:16,315-Speed 5484.77 samples/sec Loss 9.0779 LearningRate 0.2014 Epoch: 4 Global Step: 45960 Fp16 Grad Scale: 131072 
Required: 37 hours Training: 2022-01-08 05:00:23,931-Speed 5379.06 samples/sec Loss 9.0107 LearningRate 0.2014 Epoch: 4 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:00:31,518-Speed 5399.33 samples/sec Loss 9.0615 LearningRate 0.2013 Epoch: 4 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:00:39,005-Speed 5471.74 samples/sec Loss 9.0759 LearningRate 0.2013 Epoch: 4 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:00:46,514-Speed 5455.28 samples/sec Loss 8.9841 LearningRate 0.2013 Epoch: 4 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:01:31,136-[lfw][46000]XNorm: 23.927140 Training: 2022-01-08 05:01:31,136-[lfw][46000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-08 05:01:31,137-[lfw][46000]Accuracy-Highest: 0.99817 Training: 2022-01-08 05:02:23,702-[cfp_fp][46000]XNorm: 21.717181 Training: 2022-01-08 05:02:23,703-[cfp_fp][46000]Accuracy-Flip: 0.98486+-0.00543 Training: 2022-01-08 05:02:23,704-[cfp_fp][46000]Accuracy-Highest: 0.98600 Training: 2022-01-08 05:03:09,184-[agedb_30][46000]XNorm: 23.814232 Training: 2022-01-08 05:03:09,186-[agedb_30][46000]Accuracy-Flip: 0.97050+-0.00796 Training: 2022-01-08 05:03:09,186-[agedb_30][46000]Accuracy-Highest: 0.97250 Training: 2022-01-08 05:03:16,968-Speed 272.25 samples/sec Loss 9.0464 LearningRate 0.2013 Epoch: 4 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:03:24,575-Speed 5386.89 samples/sec Loss 9.0081 LearningRate 0.2012 Epoch: 4 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:03:32,070-Speed 5466.26 samples/sec Loss 9.0485 LearningRate 0.2012 Epoch: 4 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:03:39,563-Speed 5468.21 samples/sec Loss 9.0250 LearningRate 0.2012 Epoch: 4 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 
2022-01-08 05:03:47,163-Speed 5389.69 samples/sec Loss 9.0825 LearningRate 0.2012 Epoch: 4 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:03:54,789-Speed 5372.79 samples/sec Loss 9.0459 LearningRate 0.2011 Epoch: 4 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:02,394-Speed 5386.96 samples/sec Loss 9.0480 LearningRate 0.2011 Epoch: 4 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:09,993-Speed 5390.89 samples/sec Loss 9.0591 LearningRate 0.2011 Epoch: 4 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:17,552-Speed 5419.49 samples/sec Loss 9.0684 LearningRate 0.2011 Epoch: 4 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:25,097-Speed 5429.33 samples/sec Loss 9.0891 LearningRate 0.2010 Epoch: 4 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:32,649-Speed 5424.52 samples/sec Loss 9.0707 LearningRate 0.2010 Epoch: 4 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:40,298-Speed 5355.37 samples/sec Loss 9.0578 LearningRate 0.2010 Epoch: 4 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:47,794-Speed 5465.06 samples/sec Loss 8.9750 LearningRate 0.2010 Epoch: 4 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:04:55,464-Speed 5341.24 samples/sec Loss 8.9829 LearningRate 0.2009 Epoch: 4 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:05:03,020-Speed 5421.65 samples/sec Loss 9.0374 LearningRate 0.2009 Epoch: 4 Global Step: 46150 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:10,535-Speed 5450.84 samples/sec Loss 9.0948 LearningRate 0.2009 Epoch: 4 Global Step: 46160 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:18,020-Speed 5472.97 
samples/sec Loss 9.0068 LearningRate 0.2009 Epoch: 4 Global Step: 46170 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:25,506-Speed 5471.78 samples/sec Loss 9.1317 LearningRate 0.2008 Epoch: 4 Global Step: 46180 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:33,058-Speed 5424.88 samples/sec Loss 9.1094 LearningRate 0.2008 Epoch: 4 Global Step: 46190 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:40,522-Speed 5488.20 samples/sec Loss 9.0522 LearningRate 0.2008 Epoch: 4 Global Step: 46200 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:48,026-Speed 5459.19 samples/sec Loss 8.9966 LearningRate 0.2008 Epoch: 4 Global Step: 46210 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:05:55,515-Speed 5470.00 samples/sec Loss 9.0423 LearningRate 0.2007 Epoch: 4 Global Step: 46220 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:06:03,155-Speed 5361.86 samples/sec Loss 9.0410 LearningRate 0.2007 Epoch: 4 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:10,674-Speed 5448.50 samples/sec Loss 9.0409 LearningRate 0.2007 Epoch: 4 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:18,269-Speed 5393.51 samples/sec Loss 9.0412 LearningRate 0.2007 Epoch: 4 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:25,906-Speed 5364.30 samples/sec Loss 9.0507 LearningRate 0.2007 Epoch: 4 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:33,519-Speed 5380.86 samples/sec Loss 9.0364 LearningRate 0.2006 Epoch: 4 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:41,032-Speed 5452.64 samples/sec Loss 9.0022 LearningRate 0.2006 Epoch: 4 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:48,547-Speed 5451.50 samples/sec Loss 9.0554 LearningRate 0.2006 
Epoch: 4 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:06:56,206-Speed 5348.28 samples/sec Loss 9.0333 LearningRate 0.2006 Epoch: 4 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:07:03,789-Speed 5402.77 samples/sec Loss 8.9991 LearningRate 0.2005 Epoch: 4 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:07:11,264-Speed 5480.20 samples/sec Loss 9.0520 LearningRate 0.2005 Epoch: 4 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:07:18,920-Speed 5350.63 samples/sec Loss 8.9687 LearningRate 0.2005 Epoch: 4 Global Step: 46330 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:07:26,482-Speed 5417.17 samples/sec Loss 9.0527 LearningRate 0.2005 Epoch: 4 Global Step: 46340 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:07:33,932-Speed 5499.08 samples/sec Loss 9.0815 LearningRate 0.2004 Epoch: 4 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:07:41,585-Speed 5353.30 samples/sec Loss 9.0111 LearningRate 0.2004 Epoch: 4 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:07:49,121-Speed 5435.57 samples/sec Loss 9.0825 LearningRate 0.2004 Epoch: 4 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:07:56,640-Speed 5448.54 samples/sec Loss 8.9198 LearningRate 0.2004 Epoch: 4 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:04,226-Speed 5399.88 samples/sec Loss 9.0405 LearningRate 0.2003 Epoch: 4 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:11,744-Speed 5448.80 samples/sec Loss 9.0135 LearningRate 0.2003 Epoch: 4 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:19,238-Speed 5466.92 samples/sec Loss 9.0875 LearningRate 0.2003 Epoch: 4 Global Step: 46410 Fp16 Grad 
Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:26,857-Speed 5376.59 samples/sec Loss 8.8704 LearningRate 0.2003 Epoch: 4 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:34,389-Speed 5438.44 samples/sec Loss 9.0454 LearningRate 0.2002 Epoch: 4 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:42,019-Speed 5369.11 samples/sec Loss 9.0960 LearningRate 0.2002 Epoch: 4 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:49,575-Speed 5421.90 samples/sec Loss 9.0438 LearningRate 0.2002 Epoch: 4 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:08:57,154-Speed 5404.77 samples/sec Loss 9.0096 LearningRate 0.2002 Epoch: 4 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:04,689-Speed 5436.82 samples/sec Loss 9.0156 LearningRate 0.2001 Epoch: 4 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:12,176-Speed 5471.67 samples/sec Loss 9.0256 LearningRate 0.2001 Epoch: 4 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:19,738-Speed 5417.35 samples/sec Loss 8.9972 LearningRate 0.2001 Epoch: 4 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:27,382-Speed 5358.86 samples/sec Loss 8.9799 LearningRate 0.2001 Epoch: 4 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:34,924-Speed 5431.75 samples/sec Loss 9.0023 LearningRate 0.2000 Epoch: 4 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:42,503-Speed 5404.85 samples/sec Loss 9.0306 LearningRate 0.2000 Epoch: 4 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:09:50,065-Speed 5417.45 samples/sec Loss 9.0290 LearningRate 0.2000 Epoch: 4 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 37 hours Training: 
2022-01-08 05:09:57,691-Speed 5371.83 samples/sec Loss 8.9931 LearningRate 0.2000 Epoch: 4 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:10:05,176-Speed 5472.80 samples/sec Loss 9.0103 LearningRate 0.1999 Epoch: 4 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:10:12,724-Speed 5427.86 samples/sec Loss 9.0193 LearningRate 0.1999 Epoch: 4 Global Step: 46560 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:10:20,239-Speed 5450.83 samples/sec Loss 9.0336 LearningRate 0.1999 Epoch: 4 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:10:27,827-Speed 5398.85 samples/sec Loss 9.0603 LearningRate 0.1999 Epoch: 4 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:10:35,449-Speed 5374.51 samples/sec Loss 9.0949 LearningRate 0.1998 Epoch: 4 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:10:43,011-Speed 5417.55 samples/sec Loss 9.0929 LearningRate 0.1998 Epoch: 4 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:10:50,577-Speed 5414.52 samples/sec Loss 8.9771 LearningRate 0.1998 Epoch: 4 Global Step: 46610 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:10:58,157-Speed 5404.24 samples/sec Loss 8.9737 LearningRate 0.1998 Epoch: 4 Global Step: 46620 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:11:05,753-Speed 5392.23 samples/sec Loss 9.0470 LearningRate 0.1997 Epoch: 4 Global Step: 46630 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:11:13,302-Speed 5427.03 samples/sec Loss 9.0576 LearningRate 0.1997 Epoch: 4 Global Step: 46640 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:11:20,845-Speed 5431.10 samples/sec Loss 9.0163 LearningRate 0.1997 Epoch: 4 Global Step: 46650 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:11:28,395-Speed 5425.85 samples/sec 
Loss 8.9904 LearningRate 0.1997 Epoch: 4 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:11:35,956-Speed 5417.57 samples/sec Loss 8.9671 LearningRate 0.1996 Epoch: 4 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:11:43,568-Speed 5382.12 samples/sec Loss 9.0210 LearningRate 0.1996 Epoch: 4 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:11:51,125-Speed 5420.76 samples/sec Loss 9.0493 LearningRate 0.1996 Epoch: 4 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:11:58,683-Speed 5420.22 samples/sec Loss 9.0377 LearningRate 0.1996 Epoch: 4 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:12:06,187-Speed 5458.55 samples/sec Loss 9.0787 LearningRate 0.1995 Epoch: 4 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:12:13,713-Speed 5443.43 samples/sec Loss 8.9013 LearningRate 0.1995 Epoch: 4 Global Step: 46720 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:12:21,212-Speed 5462.77 samples/sec Loss 8.9643 LearningRate 0.1995 Epoch: 4 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:12:28,972-Speed 5279.27 samples/sec Loss 8.8744 LearningRate 0.1995 Epoch: 4 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:12:36,591-Speed 5376.46 samples/sec Loss 8.9383 LearningRate 0.1994 Epoch: 4 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:12:44,041-Speed 5498.99 samples/sec Loss 9.0176 LearningRate 0.1994 Epoch: 4 Global Step: 46760 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:12:51,673-Speed 5367.45 samples/sec Loss 9.0427 LearningRate 0.1994 Epoch: 4 Global Step: 46770 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:12:59,127-Speed 5495.60 samples/sec Loss 9.0241 LearningRate 0.1994 Epoch: 4 Global 
Step: 46780 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:06,646-Speed 5448.35 samples/sec Loss 8.9399 LearningRate 0.1993 Epoch: 4 Global Step: 46790 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:14,157-Speed 5454.35 samples/sec Loss 8.9490 LearningRate 0.1993 Epoch: 4 Global Step: 46800 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:21,717-Speed 5418.19 samples/sec Loss 9.0139 LearningRate 0.1993 Epoch: 4 Global Step: 46810 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:29,229-Speed 5453.46 samples/sec Loss 8.9575 LearningRate 0.1993 Epoch: 4 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:36,764-Speed 5436.56 samples/sec Loss 8.9445 LearningRate 0.1992 Epoch: 4 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:44,218-Speed 5496.36 samples/sec Loss 8.9455 LearningRate 0.1992 Epoch: 4 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:51,758-Speed 5433.40 samples/sec Loss 8.9465 LearningRate 0.1992 Epoch: 4 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:13:59,363-Speed 5386.69 samples/sec Loss 8.9903 LearningRate 0.1992 Epoch: 4 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:14:06,881-Speed 5448.76 samples/sec Loss 9.0370 LearningRate 0.1991 Epoch: 4 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:14:14,355-Speed 5481.41 samples/sec Loss 8.9750 LearningRate 0.1991 Epoch: 4 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:14:21,867-Speed 5452.74 samples/sec Loss 8.9696 LearningRate 0.1991 Epoch: 4 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:14:29,435-Speed 5413.47 samples/sec Loss 9.0501 LearningRate 0.1991 Epoch: 4 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 37 
hours Training: 2022-01-08 05:14:36,944-Speed 5454.80 samples/sec Loss 9.0077 LearningRate 0.1990 Epoch: 4 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:14:44,504-Speed 5419.00 samples/sec Loss 8.9246 LearningRate 0.1990 Epoch: 4 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:14:52,194-Speed 5327.21 samples/sec Loss 8.9717 LearningRate 0.1990 Epoch: 4 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:14:59,716-Speed 5446.31 samples/sec Loss 8.9523 LearningRate 0.1990 Epoch: 4 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:07,350-Speed 5365.68 samples/sec Loss 9.0144 LearningRate 0.1989 Epoch: 4 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:15,009-Speed 5348.83 samples/sec Loss 8.9080 LearningRate 0.1989 Epoch: 4 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:22,682-Speed 5339.39 samples/sec Loss 9.0187 LearningRate 0.1989 Epoch: 4 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:30,141-Speed 5491.85 samples/sec Loss 8.9512 LearningRate 0.1989 Epoch: 4 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:37,628-Speed 5471.16 samples/sec Loss 9.0442 LearningRate 0.1988 Epoch: 4 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:45,040-Speed 5526.91 samples/sec Loss 8.9943 LearningRate 0.1988 Epoch: 4 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:15:52,575-Speed 5437.10 samples/sec Loss 9.0814 LearningRate 0.1988 Epoch: 4 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:16:00,055-Speed 5476.84 samples/sec Loss 8.9394 LearningRate 0.1988 Epoch: 4 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:07,666-Speed 5381.82 
samples/sec Loss 9.0237 LearningRate 0.1987 Epoch: 4 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:15,167-Speed 5461.61 samples/sec Loss 8.9350 LearningRate 0.1987 Epoch: 4 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:22,724-Speed 5421.09 samples/sec Loss 8.9998 LearningRate 0.1987 Epoch: 4 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:30,291-Speed 5413.55 samples/sec Loss 9.0071 LearningRate 0.1987 Epoch: 4 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:37,789-Speed 5463.57 samples/sec Loss 9.0116 LearningRate 0.1986 Epoch: 4 Global Step: 47070 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:45,279-Speed 5468.98 samples/sec Loss 8.9489 LearningRate 0.1986 Epoch: 4 Global Step: 47080 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:16:52,843-Speed 5415.55 samples/sec Loss 8.9482 LearningRate 0.1986 Epoch: 4 Global Step: 47090 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:17:00,392-Speed 5426.89 samples/sec Loss 9.0427 LearningRate 0.1986 Epoch: 4 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:17:08,108-Speed 5309.17 samples/sec Loss 8.9716 LearningRate 0.1985 Epoch: 4 Global Step: 47110 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:17:15,556-Speed 5500.45 samples/sec Loss 9.0411 LearningRate 0.1985 Epoch: 4 Global Step: 47120 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:17:23,021-Speed 5487.47 samples/sec Loss 8.9308 LearningRate 0.1985 Epoch: 4 Global Step: 47130 Fp16 Grad Scale: 262144 Required: 37 hours Training: 2022-01-08 05:17:30,556-Speed 5436.43 samples/sec Loss 9.0270 LearningRate 0.1985 Epoch: 4 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:17:38,046-Speed 5469.45 samples/sec Loss 9.0179 LearningRate 0.1984 
Epoch: 4 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:17:45,527-Speed 5476.24 samples/sec Loss 9.0492 LearningRate 0.1984 Epoch: 4 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:17:53,080-Speed 5424.04 samples/sec Loss 8.9572 LearningRate 0.1984 Epoch: 4 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:00,637-Speed 5420.76 samples/sec Loss 9.0055 LearningRate 0.1984 Epoch: 4 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:08,160-Speed 5445.54 samples/sec Loss 8.9965 LearningRate 0.1983 Epoch: 4 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:15,683-Speed 5445.15 samples/sec Loss 8.9170 LearningRate 0.1983 Epoch: 4 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:23,176-Speed 5467.29 samples/sec Loss 8.9288 LearningRate 0.1983 Epoch: 4 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:30,672-Speed 5464.81 samples/sec Loss 8.8984 LearningRate 0.1983 Epoch: 4 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:38,276-Speed 5387.15 samples/sec Loss 8.8877 LearningRate 0.1982 Epoch: 4 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:45,747-Speed 5483.72 samples/sec Loss 8.9775 LearningRate 0.1982 Epoch: 4 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:18:53,256-Speed 5455.56 samples/sec Loss 9.0123 LearningRate 0.1982 Epoch: 4 Global Step: 47250 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:00,759-Speed 5459.31 samples/sec Loss 8.9450 LearningRate 0.1982 Epoch: 4 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:08,275-Speed 5450.49 samples/sec Loss 8.9730 LearningRate 0.1981 Epoch: 4 Global Step: 47270 Fp16 Grad 
Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:15,781-Speed 5458.39 samples/sec Loss 8.9012 LearningRate 0.1981 Epoch: 4 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:23,332-Speed 5424.44 samples/sec Loss 8.9105 LearningRate 0.1981 Epoch: 4 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:30,913-Speed 5403.54 samples/sec Loss 9.0185 LearningRate 0.1981 Epoch: 4 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:38,433-Speed 5448.17 samples/sec Loss 8.9451 LearningRate 0.1980 Epoch: 4 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:45,973-Speed 5433.04 samples/sec Loss 8.9448 LearningRate 0.1980 Epoch: 4 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:19:53,502-Speed 5440.54 samples/sec Loss 8.9343 LearningRate 0.1980 Epoch: 4 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:20:01,095-Speed 5395.17 samples/sec Loss 8.9091 LearningRate 0.1980 Epoch: 4 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:20:08,653-Speed 5420.35 samples/sec Loss 8.9773 LearningRate 0.1979 Epoch: 4 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:20:16,157-Speed 5458.68 samples/sec Loss 8.9901 LearningRate 0.1979 Epoch: 4 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:20:23,708-Speed 5425.84 samples/sec Loss 8.9749 LearningRate 0.1979 Epoch: 4 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:20:31,325-Speed 5377.90 samples/sec Loss 9.0412 LearningRate 0.1979 Epoch: 4 Global Step: 47380 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:20:38,945-Speed 5375.58 samples/sec Loss 8.9371 LearningRate 0.1978 Epoch: 4 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 37 hours Training: 
2022-01-08 05:20:46,427-Speed 5475.24 samples/sec Loss 8.9575 LearningRate 0.1978 Epoch: 4 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:20:53,997-Speed 5411.68 samples/sec Loss 8.9476 LearningRate 0.1978 Epoch: 4 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:21:01,600-Speed 5388.10 samples/sec Loss 8.9813 LearningRate 0.1978 Epoch: 4 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:21:09,083-Speed 5474.61 samples/sec Loss 9.0075 LearningRate 0.1977 Epoch: 4 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:21:16,596-Speed 5452.73 samples/sec Loss 8.9479 LearningRate 0.1977 Epoch: 4 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:21:24,250-Speed 5351.64 samples/sec Loss 8.9516 LearningRate 0.1977 Epoch: 4 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:21:31,777-Speed 5442.81 samples/sec Loss 8.9824 LearningRate 0.1977 Epoch: 4 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:21:39,305-Speed 5441.61 samples/sec Loss 8.9345 LearningRate 0.1976 Epoch: 4 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:21:46,782-Speed 5478.56 samples/sec Loss 8.9829 LearningRate 0.1976 Epoch: 4 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:21:54,288-Speed 5458.10 samples/sec Loss 8.9750 LearningRate 0.1976 Epoch: 4 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:22:01,837-Speed 5426.78 samples/sec Loss 8.8747 LearningRate 0.1976 Epoch: 4 Global Step: 47500 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:22:09,350-Speed 5452.41 samples/sec Loss 8.9245 LearningRate 0.1975 Epoch: 4 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:22:16,808-Speed 5493.07 
samples/sec Loss 8.9845 LearningRate 0.1975 Epoch: 4 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 37 hours Training: 2022-01-08 05:22:24,310-Speed 5460.89 samples/sec Loss 8.9408 LearningRate 0.1975 Epoch: 4 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 37 hours Training: 2022-01-08 05:22:31,795-Speed 5472.20 samples/sec Loss 9.0423 LearningRate 0.1975 Epoch: 4 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:22:39,373-Speed 5406.54 samples/sec Loss 8.9239 LearningRate 0.1974 Epoch: 4 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:22:46,945-Speed 5409.63 samples/sec Loss 8.9720 LearningRate 0.1974 Epoch: 4 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:22:54,605-Speed 5348.59 samples/sec Loss 8.9693 LearningRate 0.1974 Epoch: 4 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:23:02,161-Speed 5420.93 samples/sec Loss 8.9131 LearningRate 0.1974 Epoch: 4 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:23:09,739-Speed 5405.93 samples/sec Loss 8.9277 LearningRate 0.1974 Epoch: 4 Global Step: 47590 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:23:17,314-Speed 5408.40 samples/sec Loss 8.9443 LearningRate 0.1973 Epoch: 4 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:23:24,847-Speed 5438.21 samples/sec Loss 8.9640 LearningRate 0.1973 Epoch: 4 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:23:32,540-Speed 5324.94 samples/sec Loss 8.9561 LearningRate 0.1973 Epoch: 4 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:23:40,102-Speed 5417.51 samples/sec Loss 8.9782 LearningRate 0.1973 Epoch: 4 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:23:47,746-Speed 5359.05 samples/sec Loss 8.9701 LearningRate 0.1972 Epoch: 4 
Global Step: 47640 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:23:55,392-Speed 5357.92 samples/sec Loss 8.9755 LearningRate 0.1972 Epoch: 4 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:24:03,077-Speed 5330.54 samples/sec Loss 8.9887 LearningRate 0.1972 Epoch: 4 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:24:10,679-Speed 5388.51 samples/sec Loss 9.0099 LearningRate 0.1972 Epoch: 4 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:24:18,265-Speed 5400.03 samples/sec Loss 8.9916 LearningRate 0.1971 Epoch: 4 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:24:25,741-Speed 5480.42 samples/sec Loss 8.9323 LearningRate 0.1971 Epoch: 4 Global Step: 47690 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:24:33,334-Speed 5395.08 samples/sec Loss 8.9506 LearningRate 0.1971 Epoch: 4 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:24:40,960-Speed 5371.44 samples/sec Loss 8.9143 LearningRate 0.1971 Epoch: 4 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:24:48,567-Speed 5385.13 samples/sec Loss 8.9950 LearningRate 0.1970 Epoch: 4 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:24:56,125-Speed 5420.81 samples/sec Loss 8.8957 LearningRate 0.1970 Epoch: 4 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:25:03,704-Speed 5404.60 samples/sec Loss 8.9279 LearningRate 0.1970 Epoch: 4 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:25:11,290-Speed 5400.26 samples/sec Loss 8.9634 LearningRate 0.1970 Epoch: 4 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:25:18,860-Speed 5411.82 samples/sec Loss 8.9156 LearningRate 0.1969 Epoch: 4 Global Step: 47760 Fp16 Grad Scale: 65536 
Required: 36 hours Training: 2022-01-08 05:25:26,450-Speed 5397.42 samples/sec Loss 8.8856 LearningRate 0.1969 Epoch: 4 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:25:34,085-Speed 5365.55 samples/sec Loss 8.9353 LearningRate 0.1969 Epoch: 4 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:25:41,647-Speed 5416.69 samples/sec Loss 8.9335 LearningRate 0.1969 Epoch: 4 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:25:49,486-Speed 5226.08 samples/sec Loss 8.9433 LearningRate 0.1968 Epoch: 4 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:25:57,005-Speed 5448.60 samples/sec Loss 8.9185 LearningRate 0.1968 Epoch: 4 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:04,788-Speed 5263.26 samples/sec Loss 8.8747 LearningRate 0.1968 Epoch: 4 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:12,268-Speed 5476.62 samples/sec Loss 8.9112 LearningRate 0.1968 Epoch: 4 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:19,846-Speed 5405.74 samples/sec Loss 8.9429 LearningRate 0.1967 Epoch: 4 Global Step: 47840 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:27,392-Speed 5429.07 samples/sec Loss 8.8934 LearningRate 0.1967 Epoch: 4 Global Step: 47850 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:34,958-Speed 5414.09 samples/sec Loss 8.9171 LearningRate 0.1967 Epoch: 4 Global Step: 47860 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:42,454-Speed 5464.95 samples/sec Loss 8.9192 LearningRate 0.1967 Epoch: 4 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:26:50,029-Speed 5407.81 samples/sec Loss 8.8783 LearningRate 0.1966 Epoch: 4 Global Step: 47880 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
05:26:57,586-Speed 5421.38 samples/sec Loss 8.8980 LearningRate 0.1966 Epoch: 4 Global Step: 47890 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:05,105-Speed 5447.86 samples/sec Loss 9.0074 LearningRate 0.1966 Epoch: 4 Global Step: 47900 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:27:12,660-Speed 5422.44 samples/sec Loss 8.8966 LearningRate 0.1966 Epoch: 4 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:20,222-Speed 5416.62 samples/sec Loss 8.9073 LearningRate 0.1965 Epoch: 4 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:27,908-Speed 5330.22 samples/sec Loss 8.8762 LearningRate 0.1965 Epoch: 4 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:35,568-Speed 5348.13 samples/sec Loss 8.9293 LearningRate 0.1965 Epoch: 4 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:43,136-Speed 5412.93 samples/sec Loss 8.9184 LearningRate 0.1965 Epoch: 4 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:50,629-Speed 5466.90 samples/sec Loss 8.8575 LearningRate 0.1964 Epoch: 4 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:27:58,188-Speed 5420.34 samples/sec Loss 8.8723 LearningRate 0.1964 Epoch: 4 Global Step: 47970 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:28:05,675-Speed 5471.68 samples/sec Loss 8.9812 LearningRate 0.1964 Epoch: 4 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:28:13,162-Speed 5470.64 samples/sec Loss 8.9927 LearningRate 0.1964 Epoch: 4 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:28:20,775-Speed 5380.94 samples/sec Loss 8.9173 LearningRate 0.1963 Epoch: 4 Global Step: 48000 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:29:04,874-[lfw][48000]XNorm: 23.765317 
Training: 2022-01-08 05:29:04,875-[lfw][48000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-01-08 05:29:04,875-[lfw][48000]Accuracy-Highest: 0.99817 Training: 2022-01-08 05:29:56,633-[cfp_fp][48000]XNorm: 21.289137 Training: 2022-01-08 05:29:56,634-[cfp_fp][48000]Accuracy-Flip: 0.98186+-0.00535 Training: 2022-01-08 05:29:56,635-[cfp_fp][48000]Accuracy-Highest: 0.98600 Training: 2022-01-08 05:30:42,416-[agedb_30][48000]XNorm: 23.444254 Training: 2022-01-08 05:30:42,418-[agedb_30][48000]Accuracy-Flip: 0.96933+-0.00978 Training: 2022-01-08 05:30:42,418-[agedb_30][48000]Accuracy-Highest: 0.97250 Training: 2022-01-08 05:30:50,115-Speed 274.28 samples/sec Loss 8.9716 LearningRate 0.1963 Epoch: 4 Global Step: 48010 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:30:57,684-Speed 5412.83 samples/sec Loss 8.8830 LearningRate 0.1963 Epoch: 4 Global Step: 48020 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:05,326-Speed 5361.23 samples/sec Loss 8.9810 LearningRate 0.1963 Epoch: 4 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:12,859-Speed 5438.69 samples/sec Loss 8.8828 LearningRate 0.1962 Epoch: 4 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:20,574-Speed 5310.25 samples/sec Loss 8.9359 LearningRate 0.1962 Epoch: 4 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:28,225-Speed 5355.15 samples/sec Loss 8.8929 LearningRate 0.1962 Epoch: 4 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:35,853-Speed 5371.00 samples/sec Loss 8.8715 LearningRate 0.1962 Epoch: 4 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:43,397-Speed 5430.08 samples/sec Loss 8.8165 LearningRate 0.1961 Epoch: 4 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:50,900-Speed 5460.16 samples/sec Loss 8.9555 LearningRate 0.1961 
Epoch: 4 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:31:58,522-Speed 5374.17 samples/sec Loss 8.8946 LearningRate 0.1961 Epoch: 4 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:32:06,148-Speed 5371.68 samples/sec Loss 8.8763 LearningRate 0.1961 Epoch: 4 Global Step: 48110 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:32:13,843-Speed 5324.25 samples/sec Loss 8.8600 LearningRate 0.1960 Epoch: 4 Global Step: 48120 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:32:21,325-Speed 5475.45 samples/sec Loss 8.9130 LearningRate 0.1960 Epoch: 4 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:32:28,877-Speed 5424.53 samples/sec Loss 8.8417 LearningRate 0.1960 Epoch: 4 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:32:36,391-Speed 5451.41 samples/sec Loss 8.9436 LearningRate 0.1960 Epoch: 4 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:32:43,872-Speed 5475.93 samples/sec Loss 8.8817 LearningRate 0.1959 Epoch: 4 Global Step: 48160 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:32:51,362-Speed 5469.95 samples/sec Loss 8.9198 LearningRate 0.1959 Epoch: 4 Global Step: 48170 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:32:58,879-Speed 5449.46 samples/sec Loss 8.9624 LearningRate 0.1959 Epoch: 4 Global Step: 48180 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:06,430-Speed 5424.98 samples/sec Loss 8.8484 LearningRate 0.1959 Epoch: 4 Global Step: 48190 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:13,965-Speed 5436.72 samples/sec Loss 8.9669 LearningRate 0.1958 Epoch: 4 Global Step: 48200 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:21,511-Speed 5429.39 samples/sec Loss 8.8924 LearningRate 0.1958 Epoch: 4 Global Step: 48210 Fp16 Grad Scale: 32768 
Required: 36 hours Training: 2022-01-08 05:33:28,987-Speed 5479.65 samples/sec Loss 8.9331 LearningRate 0.1958 Epoch: 4 Global Step: 48220 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:36,465-Speed 5477.96 samples/sec Loss 8.9187 LearningRate 0.1958 Epoch: 4 Global Step: 48230 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:43,997-Speed 5438.97 samples/sec Loss 8.8658 LearningRate 0.1957 Epoch: 4 Global Step: 48240 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:51,526-Speed 5440.69 samples/sec Loss 8.9288 LearningRate 0.1957 Epoch: 4 Global Step: 48250 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 05:33:59,042-Speed 5450.78 samples/sec Loss 8.8641 LearningRate 0.1957 Epoch: 4 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:06,598-Speed 5421.23 samples/sec Loss 8.8589 LearningRate 0.1957 Epoch: 4 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:14,185-Speed 5399.60 samples/sec Loss 8.8182 LearningRate 0.1957 Epoch: 4 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:21,679-Speed 5466.54 samples/sec Loss 8.9036 LearningRate 0.1956 Epoch: 4 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:29,157-Speed 5477.72 samples/sec Loss 8.9461 LearningRate 0.1956 Epoch: 4 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:36,674-Speed 5449.43 samples/sec Loss 8.9503 LearningRate 0.1956 Epoch: 4 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:44,160-Speed 5472.24 samples/sec Loss 8.8857 LearningRate 0.1956 Epoch: 4 Global Step: 48320 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:34:51,675-Speed 5451.61 samples/sec Loss 8.8858 LearningRate 0.1955 Epoch: 4 Global Step: 48330 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 
05:34:59,223-Speed 5427.40 samples/sec Loss 8.8248 LearningRate 0.1955 Epoch: 4 Global Step: 48340 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:35:06,696-Speed 5481.58 samples/sec Loss 8.8817 LearningRate 0.1955 Epoch: 4 Global Step: 48350 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:35:14,231-Speed 5436.55 samples/sec Loss 8.8902 LearningRate 0.1955 Epoch: 4 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:35:21,755-Speed 5445.09 samples/sec Loss 8.8277 LearningRate 0.1954 Epoch: 4 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:35:29,348-Speed 5395.50 samples/sec Loss 8.8554 LearningRate 0.1954 Epoch: 4 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:35:37,055-Speed 5315.05 samples/sec Loss 8.8291 LearningRate 0.1954 Epoch: 4 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:35:44,539-Speed 5473.49 samples/sec Loss 8.9170 LearningRate 0.1954 Epoch: 4 Global Step: 48400 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:35:52,406-Speed 5207.74 samples/sec Loss 8.8559 LearningRate 0.1953 Epoch: 4 Global Step: 48410 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:35:59,947-Speed 5432.59 samples/sec Loss 8.8609 LearningRate 0.1953 Epoch: 4 Global Step: 48420 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:07,457-Speed 5454.72 samples/sec Loss 8.8786 LearningRate 0.1953 Epoch: 4 Global Step: 48430 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:14,998-Speed 5431.83 samples/sec Loss 8.9135 LearningRate 0.1953 Epoch: 4 Global Step: 48440 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:22,523-Speed 5444.48 samples/sec Loss 8.9038 LearningRate 0.1952 Epoch: 4 Global Step: 48450 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:30,114-Speed 5396.91 samples/sec Loss 
8.8671 LearningRate 0.1952 Epoch: 4 Global Step: 48460 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:36:37,555-Speed 5505.63 samples/sec Loss 8.8539 LearningRate 0.1952 Epoch: 4 Global Step: 48470 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:45,024-Speed 5484.51 samples/sec Loss 8.8442 LearningRate 0.1952 Epoch: 4 Global Step: 48480 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:52,492-Speed 5485.56 samples/sec Loss 8.8270 LearningRate 0.1951 Epoch: 4 Global Step: 48490 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:36:59,986-Speed 5466.39 samples/sec Loss 8.8244 LearningRate 0.1951 Epoch: 4 Global Step: 48500 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:07,674-Speed 5329.01 samples/sec Loss 8.8991 LearningRate 0.1951 Epoch: 4 Global Step: 48510 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:15,274-Speed 5390.05 samples/sec Loss 8.9630 LearningRate 0.1951 Epoch: 4 Global Step: 48520 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:22,905-Speed 5367.92 samples/sec Loss 8.8246 LearningRate 0.1950 Epoch: 4 Global Step: 48530 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:30,433-Speed 5442.63 samples/sec Loss 8.8610 LearningRate 0.1950 Epoch: 4 Global Step: 48540 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:37,942-Speed 5455.29 samples/sec Loss 8.8279 LearningRate 0.1950 Epoch: 4 Global Step: 48550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:45,607-Speed 5344.63 samples/sec Loss 8.8663 LearningRate 0.1950 Epoch: 4 Global Step: 48560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:37:53,251-Speed 5359.15 samples/sec Loss 8.8732 LearningRate 0.1949 Epoch: 4 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:00,912-Speed 5346.95 samples/sec Loss 8.8312 LearningRate 0.1949 Epoch: 4 Global 
Step: 48580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:09,025-Speed 5049.52 samples/sec Loss 8.9319 LearningRate 0.1949 Epoch: 4 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:16,579-Speed 5422.80 samples/sec Loss 8.9260 LearningRate 0.1949 Epoch: 4 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:24,108-Speed 5440.98 samples/sec Loss 8.8625 LearningRate 0.1948 Epoch: 4 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:31,678-Speed 5411.61 samples/sec Loss 8.8987 LearningRate 0.1948 Epoch: 4 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:39,169-Speed 5468.53 samples/sec Loss 8.9397 LearningRate 0.1948 Epoch: 4 Global Step: 48630 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:46,690-Speed 5446.57 samples/sec Loss 8.8521 LearningRate 0.1948 Epoch: 4 Global Step: 48640 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:38:54,229-Speed 5433.88 samples/sec Loss 8.8504 LearningRate 0.1947 Epoch: 4 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:39:01,678-Speed 5499.68 samples/sec Loss 8.8730 LearningRate 0.1947 Epoch: 4 Global Step: 48660 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:39:09,161-Speed 5474.50 samples/sec Loss 8.8791 LearningRate 0.1947 Epoch: 4 Global Step: 48670 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:39:16,668-Speed 5456.96 samples/sec Loss 8.7828 LearningRate 0.1947 Epoch: 4 Global Step: 48680 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:39:24,259-Speed 5396.68 samples/sec Loss 8.8606 LearningRate 0.1946 Epoch: 4 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:39:31,839-Speed 5404.89 samples/sec Loss 8.9162 LearningRate 0.1946 Epoch: 4 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 
36 hours Training: 2022-01-08 05:39:39,454-Speed 5379.33 samples/sec Loss 8.8479 LearningRate 0.1946 Epoch: 4 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:39:47,031-Speed 5406.42 samples/sec Loss 8.8637 LearningRate 0.1946 Epoch: 4 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:39:54,664-Speed 5367.00 samples/sec Loss 8.7631 LearningRate 0.1945 Epoch: 4 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:40:02,272-Speed 5385.29 samples/sec Loss 8.8450 LearningRate 0.1945 Epoch: 4 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:40:09,943-Speed 5339.94 samples/sec Loss 8.8685 LearningRate 0.1945 Epoch: 4 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:40:17,517-Speed 5408.31 samples/sec Loss 8.9201 LearningRate 0.1945 Epoch: 4 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:40:25,048-Speed 5439.63 samples/sec Loss 8.8928 LearningRate 0.1944 Epoch: 4 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:40:32,593-Speed 5429.92 samples/sec Loss 8.8915 LearningRate 0.1944 Epoch: 4 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:40:40,100-Speed 5457.29 samples/sec Loss 8.9232 LearningRate 0.1944 Epoch: 4 Global Step: 48790 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:40:47,557-Speed 5492.73 samples/sec Loss 8.8146 LearningRate 0.1944 Epoch: 4 Global Step: 48800 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:40:55,042-Speed 5472.76 samples/sec Loss 8.8668 LearningRate 0.1943 Epoch: 4 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:41:02,578-Speed 5436.78 samples/sec Loss 8.8275 LearningRate 0.1943 Epoch: 4 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:41:10,039-Speed 
5490.61 samples/sec Loss 8.9367 LearningRate 0.1943 Epoch: 4 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:41:17,523-Speed 5473.15 samples/sec Loss 8.8870 LearningRate 0.1943 Epoch: 4 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:41:25,144-Speed 5375.50 samples/sec Loss 8.7917 LearningRate 0.1943 Epoch: 4 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:41:32,672-Speed 5441.68 samples/sec Loss 8.8524 LearningRate 0.1942 Epoch: 4 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:41:40,258-Speed 5400.50 samples/sec Loss 8.9470 LearningRate 0.1942 Epoch: 4 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:41:47,788-Speed 5439.63 samples/sec Loss 8.8301 LearningRate 0.1942 Epoch: 4 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:41:55,290-Speed 5461.05 samples/sec Loss 8.8204 LearningRate 0.1942 Epoch: 4 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:02,793-Speed 5460.42 samples/sec Loss 8.8507 LearningRate 0.1941 Epoch: 4 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:10,303-Speed 5455.39 samples/sec Loss 8.8523 LearningRate 0.1941 Epoch: 4 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:17,726-Speed 5518.45 samples/sec Loss 8.9080 LearningRate 0.1941 Epoch: 4 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:25,286-Speed 5418.64 samples/sec Loss 8.8148 LearningRate 0.1941 Epoch: 4 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:32,872-Speed 5399.98 samples/sec Loss 8.8422 LearningRate 0.1940 Epoch: 4 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:40,367-Speed 5465.93 samples/sec Loss 8.8665 LearningRate 0.1940 
Epoch: 4 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:42:47,863-Speed 5465.02 samples/sec Loss 8.8105 LearningRate 0.1940 Epoch: 4 Global Step: 48960 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:42:55,420-Speed 5420.82 samples/sec Loss 8.7766 LearningRate 0.1940 Epoch: 4 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:02,971-Speed 5424.74 samples/sec Loss 8.8420 LearningRate 0.1939 Epoch: 4 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:10,436-Speed 5487.76 samples/sec Loss 8.9069 LearningRate 0.1939 Epoch: 4 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:17,974-Speed 5435.02 samples/sec Loss 8.8545 LearningRate 0.1939 Epoch: 4 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:25,540-Speed 5413.99 samples/sec Loss 8.8742 LearningRate 0.1939 Epoch: 4 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:33,066-Speed 5442.93 samples/sec Loss 8.9356 LearningRate 0.1938 Epoch: 4 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:44,215-Speed 3674.24 samples/sec Loss 8.7994 LearningRate 0.1938 Epoch: 4 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:51,766-Speed 5432.05 samples/sec Loss 8.8206 LearningRate 0.1938 Epoch: 4 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:43:59,292-Speed 5442.59 samples/sec Loss 8.8505 LearningRate 0.1938 Epoch: 4 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:44:06,722-Speed 5514.02 samples/sec Loss 8.8292 LearningRate 0.1937 Epoch: 4 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:44:14,170-Speed 5500.43 samples/sec Loss 8.8382 LearningRate 0.1937 Epoch: 4 Global Step: 49070 Fp16 Grad Scale: 131072 
Required: 36 hours Training: 2022-01-08 05:44:21,628-Speed 5492.50 samples/sec Loss 8.8495 LearningRate 0.1937 Epoch: 4 Global Step: 49080 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:44:29,131-Speed 5459.80 samples/sec Loss 8.7434 LearningRate 0.1937 Epoch: 4 Global Step: 49090 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:44:36,633-Speed 5461.32 samples/sec Loss 8.8121 LearningRate 0.1936 Epoch: 4 Global Step: 49100 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:44:44,148-Speed 5451.32 samples/sec Loss 8.7646 LearningRate 0.1936 Epoch: 4 Global Step: 49110 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:44:51,710-Speed 5416.66 samples/sec Loss 8.7819 LearningRate 0.1936 Epoch: 4 Global Step: 49120 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:44:59,252-Speed 5432.06 samples/sec Loss 8.8342 LearningRate 0.1936 Epoch: 4 Global Step: 49130 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:45:06,819-Speed 5413.43 samples/sec Loss 8.8146 LearningRate 0.1935 Epoch: 4 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:45:14,351-Speed 5438.99 samples/sec Loss 8.7487 LearningRate 0.1935 Epoch: 4 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:45:21,880-Speed 5441.46 samples/sec Loss 8.8986 LearningRate 0.1935 Epoch: 4 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:45:29,389-Speed 5454.86 samples/sec Loss 8.8265 LearningRate 0.1935 Epoch: 4 Global Step: 49170 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:45:36,831-Speed 5504.93 samples/sec Loss 8.8584 LearningRate 0.1934 Epoch: 4 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:45:44,304-Speed 5482.52 samples/sec Loss 8.7457 LearningRate 0.1934 Epoch: 4 Global Step: 49190 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
05:45:51,953-Speed 5355.10 samples/sec Loss 8.8037 LearningRate 0.1934 Epoch: 4 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:45:59,450-Speed 5464.24 samples/sec Loss 8.8136 LearningRate 0.1934 Epoch: 4 Global Step: 49210 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:07,049-Speed 5391.48 samples/sec Loss 8.7730 LearningRate 0.1933 Epoch: 4 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:14,666-Speed 5377.75 samples/sec Loss 8.8015 LearningRate 0.1933 Epoch: 4 Global Step: 49230 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:22,222-Speed 5421.45 samples/sec Loss 8.7771 LearningRate 0.1933 Epoch: 4 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:29,936-Speed 5310.98 samples/sec Loss 8.8015 LearningRate 0.1933 Epoch: 4 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:37,456-Speed 5447.91 samples/sec Loss 8.9008 LearningRate 0.1932 Epoch: 4 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:45,072-Speed 5378.55 samples/sec Loss 8.8486 LearningRate 0.1932 Epoch: 4 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:46:52,736-Speed 5345.53 samples/sec Loss 8.8139 LearningRate 0.1932 Epoch: 4 Global Step: 49280 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:47:00,256-Speed 5447.14 samples/sec Loss 8.8719 LearningRate 0.1932 Epoch: 4 Global Step: 49290 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:47:07,896-Speed 5362.20 samples/sec Loss 8.7366 LearningRate 0.1931 Epoch: 4 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:47:15,398-Speed 5461.01 samples/sec Loss 8.8251 LearningRate 0.1931 Epoch: 4 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:47:22,932-Speed 5437.41 samples/sec Loss 
8.8211 LearningRate 0.1931 Epoch: 4 Global Step: 49320 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:47:30,408-Speed 5479.73 samples/sec Loss 8.8550 LearningRate 0.1931 Epoch: 4 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:47:37,886-Speed 5477.78 samples/sec Loss 8.7970 LearningRate 0.1931 Epoch: 4 Global Step: 49340 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:47:45,458-Speed 5410.18 samples/sec Loss 8.9414 LearningRate 0.1930 Epoch: 4 Global Step: 49350 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:47:53,008-Speed 5426.08 samples/sec Loss 8.7937 LearningRate 0.1930 Epoch: 4 Global Step: 49360 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:00,608-Speed 5390.21 samples/sec Loss 8.7576 LearningRate 0.1930 Epoch: 4 Global Step: 49370 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:08,143-Speed 5436.33 samples/sec Loss 8.8083 LearningRate 0.1930 Epoch: 4 Global Step: 49380 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:15,660-Speed 5450.15 samples/sec Loss 8.7965 LearningRate 0.1929 Epoch: 4 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:23,189-Speed 5441.37 samples/sec Loss 8.8791 LearningRate 0.1929 Epoch: 4 Global Step: 49400 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:48:30,776-Speed 5398.85 samples/sec Loss 8.8572 LearningRate 0.1929 Epoch: 4 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:38,250-Speed 5481.47 samples/sec Loss 8.8660 LearningRate 0.1929 Epoch: 4 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:45,778-Speed 5441.83 samples/sec Loss 8.8712 LearningRate 0.1928 Epoch: 4 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:48:53,272-Speed 5466.32 samples/sec Loss 8.8682 LearningRate 0.1928 Epoch: 4 Global 
Step: 49440 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:00,784-Speed 5453.19 samples/sec Loss 8.8309 LearningRate 0.1928 Epoch: 4 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:08,307-Speed 5445.69 samples/sec Loss 8.8744 LearningRate 0.1928 Epoch: 4 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:15,768-Speed 5490.41 samples/sec Loss 8.8392 LearningRate 0.1927 Epoch: 4 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:23,323-Speed 5422.91 samples/sec Loss 8.7890 LearningRate 0.1927 Epoch: 4 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:30,858-Speed 5436.39 samples/sec Loss 8.7407 LearningRate 0.1927 Epoch: 4 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:38,409-Speed 5424.88 samples/sec Loss 8.8112 LearningRate 0.1927 Epoch: 4 Global Step: 49500 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:49:45,975-Speed 5414.48 samples/sec Loss 8.7811 LearningRate 0.1926 Epoch: 4 Global Step: 49510 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:49:53,466-Speed 5469.11 samples/sec Loss 8.8016 LearningRate 0.1926 Epoch: 4 Global Step: 49520 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:50:00,951-Speed 5473.19 samples/sec Loss 8.7982 LearningRate 0.1926 Epoch: 4 Global Step: 49530 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:50:08,440-Speed 5470.08 samples/sec Loss 8.8404 LearningRate 0.1926 Epoch: 4 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:50:15,851-Speed 5527.64 samples/sec Loss 8.8049 LearningRate 0.1925 Epoch: 4 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:50:23,344-Speed 5467.37 samples/sec Loss 8.8109 LearningRate 0.1925 Epoch: 4 Global Step: 49560 Fp16 Grad Scale: 131072 
Required: 36 hours Training: 2022-01-08 05:50:30,866-Speed 5445.93 samples/sec Loss 8.7420 LearningRate 0.1925 Epoch: 4 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:50:38,376-Speed 5454.68 samples/sec Loss 8.7355 LearningRate 0.1925 Epoch: 4 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:50:45,869-Speed 5467.21 samples/sec Loss 8.7861 LearningRate 0.1924 Epoch: 4 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:50:53,364-Speed 5466.09 samples/sec Loss 8.8631 LearningRate 0.1924 Epoch: 4 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:51:00,889-Speed 5443.62 samples/sec Loss 8.8041 LearningRate 0.1924 Epoch: 4 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:51:08,447-Speed 5420.62 samples/sec Loss 8.8158 LearningRate 0.1924 Epoch: 4 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:51:16,383-Speed 5162.05 samples/sec Loss 8.8361 LearningRate 0.1923 Epoch: 4 Global Step: 49630 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:51:24,115-Speed 5298.33 samples/sec Loss 8.8009 LearningRate 0.1923 Epoch: 4 Global Step: 49640 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:51:31,635-Speed 5446.92 samples/sec Loss 8.7941 LearningRate 0.1923 Epoch: 4 Global Step: 49650 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:51:39,110-Speed 5480.57 samples/sec Loss 8.7455 LearningRate 0.1923 Epoch: 4 Global Step: 49660 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:51:46,634-Speed 5444.22 samples/sec Loss 8.8058 LearningRate 0.1922 Epoch: 4 Global Step: 49670 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:51:54,215-Speed 5404.24 samples/sec Loss 8.7427 LearningRate 0.1922 Epoch: 4 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
05:52:01,735-Speed 5447.76 samples/sec Loss 8.7664 LearningRate 0.1922 Epoch: 4 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:09,272-Speed 5434.76 samples/sec Loss 8.7485 LearningRate 0.1922 Epoch: 4 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:16,831-Speed 5419.15 samples/sec Loss 8.7485 LearningRate 0.1921 Epoch: 4 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:24,351-Speed 5448.46 samples/sec Loss 8.8098 LearningRate 0.1921 Epoch: 4 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:31,794-Speed 5503.22 samples/sec Loss 8.7571 LearningRate 0.1921 Epoch: 4 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:39,311-Speed 5449.29 samples/sec Loss 8.8158 LearningRate 0.1921 Epoch: 4 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:46,841-Speed 5440.25 samples/sec Loss 8.8008 LearningRate 0.1921 Epoch: 4 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:52:54,425-Speed 5402.31 samples/sec Loss 8.8965 LearningRate 0.1920 Epoch: 4 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:53:01,943-Speed 5448.89 samples/sec Loss 8.8386 LearningRate 0.1920 Epoch: 4 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:53:09,529-Speed 5400.14 samples/sec Loss 8.8246 LearningRate 0.1920 Epoch: 4 Global Step: 49780 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:53:17,122-Speed 5394.94 samples/sec Loss 8.7891 LearningRate 0.1920 Epoch: 4 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:53:24,657-Speed 5436.86 samples/sec Loss 8.7256 LearningRate 0.1919 Epoch: 4 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:53:32,167-Speed 5454.91 samples/sec Loss 
8.7625 LearningRate 0.1919 Epoch: 4 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:53:39,680-Speed 5452.22 samples/sec Loss 8.7749 LearningRate 0.1919 Epoch: 4 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:53:47,126-Speed 5502.11 samples/sec Loss 8.7610 LearningRate 0.1919 Epoch: 4 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:53:54,673-Speed 5427.69 samples/sec Loss 8.8460 LearningRate 0.1918 Epoch: 4 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:54:02,194-Speed 5447.40 samples/sec Loss 8.8041 LearningRate 0.1918 Epoch: 4 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:54:09,717-Speed 5445.43 samples/sec Loss 8.7335 LearningRate 0.1918 Epoch: 4 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:54:17,189-Speed 5482.33 samples/sec Loss 8.7881 LearningRate 0.1918 Epoch: 4 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:54:24,769-Speed 5404.36 samples/sec Loss 8.7637 LearningRate 0.1917 Epoch: 4 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:54:32,369-Speed 5390.48 samples/sec Loss 8.7158 LearningRate 0.1917 Epoch: 4 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 05:54:39,917-Speed 5427.31 samples/sec Loss 8.8255 LearningRate 0.1917 Epoch: 4 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:54:47,509-Speed 5395.82 samples/sec Loss 8.7053 LearningRate 0.1917 Epoch: 4 Global Step: 49910 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:54:55,060-Speed 5424.94 samples/sec Loss 8.7490 LearningRate 0.1916 Epoch: 4 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:02,566-Speed 5457.45 samples/sec Loss 8.7831 LearningRate 0.1916 Epoch: 4 Global Step: 
49930 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:10,047-Speed 5476.64 samples/sec Loss 8.8007 LearningRate 0.1916 Epoch: 4 Global Step: 49940 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:17,573-Speed 5442.75 samples/sec Loss 8.7293 LearningRate 0.1916 Epoch: 4 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:25,077-Speed 5458.85 samples/sec Loss 8.7395 LearningRate 0.1915 Epoch: 4 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:32,676-Speed 5390.80 samples/sec Loss 8.8211 LearningRate 0.1915 Epoch: 4 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:40,221-Speed 5430.08 samples/sec Loss 8.8497 LearningRate 0.1915 Epoch: 4 Global Step: 49980 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:47,749-Speed 5441.14 samples/sec Loss 8.7937 LearningRate 0.1915 Epoch: 4 Global Step: 49990 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:55:55,329-Speed 5403.83 samples/sec Loss 8.7585 LearningRate 0.1914 Epoch: 4 Global Step: 50000 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:56:39,506-[lfw][50000]XNorm: 23.231188 Training: 2022-01-08 05:56:39,507-[lfw][50000]Accuracy-Flip: 0.99700+-0.00245 Training: 2022-01-08 05:56:39,507-[lfw][50000]Accuracy-Highest: 0.99817 Training: 2022-01-08 05:57:31,014-[cfp_fp][50000]XNorm: 21.256862 Training: 2022-01-08 05:57:31,015-[cfp_fp][50000]Accuracy-Flip: 0.98386+-0.00691 Training: 2022-01-08 05:57:31,016-[cfp_fp][50000]Accuracy-Highest: 0.98600 Training: 2022-01-08 05:58:17,299-[agedb_30][50000]XNorm: 23.094199 Training: 2022-01-08 05:58:17,300-[agedb_30][50000]Accuracy-Flip: 0.97083+-0.00970 Training: 2022-01-08 05:58:17,301-[agedb_30][50000]Accuracy-Highest: 0.97250 Training: 2022-01-08 05:58:24,853-Speed 273.94 samples/sec Loss 8.8490 LearningRate 0.1914 Epoch: 4 Global Step: 50010 Fp16 Grad Scale: 262144 
Required: 36 hours Training: 2022-01-08 05:58:32,324-Speed 5484.50 samples/sec Loss 8.8449 LearningRate 0.1914 Epoch: 4 Global Step: 50020 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:58:39,901-Speed 5407.34 samples/sec Loss 8.7510 LearningRate 0.1914 Epoch: 4 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:58:47,418-Speed 5450.69 samples/sec Loss 8.8425 LearningRate 0.1913 Epoch: 4 Global Step: 50040 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:58:54,828-Speed 5528.59 samples/sec Loss 8.7903 LearningRate 0.1913 Epoch: 4 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:02,402-Speed 5408.84 samples/sec Loss 8.7460 LearningRate 0.1913 Epoch: 4 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:09,965-Speed 5415.99 samples/sec Loss 8.7330 LearningRate 0.1913 Epoch: 4 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:17,575-Speed 5383.14 samples/sec Loss 8.7792 LearningRate 0.1912 Epoch: 4 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:25,200-Speed 5373.37 samples/sec Loss 8.7140 LearningRate 0.1912 Epoch: 4 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:32,750-Speed 5425.22 samples/sec Loss 8.7879 LearningRate 0.1912 Epoch: 4 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:40,417-Speed 5342.92 samples/sec Loss 8.7856 LearningRate 0.1912 Epoch: 4 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 05:59:47,934-Speed 5450.22 samples/sec Loss 8.7950 LearningRate 0.1912 Epoch: 4 Global Step: 50120 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 05:59:55,524-Speed 5397.77 samples/sec Loss 8.7721 LearningRate 0.1911 Epoch: 4 Global Step: 50130 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
06:00:03,182-Speed 5349.11 samples/sec Loss 8.7201 LearningRate 0.1911 Epoch: 4 Global Step: 50140 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:00:10,740-Speed 5420.33 samples/sec Loss 8.7950 LearningRate 0.1911 Epoch: 4 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:00:18,368-Speed 5370.30 samples/sec Loss 8.7641 LearningRate 0.1911 Epoch: 4 Global Step: 50160 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:00:25,773-Speed 5532.19 samples/sec Loss 8.7260 LearningRate 0.1910 Epoch: 4 Global Step: 50170 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:00:33,335-Speed 5417.65 samples/sec Loss 8.6841 LearningRate 0.1910 Epoch: 4 Global Step: 50180 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:00:40,947-Speed 5381.28 samples/sec Loss 8.7562 LearningRate 0.1910 Epoch: 4 Global Step: 50190 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:00:48,457-Speed 5455.06 samples/sec Loss 8.8039 LearningRate 0.1910 Epoch: 4 Global Step: 50200 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:00:55,926-Speed 5484.84 samples/sec Loss 8.6479 LearningRate 0.1909 Epoch: 4 Global Step: 50210 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:01:03,433-Speed 5456.58 samples/sec Loss 8.7570 LearningRate 0.1909 Epoch: 4 Global Step: 50220 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:01:10,962-Speed 5440.78 samples/sec Loss 8.8095 LearningRate 0.1909 Epoch: 4 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:01:18,462-Speed 5462.38 samples/sec Loss 8.7606 LearningRate 0.1909 Epoch: 4 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:01:25,929-Speed 5486.04 samples/sec Loss 8.8096 LearningRate 0.1908 Epoch: 4 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:01:33,388-Speed 5492.20 samples/sec Loss 8.7786 
LearningRate 0.1908 Epoch: 4 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:01:40,826-Speed 5507.49 samples/sec Loss 8.7845 LearningRate 0.1908 Epoch: 4 Global Step: 50270 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:01:48,398-Speed 5410.33 samples/sec Loss 8.7485 LearningRate 0.1908 Epoch: 4 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:01:55,929-Speed 5439.70 samples/sec Loss 8.7962 LearningRate 0.1907 Epoch: 4 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:03,444-Speed 5451.27 samples/sec Loss 8.7901 LearningRate 0.1907 Epoch: 4 Global Step: 50300 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:11,038-Speed 5394.32 samples/sec Loss 8.7668 LearningRate 0.1907 Epoch: 4 Global Step: 50310 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:18,491-Speed 5496.50 samples/sec Loss 8.7550 LearningRate 0.1907 Epoch: 4 Global Step: 50320 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:26,037-Speed 5428.77 samples/sec Loss 8.7211 LearningRate 0.1906 Epoch: 4 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:33,609-Speed 5410.50 samples/sec Loss 8.7237 LearningRate 0.1906 Epoch: 4 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:41,153-Speed 5429.91 samples/sec Loss 8.7948 LearningRate 0.1906 Epoch: 4 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:48,633-Speed 5477.07 samples/sec Loss 8.8668 LearningRate 0.1906 Epoch: 4 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:02:56,128-Speed 5465.87 samples/sec Loss 8.7629 LearningRate 0.1905 Epoch: 4 Global Step: 50370 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:03:03,675-Speed 5427.71 samples/sec Loss 8.7529 LearningRate 0.1905 Epoch: 4 Global Step: 
50380 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:03:11,133-Speed 5493.23 samples/sec Loss 8.7470 LearningRate 0.1905 Epoch: 4 Global Step: 50390 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:03:18,598-Speed 5487.46 samples/sec Loss 8.7069 LearningRate 0.1905 Epoch: 4 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:03:26,149-Speed 5425.19 samples/sec Loss 8.7718 LearningRate 0.1904 Epoch: 4 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:03:33,762-Speed 5380.88 samples/sec Loss 8.7689 LearningRate 0.1904 Epoch: 4 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:03:41,366-Speed 5387.83 samples/sec Loss 8.7728 LearningRate 0.1904 Epoch: 4 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:03:48,962-Speed 5392.17 samples/sec Loss 8.7363 LearningRate 0.1904 Epoch: 4 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:03:56,521-Speed 5419.63 samples/sec Loss 8.7157 LearningRate 0.1903 Epoch: 4 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:04,062-Speed 5432.19 samples/sec Loss 8.7899 LearningRate 0.1903 Epoch: 4 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:11,567-Speed 5458.63 samples/sec Loss 8.7049 LearningRate 0.1903 Epoch: 4 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:19,129-Speed 5417.00 samples/sec Loss 8.7506 LearningRate 0.1903 Epoch: 4 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:26,734-Speed 5386.59 samples/sec Loss 8.7740 LearningRate 0.1903 Epoch: 4 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:34,166-Speed 5512.31 samples/sec Loss 8.7771 LearningRate 0.1902 Epoch: 4 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 36 
hours Training: 2022-01-08 06:04:41,707-Speed 5432.14 samples/sec Loss 8.7041 LearningRate 0.1902 Epoch: 4 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:49,175-Speed 5485.43 samples/sec Loss 8.7347 LearningRate 0.1902 Epoch: 4 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:04:56,697-Speed 5446.13 samples/sec Loss 8.7579 LearningRate 0.1902 Epoch: 4 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:04,235-Speed 5434.83 samples/sec Loss 8.7440 LearningRate 0.1901 Epoch: 4 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:11,685-Speed 5498.59 samples/sec Loss 8.7267 LearningRate 0.1901 Epoch: 4 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:19,130-Speed 5502.45 samples/sec Loss 8.7230 LearningRate 0.1901 Epoch: 4 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:26,699-Speed 5412.87 samples/sec Loss 8.8120 LearningRate 0.1901 Epoch: 4 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:34,423-Speed 5303.57 samples/sec Loss 8.7136 LearningRate 0.1900 Epoch: 4 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:42,016-Speed 5394.90 samples/sec Loss 8.7611 LearningRate 0.1900 Epoch: 4 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:05:49,617-Speed 5389.35 samples/sec Loss 8.7124 LearningRate 0.1900 Epoch: 4 Global Step: 50600 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:05:57,159-Speed 5432.21 samples/sec Loss 8.8066 LearningRate 0.1900 Epoch: 4 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:04,645-Speed 5472.00 samples/sec Loss 8.6961 LearningRate 0.1899 Epoch: 4 Global Step: 50620 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
06:06:12,190-Speed 5429.58 samples/sec Loss 8.6222 LearningRate 0.1899 Epoch: 4 Global Step: 50630 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:19,731-Speed 5432.47 samples/sec Loss 8.7404 LearningRate 0.1899 Epoch: 4 Global Step: 50640 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:27,262-Speed 5439.58 samples/sec Loss 8.7268 LearningRate 0.1899 Epoch: 4 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:34,739-Speed 5479.21 samples/sec Loss 8.8124 LearningRate 0.1898 Epoch: 4 Global Step: 50660 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:42,217-Speed 5478.07 samples/sec Loss 8.7750 LearningRate 0.1898 Epoch: 4 Global Step: 50670 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:49,779-Speed 5417.23 samples/sec Loss 8.7143 LearningRate 0.1898 Epoch: 4 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:06:57,394-Speed 5379.36 samples/sec Loss 8.7084 LearningRate 0.1898 Epoch: 4 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:07:04,901-Speed 5457.20 samples/sec Loss 8.7180 LearningRate 0.1897 Epoch: 4 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:07:12,438-Speed 5434.91 samples/sec Loss 8.7079 LearningRate 0.1897 Epoch: 4 Global Step: 50710 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:07:19,949-Speed 5454.56 samples/sec Loss 8.7653 LearningRate 0.1897 Epoch: 4 Global Step: 50720 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:07:27,402-Speed 5496.44 samples/sec Loss 8.7181 LearningRate 0.1897 Epoch: 4 Global Step: 50730 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:07:34,890-Speed 5470.74 samples/sec Loss 8.7420 LearningRate 0.1896 Epoch: 4 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:07:42,375-Speed 5473.53 samples/sec Loss 
8.7355 LearningRate 0.1896 Epoch: 4 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:07:49,910-Speed 5436.16 samples/sec Loss 8.6780 LearningRate 0.1896 Epoch: 4 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:07:57,421-Speed 5454.27 samples/sec Loss 8.7278 LearningRate 0.1896 Epoch: 4 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:04,982-Speed 5418.13 samples/sec Loss 8.7381 LearningRate 0.1896 Epoch: 4 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:12,480-Speed 5463.25 samples/sec Loss 8.8019 LearningRate 0.1895 Epoch: 4 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:20,045-Speed 5415.39 samples/sec Loss 8.7429 LearningRate 0.1895 Epoch: 4 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:27,542-Speed 5463.84 samples/sec Loss 8.6934 LearningRate 0.1895 Epoch: 4 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:35,085-Speed 5431.24 samples/sec Loss 8.7548 LearningRate 0.1895 Epoch: 4 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:42,569-Speed 5473.95 samples/sec Loss 8.6658 LearningRate 0.1894 Epoch: 4 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:50,058-Speed 5470.29 samples/sec Loss 8.7566 LearningRate 0.1894 Epoch: 4 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:08:57,576-Speed 5448.23 samples/sec Loss 8.7324 LearningRate 0.1894 Epoch: 4 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:05,067-Speed 5469.50 samples/sec Loss 8.6903 LearningRate 0.1894 Epoch: 4 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:12,556-Speed 5469.60 samples/sec Loss 8.7419 LearningRate 0.1893 Epoch: 4 Global Step: 50870 
Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:20,058-Speed 5460.99 samples/sec Loss 8.6822 LearningRate 0.1893 Epoch: 4 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:27,547-Speed 5469.76 samples/sec Loss 8.7522 LearningRate 0.1893 Epoch: 4 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:35,026-Speed 5477.55 samples/sec Loss 8.7153 LearningRate 0.1893 Epoch: 4 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:42,585-Speed 5419.48 samples/sec Loss 8.7466 LearningRate 0.1892 Epoch: 4 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:50,220-Speed 5365.74 samples/sec Loss 8.7373 LearningRate 0.1892 Epoch: 4 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:09:57,731-Speed 5453.91 samples/sec Loss 8.6797 LearningRate 0.1892 Epoch: 4 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:10:05,260-Speed 5440.96 samples/sec Loss 8.7594 LearningRate 0.1892 Epoch: 4 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:10:12,980-Speed 5306.47 samples/sec Loss 8.7681 LearningRate 0.1891 Epoch: 4 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:10:20,441-Speed 5490.93 samples/sec Loss 8.6885 LearningRate 0.1891 Epoch: 4 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:10:28,000-Speed 5418.80 samples/sec Loss 8.7030 LearningRate 0.1891 Epoch: 4 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:10:35,422-Speed 5520.24 samples/sec Loss 8.6954 LearningRate 0.1891 Epoch: 4 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:10:42,892-Speed 5483.75 samples/sec Loss 8.7142 LearningRate 0.1890 Epoch: 4 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 36 hours Training: 
2022-01-08 06:10:50,404-Speed 5453.02 samples/sec Loss 8.7039 LearningRate 0.1890 Epoch: 4 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:10:57,884-Speed 5476.75 samples/sec Loss 8.7182 LearningRate 0.1890 Epoch: 4 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:05,384-Speed 5462.10 samples/sec Loss 8.7184 LearningRate 0.1890 Epoch: 4 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:12,880-Speed 5465.03 samples/sec Loss 8.7118 LearningRate 0.1889 Epoch: 4 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:20,467-Speed 5399.37 samples/sec Loss 8.8364 LearningRate 0.1889 Epoch: 4 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:28,040-Speed 5409.38 samples/sec Loss 8.6613 LearningRate 0.1889 Epoch: 4 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:35,496-Speed 5494.61 samples/sec Loss 8.6615 LearningRate 0.1889 Epoch: 4 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:42,986-Speed 5468.69 samples/sec Loss 8.7769 LearningRate 0.1888 Epoch: 4 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:11:50,544-Speed 5420.79 samples/sec Loss 8.7344 LearningRate 0.1888 Epoch: 4 Global Step: 51080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:11:58,070-Speed 5442.64 samples/sec Loss 8.6173 LearningRate 0.1888 Epoch: 4 Global Step: 51090 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:05,555-Speed 5473.25 samples/sec Loss 8.6905 LearningRate 0.1888 Epoch: 4 Global Step: 51100 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:13,104-Speed 5426.58 samples/sec Loss 8.7427 LearningRate 0.1888 Epoch: 4 Global Step: 51110 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:20,667-Speed 5416.19 samples/sec Loss 
8.7063 LearningRate 0.1887 Epoch: 4 Global Step: 51120 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:28,228-Speed 5418.05 samples/sec Loss 8.7300 LearningRate 0.1887 Epoch: 4 Global Step: 51130 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:35,782-Speed 5423.33 samples/sec Loss 8.6813 LearningRate 0.1887 Epoch: 4 Global Step: 51140 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:43,281-Speed 5462.67 samples/sec Loss 8.7641 LearningRate 0.1887 Epoch: 4 Global Step: 51150 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:50,862-Speed 5403.42 samples/sec Loss 8.7423 LearningRate 0.1886 Epoch: 4 Global Step: 51160 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:12:58,367-Speed 5458.57 samples/sec Loss 8.6851 LearningRate 0.1886 Epoch: 4 Global Step: 51170 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:13:05,935-Speed 5413.02 samples/sec Loss 8.6860 LearningRate 0.1886 Epoch: 4 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:13,447-Speed 5453.18 samples/sec Loss 8.7385 LearningRate 0.1886 Epoch: 4 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:21,004-Speed 5420.92 samples/sec Loss 8.7175 LearningRate 0.1885 Epoch: 4 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:28,530-Speed 5443.30 samples/sec Loss 8.6703 LearningRate 0.1885 Epoch: 4 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:36,066-Speed 5436.21 samples/sec Loss 8.7125 LearningRate 0.1885 Epoch: 4 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:43,560-Speed 5466.06 samples/sec Loss 8.6925 LearningRate 0.1885 Epoch: 4 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:51,055-Speed 5465.11 samples/sec Loss 8.6500 LearningRate 0.1884 Epoch: 4 Global Step: 51240 
Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:13:58,529-Speed 5481.10 samples/sec Loss 8.7238 LearningRate 0.1884 Epoch: 4 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:14:06,111-Speed 5403.81 samples/sec Loss 8.7025 LearningRate 0.1884 Epoch: 4 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:14:13,638-Speed 5442.01 samples/sec Loss 8.6912 LearningRate 0.1884 Epoch: 4 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:14:21,126-Speed 5470.93 samples/sec Loss 8.6354 LearningRate 0.1883 Epoch: 4 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:14:28,695-Speed 5412.07 samples/sec Loss 8.7119 LearningRate 0.1883 Epoch: 4 Global Step: 51290 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:14:36,174-Speed 5477.70 samples/sec Loss 8.6544 LearningRate 0.1883 Epoch: 4 Global Step: 51300 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:14:43,652-Speed 5477.93 samples/sec Loss 8.7056 LearningRate 0.1883 Epoch: 4 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:14:51,102-Speed 5498.28 samples/sec Loss 8.7550 LearningRate 0.1882 Epoch: 4 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:14:58,697-Speed 5394.16 samples/sec Loss 8.7411 LearningRate 0.1882 Epoch: 4 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:15:06,194-Speed 5464.34 samples/sec Loss 8.6707 LearningRate 0.1882 Epoch: 4 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:15:13,711-Speed 5449.82 samples/sec Loss 8.6696 LearningRate 0.1882 Epoch: 4 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:15:21,247-Speed 5435.12 samples/sec Loss 8.7262 LearningRate 0.1881 Epoch: 4 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 36 hours 
Training: 2022-01-08 06:15:28,796-Speed 5427.03 samples/sec Loss 8.7770 LearningRate 0.1881 Epoch: 4 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:15:36,216-Speed 5521.23 samples/sec Loss 8.7030 LearningRate 0.1881 Epoch: 4 Global Step: 51380 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:15:43,715-Speed 5462.64 samples/sec Loss 8.6659 LearningRate 0.1881 Epoch: 4 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:15:51,238-Speed 5445.52 samples/sec Loss 8.6905 LearningRate 0.1881 Epoch: 4 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:15:58,747-Speed 5455.08 samples/sec Loss 8.6723 LearningRate 0.1880 Epoch: 4 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:06,377-Speed 5369.13 samples/sec Loss 8.7027 LearningRate 0.1880 Epoch: 4 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:13,941-Speed 5415.65 samples/sec Loss 8.6628 LearningRate 0.1880 Epoch: 4 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:21,468-Speed 5442.99 samples/sec Loss 8.6551 LearningRate 0.1880 Epoch: 4 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:29,052-Speed 5401.45 samples/sec Loss 8.7079 LearningRate 0.1879 Epoch: 4 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:36,575-Speed 5445.21 samples/sec Loss 8.6628 LearningRate 0.1879 Epoch: 4 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:44,040-Speed 5487.49 samples/sec Loss 8.6983 LearningRate 0.1879 Epoch: 4 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:16:51,524-Speed 5473.74 samples/sec Loss 8.6829 LearningRate 0.1879 Epoch: 4 Global Step: 51480 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:16:59,071-Speed 5428.65 
samples/sec Loss 8.6903 LearningRate 0.1878 Epoch: 4 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:06,554-Speed 5473.47 samples/sec Loss 8.7497 LearningRate 0.1878 Epoch: 4 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:14,073-Speed 5448.52 samples/sec Loss 8.6988 LearningRate 0.1878 Epoch: 4 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:21,636-Speed 5417.36 samples/sec Loss 8.6142 LearningRate 0.1878 Epoch: 4 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:29,246-Speed 5382.67 samples/sec Loss 8.6896 LearningRate 0.1877 Epoch: 4 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:36,807-Speed 5418.12 samples/sec Loss 8.7418 LearningRate 0.1877 Epoch: 4 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:44,318-Speed 5454.50 samples/sec Loss 8.7139 LearningRate 0.1877 Epoch: 4 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:17:51,922-Speed 5387.30 samples/sec Loss 8.6786 LearningRate 0.1877 Epoch: 4 Global Step: 51560 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:18:00,243-Speed 4922.97 samples/sec Loss 8.7090 LearningRate 0.1876 Epoch: 4 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:18:07,806-Speed 5416.59 samples/sec Loss 8.6983 LearningRate 0.1876 Epoch: 4 Global Step: 51580 Fp16 Grad Scale: 262144 Required: 36 hours Training: 2022-01-08 06:18:15,317-Speed 5454.24 samples/sec Loss 8.7065 LearningRate 0.1876 Epoch: 4 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:18:22,960-Speed 5360.21 samples/sec Loss 8.6609 LearningRate 0.1876 Epoch: 4 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:18:30,507-Speed 5427.87 samples/sec Loss 8.7079 LearningRate 0.1875 
Epoch: 4 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:18:38,110-Speed 5387.70 samples/sec Loss 8.7099 LearningRate 0.1875 Epoch: 4 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:18:45,688-Speed 5406.49 samples/sec Loss 8.6923 LearningRate 0.1875 Epoch: 4 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:18:53,307-Speed 5376.58 samples/sec Loss 8.6734 LearningRate 0.1875 Epoch: 4 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:19:00,918-Speed 5382.46 samples/sec Loss 8.6757 LearningRate 0.1874 Epoch: 4 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:19:08,472-Speed 5423.28 samples/sec Loss 8.6721 LearningRate 0.1874 Epoch: 4 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:19:16,266-Speed 5256.04 samples/sec Loss 8.6073 LearningRate 0.1874 Epoch: 4 Global Step: 51670 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:19:23,859-Speed 5395.02 samples/sec Loss 8.7486 LearningRate 0.1874 Epoch: 4 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:19:31,320-Speed 5489.75 samples/sec Loss 8.7284 LearningRate 0.1874 Epoch: 4 Global Step: 51690 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:19:38,955-Speed 5365.55 samples/sec Loss 8.6745 LearningRate 0.1873 Epoch: 4 Global Step: 51700 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:19:46,597-Speed 5360.88 samples/sec Loss 8.7385 LearningRate 0.1873 Epoch: 4 Global Step: 51710 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:19:54,085-Speed 5470.75 samples/sec Loss 8.6854 LearningRate 0.1873 Epoch: 4 Global Step: 51720 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:20:01,672-Speed 5399.48 samples/sec Loss 8.7246 LearningRate 0.1873 Epoch: 4 Global Step: 51730 Fp16 Grad Scale: 32768 
Required: 36 hours Training: 2022-01-08 06:20:09,184-Speed 5453.38 samples/sec Loss 8.6434 LearningRate 0.1872 Epoch: 4 Global Step: 51740 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:20:16,776-Speed 5396.04 samples/sec Loss 8.6804 LearningRate 0.1872 Epoch: 4 Global Step: 51750 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:20:24,271-Speed 5465.59 samples/sec Loss 8.7661 LearningRate 0.1872 Epoch: 4 Global Step: 51760 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:20:31,896-Speed 5372.82 samples/sec Loss 8.6631 LearningRate 0.1872 Epoch: 4 Global Step: 51770 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:20:39,473-Speed 5406.22 samples/sec Loss 8.6095 LearningRate 0.1871 Epoch: 4 Global Step: 51780 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-01-08 06:20:46,937-Speed 5488.65 samples/sec Loss 8.7057 LearningRate 0.1871 Epoch: 4 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:20:54,456-Speed 5448.42 samples/sec Loss 8.6817 LearningRate 0.1871 Epoch: 4 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:21:02,000-Speed 5430.25 samples/sec Loss 8.6297 LearningRate 0.1871 Epoch: 4 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:21:09,420-Speed 5520.53 samples/sec Loss 8.6845 LearningRate 0.1870 Epoch: 4 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:21:17,006-Speed 5400.14 samples/sec Loss 8.6698 LearningRate 0.1870 Epoch: 4 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:21:24,641-Speed 5365.62 samples/sec Loss 8.7294 LearningRate 0.1870 Epoch: 4 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:21:46,897-Speed 1840.52 samples/sec Loss 8.6155 LearningRate 0.1870 Epoch: 5 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 
06:21:54,394-Speed 5464.43 samples/sec Loss 8.6472 LearningRate 0.1869 Epoch: 5 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:01,833-Speed 5506.93 samples/sec Loss 8.6924 LearningRate 0.1869 Epoch: 5 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:09,429-Speed 5393.20 samples/sec Loss 8.6392 LearningRate 0.1869 Epoch: 5 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:17,116-Speed 5328.51 samples/sec Loss 8.5739 LearningRate 0.1869 Epoch: 5 Global Step: 51890 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:22:24,518-Speed 5534.42 samples/sec Loss 8.5394 LearningRate 0.1868 Epoch: 5 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:31,931-Speed 5526.46 samples/sec Loss 8.6105 LearningRate 0.1868 Epoch: 5 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:39,375-Speed 5503.97 samples/sec Loss 8.6703 LearningRate 0.1868 Epoch: 5 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:46,947-Speed 5409.72 samples/sec Loss 8.6522 LearningRate 0.1868 Epoch: 5 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:22:54,439-Speed 5468.09 samples/sec Loss 8.6649 LearningRate 0.1868 Epoch: 5 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:23:01,841-Speed 5533.70 samples/sec Loss 8.7116 LearningRate 0.1867 Epoch: 5 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:23:09,350-Speed 5456.03 samples/sec Loss 8.6589 LearningRate 0.1867 Epoch: 5 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:23:16,828-Speed 5478.41 samples/sec Loss 8.6580 LearningRate 0.1867 Epoch: 5 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:23:24,294-Speed 5486.26 samples/sec Loss 8.7402 
LearningRate 0.1867 Epoch: 5 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:23:31,762-Speed 5485.43 samples/sec Loss 8.6899 LearningRate 0.1866 Epoch: 5 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:23:39,206-Speed 5503.21 samples/sec Loss 8.6634 LearningRate 0.1866 Epoch: 5 Global Step: 52000 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:24:24,102-[lfw][52000]XNorm: 23.197142 Training: 2022-01-08 06:24:24,102-[lfw][52000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-01-08 06:24:24,103-[lfw][52000]Accuracy-Highest: 0.99817 Training: 2022-01-08 06:25:16,178-[cfp_fp][52000]XNorm: 20.822825 Training: 2022-01-08 06:25:16,179-[cfp_fp][52000]Accuracy-Flip: 0.98429+-0.00383 Training: 2022-01-08 06:25:16,179-[cfp_fp][52000]Accuracy-Highest: 0.98600 Training: 2022-01-08 06:26:02,076-[agedb_30][52000]XNorm: 22.821595 Training: 2022-01-08 06:26:02,077-[agedb_30][52000]Accuracy-Flip: 0.97233+-0.00790 Training: 2022-01-08 06:26:02,078-[agedb_30][52000]Accuracy-Highest: 0.97250 Training: 2022-01-08 06:26:09,646-Speed 272.27 samples/sec Loss 8.6605 LearningRate 0.1866 Epoch: 5 Global Step: 52010 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:26:17,096-Speed 5501.96 samples/sec Loss 8.6896 LearningRate 0.1866 Epoch: 5 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:26:24,651-Speed 5422.08 samples/sec Loss 8.6153 LearningRate 0.1865 Epoch: 5 Global Step: 52030 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:26:32,116-Speed 5488.60 samples/sec Loss 8.6518 LearningRate 0.1865 Epoch: 5 Global Step: 52040 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:26:39,607-Speed 5468.60 samples/sec Loss 8.6280 LearningRate 0.1865 Epoch: 5 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:26:47,028-Speed 5520.78 samples/sec Loss 8.6180 LearningRate 0.1865 Epoch: 5 Global 
Step: 52060 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:26:54,503-Speed 5481.01 samples/sec Loss 8.7284 LearningRate 0.1864 Epoch: 5 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:01,915-Speed 5527.16 samples/sec Loss 8.6294 LearningRate 0.1864 Epoch: 5 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:09,203-Speed 5621.70 samples/sec Loss 8.5798 LearningRate 0.1864 Epoch: 5 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:16,661-Speed 5493.60 samples/sec Loss 8.6836 LearningRate 0.1864 Epoch: 5 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:24,175-Speed 5452.46 samples/sec Loss 8.5745 LearningRate 0.1863 Epoch: 5 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:31,746-Speed 5411.82 samples/sec Loss 8.6242 LearningRate 0.1863 Epoch: 5 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:39,315-Speed 5412.85 samples/sec Loss 8.5843 LearningRate 0.1863 Epoch: 5 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:46,967-Speed 5353.90 samples/sec Loss 8.6036 LearningRate 0.1863 Epoch: 5 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:27:54,696-Speed 5300.77 samples/sec Loss 8.6901 LearningRate 0.1862 Epoch: 5 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:02,357-Speed 5347.49 samples/sec Loss 8.6681 LearningRate 0.1862 Epoch: 5 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:10,039-Speed 5332.82 samples/sec Loss 8.6425 LearningRate 0.1862 Epoch: 5 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:17,837-Speed 5253.61 samples/sec Loss 8.6976 LearningRate 0.1862 Epoch: 5 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 36 
hours Training: 2022-01-08 06:28:25,552-Speed 5310.21 samples/sec Loss 8.6170 LearningRate 0.1862 Epoch: 5 Global Step: 52190 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:33,162-Speed 5383.74 samples/sec Loss 8.6419 LearningRate 0.1861 Epoch: 5 Global Step: 52200 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:40,889-Speed 5301.19 samples/sec Loss 8.6050 LearningRate 0.1861 Epoch: 5 Global Step: 52210 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:48,560-Speed 5340.41 samples/sec Loss 8.6741 LearningRate 0.1861 Epoch: 5 Global Step: 52220 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:28:56,365-Speed 5249.30 samples/sec Loss 8.5536 LearningRate 0.1861 Epoch: 5 Global Step: 52230 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:04,190-Speed 5234.70 samples/sec Loss 8.6886 LearningRate 0.1860 Epoch: 5 Global Step: 52240 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:11,898-Speed 5314.84 samples/sec Loss 8.6562 LearningRate 0.1860 Epoch: 5 Global Step: 52250 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:19,584-Speed 5330.10 samples/sec Loss 8.6425 LearningRate 0.1860 Epoch: 5 Global Step: 52260 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:27,172-Speed 5399.14 samples/sec Loss 8.6102 LearningRate 0.1860 Epoch: 5 Global Step: 52270 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:34,767-Speed 5394.27 samples/sec Loss 8.6208 LearningRate 0.1859 Epoch: 5 Global Step: 52280 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:42,467-Speed 5319.61 samples/sec Loss 8.6683 LearningRate 0.1859 Epoch: 5 Global Step: 52290 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:29:50,032-Speed 5415.19 samples/sec Loss 8.6153 LearningRate 0.1859 Epoch: 5 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 
06:29:57,510-Speed 5477.98 samples/sec Loss 8.6279 LearningRate 0.1859 Epoch: 5 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:30:05,151-Speed 5361.58 samples/sec Loss 8.6337 LearningRate 0.1858 Epoch: 5 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:30:12,619-Speed 5485.45 samples/sec Loss 8.6390 LearningRate 0.1858 Epoch: 5 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:30:20,278-Speed 5348.23 samples/sec Loss 8.6059 LearningRate 0.1858 Epoch: 5 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 36 hours Training: 2022-01-08 06:30:27,820-Speed 5431.72 samples/sec Loss 8.6833 LearningRate 0.1858 Epoch: 5 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 36 hours Training: 2022-01-08 06:30:35,514-Speed 5324.74 samples/sec Loss 8.6408 LearningRate 0.1857 Epoch: 5 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:30:43,042-Speed 5441.75 samples/sec Loss 8.6670 LearningRate 0.1857 Epoch: 5 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:30:50,505-Speed 5488.21 samples/sec Loss 8.6737 LearningRate 0.1857 Epoch: 5 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:30:57,984-Speed 5477.62 samples/sec Loss 8.6600 LearningRate 0.1857 Epoch: 5 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:31:05,419-Speed 5509.78 samples/sec Loss 8.6564 LearningRate 0.1856 Epoch: 5 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:31:12,873-Speed 5496.03 samples/sec Loss 8.6893 LearningRate 0.1856 Epoch: 5 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:31:20,648-Speed 5268.47 samples/sec Loss 8.6348 LearningRate 0.1856 Epoch: 5 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:31:28,285-Speed 5364.79 samples/sec Loss 8.6251 
LearningRate 0.1856 Epoch: 5 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:31:35,828-Speed 5431.07 samples/sec Loss 8.6420 LearningRate 0.1856 Epoch: 5 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:31:43,311-Speed 5473.77 samples/sec Loss 8.5771 LearningRate 0.1855 Epoch: 5 Global Step: 52450 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:31:51,016-Speed 5316.66 samples/sec Loss 8.6341 LearningRate 0.1855 Epoch: 5 Global Step: 52460 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:31:58,589-Speed 5410.26 samples/sec Loss 8.6086 LearningRate 0.1855 Epoch: 5 Global Step: 52470 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:06,032-Speed 5503.90 samples/sec Loss 8.6696 LearningRate 0.1855 Epoch: 5 Global Step: 52480 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:13,645-Speed 5380.78 samples/sec Loss 8.5892 LearningRate 0.1854 Epoch: 5 Global Step: 52490 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:21,105-Speed 5491.42 samples/sec Loss 8.6602 LearningRate 0.1854 Epoch: 5 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:28,620-Speed 5451.21 samples/sec Loss 8.6121 LearningRate 0.1854 Epoch: 5 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:36,355-Speed 5296.90 samples/sec Loss 8.5790 LearningRate 0.1854 Epoch: 5 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:43,867-Speed 5452.71 samples/sec Loss 8.6412 LearningRate 0.1853 Epoch: 5 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:51,358-Speed 5469.00 samples/sec Loss 8.7030 LearningRate 0.1853 Epoch: 5 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:32:58,891-Speed 5438.32 samples/sec Loss 8.7113 LearningRate 0.1853 Epoch: 5 Global Step: 
52550 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 06:33:06,397-Speed 5457.61 samples/sec Loss 8.6388 LearningRate 0.1853 Epoch: 5 Global Step: 52560 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 06:33:13,848-Speed 5497.28 samples/sec Loss 8.5376 LearningRate 0.1852 Epoch: 5 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:33:21,394-Speed 5429.18 samples/sec Loss 8.6302 LearningRate 0.1852 Epoch: 5 Global Step: 52580 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:33:28,956-Speed 5418.32 samples/sec Loss 8.6023 LearningRate 0.1852 Epoch: 5 Global Step: 52590 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:33:36,421-Speed 5489.17 samples/sec Loss 8.5618 LearningRate 0.1852 Epoch: 5 Global Step: 52600 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:33:43,932-Speed 5453.60 samples/sec Loss 8.6396 LearningRate 0.1851 Epoch: 5 Global Step: 52610 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:33:51,328-Speed 5538.82 samples/sec Loss 8.5524 LearningRate 0.1851 Epoch: 5 Global Step: 52620 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:33:58,731-Speed 5534.13 samples/sec Loss 8.5730 LearningRate 0.1851 Epoch: 5 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:34:06,240-Speed 5455.26 samples/sec Loss 8.5697 LearningRate 0.1851 Epoch: 5 Global Step: 52640 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:34:13,606-Speed 5561.59 samples/sec Loss 8.6948 LearningRate 0.1851 Epoch: 5 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:34:21,019-Speed 5525.95 samples/sec Loss 8.5826 LearningRate 0.1850 Epoch: 5 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:34:28,762-Speed 5291.08 samples/sec Loss 8.6813 LearningRate 0.1850 Epoch: 5 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 35 
hours Training: 2022-01-08 06:34:36,507-Speed 5289.27 samples/sec Loss 8.7350 LearningRate 0.1850 Epoch: 5 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:34:44,095-Speed 5398.63 samples/sec Loss 8.6211 LearningRate 0.1850 Epoch: 5 Global Step: 52690 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:34:51,551-Speed 5494.30 samples/sec Loss 8.7005 LearningRate 0.1849 Epoch: 5 Global Step: 52700 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:34:58,944-Speed 5540.99 samples/sec Loss 8.5595 LearningRate 0.1849 Epoch: 5 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:35:06,420-Speed 5479.86 samples/sec Loss 8.6333 LearningRate 0.1849 Epoch: 5 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:35:14,000-Speed 5404.08 samples/sec Loss 8.6557 LearningRate 0.1849 Epoch: 5 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:35:21,422-Speed 5519.62 samples/sec Loss 8.6576 LearningRate 0.1848 Epoch: 5 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:35:28,942-Speed 5447.62 samples/sec Loss 8.6263 LearningRate 0.1848 Epoch: 5 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:35:36,350-Speed 5529.88 samples/sec Loss 8.6463 LearningRate 0.1848 Epoch: 5 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:35:43,915-Speed 5415.13 samples/sec Loss 8.6394 LearningRate 0.1848 Epoch: 5 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:35:51,433-Speed 5449.25 samples/sec Loss 8.5686 LearningRate 0.1847 Epoch: 5 Global Step: 52780 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:35:58,866-Speed 5511.35 samples/sec Loss 8.5881 LearningRate 0.1847 Epoch: 5 Global Step: 52790 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:36:06,382-Speed 
5450.17 samples/sec Loss 8.6274 LearningRate 0.1847 Epoch: 5 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:36:13,902-Speed 5448.17 samples/sec Loss 8.6130 LearningRate 0.1847 Epoch: 5 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:36:21,313-Speed 5526.70 samples/sec Loss 8.5488 LearningRate 0.1846 Epoch: 5 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:36:28,758-Speed 5502.99 samples/sec Loss 8.5855 LearningRate 0.1846 Epoch: 5 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:36:36,269-Speed 5454.46 samples/sec Loss 8.6360 LearningRate 0.1846 Epoch: 5 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:36:43,859-Speed 5397.65 samples/sec Loss 8.6313 LearningRate 0.1846 Epoch: 5 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:36:51,490-Speed 5367.87 samples/sec Loss 8.6579 LearningRate 0.1845 Epoch: 5 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:36:59,147-Speed 5350.44 samples/sec Loss 8.6751 LearningRate 0.1845 Epoch: 5 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:37:06,677-Speed 5440.47 samples/sec Loss 8.6157 LearningRate 0.1845 Epoch: 5 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:37:14,197-Speed 5447.46 samples/sec Loss 8.5926 LearningRate 0.1845 Epoch: 5 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:37:21,706-Speed 5455.61 samples/sec Loss 8.6267 LearningRate 0.1845 Epoch: 5 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:37:29,125-Speed 5521.33 samples/sec Loss 8.5817 LearningRate 0.1844 Epoch: 5 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:37:36,618-Speed 5467.88 samples/sec Loss 8.6621 LearningRate 0.1844 
Epoch: 5 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:37:44,180-Speed 5417.15 samples/sec Loss 8.5894 LearningRate 0.1844 Epoch: 5 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:37:51,670-Speed 5469.27 samples/sec Loss 8.6025 LearningRate 0.1844 Epoch: 5 Global Step: 52940 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:37:59,137-Speed 5485.94 samples/sec Loss 8.5895 LearningRate 0.1843 Epoch: 5 Global Step: 52950 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:06,695-Speed 5420.41 samples/sec Loss 8.5946 LearningRate 0.1843 Epoch: 5 Global Step: 52960 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:14,195-Speed 5462.32 samples/sec Loss 8.5577 LearningRate 0.1843 Epoch: 5 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:21,734-Speed 5433.73 samples/sec Loss 8.6094 LearningRate 0.1843 Epoch: 5 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:29,370-Speed 5365.07 samples/sec Loss 8.6482 LearningRate 0.1842 Epoch: 5 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:36,872-Speed 5460.81 samples/sec Loss 8.6023 LearningRate 0.1842 Epoch: 5 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:44,361-Speed 5470.00 samples/sec Loss 8.6099 LearningRate 0.1842 Epoch: 5 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:51,875-Speed 5451.56 samples/sec Loss 8.6973 LearningRate 0.1842 Epoch: 5 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:38:59,332-Speed 5493.41 samples/sec Loss 8.6226 LearningRate 0.1841 Epoch: 5 Global Step: 53030 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 06:39:06,887-Speed 5422.25 samples/sec Loss 8.5931 LearningRate 0.1841 Epoch: 5 Global Step: 53040 Fp16 Grad Scale: 
131072 Required: 35 hours Training: 2022-01-08 06:39:14,388-Speed 5461.97 samples/sec Loss 8.5375 LearningRate 0.1841 Epoch: 5 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:39:21,949-Speed 5417.93 samples/sec Loss 8.5943 LearningRate 0.1841 Epoch: 5 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:39:29,554-Speed 5386.50 samples/sec Loss 8.5790 LearningRate 0.1840 Epoch: 5 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:39:37,251-Speed 5322.29 samples/sec Loss 8.5986 LearningRate 0.1840 Epoch: 5 Global Step: 53080 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:39:44,819-Speed 5413.56 samples/sec Loss 8.6220 LearningRate 0.1840 Epoch: 5 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:39:52,261-Speed 5504.40 samples/sec Loss 8.6721 LearningRate 0.1840 Epoch: 5 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:39:59,984-Speed 5304.58 samples/sec Loss 8.5576 LearningRate 0.1840 Epoch: 5 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:40:07,694-Speed 5312.87 samples/sec Loss 8.6460 LearningRate 0.1839 Epoch: 5 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:40:15,198-Speed 5459.91 samples/sec Loss 8.5934 LearningRate 0.1839 Epoch: 5 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:40:22,665-Speed 5485.42 samples/sec Loss 8.6087 LearningRate 0.1839 Epoch: 5 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:40:30,255-Speed 5397.50 samples/sec Loss 8.5413 LearningRate 0.1839 Epoch: 5 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:40:37,740-Speed 5472.93 samples/sec Loss 8.6040 LearningRate 0.1838 Epoch: 5 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 
06:40:45,240-Speed 5462.41 samples/sec Loss 8.5765 LearningRate 0.1838 Epoch: 5 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:40:52,819-Speed 5405.30 samples/sec Loss 8.6205 LearningRate 0.1838 Epoch: 5 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:41:00,364-Speed 5429.49 samples/sec Loss 8.5322 LearningRate 0.1838 Epoch: 5 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:07,978-Speed 5380.15 samples/sec Loss 8.6434 LearningRate 0.1837 Epoch: 5 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:15,586-Speed 5384.98 samples/sec Loss 8.6248 LearningRate 0.1837 Epoch: 5 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:23,088-Speed 5460.78 samples/sec Loss 8.5457 LearningRate 0.1837 Epoch: 5 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:30,605-Speed 5449.18 samples/sec Loss 8.5230 LearningRate 0.1837 Epoch: 5 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:38,078-Speed 5482.34 samples/sec Loss 8.5390 LearningRate 0.1836 Epoch: 5 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:45,490-Speed 5527.23 samples/sec Loss 8.6180 LearningRate 0.1836 Epoch: 5 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:41:52,940-Speed 5498.37 samples/sec Loss 8.5851 LearningRate 0.1836 Epoch: 5 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:42:00,437-Speed 5464.60 samples/sec Loss 8.6393 LearningRate 0.1836 Epoch: 5 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:42:07,952-Speed 5450.30 samples/sec Loss 8.6006 LearningRate 0.1835 Epoch: 5 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:42:15,548-Speed 5393.63 samples/sec Loss 8.6337 
LearningRate 0.1835 Epoch: 5 Global Step: 53290 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:42:23,185-Speed 5364.06 samples/sec Loss 8.6356 LearningRate 0.1835 Epoch: 5 Global Step: 53300 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:42:30,714-Speed 5440.99 samples/sec Loss 8.6090 LearningRate 0.1835 Epoch: 5 Global Step: 53310 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:42:38,256-Speed 5431.31 samples/sec Loss 8.5692 LearningRate 0.1835 Epoch: 5 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:42:45,788-Speed 5439.11 samples/sec Loss 8.5795 LearningRate 0.1834 Epoch: 5 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:42:53,357-Speed 5412.57 samples/sec Loss 8.5605 LearningRate 0.1834 Epoch: 5 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:43:00,786-Speed 5513.50 samples/sec Loss 8.5700 LearningRate 0.1834 Epoch: 5 Global Step: 53350 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:43:08,339-Speed 5423.94 samples/sec Loss 8.5855 LearningRate 0.1834 Epoch: 5 Global Step: 53360 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:43:15,871-Speed 5438.36 samples/sec Loss 8.6891 LearningRate 0.1833 Epoch: 5 Global Step: 53370 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:43:23,422-Speed 5425.90 samples/sec Loss 8.5454 LearningRate 0.1833 Epoch: 5 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:43:30,891-Speed 5484.44 samples/sec Loss 8.5616 LearningRate 0.1833 Epoch: 5 Global Step: 53390 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 06:43:38,361-Speed 5484.29 samples/sec Loss 8.5726 LearningRate 0.1833 Epoch: 5 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:43:45,883-Speed 5445.97 samples/sec Loss 8.5860 LearningRate 0.1832 Epoch: 5 Global Step: 
53410 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:43:53,363-Speed 5476.76 samples/sec Loss 8.5691 LearningRate 0.1832 Epoch: 5 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:00,750-Speed 5545.53 samples/sec Loss 8.5822 LearningRate 0.1832 Epoch: 5 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:08,251-Speed 5460.94 samples/sec Loss 8.5398 LearningRate 0.1832 Epoch: 5 Global Step: 53440 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:15,686-Speed 5510.29 samples/sec Loss 8.5547 LearningRate 0.1831 Epoch: 5 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:23,145-Speed 5492.57 samples/sec Loss 8.6284 LearningRate 0.1831 Epoch: 5 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:30,635-Speed 5468.92 samples/sec Loss 8.5280 LearningRate 0.1831 Epoch: 5 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:38,312-Speed 5336.14 samples/sec Loss 8.6121 LearningRate 0.1831 Epoch: 5 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:45,772-Speed 5491.13 samples/sec Loss 8.5498 LearningRate 0.1830 Epoch: 5 Global Step: 53490 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:44:53,300-Speed 5441.83 samples/sec Loss 8.5918 LearningRate 0.1830 Epoch: 5 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:45:00,720-Speed 5520.90 samples/sec Loss 8.5827 LearningRate 0.1830 Epoch: 5 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:45:08,192-Speed 5482.58 samples/sec Loss 8.6374 LearningRate 0.1830 Epoch: 5 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:45:15,690-Speed 5463.83 samples/sec Loss 8.5665 LearningRate 0.1830 Epoch: 5 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 35 hours 
Training: 2022-01-08 06:45:23,176-Speed 5472.26 samples/sec Loss 8.5832 LearningRate 0.1829 Epoch: 5 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:45:30,656-Speed 5476.77 samples/sec Loss 8.5388 LearningRate 0.1829 Epoch: 5 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:45:38,070-Speed 5525.50 samples/sec Loss 8.5050 LearningRate 0.1829 Epoch: 5 Global Step: 53560 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:45:45,538-Speed 5485.18 samples/sec Loss 8.5779 LearningRate 0.1829 Epoch: 5 Global Step: 53570 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:45:52,955-Speed 5523.86 samples/sec Loss 8.5267 LearningRate 0.1828 Epoch: 5 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:00,461-Speed 5457.36 samples/sec Loss 8.5703 LearningRate 0.1828 Epoch: 5 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:07,847-Speed 5546.06 samples/sec Loss 8.5110 LearningRate 0.1828 Epoch: 5 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:15,493-Speed 5357.83 samples/sec Loss 8.5987 LearningRate 0.1828 Epoch: 5 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:22,894-Speed 5536.04 samples/sec Loss 8.6403 LearningRate 0.1827 Epoch: 5 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:30,383-Speed 5470.25 samples/sec Loss 8.5151 LearningRate 0.1827 Epoch: 5 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:37,843-Speed 5490.97 samples/sec Loss 8.5180 LearningRate 0.1827 Epoch: 5 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:45,335-Speed 5467.78 samples/sec Loss 8.5956 LearningRate 0.1827 Epoch: 5 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:46:52,854-Speed 5448.66 
samples/sec Loss 8.5470 LearningRate 0.1826 Epoch: 5 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:47:00,395-Speed 5433.10 samples/sec Loss 8.5105 LearningRate 0.1826 Epoch: 5 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:47:07,919-Speed 5444.03 samples/sec Loss 8.5426 LearningRate 0.1826 Epoch: 5 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:47:15,382-Speed 5489.31 samples/sec Loss 8.5941 LearningRate 0.1826 Epoch: 5 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:47:22,968-Speed 5400.61 samples/sec Loss 8.5560 LearningRate 0.1825 Epoch: 5 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:47:30,529-Speed 5418.15 samples/sec Loss 8.5658 LearningRate 0.1825 Epoch: 5 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:47:38,006-Speed 5478.58 samples/sec Loss 8.5796 LearningRate 0.1825 Epoch: 5 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:47:45,558-Speed 5424.10 samples/sec Loss 8.5325 LearningRate 0.1825 Epoch: 5 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:47:52,993-Speed 5510.20 samples/sec Loss 8.5903 LearningRate 0.1825 Epoch: 5 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:00,452-Speed 5492.54 samples/sec Loss 8.5981 LearningRate 0.1824 Epoch: 5 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:08,005-Speed 5423.38 samples/sec Loss 8.5512 LearningRate 0.1824 Epoch: 5 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:15,531-Speed 5443.53 samples/sec Loss 8.5343 LearningRate 0.1824 Epoch: 5 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:22,978-Speed 5500.24 samples/sec Loss 8.5303 LearningRate 0.1824 Epoch: 5 
Global Step: 53780 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:30,452-Speed 5481.90 samples/sec Loss 8.5496 LearningRate 0.1823 Epoch: 5 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:48:37,861-Speed 5528.93 samples/sec Loss 8.5137 LearningRate 0.1823 Epoch: 5 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:45,469-Speed 5384.25 samples/sec Loss 8.5177 LearningRate 0.1823 Epoch: 5 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:48:52,998-Speed 5441.08 samples/sec Loss 8.6238 LearningRate 0.1823 Epoch: 5 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:00,520-Speed 5446.38 samples/sec Loss 8.5692 LearningRate 0.1822 Epoch: 5 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:08,099-Speed 5405.60 samples/sec Loss 8.4993 LearningRate 0.1822 Epoch: 5 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:15,683-Speed 5401.54 samples/sec Loss 8.5277 LearningRate 0.1822 Epoch: 5 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:23,131-Speed 5499.93 samples/sec Loss 8.5801 LearningRate 0.1822 Epoch: 5 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:30,587-Speed 5494.54 samples/sec Loss 8.5424 LearningRate 0.1821 Epoch: 5 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:38,107-Speed 5447.84 samples/sec Loss 8.5350 LearningRate 0.1821 Epoch: 5 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:45,528-Speed 5519.64 samples/sec Loss 8.5575 LearningRate 0.1821 Epoch: 5 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:49:53,030-Speed 5460.56 samples/sec Loss 8.5146 LearningRate 0.1821 Epoch: 5 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 35 
hours Training: 2022-01-08 06:50:00,525-Speed 5466.46 samples/sec Loss 8.5087 LearningRate 0.1820 Epoch: 5 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:08,070-Speed 5429.20 samples/sec Loss 8.5831 LearningRate 0.1820 Epoch: 5 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:15,687-Speed 5378.48 samples/sec Loss 8.5909 LearningRate 0.1820 Epoch: 5 Global Step: 53930 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:23,318-Speed 5367.61 samples/sec Loss 8.5239 LearningRate 0.1820 Epoch: 5 Global Step: 53940 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:30,853-Speed 5437.46 samples/sec Loss 8.4530 LearningRate 0.1820 Epoch: 5 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:38,310-Speed 5493.84 samples/sec Loss 8.6534 LearningRate 0.1819 Epoch: 5 Global Step: 53960 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:45,777-Speed 5485.55 samples/sec Loss 8.5073 LearningRate 0.1819 Epoch: 5 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:50:53,242-Speed 5487.69 samples/sec Loss 8.5767 LearningRate 0.1819 Epoch: 5 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:51:00,682-Speed 5506.99 samples/sec Loss 8.5711 LearningRate 0.1819 Epoch: 5 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:51:08,166-Speed 5473.59 samples/sec Loss 8.5232 LearningRate 0.1818 Epoch: 5 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:52:03,476-[lfw][54000]XNorm: 21.966289 Training: 2022-01-08 06:52:03,477-[lfw][54000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-01-08 06:52:03,478-[lfw][54000]Accuracy-Highest: 0.99817 Training: 2022-01-08 06:53:03,484-[cfp_fp][54000]XNorm: 19.873822 Training: 2022-01-08 06:53:03,485-[cfp_fp][54000]Accuracy-Flip: 0.98371+-0.00617 Training: 
2022-01-08 06:53:03,486-[cfp_fp][54000]Accuracy-Highest: 0.98600 Training: 2022-01-08 06:53:48,872-[agedb_30][54000]XNorm: 21.688754 Training: 2022-01-08 06:53:48,874-[agedb_30][54000]Accuracy-Flip: 0.97333+-0.00837 Training: 2022-01-08 06:53:48,874-[agedb_30][54000]Accuracy-Highest: 0.97333 Training: 2022-01-08 06:53:56,130-Speed 243.86 samples/sec Loss 8.4834 LearningRate 0.1818 Epoch: 5 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:03,449-Speed 5598.61 samples/sec Loss 8.5494 LearningRate 0.1818 Epoch: 5 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:10,803-Speed 5570.09 samples/sec Loss 8.5273 LearningRate 0.1818 Epoch: 5 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:18,355-Speed 5425.73 samples/sec Loss 8.4617 LearningRate 0.1817 Epoch: 5 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:25,905-Speed 5426.22 samples/sec Loss 8.5223 LearningRate 0.1817 Epoch: 5 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:33,471-Speed 5415.13 samples/sec Loss 8.5562 LearningRate 0.1817 Epoch: 5 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:40,994-Speed 5445.64 samples/sec Loss 8.5713 LearningRate 0.1817 Epoch: 5 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:48,551-Speed 5420.75 samples/sec Loss 8.6158 LearningRate 0.1816 Epoch: 5 Global Step: 54080 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:54:56,087-Speed 5436.74 samples/sec Loss 8.6020 LearningRate 0.1816 Epoch: 5 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:03,664-Speed 5406.38 samples/sec Loss 8.5671 LearningRate 0.1816 Epoch: 5 Global Step: 54100 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 06:55:11,151-Speed 5471.60 samples/sec Loss 8.5608 
LearningRate 0.1816 Epoch: 5 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:18,680-Speed 5441.27 samples/sec Loss 8.5260 LearningRate 0.1816 Epoch: 5 Global Step: 54120 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:26,276-Speed 5399.97 samples/sec Loss 8.5416 LearningRate 0.1815 Epoch: 5 Global Step: 54130 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:33,900-Speed 5373.58 samples/sec Loss 8.5485 LearningRate 0.1815 Epoch: 5 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:41,462-Speed 5417.23 samples/sec Loss 8.5016 LearningRate 0.1815 Epoch: 5 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:48,990-Speed 5441.28 samples/sec Loss 8.5914 LearningRate 0.1815 Epoch: 5 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 06:55:56,464-Speed 5481.66 samples/sec Loss 8.5923 LearningRate 0.1814 Epoch: 5 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:04,001-Speed 5435.83 samples/sec Loss 8.5912 LearningRate 0.1814 Epoch: 5 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:11,573-Speed 5410.15 samples/sec Loss 8.4886 LearningRate 0.1814 Epoch: 5 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:19,118-Speed 5428.90 samples/sec Loss 8.5611 LearningRate 0.1814 Epoch: 5 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:26,671-Speed 5424.29 samples/sec Loss 8.5279 LearningRate 0.1813 Epoch: 5 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:34,233-Speed 5416.57 samples/sec Loss 8.5104 LearningRate 0.1813 Epoch: 5 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:41,777-Speed 5430.94 samples/sec Loss 8.5319 LearningRate 0.1813 Epoch: 5 Global Step: 54230 
Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:49,374-Speed 5391.65 samples/sec Loss 8.5919 LearningRate 0.1813 Epoch: 5 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:56:56,947-Speed 5409.44 samples/sec Loss 8.5072 LearningRate 0.1812 Epoch: 5 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:57:04,575-Speed 5370.39 samples/sec Loss 8.5782 LearningRate 0.1812 Epoch: 5 Global Step: 54260 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:12,192-Speed 5378.60 samples/sec Loss 8.5449 LearningRate 0.1812 Epoch: 5 Global Step: 54270 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:19,779-Speed 5399.02 samples/sec Loss 8.5002 LearningRate 0.1812 Epoch: 5 Global Step: 54280 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:27,334-Speed 5422.35 samples/sec Loss 8.5511 LearningRate 0.1811 Epoch: 5 Global Step: 54290 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:34,930-Speed 5393.59 samples/sec Loss 8.5037 LearningRate 0.1811 Epoch: 5 Global Step: 54300 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:42,415-Speed 5472.84 samples/sec Loss 8.4574 LearningRate 0.1811 Epoch: 5 Global Step: 54310 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:50,022-Speed 5385.24 samples/sec Loss 8.5253 LearningRate 0.1811 Epoch: 5 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:57:57,548-Speed 5442.68 samples/sec Loss 8.5405 LearningRate 0.1811 Epoch: 5 Global Step: 54330 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:58:05,038-Speed 5470.65 samples/sec Loss 8.5741 LearningRate 0.1810 Epoch: 5 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-01-08 06:58:12,576-Speed 5434.60 samples/sec Loss 8.5764 LearningRate 0.1810 Epoch: 5 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 35 hours Training: 
2022-01-08 06:58:20,108-Speed 5439.23 samples/sec Loss 8.5936 LearningRate 0.1810 Epoch: 5 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:58:27,635-Speed 5442.36 samples/sec Loss 8.5240 LearningRate 0.1810 Epoch: 5 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:58:35,168-Speed 5438.09 samples/sec Loss 8.5420 LearningRate 0.1809 Epoch: 5 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:58:42,774-Speed 5385.79 samples/sec Loss 8.5282 LearningRate 0.1809 Epoch: 5 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:58:50,339-Speed 5415.24 samples/sec Loss 8.5470 LearningRate 0.1809 Epoch: 5 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:58:58,028-Speed 5328.34 samples/sec Loss 8.5533 LearningRate 0.1809 Epoch: 5 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:05,511-Speed 5473.95 samples/sec Loss 8.4759 LearningRate 0.1808 Epoch: 5 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:13,051-Speed 5433.10 samples/sec Loss 8.5345 LearningRate 0.1808 Epoch: 5 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:20,616-Speed 5414.97 samples/sec Loss 8.4941 LearningRate 0.1808 Epoch: 5 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:28,229-Speed 5381.40 samples/sec Loss 8.5458 LearningRate 0.1808 Epoch: 5 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:35,746-Speed 5449.55 samples/sec Loss 8.4521 LearningRate 0.1807 Epoch: 5 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:43,450-Speed 5317.26 samples/sec Loss 8.5476 LearningRate 0.1807 Epoch: 5 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:50,986-Speed 5436.37 samples/sec Loss 
8.5428 LearningRate 0.1807 Epoch: 5 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 06:59:58,608-Speed 5374.71 samples/sec Loss 8.4817 LearningRate 0.1807 Epoch: 5 Global Step: 54490 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:06,256-Speed 5355.88 samples/sec Loss 8.4747 LearningRate 0.1807 Epoch: 5 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:13,794-Speed 5434.26 samples/sec Loss 8.4386 LearningRate 0.1806 Epoch: 5 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:21,363-Speed 5412.74 samples/sec Loss 8.5638 LearningRate 0.1806 Epoch: 5 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:29,001-Speed 5363.45 samples/sec Loss 8.5368 LearningRate 0.1806 Epoch: 5 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:36,548-Speed 5427.85 samples/sec Loss 8.5353 LearningRate 0.1806 Epoch: 5 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:44,091-Speed 5430.63 samples/sec Loss 8.5153 LearningRate 0.1805 Epoch: 5 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:00:51,701-Speed 5383.02 samples/sec Loss 8.5433 LearningRate 0.1805 Epoch: 5 Global Step: 54560 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:00:59,217-Speed 5450.73 samples/sec Loss 8.4791 LearningRate 0.1805 Epoch: 5 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:01:06,924-Speed 5315.17 samples/sec Loss 8.5352 LearningRate 0.1805 Epoch: 5 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:01:14,484-Speed 5418.89 samples/sec Loss 8.5400 LearningRate 0.1804 Epoch: 5 Global Step: 54590 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:01:22,121-Speed 5363.81 samples/sec Loss 8.5178 LearningRate 0.1804 Epoch: 5 Global Step: 
54600 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:01:29,612-Speed 5469.21 samples/sec Loss 8.5157 LearningRate 0.1804 Epoch: 5 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:01:37,104-Speed 5467.72 samples/sec Loss 8.5521 LearningRate 0.1804 Epoch: 5 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:01:44,589-Speed 5472.77 samples/sec Loss 8.5231 LearningRate 0.1803 Epoch: 5 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:01:52,124-Speed 5436.69 samples/sec Loss 8.4390 LearningRate 0.1803 Epoch: 5 Global Step: 54640 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:01:59,655-Speed 5440.39 samples/sec Loss 8.4696 LearningRate 0.1803 Epoch: 5 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:02:07,163-Speed 5456.48 samples/sec Loss 8.4871 LearningRate 0.1803 Epoch: 5 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:02:14,670-Speed 5456.18 samples/sec Loss 8.4967 LearningRate 0.1802 Epoch: 5 Global Step: 54670 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:02:22,234-Speed 5416.30 samples/sec Loss 8.4615 LearningRate 0.1802 Epoch: 5 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:02:29,702-Speed 5485.17 samples/sec Loss 8.4914 LearningRate 0.1802 Epoch: 5 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:02:37,224-Speed 5446.42 samples/sec Loss 8.5172 LearningRate 0.1802 Epoch: 5 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:02:44,756-Speed 5438.82 samples/sec Loss 8.5099 LearningRate 0.1802 Epoch: 5 Global Step: 54710 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:02:52,346-Speed 5397.05 samples/sec Loss 8.4628 LearningRate 0.1801 Epoch: 5 Global Step: 54720 Fp16 Grad Scale: 131072 Required: 35 hours 
Training: 2022-01-08 07:02:59,913-Speed 5414.41 samples/sec Loss 8.4599 LearningRate 0.1801 Epoch: 5 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:07,502-Speed 5397.36 samples/sec Loss 8.4812 LearningRate 0.1801 Epoch: 5 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:14,964-Speed 5489.97 samples/sec Loss 8.4792 LearningRate 0.1801 Epoch: 5 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:22,573-Speed 5383.96 samples/sec Loss 8.5009 LearningRate 0.1800 Epoch: 5 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:30,088-Speed 5451.60 samples/sec Loss 8.4044 LearningRate 0.1800 Epoch: 5 Global Step: 54770 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:37,657-Speed 5411.97 samples/sec Loss 8.5053 LearningRate 0.1800 Epoch: 5 Global Step: 54780 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:45,279-Speed 5374.31 samples/sec Loss 8.4553 LearningRate 0.1800 Epoch: 5 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:03:52,906-Speed 5371.53 samples/sec Loss 8.5574 LearningRate 0.1799 Epoch: 5 Global Step: 54800 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:04:00,357-Speed 5497.66 samples/sec Loss 8.5424 LearningRate 0.1799 Epoch: 5 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:07,923-Speed 5414.98 samples/sec Loss 8.6175 LearningRate 0.1799 Epoch: 5 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:15,513-Speed 5397.11 samples/sec Loss 8.5569 LearningRate 0.1799 Epoch: 5 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:23,028-Speed 5451.15 samples/sec Loss 8.5564 LearningRate 0.1798 Epoch: 5 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:30,626-Speed 5392.06 
samples/sec Loss 8.4857 LearningRate 0.1798 Epoch: 5 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:38,125-Speed 5463.17 samples/sec Loss 8.5024 LearningRate 0.1798 Epoch: 5 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:45,726-Speed 5388.88 samples/sec Loss 8.5366 LearningRate 0.1798 Epoch: 5 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:04:53,280-Speed 5423.13 samples/sec Loss 8.4756 LearningRate 0.1798 Epoch: 5 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:00,782-Speed 5460.11 samples/sec Loss 8.5535 LearningRate 0.1797 Epoch: 5 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:08,358-Speed 5407.51 samples/sec Loss 8.5348 LearningRate 0.1797 Epoch: 5 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:15,807-Speed 5499.46 samples/sec Loss 8.4780 LearningRate 0.1797 Epoch: 5 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:23,310-Speed 5459.72 samples/sec Loss 8.4863 LearningRate 0.1797 Epoch: 5 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:30,933-Speed 5374.04 samples/sec Loss 8.4754 LearningRate 0.1796 Epoch: 5 Global Step: 54930 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:38,490-Speed 5421.14 samples/sec Loss 8.4250 LearningRate 0.1796 Epoch: 5 Global Step: 54940 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:46,047-Speed 5420.96 samples/sec Loss 8.4917 LearningRate 0.1796 Epoch: 5 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:05:53,571-Speed 5444.40 samples/sec Loss 8.4770 LearningRate 0.1796 Epoch: 5 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:06:01,207-Speed 5364.06 samples/sec Loss 8.4853 LearningRate 0.1795 Epoch: 5 
Global Step: 54970 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:06:08,805-Speed 5392.09 samples/sec Loss 8.4851 LearningRate 0.1795 Epoch: 5 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:06:16,303-Speed 5463.57 samples/sec Loss 8.4940 LearningRate 0.1795 Epoch: 5 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:06:23,851-Speed 5427.29 samples/sec Loss 8.4590 LearningRate 0.1795 Epoch: 5 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:06:31,487-Speed 5364.84 samples/sec Loss 8.4797 LearningRate 0.1794 Epoch: 5 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:06:39,020-Speed 5438.10 samples/sec Loss 8.5409 LearningRate 0.1794 Epoch: 5 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:06:46,686-Speed 5344.46 samples/sec Loss 8.4669 LearningRate 0.1794 Epoch: 5 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:06:54,188-Speed 5460.09 samples/sec Loss 8.5151 LearningRate 0.1794 Epoch: 5 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:07:01,902-Speed 5310.74 samples/sec Loss 8.4695 LearningRate 0.1794 Epoch: 5 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:07:09,486-Speed 5401.48 samples/sec Loss 8.4245 LearningRate 0.1793 Epoch: 5 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:07:17,017-Speed 5439.18 samples/sec Loss 8.3990 LearningRate 0.1793 Epoch: 5 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:07:24,574-Speed 5421.04 samples/sec Loss 8.4494 LearningRate 0.1793 Epoch: 5 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:07:32,147-Speed 5409.52 samples/sec Loss 8.4403 LearningRate 0.1793 Epoch: 5 Global Step: 55090 Fp16 Grad Scale: 131072 
Required: 35 hours Training: 2022-01-08 07:07:39,688-Speed 5432.09 samples/sec Loss 8.5467 LearningRate 0.1792 Epoch: 5 Global Step: 55100 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:07:47,211-Speed 5445.47 samples/sec Loss 8.4630 LearningRate 0.1792 Epoch: 5 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:07:54,832-Speed 5375.44 samples/sec Loss 8.5110 LearningRate 0.1792 Epoch: 5 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:02,397-Speed 5415.00 samples/sec Loss 8.4808 LearningRate 0.1792 Epoch: 5 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:09,901-Speed 5459.13 samples/sec Loss 8.4363 LearningRate 0.1791 Epoch: 5 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:17,394-Speed 5467.40 samples/sec Loss 8.5237 LearningRate 0.1791 Epoch: 5 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:24,911-Speed 5449.94 samples/sec Loss 8.4632 LearningRate 0.1791 Epoch: 5 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:32,512-Speed 5388.86 samples/sec Loss 8.4696 LearningRate 0.1791 Epoch: 5 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:40,103-Speed 5396.96 samples/sec Loss 8.5199 LearningRate 0.1790 Epoch: 5 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:47,667-Speed 5416.11 samples/sec Loss 8.4486 LearningRate 0.1790 Epoch: 5 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:08:55,213-Speed 5428.55 samples/sec Loss 8.4426 LearningRate 0.1790 Epoch: 5 Global Step: 55200 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:09:02,815-Speed 5388.20 samples/sec Loss 8.4684 LearningRate 0.1790 Epoch: 5 Global Step: 55210 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 
07:09:10,355-Speed 5433.11 samples/sec Loss 8.5186 LearningRate 0.1790 Epoch: 5 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:09:17,872-Speed 5450.08 samples/sec Loss 8.4777 LearningRate 0.1789 Epoch: 5 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:09:25,366-Speed 5466.23 samples/sec Loss 8.3978 LearningRate 0.1789 Epoch: 5 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:09:32,871-Speed 5457.93 samples/sec Loss 8.4512 LearningRate 0.1789 Epoch: 5 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:09:40,384-Speed 5452.98 samples/sec Loss 8.4761 LearningRate 0.1789 Epoch: 5 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:09:47,954-Speed 5411.40 samples/sec Loss 8.4867 LearningRate 0.1788 Epoch: 5 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:09:55,527-Speed 5409.61 samples/sec Loss 8.4246 LearningRate 0.1788 Epoch: 5 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:10:03,042-Speed 5451.15 samples/sec Loss 8.3840 LearningRate 0.1788 Epoch: 5 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:10:10,628-Speed 5400.17 samples/sec Loss 8.3914 LearningRate 0.1788 Epoch: 5 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:10:18,169-Speed 5432.96 samples/sec Loss 8.4632 LearningRate 0.1787 Epoch: 5 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:10:25,739-Speed 5411.26 samples/sec Loss 8.4954 LearningRate 0.1787 Epoch: 5 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:10:33,229-Speed 5469.50 samples/sec Loss 8.5016 LearningRate 0.1787 Epoch: 5 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:10:40,713-Speed 5473.10 samples/sec Loss 8.4022 
LearningRate 0.1787 Epoch: 5 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:10:48,195-Speed 5475.58 samples/sec Loss 8.4264 LearningRate 0.1786 Epoch: 5 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:10:55,784-Speed 5398.21 samples/sec Loss 8.4212 LearningRate 0.1786 Epoch: 5 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:06,840-Speed 3705.05 samples/sec Loss 8.4967 LearningRate 0.1786 Epoch: 5 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:14,598-Speed 5280.59 samples/sec Loss 8.3991 LearningRate 0.1786 Epoch: 5 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:22,201-Speed 5388.26 samples/sec Loss 8.5254 LearningRate 0.1786 Epoch: 5 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:29,820-Speed 5376.50 samples/sec Loss 8.4312 LearningRate 0.1785 Epoch: 5 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:37,355-Speed 5436.96 samples/sec Loss 8.4587 LearningRate 0.1785 Epoch: 5 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:44,863-Speed 5456.61 samples/sec Loss 8.4826 LearningRate 0.1785 Epoch: 5 Global Step: 55420 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:52,374-Speed 5453.69 samples/sec Loss 8.4559 LearningRate 0.1785 Epoch: 5 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:11:59,878-Speed 5459.24 samples/sec Loss 8.4834 LearningRate 0.1784 Epoch: 5 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:07,381-Speed 5459.80 samples/sec Loss 8.5056 LearningRate 0.1784 Epoch: 5 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:14,902-Speed 5446.98 samples/sec Loss 8.4469 LearningRate 0.1784 Epoch: 5 Global Step: 55460 Fp16 
Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:22,484-Speed 5403.01 samples/sec Loss 8.4530 LearningRate 0.1784 Epoch: 5 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:30,017-Speed 5438.07 samples/sec Loss 8.4563 LearningRate 0.1783 Epoch: 5 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:37,628-Speed 5382.95 samples/sec Loss 8.5332 LearningRate 0.1783 Epoch: 5 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:45,193-Speed 5415.00 samples/sec Loss 8.4801 LearningRate 0.1783 Epoch: 5 Global Step: 55500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:12:52,824-Speed 5368.26 samples/sec Loss 8.4002 LearningRate 0.1783 Epoch: 5 Global Step: 55510 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:13:00,354-Speed 5440.29 samples/sec Loss 8.4493 LearningRate 0.1782 Epoch: 5 Global Step: 55520 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:13:08,028-Speed 5338.55 samples/sec Loss 8.4523 LearningRate 0.1782 Epoch: 5 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:13:15,637-Speed 5383.85 samples/sec Loss 8.4862 LearningRate 0.1782 Epoch: 5 Global Step: 55540 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:13:23,183-Speed 5428.93 samples/sec Loss 8.4244 LearningRate 0.1782 Epoch: 5 Global Step: 55550 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:13:30,860-Speed 5335.66 samples/sec Loss 8.4983 LearningRate 0.1782 Epoch: 5 Global Step: 55560 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:13:38,400-Speed 5433.37 samples/sec Loss 8.4995 LearningRate 0.1781 Epoch: 5 Global Step: 55570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:13:45,878-Speed 5477.99 samples/sec Loss 8.4047 LearningRate 0.1781 Epoch: 5 Global Step: 55580 Fp16 Grad Scale: 131072 Required: 35 hours 
Training: 2022-01-08 07:13:53,501-Speed 5374.49 samples/sec Loss 8.4227 LearningRate 0.1781 Epoch: 5 Global Step: 55590 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:14:01,119-Speed 5377.34 samples/sec Loss 8.4392 LearningRate 0.1781 Epoch: 5 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:14:08,773-Speed 5352.36 samples/sec Loss 8.3942 LearningRate 0.1780 Epoch: 5 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:14:16,338-Speed 5414.93 samples/sec Loss 8.4272 LearningRate 0.1780 Epoch: 5 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:14:23,924-Speed 5400.27 samples/sec Loss 8.4048 LearningRate 0.1780 Epoch: 5 Global Step: 55630 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:14:31,476-Speed 5424.72 samples/sec Loss 8.4462 LearningRate 0.1780 Epoch: 5 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:14:38,993-Speed 5449.47 samples/sec Loss 8.4629 LearningRate 0.1779 Epoch: 5 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:14:46,577-Speed 5401.53 samples/sec Loss 8.4590 LearningRate 0.1779 Epoch: 5 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:14:54,165-Speed 5398.97 samples/sec Loss 8.3907 LearningRate 0.1779 Epoch: 5 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:01,675-Speed 5455.46 samples/sec Loss 8.4472 LearningRate 0.1779 Epoch: 5 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:09,222-Speed 5427.97 samples/sec Loss 8.3655 LearningRate 0.1779 Epoch: 5 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:16,791-Speed 5411.93 samples/sec Loss 8.4008 LearningRate 0.1778 Epoch: 5 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:24,429-Speed 5363.18 
samples/sec Loss 8.4625 LearningRate 0.1778 Epoch: 5 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:32,086-Speed 5350.78 samples/sec Loss 8.4220 LearningRate 0.1778 Epoch: 5 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:39,636-Speed 5425.24 samples/sec Loss 8.4383 LearningRate 0.1778 Epoch: 5 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:47,307-Speed 5340.41 samples/sec Loss 8.3717 LearningRate 0.1777 Epoch: 5 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:15:54,911-Speed 5387.59 samples/sec Loss 8.4353 LearningRate 0.1777 Epoch: 5 Global Step: 55750 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:16:02,466-Speed 5422.74 samples/sec Loss 8.4624 LearningRate 0.1777 Epoch: 5 Global Step: 55760 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:16:10,115-Speed 5355.38 samples/sec Loss 8.4404 LearningRate 0.1777 Epoch: 5 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:16:17,622-Speed 5456.38 samples/sec Loss 8.4363 LearningRate 0.1776 Epoch: 5 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:16:25,160-Speed 5435.18 samples/sec Loss 8.3706 LearningRate 0.1776 Epoch: 5 Global Step: 55790 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:16:32,712-Speed 5424.37 samples/sec Loss 8.4331 LearningRate 0.1776 Epoch: 5 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:16:40,285-Speed 5409.62 samples/sec Loss 8.4446 LearningRate 0.1776 Epoch: 5 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:16:47,775-Speed 5468.80 samples/sec Loss 8.3930 LearningRate 0.1775 Epoch: 5 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:16:55,248-Speed 5482.26 samples/sec Loss 8.4244 LearningRate 0.1775 Epoch: 5 
Global Step: 55830 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:17:02,813-Speed 5415.25 samples/sec Loss 8.4356 LearningRate 0.1775 Epoch: 5 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:17:10,418-Speed 5386.75 samples/sec Loss 8.4013 LearningRate 0.1775 Epoch: 5 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:17:18,034-Speed 5378.57 samples/sec Loss 8.4355 LearningRate 0.1775 Epoch: 5 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:17:25,552-Speed 5449.33 samples/sec Loss 8.4405 LearningRate 0.1774 Epoch: 5 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:17:33,179-Speed 5371.03 samples/sec Loss 8.4555 LearningRate 0.1774 Epoch: 5 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:17:40,785-Speed 5385.92 samples/sec Loss 8.4493 LearningRate 0.1774 Epoch: 5 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:17:48,315-Speed 5440.79 samples/sec Loss 8.3878 LearningRate 0.1774 Epoch: 5 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:17:55,884-Speed 5411.94 samples/sec Loss 8.4700 LearningRate 0.1773 Epoch: 5 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:18:03,507-Speed 5374.11 samples/sec Loss 8.4745 LearningRate 0.1773 Epoch: 5 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:18:11,080-Speed 5408.97 samples/sec Loss 8.4060 LearningRate 0.1773 Epoch: 5 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:18:18,631-Speed 5425.67 samples/sec Loss 8.4027 LearningRate 0.1773 Epoch: 5 Global Step: 55940 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:18:26,158-Speed 5441.99 samples/sec Loss 8.4103 LearningRate 0.1772 Epoch: 5 Global Step: 55950 Fp16 Grad Scale: 131072 
Required: 35 hours Training: 2022-01-08 07:18:33,694-Speed 5436.21 samples/sec Loss 8.4139 LearningRate 0.1772 Epoch: 5 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:18:41,221-Speed 5442.35 samples/sec Loss 8.4415 LearningRate 0.1772 Epoch: 5 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:18:48,797-Speed 5407.09 samples/sec Loss 8.4058 LearningRate 0.1772 Epoch: 5 Global Step: 55980 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:18:56,302-Speed 5458.31 samples/sec Loss 8.4285 LearningRate 0.1771 Epoch: 5 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:19:03,893-Speed 5396.99 samples/sec Loss 8.4121 LearningRate 0.1771 Epoch: 5 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:19:48,144-[lfw][56000]XNorm: 22.057266 Training: 2022-01-08 07:19:48,145-[lfw][56000]Accuracy-Flip: 0.99767+-0.00238 Training: 2022-01-08 07:19:48,146-[lfw][56000]Accuracy-Highest: 0.99817 Training: 2022-01-08 07:20:40,414-[cfp_fp][56000]XNorm: 19.616236 Training: 2022-01-08 07:20:40,416-[cfp_fp][56000]Accuracy-Flip: 0.98471+-0.00461 Training: 2022-01-08 07:20:40,417-[cfp_fp][56000]Accuracy-Highest: 0.98600 Training: 2022-01-08 07:21:26,438-[agedb_30][56000]XNorm: 21.866216 Training: 2022-01-08 07:21:26,440-[agedb_30][56000]Accuracy-Flip: 0.97517+-0.00693 Training: 2022-01-08 07:21:26,441-[agedb_30][56000]Accuracy-Highest: 0.97517 Training: 2022-01-08 07:21:34,145-Speed 272.61 samples/sec Loss 8.4133 LearningRate 0.1771 Epoch: 5 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:21:41,801-Speed 5351.79 samples/sec Loss 8.3815 LearningRate 0.1771 Epoch: 5 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:21:49,460-Speed 5348.61 samples/sec Loss 8.4853 LearningRate 0.1771 Epoch: 5 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 35 hours Training: 
2022-01-08 07:21:56,940-Speed 5478.02 samples/sec Loss 8.4431 LearningRate 0.1770 Epoch: 5 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:04,510-Speed 5411.57 samples/sec Loss 8.3664 LearningRate 0.1770 Epoch: 5 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:12,064-Speed 5422.72 samples/sec Loss 8.4304 LearningRate 0.1770 Epoch: 5 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:19,576-Speed 5453.95 samples/sec Loss 8.4394 LearningRate 0.1770 Epoch: 5 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:27,107-Speed 5439.37 samples/sec Loss 8.3604 LearningRate 0.1769 Epoch: 5 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:34,663-Speed 5421.74 samples/sec Loss 8.4127 LearningRate 0.1769 Epoch: 5 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:42,272-Speed 5383.65 samples/sec Loss 8.3654 LearningRate 0.1769 Epoch: 5 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:49,880-Speed 5384.32 samples/sec Loss 8.4416 LearningRate 0.1769 Epoch: 5 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 35 hours Training: 2022-01-08 07:22:57,497-Speed 5378.29 samples/sec Loss 8.3666 LearningRate 0.1768 Epoch: 5 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:05,042-Speed 5429.20 samples/sec Loss 8.3414 LearningRate 0.1768 Epoch: 5 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:12,548-Speed 5458.22 samples/sec Loss 8.2854 LearningRate 0.1768 Epoch: 5 Global Step: 56140 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:20,051-Speed 5460.21 samples/sec Loss 8.4335 LearningRate 0.1768 Epoch: 5 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:27,646-Speed 5393.51 samples/sec 
Loss 8.3739 LearningRate 0.1767 Epoch: 5 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:35,224-Speed 5405.82 samples/sec Loss 8.4770 LearningRate 0.1767 Epoch: 5 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:42,717-Speed 5467.50 samples/sec Loss 8.4421 LearningRate 0.1767 Epoch: 5 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:50,226-Speed 5455.23 samples/sec Loss 8.3745 LearningRate 0.1767 Epoch: 5 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:23:57,819-Speed 5395.17 samples/sec Loss 8.4348 LearningRate 0.1767 Epoch: 5 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:24:05,428-Speed 5383.79 samples/sec Loss 8.3788 LearningRate 0.1766 Epoch: 5 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:24:12,958-Speed 5440.65 samples/sec Loss 8.4244 LearningRate 0.1766 Epoch: 5 Global Step: 56220 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:24:20,574-Speed 5379.11 samples/sec Loss 8.4196 LearningRate 0.1766 Epoch: 5 Global Step: 56230 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:24:28,097-Speed 5445.12 samples/sec Loss 8.4522 LearningRate 0.1766 Epoch: 5 Global Step: 56240 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:24:35,572-Speed 5480.51 samples/sec Loss 8.3965 LearningRate 0.1765 Epoch: 5 Global Step: 56250 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:24:43,141-Speed 5411.90 samples/sec Loss 8.4062 LearningRate 0.1765 Epoch: 5 Global Step: 56260 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:24:50,734-Speed 5395.40 samples/sec Loss 8.3560 LearningRate 0.1765 Epoch: 5 Global Step: 56270 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:24:58,286-Speed 5424.55 samples/sec Loss 8.3638 LearningRate 0.1765 Epoch: 5 
Global Step: 56280 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:05,806-Speed 5446.80 samples/sec Loss 8.4818 LearningRate 0.1764 Epoch: 5 Global Step: 56290 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:13,398-Speed 5396.60 samples/sec Loss 8.4607 LearningRate 0.1764 Epoch: 5 Global Step: 56300 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:20,980-Speed 5402.82 samples/sec Loss 8.4134 LearningRate 0.1764 Epoch: 5 Global Step: 56310 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:28,487-Speed 5456.73 samples/sec Loss 8.3897 LearningRate 0.1764 Epoch: 5 Global Step: 56320 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:36,132-Speed 5358.20 samples/sec Loss 8.3996 LearningRate 0.1764 Epoch: 5 Global Step: 56330 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:43,704-Speed 5410.41 samples/sec Loss 8.3704 LearningRate 0.1763 Epoch: 5 Global Step: 56340 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:51,197-Speed 5467.76 samples/sec Loss 8.3852 LearningRate 0.1763 Epoch: 5 Global Step: 56350 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:25:58,856-Speed 5348.15 samples/sec Loss 8.3537 LearningRate 0.1763 Epoch: 5 Global Step: 56360 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:26:06,383-Speed 5442.10 samples/sec Loss 8.4181 LearningRate 0.1763 Epoch: 5 Global Step: 56370 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:26:13,942-Speed 5420.19 samples/sec Loss 8.4339 LearningRate 0.1762 Epoch: 5 Global Step: 56380 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:26:21,547-Speed 5386.38 samples/sec Loss 8.3698 LearningRate 0.1762 Epoch: 5 Global Step: 56390 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:26:29,057-Speed 5454.85 samples/sec Loss 8.4388 LearningRate 0.1762 Epoch: 5 Global Step: 56400 Fp16 Grad Scale: 262144 
Required: 35 hours Training: 2022-01-08 07:26:36,581-Speed 5444.39 samples/sec Loss 8.4350 LearningRate 0.1762 Epoch: 5 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:26:44,128-Speed 5428.40 samples/sec Loss 8.4388 LearningRate 0.1761 Epoch: 5 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:26:51,666-Speed 5434.23 samples/sec Loss 8.4169 LearningRate 0.1761 Epoch: 5 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:26:59,193-Speed 5442.12 samples/sec Loss 8.3931 LearningRate 0.1761 Epoch: 5 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:06,752-Speed 5419.10 samples/sec Loss 8.3714 LearningRate 0.1761 Epoch: 5 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:14,334-Speed 5403.87 samples/sec Loss 8.4167 LearningRate 0.1760 Epoch: 5 Global Step: 56460 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:21,881-Speed 5427.64 samples/sec Loss 8.4307 LearningRate 0.1760 Epoch: 5 Global Step: 56470 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:29,443-Speed 5417.03 samples/sec Loss 8.3477 LearningRate 0.1760 Epoch: 5 Global Step: 56480 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:37,059-Speed 5379.17 samples/sec Loss 8.4620 LearningRate 0.1760 Epoch: 5 Global Step: 56490 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:44,584-Speed 5443.88 samples/sec Loss 8.4238 LearningRate 0.1760 Epoch: 5 Global Step: 56500 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:27:52,324-Speed 5292.96 samples/sec Loss 8.3748 LearningRate 0.1759 Epoch: 5 Global Step: 56510 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:27:59,918-Speed 5394.82 samples/sec Loss 8.3824 LearningRate 0.1759 Epoch: 5 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 
07:28:07,531-Speed 5380.29 samples/sec Loss 8.3850 LearningRate 0.1759 Epoch: 5 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:28:15,085-Speed 5423.11 samples/sec Loss 8.3767 LearningRate 0.1759 Epoch: 5 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:28:22,716-Speed 5368.62 samples/sec Loss 8.4522 LearningRate 0.1758 Epoch: 5 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:28:30,256-Speed 5433.04 samples/sec Loss 8.3817 LearningRate 0.1758 Epoch: 5 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:28:37,840-Speed 5401.34 samples/sec Loss 8.3539 LearningRate 0.1758 Epoch: 5 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:28:45,463-Speed 5374.24 samples/sec Loss 8.3801 LearningRate 0.1758 Epoch: 5 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:28:53,074-Speed 5382.62 samples/sec Loss 8.3962 LearningRate 0.1757 Epoch: 5 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:29:00,654-Speed 5404.68 samples/sec Loss 8.2978 LearningRate 0.1757 Epoch: 5 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:29:08,196-Speed 5430.91 samples/sec Loss 8.3832 LearningRate 0.1757 Epoch: 5 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:29:15,833-Speed 5364.19 samples/sec Loss 8.4122 LearningRate 0.1757 Epoch: 5 Global Step: 56620 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:29:23,368-Speed 5437.14 samples/sec Loss 8.3575 LearningRate 0.1757 Epoch: 5 Global Step: 56630 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:29:31,038-Speed 5340.87 samples/sec Loss 8.3447 LearningRate 0.1756 Epoch: 5 Global Step: 56640 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:29:38,589-Speed 5425.17 samples/sec Loss 
8.3684 LearningRate 0.1756 Epoch: 5 Global Step: 56650 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:29:46,108-Speed 5448.41 samples/sec Loss 8.4025 LearningRate 0.1756 Epoch: 5 Global Step: 56660 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:29:53,667-Speed 5419.22 samples/sec Loss 8.3216 LearningRate 0.1756 Epoch: 5 Global Step: 56670 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:30:01,294-Speed 5370.74 samples/sec Loss 8.3824 LearningRate 0.1755 Epoch: 5 Global Step: 56680 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:30:08,944-Speed 5355.19 samples/sec Loss 8.4011 LearningRate 0.1755 Epoch: 5 Global Step: 56690 Fp16 Grad Scale: 262144 Required: 35 hours Training: 2022-01-08 07:30:16,583-Speed 5362.56 samples/sec Loss 8.4246 LearningRate 0.1755 Epoch: 5 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:30:24,175-Speed 5395.84 samples/sec Loss 8.4076 LearningRate 0.1755 Epoch: 5 Global Step: 56710 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:30:31,708-Speed 5438.39 samples/sec Loss 8.4733 LearningRate 0.1754 Epoch: 5 Global Step: 56720 Fp16 Grad Scale: 131072 Required: 35 hours Training: 2022-01-08 07:30:39,219-Speed 5453.50 samples/sec Loss 8.3747 LearningRate 0.1754 Epoch: 5 Global Step: 56730 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:30:46,850-Speed 5368.63 samples/sec Loss 8.4037 LearningRate 0.1754 Epoch: 5 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:30:54,481-Speed 5368.38 samples/sec Loss 8.3530 LearningRate 0.1754 Epoch: 5 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:02,104-Speed 5373.74 samples/sec Loss 8.4207 LearningRate 0.1753 Epoch: 5 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:09,706-Speed 5388.75 samples/sec Loss 8.3388 LearningRate 0.1753 Epoch: 5 Global 
Step: 56770 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:17,286-Speed 5404.18 samples/sec Loss 8.4017 LearningRate 0.1753 Epoch: 5 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:24,836-Speed 5426.13 samples/sec Loss 8.3871 LearningRate 0.1753 Epoch: 5 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:32,495-Speed 5348.54 samples/sec Loss 8.4116 LearningRate 0.1753 Epoch: 5 Global Step: 56800 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:31:40,080-Speed 5400.92 samples/sec Loss 8.4149 LearningRate 0.1752 Epoch: 5 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:47,670-Speed 5397.42 samples/sec Loss 8.3631 LearningRate 0.1752 Epoch: 5 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:31:55,213-Speed 5430.70 samples/sec Loss 8.3507 LearningRate 0.1752 Epoch: 5 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:02,749-Speed 5436.46 samples/sec Loss 8.3611 LearningRate 0.1752 Epoch: 5 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:10,312-Speed 5416.30 samples/sec Loss 8.3156 LearningRate 0.1751 Epoch: 5 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:17,967-Speed 5351.68 samples/sec Loss 8.3661 LearningRate 0.1751 Epoch: 5 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:25,626-Speed 5348.49 samples/sec Loss 8.3847 LearningRate 0.1751 Epoch: 5 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:33,172-Speed 5428.75 samples/sec Loss 8.4121 LearningRate 0.1751 Epoch: 5 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:40,755-Speed 5402.55 samples/sec Loss 8.3608 LearningRate 0.1750 Epoch: 5 Global Step: 56890 Fp16 Grad Scale: 131072 
Required: 34 hours Training: 2022-01-08 07:32:48,409-Speed 5352.30 samples/sec Loss 8.3454 LearningRate 0.1750 Epoch: 5 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:32:56,096-Speed 5329.12 samples/sec Loss 8.3255 LearningRate 0.1750 Epoch: 5 Global Step: 56910 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:33:03,707-Speed 5382.13 samples/sec Loss 8.4231 LearningRate 0.1750 Epoch: 5 Global Step: 56920 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:33:11,357-Speed 5355.12 samples/sec Loss 8.3072 LearningRate 0.1750 Epoch: 5 Global Step: 56930 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:33:18,957-Speed 5389.91 samples/sec Loss 8.3779 LearningRate 0.1749 Epoch: 5 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:33:26,598-Speed 5361.37 samples/sec Loss 8.3940 LearningRate 0.1749 Epoch: 5 Global Step: 56950 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:33:34,258-Speed 5348.26 samples/sec Loss 8.4495 LearningRate 0.1749 Epoch: 5 Global Step: 56960 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:33:41,931-Speed 5338.85 samples/sec Loss 8.3916 LearningRate 0.1749 Epoch: 5 Global Step: 56970 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:33:49,603-Speed 5339.37 samples/sec Loss 8.4104 LearningRate 0.1748 Epoch: 5 Global Step: 56980 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:33:57,139-Speed 5435.80 samples/sec Loss 8.3514 LearningRate 0.1748 Epoch: 5 Global Step: 56990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:34:04,770-Speed 5368.32 samples/sec Loss 8.3244 LearningRate 0.1748 Epoch: 5 Global Step: 57000 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:34:12,379-Speed 5383.97 samples/sec Loss 8.3708 LearningRate 0.1748 Epoch: 5 Global Step: 57010 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 
07:34:20,008-Speed 5369.73 samples/sec Loss 8.3691 LearningRate 0.1747 Epoch: 5 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:34:27,639-Speed 5368.44 samples/sec Loss 8.3882 LearningRate 0.1747 Epoch: 5 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:34:35,300-Speed 5347.27 samples/sec Loss 8.2623 LearningRate 0.1747 Epoch: 5 Global Step: 57040 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:34:42,795-Speed 5465.61 samples/sec Loss 8.3488 LearningRate 0.1747 Epoch: 5 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:34:50,415-Speed 5375.95 samples/sec Loss 8.3175 LearningRate 0.1747 Epoch: 5 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:34:57,947-Speed 5438.61 samples/sec Loss 8.3363 LearningRate 0.1746 Epoch: 5 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:05,614-Speed 5343.05 samples/sec Loss 8.3624 LearningRate 0.1746 Epoch: 5 Global Step: 57080 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:13,105-Speed 5468.36 samples/sec Loss 8.3631 LearningRate 0.1746 Epoch: 5 Global Step: 57090 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:20,721-Speed 5379.38 samples/sec Loss 8.3530 LearningRate 0.1746 Epoch: 5 Global Step: 57100 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:28,162-Speed 5505.64 samples/sec Loss 8.3495 LearningRate 0.1745 Epoch: 5 Global Step: 57110 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:35,692-Speed 5439.70 samples/sec Loss 8.4158 LearningRate 0.1745 Epoch: 5 Global Step: 57120 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:43,201-Speed 5455.90 samples/sec Loss 8.3915 LearningRate 0.1745 Epoch: 5 Global Step: 57130 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:50,700-Speed 5462.51 samples/sec Loss 
8.3478 LearningRate 0.1745 Epoch: 5 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:35:58,185-Speed 5472.87 samples/sec Loss 8.4023 LearningRate 0.1744 Epoch: 5 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:05,763-Speed 5406.02 samples/sec Loss 8.3754 LearningRate 0.1744 Epoch: 5 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:13,332-Speed 5412.60 samples/sec Loss 8.3646 LearningRate 0.1744 Epoch: 5 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:20,903-Speed 5410.55 samples/sec Loss 8.3510 LearningRate 0.1744 Epoch: 5 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:28,473-Speed 5412.00 samples/sec Loss 8.3558 LearningRate 0.1744 Epoch: 5 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:36,048-Speed 5407.69 samples/sec Loss 8.3656 LearningRate 0.1743 Epoch: 5 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:43,708-Speed 5347.70 samples/sec Loss 8.3318 LearningRate 0.1743 Epoch: 5 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:51,360-Speed 5353.67 samples/sec Loss 8.3200 LearningRate 0.1743 Epoch: 5 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:36:59,062-Speed 5319.71 samples/sec Loss 8.3295 LearningRate 0.1743 Epoch: 5 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:37:06,607-Speed 5429.10 samples/sec Loss 8.3856 LearningRate 0.1742 Epoch: 5 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:37:14,163-Speed 5421.04 samples/sec Loss 8.2866 LearningRate 0.1742 Epoch: 5 Global Step: 57250 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:37:21,735-Speed 5410.57 samples/sec Loss 8.3791 LearningRate 0.1742 Epoch: 5 Global 
Step: 57260 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:37:29,271-Speed 5435.91 samples/sec Loss 8.3378 LearningRate 0.1742 Epoch: 5 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:37:36,867-Speed 5392.80 samples/sec Loss 8.4209 LearningRate 0.1741 Epoch: 5 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:37:44,374-Speed 5456.75 samples/sec Loss 8.3179 LearningRate 0.1741 Epoch: 5 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:37:51,917-Speed 5431.45 samples/sec Loss 8.3490 LearningRate 0.1741 Epoch: 5 Global Step: 57300 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:37:59,428-Speed 5453.83 samples/sec Loss 8.2967 LearningRate 0.1741 Epoch: 5 Global Step: 57310 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:38:07,052-Speed 5373.29 samples/sec Loss 8.3473 LearningRate 0.1740 Epoch: 5 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:38:14,712-Speed 5348.05 samples/sec Loss 8.3967 LearningRate 0.1740 Epoch: 5 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:38:22,653-Speed 5158.78 samples/sec Loss 8.2561 LearningRate 0.1740 Epoch: 5 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:38:30,181-Speed 5442.06 samples/sec Loss 8.3486 LearningRate 0.1740 Epoch: 5 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:38:37,809-Speed 5370.02 samples/sec Loss 8.2804 LearningRate 0.1740 Epoch: 5 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:38:45,344-Speed 5436.43 samples/sec Loss 8.3428 LearningRate 0.1739 Epoch: 5 Global Step: 57370 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:38:52,800-Speed 5494.46 samples/sec Loss 8.4760 LearningRate 0.1739 Epoch: 5 Global Step: 57380 Fp16 Grad Scale: 131072 
Required: 34 hours Training: 2022-01-08 07:39:00,456-Speed 5350.42 samples/sec Loss 8.3370 LearningRate 0.1739 Epoch: 5 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:08,067-Speed 5382.84 samples/sec Loss 8.2640 LearningRate 0.1739 Epoch: 5 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:15,656-Speed 5397.24 samples/sec Loss 8.3024 LearningRate 0.1738 Epoch: 5 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:23,214-Speed 5420.14 samples/sec Loss 8.3268 LearningRate 0.1738 Epoch: 5 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:30,654-Speed 5506.66 samples/sec Loss 8.3083 LearningRate 0.1738 Epoch: 5 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:38,131-Speed 5478.86 samples/sec Loss 8.3517 LearningRate 0.1738 Epoch: 5 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:45,648-Speed 5449.74 samples/sec Loss 8.2479 LearningRate 0.1737 Epoch: 5 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:39:53,208-Speed 5418.50 samples/sec Loss 8.3260 LearningRate 0.1737 Epoch: 5 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:40:00,747-Speed 5434.17 samples/sec Loss 8.3558 LearningRate 0.1737 Epoch: 5 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:40:08,439-Speed 5325.60 samples/sec Loss 8.2823 LearningRate 0.1737 Epoch: 5 Global Step: 57480 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:40:16,028-Speed 5398.12 samples/sec Loss 8.4155 LearningRate 0.1737 Epoch: 5 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:40:23,558-Speed 5440.40 samples/sec Loss 8.3227 LearningRate 0.1736 Epoch: 5 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 
07:40:31,118-Speed 5418.35 samples/sec Loss 8.3072 LearningRate 0.1736 Epoch: 5 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:40:38,666-Speed 5427.86 samples/sec Loss 8.3605 LearningRate 0.1736 Epoch: 5 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:40:46,157-Speed 5468.26 samples/sec Loss 8.3695 LearningRate 0.1736 Epoch: 5 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:40:53,711-Speed 5423.41 samples/sec Loss 8.4198 LearningRate 0.1735 Epoch: 5 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:41:01,376-Speed 5344.43 samples/sec Loss 8.3398 LearningRate 0.1735 Epoch: 5 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:41:08,960-Speed 5402.01 samples/sec Loss 8.2510 LearningRate 0.1735 Epoch: 5 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:41:16,458-Speed 5462.67 samples/sec Loss 8.3324 LearningRate 0.1735 Epoch: 5 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:41:23,957-Speed 5463.31 samples/sec Loss 8.3267 LearningRate 0.1734 Epoch: 5 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:41:31,486-Speed 5440.81 samples/sec Loss 8.3711 LearningRate 0.1734 Epoch: 5 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:41:39,013-Speed 5442.88 samples/sec Loss 8.2772 LearningRate 0.1734 Epoch: 5 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:41:46,555-Speed 5431.77 samples/sec Loss 8.3403 LearningRate 0.1734 Epoch: 5 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:41:54,142-Speed 5399.10 samples/sec Loss 8.2820 LearningRate 0.1734 Epoch: 5 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:01,687-Speed 5429.55 samples/sec Loss 
8.2871 LearningRate 0.1733 Epoch: 5 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:09,276-Speed 5398.11 samples/sec Loss 8.3423 LearningRate 0.1733 Epoch: 5 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:16,736-Speed 5490.95 samples/sec Loss 8.3088 LearningRate 0.1733 Epoch: 5 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:24,196-Speed 5491.70 samples/sec Loss 8.3169 LearningRate 0.1733 Epoch: 5 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:31,844-Speed 5356.14 samples/sec Loss 8.3655 LearningRate 0.1732 Epoch: 5 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:42:39,625-Speed 5264.87 samples/sec Loss 8.3182 LearningRate 0.1732 Epoch: 5 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:47,213-Speed 5398.88 samples/sec Loss 8.3154 LearningRate 0.1732 Epoch: 5 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:42:54,716-Speed 5459.75 samples/sec Loss 8.3693 LearningRate 0.1732 Epoch: 5 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:02,321-Speed 5386.09 samples/sec Loss 8.2315 LearningRate 0.1731 Epoch: 5 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:09,874-Speed 5424.39 samples/sec Loss 8.3319 LearningRate 0.1731 Epoch: 5 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:17,406-Speed 5439.40 samples/sec Loss 8.3204 LearningRate 0.1731 Epoch: 5 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:24,985-Speed 5405.09 samples/sec Loss 8.2522 LearningRate 0.1731 Epoch: 5 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:32,567-Speed 5403.04 samples/sec Loss 8.2920 LearningRate 0.1731 Epoch: 5 Global Step: 57750 
Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:40,120-Speed 5423.38 samples/sec Loss 8.3453 LearningRate 0.1730 Epoch: 5 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:47,625-Speed 5458.93 samples/sec Loss 8.3396 LearningRate 0.1730 Epoch: 5 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:43:55,150-Speed 5443.53 samples/sec Loss 8.3796 LearningRate 0.1730 Epoch: 5 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:02,744-Speed 5393.90 samples/sec Loss 8.3593 LearningRate 0.1730 Epoch: 5 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:10,378-Speed 5366.92 samples/sec Loss 8.2761 LearningRate 0.1729 Epoch: 5 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:18,069-Speed 5326.42 samples/sec Loss 8.3639 LearningRate 0.1729 Epoch: 5 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:25,600-Speed 5439.10 samples/sec Loss 8.4109 LearningRate 0.1729 Epoch: 5 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:33,143-Speed 5430.82 samples/sec Loss 8.2767 LearningRate 0.1729 Epoch: 5 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:40,693-Speed 5426.64 samples/sec Loss 8.3524 LearningRate 0.1728 Epoch: 5 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:48,207-Speed 5451.51 samples/sec Loss 8.3598 LearningRate 0.1728 Epoch: 5 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:44:55,738-Speed 5439.74 samples/sec Loss 8.3293 LearningRate 0.1728 Epoch: 5 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:45:03,279-Speed 5431.88 samples/sec Loss 8.2595 LearningRate 0.1728 Epoch: 5 Global Step: 57870 Fp16 Grad Scale: 131072 Required: 34 hours 
Training: 2022-01-08 07:45:10,804-Speed 5444.83 samples/sec Loss 8.2629 LearningRate 0.1728 Epoch: 5 Global Step: 57880 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:45:18,440-Speed 5364.58 samples/sec Loss 8.3176 LearningRate 0.1727 Epoch: 5 Global Step: 57890 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:45:25,954-Speed 5451.68 samples/sec Loss 8.2642 LearningRate 0.1727 Epoch: 5 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:45:33,552-Speed 5391.61 samples/sec Loss 8.3670 LearningRate 0.1727 Epoch: 5 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:45:41,123-Speed 5411.19 samples/sec Loss 8.2551 LearningRate 0.1727 Epoch: 5 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:45:48,623-Speed 5461.99 samples/sec Loss 8.2960 LearningRate 0.1726 Epoch: 5 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:45:56,228-Speed 5386.56 samples/sec Loss 8.2649 LearningRate 0.1726 Epoch: 5 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:46:03,799-Speed 5411.03 samples/sec Loss 8.3328 LearningRate 0.1726 Epoch: 5 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:46:11,336-Speed 5434.60 samples/sec Loss 8.3909 LearningRate 0.1726 Epoch: 5 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:46:18,831-Speed 5465.77 samples/sec Loss 8.3493 LearningRate 0.1725 Epoch: 5 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:46:26,304-Speed 5482.01 samples/sec Loss 8.3266 LearningRate 0.1725 Epoch: 5 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:46:33,797-Speed 5467.16 samples/sec Loss 8.3620 LearningRate 0.1725 Epoch: 5 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:46:41,294-Speed 
5464.10 samples/sec Loss 8.3587 LearningRate 0.1725 Epoch: 5 Global Step: 58000 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:47:25,305-[lfw][58000]XNorm: 22.065349 Training: 2022-01-08 07:47:25,305-[lfw][58000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-08 07:47:25,306-[lfw][58000]Accuracy-Highest: 0.99817 Training: 2022-01-08 07:48:16,844-[cfp_fp][58000]XNorm: 20.194377 Training: 2022-01-08 07:48:16,845-[cfp_fp][58000]Accuracy-Flip: 0.98271+-0.00659 Training: 2022-01-08 07:48:16,846-[cfp_fp][58000]Accuracy-Highest: 0.98600 Training: 2022-01-08 07:49:02,402-[agedb_30][58000]XNorm: 22.030144 Training: 2022-01-08 07:49:02,403-[agedb_30][58000]Accuracy-Flip: 0.97467+-0.00666 Training: 2022-01-08 07:49:02,404-[agedb_30][58000]Accuracy-Highest: 0.97517 Training: 2022-01-08 07:49:09,964-Speed 275.51 samples/sec Loss 8.2991 LearningRate 0.1725 Epoch: 5 Global Step: 58010 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:49:17,463-Speed 5463.14 samples/sec Loss 8.3650 LearningRate 0.1724 Epoch: 5 Global Step: 58020 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:49:25,004-Speed 5433.11 samples/sec Loss 8.4459 LearningRate 0.1724 Epoch: 5 Global Step: 58030 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:49:32,434-Speed 5513.09 samples/sec Loss 8.2671 LearningRate 0.1724 Epoch: 5 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:49:39,955-Speed 5447.71 samples/sec Loss 8.2814 LearningRate 0.1724 Epoch: 5 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:49:47,516-Speed 5417.43 samples/sec Loss 8.2755 LearningRate 0.1723 Epoch: 5 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:49:55,083-Speed 5413.62 samples/sec Loss 8.2687 LearningRate 0.1723 Epoch: 5 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:02,493-Speed 5528.76 samples/sec Loss 8.2356 
LearningRate 0.1723 Epoch: 5 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:09,953-Speed 5491.35 samples/sec Loss 8.2489 LearningRate 0.1723 Epoch: 5 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:17,549-Speed 5392.78 samples/sec Loss 8.2454 LearningRate 0.1722 Epoch: 5 Global Step: 58100 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:25,086-Speed 5435.29 samples/sec Loss 8.2946 LearningRate 0.1722 Epoch: 5 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:32,498-Speed 5526.78 samples/sec Loss 8.3606 LearningRate 0.1722 Epoch: 5 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:40,004-Speed 5458.47 samples/sec Loss 8.3000 LearningRate 0.1722 Epoch: 5 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:50:47,504-Speed 5461.68 samples/sec Loss 8.2519 LearningRate 0.1722 Epoch: 5 Global Step: 58140 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:50:55,012-Speed 5455.75 samples/sec Loss 8.3205 LearningRate 0.1721 Epoch: 5 Global Step: 58150 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:51:02,563-Speed 5425.08 samples/sec Loss 8.2736 LearningRate 0.1721 Epoch: 5 Global Step: 58160 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:51:10,034-Speed 5483.54 samples/sec Loss 8.3094 LearningRate 0.1721 Epoch: 5 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:51:17,626-Speed 5396.28 samples/sec Loss 8.2585 LearningRate 0.1721 Epoch: 5 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:51:25,340-Speed 5310.27 samples/sec Loss 8.3294 LearningRate 0.1720 Epoch: 5 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:51:32,894-Speed 5422.55 samples/sec Loss 8.2244 LearningRate 0.1720 Epoch: 5 Global Step: 
58200 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:51:40,446-Speed 5425.04 samples/sec Loss 8.2879 LearningRate 0.1720 Epoch: 5 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:51:48,059-Speed 5380.90 samples/sec Loss 8.2051 LearningRate 0.1720 Epoch: 5 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:51:55,576-Speed 5449.72 samples/sec Loss 8.3429 LearningRate 0.1719 Epoch: 5 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:03,190-Speed 5380.59 samples/sec Loss 8.2679 LearningRate 0.1719 Epoch: 5 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:10,759-Speed 5412.35 samples/sec Loss 8.2023 LearningRate 0.1719 Epoch: 5 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:18,362-Speed 5387.94 samples/sec Loss 8.2957 LearningRate 0.1719 Epoch: 5 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:25,956-Speed 5393.76 samples/sec Loss 8.3692 LearningRate 0.1719 Epoch: 5 Global Step: 58270 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:52:33,481-Speed 5444.35 samples/sec Loss 8.2907 LearningRate 0.1718 Epoch: 5 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:40,996-Speed 5451.47 samples/sec Loss 8.3255 LearningRate 0.1718 Epoch: 5 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:48,507-Speed 5453.86 samples/sec Loss 8.2497 LearningRate 0.1718 Epoch: 5 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:52:56,037-Speed 5439.80 samples/sec Loss 8.1794 LearningRate 0.1718 Epoch: 5 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:53:03,639-Speed 5388.81 samples/sec Loss 8.2873 LearningRate 0.1717 Epoch: 5 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 34 
hours Training: 2022-01-08 07:53:11,168-Speed 5441.69 samples/sec Loss 8.2861 LearningRate 0.1717 Epoch: 5 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:53:18,635-Speed 5485.41 samples/sec Loss 8.2178 LearningRate 0.1717 Epoch: 5 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:53:26,204-Speed 5412.36 samples/sec Loss 8.2771 LearningRate 0.1717 Epoch: 5 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:53:33,791-Speed 5400.11 samples/sec Loss 8.3191 LearningRate 0.1716 Epoch: 5 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:53:41,532-Speed 5292.00 samples/sec Loss 8.1974 LearningRate 0.1716 Epoch: 5 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:53:49,088-Speed 5421.59 samples/sec Loss 8.3433 LearningRate 0.1716 Epoch: 5 Global Step: 58380 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:53:56,551-Speed 5489.42 samples/sec Loss 8.2822 LearningRate 0.1716 Epoch: 5 Global Step: 58390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:54:04,117-Speed 5414.12 samples/sec Loss 8.2716 LearningRate 0.1716 Epoch: 5 Global Step: 58400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:54:11,602-Speed 5473.86 samples/sec Loss 8.2969 LearningRate 0.1715 Epoch: 5 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:54:19,157-Speed 5421.97 samples/sec Loss 8.2923 LearningRate 0.1715 Epoch: 5 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:54:26,895-Speed 5294.17 samples/sec Loss 8.2087 LearningRate 0.1715 Epoch: 5 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:54:34,495-Speed 5389.93 samples/sec Loss 8.2877 LearningRate 0.1715 Epoch: 5 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 
07:54:42,057-Speed 5417.43 samples/sec Loss 8.3265 LearningRate 0.1714 Epoch: 5 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:54:49,573-Speed 5449.99 samples/sec Loss 8.2728 LearningRate 0.1714 Epoch: 5 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:54:57,126-Speed 5424.05 samples/sec Loss 8.2792 LearningRate 0.1714 Epoch: 5 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:55:04,653-Speed 5442.80 samples/sec Loss 8.1768 LearningRate 0.1714 Epoch: 5 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:55:12,176-Speed 5445.49 samples/sec Loss 8.2819 LearningRate 0.1713 Epoch: 5 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:55:19,805-Speed 5369.13 samples/sec Loss 8.2939 LearningRate 0.1713 Epoch: 5 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:55:27,384-Speed 5405.65 samples/sec Loss 8.2383 LearningRate 0.1713 Epoch: 5 Global Step: 58510 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:55:34,938-Speed 5423.15 samples/sec Loss 8.2743 LearningRate 0.1713 Epoch: 5 Global Step: 58520 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:55:42,485-Speed 5427.93 samples/sec Loss 8.2331 LearningRate 0.1713 Epoch: 5 Global Step: 58530 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 07:55:49,990-Speed 5458.05 samples/sec Loss 8.3248 LearningRate 0.1712 Epoch: 5 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:55:57,743-Speed 5283.86 samples/sec Loss 8.2779 LearningRate 0.1712 Epoch: 5 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:56:05,426-Speed 5332.68 samples/sec Loss 8.2417 LearningRate 0.1712 Epoch: 5 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:12,924-Speed 5463.68 samples/sec Loss 
8.2633 LearningRate 0.1712 Epoch: 5 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:20,424-Speed 5461.91 samples/sec Loss 8.2158 LearningRate 0.1711 Epoch: 5 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:27,932-Speed 5456.04 samples/sec Loss 8.2297 LearningRate 0.1711 Epoch: 5 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:35,562-Speed 5369.37 samples/sec Loss 8.2505 LearningRate 0.1711 Epoch: 5 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:43,027-Speed 5488.41 samples/sec Loss 8.2573 LearningRate 0.1711 Epoch: 5 Global Step: 58610 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:50,483-Speed 5493.45 samples/sec Loss 8.2752 LearningRate 0.1710 Epoch: 5 Global Step: 58620 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:56:58,013-Speed 5440.51 samples/sec Loss 8.3037 LearningRate 0.1710 Epoch: 5 Global Step: 58630 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:57:05,551-Speed 5434.58 samples/sec Loss 8.1876 LearningRate 0.1710 Epoch: 5 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:57:13,044-Speed 5467.03 samples/sec Loss 8.2258 LearningRate 0.1710 Epoch: 5 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:57:20,659-Speed 5379.88 samples/sec Loss 8.2196 LearningRate 0.1710 Epoch: 5 Global Step: 58660 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:57:28,195-Speed 5435.85 samples/sec Loss 8.3163 LearningRate 0.1709 Epoch: 5 Global Step: 58670 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:57:35,690-Speed 5465.33 samples/sec Loss 8.2357 LearningRate 0.1709 Epoch: 5 Global Step: 58680 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:57:43,206-Speed 5451.28 samples/sec Loss 8.2439 LearningRate 0.1709 Epoch: 5 Global Step: 58690 
Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:57:50,783-Speed 5405.96 samples/sec Loss 8.1838 LearningRate 0.1709 Epoch: 5 Global Step: 58700 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:57:58,348-Speed 5415.17 samples/sec Loss 8.2829 LearningRate 0.1708 Epoch: 5 Global Step: 58710 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:58:05,912-Speed 5416.14 samples/sec Loss 8.2134 LearningRate 0.1708 Epoch: 5 Global Step: 58720 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:58:13,458-Speed 5428.94 samples/sec Loss 8.1924 LearningRate 0.1708 Epoch: 5 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:58:21,063-Speed 5386.98 samples/sec Loss 8.2525 LearningRate 0.1708 Epoch: 5 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:58:28,691-Speed 5370.03 samples/sec Loss 8.3135 LearningRate 0.1707 Epoch: 5 Global Step: 58750 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 07:58:36,182-Speed 5468.75 samples/sec Loss 8.3638 LearningRate 0.1707 Epoch: 5 Global Step: 58760 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:58:43,780-Speed 5392.22 samples/sec Loss 8.2681 LearningRate 0.1707 Epoch: 5 Global Step: 58770 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:58:51,301-Speed 5446.50 samples/sec Loss 8.2884 LearningRate 0.1707 Epoch: 5 Global Step: 58780 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:58:58,943-Speed 5360.78 samples/sec Loss 8.3020 LearningRate 0.1707 Epoch: 5 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:59:06,472-Speed 5440.64 samples/sec Loss 8.2363 LearningRate 0.1706 Epoch: 5 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:59:14,079-Speed 5385.51 samples/sec Loss 8.2285 LearningRate 0.1706 Epoch: 5 Global Step: 58810 Fp16 Grad Scale: 65536 Required: 34 hours Training: 
2022-01-08 07:59:21,598-Speed 5448.41 samples/sec Loss 8.1943 LearningRate 0.1706 Epoch: 5 Global Step: 58820 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:59:29,041-Speed 5503.37 samples/sec Loss 8.2698 LearningRate 0.1706 Epoch: 5 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:59:36,600-Speed 5419.51 samples/sec Loss 8.2688 LearningRate 0.1705 Epoch: 5 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:59:44,053-Speed 5496.49 samples/sec Loss 8.2117 LearningRate 0.1705 Epoch: 5 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 07:59:51,504-Speed 5498.53 samples/sec Loss 8.2267 LearningRate 0.1705 Epoch: 5 Global Step: 58860 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 07:59:58,987-Speed 5474.15 samples/sec Loss 8.1725 LearningRate 0.1705 Epoch: 5 Global Step: 58870 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:06,559-Speed 5409.51 samples/sec Loss 8.2160 LearningRate 0.1704 Epoch: 5 Global Step: 58880 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:14,095-Speed 5436.01 samples/sec Loss 8.2200 LearningRate 0.1704 Epoch: 5 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:21,554-Speed 5492.51 samples/sec Loss 8.1942 LearningRate 0.1704 Epoch: 5 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:29,144-Speed 5396.73 samples/sec Loss 8.3163 LearningRate 0.1704 Epoch: 5 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:36,729-Speed 5401.19 samples/sec Loss 8.2268 LearningRate 0.1704 Epoch: 5 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:44,257-Speed 5441.41 samples/sec Loss 8.2204 LearningRate 0.1703 Epoch: 5 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:51,799-Speed 5432.05 
samples/sec Loss 8.3089 LearningRate 0.1703 Epoch: 5 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:00:59,238-Speed 5506.38 samples/sec Loss 8.1966 LearningRate 0.1703 Epoch: 5 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:01:06,809-Speed 5410.62 samples/sec Loss 8.2301 LearningRate 0.1703 Epoch: 5 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:14,312-Speed 5460.15 samples/sec Loss 8.3121 LearningRate 0.1702 Epoch: 5 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:21,776-Speed 5488.61 samples/sec Loss 8.2793 LearningRate 0.1702 Epoch: 5 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:29,314-Speed 5434.40 samples/sec Loss 8.1847 LearningRate 0.1702 Epoch: 5 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:36,891-Speed 5406.66 samples/sec Loss 8.3039 LearningRate 0.1702 Epoch: 5 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:44,453-Speed 5416.86 samples/sec Loss 8.2046 LearningRate 0.1702 Epoch: 5 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:52,006-Speed 5424.27 samples/sec Loss 8.2714 LearningRate 0.1701 Epoch: 5 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:01:59,638-Speed 5367.61 samples/sec Loss 8.2299 LearningRate 0.1701 Epoch: 5 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:02:07,216-Speed 5405.23 samples/sec Loss 8.2238 LearningRate 0.1701 Epoch: 5 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:02:14,707-Speed 5469.08 samples/sec Loss 8.2596 LearningRate 0.1701 Epoch: 5 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:02:22,220-Speed 5452.80 samples/sec Loss 8.2234 LearningRate 0.1700 Epoch: 5 
Global Step: 59060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:02:29,704-Speed 5474.04 samples/sec Loss 8.2531 LearningRate 0.1700 Epoch: 5 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:02:37,171-Speed 5485.66 samples/sec Loss 8.2288 LearningRate 0.1700 Epoch: 5 Global Step: 59080 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:02:44,712-Speed 5432.15 samples/sec Loss 8.1853 LearningRate 0.1700 Epoch: 5 Global Step: 59090 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:02:52,349-Speed 5364.40 samples/sec Loss 8.2661 LearningRate 0.1699 Epoch: 5 Global Step: 59100 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:02:59,902-Speed 5423.90 samples/sec Loss 8.2482 LearningRate 0.1699 Epoch: 5 Global Step: 59110 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:07,743-Speed 5224.25 samples/sec Loss 8.3109 LearningRate 0.1699 Epoch: 5 Global Step: 59120 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:15,250-Speed 5456.85 samples/sec Loss 8.2458 LearningRate 0.1699 Epoch: 5 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:22,749-Speed 5463.72 samples/sec Loss 8.1583 LearningRate 0.1699 Epoch: 5 Global Step: 59140 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:30,266-Speed 5449.97 samples/sec Loss 8.2297 LearningRate 0.1698 Epoch: 5 Global Step: 59150 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:37,898-Speed 5366.71 samples/sec Loss 8.2834 LearningRate 0.1698 Epoch: 5 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:45,431-Speed 5438.77 samples/sec Loss 8.2270 LearningRate 0.1698 Epoch: 5 Global Step: 59170 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:03:53,016-Speed 5400.51 samples/sec Loss 8.2079 LearningRate 0.1698 Epoch: 5 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 
34 hours Training: 2022-01-08 08:04:00,508-Speed 5468.51 samples/sec Loss 8.1828 LearningRate 0.1697 Epoch: 5 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:07,991-Speed 5474.03 samples/sec Loss 8.2204 LearningRate 0.1697 Epoch: 5 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:15,546-Speed 5422.43 samples/sec Loss 8.2180 LearningRate 0.1697 Epoch: 5 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:23,205-Speed 5348.66 samples/sec Loss 8.1920 LearningRate 0.1697 Epoch: 5 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:30,748-Speed 5431.03 samples/sec Loss 8.2193 LearningRate 0.1696 Epoch: 5 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:38,344-Speed 5392.69 samples/sec Loss 8.2045 LearningRate 0.1696 Epoch: 5 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:45,890-Speed 5429.16 samples/sec Loss 8.3024 LearningRate 0.1696 Epoch: 5 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:04:53,550-Speed 5347.92 samples/sec Loss 8.1682 LearningRate 0.1696 Epoch: 5 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:05:01,107-Speed 5420.93 samples/sec Loss 8.2228 LearningRate 0.1696 Epoch: 5 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:05:08,618-Speed 5454.19 samples/sec Loss 8.2186 LearningRate 0.1695 Epoch: 5 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:05:16,322-Speed 5317.38 samples/sec Loss 8.2100 LearningRate 0.1695 Epoch: 5 Global Step: 59290 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:05:23,856-Speed 5437.86 samples/sec Loss 8.1318 LearningRate 0.1695 Epoch: 5 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:05:31,442-Speed 
5400.21 samples/sec Loss 8.1613 LearningRate 0.1695 Epoch: 5 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:05:39,077-Speed 5364.81 samples/sec Loss 8.2120 LearningRate 0.1694 Epoch: 5 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:05:46,534-Speed 5504.80 samples/sec Loss 8.1714 LearningRate 0.1694 Epoch: 5 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:05:54,167-Speed 5366.85 samples/sec Loss 8.2808 LearningRate 0.1694 Epoch: 5 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:06:01,792-Speed 5372.94 samples/sec Loss 8.2041 LearningRate 0.1694 Epoch: 5 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:06:09,310-Speed 5448.42 samples/sec Loss 8.3106 LearningRate 0.1693 Epoch: 5 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:06:16,870-Speed 5418.72 samples/sec Loss 8.2014 LearningRate 0.1693 Epoch: 5 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:06:24,512-Speed 5360.49 samples/sec Loss 8.2453 LearningRate 0.1693 Epoch: 5 Global Step: 59380 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:06:32,024-Speed 5453.50 samples/sec Loss 8.1694 LearningRate 0.1693 Epoch: 5 Global Step: 59390 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:06:39,576-Speed 5424.85 samples/sec Loss 8.1207 LearningRate 0.1693 Epoch: 5 Global Step: 59400 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:06:47,079-Speed 5459.07 samples/sec Loss 8.1819 LearningRate 0.1692 Epoch: 5 Global Step: 59410 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:06:54,586-Speed 5457.07 samples/sec Loss 8.2028 LearningRate 0.1692 Epoch: 5 Global Step: 59420 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:07:02,153-Speed 5413.89 samples/sec Loss 8.1677 
LearningRate 0.1692 Epoch: 5 Global Step: 59430 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:09,631-Speed 5478.20 samples/sec Loss 8.1891 LearningRate 0.1692 Epoch: 5 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:17,177-Speed 5428.33 samples/sec Loss 8.1949 LearningRate 0.1691 Epoch: 5 Global Step: 59450 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:24,758-Speed 5404.08 samples/sec Loss 8.1758 LearningRate 0.1691 Epoch: 5 Global Step: 59460 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:32,323-Speed 5415.35 samples/sec Loss 8.1760 LearningRate 0.1691 Epoch: 5 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:39,787-Speed 5488.46 samples/sec Loss 8.1489 LearningRate 0.1691 Epoch: 5 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:47,288-Speed 5460.58 samples/sec Loss 8.1996 LearningRate 0.1691 Epoch: 5 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:07:54,813-Speed 5444.08 samples/sec Loss 8.1992 LearningRate 0.1690 Epoch: 5 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:08:02,432-Speed 5377.14 samples/sec Loss 8.2864 LearningRate 0.1690 Epoch: 5 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:08:09,955-Speed 5445.31 samples/sec Loss 8.1891 LearningRate 0.1690 Epoch: 5 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:08:17,583-Speed 5370.34 samples/sec Loss 8.2352 LearningRate 0.1690 Epoch: 5 Global Step: 59530 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:08:25,064-Speed 5475.58 samples/sec Loss 8.2031 LearningRate 0.1689 Epoch: 5 Global Step: 59540 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:08:32,568-Speed 5459.32 samples/sec Loss 8.2039 LearningRate 0.1689 Epoch: 5 Global Step: 
59550 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:08:40,126-Speed 5420.42 samples/sec Loss 8.2453 LearningRate 0.1689 Epoch: 5 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:08:47,673-Speed 5427.55 samples/sec Loss 8.1889 LearningRate 0.1689 Epoch: 5 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:08:55,214-Speed 5432.42 samples/sec Loss 8.1712 LearningRate 0.1688 Epoch: 5 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:02,826-Speed 5382.24 samples/sec Loss 8.1591 LearningRate 0.1688 Epoch: 5 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:10,325-Speed 5462.64 samples/sec Loss 8.1830 LearningRate 0.1688 Epoch: 5 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:17,900-Speed 5408.06 samples/sec Loss 8.2250 LearningRate 0.1688 Epoch: 5 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:25,714-Speed 5242.74 samples/sec Loss 8.1654 LearningRate 0.1688 Epoch: 5 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:33,391-Speed 5336.51 samples/sec Loss 8.1669 LearningRate 0.1687 Epoch: 5 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:40,984-Speed 5395.16 samples/sec Loss 8.2291 LearningRate 0.1687 Epoch: 5 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:09:48,505-Speed 5446.39 samples/sec Loss 8.1241 LearningRate 0.1687 Epoch: 5 Global Step: 59650 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:09:56,035-Speed 5440.52 samples/sec Loss 8.2051 LearningRate 0.1687 Epoch: 5 Global Step: 59660 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:10:03,556-Speed 5446.69 samples/sec Loss 8.2103 LearningRate 0.1686 Epoch: 5 Global Step: 59670 Fp16 Grad Scale: 262144 Required: 34 
hours Training: 2022-01-08 08:10:11,196-Speed 5361.78 samples/sec Loss 8.1846 LearningRate 0.1686 Epoch: 5 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:10:18,908-Speed 5312.14 samples/sec Loss 8.1715 LearningRate 0.1686 Epoch: 5 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:10:26,551-Speed 5359.99 samples/sec Loss 8.1447 LearningRate 0.1686 Epoch: 5 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:10:34,138-Speed 5399.10 samples/sec Loss 8.2399 LearningRate 0.1685 Epoch: 5 Global Step: 59710 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:10:41,676-Speed 5434.94 samples/sec Loss 8.1556 LearningRate 0.1685 Epoch: 5 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:10:49,174-Speed 5463.37 samples/sec Loss 8.1468 LearningRate 0.1685 Epoch: 5 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:10:56,666-Speed 5467.84 samples/sec Loss 8.1640 LearningRate 0.1685 Epoch: 5 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:04,223-Speed 5420.67 samples/sec Loss 8.2503 LearningRate 0.1685 Epoch: 5 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:11,784-Speed 5417.93 samples/sec Loss 8.2148 LearningRate 0.1684 Epoch: 5 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:19,363-Speed 5405.02 samples/sec Loss 8.2024 LearningRate 0.1684 Epoch: 5 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:26,904-Speed 5432.40 samples/sec Loss 8.2156 LearningRate 0.1684 Epoch: 5 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:34,414-Speed 5454.82 samples/sec Loss 8.0896 LearningRate 0.1684 Epoch: 5 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:42,020-Speed 
5386.12 samples/sec Loss 8.2134 LearningRate 0.1683 Epoch: 5 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:11:49,636-Speed 5378.64 samples/sec Loss 8.1235 LearningRate 0.1683 Epoch: 5 Global Step: 59810 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:11:57,131-Speed 5465.72 samples/sec Loss 8.1751 LearningRate 0.1683 Epoch: 5 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:12:04,632-Speed 5460.94 samples/sec Loss 8.1848 LearningRate 0.1683 Epoch: 5 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:12,143-Speed 5453.88 samples/sec Loss 8.2049 LearningRate 0.1683 Epoch: 5 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:19,801-Speed 5350.01 samples/sec Loss 8.2422 LearningRate 0.1682 Epoch: 5 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:27,315-Speed 5451.70 samples/sec Loss 8.2261 LearningRate 0.1682 Epoch: 5 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:34,913-Speed 5391.47 samples/sec Loss 8.1935 LearningRate 0.1682 Epoch: 5 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:42,357-Speed 5502.99 samples/sec Loss 8.1900 LearningRate 0.1682 Epoch: 5 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:50,005-Speed 5356.61 samples/sec Loss 8.1480 LearningRate 0.1681 Epoch: 5 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:12:57,748-Speed 5290.38 samples/sec Loss 8.2237 LearningRate 0.1681 Epoch: 5 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:13:05,401-Speed 5353.12 samples/sec Loss 8.2065 LearningRate 0.1681 Epoch: 5 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:13:12,965-Speed 5415.82 samples/sec Loss 8.1957 LearningRate 0.1681 
Epoch: 5 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:13:20,571-Speed 5385.81 samples/sec Loss 8.1321 LearningRate 0.1680 Epoch: 5 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:13:28,152-Speed 5403.52 samples/sec Loss 8.1787 LearningRate 0.1680 Epoch: 5 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:13:35,730-Speed 5406.54 samples/sec Loss 8.2058 LearningRate 0.1680 Epoch: 5 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:13:43,195-Speed 5487.43 samples/sec Loss 8.2352 LearningRate 0.1680 Epoch: 5 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:13:50,657-Speed 5490.09 samples/sec Loss 8.1810 LearningRate 0.1680 Epoch: 5 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:13:58,226-Speed 5411.82 samples/sec Loss 8.2205 LearningRate 0.1679 Epoch: 5 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:14:05,929-Speed 5318.52 samples/sec Loss 8.1258 LearningRate 0.1679 Epoch: 5 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:14:13,411-Speed 5475.31 samples/sec Loss 8.1819 LearningRate 0.1679 Epoch: 5 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:14:57,714-[lfw][60000]XNorm: 22.505832 Training: 2022-01-08 08:14:57,715-[lfw][60000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-01-08 08:14:57,715-[lfw][60000]Accuracy-Highest: 0.99817 Training: 2022-01-08 08:15:50,005-[cfp_fp][60000]XNorm: 20.086574 Training: 2022-01-08 08:15:50,006-[cfp_fp][60000]Accuracy-Flip: 0.98529+-0.00542 Training: 2022-01-08 08:15:50,007-[cfp_fp][60000]Accuracy-Highest: 0.98600 Training: 2022-01-08 08:16:35,870-[agedb_30][60000]XNorm: 22.146841 Training: 2022-01-08 08:16:35,871-[agedb_30][60000]Accuracy-Flip: 0.97283+-0.00610 Training: 2022-01-08 
08:16:35,872-[agedb_30][60000]Accuracy-Highest: 0.97517 Training: 2022-01-08 08:16:43,484-Speed 272.94 samples/sec Loss 8.1454 LearningRate 0.1679 Epoch: 5 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:16:50,945-Speed 5491.11 samples/sec Loss 8.1256 LearningRate 0.1678 Epoch: 5 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:16:58,384-Speed 5507.17 samples/sec Loss 8.1467 LearningRate 0.1678 Epoch: 5 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:17:05,896-Speed 5453.93 samples/sec Loss 8.1353 LearningRate 0.1678 Epoch: 5 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:17:13,400-Speed 5459.57 samples/sec Loss 8.1883 LearningRate 0.1678 Epoch: 5 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:17:20,915-Speed 5451.48 samples/sec Loss 8.1904 LearningRate 0.1678 Epoch: 5 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:17:28,451-Speed 5436.37 samples/sec Loss 8.1595 LearningRate 0.1677 Epoch: 5 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:17:36,049-Speed 5391.59 samples/sec Loss 8.1432 LearningRate 0.1677 Epoch: 5 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:17:43,625-Speed 5407.04 samples/sec Loss 8.1306 LearningRate 0.1677 Epoch: 5 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:17:51,257-Speed 5367.73 samples/sec Loss 8.1409 LearningRate 0.1677 Epoch: 5 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:17:58,806-Speed 5427.01 samples/sec Loss 8.1694 LearningRate 0.1676 Epoch: 5 Global Step: 60110 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:18:06,270-Speed 5488.15 samples/sec Loss 8.1378 LearningRate 0.1676 Epoch: 5 Global Step: 60120 Fp16 Grad Scale: 131072 Required: 34 
hours Training: 2022-01-08 08:18:13,809-Speed 5433.45 samples/sec Loss 8.1164 LearningRate 0.1676 Epoch: 5 Global Step: 60130 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:18:21,269-Speed 5491.20 samples/sec Loss 8.1470 LearningRate 0.1676 Epoch: 5 Global Step: 60140 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:18:28,786-Speed 5450.39 samples/sec Loss 8.1344 LearningRate 0.1675 Epoch: 5 Global Step: 60150 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:18:36,540-Speed 5283.68 samples/sec Loss 8.0945 LearningRate 0.1675 Epoch: 5 Global Step: 60160 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:18:44,020-Speed 5475.91 samples/sec Loss 8.1691 LearningRate 0.1675 Epoch: 5 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:18:51,560-Speed 5433.52 samples/sec Loss 8.1625 LearningRate 0.1675 Epoch: 5 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:18:59,055-Speed 5465.75 samples/sec Loss 8.1746 LearningRate 0.1675 Epoch: 5 Global Step: 60190 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:06,635-Speed 5404.05 samples/sec Loss 8.2445 LearningRate 0.1674 Epoch: 5 Global Step: 60200 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:14,111-Speed 5479.59 samples/sec Loss 8.1538 LearningRate 0.1674 Epoch: 5 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:21,676-Speed 5415.44 samples/sec Loss 8.1359 LearningRate 0.1674 Epoch: 5 Global Step: 60220 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:29,175-Speed 5462.91 samples/sec Loss 8.1550 LearningRate 0.1674 Epoch: 5 Global Step: 60230 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:36,666-Speed 5468.79 samples/sec Loss 8.0802 LearningRate 0.1673 Epoch: 5 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:44,195-Speed 
5440.56 samples/sec Loss 8.0311 LearningRate 0.1673 Epoch: 5 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:51,867-Speed 5339.50 samples/sec Loss 8.1106 LearningRate 0.1673 Epoch: 5 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:19:59,445-Speed 5406.44 samples/sec Loss 8.0523 LearningRate 0.1673 Epoch: 5 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:06,989-Speed 5429.81 samples/sec Loss 8.1022 LearningRate 0.1672 Epoch: 5 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:14,523-Speed 5437.72 samples/sec Loss 8.1085 LearningRate 0.1672 Epoch: 5 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:22,005-Speed 5475.50 samples/sec Loss 8.1178 LearningRate 0.1672 Epoch: 5 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:29,631-Speed 5371.74 samples/sec Loss 8.1657 LearningRate 0.1672 Epoch: 5 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:37,135-Speed 5458.99 samples/sec Loss 8.1280 LearningRate 0.1672 Epoch: 5 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:44,581-Speed 5501.68 samples/sec Loss 8.0691 LearningRate 0.1671 Epoch: 5 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:52,209-Speed 5370.29 samples/sec Loss 8.1642 LearningRate 0.1671 Epoch: 5 Global Step: 60340 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:20:59,784-Speed 5407.86 samples/sec Loss 8.1399 LearningRate 0.1671 Epoch: 5 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:07,261-Speed 5479.37 samples/sec Loss 8.1298 LearningRate 0.1671 Epoch: 5 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:14,805-Speed 5429.69 samples/sec Loss 8.1391 LearningRate 
0.1670 Epoch: 5 Global Step: 60370 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:21:22,332-Speed 5442.95 samples/sec Loss 8.1051 LearningRate 0.1670 Epoch: 5 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:29,785-Speed 5496.17 samples/sec Loss 8.1845 LearningRate 0.1670 Epoch: 5 Global Step: 60390 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:37,331-Speed 5428.53 samples/sec Loss 8.0716 LearningRate 0.1670 Epoch: 5 Global Step: 60400 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:44,846-Speed 5451.54 samples/sec Loss 8.1460 LearningRate 0.1670 Epoch: 5 Global Step: 60410 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:52,339-Speed 5466.82 samples/sec Loss 8.1985 LearningRate 0.1669 Epoch: 5 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:21:59,859-Speed 5447.17 samples/sec Loss 8.1387 LearningRate 0.1669 Epoch: 5 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:22:07,363-Speed 5459.83 samples/sec Loss 8.1020 LearningRate 0.1669 Epoch: 5 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:22:14,995-Speed 5367.64 samples/sec Loss 8.1063 LearningRate 0.1669 Epoch: 5 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:22:22,505-Speed 5454.43 samples/sec Loss 8.0980 LearningRate 0.1668 Epoch: 5 Global Step: 60460 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:22:30,000-Speed 5465.42 samples/sec Loss 8.1191 LearningRate 0.1668 Epoch: 5 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:22:37,516-Speed 5450.36 samples/sec Loss 8.1031 LearningRate 0.1668 Epoch: 5 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:22:44,980-Speed 5488.60 samples/sec Loss 8.1310 LearningRate 0.1668 Epoch: 5 Global Step: 60490 Fp16 Grad 
Scale: 32768 Required: 34 hours Training: 2022-01-08 08:22:52,592-Speed 5381.62 samples/sec Loss 8.1875 LearningRate 0.1667 Epoch: 5 Global Step: 60500 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:00,149-Speed 5421.08 samples/sec Loss 8.1755 LearningRate 0.1667 Epoch: 5 Global Step: 60510 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:07,667-Speed 5448.99 samples/sec Loss 8.1826 LearningRate 0.1667 Epoch: 5 Global Step: 60520 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:15,288-Speed 5375.23 samples/sec Loss 8.1560 LearningRate 0.1667 Epoch: 5 Global Step: 60530 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:22,857-Speed 5412.14 samples/sec Loss 8.1456 LearningRate 0.1667 Epoch: 5 Global Step: 60540 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:30,266-Speed 5529.43 samples/sec Loss 8.0564 LearningRate 0.1666 Epoch: 5 Global Step: 60550 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:37,741-Speed 5480.14 samples/sec Loss 8.1470 LearningRate 0.1666 Epoch: 5 Global Step: 60560 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:45,277-Speed 5435.35 samples/sec Loss 8.1679 LearningRate 0.1666 Epoch: 5 Global Step: 60570 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:23:52,782-Speed 5458.64 samples/sec Loss 8.1121 LearningRate 0.1666 Epoch: 5 Global Step: 60580 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-01-08 08:24:00,289-Speed 5457.22 samples/sec Loss 8.1763 LearningRate 0.1665 Epoch: 5 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:24:07,812-Speed 5444.96 samples/sec Loss 8.1373 LearningRate 0.1665 Epoch: 5 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:24:15,334-Speed 5446.08 samples/sec Loss 8.0861 LearningRate 0.1665 Epoch: 5 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 
08:24:22,889-Speed 5422.82 samples/sec Loss 8.1681 LearningRate 0.1665 Epoch: 5 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:24:30,410-Speed 5446.53 samples/sec Loss 8.1918 LearningRate 0.1665 Epoch: 5 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:24:37,927-Speed 5450.06 samples/sec Loss 8.1845 LearningRate 0.1664 Epoch: 5 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:24:45,511-Speed 5401.08 samples/sec Loss 8.0795 LearningRate 0.1664 Epoch: 5 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:24:53,030-Speed 5448.48 samples/sec Loss 8.1613 LearningRate 0.1664 Epoch: 5 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:25:00,543-Speed 5452.92 samples/sec Loss 8.0886 LearningRate 0.1664 Epoch: 5 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:25:08,074-Speed 5439.29 samples/sec Loss 8.1143 LearningRate 0.1663 Epoch: 5 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:25:15,606-Speed 5438.83 samples/sec Loss 8.1017 LearningRate 0.1663 Epoch: 5 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:25:23,074-Speed 5485.19 samples/sec Loss 8.1614 LearningRate 0.1663 Epoch: 5 Global Step: 60700 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:25:30,597-Speed 5445.64 samples/sec Loss 8.1321 LearningRate 0.1663 Epoch: 5 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:25:38,070-Speed 5481.93 samples/sec Loss 8.1385 LearningRate 0.1663 Epoch: 5 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:25:45,627-Speed 5420.77 samples/sec Loss 8.1354 LearningRate 0.1662 Epoch: 5 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:25:53,106-Speed 5477.25 samples/sec Loss 8.1232 
LearningRate 0.1662 Epoch: 5 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:00,629-Speed 5445.29 samples/sec Loss 8.1514 LearningRate 0.1662 Epoch: 5 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:08,136-Speed 5457.51 samples/sec Loss 8.1303 LearningRate 0.1662 Epoch: 5 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:15,622-Speed 5471.39 samples/sec Loss 8.0806 LearningRate 0.1661 Epoch: 5 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:23,164-Speed 5431.75 samples/sec Loss 8.1215 LearningRate 0.1661 Epoch: 5 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:30,746-Speed 5403.66 samples/sec Loss 8.0694 LearningRate 0.1661 Epoch: 5 Global Step: 60790 Fp16 Grad Scale: 262144 Required: 34 hours Training: 2022-01-08 08:26:38,393-Speed 5356.41 samples/sec Loss 8.1364 LearningRate 0.1661 Epoch: 5 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:45,876-Speed 5474.64 samples/sec Loss 8.2141 LearningRate 0.1660 Epoch: 5 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:26:53,374-Speed 5463.08 samples/sec Loss 8.1088 LearningRate 0.1660 Epoch: 5 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:27:00,880-Speed 5457.96 samples/sec Loss 8.1072 LearningRate 0.1660 Epoch: 5 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:27:08,391-Speed 5454.35 samples/sec Loss 8.1021 LearningRate 0.1660 Epoch: 5 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:27:15,892-Speed 5460.85 samples/sec Loss 8.0506 LearningRate 0.1660 Epoch: 5 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:27:23,454-Speed 5417.19 samples/sec Loss 8.0966 LearningRate 0.1659 Epoch: 5 Global Step: 
60860 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:27:30,965-Speed 5453.96 samples/sec Loss 8.1392 LearningRate 0.1659 Epoch: 5 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:27:38,501-Speed 5436.12 samples/sec Loss 8.0806 LearningRate 0.1659 Epoch: 5 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:27:46,248-Speed 5288.18 samples/sec Loss 8.1228 LearningRate 0.1659 Epoch: 5 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:27:53,821-Speed 5408.85 samples/sec Loss 8.1391 LearningRate 0.1658 Epoch: 5 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:28:01,448-Speed 5426.82 samples/sec Loss 8.0664 LearningRate 0.1658 Epoch: 5 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:28:09,118-Speed 5340.97 samples/sec Loss 8.1119 LearningRate 0.1658 Epoch: 5 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:28:16,660-Speed 5431.72 samples/sec Loss 8.0645 LearningRate 0.1658 Epoch: 5 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:28:24,263-Speed 5387.74 samples/sec Loss 8.0227 LearningRate 0.1658 Epoch: 5 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:28:31,721-Speed 5493.40 samples/sec Loss 8.0822 LearningRate 0.1657 Epoch: 5 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 34 hours Training: 2022-01-08 08:28:39,258-Speed 5434.89 samples/sec Loss 8.1734 LearningRate 0.1657 Epoch: 5 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:28:46,783-Speed 5443.94 samples/sec Loss 8.0786 LearningRate 0.1657 Epoch: 5 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:28:54,250-Speed 5486.45 samples/sec Loss 8.1126 LearningRate 0.1657 Epoch: 5 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 34 hours 
Training: 2022-01-08 08:29:01,757-Speed 5457.01 samples/sec Loss 8.0822 LearningRate 0.1656 Epoch: 5 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:09,234-Speed 5478.30 samples/sec Loss 8.1295 LearningRate 0.1656 Epoch: 5 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:16,801-Speed 5414.11 samples/sec Loss 8.0912 LearningRate 0.1656 Epoch: 5 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:24,342-Speed 5432.43 samples/sec Loss 8.1334 LearningRate 0.1656 Epoch: 5 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:31,935-Speed 5394.61 samples/sec Loss 8.0809 LearningRate 0.1655 Epoch: 5 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:39,585-Speed 5355.47 samples/sec Loss 8.0682 LearningRate 0.1655 Epoch: 5 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:47,150-Speed 5415.06 samples/sec Loss 8.0319 LearningRate 0.1655 Epoch: 5 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 34 hours Training: 2022-01-08 08:29:54,744-Speed 5394.37 samples/sec Loss 8.1323 LearningRate 0.1655 Epoch: 5 Global Step: 61060 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 08:30:02,275-Speed 5439.50 samples/sec Loss 8.1313 LearningRate 0.1655 Epoch: 5 Global Step: 61070 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:09,916-Speed 5361.47 samples/sec Loss 8.1097 LearningRate 0.1654 Epoch: 5 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:17,394-Speed 5477.51 samples/sec Loss 8.1100 LearningRate 0.1654 Epoch: 5 Global Step: 61090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:24,898-Speed 5459.50 samples/sec Loss 8.0726 LearningRate 0.1654 Epoch: 5 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:32,473-Speed 
5407.92 samples/sec Loss 8.0840 LearningRate 0.1654 Epoch: 5 Global Step: 61110 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:39,961-Speed 5470.91 samples/sec Loss 8.0491 LearningRate 0.1653 Epoch: 5 Global Step: 61120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:47,443-Speed 5474.84 samples/sec Loss 8.0934 LearningRate 0.1653 Epoch: 5 Global Step: 61130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:30:55,849-Speed 4873.59 samples/sec Loss 8.0362 LearningRate 0.1653 Epoch: 5 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:03,429-Speed 5404.35 samples/sec Loss 8.1402 LearningRate 0.1653 Epoch: 5 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:10,958-Speed 5440.58 samples/sec Loss 8.1657 LearningRate 0.1653 Epoch: 5 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:18,497-Speed 5434.33 samples/sec Loss 8.0275 LearningRate 0.1652 Epoch: 5 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:26,072-Speed 5407.88 samples/sec Loss 8.0749 LearningRate 0.1652 Epoch: 5 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:33,538-Speed 5487.30 samples/sec Loss 8.1400 LearningRate 0.1652 Epoch: 5 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:41,108-Speed 5411.09 samples/sec Loss 8.0610 LearningRate 0.1652 Epoch: 5 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:48,641-Speed 5438.40 samples/sec Loss 8.0638 LearningRate 0.1651 Epoch: 5 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:31:56,129-Speed 5471.12 samples/sec Loss 8.0079 LearningRate 0.1651 Epoch: 5 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:32:03,691-Speed 5416.97 samples/sec Loss 8.0916 LearningRate 0.1651 
Epoch: 5 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:32:11,240-Speed 5426.27 samples/sec Loss 8.1046 LearningRate 0.1651 Epoch: 5 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:32:18,757-Speed 5450.16 samples/sec Loss 8.1418 LearningRate 0.1651 Epoch: 5 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:32:26,300-Speed 5431.16 samples/sec Loss 8.1486 LearningRate 0.1650 Epoch: 5 Global Step: 61260 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:32:34,008-Speed 5314.89 samples/sec Loss 8.0444 LearningRate 0.1650 Epoch: 5 Global Step: 61270 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:32:41,570-Speed 5416.78 samples/sec Loss 8.0861 LearningRate 0.1650 Epoch: 5 Global Step: 61280 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:32:49,055-Speed 5473.75 samples/sec Loss 8.0529 LearningRate 0.1650 Epoch: 5 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:32:56,617-Speed 5417.39 samples/sec Loss 8.0719 LearningRate 0.1649 Epoch: 5 Global Step: 61300 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:33:04,171-Speed 5422.81 samples/sec Loss 8.0521 LearningRate 0.1649 Epoch: 5 Global Step: 61310 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:33:11,782-Speed 5382.41 samples/sec Loss 8.1336 LearningRate 0.1649 Epoch: 5 Global Step: 61320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:33:19,444-Speed 5346.56 samples/sec Loss 7.9907 LearningRate 0.1649 Epoch: 5 Global Step: 61330 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:33:27,063-Speed 5377.17 samples/sec Loss 8.0450 LearningRate 0.1648 Epoch: 5 Global Step: 61340 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 08:33:34,532-Speed 5484.37 samples/sec Loss 8.0725 LearningRate 0.1648 Epoch: 5 Global Step: 61350 Fp16 Grad Scale: 
262144 Required: 33 hours Training: 2022-01-08 08:33:42,284-Speed 5284.56 samples/sec Loss 8.1586 LearningRate 0.1648 Epoch: 5 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:33:49,954-Speed 5341.28 samples/sec Loss 8.0755 LearningRate 0.1648 Epoch: 5 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:33:57,493-Speed 5433.65 samples/sec Loss 8.0593 LearningRate 0.1648 Epoch: 5 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:05,102-Speed 5384.16 samples/sec Loss 8.0245 LearningRate 0.1647 Epoch: 5 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:12,577-Speed 5479.65 samples/sec Loss 8.1010 LearningRate 0.1647 Epoch: 5 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:20,133-Speed 5421.80 samples/sec Loss 8.1171 LearningRate 0.1647 Epoch: 5 Global Step: 61410 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:27,826-Speed 5325.00 samples/sec Loss 8.0964 LearningRate 0.1647 Epoch: 5 Global Step: 61420 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:35,382-Speed 5421.58 samples/sec Loss 8.0924 LearningRate 0.1646 Epoch: 5 Global Step: 61430 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:42,909-Speed 5442.52 samples/sec Loss 8.0889 LearningRate 0.1646 Epoch: 5 Global Step: 61440 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:50,464-Speed 5422.63 samples/sec Loss 8.0924 LearningRate 0.1646 Epoch: 5 Global Step: 61450 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:34:58,040-Speed 5406.84 samples/sec Loss 8.1138 LearningRate 0.1646 Epoch: 5 Global Step: 61460 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:35:05,678-Speed 5363.51 samples/sec Loss 8.0862 LearningRate 0.1646 Epoch: 5 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 
08:35:13,321-Speed 5359.87 samples/sec Loss 8.0228 LearningRate 0.1645 Epoch: 5 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:35:20,856-Speed 5436.69 samples/sec Loss 7.9843 LearningRate 0.1645 Epoch: 5 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:35:28,363-Speed 5456.63 samples/sec Loss 8.0939 LearningRate 0.1645 Epoch: 5 Global Step: 61500 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:35:35,927-Speed 5415.95 samples/sec Loss 8.0303 LearningRate 0.1645 Epoch: 5 Global Step: 61510 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:35:43,489-Speed 5416.97 samples/sec Loss 8.0156 LearningRate 0.1644 Epoch: 5 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:35:51,025-Speed 5436.29 samples/sec Loss 8.0358 LearningRate 0.1644 Epoch: 5 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:35:58,572-Speed 5427.74 samples/sec Loss 8.0446 LearningRate 0.1644 Epoch: 5 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:06,096-Speed 5444.47 samples/sec Loss 8.0336 LearningRate 0.1644 Epoch: 5 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:13,675-Speed 5405.21 samples/sec Loss 8.0423 LearningRate 0.1644 Epoch: 5 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:21,288-Speed 5381.46 samples/sec Loss 8.0959 LearningRate 0.1643 Epoch: 5 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:28,924-Speed 5364.19 samples/sec Loss 8.1593 LearningRate 0.1643 Epoch: 5 Global Step: 61580 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:36,437-Speed 5453.08 samples/sec Loss 8.0761 LearningRate 0.1643 Epoch: 5 Global Step: 61590 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:43,962-Speed 5444.11 samples/sec Loss 8.0476 
LearningRate 0.1643 Epoch: 5 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:51,516-Speed 5423.09 samples/sec Loss 8.0500 LearningRate 0.1642 Epoch: 5 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:36:59,089-Speed 5409.26 samples/sec Loss 8.1129 LearningRate 0.1642 Epoch: 5 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:06,596-Speed 5456.65 samples/sec Loss 8.0375 LearningRate 0.1642 Epoch: 5 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:14,105-Speed 5455.86 samples/sec Loss 8.1298 LearningRate 0.1642 Epoch: 5 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:21,770-Speed 5344.48 samples/sec Loss 8.0204 LearningRate 0.1641 Epoch: 5 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:29,342-Speed 5409.92 samples/sec Loss 7.9301 LearningRate 0.1641 Epoch: 5 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:36,881-Speed 5434.09 samples/sec Loss 8.0844 LearningRate 0.1641 Epoch: 5 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:44,386-Speed 5458.38 samples/sec Loss 8.0640 LearningRate 0.1641 Epoch: 5 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:51,916-Speed 5440.68 samples/sec Loss 7.9866 LearningRate 0.1641 Epoch: 5 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:37:59,493-Speed 5406.18 samples/sec Loss 8.0525 LearningRate 0.1640 Epoch: 5 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:38:07,003-Speed 5454.56 samples/sec Loss 8.0191 LearningRate 0.1640 Epoch: 5 Global Step: 61710 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:38:14,551-Speed 5427.83 samples/sec Loss 7.9881 LearningRate 0.1640 Epoch: 5 Global Step: 
61720 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:38:22,055-Speed 5458.84 samples/sec Loss 8.0506 LearningRate 0.1640 Epoch: 5 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:38:29,583-Speed 5441.50 samples/sec Loss 8.0342 LearningRate 0.1639 Epoch: 5 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:38:37,121-Speed 5434.71 samples/sec Loss 8.0634 LearningRate 0.1639 Epoch: 5 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:38:44,683-Speed 5416.88 samples/sec Loss 8.1133 LearningRate 0.1639 Epoch: 5 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:38:52,273-Speed 5397.26 samples/sec Loss 8.0269 LearningRate 0.1639 Epoch: 5 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:38:59,816-Speed 5431.77 samples/sec Loss 8.0266 LearningRate 0.1639 Epoch: 5 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:39:07,407-Speed 5396.37 samples/sec Loss 8.0203 LearningRate 0.1638 Epoch: 5 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:39:14,980-Speed 5408.73 samples/sec Loss 8.0987 LearningRate 0.1638 Epoch: 5 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:39:22,464-Speed 5474.24 samples/sec Loss 8.0462 LearningRate 0.1638 Epoch: 5 Global Step: 61810 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:39:29,990-Speed 5443.00 samples/sec Loss 8.1082 LearningRate 0.1638 Epoch: 5 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:39:37,580-Speed 5397.49 samples/sec Loss 8.1144 LearningRate 0.1637 Epoch: 5 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:39:45,226-Speed 5357.75 samples/sec Loss 8.0419 LearningRate 0.1637 Epoch: 5 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 33 hours 
Training: 2022-01-08 08:39:52,739-Speed 5452.47 samples/sec Loss 8.1166 LearningRate 0.1637 Epoch: 5 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:40:00,213-Speed 5481.33 samples/sec Loss 8.0611 LearningRate 0.1637 Epoch: 5 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:40:07,757-Speed 5429.80 samples/sec Loss 8.1206 LearningRate 0.1637 Epoch: 5 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:40:15,324-Speed 5413.41 samples/sec Loss 7.9959 LearningRate 0.1636 Epoch: 5 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:40:23,140-Speed 5241.96 samples/sec Loss 8.0373 LearningRate 0.1636 Epoch: 5 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:40:30,780-Speed 5361.61 samples/sec Loss 7.9053 LearningRate 0.1636 Epoch: 5 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:40:38,335-Speed 5422.52 samples/sec Loss 8.0029 LearningRate 0.1636 Epoch: 5 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:40:45,898-Speed 5416.13 samples/sec Loss 8.0371 LearningRate 0.1635 Epoch: 5 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:40:53,598-Speed 5320.09 samples/sec Loss 8.0015 LearningRate 0.1635 Epoch: 5 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:41:01,129-Speed 5440.12 samples/sec Loss 8.1628 LearningRate 0.1635 Epoch: 5 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:41:08,906-Speed 5267.40 samples/sec Loss 8.0040 LearningRate 0.1635 Epoch: 5 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:41:16,458-Speed 5424.27 samples/sec Loss 8.0692 LearningRate 0.1635 Epoch: 5 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:41:24,015-Speed 5422.28 
samples/sec Loss 8.0321 LearningRate 0.1634 Epoch: 5 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:41:31,664-Speed 5355.53 samples/sec Loss 8.0531 LearningRate 0.1634 Epoch: 5 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:41:39,276-Speed 5381.63 samples/sec Loss 8.0506 LearningRate 0.1634 Epoch: 5 Global Step: 61990 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:41:46,792-Speed 5450.27 samples/sec Loss 7.9653 LearningRate 0.1634 Epoch: 5 Global Step: 62000 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:42:30,440-[lfw][62000]XNorm: 24.483062 Training: 2022-01-08 08:42:30,440-[lfw][62000]Accuracy-Flip: 0.99750+-0.00271 Training: 2022-01-08 08:42:30,441-[lfw][62000]Accuracy-Highest: 0.99817 Training: 2022-01-08 08:43:21,313-[cfp_fp][62000]XNorm: 21.841480 Training: 2022-01-08 08:43:21,314-[cfp_fp][62000]Accuracy-Flip: 0.98414+-0.00591 Training: 2022-01-08 08:43:21,314-[cfp_fp][62000]Accuracy-Highest: 0.98600 Training: 2022-01-08 08:44:07,315-[agedb_30][62000]XNorm: 23.860036 Training: 2022-01-08 08:44:07,316-[agedb_30][62000]Accuracy-Flip: 0.97667+-0.00632 Training: 2022-01-08 08:44:07,317-[agedb_30][62000]Accuracy-Highest: 0.97667 Training: 2022-01-08 08:44:14,871-Speed 276.61 samples/sec Loss 8.0580 LearningRate 0.1633 Epoch: 5 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:44:22,494-Speed 5374.90 samples/sec Loss 7.9895 LearningRate 0.1633 Epoch: 5 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:44:30,167-Speed 5339.08 samples/sec Loss 8.0346 LearningRate 0.1633 Epoch: 5 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:44:37,696-Speed 5441.68 samples/sec Loss 8.0234 LearningRate 0.1633 Epoch: 5 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:44:45,237-Speed 5432.14 samples/sec Loss 8.0448 
LearningRate 0.1632 Epoch: 5 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:44:52,770-Speed 5438.44 samples/sec Loss 8.0530 LearningRate 0.1632 Epoch: 5 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:45:00,290-Speed 5448.06 samples/sec Loss 8.0814 LearningRate 0.1632 Epoch: 5 Global Step: 62070 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 08:45:07,792-Speed 5460.20 samples/sec Loss 8.0471 LearningRate 0.1632 Epoch: 5 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:45:15,317-Speed 5443.96 samples/sec Loss 8.0694 LearningRate 0.1632 Epoch: 5 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:45:22,779-Speed 5489.86 samples/sec Loss 8.0410 LearningRate 0.1631 Epoch: 5 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:45:30,435-Speed 5350.94 samples/sec Loss 8.0090 LearningRate 0.1631 Epoch: 5 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:45:38,034-Speed 5390.81 samples/sec Loss 8.0557 LearningRate 0.1631 Epoch: 5 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:45:45,524-Speed 5469.10 samples/sec Loss 8.0338 LearningRate 0.1631 Epoch: 5 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:45:53,191-Speed 5343.25 samples/sec Loss 7.9804 LearningRate 0.1630 Epoch: 5 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:46:00,743-Speed 5424.83 samples/sec Loss 8.0015 LearningRate 0.1630 Epoch: 5 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:46:08,319-Speed 5406.97 samples/sec Loss 8.0124 LearningRate 0.1630 Epoch: 5 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:46:15,888-Speed 5412.14 samples/sec Loss 8.0546 LearningRate 0.1630 Epoch: 5 Global Step: 62170 
Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:46:23,355-Speed 5486.04 samples/sec Loss 8.0771 LearningRate 0.1630 Epoch: 5 Global Step: 62180 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:46:30,878-Speed 5445.70 samples/sec Loss 8.0594 LearningRate 0.1629 Epoch: 5 Global Step: 62190 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:46:38,481-Speed 5387.49 samples/sec Loss 8.0860 LearningRate 0.1629 Epoch: 5 Global Step: 62200 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:46:46,057-Speed 5407.70 samples/sec Loss 8.0537 LearningRate 0.1629 Epoch: 5 Global Step: 62210 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:08,463-Speed 1828.29 samples/sec Loss 7.9910 LearningRate 0.1629 Epoch: 6 Global Step: 62220 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:15,994-Speed 5438.92 samples/sec Loss 8.0278 LearningRate 0.1628 Epoch: 6 Global Step: 62230 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:23,423-Speed 5514.65 samples/sec Loss 8.0598 LearningRate 0.1628 Epoch: 6 Global Step: 62240 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:31,050-Speed 5371.01 samples/sec Loss 7.9564 LearningRate 0.1628 Epoch: 6 Global Step: 62250 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:38,621-Speed 5410.80 samples/sec Loss 7.9926 LearningRate 0.1628 Epoch: 6 Global Step: 62260 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:46,242-Speed 5375.64 samples/sec Loss 8.0076 LearningRate 0.1628 Epoch: 6 Global Step: 62270 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:47:53,673-Speed 5512.88 samples/sec Loss 7.9362 LearningRate 0.1627 Epoch: 6 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:48:01,116-Speed 5503.56 samples/sec Loss 8.0098 LearningRate 0.1627 Epoch: 6 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 33 hours Training: 
2022-01-08 08:48:08,547-Speed 5512.98 samples/sec Loss 8.0955 LearningRate 0.1627 Epoch: 6 Global Step: 62300 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:48:15,987-Speed 5506.08 samples/sec Loss 7.9582 LearningRate 0.1627 Epoch: 6 Global Step: 62310 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:48:23,478-Speed 5468.63 samples/sec Loss 7.9825 LearningRate 0.1626 Epoch: 6 Global Step: 62320 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:48:30,943-Speed 5487.74 samples/sec Loss 7.9392 LearningRate 0.1626 Epoch: 6 Global Step: 62330 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:48:38,374-Speed 5512.77 samples/sec Loss 7.9924 LearningRate 0.1626 Epoch: 6 Global Step: 62340 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:48:45,918-Speed 5429.59 samples/sec Loss 8.0278 LearningRate 0.1626 Epoch: 6 Global Step: 62350 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:48:53,407-Speed 5470.20 samples/sec Loss 8.0790 LearningRate 0.1626 Epoch: 6 Global Step: 62360 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:49:00,894-Speed 5471.55 samples/sec Loss 8.0444 LearningRate 0.1625 Epoch: 6 Global Step: 62370 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:49:08,351-Speed 5494.14 samples/sec Loss 7.9558 LearningRate 0.1625 Epoch: 6 Global Step: 62380 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:49:15,843-Speed 5467.13 samples/sec Loss 7.9707 LearningRate 0.1625 Epoch: 6 Global Step: 62390 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 08:49:23,541-Speed 5321.96 samples/sec Loss 7.9307 LearningRate 0.1625 Epoch: 6 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:49:31,011-Speed 5483.62 samples/sec Loss 8.0293 LearningRate 0.1624 Epoch: 6 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:49:38,513-Speed 5460.96 samples/sec Loss 
7.9691 LearningRate 0.1624 Epoch: 6 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:49:45,992-Speed 5476.92 samples/sec Loss 7.9912 LearningRate 0.1624 Epoch: 6 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:49:53,460-Speed 5485.80 samples/sec Loss 7.9504 LearningRate 0.1624 Epoch: 6 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:50:01,036-Speed 5407.16 samples/sec Loss 7.9482 LearningRate 0.1624 Epoch: 6 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:50:08,680-Speed 5359.01 samples/sec Loss 7.9886 LearningRate 0.1623 Epoch: 6 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:50:16,371-Speed 5326.74 samples/sec Loss 7.9773 LearningRate 0.1623 Epoch: 6 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:50:24,055-Speed 5331.24 samples/sec Loss 7.9676 LearningRate 0.1623 Epoch: 6 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:50:31,764-Speed 5314.17 samples/sec Loss 8.0044 LearningRate 0.1623 Epoch: 6 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:50:39,456-Speed 5325.75 samples/sec Loss 7.9613 LearningRate 0.1622 Epoch: 6 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:50:47,155-Speed 5320.50 samples/sec Loss 7.9043 LearningRate 0.1622 Epoch: 6 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:50:54,842-Speed 5329.27 samples/sec Loss 8.0517 LearningRate 0.1622 Epoch: 6 Global Step: 62520 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:02,490-Speed 5356.28 samples/sec Loss 8.0281 LearningRate 0.1622 Epoch: 6 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:10,187-Speed 5323.10 samples/sec Loss 7.9361 LearningRate 0.1622 Epoch: 6 Global Step: 
62540 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:18,026-Speed 5225.17 samples/sec Loss 8.0105 LearningRate 0.1621 Epoch: 6 Global Step: 62550 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:25,684-Speed 5349.29 samples/sec Loss 7.9661 LearningRate 0.1621 Epoch: 6 Global Step: 62560 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:33,221-Speed 5435.06 samples/sec Loss 7.9804 LearningRate 0.1621 Epoch: 6 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:40,828-Speed 5386.14 samples/sec Loss 7.9644 LearningRate 0.1621 Epoch: 6 Global Step: 62580 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:48,349-Speed 5446.37 samples/sec Loss 8.0450 LearningRate 0.1620 Epoch: 6 Global Step: 62590 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:51:55,805-Speed 5494.31 samples/sec Loss 7.9831 LearningRate 0.1620 Epoch: 6 Global Step: 62600 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:03,448-Speed 5359.64 samples/sec Loss 7.9759 LearningRate 0.1620 Epoch: 6 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:11,030-Speed 5403.64 samples/sec Loss 8.0159 LearningRate 0.1620 Epoch: 6 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:18,873-Speed 5222.90 samples/sec Loss 7.9703 LearningRate 0.1619 Epoch: 6 Global Step: 62630 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:26,397-Speed 5444.26 samples/sec Loss 7.9540 LearningRate 0.1619 Epoch: 6 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:33,961-Speed 5415.62 samples/sec Loss 8.0556 LearningRate 0.1619 Epoch: 6 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:41,606-Speed 5358.98 samples/sec Loss 8.0517 LearningRate 0.1619 Epoch: 6 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 33 
hours Training: 2022-01-08 08:52:49,371-Speed 5275.31 samples/sec Loss 7.9630 LearningRate 0.1619 Epoch: 6 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:52:56,915-Speed 5429.90 samples/sec Loss 7.9642 LearningRate 0.1618 Epoch: 6 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:53:04,585-Speed 5341.33 samples/sec Loss 8.0214 LearningRate 0.1618 Epoch: 6 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:53:12,152-Speed 5414.06 samples/sec Loss 7.8985 LearningRate 0.1618 Epoch: 6 Global Step: 62700 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 08:53:19,775-Speed 5373.38 samples/sec Loss 8.0019 LearningRate 0.1618 Epoch: 6 Global Step: 62710 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 08:53:27,323-Speed 5427.30 samples/sec Loss 8.0022 LearningRate 0.1617 Epoch: 6 Global Step: 62720 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 08:53:34,915-Speed 5395.95 samples/sec Loss 8.0522 LearningRate 0.1617 Epoch: 6 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:53:42,581-Speed 5344.03 samples/sec Loss 8.0436 LearningRate 0.1617 Epoch: 6 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:53:50,134-Speed 5423.93 samples/sec Loss 8.0418 LearningRate 0.1617 Epoch: 6 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:53:57,649-Speed 5450.77 samples/sec Loss 7.9683 LearningRate 0.1617 Epoch: 6 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:05,199-Speed 5425.87 samples/sec Loss 7.9815 LearningRate 0.1616 Epoch: 6 Global Step: 62770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:12,779-Speed 5404.41 samples/sec Loss 7.9820 LearningRate 0.1616 Epoch: 6 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 
08:54:20,319-Speed 5433.32 samples/sec Loss 7.9325 LearningRate 0.1616 Epoch: 6 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:27,879-Speed 5418.54 samples/sec Loss 8.0155 LearningRate 0.1616 Epoch: 6 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:35,376-Speed 5463.82 samples/sec Loss 7.9935 LearningRate 0.1615 Epoch: 6 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:42,931-Speed 5423.14 samples/sec Loss 8.0063 LearningRate 0.1615 Epoch: 6 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:50,485-Speed 5422.61 samples/sec Loss 7.9380 LearningRate 0.1615 Epoch: 6 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:54:58,008-Speed 5444.81 samples/sec Loss 7.9694 LearningRate 0.1615 Epoch: 6 Global Step: 62840 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:05,479-Speed 5483.46 samples/sec Loss 8.0081 LearningRate 0.1615 Epoch: 6 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:12,910-Speed 5513.37 samples/sec Loss 7.9412 LearningRate 0.1614 Epoch: 6 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:20,524-Speed 5380.29 samples/sec Loss 7.9320 LearningRate 0.1614 Epoch: 6 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:28,108-Speed 5401.04 samples/sec Loss 7.9794 LearningRate 0.1614 Epoch: 6 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:35,676-Speed 5413.37 samples/sec Loss 7.9338 LearningRate 0.1614 Epoch: 6 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:43,259-Speed 5402.41 samples/sec Loss 7.9065 LearningRate 0.1613 Epoch: 6 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:55:50,895-Speed 5364.45 samples/sec Loss 
7.9784 LearningRate 0.1613 Epoch: 6 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:55:58,528-Speed 5366.70 samples/sec Loss 8.0184 LearningRate 0.1613 Epoch: 6 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:06,062-Speed 5437.86 samples/sec Loss 7.9323 LearningRate 0.1613 Epoch: 6 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:13,680-Speed 5377.43 samples/sec Loss 7.9943 LearningRate 0.1613 Epoch: 6 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:21,172-Speed 5467.77 samples/sec Loss 7.8688 LearningRate 0.1612 Epoch: 6 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:28,741-Speed 5412.33 samples/sec Loss 7.9707 LearningRate 0.1612 Epoch: 6 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:36,264-Speed 5445.30 samples/sec Loss 7.9751 LearningRate 0.1612 Epoch: 6 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:43,816-Speed 5425.07 samples/sec Loss 7.9690 LearningRate 0.1612 Epoch: 6 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:51,346-Speed 5439.83 samples/sec Loss 7.9764 LearningRate 0.1611 Epoch: 6 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:56:58,837-Speed 5468.67 samples/sec Loss 8.0145 LearningRate 0.1611 Epoch: 6 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:06,493-Speed 5350.71 samples/sec Loss 8.0237 LearningRate 0.1611 Epoch: 6 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:57:14,015-Speed 5446.00 samples/sec Loss 8.0259 LearningRate 0.1611 Epoch: 6 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:21,641-Speed 5372.43 samples/sec Loss 7.9670 LearningRate 0.1611 Epoch: 6 Global Step: 63030 
Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:29,391-Speed 5285.00 samples/sec Loss 7.9599 LearningRate 0.1610 Epoch: 6 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:36,923-Speed 5439.24 samples/sec Loss 7.9396 LearningRate 0.1610 Epoch: 6 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:44,447-Speed 5444.53 samples/sec Loss 7.8847 LearningRate 0.1610 Epoch: 6 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:52,011-Speed 5416.36 samples/sec Loss 7.9715 LearningRate 0.1610 Epoch: 6 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:57:59,574-Speed 5416.22 samples/sec Loss 8.0298 LearningRate 0.1609 Epoch: 6 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:58:07,075-Speed 5460.76 samples/sec Loss 7.9558 LearningRate 0.1609 Epoch: 6 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:58:14,637-Speed 5417.46 samples/sec Loss 7.9512 LearningRate 0.1609 Epoch: 6 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:58:22,299-Speed 5347.10 samples/sec Loss 7.9965 LearningRate 0.1609 Epoch: 6 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:58:29,819-Speed 5446.80 samples/sec Loss 7.9443 LearningRate 0.1609 Epoch: 6 Global Step: 63120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 08:58:37,374-Speed 5422.06 samples/sec Loss 7.9537 LearningRate 0.1608 Epoch: 6 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:58:45,011-Speed 5364.65 samples/sec Loss 7.9034 LearningRate 0.1608 Epoch: 6 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:58:52,702-Speed 5326.33 samples/sec Loss 7.9729 LearningRate 0.1608 Epoch: 6 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 33 hours Training: 
2022-01-08 08:59:00,364-Speed 5346.84 samples/sec Loss 7.9600 LearningRate 0.1608 Epoch: 6 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:07,998-Speed 5365.49 samples/sec Loss 7.9481 LearningRate 0.1607 Epoch: 6 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:15,557-Speed 5419.91 samples/sec Loss 8.0207 LearningRate 0.1607 Epoch: 6 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:23,087-Speed 5440.24 samples/sec Loss 7.9620 LearningRate 0.1607 Epoch: 6 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:30,581-Speed 5466.34 samples/sec Loss 7.8850 LearningRate 0.1607 Epoch: 6 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:38,211-Speed 5368.73 samples/sec Loss 7.9178 LearningRate 0.1607 Epoch: 6 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:45,834-Speed 5374.34 samples/sec Loss 7.9129 LearningRate 0.1606 Epoch: 6 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 08:59:53,459-Speed 5372.51 samples/sec Loss 7.9898 LearningRate 0.1606 Epoch: 6 Global Step: 63230 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:00,927-Speed 5484.79 samples/sec Loss 7.9074 LearningRate 0.1606 Epoch: 6 Global Step: 63240 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:08,503-Speed 5407.25 samples/sec Loss 7.8931 LearningRate 0.1606 Epoch: 6 Global Step: 63250 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:16,006-Speed 5460.03 samples/sec Loss 7.9560 LearningRate 0.1605 Epoch: 6 Global Step: 63260 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:23,553-Speed 5428.26 samples/sec Loss 8.0110 LearningRate 0.1605 Epoch: 6 Global Step: 63270 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:31,259-Speed 5315.83 samples/sec 
Loss 7.9051 LearningRate 0.1605 Epoch: 6 Global Step: 63280 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:38,956-Speed 5322.22 samples/sec Loss 7.9462 LearningRate 0.1605 Epoch: 6 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:46,737-Speed 5265.11 samples/sec Loss 7.9444 LearningRate 0.1605 Epoch: 6 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:00:54,403-Speed 5344.29 samples/sec Loss 8.0283 LearningRate 0.1604 Epoch: 6 Global Step: 63310 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:01:01,973-Speed 5411.11 samples/sec Loss 7.9174 LearningRate 0.1604 Epoch: 6 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:01:09,537-Speed 5415.66 samples/sec Loss 7.9557 LearningRate 0.1604 Epoch: 6 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:01:17,164-Speed 5371.63 samples/sec Loss 7.9317 LearningRate 0.1604 Epoch: 6 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:01:24,660-Speed 5464.60 samples/sec Loss 7.9385 LearningRate 0.1603 Epoch: 6 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:01:32,262-Speed 5388.52 samples/sec Loss 7.9422 LearningRate 0.1603 Epoch: 6 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:01:39,833-Speed 5410.80 samples/sec Loss 7.9337 LearningRate 0.1603 Epoch: 6 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:01:47,543-Speed 5313.70 samples/sec Loss 7.8840 LearningRate 0.1603 Epoch: 6 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:01:55,031-Speed 5471.28 samples/sec Loss 7.9896 LearningRate 0.1603 Epoch: 6 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:02:02,575-Speed 5429.93 samples/sec Loss 7.9528 LearningRate 0.1602 Epoch: 6 Global 
Step: 63400 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:02:10,296-Speed 5305.56 samples/sec Loss 7.9336 LearningRate 0.1602 Epoch: 6 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:02:17,836-Speed 5433.35 samples/sec Loss 7.9733 LearningRate 0.1602 Epoch: 6 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:02:25,355-Speed 5448.82 samples/sec Loss 7.9440 LearningRate 0.1602 Epoch: 6 Global Step: 63430 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:02:32,877-Speed 5445.46 samples/sec Loss 7.9645 LearningRate 0.1601 Epoch: 6 Global Step: 63440 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:02:40,373-Speed 5464.87 samples/sec Loss 7.8278 LearningRate 0.1601 Epoch: 6 Global Step: 63450 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:02:47,917-Speed 5430.51 samples/sec Loss 7.8615 LearningRate 0.1601 Epoch: 6 Global Step: 63460 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:02:55,365-Speed 5499.86 samples/sec Loss 7.9437 LearningRate 0.1601 Epoch: 6 Global Step: 63470 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:03:02,968-Speed 5388.49 samples/sec Loss 7.9251 LearningRate 0.1601 Epoch: 6 Global Step: 63480 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:03:10,462-Speed 5465.92 samples/sec Loss 7.8854 LearningRate 0.1600 Epoch: 6 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:03:18,094-Speed 5367.59 samples/sec Loss 7.9681 LearningRate 0.1600 Epoch: 6 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:03:25,623-Speed 5441.28 samples/sec Loss 7.9319 LearningRate 0.1600 Epoch: 6 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:03:33,247-Speed 5373.08 samples/sec Loss 7.8544 LearningRate 0.1600 Epoch: 6 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 33 
hours Training: 2022-01-08 09:03:40,879-Speed 5367.51 samples/sec Loss 7.9317 LearningRate 0.1599 Epoch: 6 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:03:48,538-Speed 5348.80 samples/sec Loss 7.9503 LearningRate 0.1599 Epoch: 6 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:03:56,045-Speed 5457.58 samples/sec Loss 7.9409 LearningRate 0.1599 Epoch: 6 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:04:03,601-Speed 5421.17 samples/sec Loss 7.9205 LearningRate 0.1599 Epoch: 6 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:04:11,106-Speed 5458.83 samples/sec Loss 7.9319 LearningRate 0.1599 Epoch: 6 Global Step: 63570 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:04:18,715-Speed 5383.50 samples/sec Loss 7.9499 LearningRate 0.1598 Epoch: 6 Global Step: 63580 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:04:26,201-Speed 5472.00 samples/sec Loss 7.9130 LearningRate 0.1598 Epoch: 6 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:04:33,760-Speed 5419.67 samples/sec Loss 7.9158 LearningRate 0.1598 Epoch: 6 Global Step: 63600 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:04:41,363-Speed 5388.08 samples/sec Loss 7.9437 LearningRate 0.1598 Epoch: 6 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:04:48,836-Speed 5481.49 samples/sec Loss 7.8568 LearningRate 0.1597 Epoch: 6 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:04:56,335-Speed 5462.84 samples/sec Loss 7.8892 LearningRate 0.1597 Epoch: 6 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:03,894-Speed 5419.90 samples/sec Loss 7.9160 LearningRate 0.1597 Epoch: 6 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:11,624-Speed 
5299.21 samples/sec Loss 7.8562 LearningRate 0.1597 Epoch: 6 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:19,160-Speed 5435.41 samples/sec Loss 7.9260 LearningRate 0.1597 Epoch: 6 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:26,671-Speed 5454.24 samples/sec Loss 7.8859 LearningRate 0.1596 Epoch: 6 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:34,165-Speed 5466.95 samples/sec Loss 7.9505 LearningRate 0.1596 Epoch: 6 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:41,694-Speed 5440.14 samples/sec Loss 7.9417 LearningRate 0.1596 Epoch: 6 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:49,174-Speed 5477.11 samples/sec Loss 7.9206 LearningRate 0.1596 Epoch: 6 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:05:56,725-Speed 5424.76 samples/sec Loss 7.9586 LearningRate 0.1595 Epoch: 6 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:06:04,210-Speed 5473.32 samples/sec Loss 7.8600 LearningRate 0.1595 Epoch: 6 Global Step: 63720 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:11,688-Speed 5477.60 samples/sec Loss 7.8494 LearningRate 0.1595 Epoch: 6 Global Step: 63730 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:19,210-Speed 5446.41 samples/sec Loss 7.9340 LearningRate 0.1595 Epoch: 6 Global Step: 63740 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:26,684-Speed 5481.03 samples/sec Loss 7.8777 LearningRate 0.1595 Epoch: 6 Global Step: 63750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:34,193-Speed 5455.94 samples/sec Loss 7.9396 LearningRate 0.1594 Epoch: 6 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:41,671-Speed 5478.18 samples/sec Loss 8.0015 LearningRate 
0.1594 Epoch: 6 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:49,123-Speed 5497.03 samples/sec Loss 7.9174 LearningRate 0.1594 Epoch: 6 Global Step: 63780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:06:56,688-Speed 5415.55 samples/sec Loss 7.9512 LearningRate 0.1594 Epoch: 6 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:04,294-Speed 5385.56 samples/sec Loss 7.9618 LearningRate 0.1593 Epoch: 6 Global Step: 63800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:11,786-Speed 5468.13 samples/sec Loss 7.8101 LearningRate 0.1593 Epoch: 6 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:19,246-Speed 5491.02 samples/sec Loss 7.9624 LearningRate 0.1593 Epoch: 6 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:26,788-Speed 5431.99 samples/sec Loss 7.8902 LearningRate 0.1593 Epoch: 6 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:34,386-Speed 5391.59 samples/sec Loss 7.9228 LearningRate 0.1593 Epoch: 6 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:41,953-Speed 5413.58 samples/sec Loss 7.9330 LearningRate 0.1592 Epoch: 6 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:49,419-Speed 5487.10 samples/sec Loss 7.9339 LearningRate 0.1592 Epoch: 6 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:07:56,927-Speed 5455.77 samples/sec Loss 7.9521 LearningRate 0.1592 Epoch: 6 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:04,457-Speed 5440.65 samples/sec Loss 7.8947 LearningRate 0.1592 Epoch: 6 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:11,941-Speed 5473.31 samples/sec Loss 7.8299 LearningRate 0.1591 Epoch: 6 Global Step: 63890 Fp16 
Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:19,398-Speed 5493.94 samples/sec Loss 7.8627 LearningRate 0.1591 Epoch: 6 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:26,887-Speed 5470.04 samples/sec Loss 7.9118 LearningRate 0.1591 Epoch: 6 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:34,395-Speed 5456.27 samples/sec Loss 7.9751 LearningRate 0.1591 Epoch: 6 Global Step: 63920 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 09:08:42,043-Speed 5356.25 samples/sec Loss 7.9560 LearningRate 0.1591 Epoch: 6 Global Step: 63930 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:49,539-Speed 5464.95 samples/sec Loss 7.9086 LearningRate 0.1590 Epoch: 6 Global Step: 63940 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:08:56,992-Speed 5496.28 samples/sec Loss 7.8947 LearningRate 0.1590 Epoch: 6 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:09:04,484-Speed 5467.99 samples/sec Loss 7.9095 LearningRate 0.1590 Epoch: 6 Global Step: 63960 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:09:11,912-Speed 5515.31 samples/sec Loss 7.9123 LearningRate 0.1590 Epoch: 6 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:09:19,377-Speed 5487.84 samples/sec Loss 7.8530 LearningRate 0.1589 Epoch: 6 Global Step: 63980 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:09:26,821-Speed 5502.65 samples/sec Loss 7.8435 LearningRate 0.1589 Epoch: 6 Global Step: 63990 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:09:34,303-Speed 5475.58 samples/sec Loss 7.8751 LearningRate 0.1589 Epoch: 6 Global Step: 64000 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:10:18,232-[lfw][64000]XNorm: 23.364512 Training: 2022-01-08 09:10:18,233-[lfw][64000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-01-08 
09:10:18,233-[lfw][64000]Accuracy-Highest: 0.99817 Training: 2022-01-08 09:11:09,983-[cfp_fp][64000]XNorm: 21.319322 Training: 2022-01-08 09:11:09,984-[cfp_fp][64000]Accuracy-Flip: 0.98457+-0.00541 Training: 2022-01-08 09:11:09,985-[cfp_fp][64000]Accuracy-Highest: 0.98600 Training: 2022-01-08 09:11:56,151-[agedb_30][64000]XNorm: 23.179053 Training: 2022-01-08 09:11:56,153-[agedb_30][64000]Accuracy-Flip: 0.97333+-0.00601 Training: 2022-01-08 09:11:56,153-[agedb_30][64000]Accuracy-Highest: 0.97667 Training: 2022-01-08 09:12:03,341-Speed 274.83 samples/sec Loss 7.8360 LearningRate 0.1589 Epoch: 6 Global Step: 64010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:12:11,041-Speed 5321.07 samples/sec Loss 7.9691 LearningRate 0.1589 Epoch: 6 Global Step: 64020 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:12:18,647-Speed 5386.20 samples/sec Loss 7.8796 LearningRate 0.1588 Epoch: 6 Global Step: 64030 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-01-08 09:12:26,319-Speed 5339.83 samples/sec Loss 7.8392 LearningRate 0.1588 Epoch: 6 Global Step: 64040 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:12:33,840-Speed 5447.35 samples/sec Loss 7.8844 LearningRate 0.1588 Epoch: 6 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:12:41,344-Speed 5459.52 samples/sec Loss 7.9158 LearningRate 0.1588 Epoch: 6 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:12:48,856-Speed 5453.14 samples/sec Loss 7.8561 LearningRate 0.1587 Epoch: 6 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:12:56,374-Speed 5448.84 samples/sec Loss 7.8801 LearningRate 0.1587 Epoch: 6 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:13:03,871-Speed 5464.27 samples/sec Loss 7.9044 LearningRate 0.1587 Epoch: 6 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 
09:13:11,386-Speed 5451.23 samples/sec Loss 7.8989 LearningRate 0.1587 Epoch: 6 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:13:18,921-Speed 5436.60 samples/sec Loss 7.8612 LearningRate 0.1587 Epoch: 6 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:13:26,419-Speed 5463.32 samples/sec Loss 7.8331 LearningRate 0.1586 Epoch: 6 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:13:33,970-Speed 5425.10 samples/sec Loss 7.8552 LearningRate 0.1586 Epoch: 6 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:13:41,469-Speed 5463.03 samples/sec Loss 7.8818 LearningRate 0.1586 Epoch: 6 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:13:48,930-Speed 5490.89 samples/sec Loss 7.8828 LearningRate 0.1586 Epoch: 6 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:13:56,362-Speed 5511.52 samples/sec Loss 7.8992 LearningRate 0.1585 Epoch: 6 Global Step: 64160 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:03,956-Speed 5394.47 samples/sec Loss 7.8316 LearningRate 0.1585 Epoch: 6 Global Step: 64170 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:11,444-Speed 5470.65 samples/sec Loss 7.8954 LearningRate 0.1585 Epoch: 6 Global Step: 64180 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:18,984-Speed 5433.83 samples/sec Loss 7.8399 LearningRate 0.1585 Epoch: 6 Global Step: 64190 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:26,538-Speed 5422.43 samples/sec Loss 7.8445 LearningRate 0.1585 Epoch: 6 Global Step: 64200 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:33,993-Speed 5494.71 samples/sec Loss 7.8848 LearningRate 0.1584 Epoch: 6 Global Step: 64210 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:41,434-Speed 5505.43 samples/sec Loss 
7.8509 LearningRate 0.1584 Epoch: 6 Global Step: 64220 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:48,908-Speed 5481.30 samples/sec Loss 7.8058 LearningRate 0.1584 Epoch: 6 Global Step: 64230 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:14:56,409-Speed 5461.17 samples/sec Loss 7.8192 LearningRate 0.1584 Epoch: 6 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:03,911-Speed 5460.68 samples/sec Loss 7.8525 LearningRate 0.1583 Epoch: 6 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:11,614-Speed 5318.11 samples/sec Loss 7.8949 LearningRate 0.1583 Epoch: 6 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:19,226-Speed 5381.66 samples/sec Loss 7.8631 LearningRate 0.1583 Epoch: 6 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:26,862-Speed 5365.39 samples/sec Loss 7.8742 LearningRate 0.1583 Epoch: 6 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:34,431-Speed 5411.72 samples/sec Loss 7.9409 LearningRate 0.1583 Epoch: 6 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:42,027-Speed 5392.97 samples/sec Loss 7.9143 LearningRate 0.1582 Epoch: 6 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:49,685-Speed 5349.30 samples/sec Loss 7.9281 LearningRate 0.1582 Epoch: 6 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:15:57,376-Speed 5326.50 samples/sec Loss 7.9257 LearningRate 0.1582 Epoch: 6 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:16:05,136-Speed 5279.13 samples/sec Loss 7.8392 LearningRate 0.1582 Epoch: 6 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:16:12,630-Speed 5466.43 samples/sec Loss 7.8867 LearningRate 0.1581 Epoch: 6 Global Step: 
64340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:16:20,071-Speed 5505.08 samples/sec Loss 7.8830 LearningRate 0.1581 Epoch: 6 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:16:27,619-Speed 5427.60 samples/sec Loss 7.8057 LearningRate 0.1581 Epoch: 6 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:16:35,162-Speed 5431.39 samples/sec Loss 7.8186 LearningRate 0.1581 Epoch: 6 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:16:42,789-Speed 5370.66 samples/sec Loss 7.8356 LearningRate 0.1581 Epoch: 6 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:16:50,280-Speed 5469.11 samples/sec Loss 7.8502 LearningRate 0.1580 Epoch: 6 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:16:57,767-Speed 5471.68 samples/sec Loss 7.8872 LearningRate 0.1580 Epoch: 6 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:17:05,277-Speed 5454.41 samples/sec Loss 7.8995 LearningRate 0.1580 Epoch: 6 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:17:12,942-Speed 5344.51 samples/sec Loss 7.9055 LearningRate 0.1580 Epoch: 6 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:17:20,444-Speed 5461.26 samples/sec Loss 7.8624 LearningRate 0.1579 Epoch: 6 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:17:27,958-Speed 5451.30 samples/sec Loss 7.8994 LearningRate 0.1579 Epoch: 6 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:17:35,519-Speed 5418.24 samples/sec Loss 7.8888 LearningRate 0.1579 Epoch: 6 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:17:43,061-Speed 5431.54 samples/sec Loss 7.9135 LearningRate 0.1579 Epoch: 6 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 33 hours 
Training: 2022-01-08 09:17:50,587-Speed 5443.30 samples/sec Loss 7.8697 LearningRate 0.1579 Epoch: 6 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:17:58,079-Speed 5468.16 samples/sec Loss 7.8913 LearningRate 0.1578 Epoch: 6 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:18:05,545-Speed 5487.15 samples/sec Loss 7.8912 LearningRate 0.1578 Epoch: 6 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:18:12,986-Speed 5505.41 samples/sec Loss 7.8797 LearningRate 0.1578 Epoch: 6 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:18:20,534-Speed 5427.10 samples/sec Loss 7.8755 LearningRate 0.1578 Epoch: 6 Global Step: 64510 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:18:28,023-Speed 5470.35 samples/sec Loss 7.9428 LearningRate 0.1577 Epoch: 6 Global Step: 64520 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:18:35,558-Speed 5436.36 samples/sec Loss 7.9629 LearningRate 0.1577 Epoch: 6 Global Step: 64530 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:18:43,055-Speed 5464.22 samples/sec Loss 7.9058 LearningRate 0.1577 Epoch: 6 Global Step: 64540 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:18:50,620-Speed 5415.47 samples/sec Loss 7.8896 LearningRate 0.1577 Epoch: 6 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:18:58,131-Speed 5453.80 samples/sec Loss 7.8678 LearningRate 0.1577 Epoch: 6 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:19:05,658-Speed 5442.53 samples/sec Loss 7.8774 LearningRate 0.1576 Epoch: 6 Global Step: 64570 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:19:13,138-Speed 5476.76 samples/sec Loss 7.7909 LearningRate 0.1576 Epoch: 6 Global Step: 64580 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:19:20,638-Speed 5461.93 
samples/sec Loss 7.8952 LearningRate 0.1576 Epoch: 6 Global Step: 64590 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-01-08 09:19:28,121-Speed 5474.51 samples/sec Loss 7.9196 LearningRate 0.1576 Epoch: 6 Global Step: 64600 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:19:35,757-Speed 5365.36 samples/sec Loss 7.8012 LearningRate 0.1575 Epoch: 6 Global Step: 64610 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:19:43,309-Speed 5423.83 samples/sec Loss 7.8805 LearningRate 0.1575 Epoch: 6 Global Step: 64620 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:19:50,772-Speed 5489.11 samples/sec Loss 7.9330 LearningRate 0.1575 Epoch: 6 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:19:58,206-Speed 5510.71 samples/sec Loss 7.8776 LearningRate 0.1575 Epoch: 6 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:20:05,704-Speed 5463.96 samples/sec Loss 7.8303 LearningRate 0.1575 Epoch: 6 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:20:13,137-Speed 5511.25 samples/sec Loss 7.8588 LearningRate 0.1574 Epoch: 6 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:20:20,775-Speed 5363.59 samples/sec Loss 7.8889 LearningRate 0.1574 Epoch: 6 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:20:28,279-Speed 5459.09 samples/sec Loss 7.8312 LearningRate 0.1574 Epoch: 6 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:20:35,846-Speed 5413.72 samples/sec Loss 7.9434 LearningRate 0.1574 Epoch: 6 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:20:43,419-Speed 5409.53 samples/sec Loss 7.8588 LearningRate 0.1573 Epoch: 6 Global Step: 64700 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:20:50,924-Speed 5457.97 samples/sec Loss 7.8553 LearningRate 0.1573 Epoch: 6 
Global Step: 64710 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:20:58,508-Speed 5401.51 samples/sec Loss 7.8471 LearningRate 0.1573 Epoch: 6 Global Step: 64720 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:06,167-Speed 5349.19 samples/sec Loss 7.8927 LearningRate 0.1573 Epoch: 6 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:13,849-Speed 5332.49 samples/sec Loss 7.8217 LearningRate 0.1573 Epoch: 6 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:21,405-Speed 5421.75 samples/sec Loss 7.8354 LearningRate 0.1572 Epoch: 6 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:28,973-Speed 5412.70 samples/sec Loss 7.7464 LearningRate 0.1572 Epoch: 6 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:36,548-Speed 5408.41 samples/sec Loss 7.8506 LearningRate 0.1572 Epoch: 6 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:44,065-Speed 5449.69 samples/sec Loss 7.8081 LearningRate 0.1572 Epoch: 6 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:51,680-Speed 5379.58 samples/sec Loss 7.8178 LearningRate 0.1572 Epoch: 6 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:21:59,172-Speed 5468.27 samples/sec Loss 7.8927 LearningRate 0.1571 Epoch: 6 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:22:06,692-Speed 5447.09 samples/sec Loss 7.9223 LearningRate 0.1571 Epoch: 6 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:22:14,215-Speed 5445.59 samples/sec Loss 7.8071 LearningRate 0.1571 Epoch: 6 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:22:21,700-Speed 5472.91 samples/sec Loss 7.8191 LearningRate 0.1571 Epoch: 6 Global Step: 64830 Fp16 Grad Scale: 131072 
Required: 33 hours Training: 2022-01-08 09:22:29,156-Speed 5494.15 samples/sec Loss 7.7915 LearningRate 0.1570 Epoch: 6 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:22:36,687-Speed 5439.35 samples/sec Loss 7.7724 LearningRate 0.1570 Epoch: 6 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:22:44,296-Speed 5384.11 samples/sec Loss 7.9066 LearningRate 0.1570 Epoch: 6 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:22:51,798-Speed 5460.47 samples/sec Loss 7.8049 LearningRate 0.1570 Epoch: 6 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:22:59,436-Speed 5363.27 samples/sec Loss 7.8313 LearningRate 0.1570 Epoch: 6 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:06,947-Speed 5454.26 samples/sec Loss 7.8530 LearningRate 0.1569 Epoch: 6 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:14,503-Speed 5421.76 samples/sec Loss 7.8626 LearningRate 0.1569 Epoch: 6 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:21,992-Speed 5469.29 samples/sec Loss 7.8820 LearningRate 0.1569 Epoch: 6 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:29,495-Speed 5460.47 samples/sec Loss 7.9108 LearningRate 0.1569 Epoch: 6 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:36,915-Speed 5520.83 samples/sec Loss 7.8413 LearningRate 0.1568 Epoch: 6 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:44,445-Speed 5440.53 samples/sec Loss 7.8003 LearningRate 0.1568 Epoch: 6 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:23:51,960-Speed 5450.43 samples/sec Loss 7.8409 LearningRate 0.1568 Epoch: 6 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 
09:23:59,523-Speed 5416.59 samples/sec Loss 7.8362 LearningRate 0.1568 Epoch: 6 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:24:06,999-Speed 5479.90 samples/sec Loss 7.8652 LearningRate 0.1568 Epoch: 6 Global Step: 64970 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:24:14,583-Speed 5401.37 samples/sec Loss 7.8015 LearningRate 0.1567 Epoch: 6 Global Step: 64980 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:24:22,049-Speed 5486.57 samples/sec Loss 7.8038 LearningRate 0.1567 Epoch: 6 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:24:29,663-Speed 5380.46 samples/sec Loss 7.8410 LearningRate 0.1567 Epoch: 6 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:24:37,311-Speed 5356.37 samples/sec Loss 7.8284 LearningRate 0.1567 Epoch: 6 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:24:44,951-Speed 5362.04 samples/sec Loss 7.8515 LearningRate 0.1566 Epoch: 6 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:24:52,624-Speed 5338.79 samples/sec Loss 7.8236 LearningRate 0.1566 Epoch: 6 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:25:00,128-Speed 5458.88 samples/sec Loss 7.9028 LearningRate 0.1566 Epoch: 6 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:25:07,900-Speed 5271.66 samples/sec Loss 7.8071 LearningRate 0.1566 Epoch: 6 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:25:15,495-Speed 5393.78 samples/sec Loss 7.7859 LearningRate 0.1566 Epoch: 6 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:25:23,103-Speed 5383.72 samples/sec Loss 7.7810 LearningRate 0.1565 Epoch: 6 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:25:30,693-Speed 5397.91 samples/sec Loss 7.9008 
LearningRate 0.1565 Epoch: 6 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:25:38,221-Speed 5441.61 samples/sec Loss 7.7970 LearningRate 0.1565 Epoch: 6 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:25:45,677-Speed 5494.78 samples/sec Loss 7.7788 LearningRate 0.1565 Epoch: 6 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:25:53,233-Speed 5421.21 samples/sec Loss 7.7873 LearningRate 0.1564 Epoch: 6 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:26:00,749-Speed 5450.50 samples/sec Loss 7.7574 LearningRate 0.1564 Epoch: 6 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:26:08,294-Speed 5429.58 samples/sec Loss 7.7661 LearningRate 0.1564 Epoch: 6 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:26:15,942-Speed 5356.30 samples/sec Loss 7.7833 LearningRate 0.1564 Epoch: 6 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:26:23,483-Speed 5431.93 samples/sec Loss 7.8060 LearningRate 0.1564 Epoch: 6 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:26:30,924-Speed 5505.32 samples/sec Loss 7.8223 LearningRate 0.1563 Epoch: 6 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:26:38,443-Speed 5448.90 samples/sec Loss 7.8853 LearningRate 0.1563 Epoch: 6 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:26:45,954-Speed 5453.85 samples/sec Loss 7.8414 LearningRate 0.1563 Epoch: 6 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:26:53,439-Speed 5472.47 samples/sec Loss 7.8692 LearningRate 0.1563 Epoch: 6 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:27:00,915-Speed 5479.72 samples/sec Loss 7.8562 LearningRate 0.1562 Epoch: 6 Global Step: 65200 
Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:27:08,505-Speed 5397.38 samples/sec Loss 7.7906 LearningRate 0.1562 Epoch: 6 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:27:16,032-Speed 5442.74 samples/sec Loss 7.8994 LearningRate 0.1562 Epoch: 6 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:27:23,647-Speed 5379.29 samples/sec Loss 7.8528 LearningRate 0.1562 Epoch: 6 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-01-08 09:27:31,315-Speed 5342.16 samples/sec Loss 7.7697 LearningRate 0.1562 Epoch: 6 Global Step: 65240 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:27:38,895-Speed 5404.57 samples/sec Loss 7.8052 LearningRate 0.1561 Epoch: 6 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:27:46,458-Speed 5416.56 samples/sec Loss 7.7999 LearningRate 0.1561 Epoch: 6 Global Step: 65260 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:27:54,034-Speed 5407.06 samples/sec Loss 7.6978 LearningRate 0.1561 Epoch: 6 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:01,568-Speed 5437.80 samples/sec Loss 7.8548 LearningRate 0.1561 Epoch: 6 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:09,113-Speed 5429.78 samples/sec Loss 7.7998 LearningRate 0.1561 Epoch: 6 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:16,712-Speed 5390.90 samples/sec Loss 7.7996 LearningRate 0.1560 Epoch: 6 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:24,213-Speed 5461.61 samples/sec Loss 7.8146 LearningRate 0.1560 Epoch: 6 Global Step: 65310 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:31,958-Speed 5289.23 samples/sec Loss 7.8285 LearningRate 0.1560 Epoch: 6 Global Step: 65320 Fp16 Grad Scale: 131072 Required: 33 hours 
Training: 2022-01-08 09:28:39,401-Speed 5503.87 samples/sec Loss 7.7754 LearningRate 0.1560 Epoch: 6 Global Step: 65330 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:46,996-Speed 5393.34 samples/sec Loss 7.8213 LearningRate 0.1559 Epoch: 6 Global Step: 65340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:28:54,537-Speed 5432.58 samples/sec Loss 7.8992 LearningRate 0.1559 Epoch: 6 Global Step: 65350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:02,241-Speed 5317.95 samples/sec Loss 7.8560 LearningRate 0.1559 Epoch: 6 Global Step: 65360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:09,767-Speed 5443.09 samples/sec Loss 7.8503 LearningRate 0.1559 Epoch: 6 Global Step: 65370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:17,354-Speed 5398.96 samples/sec Loss 7.8628 LearningRate 0.1559 Epoch: 6 Global Step: 65380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:24,905-Speed 5425.19 samples/sec Loss 7.8150 LearningRate 0.1558 Epoch: 6 Global Step: 65390 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:32,385-Speed 5477.19 samples/sec Loss 7.7939 LearningRate 0.1558 Epoch: 6 Global Step: 65400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:39,908-Speed 5445.24 samples/sec Loss 7.8071 LearningRate 0.1558 Epoch: 6 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-01-08 09:29:47,413-Speed 5458.23 samples/sec Loss 7.8201 LearningRate 0.1558 Epoch: 6 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:29:54,932-Speed 5448.28 samples/sec Loss 7.7877 LearningRate 0.1557 Epoch: 6 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:30:02,413-Speed 5476.26 samples/sec Loss 7.7767 LearningRate 0.1557 Epoch: 6 Global Step: 65440 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 09:30:09,824-Speed 
5527.84 samples/sec Loss 7.7595 LearningRate 0.1557 Epoch: 6 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:30:17,284-Speed 5491.16 samples/sec Loss 7.8122 LearningRate 0.1557 Epoch: 6 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:30:24,734-Speed 5499.04 samples/sec Loss 7.7934 LearningRate 0.1557 Epoch: 6 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:30:32,284-Speed 5425.23 samples/sec Loss 7.8492 LearningRate 0.1556 Epoch: 6 Global Step: 65480 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:30:39,841-Speed 5421.31 samples/sec Loss 7.8227 LearningRate 0.1556 Epoch: 6 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:30:47,392-Speed 5425.15 samples/sec Loss 7.7401 LearningRate 0.1556 Epoch: 6 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:30:54,907-Speed 5451.24 samples/sec Loss 7.7240 LearningRate 0.1556 Epoch: 6 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:02,503-Speed 5392.78 samples/sec Loss 7.8094 LearningRate 0.1555 Epoch: 6 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:09,980-Speed 5479.20 samples/sec Loss 7.7503 LearningRate 0.1555 Epoch: 6 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:17,512-Speed 5438.53 samples/sec Loss 7.7317 LearningRate 0.1555 Epoch: 6 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:24,975-Speed 5489.16 samples/sec Loss 7.8345 LearningRate 0.1555 Epoch: 6 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:32,433-Speed 5493.09 samples/sec Loss 7.7275 LearningRate 0.1555 Epoch: 6 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:39,985-Speed 5424.27 samples/sec Loss 7.7964 LearningRate 0.1554 
Epoch: 6 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:47,492-Speed 5457.43 samples/sec Loss 7.7767 LearningRate 0.1554 Epoch: 6 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:31:55,064-Speed 5409.65 samples/sec Loss 7.8027 LearningRate 0.1554 Epoch: 6 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:32:02,497-Speed 5511.76 samples/sec Loss 7.8045 LearningRate 0.1554 Epoch: 6 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:32:09,947-Speed 5498.30 samples/sec Loss 7.8082 LearningRate 0.1553 Epoch: 6 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:32:17,426-Speed 5477.26 samples/sec Loss 7.7177 LearningRate 0.1553 Epoch: 6 Global Step: 65620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:32:24,862-Speed 5509.44 samples/sec Loss 7.8017 LearningRate 0.1553 Epoch: 6 Global Step: 65630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:32:32,275-Speed 5525.82 samples/sec Loss 7.8107 LearningRate 0.1553 Epoch: 6 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:32:39,763-Speed 5470.84 samples/sec Loss 7.7096 LearningRate 0.1553 Epoch: 6 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:32:47,237-Speed 5481.27 samples/sec Loss 7.8199 LearningRate 0.1552 Epoch: 6 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:32:54,658-Speed 5520.56 samples/sec Loss 7.8179 LearningRate 0.1552 Epoch: 6 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:33:02,195-Speed 5435.48 samples/sec Loss 7.7722 LearningRate 0.1552 Epoch: 6 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:33:09,633-Speed 5506.95 samples/sec Loss 7.6942 LearningRate 0.1552 Epoch: 6 Global Step: 65690 Fp16 Grad Scale: 65536 
Required: 32 hours Training: 2022-01-08 09:33:17,199-Speed 5414.47 samples/sec Loss 7.7528 LearningRate 0.1552 Epoch: 6 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:33:24,682-Speed 5474.81 samples/sec Loss 7.7665 LearningRate 0.1551 Epoch: 6 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:33:32,183-Speed 5461.45 samples/sec Loss 7.7200 LearningRate 0.1551 Epoch: 6 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:33:39,636-Speed 5496.55 samples/sec Loss 7.7211 LearningRate 0.1551 Epoch: 6 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:33:47,075-Speed 5507.07 samples/sec Loss 7.7395 LearningRate 0.1551 Epoch: 6 Global Step: 65740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:33:54,509-Speed 5510.52 samples/sec Loss 7.7485 LearningRate 0.1550 Epoch: 6 Global Step: 65750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:02,004-Speed 5466.02 samples/sec Loss 7.7377 LearningRate 0.1550 Epoch: 6 Global Step: 65760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:09,511-Speed 5456.68 samples/sec Loss 7.7735 LearningRate 0.1550 Epoch: 6 Global Step: 65770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:16,958-Speed 5500.88 samples/sec Loss 7.7578 LearningRate 0.1550 Epoch: 6 Global Step: 65780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:24,430-Speed 5483.21 samples/sec Loss 7.6833 LearningRate 0.1550 Epoch: 6 Global Step: 65790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:32,005-Speed 5407.64 samples/sec Loss 7.7424 LearningRate 0.1549 Epoch: 6 Global Step: 65800 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:39,439-Speed 5510.41 samples/sec Loss 7.8149 LearningRate 0.1549 Epoch: 6 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 
09:34:46,961-Speed 5446.15 samples/sec Loss 7.8853 LearningRate 0.1549 Epoch: 6 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:34:54,420-Speed 5492.60 samples/sec Loss 7.7828 LearningRate 0.1549 Epoch: 6 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:35:01,883-Speed 5489.08 samples/sec Loss 7.7614 LearningRate 0.1548 Epoch: 6 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:35:09,377-Speed 5466.24 samples/sec Loss 7.8589 LearningRate 0.1548 Epoch: 6 Global Step: 65850 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:35:17,137-Speed 5279.56 samples/sec Loss 7.7755 LearningRate 0.1548 Epoch: 6 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:35:24,734-Speed 5391.98 samples/sec Loss 7.8055 LearningRate 0.1548 Epoch: 6 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:35:32,349-Speed 5379.80 samples/sec Loss 7.7866 LearningRate 0.1548 Epoch: 6 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:35:39,871-Speed 5446.36 samples/sec Loss 7.8506 LearningRate 0.1547 Epoch: 6 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:35:47,459-Speed 5398.78 samples/sec Loss 7.7309 LearningRate 0.1547 Epoch: 6 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:35:55,044-Speed 5400.83 samples/sec Loss 7.7185 LearningRate 0.1547 Epoch: 6 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:02,628-Speed 5401.60 samples/sec Loss 7.7136 LearningRate 0.1547 Epoch: 6 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:10,190-Speed 5416.98 samples/sec Loss 7.7691 LearningRate 0.1546 Epoch: 6 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:17,838-Speed 5356.35 samples/sec Loss 
7.7045 LearningRate 0.1546 Epoch: 6 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:25,429-Speed 5396.54 samples/sec Loss 7.8010 LearningRate 0.1546 Epoch: 6 Global Step: 65950 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:33,224-Speed 5255.81 samples/sec Loss 7.7581 LearningRate 0.1546 Epoch: 6 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:40,744-Speed 5446.80 samples/sec Loss 7.8232 LearningRate 0.1546 Epoch: 6 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:48,276-Speed 5439.17 samples/sec Loss 7.7529 LearningRate 0.1545 Epoch: 6 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:36:55,744-Speed 5486.06 samples/sec Loss 7.8048 LearningRate 0.1545 Epoch: 6 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:37:03,261-Speed 5449.57 samples/sec Loss 7.7845 LearningRate 0.1545 Epoch: 6 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:38:01,667-[lfw][66000]XNorm: 23.302244 Training: 2022-01-08 09:38:01,667-[lfw][66000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-08 09:38:01,668-[lfw][66000]Accuracy-Highest: 0.99817 Training: 2022-01-08 09:39:07,024-[cfp_fp][66000]XNorm: 21.406650 Training: 2022-01-08 09:39:07,025-[cfp_fp][66000]Accuracy-Flip: 0.98229+-0.00434 Training: 2022-01-08 09:39:07,026-[cfp_fp][66000]Accuracy-Highest: 0.98600 Training: 2022-01-08 09:39:52,822-[agedb_30][66000]XNorm: 23.238662 Training: 2022-01-08 09:39:52,823-[agedb_30][66000]Accuracy-Flip: 0.97367+-0.00741 Training: 2022-01-08 09:39:52,824-[agedb_30][66000]Accuracy-Highest: 0.97667 Training: 2022-01-08 09:40:00,427-Speed 231.20 samples/sec Loss 7.7415 LearningRate 0.1545 Epoch: 6 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:40:08,034-Speed 5385.66 samples/sec Loss 7.8329 LearningRate 0.1545 Epoch: 6 
Global Step: 66020 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:40:15,569-Speed 5437.81 samples/sec Loss 7.7801 LearningRate 0.1544 Epoch: 6 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:40:23,173-Speed 5387.87 samples/sec Loss 7.7460 LearningRate 0.1544 Epoch: 6 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:40:30,684-Speed 5454.69 samples/sec Loss 7.6985 LearningRate 0.1544 Epoch: 6 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:40:38,362-Speed 5335.42 samples/sec Loss 7.7827 LearningRate 0.1544 Epoch: 6 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:40:45,909-Speed 5428.66 samples/sec Loss 7.7371 LearningRate 0.1543 Epoch: 6 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:40:53,581-Speed 5339.27 samples/sec Loss 7.7144 LearningRate 0.1543 Epoch: 6 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:41:01,055-Speed 5481.46 samples/sec Loss 7.6908 LearningRate 0.1543 Epoch: 6 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:41:08,531-Speed 5479.07 samples/sec Loss 7.7230 LearningRate 0.1543 Epoch: 6 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:41:15,980-Speed 5499.42 samples/sec Loss 7.7602 LearningRate 0.1543 Epoch: 6 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:41:23,487-Speed 5457.12 samples/sec Loss 7.8028 LearningRate 0.1542 Epoch: 6 Global Step: 66120 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:41:30,999-Speed 5453.70 samples/sec Loss 7.7764 LearningRate 0.1542 Epoch: 6 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:41:38,508-Speed 5455.03 samples/sec Loss 7.7725 LearningRate 0.1542 Epoch: 6 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 
32 hours Training: 2022-01-08 09:41:45,930-Speed 5519.42 samples/sec Loss 7.7913 LearningRate 0.1542 Epoch: 6 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:41:53,362-Speed 5512.05 samples/sec Loss 7.7476 LearningRate 0.1541 Epoch: 6 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:42:00,777-Speed 5524.81 samples/sec Loss 7.7600 LearningRate 0.1541 Epoch: 6 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:42:08,296-Speed 5448.19 samples/sec Loss 7.7138 LearningRate 0.1541 Epoch: 6 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:42:15,807-Speed 5453.77 samples/sec Loss 7.7142 LearningRate 0.1541 Epoch: 6 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:42:23,278-Speed 5483.87 samples/sec Loss 7.7695 LearningRate 0.1541 Epoch: 6 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:42:30,826-Speed 5427.11 samples/sec Loss 7.7436 LearningRate 0.1540 Epoch: 6 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:42:38,329-Speed 5459.76 samples/sec Loss 7.7536 LearningRate 0.1540 Epoch: 6 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:42:45,827-Speed 5463.30 samples/sec Loss 7.7636 LearningRate 0.1540 Epoch: 6 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:42:53,339-Speed 5453.68 samples/sec Loss 7.7963 LearningRate 0.1540 Epoch: 6 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:43:00,918-Speed 5405.23 samples/sec Loss 7.7694 LearningRate 0.1539 Epoch: 6 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:43:08,498-Speed 5404.56 samples/sec Loss 7.7561 LearningRate 0.1539 Epoch: 6 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:43:16,173-Speed 
5336.89 samples/sec Loss 7.7158 LearningRate 0.1539 Epoch: 6 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:43:23,794-Speed 5376.01 samples/sec Loss 7.7390 LearningRate 0.1539 Epoch: 6 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:43:31,368-Speed 5408.48 samples/sec Loss 7.7708 LearningRate 0.1539 Epoch: 6 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:43:38,927-Speed 5419.15 samples/sec Loss 7.7382 LearningRate 0.1538 Epoch: 6 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:43:46,496-Speed 5412.29 samples/sec Loss 7.7765 LearningRate 0.1538 Epoch: 6 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:43:54,074-Speed 5405.82 samples/sec Loss 7.7676 LearningRate 0.1538 Epoch: 6 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:01,590-Speed 5450.56 samples/sec Loss 7.7704 LearningRate 0.1538 Epoch: 6 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:09,146-Speed 5421.17 samples/sec Loss 7.7482 LearningRate 0.1538 Epoch: 6 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:16,610-Speed 5489.09 samples/sec Loss 7.7471 LearningRate 0.1537 Epoch: 6 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:24,181-Speed 5410.52 samples/sec Loss 7.7693 LearningRate 0.1537 Epoch: 6 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:31,784-Speed 5388.42 samples/sec Loss 7.7640 LearningRate 0.1537 Epoch: 6 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:39,468-Speed 5331.42 samples/sec Loss 7.7963 LearningRate 0.1537 Epoch: 6 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:46,949-Speed 5475.39 samples/sec Loss 7.7047 LearningRate 
0.1536 Epoch: 6 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:44:54,545-Speed 5393.39 samples/sec Loss 7.7182 LearningRate 0.1536 Epoch: 6 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:02,112-Speed 5414.18 samples/sec Loss 7.8166 LearningRate 0.1536 Epoch: 6 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:09,646-Speed 5437.33 samples/sec Loss 7.7345 LearningRate 0.1536 Epoch: 6 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:17,222-Speed 5406.93 samples/sec Loss 7.7226 LearningRate 0.1536 Epoch: 6 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:24,723-Speed 5461.85 samples/sec Loss 7.7192 LearningRate 0.1535 Epoch: 6 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:32,206-Speed 5474.34 samples/sec Loss 7.7093 LearningRate 0.1535 Epoch: 6 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:39,639-Speed 5510.93 samples/sec Loss 7.7317 LearningRate 0.1535 Epoch: 6 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:47,132-Speed 5467.28 samples/sec Loss 7.7041 LearningRate 0.1535 Epoch: 6 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:45:54,593-Speed 5490.79 samples/sec Loss 7.6654 LearningRate 0.1534 Epoch: 6 Global Step: 66480 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 09:46:02,035-Speed 5504.90 samples/sec Loss 7.6917 LearningRate 0.1534 Epoch: 6 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:46:09,551-Speed 5450.38 samples/sec Loss 7.7446 LearningRate 0.1534 Epoch: 6 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:46:17,112-Speed 5418.14 samples/sec Loss 7.7107 LearningRate 0.1534 Epoch: 6 Global Step: 66510 Fp16 
Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:46:24,674-Speed 5417.15 samples/sec Loss 7.7326 LearningRate 0.1534 Epoch: 6 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:46:32,174-Speed 5462.28 samples/sec Loss 7.7901 LearningRate 0.1533 Epoch: 6 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:46:39,760-Speed 5399.91 samples/sec Loss 7.6926 LearningRate 0.1533 Epoch: 6 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:46:47,224-Speed 5489.12 samples/sec Loss 7.6364 LearningRate 0.1533 Epoch: 6 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:46:54,789-Speed 5414.75 samples/sec Loss 7.7513 LearningRate 0.1533 Epoch: 6 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:47:02,344-Speed 5422.30 samples/sec Loss 7.7716 LearningRate 0.1533 Epoch: 6 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:47:09,825-Speed 5476.31 samples/sec Loss 7.7221 LearningRate 0.1532 Epoch: 6 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:47:17,373-Speed 5426.90 samples/sec Loss 7.7091 LearningRate 0.1532 Epoch: 6 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:47:24,942-Speed 5411.99 samples/sec Loss 7.7799 LearningRate 0.1532 Epoch: 6 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:47:32,477-Speed 5436.87 samples/sec Loss 7.6920 LearningRate 0.1532 Epoch: 6 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:47:40,033-Speed 5421.82 samples/sec Loss 7.7584 LearningRate 0.1531 Epoch: 6 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:47:47,582-Speed 5426.42 samples/sec Loss 7.7467 LearningRate 0.1531 Epoch: 6 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 
2022-01-08 09:47:55,230-Speed 5356.24 samples/sec Loss 7.7325 LearningRate 0.1531 Epoch: 6 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:02,809-Speed 5405.20 samples/sec Loss 7.7073 LearningRate 0.1531 Epoch: 6 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:10,359-Speed 5425.80 samples/sec Loss 7.7299 LearningRate 0.1531 Epoch: 6 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:17,937-Speed 5405.41 samples/sec Loss 7.7219 LearningRate 0.1530 Epoch: 6 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:25,397-Speed 5491.39 samples/sec Loss 7.7144 LearningRate 0.1530 Epoch: 6 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:32,970-Speed 5409.82 samples/sec Loss 7.7393 LearningRate 0.1530 Epoch: 6 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:40,546-Speed 5407.35 samples/sec Loss 7.7509 LearningRate 0.1530 Epoch: 6 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:48:48,071-Speed 5444.03 samples/sec Loss 7.6848 LearningRate 0.1529 Epoch: 6 Global Step: 66710 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 09:48:55,759-Speed 5328.51 samples/sec Loss 7.7250 LearningRate 0.1529 Epoch: 6 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:49:03,409-Speed 5354.47 samples/sec Loss 7.7423 LearningRate 0.1529 Epoch: 6 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:10,967-Speed 5420.81 samples/sec Loss 7.7471 LearningRate 0.1529 Epoch: 6 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:18,455-Speed 5470.77 samples/sec Loss 7.7834 LearningRate 0.1529 Epoch: 6 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:25,939-Speed 5473.37 
samples/sec Loss 7.7515 LearningRate 0.1528 Epoch: 6 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:33,522-Speed 5402.35 samples/sec Loss 7.7381 LearningRate 0.1528 Epoch: 6 Global Step: 66770 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:41,056-Speed 5437.47 samples/sec Loss 7.7163 LearningRate 0.1528 Epoch: 6 Global Step: 66780 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:48,773-Speed 5308.29 samples/sec Loss 7.7829 LearningRate 0.1528 Epoch: 6 Global Step: 66790 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:49:56,315-Speed 5431.75 samples/sec Loss 7.7619 LearningRate 0.1528 Epoch: 6 Global Step: 66800 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:50:03,829-Speed 5451.97 samples/sec Loss 7.6746 LearningRate 0.1527 Epoch: 6 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:50:11,355-Speed 5443.48 samples/sec Loss 7.7468 LearningRate 0.1527 Epoch: 6 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:50:18,902-Speed 5427.58 samples/sec Loss 7.7341 LearningRate 0.1527 Epoch: 6 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:50:26,509-Speed 5385.00 samples/sec Loss 7.7033 LearningRate 0.1527 Epoch: 6 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:50:34,088-Speed 5405.84 samples/sec Loss 7.6481 LearningRate 0.1526 Epoch: 6 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:50:41,810-Speed 5304.74 samples/sec Loss 7.6913 LearningRate 0.1526 Epoch: 6 Global Step: 66860 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:50:49,292-Speed 5475.26 samples/sec Loss 7.7153 LearningRate 0.1526 Epoch: 6 Global Step: 66870 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:50:56,814-Speed 5445.95 samples/sec Loss 7.7096 LearningRate 0.1526 Epoch: 
6 Global Step: 66880 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:51:04,314-Speed 5461.72 samples/sec Loss 7.6957 LearningRate 0.1526 Epoch: 6 Global Step: 66890 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:51:11,848-Speed 5437.81 samples/sec Loss 7.7234 LearningRate 0.1525 Epoch: 6 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:51:19,350-Speed 5460.02 samples/sec Loss 7.7147 LearningRate 0.1525 Epoch: 6 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:51:26,762-Speed 5526.85 samples/sec Loss 7.7084 LearningRate 0.1525 Epoch: 6 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:51:34,337-Speed 5408.57 samples/sec Loss 7.7073 LearningRate 0.1525 Epoch: 6 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:51:41,832-Speed 5465.48 samples/sec Loss 7.7416 LearningRate 0.1524 Epoch: 6 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:51:49,271-Speed 5506.30 samples/sec Loss 7.5818 LearningRate 0.1524 Epoch: 6 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:51:56,751-Speed 5477.25 samples/sec Loss 7.7247 LearningRate 0.1524 Epoch: 6 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:52:04,278-Speed 5442.25 samples/sec Loss 7.6513 LearningRate 0.1524 Epoch: 6 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:52:11,810-Speed 5439.04 samples/sec Loss 7.7795 LearningRate 0.1524 Epoch: 6 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:52:19,350-Speed 5432.37 samples/sec Loss 7.7025 LearningRate 0.1523 Epoch: 6 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:52:26,950-Speed 5390.71 samples/sec Loss 7.7415 LearningRate 0.1523 Epoch: 6 Global Step: 67000 Fp16 Grad Scale: 65536 
Required: 32 hours Training: 2022-01-08 09:52:34,512-Speed 5416.92 samples/sec Loss 7.7245 LearningRate 0.1523 Epoch: 6 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:52:42,058-Speed 5428.88 samples/sec Loss 7.8421 LearningRate 0.1523 Epoch: 6 Global Step: 67020 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:52:49,514-Speed 5493.86 samples/sec Loss 7.6670 LearningRate 0.1523 Epoch: 6 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:52:57,051-Speed 5435.77 samples/sec Loss 7.7119 LearningRate 0.1522 Epoch: 6 Global Step: 67040 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:53:04,675-Speed 5373.13 samples/sec Loss 7.7313 LearningRate 0.1522 Epoch: 6 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:12,281-Speed 5385.94 samples/sec Loss 7.6921 LearningRate 0.1522 Epoch: 6 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:19,775-Speed 5465.98 samples/sec Loss 7.7245 LearningRate 0.1522 Epoch: 6 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:27,230-Speed 5495.45 samples/sec Loss 7.7332 LearningRate 0.1521 Epoch: 6 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:34,791-Speed 5417.82 samples/sec Loss 7.7118 LearningRate 0.1521 Epoch: 6 Global Step: 67090 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:42,298-Speed 5457.31 samples/sec Loss 7.7158 LearningRate 0.1521 Epoch: 6 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:49,930-Speed 5367.51 samples/sec Loss 7.6841 LearningRate 0.1521 Epoch: 6 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:53:57,537-Speed 5385.41 samples/sec Loss 7.7001 LearningRate 0.1521 Epoch: 6 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 
09:54:05,090-Speed 5423.63 samples/sec Loss 7.6912 LearningRate 0.1520 Epoch: 6 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:54:12,527-Speed 5508.89 samples/sec Loss 7.6489 LearningRate 0.1520 Epoch: 6 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:54:20,005-Speed 5477.69 samples/sec Loss 7.7301 LearningRate 0.1520 Epoch: 6 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:54:27,476-Speed 5483.50 samples/sec Loss 7.7162 LearningRate 0.1520 Epoch: 6 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:54:34,956-Speed 5476.43 samples/sec Loss 7.7414 LearningRate 0.1519 Epoch: 6 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:54:42,527-Speed 5410.68 samples/sec Loss 7.6942 LearningRate 0.1519 Epoch: 6 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:54:50,308-Speed 5264.72 samples/sec Loss 7.6907 LearningRate 0.1519 Epoch: 6 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:54:57,802-Speed 5466.92 samples/sec Loss 7.7393 LearningRate 0.1519 Epoch: 6 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:05,367-Speed 5414.98 samples/sec Loss 7.7025 LearningRate 0.1519 Epoch: 6 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:12,908-Speed 5432.73 samples/sec Loss 7.7412 LearningRate 0.1518 Epoch: 6 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:20,460-Speed 5424.44 samples/sec Loss 7.7062 LearningRate 0.1518 Epoch: 6 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:27,950-Speed 5469.32 samples/sec Loss 7.6522 LearningRate 0.1518 Epoch: 6 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:35,541-Speed 5396.37 samples/sec Loss 7.6836 
LearningRate 0.1518 Epoch: 6 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:43,272-Speed 5298.98 samples/sec Loss 7.6784 LearningRate 0.1518 Epoch: 6 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:55:50,770-Speed 5463.89 samples/sec Loss 7.7201 LearningRate 0.1517 Epoch: 6 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:55:58,271-Speed 5461.01 samples/sec Loss 7.7329 LearningRate 0.1517 Epoch: 6 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:56:05,826-Speed 5422.70 samples/sec Loss 7.7358 LearningRate 0.1517 Epoch: 6 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:56:13,417-Speed 5396.01 samples/sec Loss 7.7358 LearningRate 0.1517 Epoch: 6 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:56:21,010-Speed 5395.59 samples/sec Loss 7.7452 LearningRate 0.1516 Epoch: 6 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:56:28,527-Speed 5449.97 samples/sec Loss 7.7421 LearningRate 0.1516 Epoch: 6 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:56:35,985-Speed 5492.57 samples/sec Loss 7.7223 LearningRate 0.1516 Epoch: 6 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:56:43,485-Speed 5462.68 samples/sec Loss 7.6236 LearningRate 0.1516 Epoch: 6 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:56:51,065-Speed 5404.02 samples/sec Loss 7.7511 LearningRate 0.1516 Epoch: 6 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:56:58,503-Speed 5507.51 samples/sec Loss 7.7388 LearningRate 0.1515 Epoch: 6 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:05,957-Speed 5495.69 samples/sec Loss 7.6900 LearningRate 0.1515 Epoch: 6 Global Step: 67370 
Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:13,485-Speed 5442.65 samples/sec Loss 7.6767 LearningRate 0.1515 Epoch: 6 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:21,243-Speed 5279.86 samples/sec Loss 7.6359 LearningRate 0.1515 Epoch: 6 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:29,014-Speed 5271.84 samples/sec Loss 7.6627 LearningRate 0.1515 Epoch: 6 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:36,586-Speed 5410.29 samples/sec Loss 7.6691 LearningRate 0.1514 Epoch: 6 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:57:44,097-Speed 5454.81 samples/sec Loss 7.6915 LearningRate 0.1514 Epoch: 6 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:51,631-Speed 5437.02 samples/sec Loss 7.6423 LearningRate 0.1514 Epoch: 6 Global Step: 67430 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:57:59,153-Speed 5445.68 samples/sec Loss 7.7244 LearningRate 0.1514 Epoch: 6 Global Step: 67440 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:06,677-Speed 5444.71 samples/sec Loss 7.7379 LearningRate 0.1513 Epoch: 6 Global Step: 67450 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:14,147-Speed 5484.28 samples/sec Loss 7.7236 LearningRate 0.1513 Epoch: 6 Global Step: 67460 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:21,603-Speed 5494.01 samples/sec Loss 7.6912 LearningRate 0.1513 Epoch: 6 Global Step: 67470 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:29,153-Speed 5426.19 samples/sec Loss 7.6434 LearningRate 0.1513 Epoch: 6 Global Step: 67480 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:36,695-Speed 5431.45 samples/sec Loss 7.6499 LearningRate 0.1513 Epoch: 6 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 32 hours Training: 
2022-01-08 09:58:44,127-Speed 5512.47 samples/sec Loss 7.6599 LearningRate 0.1512 Epoch: 6 Global Step: 67500 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:51,635-Speed 5456.42 samples/sec Loss 7.6490 LearningRate 0.1512 Epoch: 6 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:58:59,319-Speed 5330.73 samples/sec Loss 7.6867 LearningRate 0.1512 Epoch: 6 Global Step: 67520 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:59:06,819-Speed 5462.29 samples/sec Loss 7.7161 LearningRate 0.1512 Epoch: 6 Global Step: 67530 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 09:59:14,312-Speed 5467.21 samples/sec Loss 7.6868 LearningRate 0.1511 Epoch: 6 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:59:21,788-Speed 5479.37 samples/sec Loss 7.7082 LearningRate 0.1511 Epoch: 6 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:59:29,320-Speed 5439.40 samples/sec Loss 7.6989 LearningRate 0.1511 Epoch: 6 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:59:36,779-Speed 5491.86 samples/sec Loss 7.6838 LearningRate 0.1511 Epoch: 6 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:59:44,365-Speed 5399.92 samples/sec Loss 7.6439 LearningRate 0.1511 Epoch: 6 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:59:51,799-Speed 5510.61 samples/sec Loss 7.6800 LearningRate 0.1510 Epoch: 6 Global Step: 67590 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 09:59:59,347-Speed 5427.42 samples/sec Loss 7.7289 LearningRate 0.1510 Epoch: 6 Global Step: 67600 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:00:06,931-Speed 5401.21 samples/sec Loss 7.6702 LearningRate 0.1510 Epoch: 6 Global Step: 67610 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:00:14,398-Speed 5486.38 samples/sec 
Loss 7.6241 LearningRate 0.1510 Epoch: 6 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:00:21,983-Speed 5400.84 samples/sec Loss 7.6175 LearningRate 0.1510 Epoch: 6 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:00:29,457-Speed 5481.04 samples/sec Loss 7.6389 LearningRate 0.1509 Epoch: 6 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:00:36,978-Speed 5446.97 samples/sec Loss 7.5974 LearningRate 0.1509 Epoch: 6 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:00:44,442-Speed 5488.78 samples/sec Loss 7.6121 LearningRate 0.1509 Epoch: 6 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:00:52,100-Speed 5349.21 samples/sec Loss 7.6615 LearningRate 0.1509 Epoch: 6 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:00:59,614-Speed 5451.98 samples/sec Loss 7.6419 LearningRate 0.1508 Epoch: 6 Global Step: 67680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:01:07,068-Speed 5495.10 samples/sec Loss 7.7262 LearningRate 0.1508 Epoch: 6 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:14,528-Speed 5491.36 samples/sec Loss 7.6796 LearningRate 0.1508 Epoch: 6 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:22,063-Speed 5437.61 samples/sec Loss 7.6778 LearningRate 0.1508 Epoch: 6 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:29,704-Speed 5360.73 samples/sec Loss 7.6423 LearningRate 0.1508 Epoch: 6 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:37,287-Speed 5401.84 samples/sec Loss 7.7165 LearningRate 0.1507 Epoch: 6 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:44,850-Speed 5416.77 samples/sec Loss 7.6535 LearningRate 0.1507 Epoch: 6 Global 
Step: 67740 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:52,386-Speed 5436.32 samples/sec Loss 7.7501 LearningRate 0.1507 Epoch: 6 Global Step: 67750 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:01:59,921-Speed 5436.37 samples/sec Loss 7.6729 LearningRate 0.1507 Epoch: 6 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:02:07,437-Speed 5450.63 samples/sec Loss 7.6073 LearningRate 0.1507 Epoch: 6 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:02:14,981-Speed 5429.44 samples/sec Loss 7.6233 LearningRate 0.1506 Epoch: 6 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:02:22,481-Speed 5462.97 samples/sec Loss 7.6233 LearningRate 0.1506 Epoch: 6 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:02:29,997-Speed 5450.47 samples/sec Loss 7.7003 LearningRate 0.1506 Epoch: 6 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:02:37,586-Speed 5397.08 samples/sec Loss 7.7044 LearningRate 0.1506 Epoch: 6 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:02:45,074-Speed 5471.37 samples/sec Loss 7.7337 LearningRate 0.1505 Epoch: 6 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:02:52,757-Speed 5331.92 samples/sec Loss 7.6861 LearningRate 0.1505 Epoch: 6 Global Step: 67830 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:03:00,381-Speed 5373.68 samples/sec Loss 7.6756 LearningRate 0.1505 Epoch: 6 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:03:08,149-Speed 5273.36 samples/sec Loss 7.6642 LearningRate 0.1505 Epoch: 6 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:03:15,769-Speed 5375.92 samples/sec Loss 7.6592 LearningRate 0.1505 Epoch: 6 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 32 hours 
Training: 2022-01-08 10:03:23,387-Speed 5377.44 samples/sec Loss 7.6856 LearningRate 0.1504 Epoch: 6 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:03:31,082-Speed 5323.59 samples/sec Loss 7.6354 LearningRate 0.1504 Epoch: 6 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:03:38,703-Speed 5375.78 samples/sec Loss 7.6888 LearningRate 0.1504 Epoch: 6 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:03:46,191-Speed 5470.62 samples/sec Loss 7.6533 LearningRate 0.1504 Epoch: 6 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:03:53,651-Speed 5491.47 samples/sec Loss 7.6330 LearningRate 0.1503 Epoch: 6 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:04:01,186-Speed 5437.24 samples/sec Loss 7.6387 LearningRate 0.1503 Epoch: 6 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:08,677-Speed 5468.04 samples/sec Loss 7.6395 LearningRate 0.1503 Epoch: 6 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:16,195-Speed 5448.87 samples/sec Loss 7.7330 LearningRate 0.1503 Epoch: 6 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:23,803-Speed 5384.88 samples/sec Loss 7.6588 LearningRate 0.1503 Epoch: 6 Global Step: 67950 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:31,372-Speed 5412.76 samples/sec Loss 7.6920 LearningRate 0.1502 Epoch: 6 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:38,844-Speed 5481.77 samples/sec Loss 7.6785 LearningRate 0.1502 Epoch: 6 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:46,408-Speed 5416.32 samples/sec Loss 7.6326 LearningRate 0.1502 Epoch: 6 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:04:53,907-Speed 5464.43 
samples/sec Loss 7.6793 LearningRate 0.1502 Epoch: 6 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:05:01,435-Speed 5442.22 samples/sec Loss 7.6874 LearningRate 0.1502 Epoch: 6 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:05:45,468-[lfw][68000]XNorm: 24.109080 Training: 2022-01-08 10:05:45,468-[lfw][68000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-01-08 10:05:45,469-[lfw][68000]Accuracy-Highest: 0.99817 Training: 2022-01-08 10:06:37,483-[cfp_fp][68000]XNorm: 22.016478 Training: 2022-01-08 10:06:37,484-[cfp_fp][68000]Accuracy-Flip: 0.98571+-0.00515 Training: 2022-01-08 10:06:37,485-[cfp_fp][68000]Accuracy-Highest: 0.98600 Training: 2022-01-08 10:07:24,082-[agedb_30][68000]XNorm: 24.030862 Training: 2022-01-08 10:07:24,084-[agedb_30][68000]Accuracy-Flip: 0.97500+-0.00632 Training: 2022-01-08 10:07:24,084-[agedb_30][68000]Accuracy-Highest: 0.97667 Training: 2022-01-08 10:07:31,600-Speed 272.77 samples/sec Loss 7.6134 LearningRate 0.1501 Epoch: 6 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:07:39,147-Speed 5429.29 samples/sec Loss 7.6032 LearningRate 0.1501 Epoch: 6 Global Step: 68020 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:07:46,753-Speed 5386.03 samples/sec Loss 7.7205 LearningRate 0.1501 Epoch: 6 Global Step: 68030 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:07:54,403-Speed 5356.38 samples/sec Loss 7.6617 LearningRate 0.1501 Epoch: 6 Global Step: 68040 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:02,051-Speed 5356.73 samples/sec Loss 7.6328 LearningRate 0.1500 Epoch: 6 Global Step: 68050 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:09,615-Speed 5415.97 samples/sec Loss 7.6074 LearningRate 0.1500 Epoch: 6 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:17,069-Speed 5496.02 samples/sec Loss 7.6357 LearningRate 
0.1500 Epoch: 6 Global Step: 68070 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:24,644-Speed 5407.55 samples/sec Loss 7.6218 LearningRate 0.1500 Epoch: 6 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:32,233-Speed 5397.99 samples/sec Loss 7.6123 LearningRate 0.1500 Epoch: 6 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:39,787-Speed 5423.24 samples/sec Loss 7.5991 LearningRate 0.1499 Epoch: 6 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:47,353-Speed 5415.01 samples/sec Loss 7.6384 LearningRate 0.1499 Epoch: 6 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:08:54,968-Speed 5379.38 samples/sec Loss 7.6592 LearningRate 0.1499 Epoch: 6 Global Step: 68120 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 10:09:02,447-Speed 5477.33 samples/sec Loss 7.7046 LearningRate 0.1499 Epoch: 6 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:09:09,946-Speed 5462.53 samples/sec Loss 7.6019 LearningRate 0.1499 Epoch: 6 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:09:17,577-Speed 5368.38 samples/sec Loss 7.6221 LearningRate 0.1498 Epoch: 6 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:09:25,107-Speed 5440.09 samples/sec Loss 7.6643 LearningRate 0.1498 Epoch: 6 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:09:32,715-Speed 5384.36 samples/sec Loss 7.6352 LearningRate 0.1498 Epoch: 6 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:09:40,195-Speed 5476.81 samples/sec Loss 7.6777 LearningRate 0.1498 Epoch: 6 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:09:47,815-Speed 5376.22 samples/sec Loss 7.6524 LearningRate 0.1497 Epoch: 6 Global Step: 68190 Fp16 Grad 
Scale: 65536 Required: 32 hours Training: 2022-01-08 10:09:55,409-Speed 5394.43 samples/sec Loss 7.6551 LearningRate 0.1497 Epoch: 6 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:02,948-Speed 5433.56 samples/sec Loss 7.6794 LearningRate 0.1497 Epoch: 6 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:10,547-Speed 5390.86 samples/sec Loss 7.6745 LearningRate 0.1497 Epoch: 6 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:18,016-Speed 5485.23 samples/sec Loss 7.6507 LearningRate 0.1497 Epoch: 6 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:25,445-Speed 5514.09 samples/sec Loss 7.6335 LearningRate 0.1496 Epoch: 6 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:32,979-Speed 5436.85 samples/sec Loss 7.6264 LearningRate 0.1496 Epoch: 6 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:40,436-Speed 5493.74 samples/sec Loss 7.6547 LearningRate 0.1496 Epoch: 6 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:10:48,130-Speed 5324.47 samples/sec Loss 7.6174 LearningRate 0.1496 Epoch: 6 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:10:55,690-Speed 5418.79 samples/sec Loss 7.6054 LearningRate 0.1496 Epoch: 6 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:11:03,295-Speed 5386.48 samples/sec Loss 7.6436 LearningRate 0.1495 Epoch: 6 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:11:10,812-Speed 5449.69 samples/sec Loss 7.5445 LearningRate 0.1495 Epoch: 6 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:11:18,412-Speed 5389.99 samples/sec Loss 7.6803 LearningRate 0.1495 Epoch: 6 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 32 hours Training: 
2022-01-08 10:11:25,955-Speed 5431.07 samples/sec Loss 7.6722 LearningRate 0.1495 Epoch: 6 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:11:33,754-Speed 5252.15 samples/sec Loss 7.6542 LearningRate 0.1494 Epoch: 6 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:11:41,701-Speed 5154.96 samples/sec Loss 7.6189 LearningRate 0.1494 Epoch: 6 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:11:49,519-Speed 5240.08 samples/sec Loss 7.6291 LearningRate 0.1494 Epoch: 6 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:11:57,179-Speed 5347.56 samples/sec Loss 7.6066 LearningRate 0.1494 Epoch: 6 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:12:05,019-Speed 5225.33 samples/sec Loss 7.7339 LearningRate 0.1494 Epoch: 6 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:12:12,863-Speed 5222.26 samples/sec Loss 7.6609 LearningRate 0.1493 Epoch: 6 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:12:20,886-Speed 5106.02 samples/sec Loss 7.6132 LearningRate 0.1493 Epoch: 6 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:12:28,580-Speed 5324.27 samples/sec Loss 7.6209 LearningRate 0.1493 Epoch: 6 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:12:36,174-Speed 5394.71 samples/sec Loss 7.6158 LearningRate 0.1493 Epoch: 6 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:12:43,796-Speed 5374.93 samples/sec Loss 7.6185 LearningRate 0.1493 Epoch: 6 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:12:51,432-Speed 5364.57 samples/sec Loss 7.5738 LearningRate 0.1492 Epoch: 6 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:12:58,959-Speed 5442.16 samples/sec 
Loss 7.6245 LearningRate 0.1492 Epoch: 6 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:06,473-Speed 5451.88 samples/sec Loss 7.6187 LearningRate 0.1492 Epoch: 6 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:13,966-Speed 5467.77 samples/sec Loss 7.7309 LearningRate 0.1492 Epoch: 6 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:21,497-Speed 5439.95 samples/sec Loss 7.6646 LearningRate 0.1491 Epoch: 6 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:29,046-Speed 5426.09 samples/sec Loss 7.6324 LearningRate 0.1491 Epoch: 6 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:36,668-Speed 5374.56 samples/sec Loss 7.5993 LearningRate 0.1491 Epoch: 6 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:44,184-Speed 5450.94 samples/sec Loss 7.7222 LearningRate 0.1491 Epoch: 6 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:13:51,754-Speed 5411.37 samples/sec Loss 7.6082 LearningRate 0.1491 Epoch: 6 Global Step: 68510 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 10:13:59,415-Speed 5347.62 samples/sec Loss 7.5890 LearningRate 0.1490 Epoch: 6 Global Step: 68520 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 10:14:07,078-Speed 5345.78 samples/sec Loss 7.5997 LearningRate 0.1490 Epoch: 6 Global Step: 68530 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:14:14,729-Speed 5354.30 samples/sec Loss 7.6680 LearningRate 0.1490 Epoch: 6 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:14:22,334-Speed 5387.29 samples/sec Loss 7.6778 LearningRate 0.1490 Epoch: 6 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:14:29,985-Speed 5353.92 samples/sec Loss 7.6176 LearningRate 0.1490 Epoch: 6 
Global Step: 68560 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:14:37,684-Speed 5320.12 samples/sec Loss 7.5955 LearningRate 0.1489 Epoch: 6 Global Step: 68570 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:14:45,250-Speed 5414.68 samples/sec Loss 7.5659 LearningRate 0.1489 Epoch: 6 Global Step: 68580 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:14:52,716-Speed 5487.29 samples/sec Loss 7.6193 LearningRate 0.1489 Epoch: 6 Global Step: 68590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:15:00,387-Speed 5340.44 samples/sec Loss 7.6226 LearningRate 0.1489 Epoch: 6 Global Step: 68600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:15:07,854-Speed 5485.71 samples/sec Loss 7.5797 LearningRate 0.1488 Epoch: 6 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:15:15,410-Speed 5421.69 samples/sec Loss 7.5659 LearningRate 0.1488 Epoch: 6 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:15:22,918-Speed 5456.43 samples/sec Loss 7.5750 LearningRate 0.1488 Epoch: 6 Global Step: 68630 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 10:15:30,711-Speed 5256.75 samples/sec Loss 7.7017 LearningRate 0.1488 Epoch: 6 Global Step: 68640 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 10:15:38,318-Speed 5384.40 samples/sec Loss 7.6668 LearningRate 0.1488 Epoch: 6 Global Step: 68650 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-01-08 10:15:45,895-Speed 5407.02 samples/sec Loss 7.6123 LearningRate 0.1487 Epoch: 6 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:15:53,509-Speed 5380.47 samples/sec Loss 7.6097 LearningRate 0.1487 Epoch: 6 Global Step: 68670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:16:01,155-Speed 5357.78 samples/sec Loss 7.5984 LearningRate 0.1487 Epoch: 6 Global Step: 68680 Fp16 Grad Scale: 65536 
Required: 32 hours Training: 2022-01-08 10:16:08,856-Speed 5319.03 samples/sec Loss 7.6189 LearningRate 0.1487 Epoch: 6 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:16:16,398-Speed 5431.70 samples/sec Loss 7.6716 LearningRate 0.1487 Epoch: 6 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:16:23,988-Speed 5397.91 samples/sec Loss 7.5513 LearningRate 0.1486 Epoch: 6 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:16:31,511-Speed 5444.93 samples/sec Loss 7.6239 LearningRate 0.1486 Epoch: 6 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:16:39,075-Speed 5415.54 samples/sec Loss 7.6361 LearningRate 0.1486 Epoch: 6 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:16:46,763-Speed 5329.03 samples/sec Loss 7.6285 LearningRate 0.1486 Epoch: 6 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:16:54,483-Speed 5306.47 samples/sec Loss 7.6353 LearningRate 0.1485 Epoch: 6 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:17:02,097-Speed 5379.76 samples/sec Loss 7.6193 LearningRate 0.1485 Epoch: 6 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:17:09,611-Speed 5451.37 samples/sec Loss 7.5687 LearningRate 0.1485 Epoch: 6 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:17:17,175-Speed 5416.20 samples/sec Loss 7.5762 LearningRate 0.1485 Epoch: 6 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:17:24,836-Speed 5347.23 samples/sec Loss 7.5846 LearningRate 0.1485 Epoch: 6 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:17:32,406-Speed 5411.25 samples/sec Loss 7.5008 LearningRate 0.1484 Epoch: 6 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 
10:17:39,967-Speed 5417.70 samples/sec Loss 7.5925 LearningRate 0.1484 Epoch: 6 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:17:47,503-Speed 5436.42 samples/sec Loss 7.5926 LearningRate 0.1484 Epoch: 6 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:17:55,098-Speed 5393.77 samples/sec Loss 7.6443 LearningRate 0.1484 Epoch: 6 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:02,640-Speed 5431.42 samples/sec Loss 7.5787 LearningRate 0.1484 Epoch: 6 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:10,210-Speed 5411.72 samples/sec Loss 7.5985 LearningRate 0.1483 Epoch: 6 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:17,856-Speed 5357.98 samples/sec Loss 7.5673 LearningRate 0.1483 Epoch: 6 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:25,374-Speed 5449.13 samples/sec Loss 7.5272 LearningRate 0.1483 Epoch: 6 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:32,961-Speed 5398.84 samples/sec Loss 7.6372 LearningRate 0.1483 Epoch: 6 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:40,569-Speed 5384.69 samples/sec Loss 7.5228 LearningRate 0.1482 Epoch: 6 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:18:48,099-Speed 5440.68 samples/sec Loss 7.5887 LearningRate 0.1482 Epoch: 6 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:18:55,756-Speed 5349.72 samples/sec Loss 7.5967 LearningRate 0.1482 Epoch: 6 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:19:03,443-Speed 5329.09 samples/sec Loss 7.5614 LearningRate 0.1482 Epoch: 6 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:19:11,037-Speed 5394.70 samples/sec Loss 7.6004 
LearningRate 0.1482 Epoch: 6 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:19:18,638-Speed 5389.24 samples/sec Loss 7.5920 LearningRate 0.1481 Epoch: 6 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:19:26,256-Speed 5377.85 samples/sec Loss 7.5919 LearningRate 0.1481 Epoch: 6 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:19:33,780-Speed 5443.91 samples/sec Loss 7.5686 LearningRate 0.1481 Epoch: 6 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:19:41,355-Speed 5408.62 samples/sec Loss 7.5858 LearningRate 0.1481 Epoch: 6 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:19:49,028-Speed 5339.25 samples/sec Loss 7.5719 LearningRate 0.1481 Epoch: 6 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:19:56,781-Speed 5283.81 samples/sec Loss 7.6165 LearningRate 0.1480 Epoch: 6 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:04,453-Speed 5339.33 samples/sec Loss 7.6535 LearningRate 0.1480 Epoch: 6 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:12,044-Speed 5396.10 samples/sec Loss 7.5734 LearningRate 0.1480 Epoch: 6 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:19,641-Speed 5392.98 samples/sec Loss 7.6491 LearningRate 0.1480 Epoch: 6 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:27,193-Speed 5424.41 samples/sec Loss 7.5385 LearningRate 0.1479 Epoch: 6 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:34,791-Speed 5390.99 samples/sec Loss 7.5647 LearningRate 0.1479 Epoch: 6 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:42,331-Speed 5433.73 samples/sec Loss 7.5917 LearningRate 0.1479 Epoch: 6 Global Step: 69050 Fp16 
Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:49,841-Speed 5454.90 samples/sec Loss 7.5750 LearningRate 0.1479 Epoch: 6 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:20:57,389-Speed 5427.40 samples/sec Loss 7.5636 LearningRate 0.1479 Epoch: 6 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:04,986-Speed 5392.37 samples/sec Loss 7.6024 LearningRate 0.1478 Epoch: 6 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:12,520-Speed 5436.97 samples/sec Loss 7.6161 LearningRate 0.1478 Epoch: 6 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:20,149-Speed 5369.94 samples/sec Loss 7.5876 LearningRate 0.1478 Epoch: 6 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:27,723-Speed 5408.98 samples/sec Loss 7.5342 LearningRate 0.1478 Epoch: 6 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:35,400-Speed 5335.91 samples/sec Loss 7.5884 LearningRate 0.1478 Epoch: 6 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:43,005-Speed 5386.47 samples/sec Loss 7.5926 LearningRate 0.1477 Epoch: 6 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:50,554-Speed 5427.03 samples/sec Loss 7.6241 LearningRate 0.1477 Epoch: 6 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:21:58,113-Speed 5419.56 samples/sec Loss 7.5662 LearningRate 0.1477 Epoch: 6 Global Step: 69150 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:22:05,700-Speed 5398.83 samples/sec Loss 7.5585 LearningRate 0.1477 Epoch: 6 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:22:13,337-Speed 5364.03 samples/sec Loss 7.5528 LearningRate 0.1476 Epoch: 6 Global Step: 69170 Fp16 Grad Scale: 65536 Required: 32 hours Training: 
2022-01-08 10:22:21,001-Speed 5345.27 samples/sec Loss 7.6314 LearningRate 0.1476 Epoch: 6 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:22:28,529-Speed 5441.72 samples/sec Loss 7.5757 LearningRate 0.1476 Epoch: 6 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:22:36,164-Speed 5365.24 samples/sec Loss 7.6204 LearningRate 0.1476 Epoch: 6 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:22:43,730-Speed 5414.45 samples/sec Loss 7.6119 LearningRate 0.1476 Epoch: 6 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:22:51,360-Speed 5368.87 samples/sec Loss 7.5985 LearningRate 0.1475 Epoch: 6 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:22:59,453-Speed 5062.49 samples/sec Loss 7.5469 LearningRate 0.1475 Epoch: 6 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:23:07,019-Speed 5413.84 samples/sec Loss 7.5464 LearningRate 0.1475 Epoch: 6 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:23:14,657-Speed 5363.10 samples/sec Loss 7.5991 LearningRate 0.1475 Epoch: 6 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:23:22,247-Speed 5397.14 samples/sec Loss 7.5358 LearningRate 0.1475 Epoch: 6 Global Step: 69260 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:23:29,813-Speed 5414.83 samples/sec Loss 7.5316 LearningRate 0.1474 Epoch: 6 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:23:37,364-Speed 5425.45 samples/sec Loss 7.5718 LearningRate 0.1474 Epoch: 6 Global Step: 69280 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:23:44,900-Speed 5435.71 samples/sec Loss 7.5434 LearningRate 0.1474 Epoch: 6 Global Step: 69290 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:23:52,488-Speed 5399.04 samples/sec Loss 
7.5578 LearningRate 0.1474 Epoch: 6 Global Step: 69300 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:00,005-Speed 5449.30 samples/sec Loss 7.6373 LearningRate 0.1473 Epoch: 6 Global Step: 69310 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:07,643-Speed 5363.15 samples/sec Loss 7.5745 LearningRate 0.1473 Epoch: 6 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:15,233-Speed 5397.10 samples/sec Loss 7.6714 LearningRate 0.1473 Epoch: 6 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:22,864-Speed 5368.84 samples/sec Loss 7.5078 LearningRate 0.1473 Epoch: 6 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:30,581-Speed 5308.41 samples/sec Loss 7.5939 LearningRate 0.1473 Epoch: 6 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:38,079-Speed 5463.79 samples/sec Loss 7.5463 LearningRate 0.1472 Epoch: 6 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:45,631-Speed 5424.03 samples/sec Loss 7.5742 LearningRate 0.1472 Epoch: 6 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:24:53,212-Speed 5403.53 samples/sec Loss 7.5015 LearningRate 0.1472 Epoch: 6 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:25:00,830-Speed 5377.68 samples/sec Loss 7.5967 LearningRate 0.1472 Epoch: 6 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:25:08,430-Speed 5389.79 samples/sec Loss 7.5829 LearningRate 0.1472 Epoch: 6 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:25:16,008-Speed 5405.81 samples/sec Loss 7.5381 LearningRate 0.1471 Epoch: 6 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:25:23,530-Speed 5446.48 samples/sec Loss 7.6010 LearningRate 0.1471 Epoch: 6 Global Step: 
69420 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:25:31,042-Speed 5453.60 samples/sec Loss 7.5794 LearningRate 0.1471 Epoch: 6 Global Step: 69430 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:25:38,622-Speed 5404.03 samples/sec Loss 7.5245 LearningRate 0.1471 Epoch: 6 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:25:46,166-Speed 5430.45 samples/sec Loss 7.6042 LearningRate 0.1470 Epoch: 6 Global Step: 69450 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:25:53,663-Speed 5464.15 samples/sec Loss 7.5524 LearningRate 0.1470 Epoch: 6 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:26:01,344-Speed 5333.85 samples/sec Loss 7.5681 LearningRate 0.1470 Epoch: 6 Global Step: 69470 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:26:08,922-Speed 5405.45 samples/sec Loss 7.6015 LearningRate 0.1470 Epoch: 6 Global Step: 69480 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:26:16,538-Speed 5378.84 samples/sec Loss 7.5671 LearningRate 0.1470 Epoch: 6 Global Step: 69490 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:26:24,272-Speed 5296.38 samples/sec Loss 7.5051 LearningRate 0.1469 Epoch: 6 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:26:31,864-Speed 5396.39 samples/sec Loss 7.5528 LearningRate 0.1469 Epoch: 6 Global Step: 69510 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:26:39,521-Speed 5349.63 samples/sec Loss 7.5101 LearningRate 0.1469 Epoch: 6 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:26:47,361-Speed 5225.44 samples/sec Loss 7.5689 LearningRate 0.1469 Epoch: 6 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:26:54,925-Speed 5415.43 samples/sec Loss 7.5975 LearningRate 0.1469 Epoch: 6 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 32 hours 
Training: 2022-01-08 10:27:02,408-Speed 5475.16 samples/sec Loss 7.5955 LearningRate 0.1468 Epoch: 6 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:09,940-Speed 5438.67 samples/sec Loss 7.5536 LearningRate 0.1468 Epoch: 6 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:17,423-Speed 5473.95 samples/sec Loss 7.6078 LearningRate 0.1468 Epoch: 6 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:24,914-Speed 5469.10 samples/sec Loss 7.5529 LearningRate 0.1468 Epoch: 6 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:32,614-Speed 5319.98 samples/sec Loss 7.4436 LearningRate 0.1467 Epoch: 6 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:40,346-Speed 5298.05 samples/sec Loss 7.5839 LearningRate 0.1467 Epoch: 6 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:47,908-Speed 5417.06 samples/sec Loss 7.5705 LearningRate 0.1467 Epoch: 6 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:27:55,527-Speed 5377.17 samples/sec Loss 7.5869 LearningRate 0.1467 Epoch: 6 Global Step: 69620 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:03,174-Speed 5357.10 samples/sec Loss 7.5468 LearningRate 0.1467 Epoch: 6 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:10,745-Speed 5410.44 samples/sec Loss 7.5084 LearningRate 0.1466 Epoch: 6 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:18,312-Speed 5413.51 samples/sec Loss 7.5744 LearningRate 0.1466 Epoch: 6 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:25,990-Speed 5335.81 samples/sec Loss 7.5228 LearningRate 0.1466 Epoch: 6 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:33,488-Speed 5463.54 
samples/sec Loss 7.6154 LearningRate 0.1466 Epoch: 6 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:41,141-Speed 5352.98 samples/sec Loss 7.6155 LearningRate 0.1466 Epoch: 6 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:48,707-Speed 5413.69 samples/sec Loss 7.5580 LearningRate 0.1465 Epoch: 6 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:28:56,275-Speed 5413.72 samples/sec Loss 7.5237 LearningRate 0.1465 Epoch: 6 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:29:03,964-Speed 5328.03 samples/sec Loss 7.5436 LearningRate 0.1465 Epoch: 6 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:29:11,597-Speed 5366.36 samples/sec Loss 7.5445 LearningRate 0.1465 Epoch: 6 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:29:19,219-Speed 5374.97 samples/sec Loss 7.5116 LearningRate 0.1465 Epoch: 6 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:29:26,987-Speed 5273.46 samples/sec Loss 7.5745 LearningRate 0.1464 Epoch: 6 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:29:34,631-Speed 5359.22 samples/sec Loss 7.5153 LearningRate 0.1464 Epoch: 6 Global Step: 69750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:29:42,270-Speed 5362.51 samples/sec Loss 7.4908 LearningRate 0.1464 Epoch: 6 Global Step: 69760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:29:49,907-Speed 5363.56 samples/sec Loss 7.4854 LearningRate 0.1464 Epoch: 6 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:29:57,513-Speed 5386.32 samples/sec Loss 7.5707 LearningRate 0.1463 Epoch: 6 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:30:04,992-Speed 5477.54 samples/sec Loss 7.4819 LearningRate 0.1463 
Epoch: 6 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:30:12,682-Speed 5327.00 samples/sec Loss 7.5315 LearningRate 0.1463 Epoch: 6 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:30:20,302-Speed 5375.70 samples/sec Loss 7.4904 LearningRate 0.1463 Epoch: 6 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:30:27,884-Speed 5403.48 samples/sec Loss 7.4865 LearningRate 0.1463 Epoch: 6 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:30:35,389-Speed 5458.94 samples/sec Loss 7.4932 LearningRate 0.1462 Epoch: 6 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:30:42,984-Speed 5393.14 samples/sec Loss 7.5347 LearningRate 0.1462 Epoch: 6 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:30:50,482-Speed 5463.54 samples/sec Loss 7.5042 LearningRate 0.1462 Epoch: 6 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:30:58,099-Speed 5378.33 samples/sec Loss 7.4928 LearningRate 0.1462 Epoch: 6 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:31:05,739-Speed 5362.62 samples/sec Loss 7.5234 LearningRate 0.1462 Epoch: 6 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:31:13,352-Speed 5380.46 samples/sec Loss 7.6225 LearningRate 0.1461 Epoch: 6 Global Step: 69880 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:31:20,828-Speed 5479.94 samples/sec Loss 7.5666 LearningRate 0.1461 Epoch: 6 Global Step: 69890 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:31:28,322-Speed 5466.32 samples/sec Loss 7.4901 LearningRate 0.1461 Epoch: 6 Global Step: 69900 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:31:35,994-Speed 5340.15 samples/sec Loss 7.5363 LearningRate 0.1461 Epoch: 6 Global Step: 69910 Fp16 Grad Scale: 131072 
Required: 31 hours Training: 2022-01-08 10:31:43,670-Speed 5336.54 samples/sec Loss 7.5850 LearningRate 0.1460 Epoch: 6 Global Step: 69920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:31:51,239-Speed 5412.41 samples/sec Loss 7.6363 LearningRate 0.1460 Epoch: 6 Global Step: 69930 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:31:58,865-Speed 5372.24 samples/sec Loss 7.6133 LearningRate 0.1460 Epoch: 6 Global Step: 69940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:32:06,509-Speed 5358.66 samples/sec Loss 7.5516 LearningRate 0.1460 Epoch: 6 Global Step: 69950 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:32:14,036-Speed 5442.29 samples/sec Loss 7.5099 LearningRate 0.1460 Epoch: 6 Global Step: 69960 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:32:21,686-Speed 5355.52 samples/sec Loss 7.4933 LearningRate 0.1459 Epoch: 6 Global Step: 69970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:32:29,389-Speed 5318.10 samples/sec Loss 7.5691 LearningRate 0.1459 Epoch: 6 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:32:37,002-Speed 5381.09 samples/sec Loss 7.5447 LearningRate 0.1459 Epoch: 6 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:32:44,498-Speed 5464.51 samples/sec Loss 7.5572 LearningRate 0.1459 Epoch: 6 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:33:28,705-[lfw][70000]XNorm: 22.685974 Training: 2022-01-08 10:33:28,706-[lfw][70000]Accuracy-Flip: 0.99683+-0.00273 Training: 2022-01-08 10:33:28,706-[lfw][70000]Accuracy-Highest: 0.99817 Training: 2022-01-08 10:34:20,186-[cfp_fp][70000]XNorm: 20.707267 Training: 2022-01-08 10:34:20,187-[cfp_fp][70000]Accuracy-Flip: 0.98771+-0.00410 Training: 2022-01-08 10:34:20,188-[cfp_fp][70000]Accuracy-Highest: 0.98771 Training: 2022-01-08 10:35:05,934-[agedb_30][70000]XNorm: 22.748361 Training: 
2022-01-08 10:35:05,936-[agedb_30][70000]Accuracy-Flip: 0.97350+-0.00474 Training: 2022-01-08 10:35:05,936-[agedb_30][70000]Accuracy-Highest: 0.97667 Training: 2022-01-08 10:35:13,584-Speed 274.74 samples/sec Loss 7.5159 LearningRate 0.1459 Epoch: 6 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:35:21,110-Speed 5444.03 samples/sec Loss 7.5007 LearningRate 0.1458 Epoch: 6 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:35:28,613-Speed 5460.62 samples/sec Loss 7.5405 LearningRate 0.1458 Epoch: 6 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:35:36,246-Speed 5366.62 samples/sec Loss 7.5770 LearningRate 0.1458 Epoch: 6 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:35:43,825-Speed 5405.40 samples/sec Loss 7.5591 LearningRate 0.1458 Epoch: 6 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:35:51,377-Speed 5424.85 samples/sec Loss 7.5308 LearningRate 0.1457 Epoch: 6 Global Step: 70060 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:35:58,932-Speed 5421.96 samples/sec Loss 7.4917 LearningRate 0.1457 Epoch: 6 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:36:06,519-Speed 5399.67 samples/sec Loss 7.4810 LearningRate 0.1457 Epoch: 6 Global Step: 70080 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-01-08 10:36:14,118-Speed 5390.86 samples/sec Loss 7.5318 LearningRate 0.1457 Epoch: 6 Global Step: 70090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:36:21,541-Speed 5518.34 samples/sec Loss 7.5203 LearningRate 0.1457 Epoch: 6 Global Step: 70100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-01-08 10:36:28,984-Speed 5504.36 samples/sec Loss 7.4702 LearningRate 0.1456 Epoch: 6 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:36:36,518-Speed 5437.53 samples/sec Loss 
7.5178 LearningRate 0.1456 Epoch: 6 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:36:44,228-Speed 5312.58 samples/sec Loss 7.5514 LearningRate 0.1456 Epoch: 6 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:36:51,755-Speed 5442.59 samples/sec Loss 7.5088 LearningRate 0.1456 Epoch: 6 Global Step: 70140 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:36:59,282-Speed 5442.27 samples/sec Loss 7.5018 LearningRate 0.1456 Epoch: 6 Global Step: 70150 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:37:06,736-Speed 5495.97 samples/sec Loss 7.5108 LearningRate 0.1455 Epoch: 6 Global Step: 70160 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:37:14,295-Speed 5419.59 samples/sec Loss 7.4523 LearningRate 0.1455 Epoch: 6 Global Step: 70170 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:37:21,849-Speed 5422.46 samples/sec Loss 7.5413 LearningRate 0.1455 Epoch: 6 Global Step: 70180 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:37:29,558-Speed 5314.24 samples/sec Loss 7.4496 LearningRate 0.1455 Epoch: 6 Global Step: 70190 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:37:37,244-Speed 5329.93 samples/sec Loss 7.5022 LearningRate 0.1455 Epoch: 6 Global Step: 70200 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 10:37:44,758-Speed 5452.05 samples/sec Loss 7.5053 LearningRate 0.1454 Epoch: 6 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:37:52,498-Speed 5292.36 samples/sec Loss 7.4945 LearningRate 0.1454 Epoch: 6 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:37:59,936-Speed 5507.78 samples/sec Loss 7.5561 LearningRate 0.1454 Epoch: 6 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:07,555-Speed 5376.09 samples/sec Loss 7.5543 LearningRate 0.1454 Epoch: 6 Global Step: 70240 
Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:15,154-Speed 5391.45 samples/sec Loss 7.4986 LearningRate 0.1453 Epoch: 6 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:22,623-Speed 5484.56 samples/sec Loss 7.4916 LearningRate 0.1453 Epoch: 6 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:30,057-Speed 5510.84 samples/sec Loss 7.5030 LearningRate 0.1453 Epoch: 6 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:37,618-Speed 5417.10 samples/sec Loss 7.5434 LearningRate 0.1453 Epoch: 6 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:45,262-Speed 5359.52 samples/sec Loss 7.5861 LearningRate 0.1453 Epoch: 6 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:38:52,823-Speed 5418.69 samples/sec Loss 7.4767 LearningRate 0.1452 Epoch: 6 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:39:00,445-Speed 5374.09 samples/sec Loss 7.5718 LearningRate 0.1452 Epoch: 6 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:39:07,865-Speed 5521.33 samples/sec Loss 7.5039 LearningRate 0.1452 Epoch: 6 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:39:15,396-Speed 5439.65 samples/sec Loss 7.4726 LearningRate 0.1452 Epoch: 6 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:39:22,978-Speed 5402.71 samples/sec Loss 7.5124 LearningRate 0.1452 Epoch: 6 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:39:30,553-Speed 5408.36 samples/sec Loss 7.4626 LearningRate 0.1451 Epoch: 6 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:39:38,144-Speed 5396.17 samples/sec Loss 7.4657 LearningRate 0.1451 Epoch: 6 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 31 hours Training: 
2022-01-08 10:39:45,728-Speed 5401.95 samples/sec Loss 7.5295 LearningRate 0.1451 Epoch: 6 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:39:53,261-Speed 5437.96 samples/sec Loss 7.4710 LearningRate 0.1451 Epoch: 6 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:00,820-Speed 5419.30 samples/sec Loss 7.5070 LearningRate 0.1451 Epoch: 6 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:08,420-Speed 5390.25 samples/sec Loss 7.4445 LearningRate 0.1450 Epoch: 6 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:15,957-Speed 5435.55 samples/sec Loss 7.4515 LearningRate 0.1450 Epoch: 6 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:23,451-Speed 5466.43 samples/sec Loss 7.4883 LearningRate 0.1450 Epoch: 6 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:30,969-Speed 5448.88 samples/sec Loss 7.4614 LearningRate 0.1450 Epoch: 6 Global Step: 70430 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:40:38,580-Speed 5382.89 samples/sec Loss 7.4958 LearningRate 0.1449 Epoch: 6 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:46,192-Speed 5381.23 samples/sec Loss 7.5069 LearningRate 0.1449 Epoch: 6 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:40:53,726-Speed 5437.74 samples/sec Loss 7.5059 LearningRate 0.1449 Epoch: 6 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:01,225-Speed 5462.86 samples/sec Loss 7.5052 LearningRate 0.1449 Epoch: 6 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:08,737-Speed 5453.49 samples/sec Loss 7.5026 LearningRate 0.1449 Epoch: 6 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:16,426-Speed 5327.29 samples/sec Loss 
7.5046 LearningRate 0.1448 Epoch: 6 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:24,006-Speed 5404.47 samples/sec Loss 7.4788 LearningRate 0.1448 Epoch: 6 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:31,632-Speed 5372.40 samples/sec Loss 7.5064 LearningRate 0.1448 Epoch: 6 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:39,236-Speed 5387.39 samples/sec Loss 7.5064 LearningRate 0.1448 Epoch: 6 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:47,044-Speed 5246.29 samples/sec Loss 7.5263 LearningRate 0.1448 Epoch: 6 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:41:54,626-Speed 5402.87 samples/sec Loss 7.5286 LearningRate 0.1447 Epoch: 6 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:02,185-Speed 5419.25 samples/sec Loss 7.4694 LearningRate 0.1447 Epoch: 6 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:09,803-Speed 5377.72 samples/sec Loss 7.4490 LearningRate 0.1447 Epoch: 6 Global Step: 70560 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:17,515-Speed 5311.07 samples/sec Loss 7.4398 LearningRate 0.1447 Epoch: 6 Global Step: 70570 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:25,070-Speed 5422.44 samples/sec Loss 7.4829 LearningRate 0.1446 Epoch: 6 Global Step: 70580 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:32,737-Speed 5343.04 samples/sec Loss 7.4439 LearningRate 0.1446 Epoch: 6 Global Step: 70590 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:40,297-Speed 5419.24 samples/sec Loss 7.4756 LearningRate 0.1446 Epoch: 6 Global Step: 70600 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:47,855-Speed 5419.47 samples/sec Loss 7.4730 LearningRate 0.1446 Epoch: 6 Global Step: 
70610 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:42:55,361-Speed 5458.22 samples/sec Loss 7.4698 LearningRate 0.1446 Epoch: 6 Global Step: 70620 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:43:02,836-Speed 5480.45 samples/sec Loss 7.4681 LearningRate 0.1445 Epoch: 6 Global Step: 70630 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:43:10,361-Speed 5443.84 samples/sec Loss 7.5003 LearningRate 0.1445 Epoch: 6 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:43:17,881-Speed 5447.15 samples/sec Loss 7.4986 LearningRate 0.1445 Epoch: 6 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:43:25,345-Speed 5488.66 samples/sec Loss 7.5045 LearningRate 0.1445 Epoch: 6 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:43:32,865-Speed 5447.29 samples/sec Loss 7.4201 LearningRate 0.1445 Epoch: 6 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:43:40,416-Speed 5425.02 samples/sec Loss 7.5006 LearningRate 0.1444 Epoch: 6 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:43:47,888-Speed 5482.58 samples/sec Loss 7.4802 LearningRate 0.1444 Epoch: 6 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:43:55,443-Speed 5422.35 samples/sec Loss 7.4736 LearningRate 0.1444 Epoch: 6 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:44:02,995-Speed 5424.61 samples/sec Loss 7.5206 LearningRate 0.1444 Epoch: 6 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:44:10,619-Speed 5372.94 samples/sec Loss 7.4962 LearningRate 0.1444 Epoch: 6 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:44:18,119-Speed 5462.45 samples/sec Loss 7.4831 LearningRate 0.1443 Epoch: 6 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 31 hours 
Training: 2022-01-08 10:44:25,754-Speed 5364.99 samples/sec Loss 7.5364 LearningRate 0.1443 Epoch: 6 Global Step: 70740 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:44:33,215-Speed 5490.95 samples/sec Loss 7.4779 LearningRate 0.1443 Epoch: 6 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:44:40,789-Speed 5408.99 samples/sec Loss 7.4856 LearningRate 0.1443 Epoch: 6 Global Step: 70760 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:44:48,390-Speed 5388.95 samples/sec Loss 7.4608 LearningRate 0.1442 Epoch: 6 Global Step: 70770 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:44:55,936-Speed 5429.41 samples/sec Loss 7.3830 LearningRate 0.1442 Epoch: 6 Global Step: 70780 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:03,494-Speed 5419.73 samples/sec Loss 7.5276 LearningRate 0.1442 Epoch: 6 Global Step: 70790 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:11,008-Speed 5451.97 samples/sec Loss 7.4247 LearningRate 0.1442 Epoch: 6 Global Step: 70800 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:18,630-Speed 5375.02 samples/sec Loss 7.4811 LearningRate 0.1442 Epoch: 6 Global Step: 70810 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:26,078-Speed 5500.12 samples/sec Loss 7.4730 LearningRate 0.1441 Epoch: 6 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:33,573-Speed 5466.03 samples/sec Loss 7.4266 LearningRate 0.1441 Epoch: 6 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:41,171-Speed 5391.37 samples/sec Loss 7.4929 LearningRate 0.1441 Epoch: 6 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:48,713-Speed 5431.87 samples/sec Loss 7.5140 LearningRate 0.1441 Epoch: 6 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:45:56,167-Speed 
5495.86 samples/sec Loss 7.4682 LearningRate 0.1441 Epoch: 6 Global Step: 70860 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:46:03,752-Speed 5400.88 samples/sec Loss 7.3882 LearningRate 0.1440 Epoch: 6 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:46:11,274-Speed 5446.12 samples/sec Loss 7.4707 LearningRate 0.1440 Epoch: 6 Global Step: 70880 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:46:18,786-Speed 5453.34 samples/sec Loss 7.4240 LearningRate 0.1440 Epoch: 6 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:46:26,235-Speed 5499.65 samples/sec Loss 7.4899 LearningRate 0.1440 Epoch: 6 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:46:33,800-Speed 5415.15 samples/sec Loss 7.5402 LearningRate 0.1440 Epoch: 6 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:46:41,361-Speed 5417.89 samples/sec Loss 7.4633 LearningRate 0.1439 Epoch: 6 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:46:48,880-Speed 5448.37 samples/sec Loss 7.4939 LearningRate 0.1439 Epoch: 6 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:46:56,420-Speed 5433.10 samples/sec Loss 7.4777 LearningRate 0.1439 Epoch: 6 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:47:03,974-Speed 5423.32 samples/sec Loss 7.4957 LearningRate 0.1439 Epoch: 6 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:47:11,588-Speed 5380.36 samples/sec Loss 7.4500 LearningRate 0.1438 Epoch: 6 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:47:19,081-Speed 5467.27 samples/sec Loss 7.5083 LearningRate 0.1438 Epoch: 6 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:47:26,680-Speed 5390.51 samples/sec Loss 7.4498 LearningRate 0.1438 
Epoch: 6 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:47:34,194-Speed 5452.15 samples/sec Loss 7.5199 LearningRate 0.1438 Epoch: 6 Global Step: 70990 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:47:41,753-Speed 5419.24 samples/sec Loss 7.5099 LearningRate 0.1438 Epoch: 6 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:47:49,443-Speed 5327.12 samples/sec Loss 7.5110 LearningRate 0.1437 Epoch: 6 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:47:57,179-Speed 5295.24 samples/sec Loss 7.4378 LearningRate 0.1437 Epoch: 6 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:04,966-Speed 5260.85 samples/sec Loss 7.4728 LearningRate 0.1437 Epoch: 6 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:12,555-Speed 5398.03 samples/sec Loss 7.4604 LearningRate 0.1437 Epoch: 6 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:20,130-Speed 5408.04 samples/sec Loss 7.3995 LearningRate 0.1437 Epoch: 6 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:27,585-Speed 5495.01 samples/sec Loss 7.4596 LearningRate 0.1436 Epoch: 6 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:35,106-Speed 5446.90 samples/sec Loss 7.4961 LearningRate 0.1436 Epoch: 6 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:42,665-Speed 5419.28 samples/sec Loss 7.3540 LearningRate 0.1436 Epoch: 6 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:50,248-Speed 5402.04 samples/sec Loss 7.4985 LearningRate 0.1436 Epoch: 6 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:48:57,822-Speed 5409.02 samples/sec Loss 7.4279 LearningRate 0.1436 Epoch: 6 Global Step: 71100 Fp16 Grad Scale: 65536 
Required: 31 hours Training: 2022-01-08 10:49:05,420-Speed 5391.64 samples/sec Loss 7.5049 LearningRate 0.1435 Epoch: 6 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:49:12,920-Speed 5461.65 samples/sec Loss 7.4990 LearningRate 0.1435 Epoch: 6 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:49:20,525-Speed 5387.23 samples/sec Loss 7.4684 LearningRate 0.1435 Epoch: 6 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:49:28,058-Speed 5437.49 samples/sec Loss 7.5474 LearningRate 0.1435 Epoch: 6 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:49:35,612-Speed 5423.33 samples/sec Loss 7.4485 LearningRate 0.1434 Epoch: 6 Global Step: 71150 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:49:43,104-Speed 5467.74 samples/sec Loss 7.4820 LearningRate 0.1434 Epoch: 6 Global Step: 71160 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:49:50,658-Speed 5423.11 samples/sec Loss 7.5076 LearningRate 0.1434 Epoch: 6 Global Step: 71170 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:49:58,168-Speed 5455.12 samples/sec Loss 7.5024 LearningRate 0.1434 Epoch: 6 Global Step: 71180 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:50:05,775-Speed 5384.93 samples/sec Loss 7.4402 LearningRate 0.1434 Epoch: 6 Global Step: 71190 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:50:13,278-Speed 5460.10 samples/sec Loss 7.4990 LearningRate 0.1433 Epoch: 6 Global Step: 71200 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:50:20,712-Speed 5510.54 samples/sec Loss 7.4926 LearningRate 0.1433 Epoch: 6 Global Step: 71210 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:50:28,329-Speed 5377.72 samples/sec Loss 7.4777 LearningRate 0.1433 Epoch: 6 Global Step: 71220 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 
10:50:35,822-Speed 5467.52 samples/sec Loss 7.4363 LearningRate 0.1433 Epoch: 6 Global Step: 71230 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:50:43,361-Speed 5433.70 samples/sec Loss 7.4169 LearningRate 0.1433 Epoch: 6 Global Step: 71240 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:50:50,830-Speed 5484.85 samples/sec Loss 7.4291 LearningRate 0.1432 Epoch: 6 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:50:58,294-Speed 5487.82 samples/sec Loss 7.4819 LearningRate 0.1432 Epoch: 6 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:05,871-Speed 5407.55 samples/sec Loss 7.4359 LearningRate 0.1432 Epoch: 6 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:13,350-Speed 5476.48 samples/sec Loss 7.3935 LearningRate 0.1432 Epoch: 6 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:20,979-Speed 5369.87 samples/sec Loss 7.4458 LearningRate 0.1432 Epoch: 6 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:28,460-Speed 5475.90 samples/sec Loss 7.4367 LearningRate 0.1431 Epoch: 6 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:36,007-Speed 5428.13 samples/sec Loss 7.4578 LearningRate 0.1431 Epoch: 6 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:43,548-Speed 5432.50 samples/sec Loss 7.3777 LearningRate 0.1431 Epoch: 6 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:51,094-Speed 5428.05 samples/sec Loss 7.4434 LearningRate 0.1431 Epoch: 6 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:51:58,571-Speed 5479.41 samples/sec Loss 7.5113 LearningRate 0.1430 Epoch: 6 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:52:06,001-Speed 5513.76 samples/sec Loss 7.3980 
LearningRate 0.1430 Epoch: 6 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:13,558-Speed 5420.26 samples/sec Loss 7.4505 LearningRate 0.1430 Epoch: 6 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:21,073-Speed 5450.77 samples/sec Loss 7.4268 LearningRate 0.1430 Epoch: 6 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:28,676-Speed 5388.09 samples/sec Loss 7.4198 LearningRate 0.1430 Epoch: 6 Global Step: 71380 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:36,288-Speed 5382.38 samples/sec Loss 7.4046 LearningRate 0.1429 Epoch: 6 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:43,879-Speed 5396.35 samples/sec Loss 7.4754 LearningRate 0.1429 Epoch: 6 Global Step: 71400 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:51,389-Speed 5454.81 samples/sec Loss 7.4481 LearningRate 0.1429 Epoch: 6 Global Step: 71410 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:52:58,912-Speed 5444.98 samples/sec Loss 7.4190 LearningRate 0.1429 Epoch: 6 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:06,443-Speed 5439.38 samples/sec Loss 7.4824 LearningRate 0.1429 Epoch: 6 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:14,012-Speed 5412.29 samples/sec Loss 7.4327 LearningRate 0.1428 Epoch: 6 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:21,577-Speed 5415.43 samples/sec Loss 7.4365 LearningRate 0.1428 Epoch: 6 Global Step: 71450 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 10:53:29,085-Speed 5455.94 samples/sec Loss 7.3979 LearningRate 0.1428 Epoch: 6 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:36,561-Speed 5480.46 samples/sec Loss 7.4536 LearningRate 0.1428 Epoch: 6 Global Step: 
71470 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:44,090-Speed 5440.82 samples/sec Loss 7.5242 LearningRate 0.1428 Epoch: 6 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:51,582-Speed 5467.72 samples/sec Loss 7.5098 LearningRate 0.1427 Epoch: 6 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:53:59,078-Speed 5464.99 samples/sec Loss 7.4149 LearningRate 0.1427 Epoch: 6 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:54:06,571-Speed 5467.27 samples/sec Loss 7.4780 LearningRate 0.1427 Epoch: 6 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:54:14,062-Speed 5469.01 samples/sec Loss 7.3932 LearningRate 0.1427 Epoch: 6 Global Step: 71520 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:54:21,533-Speed 5482.50 samples/sec Loss 7.4834 LearningRate 0.1426 Epoch: 6 Global Step: 71530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:54:29,160-Speed 5371.11 samples/sec Loss 7.4093 LearningRate 0.1426 Epoch: 6 Global Step: 71540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:54:36,689-Speed 5441.58 samples/sec Loss 7.3593 LearningRate 0.1426 Epoch: 6 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:54:44,199-Speed 5455.14 samples/sec Loss 7.4467 LearningRate 0.1426 Epoch: 6 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:54:51,769-Speed 5410.91 samples/sec Loss 7.4220 LearningRate 0.1426 Epoch: 6 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:54:59,287-Speed 5449.40 samples/sec Loss 7.4117 LearningRate 0.1425 Epoch: 6 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:55:06,830-Speed 5430.77 samples/sec Loss 7.3680 LearningRate 0.1425 Epoch: 6 Global Step: 71590 Fp16 Grad Scale: 65536 Required: 31 hours 
Training: 2022-01-08 10:55:14,388-Speed 5420.70 samples/sec Loss 7.3595 LearningRate 0.1425 Epoch: 6 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:55:21,908-Speed 5447.35 samples/sec Loss 7.4397 LearningRate 0.1425 Epoch: 6 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:55:29,494-Speed 5399.61 samples/sec Loss 7.4228 LearningRate 0.1425 Epoch: 6 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:55:36,972-Speed 5478.88 samples/sec Loss 7.4515 LearningRate 0.1424 Epoch: 6 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:55:44,546-Speed 5408.78 samples/sec Loss 7.5209 LearningRate 0.1424 Epoch: 6 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:55:52,057-Speed 5453.43 samples/sec Loss 7.4539 LearningRate 0.1424 Epoch: 6 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:55:59,617-Speed 5418.73 samples/sec Loss 7.3995 LearningRate 0.1424 Epoch: 6 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:07,103-Speed 5471.99 samples/sec Loss 7.4403 LearningRate 0.1424 Epoch: 6 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:14,615-Speed 5453.95 samples/sec Loss 7.3534 LearningRate 0.1423 Epoch: 6 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:22,246-Speed 5368.15 samples/sec Loss 7.3883 LearningRate 0.1423 Epoch: 6 Global Step: 71690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:29,757-Speed 5454.14 samples/sec Loss 7.4127 LearningRate 0.1423 Epoch: 6 Global Step: 71700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:37,204-Speed 5500.98 samples/sec Loss 7.4176 LearningRate 0.1423 Epoch: 6 Global Step: 71710 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:44,805-Speed 5389.86 
samples/sec Loss 7.4714 LearningRate 0.1422 Epoch: 6 Global Step: 71720 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:52,337-Speed 5438.22 samples/sec Loss 7.4024 LearningRate 0.1422 Epoch: 6 Global Step: 71730 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:56:59,796-Speed 5492.74 samples/sec Loss 7.4279 LearningRate 0.1422 Epoch: 6 Global Step: 71740 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:57:07,218-Speed 5518.92 samples/sec Loss 7.4049 LearningRate 0.1422 Epoch: 6 Global Step: 71750 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 10:57:14,726-Speed 5456.49 samples/sec Loss 7.3998 LearningRate 0.1422 Epoch: 6 Global Step: 71760 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:57:22,219-Speed 5467.41 samples/sec Loss 7.3561 LearningRate 0.1421 Epoch: 6 Global Step: 71770 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:57:29,681-Speed 5489.82 samples/sec Loss 7.4330 LearningRate 0.1421 Epoch: 6 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:57:37,154-Speed 5481.94 samples/sec Loss 7.4127 LearningRate 0.1421 Epoch: 6 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:57:44,665-Speed 5454.30 samples/sec Loss 7.4639 LearningRate 0.1421 Epoch: 6 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:57:52,133-Speed 5485.55 samples/sec Loss 7.3996 LearningRate 0.1421 Epoch: 6 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:57:59,624-Speed 5468.26 samples/sec Loss 7.4481 LearningRate 0.1420 Epoch: 6 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:58:07,181-Speed 5421.31 samples/sec Loss 7.3912 LearningRate 0.1420 Epoch: 6 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:58:14,661-Speed 5476.48 samples/sec Loss 7.4082 LearningRate 0.1420 
Epoch: 6 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:58:22,242-Speed 5403.50 samples/sec Loss 7.4316 LearningRate 0.1420 Epoch: 6 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:58:29,796-Speed 5423.70 samples/sec Loss 7.4206 LearningRate 0.1420 Epoch: 6 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:58:37,228-Speed 5511.67 samples/sec Loss 7.4156 LearningRate 0.1419 Epoch: 6 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 10:58:44,725-Speed 5464.45 samples/sec Loss 7.3799 LearningRate 0.1419 Epoch: 6 Global Step: 71880 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:58:52,273-Speed 5427.55 samples/sec Loss 7.4069 LearningRate 0.1419 Epoch: 6 Global Step: 71890 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:58:59,745-Speed 5482.39 samples/sec Loss 7.3830 LearningRate 0.1419 Epoch: 6 Global Step: 71900 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:07,214-Speed 5485.09 samples/sec Loss 7.3807 LearningRate 0.1418 Epoch: 6 Global Step: 71910 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:14,806-Speed 5395.92 samples/sec Loss 7.4755 LearningRate 0.1418 Epoch: 6 Global Step: 71920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:22,249-Speed 5503.47 samples/sec Loss 7.4375 LearningRate 0.1418 Epoch: 6 Global Step: 71930 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:29,738-Speed 5470.65 samples/sec Loss 7.4008 LearningRate 0.1418 Epoch: 6 Global Step: 71940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:37,164-Speed 5516.00 samples/sec Loss 7.4299 LearningRate 0.1418 Epoch: 6 Global Step: 71950 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:44,625-Speed 5490.41 samples/sec Loss 7.4283 LearningRate 0.1417 Epoch: 6 Global Step: 71960 Fp16 Grad Scale: 
131072 Required: 31 hours Training: 2022-01-08 10:59:52,107-Speed 5475.44 samples/sec Loss 7.3817 LearningRate 0.1417 Epoch: 6 Global Step: 71970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 10:59:59,558-Speed 5497.72 samples/sec Loss 7.4711 LearningRate 0.1417 Epoch: 6 Global Step: 71980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:00:07,002-Speed 5503.58 samples/sec Loss 7.4469 LearningRate 0.1417 Epoch: 6 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:00:14,448-Speed 5501.77 samples/sec Loss 7.4139 LearningRate 0.1417 Epoch: 6 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:00:58,263-[lfw][72000]XNorm: 23.137435 Training: 2022-01-08 11:00:58,264-[lfw][72000]Accuracy-Flip: 0.99800+-0.00245 Training: 2022-01-08 11:00:58,264-[lfw][72000]Accuracy-Highest: 0.99817 Training: 2022-01-08 11:01:50,014-[cfp_fp][72000]XNorm: 20.903826 Training: 2022-01-08 11:01:50,015-[cfp_fp][72000]Accuracy-Flip: 0.98514+-0.00607 Training: 2022-01-08 11:01:50,015-[cfp_fp][72000]Accuracy-Highest: 0.98771 Training: 2022-01-08 11:02:35,556-[agedb_30][72000]XNorm: 23.205493 Training: 2022-01-08 11:02:35,557-[agedb_30][72000]Accuracy-Flip: 0.97267+-0.00629 Training: 2022-01-08 11:02:35,558-[agedb_30][72000]Accuracy-Highest: 0.97667 Training: 2022-01-08 11:02:43,195-Speed 275.37 samples/sec Loss 7.4430 LearningRate 0.1416 Epoch: 6 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:02:50,950-Speed 5283.04 samples/sec Loss 7.4033 LearningRate 0.1416 Epoch: 6 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:02:58,530-Speed 5404.97 samples/sec Loss 7.4265 LearningRate 0.1416 Epoch: 6 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:03:06,085-Speed 5423.24 samples/sec Loss 7.3904 LearningRate 0.1416 Epoch: 6 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 31 hours Training: 
2022-01-08 11:03:13,570-Speed 5473.36 samples/sec Loss 7.4322 LearningRate 0.1416 Epoch: 6 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:03:21,169-Speed 5390.96 samples/sec Loss 7.4442 LearningRate 0.1415 Epoch: 6 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:03:28,716-Speed 5428.00 samples/sec Loss 7.3863 LearningRate 0.1415 Epoch: 6 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:03:36,273-Speed 5420.70 samples/sec Loss 7.5002 LearningRate 0.1415 Epoch: 6 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:03:43,752-Speed 5477.61 samples/sec Loss 7.3720 LearningRate 0.1415 Epoch: 6 Global Step: 72090 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:03:51,221-Speed 5484.31 samples/sec Loss 7.4472 LearningRate 0.1415 Epoch: 6 Global Step: 72100 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:03:58,850-Speed 5369.81 samples/sec Loss 7.4218 LearningRate 0.1414 Epoch: 6 Global Step: 72110 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:06,458-Speed 5384.19 samples/sec Loss 7.3658 LearningRate 0.1414 Epoch: 6 Global Step: 72120 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:13,937-Speed 5478.09 samples/sec Loss 7.4780 LearningRate 0.1414 Epoch: 6 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:21,384-Speed 5500.53 samples/sec Loss 7.4093 LearningRate 0.1414 Epoch: 6 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:28,844-Speed 5491.32 samples/sec Loss 7.4041 LearningRate 0.1413 Epoch: 6 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:36,280-Speed 5508.82 samples/sec Loss 7.4560 LearningRate 0.1413 Epoch: 6 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:43,755-Speed 5480.38 
samples/sec Loss 7.3314 LearningRate 0.1413 Epoch: 6 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:51,275-Speed 5447.66 samples/sec Loss 7.4699 LearningRate 0.1413 Epoch: 6 Global Step: 72180 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:04:58,781-Speed 5457.64 samples/sec Loss 7.4015 LearningRate 0.1413 Epoch: 6 Global Step: 72190 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:05:06,212-Speed 5512.60 samples/sec Loss 7.4158 LearningRate 0.1412 Epoch: 6 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:13,798-Speed 5400.14 samples/sec Loss 7.4080 LearningRate 0.1412 Epoch: 6 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:21,311-Speed 5452.80 samples/sec Loss 7.3354 LearningRate 0.1412 Epoch: 6 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:28,968-Speed 5350.06 samples/sec Loss 7.4161 LearningRate 0.1412 Epoch: 6 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:36,492-Speed 5444.58 samples/sec Loss 7.3941 LearningRate 0.1412 Epoch: 6 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:44,006-Speed 5451.91 samples/sec Loss 7.3494 LearningRate 0.1411 Epoch: 6 Global Step: 72250 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:51,519-Speed 5452.56 samples/sec Loss 7.4295 LearningRate 0.1411 Epoch: 6 Global Step: 72260 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:05:59,084-Speed 5415.51 samples/sec Loss 7.4611 LearningRate 0.1411 Epoch: 6 Global Step: 72270 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:06:06,555-Speed 5483.10 samples/sec Loss 7.3578 LearningRate 0.1411 Epoch: 6 Global Step: 72280 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:06:14,063-Speed 5456.11 samples/sec Loss 7.3727 LearningRate 0.1411 Epoch: 6 
Global Step: 72290 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:06:21,635-Speed 5409.88 samples/sec Loss 7.3952 LearningRate 0.1410 Epoch: 6 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:06:29,132-Speed 5464.21 samples/sec Loss 7.5059 LearningRate 0.1410 Epoch: 6 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:06:36,613-Speed 5476.01 samples/sec Loss 7.4242 LearningRate 0.1410 Epoch: 6 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:06:44,114-Speed 5461.51 samples/sec Loss 7.4308 LearningRate 0.1410 Epoch: 6 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:06:51,820-Speed 5315.66 samples/sec Loss 7.4409 LearningRate 0.1410 Epoch: 6 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:06:59,313-Speed 5467.42 samples/sec Loss 7.3954 LearningRate 0.1409 Epoch: 6 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:07:06,809-Speed 5465.66 samples/sec Loss 7.4327 LearningRate 0.1409 Epoch: 6 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:07:14,301-Speed 5468.00 samples/sec Loss 7.3711 LearningRate 0.1409 Epoch: 6 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:07:21,825-Speed 5444.67 samples/sec Loss 7.3712 LearningRate 0.1409 Epoch: 6 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:07:29,265-Speed 5505.17 samples/sec Loss 7.3687 LearningRate 0.1408 Epoch: 6 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:07:36,721-Speed 5494.71 samples/sec Loss 7.3326 LearningRate 0.1408 Epoch: 6 Global Step: 72400 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:07:44,179-Speed 5493.58 samples/sec Loss 7.4062 LearningRate 0.1408 Epoch: 6 Global Step: 72410 Fp16 Grad Scale: 131072 Required: 
31 hours Training: 2022-01-08 11:07:51,718-Speed 5433.22 samples/sec Loss 7.4148 LearningRate 0.1408 Epoch: 6 Global Step: 72420 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:07:59,271-Speed 5423.73 samples/sec Loss 7.3567 LearningRate 0.1408 Epoch: 6 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:06,767-Speed 5465.50 samples/sec Loss 7.3531 LearningRate 0.1407 Epoch: 6 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:14,202-Speed 5509.59 samples/sec Loss 7.3616 LearningRate 0.1407 Epoch: 6 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:21,656-Speed 5495.51 samples/sec Loss 7.3958 LearningRate 0.1407 Epoch: 6 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:29,154-Speed 5463.88 samples/sec Loss 7.3744 LearningRate 0.1407 Epoch: 6 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:36,751-Speed 5392.26 samples/sec Loss 7.3677 LearningRate 0.1407 Epoch: 6 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:44,169-Speed 5522.60 samples/sec Loss 7.4281 LearningRate 0.1406 Epoch: 6 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:51,713-Speed 5430.00 samples/sec Loss 7.4764 LearningRate 0.1406 Epoch: 6 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:08:59,274-Speed 5417.58 samples/sec Loss 7.4622 LearningRate 0.1406 Epoch: 6 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:09:06,761-Speed 5471.88 samples/sec Loss 7.3537 LearningRate 0.1406 Epoch: 6 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:09:14,273-Speed 5453.79 samples/sec Loss 7.3594 LearningRate 0.1406 Epoch: 6 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:09:21,717-Speed 
5503.10 samples/sec Loss 7.4027 LearningRate 0.1405 Epoch: 6 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:09:29,238-Speed 5446.69 samples/sec Loss 7.3355 LearningRate 0.1405 Epoch: 6 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:09:36,815-Speed 5406.38 samples/sec Loss 7.3569 LearningRate 0.1405 Epoch: 6 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:09:44,528-Speed 5311.39 samples/sec Loss 7.4289 LearningRate 0.1405 Epoch: 6 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:09:52,119-Speed 5396.44 samples/sec Loss 7.3657 LearningRate 0.1404 Epoch: 6 Global Step: 72580 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:10:16,693-Speed 1666.88 samples/sec Loss 7.4607 LearningRate 0.1404 Epoch: 7 Global Step: 72590 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:10:24,127-Speed 5511.03 samples/sec Loss 7.4436 LearningRate 0.1404 Epoch: 7 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:10:31,610-Speed 5474.71 samples/sec Loss 7.4297 LearningRate 0.1404 Epoch: 7 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:10:39,123-Speed 5452.35 samples/sec Loss 7.3824 LearningRate 0.1404 Epoch: 7 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:10:46,528-Speed 5531.86 samples/sec Loss 7.3430 LearningRate 0.1403 Epoch: 7 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:10:53,964-Speed 5509.84 samples/sec Loss 7.3928 LearningRate 0.1403 Epoch: 7 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:11:01,511-Speed 5427.51 samples/sec Loss 7.4474 LearningRate 0.1403 Epoch: 7 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:11:08,947-Speed 5509.17 samples/sec Loss 7.3735 LearningRate 
0.1403 Epoch: 7 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:11:16,407-Speed 5491.60 samples/sec Loss 7.2623 LearningRate 0.1403 Epoch: 7 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:11:23,899-Speed 5467.69 samples/sec Loss 7.3554 LearningRate 0.1402 Epoch: 7 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:11:31,370-Speed 5483.54 samples/sec Loss 7.3888 LearningRate 0.1402 Epoch: 7 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:11:38,833-Speed 5489.22 samples/sec Loss 7.3765 LearningRate 0.1402 Epoch: 7 Global Step: 72700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:11:46,387-Speed 5422.89 samples/sec Loss 7.3532 LearningRate 0.1402 Epoch: 7 Global Step: 72710 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:11:53,921-Speed 5437.52 samples/sec Loss 7.3988 LearningRate 0.1402 Epoch: 7 Global Step: 72720 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:01,342-Speed 5520.17 samples/sec Loss 7.3422 LearningRate 0.1401 Epoch: 7 Global Step: 72730 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:08,793-Speed 5497.82 samples/sec Loss 7.3905 LearningRate 0.1401 Epoch: 7 Global Step: 72740 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:16,309-Speed 5450.17 samples/sec Loss 7.3322 LearningRate 0.1401 Epoch: 7 Global Step: 72750 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:23,737-Speed 5515.44 samples/sec Loss 7.3718 LearningRate 0.1401 Epoch: 7 Global Step: 72760 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:31,193-Speed 5494.29 samples/sec Loss 7.3610 LearningRate 0.1401 Epoch: 7 Global Step: 72770 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:38,785-Speed 5395.62 samples/sec Loss 7.3989 LearningRate 0.1400 Epoch: 7 Global Step: 72780 Fp16 Grad 
Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:46,243-Speed 5492.87 samples/sec Loss 7.3857 LearningRate 0.1400 Epoch: 7 Global Step: 72790 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:12:53,735-Speed 5468.06 samples/sec Loss 7.2923 LearningRate 0.1400 Epoch: 7 Global Step: 72800 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-01-08 11:13:01,288-Speed 5423.97 samples/sec Loss 7.3039 LearningRate 0.1400 Epoch: 7 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:13:08,752-Speed 5488.29 samples/sec Loss 7.3165 LearningRate 0.1399 Epoch: 7 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:13:16,209-Speed 5493.60 samples/sec Loss 7.3271 LearningRate 0.1399 Epoch: 7 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:13:23,680-Speed 5483.04 samples/sec Loss 7.4296 LearningRate 0.1399 Epoch: 7 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:13:31,234-Speed 5423.00 samples/sec Loss 7.3405 LearningRate 0.1399 Epoch: 7 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:13:38,770-Speed 5436.28 samples/sec Loss 7.3098 LearningRate 0.1399 Epoch: 7 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:13:46,506-Speed 5294.93 samples/sec Loss 7.3240 LearningRate 0.1398 Epoch: 7 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:13:54,312-Speed 5248.02 samples/sec Loss 7.3266 LearningRate 0.1398 Epoch: 7 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:14:02,156-Speed 5222.68 samples/sec Loss 7.3851 LearningRate 0.1398 Epoch: 7 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:14:09,927-Speed 5271.68 samples/sec Loss 7.2824 LearningRate 0.1398 Epoch: 7 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 31 hours Training: 
2022-01-08 11:14:17,649-Speed 5304.46 samples/sec Loss 7.3049 LearningRate 0.1398 Epoch: 7 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:14:25,339-Speed 5326.99 samples/sec Loss 7.3128 LearningRate 0.1397 Epoch: 7 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:14:33,157-Speed 5240.40 samples/sec Loss 7.3335 LearningRate 0.1397 Epoch: 7 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:14:40,964-Speed 5247.06 samples/sec Loss 7.4153 LearningRate 0.1397 Epoch: 7 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:14:48,859-Speed 5188.87 samples/sec Loss 7.3566 LearningRate 0.1397 Epoch: 7 Global Step: 72950 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:14:56,740-Speed 5197.91 samples/sec Loss 7.4146 LearningRate 0.1397 Epoch: 7 Global Step: 72960 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:04,388-Speed 5356.52 samples/sec Loss 7.3226 LearningRate 0.1396 Epoch: 7 Global Step: 72970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:11,912-Speed 5444.15 samples/sec Loss 7.3995 LearningRate 0.1396 Epoch: 7 Global Step: 72980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:19,473-Speed 5417.86 samples/sec Loss 7.3450 LearningRate 0.1396 Epoch: 7 Global Step: 72990 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:27,215-Speed 5291.09 samples/sec Loss 7.3084 LearningRate 0.1396 Epoch: 7 Global Step: 73000 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:34,773-Speed 5420.78 samples/sec Loss 7.3541 LearningRate 0.1396 Epoch: 7 Global Step: 73010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:42,384-Speed 5381.97 samples/sec Loss 7.3621 LearningRate 0.1395 Epoch: 7 Global Step: 73020 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:49,871-Speed 5471.50 
samples/sec Loss 7.3385 LearningRate 0.1395 Epoch: 7 Global Step: 73030 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:15:57,344-Speed 5481.87 samples/sec Loss 7.4053 LearningRate 0.1395 Epoch: 7 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:04,859-Speed 5451.36 samples/sec Loss 7.3751 LearningRate 0.1395 Epoch: 7 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:12,342-Speed 5474.32 samples/sec Loss 7.3477 LearningRate 0.1395 Epoch: 7 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:19,949-Speed 5385.56 samples/sec Loss 7.3242 LearningRate 0.1394 Epoch: 7 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:27,387-Speed 5507.56 samples/sec Loss 7.3456 LearningRate 0.1394 Epoch: 7 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:34,867-Speed 5476.57 samples/sec Loss 7.3780 LearningRate 0.1394 Epoch: 7 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:42,361-Speed 5466.74 samples/sec Loss 7.2944 LearningRate 0.1394 Epoch: 7 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:49,909-Speed 5426.86 samples/sec Loss 7.3415 LearningRate 0.1393 Epoch: 7 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:16:57,453-Speed 5430.62 samples/sec Loss 7.3750 LearningRate 0.1393 Epoch: 7 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:17:05,031-Speed 5405.72 samples/sec Loss 7.3238 LearningRate 0.1393 Epoch: 7 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:17:12,485-Speed 5496.03 samples/sec Loss 7.3030 LearningRate 0.1393 Epoch: 7 Global Step: 73140 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:17:20,134-Speed 5355.58 samples/sec Loss 7.3428 LearningRate 0.1393 Epoch: 7 
Global Step: 73150 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:17:27,697-Speed 5416.74 samples/sec Loss 7.3082 LearningRate 0.1392 Epoch: 7 Global Step: 73160 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:17:35,270-Speed 5409.26 samples/sec Loss 7.2856 LearningRate 0.1392 Epoch: 7 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:17:43,119-Speed 5218.97 samples/sec Loss 7.3544 LearningRate 0.1392 Epoch: 7 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:17:50,785-Speed 5343.76 samples/sec Loss 7.3999 LearningRate 0.1392 Epoch: 7 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:17:58,330-Speed 5429.85 samples/sec Loss 7.3746 LearningRate 0.1392 Epoch: 7 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:18:05,943-Speed 5380.67 samples/sec Loss 7.3891 LearningRate 0.1391 Epoch: 7 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:18:13,378-Speed 5509.82 samples/sec Loss 7.3750 LearningRate 0.1391 Epoch: 7 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:18:20,924-Speed 5428.89 samples/sec Loss 7.3895 LearningRate 0.1391 Epoch: 7 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:18:28,430-Speed 5457.48 samples/sec Loss 7.4100 LearningRate 0.1391 Epoch: 7 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:18:36,107-Speed 5336.08 samples/sec Loss 7.3321 LearningRate 0.1391 Epoch: 7 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:18:43,614-Speed 5457.12 samples/sec Loss 7.2886 LearningRate 0.1390 Epoch: 7 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:18:51,079-Speed 5487.78 samples/sec Loss 7.2673 LearningRate 0.1390 Epoch: 7 Global Step: 73270 Fp16 Grad Scale: 65536 
Required: 31 hours Training: 2022-01-08 11:18:58,664-Speed 5400.06 samples/sec Loss 7.3262 LearningRate 0.1390 Epoch: 7 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:06,204-Speed 5433.51 samples/sec Loss 7.3968 LearningRate 0.1390 Epoch: 7 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:13,734-Speed 5440.29 samples/sec Loss 7.3426 LearningRate 0.1390 Epoch: 7 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:21,295-Speed 5417.76 samples/sec Loss 7.2947 LearningRate 0.1389 Epoch: 7 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:28,839-Speed 5430.23 samples/sec Loss 7.3300 LearningRate 0.1389 Epoch: 7 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:36,481-Speed 5360.37 samples/sec Loss 7.4398 LearningRate 0.1389 Epoch: 7 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:44,026-Speed 5429.55 samples/sec Loss 7.3370 LearningRate 0.1389 Epoch: 7 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:19:51,617-Speed 5396.67 samples/sec Loss 7.4187 LearningRate 0.1388 Epoch: 7 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:19:59,165-Speed 5427.44 samples/sec Loss 7.3668 LearningRate 0.1388 Epoch: 7 Global Step: 73360 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:20:06,578-Speed 5525.78 samples/sec Loss 7.2713 LearningRate 0.1388 Epoch: 7 Global Step: 73370 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:20:14,073-Speed 5465.72 samples/sec Loss 7.2707 LearningRate 0.1388 Epoch: 7 Global Step: 73380 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:20:21,566-Speed 5467.65 samples/sec Loss 7.3502 LearningRate 0.1388 Epoch: 7 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 
11:20:29,162-Speed 5392.49 samples/sec Loss 7.3104 LearningRate 0.1387 Epoch: 7 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:20:36,679-Speed 5449.82 samples/sec Loss 7.3261 LearningRate 0.1387 Epoch: 7 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:20:44,171-Speed 5467.98 samples/sec Loss 7.2624 LearningRate 0.1387 Epoch: 7 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:20:51,663-Speed 5468.34 samples/sec Loss 7.4119 LearningRate 0.1387 Epoch: 7 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:20:59,198-Speed 5436.52 samples/sec Loss 7.3823 LearningRate 0.1387 Epoch: 7 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:21:06,786-Speed 5399.12 samples/sec Loss 7.3459 LearningRate 0.1386 Epoch: 7 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:21:14,360-Speed 5408.46 samples/sec Loss 7.3949 LearningRate 0.1386 Epoch: 7 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:21:21,862-Speed 5460.24 samples/sec Loss 7.3264 LearningRate 0.1386 Epoch: 7 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:21:29,439-Speed 5407.33 samples/sec Loss 7.3855 LearningRate 0.1386 Epoch: 7 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:21:37,082-Speed 5359.33 samples/sec Loss 7.3033 LearningRate 0.1386 Epoch: 7 Global Step: 73490 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:21:44,649-Speed 5413.91 samples/sec Loss 7.2858 LearningRate 0.1385 Epoch: 7 Global Step: 73500 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:21:52,306-Speed 5350.21 samples/sec Loss 7.4086 LearningRate 0.1385 Epoch: 7 Global Step: 73510 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:21:59,976-Speed 5340.74 samples/sec Loss 7.3528 
LearningRate 0.1385 Epoch: 7 Global Step: 73520 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:07,568-Speed 5396.17 samples/sec Loss 7.3109 LearningRate 0.1385 Epoch: 7 Global Step: 73530 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:15,162-Speed 5394.33 samples/sec Loss 7.3337 LearningRate 0.1385 Epoch: 7 Global Step: 73540 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:22,818-Speed 5350.44 samples/sec Loss 7.3592 LearningRate 0.1384 Epoch: 7 Global Step: 73550 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:30,364-Speed 5428.76 samples/sec Loss 7.3481 LearningRate 0.1384 Epoch: 7 Global Step: 73560 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:37,902-Speed 5434.86 samples/sec Loss 7.3372 LearningRate 0.1384 Epoch: 7 Global Step: 73570 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:45,488-Speed 5400.14 samples/sec Loss 7.3858 LearningRate 0.1384 Epoch: 7 Global Step: 73580 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:22:53,058-Speed 5410.72 samples/sec Loss 7.3065 LearningRate 0.1384 Epoch: 7 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:00,736-Speed 5336.04 samples/sec Loss 7.3381 LearningRate 0.1383 Epoch: 7 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:08,277-Speed 5431.92 samples/sec Loss 7.3055 LearningRate 0.1383 Epoch: 7 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:15,812-Speed 5437.03 samples/sec Loss 7.2350 LearningRate 0.1383 Epoch: 7 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:23,313-Speed 5461.26 samples/sec Loss 7.3341 LearningRate 0.1383 Epoch: 7 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:30,914-Speed 5389.50 samples/sec Loss 7.2410 LearningRate 0.1382 Epoch: 7 Global Step: 73640 Fp16 
Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:38,511-Speed 5392.25 samples/sec Loss 7.3459 LearningRate 0.1382 Epoch: 7 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:46,101-Speed 5397.11 samples/sec Loss 7.2982 LearningRate 0.1382 Epoch: 7 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:23:53,610-Speed 5455.58 samples/sec Loss 7.2504 LearningRate 0.1382 Epoch: 7 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:24:01,126-Speed 5449.82 samples/sec Loss 7.3471 LearningRate 0.1382 Epoch: 7 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:24:08,697-Speed 5410.88 samples/sec Loss 7.2519 LearningRate 0.1381 Epoch: 7 Global Step: 73690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:24:16,164-Speed 5486.64 samples/sec Loss 7.3081 LearningRate 0.1381 Epoch: 7 Global Step: 73700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:24:23,700-Speed 5435.99 samples/sec Loss 7.3066 LearningRate 0.1381 Epoch: 7 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:24:31,287-Speed 5398.69 samples/sec Loss 7.2883 LearningRate 0.1381 Epoch: 7 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:24:38,929-Speed 5360.81 samples/sec Loss 7.3668 LearningRate 0.1381 Epoch: 7 Global Step: 73730 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:24:46,408-Speed 5477.62 samples/sec Loss 7.3485 LearningRate 0.1380 Epoch: 7 Global Step: 73740 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:24:53,881-Speed 5482.03 samples/sec Loss 7.3353 LearningRate 0.1380 Epoch: 7 Global Step: 73750 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:25:01,357-Speed 5478.98 samples/sec Loss 7.3647 LearningRate 0.1380 Epoch: 7 Global Step: 73760 Fp16 Grad Scale: 65536 Required: 31 hours Training: 
2022-01-08 11:25:09,002-Speed 5359.00 samples/sec Loss 7.3358 LearningRate 0.1380 Epoch: 7 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:25:16,433-Speed 5512.70 samples/sec Loss 7.2709 LearningRate 0.1380 Epoch: 7 Global Step: 73780 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:25:23,911-Speed 5478.19 samples/sec Loss 7.3466 LearningRate 0.1379 Epoch: 7 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:25:31,346-Speed 5510.13 samples/sec Loss 7.3815 LearningRate 0.1379 Epoch: 7 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:25:38,854-Speed 5456.05 samples/sec Loss 7.3244 LearningRate 0.1379 Epoch: 7 Global Step: 73810 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:25:46,348-Speed 5466.56 samples/sec Loss 7.3181 LearningRate 0.1379 Epoch: 7 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:25:53,950-Speed 5388.63 samples/sec Loss 7.3172 LearningRate 0.1379 Epoch: 7 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:01,501-Speed 5425.70 samples/sec Loss 7.3322 LearningRate 0.1378 Epoch: 7 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:08,974-Speed 5481.36 samples/sec Loss 7.1700 LearningRate 0.1378 Epoch: 7 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:16,470-Speed 5465.03 samples/sec Loss 7.2920 LearningRate 0.1378 Epoch: 7 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:23,995-Speed 5444.46 samples/sec Loss 7.2975 LearningRate 0.1378 Epoch: 7 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:31,583-Speed 5398.01 samples/sec Loss 7.3256 LearningRate 0.1378 Epoch: 7 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:39,140-Speed 5420.69 samples/sec Loss 
7.2928 LearningRate 0.1377 Epoch: 7 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:46,598-Speed 5492.94 samples/sec Loss 7.2635 LearningRate 0.1377 Epoch: 7 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:26:54,113-Speed 5451.78 samples/sec Loss 7.3294 LearningRate 0.1377 Epoch: 7 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:27:01,618-Speed 5457.67 samples/sec Loss 7.3042 LearningRate 0.1377 Epoch: 7 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:09,093-Speed 5480.00 samples/sec Loss 7.3062 LearningRate 0.1377 Epoch: 7 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:16,649-Speed 5422.41 samples/sec Loss 7.3311 LearningRate 0.1376 Epoch: 7 Global Step: 73940 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:24,166-Speed 5449.35 samples/sec Loss 7.3221 LearningRate 0.1376 Epoch: 7 Global Step: 73950 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:31,749-Speed 5402.43 samples/sec Loss 7.2737 LearningRate 0.1376 Epoch: 7 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:39,309-Speed 5418.31 samples/sec Loss 7.3335 LearningRate 0.1376 Epoch: 7 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:46,805-Speed 5464.76 samples/sec Loss 7.2530 LearningRate 0.1375 Epoch: 7 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:27:54,340-Speed 5437.27 samples/sec Loss 7.2838 LearningRate 0.1375 Epoch: 7 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:28:01,898-Speed 5419.65 samples/sec Loss 7.2819 LearningRate 0.1375 Epoch: 7 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:28:46,382-[lfw][74000]XNorm: 23.041898 Training: 2022-01-08 
11:28:46,383-[lfw][74000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-01-08 11:28:46,383-[lfw][74000]Accuracy-Highest: 0.99817 Training: 2022-01-08 11:29:38,159-[cfp_fp][74000]XNorm: 20.887236 Training: 2022-01-08 11:29:38,160-[cfp_fp][74000]Accuracy-Flip: 0.98371+-0.00539 Training: 2022-01-08 11:29:38,160-[cfp_fp][74000]Accuracy-Highest: 0.98771 Training: 2022-01-08 11:30:24,007-[agedb_30][74000]XNorm: 22.723156 Training: 2022-01-08 11:30:24,008-[agedb_30][74000]Accuracy-Flip: 0.97283+-0.01019 Training: 2022-01-08 11:30:24,008-[agedb_30][74000]Accuracy-Highest: 0.97667 Training: 2022-01-08 11:30:31,601-Speed 273.61 samples/sec Loss 7.3666 LearningRate 0.1375 Epoch: 7 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:30:39,034-Speed 5512.69 samples/sec Loss 7.2581 LearningRate 0.1375 Epoch: 7 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:30:46,540-Speed 5457.83 samples/sec Loss 7.3262 LearningRate 0.1374 Epoch: 7 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:30:54,087-Speed 5428.79 samples/sec Loss 7.2425 LearningRate 0.1374 Epoch: 7 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:31:01,702-Speed 5379.39 samples/sec Loss 7.3546 LearningRate 0.1374 Epoch: 7 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:31:09,197-Speed 5466.81 samples/sec Loss 7.3082 LearningRate 0.1374 Epoch: 7 Global Step: 74060 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:31:16,857-Speed 5347.78 samples/sec Loss 7.2786 LearningRate 0.1374 Epoch: 7 Global Step: 74070 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:31:24,441-Speed 5401.30 samples/sec Loss 7.2803 LearningRate 0.1373 Epoch: 7 Global Step: 74080 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:31:31,900-Speed 5492.20 samples/sec Loss 7.3386 LearningRate 0.1373 Epoch: 7 Global Step: 74090 Fp16 
Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:31:39,424-Speed 5444.83 samples/sec Loss 7.2248 LearningRate 0.1373 Epoch: 7 Global Step: 74100 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:31:47,019-Speed 5393.70 samples/sec Loss 7.3724 LearningRate 0.1373 Epoch: 7 Global Step: 74110 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:31:54,478-Speed 5491.37 samples/sec Loss 7.2424 LearningRate 0.1373 Epoch: 7 Global Step: 74120 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:32:02,076-Speed 5391.63 samples/sec Loss 7.3084 LearningRate 0.1372 Epoch: 7 Global Step: 74130 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:32:09,538-Speed 5490.47 samples/sec Loss 7.2460 LearningRate 0.1372 Epoch: 7 Global Step: 74140 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:32:16,977-Speed 5506.43 samples/sec Loss 7.3549 LearningRate 0.1372 Epoch: 7 Global Step: 74150 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-01-08 11:32:24,459-Speed 5475.18 samples/sec Loss 7.2781 LearningRate 0.1372 Epoch: 7 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:32:32,005-Speed 5428.48 samples/sec Loss 7.3207 LearningRate 0.1372 Epoch: 7 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:32:39,466-Speed 5491.26 samples/sec Loss 7.2321 LearningRate 0.1371 Epoch: 7 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:32:46,995-Speed 5440.46 samples/sec Loss 7.2846 LearningRate 0.1371 Epoch: 7 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:32:54,452-Speed 5493.42 samples/sec Loss 7.2611 LearningRate 0.1371 Epoch: 7 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:33:01,886-Speed 5510.74 samples/sec Loss 7.3099 LearningRate 0.1371 Epoch: 7 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 31 hours Training: 
2022-01-08 11:33:09,393-Speed 5457.31 samples/sec Loss 7.2675 LearningRate 0.1371 Epoch: 7 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:33:16,971-Speed 5405.81 samples/sec Loss 7.2332 LearningRate 0.1370 Epoch: 7 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:33:24,415-Speed 5502.42 samples/sec Loss 7.2851 LearningRate 0.1370 Epoch: 7 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:33:31,878-Speed 5489.39 samples/sec Loss 7.2783 LearningRate 0.1370 Epoch: 7 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:33:39,352-Speed 5481.35 samples/sec Loss 7.3512 LearningRate 0.1370 Epoch: 7 Global Step: 74260 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:33:47,025-Speed 5338.92 samples/sec Loss 7.2827 LearningRate 0.1369 Epoch: 7 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:33:54,496-Speed 5482.76 samples/sec Loss 7.2579 LearningRate 0.1369 Epoch: 7 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:34:01,905-Speed 5528.96 samples/sec Loss 7.2702 LearningRate 0.1369 Epoch: 7 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:09,399-Speed 5466.37 samples/sec Loss 7.3246 LearningRate 0.1369 Epoch: 7 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:16,913-Speed 5452.40 samples/sec Loss 7.3266 LearningRate 0.1369 Epoch: 7 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:24,447-Speed 5436.97 samples/sec Loss 7.2812 LearningRate 0.1368 Epoch: 7 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:32,054-Speed 5385.43 samples/sec Loss 7.2799 LearningRate 0.1368 Epoch: 7 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:39,557-Speed 5460.32 samples/sec 
Loss 7.2674 LearningRate 0.1368 Epoch: 7 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:46,995-Speed 5507.68 samples/sec Loss 7.2769 LearningRate 0.1368 Epoch: 7 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:34:54,626-Speed 5367.93 samples/sec Loss 7.3163 LearningRate 0.1368 Epoch: 7 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:35:02,256-Speed 5368.71 samples/sec Loss 7.3279 LearningRate 0.1367 Epoch: 7 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:35:09,853-Speed 5393.04 samples/sec Loss 7.2389 LearningRate 0.1367 Epoch: 7 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-01-08 11:35:17,381-Speed 5441.29 samples/sec Loss 7.2787 LearningRate 0.1367 Epoch: 7 Global Step: 74390 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:35:24,949-Speed 5413.22 samples/sec Loss 7.2613 LearningRate 0.1367 Epoch: 7 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:35:32,384-Speed 5509.66 samples/sec Loss 7.3326 LearningRate 0.1367 Epoch: 7 Global Step: 74410 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:35:39,883-Speed 5462.99 samples/sec Loss 7.3081 LearningRate 0.1366 Epoch: 7 Global Step: 74420 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:35:47,349-Speed 5486.37 samples/sec Loss 7.3071 LearningRate 0.1366 Epoch: 7 Global Step: 74430 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:35:54,925-Speed 5407.41 samples/sec Loss 7.3342 LearningRate 0.1366 Epoch: 7 Global Step: 74440 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-01-08 11:36:02,389-Speed 5489.14 samples/sec Loss 7.2979 LearningRate 0.1366 Epoch: 7 Global Step: 74450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:09,884-Speed 5465.34 samples/sec Loss 7.3092 LearningRate 0.1366 Epoch: 7 Global 
Step: 74460 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:17,482-Speed 5391.92 samples/sec Loss 7.3195 LearningRate 0.1365 Epoch: 7 Global Step: 74470 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:25,009-Speed 5442.40 samples/sec Loss 7.1674 LearningRate 0.1365 Epoch: 7 Global Step: 74480 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:32,496-Speed 5471.86 samples/sec Loss 7.2314 LearningRate 0.1365 Epoch: 7 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:40,069-Speed 5409.03 samples/sec Loss 7.2255 LearningRate 0.1365 Epoch: 7 Global Step: 74500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:47,625-Speed 5421.36 samples/sec Loss 7.2612 LearningRate 0.1365 Epoch: 7 Global Step: 74510 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:36:55,069-Speed 5503.26 samples/sec Loss 7.2322 LearningRate 0.1364 Epoch: 7 Global Step: 74520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:37:02,539-Speed 5484.36 samples/sec Loss 7.2428 LearningRate 0.1364 Epoch: 7 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:37:10,018-Speed 5477.00 samples/sec Loss 7.2596 LearningRate 0.1364 Epoch: 7 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:37:17,514-Speed 5465.22 samples/sec Loss 7.2537 LearningRate 0.1364 Epoch: 7 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:37:25,071-Speed 5420.77 samples/sec Loss 7.3093 LearningRate 0.1364 Epoch: 7 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:37:32,553-Speed 5475.78 samples/sec Loss 7.2859 LearningRate 0.1363 Epoch: 7 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:37:40,031-Speed 5478.12 samples/sec Loss 7.2872 LearningRate 0.1363 Epoch: 7 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 30 
hours Training: 2022-01-08 11:37:47,604-Speed 5409.10 samples/sec Loss 7.2555 LearningRate 0.1363 Epoch: 7 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:37:55,105-Speed 5461.88 samples/sec Loss 7.2856 LearningRate 0.1363 Epoch: 7 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:02,619-Speed 5451.27 samples/sec Loss 7.2977 LearningRate 0.1363 Epoch: 7 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:10,109-Speed 5469.36 samples/sec Loss 7.2820 LearningRate 0.1362 Epoch: 7 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:17,631-Speed 5446.52 samples/sec Loss 7.2198 LearningRate 0.1362 Epoch: 7 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:25,081-Speed 5498.85 samples/sec Loss 7.2578 LearningRate 0.1362 Epoch: 7 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:32,506-Speed 5516.84 samples/sec Loss 7.2566 LearningRate 0.1362 Epoch: 7 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:40,007-Speed 5461.05 samples/sec Loss 7.1925 LearningRate 0.1361 Epoch: 7 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:47,522-Speed 5451.56 samples/sec Loss 7.2688 LearningRate 0.1361 Epoch: 7 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:38:55,007-Speed 5473.08 samples/sec Loss 7.2208 LearningRate 0.1361 Epoch: 7 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:39:02,561-Speed 5423.00 samples/sec Loss 7.2931 LearningRate 0.1361 Epoch: 7 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:39:10,031-Speed 5483.99 samples/sec Loss 7.2717 LearningRate 0.1361 Epoch: 7 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:39:17,546-Speed 5451.37 
samples/sec Loss 7.2889 LearningRate 0.1360 Epoch: 7 Global Step: 74710 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:39:24,996-Speed 5498.70 samples/sec Loss 7.2833 LearningRate 0.1360 Epoch: 7 Global Step: 74720 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:39:32,437-Speed 5505.60 samples/sec Loss 7.2244 LearningRate 0.1360 Epoch: 7 Global Step: 74730 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:39:39,908-Speed 5483.08 samples/sec Loss 7.2970 LearningRate 0.1360 Epoch: 7 Global Step: 74740 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:39:47,468-Speed 5418.88 samples/sec Loss 7.1870 LearningRate 0.1360 Epoch: 7 Global Step: 74750 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:39:55,128-Speed 5348.45 samples/sec Loss 7.2078 LearningRate 0.1359 Epoch: 7 Global Step: 74760 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:40:02,552-Speed 5517.81 samples/sec Loss 7.2223 LearningRate 0.1359 Epoch: 7 Global Step: 74770 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:40:10,167-Speed 5379.49 samples/sec Loss 7.2332 LearningRate 0.1359 Epoch: 7 Global Step: 74780 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:40:17,689-Speed 5446.06 samples/sec Loss 7.1839 LearningRate 0.1359 Epoch: 7 Global Step: 74790 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:40:25,159-Speed 5484.43 samples/sec Loss 7.2209 LearningRate 0.1359 Epoch: 7 Global Step: 74800 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:40:32,636-Speed 5478.93 samples/sec Loss 7.2077 LearningRate 0.1358 Epoch: 7 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:40:40,070-Speed 5509.87 samples/sec Loss 7.2806 LearningRate 0.1358 Epoch: 7 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:40:47,652-Speed 5403.60 samples/sec Loss 7.2017 LearningRate 0.1358 Epoch: 7 
Global Step: 74830 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:40:55,152-Speed 5462.28 samples/sec Loss 7.2977 LearningRate 0.1358 Epoch: 7 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:02,580-Speed 5514.90 samples/sec Loss 7.2257 LearningRate 0.1358 Epoch: 7 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:10,092-Speed 5453.36 samples/sec Loss 7.2307 LearningRate 0.1357 Epoch: 7 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:17,580-Speed 5471.10 samples/sec Loss 7.2839 LearningRate 0.1357 Epoch: 7 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:24,988-Speed 5529.46 samples/sec Loss 7.2226 LearningRate 0.1357 Epoch: 7 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:32,453-Speed 5488.06 samples/sec Loss 7.2623 LearningRate 0.1357 Epoch: 7 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:39,946-Speed 5466.66 samples/sec Loss 7.2458 LearningRate 0.1357 Epoch: 7 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:41:47,436-Speed 5469.38 samples/sec Loss 7.2818 LearningRate 0.1356 Epoch: 7 Global Step: 74910 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:41:54,867-Speed 5513.28 samples/sec Loss 7.2271 LearningRate 0.1356 Epoch: 7 Global Step: 74920 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:42:02,363-Speed 5465.00 samples/sec Loss 7.2636 LearningRate 0.1356 Epoch: 7 Global Step: 74930 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:42:09,894-Speed 5439.55 samples/sec Loss 7.2812 LearningRate 0.1356 Epoch: 7 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:42:17,370-Speed 5479.70 samples/sec Loss 7.3073 LearningRate 0.1356 Epoch: 7 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 
30 hours Training: 2022-01-08 11:42:24,789-Speed 5522.14 samples/sec Loss 7.2017 LearningRate 0.1355 Epoch: 7 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:42:32,482-Speed 5324.63 samples/sec Loss 7.2081 LearningRate 0.1355 Epoch: 7 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:42:39,959-Speed 5478.74 samples/sec Loss 7.2351 LearningRate 0.1355 Epoch: 7 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:42:47,415-Speed 5494.66 samples/sec Loss 7.2368 LearningRate 0.1355 Epoch: 7 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:42:54,997-Speed 5403.60 samples/sec Loss 7.2188 LearningRate 0.1355 Epoch: 7 Global Step: 75000 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:43:02,499-Speed 5460.60 samples/sec Loss 7.2893 LearningRate 0.1354 Epoch: 7 Global Step: 75010 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:43:10,192-Speed 5324.74 samples/sec Loss 7.2450 LearningRate 0.1354 Epoch: 7 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:43:17,698-Speed 5457.58 samples/sec Loss 7.1920 LearningRate 0.1354 Epoch: 7 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:43:25,159-Speed 5490.99 samples/sec Loss 7.2481 LearningRate 0.1354 Epoch: 7 Global Step: 75040 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:43:32,679-Speed 5447.32 samples/sec Loss 7.2095 LearningRate 0.1353 Epoch: 7 Global Step: 75050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:43:40,129-Speed 5499.03 samples/sec Loss 7.2408 LearningRate 0.1353 Epoch: 7 Global Step: 75060 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:43:47,573-Speed 5503.32 samples/sec Loss 7.2067 LearningRate 0.1353 Epoch: 7 Global Step: 75070 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:43:55,040-Speed 
5486.10 samples/sec Loss 7.1844 LearningRate 0.1353 Epoch: 7 Global Step: 75080 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:44:02,520-Speed 5477.10 samples/sec Loss 7.2050 LearningRate 0.1353 Epoch: 7 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:10,011-Speed 5468.08 samples/sec Loss 7.2196 LearningRate 0.1352 Epoch: 7 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:17,619-Speed 5384.51 samples/sec Loss 7.2529 LearningRate 0.1352 Epoch: 7 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:25,055-Speed 5509.32 samples/sec Loss 7.2293 LearningRate 0.1352 Epoch: 7 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:32,537-Speed 5475.44 samples/sec Loss 7.3014 LearningRate 0.1352 Epoch: 7 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:40,072-Speed 5436.60 samples/sec Loss 7.2711 LearningRate 0.1352 Epoch: 7 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:47,510-Speed 5507.84 samples/sec Loss 7.2253 LearningRate 0.1351 Epoch: 7 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:44:54,900-Speed 5543.04 samples/sec Loss 7.1776 LearningRate 0.1351 Epoch: 7 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:45:02,381-Speed 5476.15 samples/sec Loss 7.2629 LearningRate 0.1351 Epoch: 7 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:45:09,912-Speed 5439.91 samples/sec Loss 7.1792 LearningRate 0.1351 Epoch: 7 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:45:17,501-Speed 5397.30 samples/sec Loss 7.2127 LearningRate 0.1351 Epoch: 7 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:45:24,984-Speed 5475.25 samples/sec Loss 7.2911 LearningRate 0.1350 
Epoch: 7 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:45:32,477-Speed 5467.30 samples/sec Loss 7.2318 LearningRate 0.1350 Epoch: 7 Global Step: 75210 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:45:40,039-Speed 5416.64 samples/sec Loss 7.2653 LearningRate 0.1350 Epoch: 7 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:45:47,469-Speed 5513.60 samples/sec Loss 7.1780 LearningRate 0.1350 Epoch: 7 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:45:54,942-Speed 5482.30 samples/sec Loss 7.2854 LearningRate 0.1350 Epoch: 7 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:02,407-Speed 5487.94 samples/sec Loss 7.2213 LearningRate 0.1349 Epoch: 7 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:09,929-Speed 5446.09 samples/sec Loss 7.2220 LearningRate 0.1349 Epoch: 7 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:17,425-Speed 5464.59 samples/sec Loss 7.2710 LearningRate 0.1349 Epoch: 7 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:24,861-Speed 5508.68 samples/sec Loss 7.2296 LearningRate 0.1349 Epoch: 7 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:32,285-Speed 5518.29 samples/sec Loss 7.2335 LearningRate 0.1349 Epoch: 7 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:39,738-Speed 5496.91 samples/sec Loss 7.1901 LearningRate 0.1348 Epoch: 7 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:47,297-Speed 5419.14 samples/sec Loss 7.1855 LearningRate 0.1348 Epoch: 7 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:46:54,747-Speed 5498.84 samples/sec Loss 7.2048 LearningRate 0.1348 Epoch: 7 Global Step: 75320 Fp16 Grad Scale: 65536 
Required: 30 hours Training: 2022-01-08 11:47:02,302-Speed 5422.66 samples/sec Loss 7.2707 LearningRate 0.1348 Epoch: 7 Global Step: 75330 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:47:09,839-Speed 5435.30 samples/sec Loss 7.2250 LearningRate 0.1348 Epoch: 7 Global Step: 75340 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:47:17,317-Speed 5477.38 samples/sec Loss 7.1963 LearningRate 0.1347 Epoch: 7 Global Step: 75350 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:47:24,767-Speed 5499.58 samples/sec Loss 7.2482 LearningRate 0.1347 Epoch: 7 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:47:32,264-Speed 5463.86 samples/sec Loss 7.1944 LearningRate 0.1347 Epoch: 7 Global Step: 75370 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:47:39,894-Speed 5369.33 samples/sec Loss 7.2216 LearningRate 0.1347 Epoch: 7 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:47:47,443-Speed 5426.55 samples/sec Loss 7.2841 LearningRate 0.1347 Epoch: 7 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:47:54,882-Speed 5506.49 samples/sec Loss 7.2369 LearningRate 0.1346 Epoch: 7 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:48:02,325-Speed 5504.35 samples/sec Loss 7.2126 LearningRate 0.1346 Epoch: 7 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:48:09,941-Speed 5378.66 samples/sec Loss 7.1815 LearningRate 0.1346 Epoch: 7 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:48:17,591-Speed 5355.26 samples/sec Loss 7.2742 LearningRate 0.1346 Epoch: 7 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:48:25,166-Speed 5407.41 samples/sec Loss 7.2570 LearningRate 0.1346 Epoch: 7 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 
11:48:32,709-Speed 5430.86 samples/sec Loss 7.2319 LearningRate 0.1345 Epoch: 7 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:48:40,373-Speed 5345.47 samples/sec Loss 7.1777 LearningRate 0.1345 Epoch: 7 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:48:47,871-Speed 5463.39 samples/sec Loss 7.2297 LearningRate 0.1345 Epoch: 7 Global Step: 75470 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:48:55,453-Speed 5402.95 samples/sec Loss 7.2317 LearningRate 0.1345 Epoch: 7 Global Step: 75480 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:02,886-Speed 5511.06 samples/sec Loss 7.2578 LearningRate 0.1345 Epoch: 7 Global Step: 75490 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:10,382-Speed 5465.47 samples/sec Loss 7.1194 LearningRate 0.1344 Epoch: 7 Global Step: 75500 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:17,863-Speed 5475.47 samples/sec Loss 7.2470 LearningRate 0.1344 Epoch: 7 Global Step: 75510 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:25,324-Speed 5490.21 samples/sec Loss 7.2002 LearningRate 0.1344 Epoch: 7 Global Step: 75520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:32,844-Speed 5447.63 samples/sec Loss 7.2368 LearningRate 0.1344 Epoch: 7 Global Step: 75530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:40,478-Speed 5366.75 samples/sec Loss 7.2945 LearningRate 0.1343 Epoch: 7 Global Step: 75540 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:48,049-Speed 5410.98 samples/sec Loss 7.3033 LearningRate 0.1343 Epoch: 7 Global Step: 75550 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:49:55,539-Speed 5468.75 samples/sec Loss 7.1816 LearningRate 0.1343 Epoch: 7 Global Step: 75560 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:50:03,016-Speed 5479.27 samples/sec Loss 
7.1776 LearningRate 0.1343 Epoch: 7 Global Step: 75570 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:50:10,597-Speed 5403.70 samples/sec Loss 7.2363 LearningRate 0.1343 Epoch: 7 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:50:18,054-Speed 5493.28 samples/sec Loss 7.2536 LearningRate 0.1342 Epoch: 7 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:50:25,498-Speed 5503.05 samples/sec Loss 7.2474 LearningRate 0.1342 Epoch: 7 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:50:32,958-Speed 5490.88 samples/sec Loss 7.1820 LearningRate 0.1342 Epoch: 7 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:50:40,429-Speed 5483.75 samples/sec Loss 7.1804 LearningRate 0.1342 Epoch: 7 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:50:47,896-Speed 5485.88 samples/sec Loss 7.2017 LearningRate 0.1342 Epoch: 7 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:50:55,382-Speed 5472.38 samples/sec Loss 7.1600 LearningRate 0.1341 Epoch: 7 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:02,955-Speed 5409.33 samples/sec Loss 7.1844 LearningRate 0.1341 Epoch: 7 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:10,465-Speed 5455.07 samples/sec Loss 7.2601 LearningRate 0.1341 Epoch: 7 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:17,928-Speed 5489.18 samples/sec Loss 7.2404 LearningRate 0.1341 Epoch: 7 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:25,322-Speed 5540.10 samples/sec Loss 7.2999 LearningRate 0.1341 Epoch: 7 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:32,753-Speed 5513.10 samples/sec Loss 7.1725 LearningRate 0.1340 Epoch: 7 Global Step: 75690 
Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:40,194-Speed 5505.17 samples/sec Loss 7.1468 LearningRate 0.1340 Epoch: 7 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:47,677-Speed 5474.51 samples/sec Loss 7.2149 LearningRate 0.1340 Epoch: 7 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:51:55,093-Speed 5523.55 samples/sec Loss 7.1829 LearningRate 0.1340 Epoch: 7 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:52:02,571-Speed 5477.86 samples/sec Loss 7.1982 LearningRate 0.1340 Epoch: 7 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:52:10,076-Speed 5458.85 samples/sec Loss 7.1932 LearningRate 0.1339 Epoch: 7 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:52:17,603-Speed 5442.54 samples/sec Loss 7.2428 LearningRate 0.1339 Epoch: 7 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:52:25,022-Speed 5521.94 samples/sec Loss 7.2288 LearningRate 0.1339 Epoch: 7 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:52:32,527-Speed 5457.59 samples/sec Loss 7.2045 LearningRate 0.1339 Epoch: 7 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:52:40,003-Speed 5479.70 samples/sec Loss 7.1802 LearningRate 0.1339 Epoch: 7 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:52:47,479-Speed 5479.65 samples/sec Loss 7.2436 LearningRate 0.1338 Epoch: 7 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:52:54,997-Speed 5449.17 samples/sec Loss 7.1145 LearningRate 0.1338 Epoch: 7 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:53:02,511-Speed 5451.89 samples/sec Loss 7.1872 LearningRate 0.1338 Epoch: 7 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 30 hours Training: 
2022-01-08 11:53:10,097-Speed 5400.43 samples/sec Loss 7.1165 LearningRate 0.1338 Epoch: 7 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:53:17,554-Speed 5493.55 samples/sec Loss 7.2490 LearningRate 0.1338 Epoch: 7 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:53:25,014-Speed 5490.96 samples/sec Loss 7.2536 LearningRate 0.1337 Epoch: 7 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:53:32,537-Speed 5444.84 samples/sec Loss 7.2176 LearningRate 0.1337 Epoch: 7 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:53:40,042-Speed 5458.99 samples/sec Loss 7.1163 LearningRate 0.1337 Epoch: 7 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:53:47,640-Speed 5391.32 samples/sec Loss 7.1285 LearningRate 0.1337 Epoch: 7 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:53:55,153-Speed 5453.03 samples/sec Loss 7.1984 LearningRate 0.1337 Epoch: 7 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:02,697-Speed 5429.33 samples/sec Loss 7.2644 LearningRate 0.1336 Epoch: 7 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:10,230-Speed 5438.15 samples/sec Loss 7.1686 LearningRate 0.1336 Epoch: 7 Global Step: 75900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:17,737-Speed 5457.04 samples/sec Loss 7.2091 LearningRate 0.1336 Epoch: 7 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:25,235-Speed 5463.46 samples/sec Loss 7.2030 LearningRate 0.1336 Epoch: 7 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:32,763-Speed 5442.32 samples/sec Loss 7.1538 LearningRate 0.1336 Epoch: 7 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:40,387-Speed 5373.19 samples/sec 
Loss 7.1385 LearningRate 0.1335 Epoch: 7 Global Step: 75940 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:54:47,912-Speed 5443.81 samples/sec Loss 7.1715 LearningRate 0.1335 Epoch: 7 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:54:55,479-Speed 5413.66 samples/sec Loss 7.3184 LearningRate 0.1335 Epoch: 7 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:55:03,101-Speed 5374.41 samples/sec Loss 7.2225 LearningRate 0.1335 Epoch: 7 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:55:10,644-Speed 5431.04 samples/sec Loss 7.2583 LearningRate 0.1335 Epoch: 7 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:55:18,298-Speed 5352.28 samples/sec Loss 7.2212 LearningRate 0.1334 Epoch: 7 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:55:25,745-Speed 5501.08 samples/sec Loss 7.1384 LearningRate 0.1334 Epoch: 7 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:56:10,041-[lfw][76000]XNorm: 22.878465 Training: 2022-01-08 11:56:10,042-[lfw][76000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-01-08 11:56:10,043-[lfw][76000]Accuracy-Highest: 0.99817 Training: 2022-01-08 11:57:01,801-[cfp_fp][76000]XNorm: 20.644460 Training: 2022-01-08 11:57:01,802-[cfp_fp][76000]Accuracy-Flip: 0.98629+-0.00487 Training: 2022-01-08 11:57:01,803-[cfp_fp][76000]Accuracy-Highest: 0.98771 Training: 2022-01-08 11:57:47,409-[agedb_30][76000]XNorm: 22.711006 Training: 2022-01-08 11:57:47,410-[agedb_30][76000]Accuracy-Flip: 0.97450+-0.00785 Training: 2022-01-08 11:57:47,411-[agedb_30][76000]Accuracy-Highest: 0.97667 Training: 2022-01-08 11:57:55,012-Speed 274.41 samples/sec Loss 7.1943 LearningRate 0.1334 Epoch: 7 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:58:02,469-Speed 5494.93 samples/sec Loss 7.2153 LearningRate 0.1334 Epoch: 7 
Global Step: 76020 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:58:10,035-Speed 5415.70 samples/sec Loss 7.1822 LearningRate 0.1334 Epoch: 7 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:58:17,409-Speed 5556.47 samples/sec Loss 7.1313 LearningRate 0.1333 Epoch: 7 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 11:58:24,894-Speed 5473.45 samples/sec Loss 7.1569 LearningRate 0.1333 Epoch: 7 Global Step: 76050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:58:32,446-Speed 5424.48 samples/sec Loss 7.1722 LearningRate 0.1333 Epoch: 7 Global Step: 76060 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:58:40,022-Speed 5407.43 samples/sec Loss 7.2019 LearningRate 0.1333 Epoch: 7 Global Step: 76070 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:58:47,491-Speed 5484.84 samples/sec Loss 7.1185 LearningRate 0.1333 Epoch: 7 Global Step: 76080 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:58:55,082-Speed 5398.10 samples/sec Loss 7.1450 LearningRate 0.1332 Epoch: 7 Global Step: 76090 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:59:02,686-Speed 5387.48 samples/sec Loss 7.1939 LearningRate 0.1332 Epoch: 7 Global Step: 76100 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:59:10,201-Speed 5451.46 samples/sec Loss 7.1721 LearningRate 0.1332 Epoch: 7 Global Step: 76110 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:59:17,745-Speed 5430.07 samples/sec Loss 7.1680 LearningRate 0.1332 Epoch: 7 Global Step: 76120 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:59:25,260-Speed 5450.38 samples/sec Loss 7.1719 LearningRate 0.1331 Epoch: 7 Global Step: 76130 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 11:59:32,688-Speed 5515.77 samples/sec Loss 7.1155 LearningRate 0.1331 Epoch: 7 Global Step: 76140 Fp16 Grad Scale: 32768 
Required: 30 hours Training: 2022-01-08 11:59:40,312-Speed 5373.33 samples/sec Loss 7.1865 LearningRate 0.1331 Epoch: 7 Global Step: 76150 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:59:47,819-Speed 5456.99 samples/sec Loss 7.2045 LearningRate 0.1331 Epoch: 7 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 11:59:55,334-Speed 5450.97 samples/sec Loss 7.1701 LearningRate 0.1331 Epoch: 7 Global Step: 76170 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:02,850-Speed 5450.40 samples/sec Loss 7.2086 LearningRate 0.1330 Epoch: 7 Global Step: 76180 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:10,316-Speed 5487.35 samples/sec Loss 7.1375 LearningRate 0.1330 Epoch: 7 Global Step: 76190 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:17,775-Speed 5491.48 samples/sec Loss 7.1649 LearningRate 0.1330 Epoch: 7 Global Step: 76200 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:25,295-Speed 5447.33 samples/sec Loss 7.1957 LearningRate 0.1330 Epoch: 7 Global Step: 76210 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:32,776-Speed 5475.35 samples/sec Loss 7.2206 LearningRate 0.1330 Epoch: 7 Global Step: 76220 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:40,240-Speed 5489.06 samples/sec Loss 7.1968 LearningRate 0.1329 Epoch: 7 Global Step: 76230 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:00:47,787-Speed 5428.25 samples/sec Loss 7.1278 LearningRate 0.1329 Epoch: 7 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:00:55,304-Speed 5449.42 samples/sec Loss 7.2051 LearningRate 0.1329 Epoch: 7 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:02,918-Speed 5380.06 samples/sec Loss 7.1614 LearningRate 0.1329 Epoch: 7 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 
12:01:10,366-Speed 5500.61 samples/sec Loss 7.1577 LearningRate 0.1329 Epoch: 7 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:17,842-Speed 5479.64 samples/sec Loss 7.2243 LearningRate 0.1328 Epoch: 7 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:25,333-Speed 5468.06 samples/sec Loss 7.1005 LearningRate 0.1328 Epoch: 7 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:32,845-Speed 5453.40 samples/sec Loss 7.1172 LearningRate 0.1328 Epoch: 7 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:40,320-Speed 5480.57 samples/sec Loss 7.1546 LearningRate 0.1328 Epoch: 7 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:47,760-Speed 5506.18 samples/sec Loss 7.1468 LearningRate 0.1328 Epoch: 7 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:01:55,269-Speed 5455.80 samples/sec Loss 7.1680 LearningRate 0.1327 Epoch: 7 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:02:02,863-Speed 5394.41 samples/sec Loss 7.2027 LearningRate 0.1327 Epoch: 7 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:10,443-Speed 5404.11 samples/sec Loss 7.1728 LearningRate 0.1327 Epoch: 7 Global Step: 76350 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:17,951-Speed 5456.65 samples/sec Loss 7.1763 LearningRate 0.1327 Epoch: 7 Global Step: 76360 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:25,418-Speed 5486.19 samples/sec Loss 7.1208 LearningRate 0.1327 Epoch: 7 Global Step: 76370 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:32,907-Speed 5470.08 samples/sec Loss 7.1383 LearningRate 0.1326 Epoch: 7 Global Step: 76380 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:40,442-Speed 5436.91 samples/sec Loss 7.1199 
LearningRate 0.1326 Epoch: 7 Global Step: 76390 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:47,902-Speed 5491.21 samples/sec Loss 7.1040 LearningRate 0.1326 Epoch: 7 Global Step: 76400 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:02:55,386-Speed 5473.68 samples/sec Loss 7.1032 LearningRate 0.1326 Epoch: 7 Global Step: 76410 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:03:02,876-Speed 5469.51 samples/sec Loss 7.2159 LearningRate 0.1326 Epoch: 7 Global Step: 76420 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:03:10,374-Speed 5463.55 samples/sec Loss 7.1639 LearningRate 0.1325 Epoch: 7 Global Step: 76430 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:03:17,879-Speed 5458.95 samples/sec Loss 7.1869 LearningRate 0.1325 Epoch: 7 Global Step: 76440 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:03:25,367-Speed 5470.82 samples/sec Loss 7.2045 LearningRate 0.1325 Epoch: 7 Global Step: 76450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:03:32,885-Speed 5448.43 samples/sec Loss 7.1806 LearningRate 0.1325 Epoch: 7 Global Step: 76460 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:03:40,329-Speed 5503.67 samples/sec Loss 7.1212 LearningRate 0.1325 Epoch: 7 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:03:47,832-Speed 5460.00 samples/sec Loss 7.1668 LearningRate 0.1324 Epoch: 7 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:03:55,259-Speed 5515.88 samples/sec Loss 7.1431 LearningRate 0.1324 Epoch: 7 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:02,829-Speed 5410.73 samples/sec Loss 7.1446 LearningRate 0.1324 Epoch: 7 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:10,443-Speed 5380.96 samples/sec Loss 7.1230 LearningRate 0.1324 Epoch: 7 Global Step: 76510 
Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:18,062-Speed 5376.73 samples/sec Loss 7.0999 LearningRate 0.1324 Epoch: 7 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:25,631-Speed 5412.42 samples/sec Loss 7.1704 LearningRate 0.1323 Epoch: 7 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:33,188-Speed 5420.50 samples/sec Loss 7.1676 LearningRate 0.1323 Epoch: 7 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:40,855-Speed 5343.17 samples/sec Loss 7.1335 LearningRate 0.1323 Epoch: 7 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:48,383-Speed 5441.28 samples/sec Loss 7.1738 LearningRate 0.1323 Epoch: 7 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:04:55,978-Speed 5394.22 samples/sec Loss 7.1308 LearningRate 0.1323 Epoch: 7 Global Step: 76570 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:05:03,566-Speed 5398.89 samples/sec Loss 7.1362 LearningRate 0.1322 Epoch: 7 Global Step: 76580 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:05:11,097-Speed 5439.54 samples/sec Loss 7.1110 LearningRate 0.1322 Epoch: 7 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:05:18,709-Speed 5381.06 samples/sec Loss 7.1963 LearningRate 0.1322 Epoch: 7 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:05:26,317-Speed 5384.74 samples/sec Loss 7.1065 LearningRate 0.1322 Epoch: 7 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:05:33,845-Speed 5442.05 samples/sec Loss 7.1126 LearningRate 0.1322 Epoch: 7 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:05:41,406-Speed 5418.10 samples/sec Loss 7.0862 LearningRate 0.1321 Epoch: 7 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 
2022-01-08 12:05:48,887-Speed 5475.81 samples/sec Loss 7.0957 LearningRate 0.1321 Epoch: 7 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:05:56,368-Speed 5475.88 samples/sec Loss 7.1005 LearningRate 0.1321 Epoch: 7 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:06:04,018-Speed 5354.98 samples/sec Loss 7.1102 LearningRate 0.1321 Epoch: 7 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:06:11,765-Speed 5288.21 samples/sec Loss 7.1126 LearningRate 0.1321 Epoch: 7 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:06:19,322-Speed 5421.23 samples/sec Loss 7.0827 LearningRate 0.1320 Epoch: 7 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:06:26,801-Speed 5476.91 samples/sec Loss 7.1762 LearningRate 0.1320 Epoch: 7 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:06:34,368-Speed 5413.75 samples/sec Loss 7.1132 LearningRate 0.1320 Epoch: 7 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:06:41,932-Speed 5416.24 samples/sec Loss 7.1405 LearningRate 0.1320 Epoch: 7 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:06:49,521-Speed 5397.72 samples/sec Loss 7.1456 LearningRate 0.1320 Epoch: 7 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:06:56,995-Speed 5480.92 samples/sec Loss 7.1310 LearningRate 0.1319 Epoch: 7 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:04,576-Speed 5403.52 samples/sec Loss 7.1261 LearningRate 0.1319 Epoch: 7 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:12,192-Speed 5379.54 samples/sec Loss 7.1420 LearningRate 0.1319 Epoch: 7 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:19,787-Speed 5393.49 samples/sec 
Loss 7.1669 LearningRate 0.1319 Epoch: 7 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:27,359-Speed 5409.89 samples/sec Loss 7.1353 LearningRate 0.1319 Epoch: 7 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:34,853-Speed 5466.04 samples/sec Loss 7.1317 LearningRate 0.1318 Epoch: 7 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:42,338-Speed 5473.34 samples/sec Loss 7.1246 LearningRate 0.1318 Epoch: 7 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:49,932-Speed 5394.59 samples/sec Loss 7.1215 LearningRate 0.1318 Epoch: 7 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:07:57,473-Speed 5431.82 samples/sec Loss 7.1631 LearningRate 0.1318 Epoch: 7 Global Step: 76810 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:08:05,078-Speed 5387.15 samples/sec Loss 7.0952 LearningRate 0.1318 Epoch: 7 Global Step: 76820 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:08:12,622-Speed 5430.37 samples/sec Loss 7.0788 LearningRate 0.1317 Epoch: 7 Global Step: 76830 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:08:20,404-Speed 5263.70 samples/sec Loss 7.1348 LearningRate 0.1317 Epoch: 7 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:08:27,916-Speed 5453.14 samples/sec Loss 7.1276 LearningRate 0.1317 Epoch: 7 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:08:35,489-Speed 5409.53 samples/sec Loss 7.1393 LearningRate 0.1317 Epoch: 7 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:08:42,900-Speed 5528.13 samples/sec Loss 7.0658 LearningRate 0.1317 Epoch: 7 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:08:50,390-Speed 5468.81 samples/sec Loss 7.1556 LearningRate 0.1316 Epoch: 7 Global 
Step: 76880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:08:57,924-Speed 5437.42 samples/sec Loss 7.1329 LearningRate 0.1316 Epoch: 7 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:05,440-Speed 5450.07 samples/sec Loss 7.1290 LearningRate 0.1316 Epoch: 7 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:13,034-Speed 5394.90 samples/sec Loss 7.1428 LearningRate 0.1316 Epoch: 7 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:20,605-Speed 5410.75 samples/sec Loss 7.1636 LearningRate 0.1316 Epoch: 7 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:28,059-Speed 5496.03 samples/sec Loss 7.1419 LearningRate 0.1315 Epoch: 7 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:35,591-Speed 5438.63 samples/sec Loss 7.0836 LearningRate 0.1315 Epoch: 7 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:43,083-Speed 5467.86 samples/sec Loss 7.1457 LearningRate 0.1315 Epoch: 7 Global Step: 76950 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:50,630-Speed 5428.41 samples/sec Loss 7.1089 LearningRate 0.1315 Epoch: 7 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:09:58,084-Speed 5495.31 samples/sec Loss 7.1767 LearningRate 0.1315 Epoch: 7 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:10:05,553-Speed 5485.41 samples/sec Loss 7.1552 LearningRate 0.1314 Epoch: 7 Global Step: 76980 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:10:12,995-Speed 5504.37 samples/sec Loss 7.1637 LearningRate 0.1314 Epoch: 7 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:10:20,522-Speed 5442.65 samples/sec Loss 7.1518 LearningRate 0.1314 Epoch: 7 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 30 hours 
Training: 2022-01-08 12:10:28,031-Speed 5455.61 samples/sec Loss 7.1687 LearningRate 0.1314 Epoch: 7 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:10:35,653-Speed 5374.27 samples/sec Loss 7.1945 LearningRate 0.1313 Epoch: 7 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:10:43,160-Speed 5457.22 samples/sec Loss 7.1601 LearningRate 0.1313 Epoch: 7 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:10:50,773-Speed 5380.73 samples/sec Loss 7.1152 LearningRate 0.1313 Epoch: 7 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:10:58,319-Speed 5429.38 samples/sec Loss 7.1731 LearningRate 0.1313 Epoch: 7 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:06,037-Speed 5307.00 samples/sec Loss 7.0719 LearningRate 0.1313 Epoch: 7 Global Step: 77060 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:13,535-Speed 5463.37 samples/sec Loss 7.0955 LearningRate 0.1312 Epoch: 7 Global Step: 77070 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:21,137-Speed 5389.05 samples/sec Loss 7.1295 LearningRate 0.1312 Epoch: 7 Global Step: 77080 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:28,708-Speed 5410.99 samples/sec Loss 7.1059 LearningRate 0.1312 Epoch: 7 Global Step: 77090 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:36,207-Speed 5462.49 samples/sec Loss 7.1557 LearningRate 0.1312 Epoch: 7 Global Step: 77100 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:43,784-Speed 5406.43 samples/sec Loss 7.1205 LearningRate 0.1312 Epoch: 7 Global Step: 77110 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:51,553-Speed 5273.16 samples/sec Loss 7.1314 LearningRate 0.1311 Epoch: 7 Global Step: 77120 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:11:59,122-Speed 
5412.74 samples/sec Loss 7.1348 LearningRate 0.1311 Epoch: 7 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:06,649-Speed 5442.38 samples/sec Loss 7.1436 LearningRate 0.1311 Epoch: 7 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:14,189-Speed 5432.30 samples/sec Loss 7.1537 LearningRate 0.1311 Epoch: 7 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:21,712-Speed 5445.87 samples/sec Loss 7.1071 LearningRate 0.1311 Epoch: 7 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:29,298-Speed 5400.06 samples/sec Loss 7.0893 LearningRate 0.1310 Epoch: 7 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:36,926-Speed 5370.58 samples/sec Loss 7.0373 LearningRate 0.1310 Epoch: 7 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:44,527-Speed 5388.84 samples/sec Loss 7.1434 LearningRate 0.1310 Epoch: 7 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:52,104-Speed 5406.61 samples/sec Loss 7.1075 LearningRate 0.1310 Epoch: 7 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:12:59,777-Speed 5339.19 samples/sec Loss 7.0472 LearningRate 0.1310 Epoch: 7 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:13:07,458-Speed 5333.02 samples/sec Loss 7.0909 LearningRate 0.1309 Epoch: 7 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:13:14,988-Speed 5440.16 samples/sec Loss 7.1804 LearningRate 0.1309 Epoch: 7 Global Step: 77230 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:13:22,505-Speed 5449.44 samples/sec Loss 7.1694 LearningRate 0.1309 Epoch: 7 Global Step: 77240 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:13:30,100-Speed 5393.92 samples/sec Loss 7.1039 LearningRate 0.1309 
Epoch: 7 Global Step: 77250 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:13:37,630-Speed 5440.79 samples/sec Loss 7.1835 LearningRate 0.1309 Epoch: 7 Global Step: 77260 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:13:45,085-Speed 5494.66 samples/sec Loss 7.1280 LearningRate 0.1308 Epoch: 7 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:13:52,552-Speed 5485.77 samples/sec Loss 7.1406 LearningRate 0.1308 Epoch: 7 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:14:00,078-Speed 5443.81 samples/sec Loss 7.1087 LearningRate 0.1308 Epoch: 7 Global Step: 77290 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:14:07,736-Speed 5349.06 samples/sec Loss 7.1462 LearningRate 0.1308 Epoch: 7 Global Step: 77300 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:14:15,323-Speed 5399.30 samples/sec Loss 7.0577 LearningRate 0.1308 Epoch: 7 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:14:22,916-Speed 5395.07 samples/sec Loss 7.1116 LearningRate 0.1307 Epoch: 7 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:14:30,412-Speed 5465.25 samples/sec Loss 7.0620 LearningRate 0.1307 Epoch: 7 Global Step: 77330 Fp16 Grad Scale: 262144 Required: 30 hours Training: 2022-01-08 12:14:37,813-Speed 5534.72 samples/sec Loss 7.1161 LearningRate 0.1307 Epoch: 7 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:14:45,231-Speed 5522.87 samples/sec Loss 7.0578 LearningRate 0.1307 Epoch: 7 Global Step: 77350 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:14:52,789-Speed 5420.04 samples/sec Loss 7.1450 LearningRate 0.1307 Epoch: 7 Global Step: 77360 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:00,241-Speed 5497.29 samples/sec Loss 7.0617 LearningRate 0.1306 Epoch: 7 Global Step: 77370 Fp16 Grad Scale: 
65536 Required: 30 hours Training: 2022-01-08 12:15:07,901-Speed 5347.79 samples/sec Loss 7.1026 LearningRate 0.1306 Epoch: 7 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:15,423-Speed 5446.28 samples/sec Loss 7.0895 LearningRate 0.1306 Epoch: 7 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:22,801-Speed 5552.05 samples/sec Loss 7.0986 LearningRate 0.1306 Epoch: 7 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:30,304-Speed 5460.22 samples/sec Loss 7.1054 LearningRate 0.1306 Epoch: 7 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:37,867-Speed 5416.69 samples/sec Loss 7.1150 LearningRate 0.1305 Epoch: 7 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:45,440-Speed 5408.87 samples/sec Loss 7.1455 LearningRate 0.1305 Epoch: 7 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:15:52,960-Speed 5447.23 samples/sec Loss 7.0736 LearningRate 0.1305 Epoch: 7 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:16:00,445-Speed 5473.60 samples/sec Loss 7.0961 LearningRate 0.1305 Epoch: 7 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:16:07,986-Speed 5431.76 samples/sec Loss 7.1079 LearningRate 0.1305 Epoch: 7 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:16:15,416-Speed 5513.99 samples/sec Loss 7.0823 LearningRate 0.1304 Epoch: 7 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:16:22,893-Speed 5478.94 samples/sec Loss 7.0670 LearningRate 0.1304 Epoch: 7 Global Step: 77480 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:16:30,452-Speed 5419.25 samples/sec Loss 7.1045 LearningRate 0.1304 Epoch: 7 Global Step: 77490 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 
12:16:38,060-Speed 5385.20 samples/sec Loss 7.0791 LearningRate 0.1304 Epoch: 7 Global Step: 77500 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:16:45,519-Speed 5491.54 samples/sec Loss 7.0901 LearningRate 0.1304 Epoch: 7 Global Step: 77510 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:16:53,039-Speed 5448.05 samples/sec Loss 7.1146 LearningRate 0.1303 Epoch: 7 Global Step: 77520 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:17:00,534-Speed 5465.05 samples/sec Loss 7.1434 LearningRate 0.1303 Epoch: 7 Global Step: 77530 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:17:08,039-Speed 5459.38 samples/sec Loss 7.1269 LearningRate 0.1303 Epoch: 7 Global Step: 77540 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:17:15,790-Speed 5284.45 samples/sec Loss 7.0972 LearningRate 0.1303 Epoch: 7 Global Step: 77550 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:17:23,334-Speed 5430.50 samples/sec Loss 7.1134 LearningRate 0.1303 Epoch: 7 Global Step: 77560 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:17:30,733-Speed 5536.40 samples/sec Loss 7.0952 LearningRate 0.1302 Epoch: 7 Global Step: 77570 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:17:38,168-Speed 5510.53 samples/sec Loss 7.1079 LearningRate 0.1302 Epoch: 7 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:17:45,631-Speed 5488.82 samples/sec Loss 7.1650 LearningRate 0.1302 Epoch: 7 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:17:53,174-Speed 5431.24 samples/sec Loss 7.0915 LearningRate 0.1302 Epoch: 7 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:18:00,596-Speed 5519.37 samples/sec Loss 7.1101 LearningRate 0.1302 Epoch: 7 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:18:08,018-Speed 5519.45 samples/sec Loss 7.0558 
LearningRate 0.1301 Epoch: 7 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:18:15,518-Speed 5461.92 samples/sec Loss 7.0771 LearningRate 0.1301 Epoch: 7 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:18:23,098-Speed 5404.36 samples/sec Loss 7.0661 LearningRate 0.1301 Epoch: 7 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:18:30,629-Speed 5439.92 samples/sec Loss 7.1085 LearningRate 0.1301 Epoch: 7 Global Step: 77650 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:18:38,075-Speed 5501.83 samples/sec Loss 7.1449 LearningRate 0.1301 Epoch: 7 Global Step: 77660 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:18:45,530-Speed 5495.16 samples/sec Loss 7.0756 LearningRate 0.1300 Epoch: 7 Global Step: 77670 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:18:52,979-Speed 5499.24 samples/sec Loss 7.0916 LearningRate 0.1300 Epoch: 7 Global Step: 77680 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:00,515-Speed 5435.86 samples/sec Loss 7.1067 LearningRate 0.1300 Epoch: 7 Global Step: 77690 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:08,142-Speed 5371.42 samples/sec Loss 7.1057 LearningRate 0.1300 Epoch: 7 Global Step: 77700 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:15,720-Speed 5405.72 samples/sec Loss 7.0776 LearningRate 0.1300 Epoch: 7 Global Step: 77710 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:23,145-Speed 5517.68 samples/sec Loss 7.1117 LearningRate 0.1299 Epoch: 7 Global Step: 77720 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:30,537-Speed 5541.57 samples/sec Loss 7.1827 LearningRate 0.1299 Epoch: 7 Global Step: 77730 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:37,934-Speed 5538.45 samples/sec Loss 7.1163 LearningRate 0.1299 Epoch: 7 Global Step: 77740 Fp16 
Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:19:45,363-Speed 5514.52 samples/sec Loss 7.1315 LearningRate 0.1299 Epoch: 7 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:19:53,012-Speed 5355.28 samples/sec Loss 7.1392 LearningRate 0.1299 Epoch: 7 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:20:00,438-Speed 5516.51 samples/sec Loss 7.1066 LearningRate 0.1298 Epoch: 7 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:20:07,872-Speed 5510.54 samples/sec Loss 7.1289 LearningRate 0.1298 Epoch: 7 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:20:15,326-Speed 5496.23 samples/sec Loss 7.0792 LearningRate 0.1298 Epoch: 7 Global Step: 77790 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:20:22,890-Speed 5415.72 samples/sec Loss 7.0746 LearningRate 0.1298 Epoch: 7 Global Step: 77800 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:20:30,514-Speed 5373.19 samples/sec Loss 7.0758 LearningRate 0.1298 Epoch: 7 Global Step: 77810 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:20:38,171-Speed 5350.34 samples/sec Loss 7.0654 LearningRate 0.1297 Epoch: 7 Global Step: 77820 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:20:45,656-Speed 5472.82 samples/sec Loss 7.1261 LearningRate 0.1297 Epoch: 7 Global Step: 77830 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:20:53,217-Speed 5418.52 samples/sec Loss 7.0875 LearningRate 0.1297 Epoch: 7 Global Step: 77840 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:21:00,670-Speed 5496.41 samples/sec Loss 7.0076 LearningRate 0.1297 Epoch: 7 Global Step: 77850 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:21:08,121-Speed 5497.56 samples/sec Loss 7.0022 LearningRate 0.1297 Epoch: 7 Global Step: 77860 Fp16 Grad Scale: 32768 Required: 30 hours Training: 
2022-01-08 12:21:15,674-Speed 5424.05 samples/sec Loss 7.0083 LearningRate 0.1296 Epoch: 7 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:21:23,110-Speed 5508.96 samples/sec Loss 7.0863 LearningRate 0.1296 Epoch: 7 Global Step: 77880 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:21:30,548-Speed 5507.70 samples/sec Loss 7.0879 LearningRate 0.1296 Epoch: 7 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:21:38,031-Speed 5474.32 samples/sec Loss 7.0670 LearningRate 0.1296 Epoch: 7 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:21:45,425-Speed 5540.90 samples/sec Loss 7.0685 LearningRate 0.1296 Epoch: 7 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:21:52,864-Speed 5506.95 samples/sec Loss 7.1304 LearningRate 0.1295 Epoch: 7 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:00,408-Speed 5429.71 samples/sec Loss 7.0231 LearningRate 0.1295 Epoch: 7 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:07,819-Speed 5528.13 samples/sec Loss 7.0791 LearningRate 0.1295 Epoch: 7 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:15,383-Speed 5415.59 samples/sec Loss 7.0748 LearningRate 0.1295 Epoch: 7 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:22,806-Speed 5518.71 samples/sec Loss 7.1175 LearningRate 0.1295 Epoch: 7 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:30,266-Speed 5490.92 samples/sec Loss 7.0759 LearningRate 0.1294 Epoch: 7 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:37,755-Speed 5470.20 samples/sec Loss 7.0404 LearningRate 0.1294 Epoch: 7 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:22:45,318-Speed 5416.92 samples/sec Loss 
7.1376 LearningRate 0.1294 Epoch: 7 Global Step: 77990 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:22:52,756-Speed 5507.21 samples/sec Loss 7.0497 LearningRate 0.1294 Epoch: 7 Global Step: 78000 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:23:37,084-[lfw][78000]XNorm: 22.960847 Training: 2022-01-08 12:23:37,085-[lfw][78000]Accuracy-Flip: 0.99783+-0.00299 Training: 2022-01-08 12:23:37,085-[lfw][78000]Accuracy-Highest: 0.99817 Training: 2022-01-08 12:24:29,762-[cfp_fp][78000]XNorm: 20.959540 Training: 2022-01-08 12:24:29,763-[cfp_fp][78000]Accuracy-Flip: 0.98814+-0.00389 Training: 2022-01-08 12:24:29,764-[cfp_fp][78000]Accuracy-Highest: 0.98814 Training: 2022-01-08 12:25:18,486-[agedb_30][78000]XNorm: 23.009706 Training: 2022-01-08 12:25:18,487-[agedb_30][78000]Accuracy-Flip: 0.97533+-0.00781 Training: 2022-01-08 12:25:18,487-[agedb_30][78000]Accuracy-Highest: 0.97667 Training: 2022-01-08 12:25:26,120-Speed 267.08 samples/sec Loss 7.0236 LearningRate 0.1294 Epoch: 7 Global Step: 78010 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:25:33,653-Speed 5439.14 samples/sec Loss 7.0691 LearningRate 0.1293 Epoch: 7 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:25:41,173-Speed 5447.72 samples/sec Loss 7.0310 LearningRate 0.1293 Epoch: 7 Global Step: 78030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:25:48,798-Speed 5373.21 samples/sec Loss 7.0910 LearningRate 0.1293 Epoch: 7 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:25:56,316-Speed 5449.15 samples/sec Loss 7.0309 LearningRate 0.1293 Epoch: 7 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:26:03,767-Speed 5498.13 samples/sec Loss 7.0905 LearningRate 0.1293 Epoch: 7 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:26:11,307-Speed 5432.80 samples/sec Loss 7.0912 LearningRate 0.1292 Epoch: 7 
Global Step: 78070 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:26:18,824-Speed 5450.29 samples/sec Loss 7.0669 LearningRate 0.1292 Epoch: 7 Global Step: 78080 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:26:26,359-Speed 5436.07 samples/sec Loss 7.0665 LearningRate 0.1292 Epoch: 7 Global Step: 78090 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:26:33,856-Speed 5464.61 samples/sec Loss 7.0131 LearningRate 0.1292 Epoch: 7 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:26:41,337-Speed 5476.35 samples/sec Loss 7.0728 LearningRate 0.1292 Epoch: 7 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:26:48,890-Speed 5423.61 samples/sec Loss 7.0352 LearningRate 0.1291 Epoch: 7 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:26:56,411-Speed 5446.20 samples/sec Loss 7.0956 LearningRate 0.1291 Epoch: 7 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:27:03,919-Speed 5456.32 samples/sec Loss 7.0941 LearningRate 0.1291 Epoch: 7 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:27:11,379-Speed 5491.66 samples/sec Loss 7.0755 LearningRate 0.1291 Epoch: 7 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:27:18,832-Speed 5496.61 samples/sec Loss 7.0548 LearningRate 0.1291 Epoch: 7 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:27:26,362-Speed 5439.75 samples/sec Loss 7.0716 LearningRate 0.1290 Epoch: 7 Global Step: 78170 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:27:33,827-Speed 5487.87 samples/sec Loss 7.0659 LearningRate 0.1290 Epoch: 7 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:27:41,340-Speed 5452.73 samples/sec Loss 7.0689 LearningRate 0.1290 Epoch: 7 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 30 
hours Training: 2022-01-08 12:27:48,822-Speed 5475.49 samples/sec Loss 7.0883 LearningRate 0.1290 Epoch: 7 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:27:56,356-Speed 5436.64 samples/sec Loss 7.0799 LearningRate 0.1290 Epoch: 7 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:28:03,863-Speed 5456.96 samples/sec Loss 6.9707 LearningRate 0.1289 Epoch: 7 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:28:11,373-Speed 5455.53 samples/sec Loss 7.0636 LearningRate 0.1289 Epoch: 7 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:28:18,851-Speed 5477.65 samples/sec Loss 7.0557 LearningRate 0.1289 Epoch: 7 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:28:26,301-Speed 5499.00 samples/sec Loss 7.0125 LearningRate 0.1289 Epoch: 7 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:28:33,747-Speed 5501.04 samples/sec Loss 7.0941 LearningRate 0.1289 Epoch: 7 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:28:41,295-Speed 5428.10 samples/sec Loss 7.1138 LearningRate 0.1288 Epoch: 7 Global Step: 78270 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:28:48,776-Speed 5475.71 samples/sec Loss 7.0921 LearningRate 0.1288 Epoch: 7 Global Step: 78280 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:28:56,210-Speed 5510.54 samples/sec Loss 7.0063 LearningRate 0.1288 Epoch: 7 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:03,694-Speed 5474.09 samples/sec Loss 7.0587 LearningRate 0.1288 Epoch: 7 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:11,253-Speed 5419.38 samples/sec Loss 7.0530 LearningRate 0.1288 Epoch: 7 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:18,764-Speed 5453.86 
samples/sec Loss 7.0189 LearningRate 0.1287 Epoch: 7 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:26,368-Speed 5387.26 samples/sec Loss 7.1225 LearningRate 0.1287 Epoch: 7 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:33,892-Speed 5445.01 samples/sec Loss 7.0690 LearningRate 0.1287 Epoch: 7 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:41,352-Speed 5491.47 samples/sec Loss 7.0793 LearningRate 0.1287 Epoch: 7 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:48,756-Speed 5532.95 samples/sec Loss 6.9969 LearningRate 0.1287 Epoch: 7 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:29:56,213-Speed 5493.90 samples/sec Loss 7.0970 LearningRate 0.1286 Epoch: 7 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:30:03,650-Speed 5508.10 samples/sec Loss 6.9909 LearningRate 0.1286 Epoch: 7 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:30:11,099-Speed 5499.40 samples/sec Loss 7.0532 LearningRate 0.1286 Epoch: 7 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:30:18,576-Speed 5478.89 samples/sec Loss 7.0467 LearningRate 0.1286 Epoch: 7 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:30:26,067-Speed 5469.12 samples/sec Loss 7.0374 LearningRate 0.1286 Epoch: 7 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:30:33,493-Speed 5515.75 samples/sec Loss 7.0499 LearningRate 0.1285 Epoch: 7 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:30:40,983-Speed 5469.82 samples/sec Loss 7.0218 LearningRate 0.1285 Epoch: 7 Global Step: 78430 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:30:48,376-Speed 5540.74 samples/sec Loss 7.0870 LearningRate 0.1285 Epoch: 7 
Global Step: 78440 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:30:55,875-Speed 5463.43 samples/sec Loss 7.0372 LearningRate 0.1285 Epoch: 7 Global Step: 78450 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:03,509-Speed 5366.06 samples/sec Loss 7.0896 LearningRate 0.1285 Epoch: 7 Global Step: 78460 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:11,016-Speed 5456.60 samples/sec Loss 7.0604 LearningRate 0.1284 Epoch: 7 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:18,517-Speed 5461.32 samples/sec Loss 7.0862 LearningRate 0.1284 Epoch: 7 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:26,004-Speed 5471.91 samples/sec Loss 7.0198 LearningRate 0.1284 Epoch: 7 Global Step: 78490 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:33,450-Speed 5501.66 samples/sec Loss 7.0433 LearningRate 0.1284 Epoch: 7 Global Step: 78500 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:40,908-Speed 5492.44 samples/sec Loss 6.9995 LearningRate 0.1284 Epoch: 7 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:48,394-Speed 5472.37 samples/sec Loss 7.0159 LearningRate 0.1283 Epoch: 7 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:31:57,312-Speed 5510.93 samples/sec Loss 7.0128 LearningRate 0.1283 Epoch: 7 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:04,816-Speed 5458.84 samples/sec Loss 7.0648 LearningRate 0.1283 Epoch: 7 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:12,325-Speed 5455.56 samples/sec Loss 7.0588 LearningRate 0.1283 Epoch: 7 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:19,857-Speed 5438.97 samples/sec Loss 7.0750 LearningRate 0.1283 Epoch: 7 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 30 
hours Training: 2022-01-08 12:32:27,402-Speed 5429.36 samples/sec Loss 7.0802 LearningRate 0.1282 Epoch: 7 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:34,896-Speed 5466.80 samples/sec Loss 7.0180 LearningRate 0.1282 Epoch: 7 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:42,368-Speed 5482.17 samples/sec Loss 7.0260 LearningRate 0.1282 Epoch: 7 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:49,984-Speed 5378.61 samples/sec Loss 7.0192 LearningRate 0.1282 Epoch: 7 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:32:57,422-Speed 5508.39 samples/sec Loss 7.0485 LearningRate 0.1282 Epoch: 7 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:33:04,984-Speed 5417.57 samples/sec Loss 7.0097 LearningRate 0.1281 Epoch: 7 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-01-08 12:33:12,482-Speed 5463.29 samples/sec Loss 7.0468 LearningRate 0.1281 Epoch: 7 Global Step: 78630 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:33:19,899-Speed 5522.59 samples/sec Loss 6.9760 LearningRate 0.1281 Epoch: 7 Global Step: 78640 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:33:27,346-Speed 5501.52 samples/sec Loss 7.0350 LearningRate 0.1281 Epoch: 7 Global Step: 78650 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-01-08 12:33:34,741-Speed 5539.85 samples/sec Loss 7.0925 LearningRate 0.1281 Epoch: 7 Global Step: 78660 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:33:42,213-Speed 5482.00 samples/sec Loss 7.0795 LearningRate 0.1280 Epoch: 7 Global Step: 78670 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:33:49,632-Speed 5521.47 samples/sec Loss 7.0042 LearningRate 0.1280 Epoch: 7 Global Step: 78680 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:33:57,120-Speed 
5471.43 samples/sec Loss 7.0459 LearningRate 0.1280 Epoch: 7 Global Step: 78690 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:34:04,617-Speed 5464.07 samples/sec Loss 7.0448 LearningRate 0.1280 Epoch: 7 Global Step: 78700 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:34:12,052-Speed 5509.90 samples/sec Loss 7.0212 LearningRate 0.1280 Epoch: 7 Global Step: 78710 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:34:19,546-Speed 5466.24 samples/sec Loss 7.1044 LearningRate 0.1279 Epoch: 7 Global Step: 78720 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:34:26,963-Speed 5523.37 samples/sec Loss 6.9864 LearningRate 0.1279 Epoch: 7 Global Step: 78730 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-01-08 12:34:34,412-Speed 5499.69 samples/sec Loss 6.9974 LearningRate 0.1279 Epoch: 7 Global Step: 78740 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:34:41,922-Speed 5454.78 samples/sec Loss 7.0113 LearningRate 0.1279 Epoch: 7 Global Step: 78750 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:34:49,336-Speed 5524.89 samples/sec Loss 7.0107 LearningRate 0.1279 Epoch: 7 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:34:56,894-Speed 5420.31 samples/sec Loss 7.0175 LearningRate 0.1278 Epoch: 7 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:35:04,379-Speed 5473.19 samples/sec Loss 6.9954 LearningRate 0.1278 Epoch: 7 Global Step: 78780 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:11,861-Speed 5475.55 samples/sec Loss 7.0186 LearningRate 0.1278 Epoch: 7 Global Step: 78790 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:19,345-Speed 5473.77 samples/sec Loss 7.0701 LearningRate 0.1278 Epoch: 7 Global Step: 78800 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:26,844-Speed 5462.51 samples/sec Loss 7.0115 LearningRate 0.1278 
Epoch: 7 Global Step: 78810 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:34,293-Speed 5499.73 samples/sec Loss 7.0802 LearningRate 0.1277 Epoch: 7 Global Step: 78820 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:41,761-Speed 5485.28 samples/sec Loss 7.0173 LearningRate 0.1277 Epoch: 7 Global Step: 78830 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:49,214-Speed 5496.42 samples/sec Loss 7.0624 LearningRate 0.1277 Epoch: 7 Global Step: 78840 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:35:56,747-Speed 5437.89 samples/sec Loss 7.0047 LearningRate 0.1277 Epoch: 7 Global Step: 78850 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:36:04,236-Speed 5470.17 samples/sec Loss 6.9793 LearningRate 0.1277 Epoch: 7 Global Step: 78860 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:36:11,835-Speed 5391.42 samples/sec Loss 7.0056 LearningRate 0.1276 Epoch: 7 Global Step: 78870 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:36:19,331-Speed 5464.54 samples/sec Loss 7.0190 LearningRate 0.1276 Epoch: 7 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:36:26,812-Speed 5476.06 samples/sec Loss 7.0822 LearningRate 0.1276 Epoch: 7 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:36:34,271-Speed 5491.96 samples/sec Loss 6.9836 LearningRate 0.1276 Epoch: 7 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:36:41,816-Speed 5429.67 samples/sec Loss 7.0234 LearningRate 0.1276 Epoch: 7 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:36:49,306-Speed 5469.59 samples/sec Loss 6.9832 LearningRate 0.1275 Epoch: 7 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:36:56,771-Speed 5487.28 samples/sec Loss 7.0532 LearningRate 0.1275 Epoch: 7 Global Step: 78930 Fp16 Grad Scale: 65536 
Required: 29 hours Training: 2022-01-08 12:37:04,343-Speed 5410.53 samples/sec Loss 6.9768 LearningRate 0.1275 Epoch: 7 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:37:11,809-Speed 5487.11 samples/sec Loss 7.0296 LearningRate 0.1275 Epoch: 7 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:37:19,321-Speed 5453.34 samples/sec Loss 7.0149 LearningRate 0.1275 Epoch: 7 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:37:26,916-Speed 5394.06 samples/sec Loss 7.0242 LearningRate 0.1274 Epoch: 7 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:37:34,501-Speed 5401.13 samples/sec Loss 6.9655 LearningRate 0.1274 Epoch: 7 Global Step: 78980 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:37:42,055-Speed 5423.44 samples/sec Loss 6.9935 LearningRate 0.1274 Epoch: 7 Global Step: 78990 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:37:49,588-Speed 5438.21 samples/sec Loss 7.0568 LearningRate 0.1274 Epoch: 7 Global Step: 79000 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:37:57,072-Speed 5473.48 samples/sec Loss 7.0314 LearningRate 0.1274 Epoch: 7 Global Step: 79010 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:38:04,749-Speed 5335.89 samples/sec Loss 7.0281 LearningRate 0.1274 Epoch: 7 Global Step: 79020 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:38:12,369-Speed 5375.68 samples/sec Loss 6.9760 LearningRate 0.1273 Epoch: 7 Global Step: 79030 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:38:19,893-Speed 5445.01 samples/sec Loss 7.0802 LearningRate 0.1273 Epoch: 7 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:38:27,360-Speed 5486.17 samples/sec Loss 7.0048 LearningRate 0.1273 Epoch: 7 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 
12:38:34,865-Speed 5458.55 samples/sec Loss 7.0523 LearningRate 0.1273 Epoch: 7 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:38:42,381-Speed 5449.99 samples/sec Loss 6.9977 LearningRate 0.1273 Epoch: 7 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:38:49,789-Speed 5530.38 samples/sec Loss 7.0056 LearningRate 0.1272 Epoch: 7 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:38:57,303-Speed 5451.73 samples/sec Loss 7.0250 LearningRate 0.1272 Epoch: 7 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:04,797-Speed 5466.03 samples/sec Loss 6.9837 LearningRate 0.1272 Epoch: 7 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:12,257-Speed 5491.39 samples/sec Loss 6.9602 LearningRate 0.1272 Epoch: 7 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:19,720-Speed 5489.17 samples/sec Loss 6.9106 LearningRate 0.1272 Epoch: 7 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:27,183-Speed 5488.94 samples/sec Loss 7.0869 LearningRate 0.1271 Epoch: 7 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:34,574-Speed 5542.44 samples/sec Loss 7.0009 LearningRate 0.1271 Epoch: 7 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:42,038-Speed 5488.97 samples/sec Loss 7.0318 LearningRate 0.1271 Epoch: 7 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:49,568-Speed 5439.80 samples/sec Loss 6.9412 LearningRate 0.1271 Epoch: 7 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:39:57,080-Speed 5453.90 samples/sec Loss 6.9966 LearningRate 0.1271 Epoch: 7 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:04,619-Speed 5433.25 samples/sec Loss 7.0034 
LearningRate 0.1270 Epoch: 7 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:12,109-Speed 5469.51 samples/sec Loss 6.9858 LearningRate 0.1270 Epoch: 7 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:19,593-Speed 5473.97 samples/sec Loss 7.0279 LearningRate 0.1270 Epoch: 7 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:27,078-Speed 5473.19 samples/sec Loss 7.0094 LearningRate 0.1270 Epoch: 7 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:34,627-Speed 5426.34 samples/sec Loss 6.9649 LearningRate 0.1270 Epoch: 7 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:42,077-Speed 5498.66 samples/sec Loss 6.9911 LearningRate 0.1269 Epoch: 7 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:40:49,538-Speed 5490.78 samples/sec Loss 7.0405 LearningRate 0.1269 Epoch: 7 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:40:57,001-Speed 5489.18 samples/sec Loss 6.9644 LearningRate 0.1269 Epoch: 7 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:41:04,498-Speed 5464.04 samples/sec Loss 7.0269 LearningRate 0.1269 Epoch: 7 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:41:12,004-Speed 5457.76 samples/sec Loss 7.0032 LearningRate 0.1269 Epoch: 7 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:41:19,567-Speed 5416.93 samples/sec Loss 7.0147 LearningRate 0.1268 Epoch: 7 Global Step: 79280 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:41:27,053-Speed 5472.65 samples/sec Loss 6.9858 LearningRate 0.1268 Epoch: 7 Global Step: 79290 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:41:34,453-Speed 5534.99 samples/sec Loss 6.9676 LearningRate 0.1268 Epoch: 7 Global Step: 79300 
Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:41:42,010-Speed 5421.50 samples/sec Loss 6.9374 LearningRate 0.1268 Epoch: 7 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:41:49,457-Speed 5500.79 samples/sec Loss 6.9671 LearningRate 0.1268 Epoch: 7 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:02,437-Speed 3155.77 samples/sec Loss 7.0071 LearningRate 0.1267 Epoch: 7 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:09,934-Speed 5464.58 samples/sec Loss 6.9362 LearningRate 0.1267 Epoch: 7 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:17,363-Speed 5514.04 samples/sec Loss 6.9445 LearningRate 0.1267 Epoch: 7 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:24,899-Speed 5436.15 samples/sec Loss 6.9337 LearningRate 0.1267 Epoch: 7 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:32,351-Speed 5496.97 samples/sec Loss 6.9773 LearningRate 0.1267 Epoch: 7 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:39,866-Speed 5451.43 samples/sec Loss 6.9895 LearningRate 0.1266 Epoch: 7 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:47,398-Speed 5438.67 samples/sec Loss 6.9764 LearningRate 0.1266 Epoch: 7 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:42:54,888-Speed 5469.74 samples/sec Loss 7.0393 LearningRate 0.1266 Epoch: 7 Global Step: 79400 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:43:02,498-Speed 5383.42 samples/sec Loss 6.9789 LearningRate 0.1266 Epoch: 7 Global Step: 79410 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:43:10,100-Speed 5388.07 samples/sec Loss 6.9925 LearningRate 0.1266 Epoch: 7 Global Step: 79420 Fp16 Grad Scale: 131072 Required: 29 hours Training: 
2022-01-08 12:43:17,660-Speed 5418.94 samples/sec Loss 6.9990 LearningRate 0.1265 Epoch: 7 Global Step: 79430 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:43:25,220-Speed 5419.36 samples/sec Loss 6.9894 LearningRate 0.1265 Epoch: 7 Global Step: 79440 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:43:32,671-Speed 5498.20 samples/sec Loss 6.9153 LearningRate 0.1265 Epoch: 7 Global Step: 79450 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:43:40,175-Speed 5458.48 samples/sec Loss 6.9076 LearningRate 0.1265 Epoch: 7 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:43:47,754-Speed 5405.26 samples/sec Loss 6.9809 LearningRate 0.1265 Epoch: 7 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:43:55,293-Speed 5434.28 samples/sec Loss 6.9806 LearningRate 0.1264 Epoch: 7 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:02,834-Speed 5432.28 samples/sec Loss 6.9501 LearningRate 0.1264 Epoch: 7 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:10,339-Speed 5457.95 samples/sec Loss 6.9850 LearningRate 0.1264 Epoch: 7 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:17,832-Speed 5467.38 samples/sec Loss 6.8911 LearningRate 0.1264 Epoch: 7 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:25,323-Speed 5468.86 samples/sec Loss 6.9583 LearningRate 0.1264 Epoch: 7 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:32,858-Speed 5437.03 samples/sec Loss 6.9631 LearningRate 0.1263 Epoch: 7 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:40,645-Speed 5260.28 samples/sec Loss 7.0021 LearningRate 0.1263 Epoch: 7 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:48,206-Speed 5418.16 samples/sec 
Loss 6.9729 LearningRate 0.1263 Epoch: 7 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:44:55,786-Speed 5403.91 samples/sec Loss 6.9840 LearningRate 0.1263 Epoch: 7 Global Step: 79560 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:45:03,333-Speed 5428.18 samples/sec Loss 7.0168 LearningRate 0.1263 Epoch: 7 Global Step: 79570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:45:10,814-Speed 5476.22 samples/sec Loss 6.9802 LearningRate 0.1262 Epoch: 7 Global Step: 79580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:45:18,383-Speed 5412.38 samples/sec Loss 7.0552 LearningRate 0.1262 Epoch: 7 Global Step: 79590 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:45:26,016-Speed 5366.64 samples/sec Loss 7.0075 LearningRate 0.1262 Epoch: 7 Global Step: 79600 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:45:34,586-Speed 4780.09 samples/sec Loss 6.9915 LearningRate 0.1262 Epoch: 7 Global Step: 79610 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:45:42,162-Speed 5406.95 samples/sec Loss 7.0037 LearningRate 0.1262 Epoch: 7 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:45:49,727-Speed 5415.06 samples/sec Loss 6.9253 LearningRate 0.1261 Epoch: 7 Global Step: 79630 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:45:57,331-Speed 5387.41 samples/sec Loss 6.9715 LearningRate 0.1261 Epoch: 7 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:46:04,984-Speed 5353.12 samples/sec Loss 6.9678 LearningRate 0.1261 Epoch: 7 Global Step: 79650 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:46:12,523-Speed 5433.43 samples/sec Loss 6.9901 LearningRate 0.1261 Epoch: 7 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:46:20,077-Speed 5423.28 samples/sec Loss 7.0127 LearningRate 0.1261 Epoch: 7 Global 
Step: 79670 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:46:27,620-Speed 5430.19 samples/sec Loss 7.0407 LearningRate 0.1260 Epoch: 7 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:46:35,097-Speed 5479.65 samples/sec Loss 7.0246 LearningRate 0.1260 Epoch: 7 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:46:42,638-Speed 5432.17 samples/sec Loss 6.9567 LearningRate 0.1260 Epoch: 7 Global Step: 79700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:46:50,158-Speed 5447.41 samples/sec Loss 6.9977 LearningRate 0.1260 Epoch: 7 Global Step: 79710 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:46:57,696-Speed 5434.23 samples/sec Loss 6.9613 LearningRate 0.1260 Epoch: 7 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:05,283-Speed 5399.50 samples/sec Loss 6.9703 LearningRate 0.1259 Epoch: 7 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:12,874-Speed 5396.85 samples/sec Loss 6.9855 LearningRate 0.1259 Epoch: 7 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:20,446-Speed 5410.12 samples/sec Loss 6.9660 LearningRate 0.1259 Epoch: 7 Global Step: 79750 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:28,075-Speed 5369.44 samples/sec Loss 6.9608 LearningRate 0.1259 Epoch: 7 Global Step: 79760 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:35,699-Speed 5373.50 samples/sec Loss 6.8927 LearningRate 0.1259 Epoch: 7 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:43,364-Speed 5344.01 samples/sec Loss 6.9863 LearningRate 0.1258 Epoch: 7 Global Step: 79780 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:47:51,012-Speed 5356.25 samples/sec Loss 7.0034 LearningRate 0.1258 Epoch: 7 Global Step: 79790 Fp16 Grad Scale: 32768 Required: 29 hours 
Training: 2022-01-08 12:47:58,624-Speed 5381.43 samples/sec Loss 6.9783 LearningRate 0.1258 Epoch: 7 Global Step: 79800 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:48:06,212-Speed 5399.21 samples/sec Loss 6.9409 LearningRate 0.1258 Epoch: 7 Global Step: 79810 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:48:13,799-Speed 5399.70 samples/sec Loss 6.9660 LearningRate 0.1258 Epoch: 7 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:48:21,422-Speed 5373.58 samples/sec Loss 6.9454 LearningRate 0.1257 Epoch: 7 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:48:28,955-Speed 5438.41 samples/sec Loss 6.9353 LearningRate 0.1257 Epoch: 7 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:48:36,564-Speed 5383.81 samples/sec Loss 6.9776 LearningRate 0.1257 Epoch: 7 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:48:44,151-Speed 5399.15 samples/sec Loss 6.9815 LearningRate 0.1257 Epoch: 7 Global Step: 79860 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:48:51,624-Speed 5482.33 samples/sec Loss 6.9617 LearningRate 0.1257 Epoch: 7 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:48:59,217-Speed 5395.00 samples/sec Loss 6.9921 LearningRate 0.1256 Epoch: 7 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:49:06,775-Speed 5419.77 samples/sec Loss 6.9786 LearningRate 0.1256 Epoch: 7 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:49:14,287-Speed 5453.14 samples/sec Loss 6.9873 LearningRate 0.1256 Epoch: 7 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:49:21,762-Speed 5481.00 samples/sec Loss 7.0124 LearningRate 0.1256 Epoch: 7 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:49:29,253-Speed 5468.20 
samples/sec Loss 6.9862 LearningRate 0.1256 Epoch: 7 Global Step: 79920 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:49:36,769-Speed 5450.59 samples/sec Loss 6.9388 LearningRate 0.1256 Epoch: 7 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:49:44,263-Speed 5466.58 samples/sec Loss 6.9054 LearningRate 0.1255 Epoch: 7 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:49:51,891-Speed 5370.23 samples/sec Loss 6.9521 LearningRate 0.1255 Epoch: 7 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:49:59,458-Speed 5412.96 samples/sec Loss 6.9033 LearningRate 0.1255 Epoch: 7 Global Step: 79960 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:50:06,935-Speed 5479.02 samples/sec Loss 6.9393 LearningRate 0.1255 Epoch: 7 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:50:14,478-Speed 5431.24 samples/sec Loss 6.9276 LearningRate 0.1255 Epoch: 7 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:50:22,062-Speed 5401.72 samples/sec Loss 6.9926 LearningRate 0.1254 Epoch: 7 Global Step: 79990 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:50:29,575-Speed 5452.25 samples/sec Loss 6.9126 LearningRate 0.1254 Epoch: 7 Global Step: 80000 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:51:14,000-[lfw][80000]XNorm: 23.802571 Training: 2022-01-08 12:51:14,004-[lfw][80000]Accuracy-Flip: 0.99750+-0.00300 Training: 2022-01-08 12:51:14,004-[lfw][80000]Accuracy-Highest: 0.99817 Training: 2022-01-08 12:52:05,842-[cfp_fp][80000]XNorm: 21.845129 Training: 2022-01-08 12:52:05,843-[cfp_fp][80000]Accuracy-Flip: 0.98586+-0.00627 Training: 2022-01-08 12:52:05,843-[cfp_fp][80000]Accuracy-Highest: 0.98814 Training: 2022-01-08 12:52:51,768-[agedb_30][80000]XNorm: 23.331471 Training: 2022-01-08 12:52:51,769-[agedb_30][80000]Accuracy-Flip: 
0.97483+-0.00758 Training: 2022-01-08 12:52:51,770-[agedb_30][80000]Accuracy-Highest: 0.97667 Training: 2022-01-08 12:52:59,371-Speed 273.44 samples/sec Loss 6.9404 LearningRate 0.1254 Epoch: 7 Global Step: 80010 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:06,934-Speed 5417.49 samples/sec Loss 6.9746 LearningRate 0.1254 Epoch: 7 Global Step: 80020 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:14,619-Speed 5330.39 samples/sec Loss 6.9392 LearningRate 0.1254 Epoch: 7 Global Step: 80030 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:22,150-Speed 5440.35 samples/sec Loss 6.9231 LearningRate 0.1253 Epoch: 7 Global Step: 80040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:29,640-Speed 5470.18 samples/sec Loss 6.9976 LearningRate 0.1253 Epoch: 7 Global Step: 80050 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:37,249-Speed 5384.73 samples/sec Loss 6.9550 LearningRate 0.1253 Epoch: 7 Global Step: 80060 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:44,777-Speed 5442.34 samples/sec Loss 6.9799 LearningRate 0.1253 Epoch: 7 Global Step: 80070 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:52,290-Speed 5452.48 samples/sec Loss 6.8671 LearningRate 0.1253 Epoch: 7 Global Step: 80080 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 12:53:59,769-Speed 5477.28 samples/sec Loss 6.9198 LearningRate 0.1252 Epoch: 7 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:07,335-Speed 5414.83 samples/sec Loss 6.9096 LearningRate 0.1252 Epoch: 7 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:14,950-Speed 5379.56 samples/sec Loss 6.9341 LearningRate 0.1252 Epoch: 7 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:22,569-Speed 5376.36 samples/sec Loss 6.9749 LearningRate 0.1252 Epoch: 7 Global Step: 80120 Fp16 
Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:30,237-Speed 5342.05 samples/sec Loss 7.0116 LearningRate 0.1252 Epoch: 7 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:37,895-Speed 5349.88 samples/sec Loss 6.9875 LearningRate 0.1251 Epoch: 7 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:45,537-Speed 5360.70 samples/sec Loss 6.9627 LearningRate 0.1251 Epoch: 7 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:54:53,032-Speed 5465.20 samples/sec Loss 6.8742 LearningRate 0.1251 Epoch: 7 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:55:00,537-Speed 5458.36 samples/sec Loss 6.9375 LearningRate 0.1251 Epoch: 7 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:55:08,036-Speed 5463.25 samples/sec Loss 6.8900 LearningRate 0.1251 Epoch: 7 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:55:15,587-Speed 5424.81 samples/sec Loss 7.0181 LearningRate 0.1250 Epoch: 7 Global Step: 80190 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:55:23,183-Speed 5392.95 samples/sec Loss 6.8951 LearningRate 0.1250 Epoch: 7 Global Step: 80200 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:55:30,710-Speed 5442.75 samples/sec Loss 6.9452 LearningRate 0.1250 Epoch: 7 Global Step: 80210 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:55:38,316-Speed 5386.22 samples/sec Loss 6.9904 LearningRate 0.1250 Epoch: 7 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:55:45,752-Speed 5508.60 samples/sec Loss 6.9317 LearningRate 0.1250 Epoch: 7 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:55:53,311-Speed 5419.14 samples/sec Loss 6.9224 LearningRate 0.1249 Epoch: 7 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 
2022-01-08 12:56:00,979-Speed 5342.25 samples/sec Loss 6.9793 LearningRate 0.1249 Epoch: 7 Global Step: 80250 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:08,520-Speed 5432.90 samples/sec Loss 6.8986 LearningRate 0.1249 Epoch: 7 Global Step: 80260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:16,034-Speed 5451.90 samples/sec Loss 6.9694 LearningRate 0.1249 Epoch: 7 Global Step: 80270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:23,513-Speed 5477.46 samples/sec Loss 6.8665 LearningRate 0.1249 Epoch: 7 Global Step: 80280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:31,029-Speed 5450.30 samples/sec Loss 6.9143 LearningRate 0.1248 Epoch: 7 Global Step: 80290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:38,513-Speed 5473.44 samples/sec Loss 6.9178 LearningRate 0.1248 Epoch: 7 Global Step: 80300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:45,994-Speed 5476.07 samples/sec Loss 6.9768 LearningRate 0.1248 Epoch: 7 Global Step: 80310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:56:53,512-Speed 5449.34 samples/sec Loss 6.8986 LearningRate 0.1248 Epoch: 7 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:57:01,112-Speed 5389.64 samples/sec Loss 6.9523 LearningRate 0.1248 Epoch: 7 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:57:08,737-Speed 5373.14 samples/sec Loss 6.9524 LearningRate 0.1247 Epoch: 7 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:57:16,461-Speed 5303.28 samples/sec Loss 6.8382 LearningRate 0.1247 Epoch: 7 Global Step: 80350 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:57:23,970-Speed 5455.85 samples/sec Loss 6.8983 LearningRate 0.1247 Epoch: 7 Global Step: 80360 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 12:57:31,594-Speed 5372.91 samples/sec 
Loss 6.9167 LearningRate 0.1247 Epoch: 7 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:57:39,138-Speed 5430.61 samples/sec Loss 6.9607 LearningRate 0.1247 Epoch: 7 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:57:46,616-Speed 5477.47 samples/sec Loss 6.9189 LearningRate 0.1246 Epoch: 7 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:57:54,486-Speed 5205.23 samples/sec Loss 6.8650 LearningRate 0.1246 Epoch: 7 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:01,941-Speed 5495.35 samples/sec Loss 6.8980 LearningRate 0.1246 Epoch: 7 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:09,391-Speed 5498.58 samples/sec Loss 6.9572 LearningRate 0.1246 Epoch: 7 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:16,881-Speed 5469.09 samples/sec Loss 6.9199 LearningRate 0.1246 Epoch: 7 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:24,281-Speed 5536.26 samples/sec Loss 6.9191 LearningRate 0.1245 Epoch: 7 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:31,766-Speed 5472.72 samples/sec Loss 6.9670 LearningRate 0.1245 Epoch: 7 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:39,268-Speed 5460.84 samples/sec Loss 6.8626 LearningRate 0.1245 Epoch: 7 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:46,705-Speed 5508.44 samples/sec Loss 6.9196 LearningRate 0.1245 Epoch: 7 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:58:54,168-Speed 5488.75 samples/sec Loss 6.9800 LearningRate 0.1245 Epoch: 7 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:01,739-Speed 5410.81 samples/sec Loss 6.9043 LearningRate 0.1245 Epoch: 7 Global Step: 
80490 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:09,278-Speed 5434.57 samples/sec Loss 6.9867 LearningRate 0.1244 Epoch: 7 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:16,789-Speed 5453.87 samples/sec Loss 6.9136 LearningRate 0.1244 Epoch: 7 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:24,286-Speed 5464.20 samples/sec Loss 6.8690 LearningRate 0.1244 Epoch: 7 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:31,793-Speed 5457.18 samples/sec Loss 6.9847 LearningRate 0.1244 Epoch: 7 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:39,305-Speed 5452.89 samples/sec Loss 6.9265 LearningRate 0.1244 Epoch: 7 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:46,836-Speed 5439.74 samples/sec Loss 6.9121 LearningRate 0.1243 Epoch: 7 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 12:59:54,373-Speed 5435.25 samples/sec Loss 6.8696 LearningRate 0.1243 Epoch: 7 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:00:01,863-Speed 5469.15 samples/sec Loss 6.8688 LearningRate 0.1243 Epoch: 7 Global Step: 80570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:00:09,331-Speed 5485.26 samples/sec Loss 6.9519 LearningRate 0.1243 Epoch: 7 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:00:16,897-Speed 5414.42 samples/sec Loss 6.9291 LearningRate 0.1243 Epoch: 7 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:00:24,454-Speed 5420.55 samples/sec Loss 6.9026 LearningRate 0.1242 Epoch: 7 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:00:31,936-Speed 5475.07 samples/sec Loss 6.9321 LearningRate 0.1242 Epoch: 7 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 29 hours 
Training: 2022-01-08 13:00:39,420-Speed 5474.52 samples/sec Loss 6.9264 LearningRate 0.1242 Epoch: 7 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:00:46,859-Speed 5506.32 samples/sec Loss 6.8945 LearningRate 0.1242 Epoch: 7 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:00:54,326-Speed 5485.95 samples/sec Loss 6.9159 LearningRate 0.1242 Epoch: 7 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:01:01,978-Speed 5354.02 samples/sec Loss 6.9256 LearningRate 0.1241 Epoch: 7 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:01:09,609-Speed 5367.73 samples/sec Loss 6.9055 LearningRate 0.1241 Epoch: 7 Global Step: 80660 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:01:17,073-Speed 5489.11 samples/sec Loss 6.9605 LearningRate 0.1241 Epoch: 7 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:01:24,514-Speed 5505.35 samples/sec Loss 6.8568 LearningRate 0.1241 Epoch: 7 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:01:31,968-Speed 5495.27 samples/sec Loss 6.8741 LearningRate 0.1241 Epoch: 7 Global Step: 80690 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:01:39,468-Speed 5462.14 samples/sec Loss 6.8647 LearningRate 0.1240 Epoch: 7 Global Step: 80700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:01:47,098-Speed 5369.20 samples/sec Loss 6.8563 LearningRate 0.1240 Epoch: 7 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:01:54,605-Speed 5457.21 samples/sec Loss 6.9038 LearningRate 0.1240 Epoch: 7 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:02,072-Speed 5486.39 samples/sec Loss 6.9406 LearningRate 0.1240 Epoch: 7 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:09,516-Speed 5503.32 
samples/sec Loss 6.8396 LearningRate 0.1240 Epoch: 7 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:17,069-Speed 5424.05 samples/sec Loss 6.9057 LearningRate 0.1239 Epoch: 7 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:24,543-Speed 5481.02 samples/sec Loss 6.8855 LearningRate 0.1239 Epoch: 7 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:32,027-Speed 5473.91 samples/sec Loss 6.8736 LearningRate 0.1239 Epoch: 7 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:39,656-Speed 5369.73 samples/sec Loss 6.9216 LearningRate 0.1239 Epoch: 7 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:47,190-Speed 5437.77 samples/sec Loss 6.9356 LearningRate 0.1239 Epoch: 7 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:02:54,794-Speed 5386.71 samples/sec Loss 6.8668 LearningRate 0.1238 Epoch: 7 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:02,358-Speed 5416.31 samples/sec Loss 6.8387 LearningRate 0.1238 Epoch: 7 Global Step: 80810 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:03:09,822-Speed 5488.15 samples/sec Loss 6.9331 LearningRate 0.1238 Epoch: 7 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:17,366-Speed 5430.45 samples/sec Loss 6.9034 LearningRate 0.1238 Epoch: 7 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:24,991-Speed 5371.91 samples/sec Loss 6.9533 LearningRate 0.1238 Epoch: 7 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:32,518-Speed 5442.60 samples/sec Loss 6.9203 LearningRate 0.1237 Epoch: 7 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:40,029-Speed 5453.76 samples/sec Loss 6.9786 LearningRate 0.1237 Epoch: 7 
Global Step: 80860 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:47,529-Speed 5462.28 samples/sec Loss 6.8616 LearningRate 0.1237 Epoch: 7 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:03:55,002-Speed 5481.63 samples/sec Loss 6.8453 LearningRate 0.1237 Epoch: 7 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:04:02,516-Speed 5451.96 samples/sec Loss 6.8958 LearningRate 0.1237 Epoch: 7 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:04:09,964-Speed 5500.13 samples/sec Loss 6.8733 LearningRate 0.1236 Epoch: 7 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:04:17,424-Speed 5491.74 samples/sec Loss 6.8865 LearningRate 0.1236 Epoch: 7 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:04:24,953-Speed 5440.78 samples/sec Loss 6.9110 LearningRate 0.1236 Epoch: 7 Global Step: 80920 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:04:32,409-Speed 5494.25 samples/sec Loss 6.8558 LearningRate 0.1236 Epoch: 7 Global Step: 80930 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:04:39,908-Speed 5462.72 samples/sec Loss 6.9204 LearningRate 0.1236 Epoch: 7 Global Step: 80940 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:04:47,340-Speed 5512.31 samples/sec Loss 6.9292 LearningRate 0.1235 Epoch: 7 Global Step: 80950 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:04:54,865-Speed 5443.27 samples/sec Loss 6.8674 LearningRate 0.1235 Epoch: 7 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:02,419-Speed 5423.24 samples/sec Loss 6.8948 LearningRate 0.1235 Epoch: 7 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:09,942-Speed 5445.45 samples/sec Loss 6.8929 LearningRate 0.1235 Epoch: 7 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 
29 hours Training: 2022-01-08 13:05:17,571-Speed 5369.55 samples/sec Loss 6.8916 LearningRate 0.1235 Epoch: 7 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:25,087-Speed 5449.84 samples/sec Loss 6.8639 LearningRate 0.1235 Epoch: 7 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:32,660-Speed 5409.76 samples/sec Loss 6.8956 LearningRate 0.1234 Epoch: 7 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:40,221-Speed 5417.97 samples/sec Loss 6.9008 LearningRate 0.1234 Epoch: 7 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:47,738-Speed 5450.17 samples/sec Loss 6.8708 LearningRate 0.1234 Epoch: 7 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:05:55,343-Speed 5386.00 samples/sec Loss 6.9503 LearningRate 0.1234 Epoch: 7 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:02,911-Speed 5413.34 samples/sec Loss 6.8997 LearningRate 0.1234 Epoch: 7 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:10,450-Speed 5433.81 samples/sec Loss 6.9198 LearningRate 0.1233 Epoch: 7 Global Step: 81060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:06:18,080-Speed 5368.57 samples/sec Loss 6.8772 LearningRate 0.1233 Epoch: 7 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:25,583-Speed 5460.20 samples/sec Loss 6.8654 LearningRate 0.1233 Epoch: 7 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:33,083-Speed 5462.01 samples/sec Loss 6.8799 LearningRate 0.1233 Epoch: 7 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:40,635-Speed 5424.32 samples/sec Loss 6.8699 LearningRate 0.1233 Epoch: 7 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:48,210-Speed 
5408.04 samples/sec Loss 6.8701 LearningRate 0.1232 Epoch: 7 Global Step: 81110 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:06:55,738-Speed 5441.43 samples/sec Loss 6.8374 LearningRate 0.1232 Epoch: 7 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:07:03,308-Speed 5411.30 samples/sec Loss 6.9342 LearningRate 0.1232 Epoch: 7 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:07:10,962-Speed 5352.43 samples/sec Loss 6.8479 LearningRate 0.1232 Epoch: 7 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:07:18,634-Speed 5339.83 samples/sec Loss 6.9447 LearningRate 0.1232 Epoch: 7 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:07:26,181-Speed 5427.82 samples/sec Loss 6.9205 LearningRate 0.1231 Epoch: 7 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:07:33,667-Speed 5472.40 samples/sec Loss 6.8717 LearningRate 0.1231 Epoch: 7 Global Step: 81170 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:07:41,228-Speed 5417.92 samples/sec Loss 6.8672 LearningRate 0.1231 Epoch: 7 Global Step: 81180 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:07:48,795-Speed 5413.81 samples/sec Loss 6.8340 LearningRate 0.1231 Epoch: 7 Global Step: 81190 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:07:56,305-Speed 5454.69 samples/sec Loss 6.8657 LearningRate 0.1231 Epoch: 7 Global Step: 81200 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:08:03,802-Speed 5464.52 samples/sec Loss 6.8914 LearningRate 0.1230 Epoch: 7 Global Step: 81210 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:08:11,338-Speed 5435.87 samples/sec Loss 6.8864 LearningRate 0.1230 Epoch: 7 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:08:18,890-Speed 5424.68 samples/sec Loss 6.9096 LearningRate 
0.1230 Epoch: 7 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:08:26,514-Speed 5373.21 samples/sec Loss 6.7822 LearningRate 0.1230 Epoch: 7 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:08:34,048-Speed 5437.10 samples/sec Loss 6.9300 LearningRate 0.1230 Epoch: 7 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:08:41,535-Speed 5472.00 samples/sec Loss 6.9402 LearningRate 0.1229 Epoch: 7 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:08:49,081-Speed 5429.09 samples/sec Loss 6.8681 LearningRate 0.1229 Epoch: 7 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:08:56,675-Speed 5394.70 samples/sec Loss 6.8934 LearningRate 0.1229 Epoch: 7 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:09:04,213-Speed 5433.97 samples/sec Loss 6.8728 LearningRate 0.1229 Epoch: 7 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:09:11,721-Speed 5456.94 samples/sec Loss 6.8966 LearningRate 0.1229 Epoch: 7 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:09:19,266-Speed 5428.99 samples/sec Loss 6.8921 LearningRate 0.1228 Epoch: 7 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:09:26,827-Speed 5417.92 samples/sec Loss 6.8970 LearningRate 0.1228 Epoch: 7 Global Step: 81320 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:09:34,358-Speed 5439.53 samples/sec Loss 6.8726 LearningRate 0.1228 Epoch: 7 Global Step: 81330 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:09:41,883-Speed 5444.36 samples/sec Loss 6.8289 LearningRate 0.1228 Epoch: 7 Global Step: 81340 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:09:49,370-Speed 5471.82 samples/sec Loss 6.9455 LearningRate 0.1228 Epoch: 7 Global Step: 81350 Fp16 Grad Scale: 
131072 Required: 29 hours Training: 2022-01-08 13:09:56,993-Speed 5373.62 samples/sec Loss 6.8398 LearningRate 0.1227 Epoch: 7 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:04,493-Speed 5462.30 samples/sec Loss 6.8959 LearningRate 0.1227 Epoch: 7 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:12,017-Speed 5444.22 samples/sec Loss 6.9282 LearningRate 0.1227 Epoch: 7 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:19,540-Speed 5445.50 samples/sec Loss 6.8923 LearningRate 0.1227 Epoch: 7 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:27,033-Speed 5467.62 samples/sec Loss 6.8288 LearningRate 0.1227 Epoch: 7 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:34,622-Speed 5397.52 samples/sec Loss 6.8689 LearningRate 0.1227 Epoch: 7 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:42,225-Speed 5388.57 samples/sec Loss 6.8417 LearningRate 0.1226 Epoch: 7 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:49,739-Speed 5451.84 samples/sec Loss 6.8527 LearningRate 0.1226 Epoch: 7 Global Step: 81430 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:10:57,468-Speed 5300.28 samples/sec Loss 6.9190 LearningRate 0.1226 Epoch: 7 Global Step: 81440 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:11:05,066-Speed 5391.66 samples/sec Loss 6.8829 LearningRate 0.1226 Epoch: 7 Global Step: 81450 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:11:12,495-Speed 5514.00 samples/sec Loss 6.8175 LearningRate 0.1226 Epoch: 7 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:11:20,019-Speed 5445.04 samples/sec Loss 6.8683 LearningRate 0.1225 Epoch: 7 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 
2022-01-08 13:11:27,535-Speed 5450.02 samples/sec Loss 6.8150 LearningRate 0.1225 Epoch: 7 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:11:35,137-Speed 5389.07 samples/sec Loss 6.8404 LearningRate 0.1225 Epoch: 7 Global Step: 81490 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:11:42,615-Speed 5477.98 samples/sec Loss 6.8614 LearningRate 0.1225 Epoch: 7 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:11:50,094-Speed 5477.09 samples/sec Loss 6.8415 LearningRate 0.1225 Epoch: 7 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:11:57,681-Speed 5399.32 samples/sec Loss 6.9165 LearningRate 0.1224 Epoch: 7 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:12:05,290-Speed 5384.50 samples/sec Loss 6.8932 LearningRate 0.1224 Epoch: 7 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:12:12,855-Speed 5414.76 samples/sec Loss 6.8869 LearningRate 0.1224 Epoch: 7 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:12:20,542-Speed 5329.03 samples/sec Loss 6.8188 LearningRate 0.1224 Epoch: 7 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:12:28,138-Speed 5393.13 samples/sec Loss 6.8837 LearningRate 0.1224 Epoch: 7 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:12:35,678-Speed 5433.60 samples/sec Loss 6.8050 LearningRate 0.1223 Epoch: 7 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:12:43,242-Speed 5415.65 samples/sec Loss 6.8560 LearningRate 0.1223 Epoch: 7 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:12:50,740-Speed 5463.05 samples/sec Loss 6.8335 LearningRate 0.1223 Epoch: 7 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:12:58,264-Speed 5444.69 samples/sec 
Loss 6.7994 LearningRate 0.1223 Epoch: 7 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:13:05,839-Speed 5407.96 samples/sec Loss 6.8517 LearningRate 0.1223 Epoch: 7 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:13:13,393-Speed 5423.13 samples/sec Loss 6.8381 LearningRate 0.1222 Epoch: 7 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:13:20,888-Speed 5465.39 samples/sec Loss 6.8572 LearningRate 0.1222 Epoch: 7 Global Step: 81630 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:13:28,378-Speed 5469.63 samples/sec Loss 6.8313 LearningRate 0.1222 Epoch: 7 Global Step: 81640 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:13:35,901-Speed 5445.45 samples/sec Loss 6.8448 LearningRate 0.1222 Epoch: 7 Global Step: 81650 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:13:43,500-Speed 5391.08 samples/sec Loss 6.8551 LearningRate 0.1222 Epoch: 7 Global Step: 81660 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:13:50,967-Speed 5485.73 samples/sec Loss 6.8295 LearningRate 0.1221 Epoch: 7 Global Step: 81670 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:13:58,426-Speed 5492.26 samples/sec Loss 6.7993 LearningRate 0.1221 Epoch: 7 Global Step: 81680 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:14:05,990-Speed 5415.65 samples/sec Loss 6.8161 LearningRate 0.1221 Epoch: 7 Global Step: 81690 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:14:13,508-Speed 5448.76 samples/sec Loss 6.7567 LearningRate 0.1221 Epoch: 7 Global Step: 81700 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:14:21,028-Speed 5448.15 samples/sec Loss 6.8323 LearningRate 0.1221 Epoch: 7 Global Step: 81710 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:14:28,530-Speed 5460.73 samples/sec Loss 6.8291 LearningRate 0.1220 Epoch: 7 Global Step: 
81720 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:14:35,972-Speed 5504.56 samples/sec Loss 6.8045 LearningRate 0.1220 Epoch: 7 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:14:43,551-Speed 5405.47 samples/sec Loss 6.7925 LearningRate 0.1220 Epoch: 7 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:14:51,050-Speed 5462.79 samples/sec Loss 6.8152 LearningRate 0.1220 Epoch: 7 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:14:58,522-Speed 5482.44 samples/sec Loss 6.8686 LearningRate 0.1220 Epoch: 7 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:06,108-Speed 5399.93 samples/sec Loss 6.8603 LearningRate 0.1220 Epoch: 7 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:13,652-Speed 5430.34 samples/sec Loss 6.8769 LearningRate 0.1219 Epoch: 7 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:21,129-Speed 5479.16 samples/sec Loss 6.7761 LearningRate 0.1219 Epoch: 7 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:28,639-Speed 5454.96 samples/sec Loss 6.7952 LearningRate 0.1219 Epoch: 7 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:36,247-Speed 5384.16 samples/sec Loss 6.8591 LearningRate 0.1219 Epoch: 7 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:43,876-Speed 5370.40 samples/sec Loss 6.8267 LearningRate 0.1219 Epoch: 7 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:15:51,378-Speed 5460.19 samples/sec Loss 6.8258 LearningRate 0.1218 Epoch: 7 Global Step: 81830 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:15:58,935-Speed 5420.95 samples/sec Loss 6.9120 LearningRate 0.1218 Epoch: 7 Global Step: 81840 Fp16 Grad Scale: 32768 Required: 29 hours 
Training: 2022-01-08 13:16:06,566-Speed 5368.34 samples/sec Loss 6.8755 LearningRate 0.1218 Epoch: 7 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:14,171-Speed 5386.58 samples/sec Loss 6.8532 LearningRate 0.1218 Epoch: 7 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:21,759-Speed 5398.92 samples/sec Loss 6.8682 LearningRate 0.1218 Epoch: 7 Global Step: 81870 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:29,264-Speed 5458.10 samples/sec Loss 6.8923 LearningRate 0.1217 Epoch: 7 Global Step: 81880 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:36,792-Speed 5441.41 samples/sec Loss 6.8503 LearningRate 0.1217 Epoch: 7 Global Step: 81890 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:44,328-Speed 5436.23 samples/sec Loss 6.8157 LearningRate 0.1217 Epoch: 7 Global Step: 81900 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:51,879-Speed 5425.37 samples/sec Loss 6.8693 LearningRate 0.1217 Epoch: 7 Global Step: 81910 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:16:59,343-Speed 5488.30 samples/sec Loss 6.8582 LearningRate 0.1217 Epoch: 7 Global Step: 81920 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:17:06,914-Speed 5410.51 samples/sec Loss 6.9215 LearningRate 0.1216 Epoch: 7 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:14,401-Speed 5472.64 samples/sec Loss 6.8739 LearningRate 0.1216 Epoch: 7 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:21,956-Speed 5422.13 samples/sec Loss 6.8060 LearningRate 0.1216 Epoch: 7 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:29,669-Speed 5310.87 samples/sec Loss 6.7922 LearningRate 0.1216 Epoch: 7 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:37,194-Speed 5444.39 
samples/sec Loss 6.7703 LearningRate 0.1216 Epoch: 7 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:44,676-Speed 5474.79 samples/sec Loss 6.8191 LearningRate 0.1215 Epoch: 7 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:52,220-Speed 5430.34 samples/sec Loss 6.8304 LearningRate 0.1215 Epoch: 7 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:17:59,856-Speed 5364.80 samples/sec Loss 6.8408 LearningRate 0.1215 Epoch: 7 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:18:44,258-[lfw][82000]XNorm: 23.526637 Training: 2022-01-08 13:18:44,258-[lfw][82000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-01-08 13:18:44,259-[lfw][82000]Accuracy-Highest: 0.99817 Training: 2022-01-08 13:19:36,214-[cfp_fp][82000]XNorm: 21.435723 Training: 2022-01-08 13:19:36,215-[cfp_fp][82000]Accuracy-Flip: 0.98600+-0.00595 Training: 2022-01-08 13:19:36,215-[cfp_fp][82000]Accuracy-Highest: 0.98814 Training: 2022-01-08 13:20:22,297-[agedb_30][82000]XNorm: 23.263862 Training: 2022-01-08 13:20:22,297-[agedb_30][82000]Accuracy-Flip: 0.97567+-0.00742 Training: 2022-01-08 13:20:22,298-[agedb_30][82000]Accuracy-Highest: 0.97667 Training: 2022-01-08 13:20:29,809-Speed 273.15 samples/sec Loss 6.8505 LearningRate 0.1215 Epoch: 7 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:20:37,272-Speed 5490.16 samples/sec Loss 6.8522 LearningRate 0.1215 Epoch: 7 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:20:44,812-Speed 5433.29 samples/sec Loss 6.8137 LearningRate 0.1214 Epoch: 7 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:20:52,342-Speed 5441.00 samples/sec Loss 6.8749 LearningRate 0.1214 Epoch: 7 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:20:59,920-Speed 5405.93 samples/sec Loss 6.8409 LearningRate 
0.1214 Epoch: 7 Global Step: 82050 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:07,478-Speed 5420.18 samples/sec Loss 6.8442 LearningRate 0.1214 Epoch: 7 Global Step: 82060 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:15,005-Speed 5442.80 samples/sec Loss 6.8602 LearningRate 0.1214 Epoch: 7 Global Step: 82070 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:22,685-Speed 5333.90 samples/sec Loss 6.8871 LearningRate 0.1214 Epoch: 7 Global Step: 82080 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:30,319-Speed 5366.14 samples/sec Loss 6.8454 LearningRate 0.1213 Epoch: 7 Global Step: 82090 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:37,838-Speed 5447.53 samples/sec Loss 6.8376 LearningRate 0.1213 Epoch: 7 Global Step: 82100 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:45,384-Speed 5429.30 samples/sec Loss 6.8707 LearningRate 0.1213 Epoch: 7 Global Step: 82110 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:21:52,835-Speed 5498.00 samples/sec Loss 6.8191 LearningRate 0.1213 Epoch: 7 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:00,334-Speed 5462.98 samples/sec Loss 6.8136 LearningRate 0.1213 Epoch: 7 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:08,025-Speed 5326.03 samples/sec Loss 6.8303 LearningRate 0.1212 Epoch: 7 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:15,521-Speed 5464.98 samples/sec Loss 6.8243 LearningRate 0.1212 Epoch: 7 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:23,025-Speed 5459.83 samples/sec Loss 6.8769 LearningRate 0.1212 Epoch: 7 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:30,604-Speed 5404.82 samples/sec Loss 6.8698 LearningRate 0.1212 Epoch: 7 Global Step: 82170 Fp16 Grad 
Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:38,128-Speed 5445.14 samples/sec Loss 6.7984 LearningRate 0.1212 Epoch: 7 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:45,604-Speed 5479.50 samples/sec Loss 6.7740 LearningRate 0.1211 Epoch: 7 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:22:53,029-Speed 5516.85 samples/sec Loss 6.7843 LearningRate 0.1211 Epoch: 7 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:23:00,547-Speed 5449.11 samples/sec Loss 6.8022 LearningRate 0.1211 Epoch: 7 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:23:08,131-Speed 5402.11 samples/sec Loss 6.7919 LearningRate 0.1211 Epoch: 7 Global Step: 82220 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:23:15,723-Speed 5395.29 samples/sec Loss 6.8603 LearningRate 0.1211 Epoch: 7 Global Step: 82230 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:23:23,305-Speed 5403.26 samples/sec Loss 6.7882 LearningRate 0.1210 Epoch: 7 Global Step: 82240 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:23:30,839-Speed 5437.93 samples/sec Loss 6.8267 LearningRate 0.1210 Epoch: 7 Global Step: 82250 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:23:38,293-Speed 5495.64 samples/sec Loss 6.8743 LearningRate 0.1210 Epoch: 7 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:23:45,892-Speed 5390.64 samples/sec Loss 6.8164 LearningRate 0.1210 Epoch: 7 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:23:53,365-Speed 5481.62 samples/sec Loss 6.8537 LearningRate 0.1210 Epoch: 7 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:00,849-Speed 5473.70 samples/sec Loss 6.7608 LearningRate 0.1209 Epoch: 7 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 
2022-01-08 13:24:08,343-Speed 5466.15 samples/sec Loss 6.8415 LearningRate 0.1209 Epoch: 7 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:15,786-Speed 5504.00 samples/sec Loss 6.8159 LearningRate 0.1209 Epoch: 7 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:23,258-Speed 5482.19 samples/sec Loss 6.7993 LearningRate 0.1209 Epoch: 7 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:30,694-Speed 5510.62 samples/sec Loss 6.7676 LearningRate 0.1209 Epoch: 7 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:38,236-Speed 5431.69 samples/sec Loss 6.7921 LearningRate 0.1208 Epoch: 7 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:45,796-Speed 5418.81 samples/sec Loss 6.8438 LearningRate 0.1208 Epoch: 7 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:24:53,386-Speed 5397.15 samples/sec Loss 6.8082 LearningRate 0.1208 Epoch: 7 Global Step: 82360 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:25:00,855-Speed 5484.94 samples/sec Loss 6.8519 LearningRate 0.1208 Epoch: 7 Global Step: 82370 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:08,314-Speed 5492.19 samples/sec Loss 6.7724 LearningRate 0.1208 Epoch: 7 Global Step: 82380 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:15,830-Speed 5450.08 samples/sec Loss 6.7867 LearningRate 0.1208 Epoch: 7 Global Step: 82390 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:23,293-Speed 5489.68 samples/sec Loss 6.8081 LearningRate 0.1207 Epoch: 7 Global Step: 82400 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:30,833-Speed 5432.98 samples/sec Loss 6.7785 LearningRate 0.1207 Epoch: 7 Global Step: 82410 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:38,359-Speed 5443.46 samples/sec Loss 
6.8204 LearningRate 0.1207 Epoch: 7 Global Step: 82420 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:45,889-Speed 5439.81 samples/sec Loss 6.8589 LearningRate 0.1207 Epoch: 7 Global Step: 82430 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:25:53,464-Speed 5408.37 samples/sec Loss 6.8821 LearningRate 0.1207 Epoch: 7 Global Step: 82440 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:26:01,011-Speed 5428.56 samples/sec Loss 6.8690 LearningRate 0.1206 Epoch: 7 Global Step: 82450 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:26:08,530-Speed 5447.88 samples/sec Loss 6.7861 LearningRate 0.1206 Epoch: 7 Global Step: 82460 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:26:16,058-Speed 5441.70 samples/sec Loss 6.7754 LearningRate 0.1206 Epoch: 7 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:26:23,589-Speed 5439.55 samples/sec Loss 6.8182 LearningRate 0.1206 Epoch: 7 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:26:31,072-Speed 5474.44 samples/sec Loss 6.7691 LearningRate 0.1206 Epoch: 7 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:26:38,613-Speed 5432.85 samples/sec Loss 6.8011 LearningRate 0.1205 Epoch: 7 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:26:46,126-Speed 5452.39 samples/sec Loss 6.7955 LearningRate 0.1205 Epoch: 7 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:26:53,627-Speed 5461.08 samples/sec Loss 6.7379 LearningRate 0.1205 Epoch: 7 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:27:01,208-Speed 5403.22 samples/sec Loss 6.7885 LearningRate 0.1205 Epoch: 7 Global Step: 82530 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:27:08,712-Speed 5459.87 samples/sec Loss 6.8038 LearningRate 0.1205 Epoch: 7 Global Step: 82540 
Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:27:16,272-Speed 5418.38 samples/sec Loss 6.7469 LearningRate 0.1204 Epoch: 7 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:27:23,753-Speed 5475.71 samples/sec Loss 6.8891 LearningRate 0.1204 Epoch: 7 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:27:31,395-Speed 5360.50 samples/sec Loss 6.7822 LearningRate 0.1204 Epoch: 7 Global Step: 82570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:27:38,929-Speed 5438.10 samples/sec Loss 6.7738 LearningRate 0.1204 Epoch: 7 Global Step: 82580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:27:46,454-Speed 5443.92 samples/sec Loss 6.8650 LearningRate 0.1204 Epoch: 7 Global Step: 82590 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:27:54,024-Speed 5411.01 samples/sec Loss 6.7845 LearningRate 0.1203 Epoch: 7 Global Step: 82600 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-01-08 13:28:01,532-Speed 5456.21 samples/sec Loss 6.8092 LearningRate 0.1203 Epoch: 7 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:28:09,091-Speed 5419.88 samples/sec Loss 6.7614 LearningRate 0.1203 Epoch: 7 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:28:16,657-Speed 5414.00 samples/sec Loss 6.8262 LearningRate 0.1203 Epoch: 7 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:28:24,161-Speed 5458.85 samples/sec Loss 6.7785 LearningRate 0.1203 Epoch: 7 Global Step: 82640 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:28:31,719-Speed 5420.35 samples/sec Loss 6.7980 LearningRate 0.1202 Epoch: 7 Global Step: 82650 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:28:39,350-Speed 5368.91 samples/sec Loss 6.7522 LearningRate 0.1202 Epoch: 7 Global Step: 82660 Fp16 Grad Scale: 32768 Required: 29 hours Training: 
2022-01-08 13:28:46,871-Speed 5446.75 samples/sec Loss 6.7787 LearningRate 0.1202 Epoch: 7 Global Step: 82670 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:28:54,376-Speed 5457.56 samples/sec Loss 6.8106 LearningRate 0.1202 Epoch: 7 Global Step: 82680 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:29:01,943-Speed 5414.10 samples/sec Loss 6.7994 LearningRate 0.1202 Epoch: 7 Global Step: 82690 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:29:09,447-Speed 5459.48 samples/sec Loss 6.7423 LearningRate 0.1202 Epoch: 7 Global Step: 82700 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:29:16,891-Speed 5502.86 samples/sec Loss 6.7746 LearningRate 0.1201 Epoch: 7 Global Step: 82710 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:29:24,346-Speed 5495.44 samples/sec Loss 6.7613 LearningRate 0.1201 Epoch: 7 Global Step: 82720 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:29:31,855-Speed 5455.60 samples/sec Loss 6.7625 LearningRate 0.1201 Epoch: 7 Global Step: 82730 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:29:39,446-Speed 5396.77 samples/sec Loss 6.7965 LearningRate 0.1201 Epoch: 7 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:29:47,020-Speed 5408.95 samples/sec Loss 6.7833 LearningRate 0.1201 Epoch: 7 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:29:54,565-Speed 5428.71 samples/sec Loss 6.7995 LearningRate 0.1200 Epoch: 7 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:30:02,106-Speed 5432.55 samples/sec Loss 6.7880 LearningRate 0.1200 Epoch: 7 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:30:09,638-Speed 5439.28 samples/sec Loss 6.7569 LearningRate 0.1200 Epoch: 7 Global Step: 82780 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:30:17,136-Speed 5463.82 samples/sec Loss 
6.8262 LearningRate 0.1200 Epoch: 7 Global Step: 82790 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:30:24,817-Speed 5332.74 samples/sec Loss 6.8493 LearningRate 0.1200 Epoch: 7 Global Step: 82800 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:30:32,321-Speed 5459.16 samples/sec Loss 6.7949 LearningRate 0.1199 Epoch: 7 Global Step: 82810 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:30:39,769-Speed 5499.97 samples/sec Loss 6.8414 LearningRate 0.1199 Epoch: 7 Global Step: 82820 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:30:47,262-Speed 5467.53 samples/sec Loss 6.8470 LearningRate 0.1199 Epoch: 7 Global Step: 82830 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:30:54,841-Speed 5405.43 samples/sec Loss 6.8180 LearningRate 0.1199 Epoch: 7 Global Step: 82840 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:31:02,318-Speed 5478.78 samples/sec Loss 6.8273 LearningRate 0.1199 Epoch: 7 Global Step: 82850 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:31:09,855-Speed 5434.59 samples/sec Loss 6.8278 LearningRate 0.1198 Epoch: 7 Global Step: 82860 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:31:17,375-Speed 5447.76 samples/sec Loss 6.8658 LearningRate 0.1198 Epoch: 7 Global Step: 82870 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:31:24,876-Speed 5461.78 samples/sec Loss 6.8169 LearningRate 0.1198 Epoch: 7 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:31:32,402-Speed 5442.72 samples/sec Loss 6.7158 LearningRate 0.1198 Epoch: 7 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:31:40,015-Speed 5380.32 samples/sec Loss 6.7418 LearningRate 0.1198 Epoch: 7 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:31:47,658-Speed 5360.40 samples/sec Loss 6.8020 LearningRate 0.1197 Epoch: 7 Global Step: 82910 
Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:31:55,346-Speed 5328.65 samples/sec Loss 6.7827 LearningRate 0.1197 Epoch: 7 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:32:02,964-Speed 5377.20 samples/sec Loss 6.7356 LearningRate 0.1197 Epoch: 7 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:32:10,572-Speed 5384.25 samples/sec Loss 6.7332 LearningRate 0.1197 Epoch: 7 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:32:18,205-Speed 5367.23 samples/sec Loss 6.7804 LearningRate 0.1197 Epoch: 7 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:32:42,090-Speed 1714.95 samples/sec Loss 6.8057 LearningRate 0.1197 Epoch: 8 Global Step: 82960 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:32:49,570-Speed 5476.42 samples/sec Loss 6.7310 LearningRate 0.1196 Epoch: 8 Global Step: 82970 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:32:57,015-Speed 5502.73 samples/sec Loss 6.7719 LearningRate 0.1196 Epoch: 8 Global Step: 82980 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:04,514-Speed 5462.88 samples/sec Loss 6.8267 LearningRate 0.1196 Epoch: 8 Global Step: 82990 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:12,028-Speed 5451.84 samples/sec Loss 6.7499 LearningRate 0.1196 Epoch: 8 Global Step: 83000 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:19,516-Speed 5470.86 samples/sec Loss 6.7412 LearningRate 0.1196 Epoch: 8 Global Step: 83010 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:27,014-Speed 5463.61 samples/sec Loss 6.7547 LearningRate 0.1195 Epoch: 8 Global Step: 83020 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:34,519-Speed 5458.64 samples/sec Loss 6.8006 LearningRate 0.1195 Epoch: 8 Global Step: 83030 Fp16 Grad Scale: 32768 Required: 29 hours Training: 
2022-01-08 13:33:42,078-Speed 5419.35 samples/sec Loss 6.7273 LearningRate 0.1195 Epoch: 8 Global Step: 83040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:49,782-Speed 5317.95 samples/sec Loss 6.7542 LearningRate 0.1195 Epoch: 8 Global Step: 83050 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-01-08 13:33:57,249-Speed 5485.83 samples/sec Loss 6.7656 LearningRate 0.1195 Epoch: 8 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:34:04,803-Speed 5423.26 samples/sec Loss 6.8000 LearningRate 0.1194 Epoch: 8 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:34:12,382-Speed 5405.49 samples/sec Loss 6.7368 LearningRate 0.1194 Epoch: 8 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-01-08 13:34:19,848-Speed 5486.98 samples/sec Loss 6.7328 LearningRate 0.1194 Epoch: 8 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:34:27,274-Speed 5516.91 samples/sec Loss 6.7861 LearningRate 0.1194 Epoch: 8 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:34:34,732-Speed 5492.31 samples/sec Loss 6.7516 LearningRate 0.1194 Epoch: 8 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:34:42,193-Speed 5491.01 samples/sec Loss 6.7344 LearningRate 0.1193 Epoch: 8 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:34:49,930-Speed 5294.66 samples/sec Loss 6.7283 LearningRate 0.1193 Epoch: 8 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:34:57,635-Speed 5317.02 samples/sec Loss 6.7435 LearningRate 0.1193 Epoch: 8 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:35:05,246-Speed 5382.17 samples/sec Loss 6.7458 LearningRate 0.1193 Epoch: 8 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:35:13,094-Speed 5220.19 samples/sec Loss 
6.7048 LearningRate 0.1193 Epoch: 8 Global Step: 83160 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:35:20,703-Speed 5383.69 samples/sec Loss 6.7427 LearningRate 0.1192 Epoch: 8 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:35:28,302-Speed 5391.04 samples/sec Loss 6.8020 LearningRate 0.1192 Epoch: 8 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:35:35,927-Speed 5372.24 samples/sec Loss 6.7153 LearningRate 0.1192 Epoch: 8 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:35:43,582-Speed 5352.13 samples/sec Loss 6.7000 LearningRate 0.1192 Epoch: 8 Global Step: 83200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:35:51,194-Speed 5381.01 samples/sec Loss 6.7783 LearningRate 0.1192 Epoch: 8 Global Step: 83210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:35:58,867-Speed 5339.18 samples/sec Loss 6.6986 LearningRate 0.1192 Epoch: 8 Global Step: 83220 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:06,500-Speed 5367.04 samples/sec Loss 6.8127 LearningRate 0.1191 Epoch: 8 Global Step: 83230 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:14,114-Speed 5380.75 samples/sec Loss 6.7492 LearningRate 0.1191 Epoch: 8 Global Step: 83240 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:21,755-Speed 5360.99 samples/sec Loss 6.6518 LearningRate 0.1191 Epoch: 8 Global Step: 83250 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:29,222-Speed 5485.92 samples/sec Loss 6.6992 LearningRate 0.1191 Epoch: 8 Global Step: 83260 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:36,668-Speed 5501.80 samples/sec Loss 6.7568 LearningRate 0.1191 Epoch: 8 Global Step: 83270 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:44,101-Speed 5511.50 samples/sec Loss 6.7556 LearningRate 0.1190 Epoch: 8 Global 
Step: 83280 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:36:51,516-Speed 5524.46 samples/sec Loss 6.7757 LearningRate 0.1190 Epoch: 8 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:36:58,943-Speed 5515.17 samples/sec Loss 6.7626 LearningRate 0.1190 Epoch: 8 Global Step: 83300 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:06,387-Speed 5503.55 samples/sec Loss 6.7023 LearningRate 0.1190 Epoch: 8 Global Step: 83310 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:13,845-Speed 5492.58 samples/sec Loss 6.7527 LearningRate 0.1190 Epoch: 8 Global Step: 83320 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:21,313-Speed 5485.56 samples/sec Loss 6.7138 LearningRate 0.1189 Epoch: 8 Global Step: 83330 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:28,805-Speed 5467.81 samples/sec Loss 6.7285 LearningRate 0.1189 Epoch: 8 Global Step: 83340 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:36,324-Speed 5447.97 samples/sec Loss 6.7674 LearningRate 0.1189 Epoch: 8 Global Step: 83350 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:43,773-Speed 5499.85 samples/sec Loss 6.7373 LearningRate 0.1189 Epoch: 8 Global Step: 83360 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:51,275-Speed 5460.63 samples/sec Loss 6.6828 LearningRate 0.1189 Epoch: 8 Global Step: 83370 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:37:58,733-Speed 5492.61 samples/sec Loss 6.7673 LearningRate 0.1188 Epoch: 8 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:38:06,304-Speed 5410.50 samples/sec Loss 6.7964 LearningRate 0.1188 Epoch: 8 Global Step: 83390 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 13:38:14,003-Speed 5321.30 samples/sec Loss 6.7275 LearningRate 0.1188 Epoch: 8 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 28 hours 
Training: 2022-01-08 13:38:21,575-Speed 5410.08 samples/sec Loss 6.7134 LearningRate 0.1188 Epoch: 8 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:38:29,056-Speed 5476.09 samples/sec Loss 6.7488 LearningRate 0.1188 Epoch: 8 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:38:36,508-Speed 5497.22 samples/sec Loss 6.7602 LearningRate 0.1187 Epoch: 8 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:38:43,970-Speed 5489.75 samples/sec Loss 6.7365 LearningRate 0.1187 Epoch: 8 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:38:51,397-Speed 5515.92 samples/sec Loss 6.7427 LearningRate 0.1187 Epoch: 8 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:38:58,815-Speed 5522.40 samples/sec Loss 6.7299 LearningRate 0.1187 Epoch: 8 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:39:06,368-Speed 5423.33 samples/sec Loss 6.7190 LearningRate 0.1187 Epoch: 8 Global Step: 83470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:39:13,943-Speed 5408.58 samples/sec Loss 6.7309 LearningRate 0.1187 Epoch: 8 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:39:21,391-Speed 5499.88 samples/sec Loss 6.7657 LearningRate 0.1186 Epoch: 8 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:39:28,955-Speed 5416.38 samples/sec Loss 6.7365 LearningRate 0.1186 Epoch: 8 Global Step: 83500 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:39:36,591-Speed 5364.36 samples/sec Loss 6.7440 LearningRate 0.1186 Epoch: 8 Global Step: 83510 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:39:44,147-Speed 5421.84 samples/sec Loss 6.7696 LearningRate 0.1186 Epoch: 8 Global Step: 83520 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:39:51,628-Speed 5475.98 
samples/sec Loss 6.7055 LearningRate 0.1186 Epoch: 8 Global Step: 83530 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:39:59,077-Speed 5499.94 samples/sec Loss 6.7403 LearningRate 0.1185 Epoch: 8 Global Step: 83540 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:40:06,635-Speed 5419.52 samples/sec Loss 6.7208 LearningRate 0.1185 Epoch: 8 Global Step: 83550 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:40:14,188-Speed 5424.58 samples/sec Loss 6.7118 LearningRate 0.1185 Epoch: 8 Global Step: 83560 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:40:21,702-Speed 5451.46 samples/sec Loss 6.7500 LearningRate 0.1185 Epoch: 8 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:40:29,251-Speed 5426.84 samples/sec Loss 6.7194 LearningRate 0.1185 Epoch: 8 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:40:36,790-Speed 5433.43 samples/sec Loss 6.7658 LearningRate 0.1184 Epoch: 8 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:40:44,385-Speed 5394.36 samples/sec Loss 6.7546 LearningRate 0.1184 Epoch: 8 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:40:51,895-Speed 5454.32 samples/sec Loss 6.7519 LearningRate 0.1184 Epoch: 8 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:40:59,440-Speed 5429.57 samples/sec Loss 6.7388 LearningRate 0.1184 Epoch: 8 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:06,992-Speed 5424.35 samples/sec Loss 6.7276 LearningRate 0.1184 Epoch: 8 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:14,603-Speed 5383.12 samples/sec Loss 6.7384 LearningRate 0.1183 Epoch: 8 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:22,138-Speed 5435.99 samples/sec Loss 6.7601 LearningRate 0.1183 Epoch: 
8 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:29,668-Speed 5440.36 samples/sec Loss 6.7822 LearningRate 0.1183 Epoch: 8 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:37,212-Speed 5430.59 samples/sec Loss 6.7569 LearningRate 0.1183 Epoch: 8 Global Step: 83670 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:41:44,640-Speed 5514.95 samples/sec Loss 6.7695 LearningRate 0.1183 Epoch: 8 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:52,110-Speed 5484.13 samples/sec Loss 6.7680 LearningRate 0.1183 Epoch: 8 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:41:59,655-Speed 5429.42 samples/sec Loss 6.7301 LearningRate 0.1182 Epoch: 8 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:07,129-Speed 5481.60 samples/sec Loss 6.6840 LearningRate 0.1182 Epoch: 8 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:14,684-Speed 5422.24 samples/sec Loss 6.7314 LearningRate 0.1182 Epoch: 8 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:22,200-Speed 5450.19 samples/sec Loss 6.6555 LearningRate 0.1182 Epoch: 8 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:29,804-Speed 5387.88 samples/sec Loss 6.7391 LearningRate 0.1182 Epoch: 8 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:37,372-Speed 5412.60 samples/sec Loss 6.7288 LearningRate 0.1181 Epoch: 8 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:44,965-Speed 5395.33 samples/sec Loss 6.7925 LearningRate 0.1181 Epoch: 8 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:42:52,532-Speed 5413.80 samples/sec Loss 6.7585 LearningRate 0.1181 Epoch: 8 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 
28 hours Training: 2022-01-08 13:43:00,119-Speed 5399.71 samples/sec Loss 6.6597 LearningRate 0.1181 Epoch: 8 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:07,776-Speed 5349.96 samples/sec Loss 6.6532 LearningRate 0.1181 Epoch: 8 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:15,276-Speed 5461.98 samples/sec Loss 6.7538 LearningRate 0.1180 Epoch: 8 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:22,790-Speed 5451.94 samples/sec Loss 6.7616 LearningRate 0.1180 Epoch: 8 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:30,314-Speed 5444.37 samples/sec Loss 6.7258 LearningRate 0.1180 Epoch: 8 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:37,765-Speed 5498.03 samples/sec Loss 6.7050 LearningRate 0.1180 Epoch: 8 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:45,230-Speed 5487.72 samples/sec Loss 6.7559 LearningRate 0.1180 Epoch: 8 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:43:52,809-Speed 5405.29 samples/sec Loss 6.7673 LearningRate 0.1179 Epoch: 8 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:44:00,432-Speed 5374.05 samples/sec Loss 6.7110 LearningRate 0.1179 Epoch: 8 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:44:07,928-Speed 5464.88 samples/sec Loss 6.7734 LearningRate 0.1179 Epoch: 8 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:44:15,514-Speed 5400.23 samples/sec Loss 6.7774 LearningRate 0.1179 Epoch: 8 Global Step: 83880 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:44:22,982-Speed 5485.45 samples/sec Loss 6.7304 LearningRate 0.1179 Epoch: 8 Global Step: 83890 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:44:30,478-Speed 
5465.48 samples/sec Loss 6.7360 LearningRate 0.1179 Epoch: 8 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:44:38,004-Speed 5443.01 samples/sec Loss 6.7352 LearningRate 0.1178 Epoch: 8 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:44:45,444-Speed 5506.49 samples/sec Loss 6.7983 LearningRate 0.1178 Epoch: 8 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:44:52,916-Speed 5482.04 samples/sec Loss 6.7211 LearningRate 0.1178 Epoch: 8 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:00,437-Speed 5447.23 samples/sec Loss 6.6908 LearningRate 0.1178 Epoch: 8 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:08,082-Speed 5358.30 samples/sec Loss 6.6897 LearningRate 0.1178 Epoch: 8 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:15,719-Speed 5364.01 samples/sec Loss 6.7198 LearningRate 0.1177 Epoch: 8 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:23,174-Speed 5495.03 samples/sec Loss 6.6763 LearningRate 0.1177 Epoch: 8 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:30,742-Speed 5413.09 samples/sec Loss 6.6443 LearningRate 0.1177 Epoch: 8 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:38,263-Speed 5446.65 samples/sec Loss 6.6775 LearningRate 0.1177 Epoch: 8 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:45:45,755-Speed 5467.87 samples/sec Loss 6.6942 LearningRate 0.1177 Epoch: 8 Global Step: 84000 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:46:29,826-[lfw][84000]XNorm: 22.226733 Training: 2022-01-08 13:46:29,827-[lfw][84000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-01-08 13:46:29,827-[lfw][84000]Accuracy-Highest: 0.99817 Training: 2022-01-08 
13:47:21,553-[cfp_fp][84000]XNorm: 20.178478 Training: 2022-01-08 13:47:21,554-[cfp_fp][84000]Accuracy-Flip: 0.98671+-0.00507 Training: 2022-01-08 13:47:21,555-[cfp_fp][84000]Accuracy-Highest: 0.98814 Training: 2022-01-08 13:48:07,354-[agedb_30][84000]XNorm: 21.844130 Training: 2022-01-08 13:48:07,355-[agedb_30][84000]Accuracy-Flip: 0.97350+-0.01004 Training: 2022-01-08 13:48:07,355-[agedb_30][84000]Accuracy-Highest: 0.97667 Training: 2022-01-08 13:48:14,836-Speed 274.75 samples/sec Loss 6.7012 LearningRate 0.1176 Epoch: 8 Global Step: 84010 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:48:22,470-Speed 5366.47 samples/sec Loss 6.7188 LearningRate 0.1176 Epoch: 8 Global Step: 84020 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:48:30,017-Speed 5428.36 samples/sec Loss 6.7422 LearningRate 0.1176 Epoch: 8 Global Step: 84030 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:48:37,653-Speed 5365.60 samples/sec Loss 6.6631 LearningRate 0.1176 Epoch: 8 Global Step: 84040 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:48:45,332-Speed 5335.16 samples/sec Loss 6.7228 LearningRate 0.1176 Epoch: 8 Global Step: 84050 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:48:52,835-Speed 5459.74 samples/sec Loss 6.6781 LearningRate 0.1175 Epoch: 8 Global Step: 84060 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:49:00,465-Speed 5369.05 samples/sec Loss 6.6801 LearningRate 0.1175 Epoch: 8 Global Step: 84070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:49:08,061-Speed 5393.18 samples/sec Loss 6.7173 LearningRate 0.1175 Epoch: 8 Global Step: 84080 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:49:15,531-Speed 5484.06 samples/sec Loss 6.6935 LearningRate 0.1175 Epoch: 8 Global Step: 84090 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:49:23,107-Speed 5407.66 samples/sec Loss 6.7323 LearningRate 0.1175 Epoch: 8 
Global Step: 84100 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:49:30,813-Speed 5315.75 samples/sec Loss 6.7788 LearningRate 0.1175 Epoch: 8 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:49:38,286-Speed 5481.49 samples/sec Loss 6.6747 LearningRate 0.1174 Epoch: 8 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:49:45,770-Speed 5473.98 samples/sec Loss 6.6829 LearningRate 0.1174 Epoch: 8 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:49:53,199-Speed 5513.87 samples/sec Loss 6.7607 LearningRate 0.1174 Epoch: 8 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:50:00,722-Speed 5445.92 samples/sec Loss 6.6801 LearningRate 0.1174 Epoch: 8 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:50:08,252-Speed 5440.40 samples/sec Loss 6.6725 LearningRate 0.1174 Epoch: 8 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:50:15,855-Speed 5388.36 samples/sec Loss 6.7276 LearningRate 0.1173 Epoch: 8 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:50:23,360-Speed 5457.82 samples/sec Loss 6.7520 LearningRate 0.1173 Epoch: 8 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:50:30,838-Speed 5478.20 samples/sec Loss 6.7588 LearningRate 0.1173 Epoch: 8 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:50:38,311-Speed 5482.16 samples/sec Loss 6.6819 LearningRate 0.1173 Epoch: 8 Global Step: 84200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:50:45,789-Speed 5478.01 samples/sec Loss 6.7098 LearningRate 0.1173 Epoch: 8 Global Step: 84210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:50:53,444-Speed 5351.60 samples/sec Loss 6.7440 LearningRate 0.1172 Epoch: 8 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 
28 hours Training: 2022-01-08 13:51:01,129-Speed 5330.50 samples/sec Loss 6.6540 LearningRate 0.1172 Epoch: 8 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:51:08,674-Speed 5429.61 samples/sec Loss 6.6974 LearningRate 0.1172 Epoch: 8 Global Step: 84240 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:51:16,106-Speed 5512.23 samples/sec Loss 6.6935 LearningRate 0.1172 Epoch: 8 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:51:23,606-Speed 5462.19 samples/sec Loss 6.6991 LearningRate 0.1172 Epoch: 8 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:51:31,098-Speed 5467.34 samples/sec Loss 6.6913 LearningRate 0.1171 Epoch: 8 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:51:38,522-Speed 5518.41 samples/sec Loss 6.6136 LearningRate 0.1171 Epoch: 8 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:51:45,964-Speed 5504.38 samples/sec Loss 6.6909 LearningRate 0.1171 Epoch: 8 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:51:53,440-Speed 5479.81 samples/sec Loss 6.6680 LearningRate 0.1171 Epoch: 8 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:00,894-Speed 5495.25 samples/sec Loss 6.7151 LearningRate 0.1171 Epoch: 8 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:08,439-Speed 5429.24 samples/sec Loss 6.6489 LearningRate 0.1171 Epoch: 8 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:15,909-Speed 5484.30 samples/sec Loss 6.7609 LearningRate 0.1170 Epoch: 8 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:23,339-Speed 5513.37 samples/sec Loss 6.6413 LearningRate 0.1170 Epoch: 8 Global Step: 84340 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:52:30,793-Speed 
5496.19 samples/sec Loss 6.7017 LearningRate 0.1170 Epoch: 8 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:38,258-Speed 5487.75 samples/sec Loss 6.6317 LearningRate 0.1170 Epoch: 8 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:45,750-Speed 5467.42 samples/sec Loss 6.6771 LearningRate 0.1170 Epoch: 8 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:52:53,258-Speed 5456.95 samples/sec Loss 6.7131 LearningRate 0.1169 Epoch: 8 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:00,723-Speed 5487.58 samples/sec Loss 6.7536 LearningRate 0.1169 Epoch: 8 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:08,296-Speed 5409.32 samples/sec Loss 6.6427 LearningRate 0.1169 Epoch: 8 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:15,907-Speed 5381.88 samples/sec Loss 6.6591 LearningRate 0.1169 Epoch: 8 Global Step: 84410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:23,433-Speed 5443.53 samples/sec Loss 6.6570 LearningRate 0.1169 Epoch: 8 Global Step: 84420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:30,897-Speed 5488.21 samples/sec Loss 6.7111 LearningRate 0.1168 Epoch: 8 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:38,355-Speed 5492.87 samples/sec Loss 6.6456 LearningRate 0.1168 Epoch: 8 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:53:45,815-Speed 5491.76 samples/sec Loss 6.7416 LearningRate 0.1168 Epoch: 8 Global Step: 84450 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:53:53,271-Speed 5494.38 samples/sec Loss 6.7094 LearningRate 0.1168 Epoch: 8 Global Step: 84460 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:54:00,823-Speed 5424.09 samples/sec Loss 6.6838 LearningRate 0.1168 
Epoch: 8 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:08,536-Speed 5311.21 samples/sec Loss 6.6623 LearningRate 0.1167 Epoch: 8 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:16,059-Speed 5446.01 samples/sec Loss 6.6271 LearningRate 0.1167 Epoch: 8 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:23,601-Speed 5431.73 samples/sec Loss 6.6358 LearningRate 0.1167 Epoch: 8 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:31,101-Speed 5461.80 samples/sec Loss 6.6724 LearningRate 0.1167 Epoch: 8 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:38,579-Speed 5478.11 samples/sec Loss 6.7000 LearningRate 0.1167 Epoch: 8 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:46,104-Speed 5443.71 samples/sec Loss 6.7025 LearningRate 0.1167 Epoch: 8 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:54:53,583-Speed 5477.68 samples/sec Loss 6.7022 LearningRate 0.1166 Epoch: 8 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:55:01,114-Speed 5440.03 samples/sec Loss 6.6689 LearningRate 0.1166 Epoch: 8 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:55:08,589-Speed 5480.33 samples/sec Loss 6.7097 LearningRate 0.1166 Epoch: 8 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:55:16,104-Speed 5450.61 samples/sec Loss 6.7188 LearningRate 0.1166 Epoch: 8 Global Step: 84570 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:55:23,577-Speed 5482.34 samples/sec Loss 6.6684 LearningRate 0.1166 Epoch: 8 Global Step: 84580 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:55:31,156-Speed 5405.10 samples/sec Loss 6.6914 LearningRate 0.1165 Epoch: 8 Global Step: 84590 Fp16 Grad Scale: 131072 
Required: 28 hours Training: 2022-01-08 13:55:38,629-Speed 5481.51 samples/sec Loss 6.6629 LearningRate 0.1165 Epoch: 8 Global Step: 84600 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:55:46,102-Speed 5482.29 samples/sec Loss 6.5725 LearningRate 0.1165 Epoch: 8 Global Step: 84610 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:55:53,720-Speed 5377.50 samples/sec Loss 6.6533 LearningRate 0.1165 Epoch: 8 Global Step: 84620 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:56:01,232-Speed 5453.68 samples/sec Loss 6.6627 LearningRate 0.1165 Epoch: 8 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 13:56:08,693-Speed 5490.46 samples/sec Loss 6.6788 LearningRate 0.1164 Epoch: 8 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:56:16,214-Speed 5447.09 samples/sec Loss 6.7088 LearningRate 0.1164 Epoch: 8 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:56:23,771-Speed 5420.36 samples/sec Loss 6.6853 LearningRate 0.1164 Epoch: 8 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:56:31,234-Speed 5489.00 samples/sec Loss 6.6726 LearningRate 0.1164 Epoch: 8 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:56:38,643-Speed 5529.18 samples/sec Loss 6.6997 LearningRate 0.1164 Epoch: 8 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:56:46,088-Speed 5502.89 samples/sec Loss 6.7408 LearningRate 0.1163 Epoch: 8 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:56:53,494-Speed 5531.40 samples/sec Loss 6.6564 LearningRate 0.1163 Epoch: 8 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:00,972-Speed 5478.18 samples/sec Loss 6.6698 LearningRate 0.1163 Epoch: 8 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 
13:57:08,470-Speed 5463.10 samples/sec Loss 6.6970 LearningRate 0.1163 Epoch: 8 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:15,973-Speed 5459.98 samples/sec Loss 6.6650 LearningRate 0.1163 Epoch: 8 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:23,397-Speed 5518.51 samples/sec Loss 6.6756 LearningRate 0.1163 Epoch: 8 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:30,928-Speed 5439.08 samples/sec Loss 6.6847 LearningRate 0.1162 Epoch: 8 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:38,415-Speed 5472.01 samples/sec Loss 6.6914 LearningRate 0.1162 Epoch: 8 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:45,929-Speed 5452.13 samples/sec Loss 6.6630 LearningRate 0.1162 Epoch: 8 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:57:53,512-Speed 5401.77 samples/sec Loss 6.6886 LearningRate 0.1162 Epoch: 8 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:01,175-Speed 5345.79 samples/sec Loss 6.6029 LearningRate 0.1162 Epoch: 8 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:08,862-Speed 5329.71 samples/sec Loss 6.6242 LearningRate 0.1161 Epoch: 8 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:16,552-Speed 5327.26 samples/sec Loss 6.6601 LearningRate 0.1161 Epoch: 8 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:24,084-Speed 5438.94 samples/sec Loss 6.7393 LearningRate 0.1161 Epoch: 8 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:31,585-Speed 5460.72 samples/sec Loss 6.6310 LearningRate 0.1161 Epoch: 8 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:39,044-Speed 5491.99 samples/sec Loss 6.6463 
LearningRate 0.1161 Epoch: 8 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:46,539-Speed 5465.66 samples/sec Loss 6.6610 LearningRate 0.1160 Epoch: 8 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:58:54,045-Speed 5457.74 samples/sec Loss 6.6573 LearningRate 0.1160 Epoch: 8 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:01,661-Speed 5378.89 samples/sec Loss 6.6794 LearningRate 0.1160 Epoch: 8 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:09,154-Speed 5467.21 samples/sec Loss 6.6377 LearningRate 0.1160 Epoch: 8 Global Step: 84880 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:16,661-Speed 5456.73 samples/sec Loss 6.6345 LearningRate 0.1160 Epoch: 8 Global Step: 84890 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:24,183-Speed 5445.97 samples/sec Loss 6.6645 LearningRate 0.1159 Epoch: 8 Global Step: 84900 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:31,871-Speed 5328.98 samples/sec Loss 6.6578 LearningRate 0.1159 Epoch: 8 Global Step: 84910 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:39,343-Speed 5482.57 samples/sec Loss 6.6042 LearningRate 0.1159 Epoch: 8 Global Step: 84920 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:46,835-Speed 5467.67 samples/sec Loss 6.6086 LearningRate 0.1159 Epoch: 8 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 13:59:54,290-Speed 5494.49 samples/sec Loss 6.6345 LearningRate 0.1159 Epoch: 8 Global Step: 84940 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:00:01,807-Speed 5449.58 samples/sec Loss 6.6443 LearningRate 0.1159 Epoch: 8 Global Step: 84950 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:00:09,297-Speed 5469.69 samples/sec Loss 6.6419 LearningRate 0.1158 Epoch: 8 Global Step: 84960 Fp16 
Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:00:16,709-Speed 5526.67 samples/sec Loss 6.6365 LearningRate 0.1158 Epoch: 8 Global Step: 84970 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:00:24,159-Speed 5498.98 samples/sec Loss 6.6748 LearningRate 0.1158 Epoch: 8 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:00:31,686-Speed 5442.66 samples/sec Loss 6.5995 LearningRate 0.1158 Epoch: 8 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:00:39,169-Speed 5474.28 samples/sec Loss 6.6704 LearningRate 0.1158 Epoch: 8 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:00:46,830-Speed 5346.93 samples/sec Loss 6.6853 LearningRate 0.1157 Epoch: 8 Global Step: 85010 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:00:54,355-Speed 5443.70 samples/sec Loss 6.6442 LearningRate 0.1157 Epoch: 8 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:01:01,875-Speed 5447.75 samples/sec Loss 6.6507 LearningRate 0.1157 Epoch: 8 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:01:09,405-Speed 5440.22 samples/sec Loss 6.5955 LearningRate 0.1157 Epoch: 8 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:01:16,930-Speed 5444.12 samples/sec Loss 6.6117 LearningRate 0.1157 Epoch: 8 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:01:24,447-Speed 5449.37 samples/sec Loss 6.6186 LearningRate 0.1156 Epoch: 8 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:01:32,030-Speed 5443.67 samples/sec Loss 6.6410 LearningRate 0.1156 Epoch: 8 Global Step: 85070 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:01:39,681-Speed 5370.53 samples/sec Loss 6.6339 LearningRate 0.1156 Epoch: 8 Global Step: 85080 Fp16 Grad Scale: 131072 Required: 28 hours Training: 
2022-01-08 14:01:47,137-Speed 5494.91 samples/sec Loss 6.6984 LearningRate 0.1156 Epoch: 8 Global Step: 85090 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:01:54,662-Speed 5443.44 samples/sec Loss 6.5941 LearningRate 0.1156 Epoch: 8 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:02,157-Speed 5465.28 samples/sec Loss 6.6401 LearningRate 0.1156 Epoch: 8 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:09,709-Speed 5424.98 samples/sec Loss 6.6007 LearningRate 0.1155 Epoch: 8 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:17,363-Speed 5352.07 samples/sec Loss 6.6086 LearningRate 0.1155 Epoch: 8 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:24,910-Speed 5428.33 samples/sec Loss 6.6530 LearningRate 0.1155 Epoch: 8 Global Step: 85140 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:32,360-Speed 5498.20 samples/sec Loss 6.6298 LearningRate 0.1155 Epoch: 8 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:39,837-Speed 5478.76 samples/sec Loss 6.6294 LearningRate 0.1155 Epoch: 8 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:47,373-Speed 5436.30 samples/sec Loss 6.5985 LearningRate 0.1154 Epoch: 8 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:02:54,938-Speed 5415.52 samples/sec Loss 6.6267 LearningRate 0.1154 Epoch: 8 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:03:02,497-Speed 5419.39 samples/sec Loss 6.6779 LearningRate 0.1154 Epoch: 8 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:03:10,057-Speed 5418.60 samples/sec Loss 6.5809 LearningRate 0.1154 Epoch: 8 Global Step: 85200 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:03:17,686-Speed 5369.93 samples/sec 
Loss 6.6340 LearningRate 0.1154 Epoch: 8 Global Step: 85210 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:03:25,241-Speed 5422.03 samples/sec Loss 6.6439 LearningRate 0.1153 Epoch: 8 Global Step: 85220 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:03:32,743-Speed 5460.76 samples/sec Loss 6.6016 LearningRate 0.1153 Epoch: 8 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:03:40,253-Speed 5454.66 samples/sec Loss 6.6276 LearningRate 0.1153 Epoch: 8 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:03:47,699-Speed 5501.96 samples/sec Loss 6.6952 LearningRate 0.1153 Epoch: 8 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:03:55,236-Speed 5435.13 samples/sec Loss 6.6535 LearningRate 0.1153 Epoch: 8 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:02,685-Speed 5498.82 samples/sec Loss 6.6060 LearningRate 0.1153 Epoch: 8 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:10,255-Speed 5412.12 samples/sec Loss 6.6795 LearningRate 0.1152 Epoch: 8 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:17,839-Speed 5401.68 samples/sec Loss 6.6528 LearningRate 0.1152 Epoch: 8 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:25,425-Speed 5399.88 samples/sec Loss 6.6296 LearningRate 0.1152 Epoch: 8 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:32,923-Speed 5464.34 samples/sec Loss 6.6822 LearningRate 0.1152 Epoch: 8 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:40,514-Speed 5396.32 samples/sec Loss 6.7030 LearningRate 0.1152 Epoch: 8 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:04:48,054-Speed 5432.67 samples/sec Loss 6.6743 LearningRate 0.1151 Epoch: 8 Global Step: 
85330 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:04:55,580-Speed 5443.56 samples/sec Loss 6.6477 LearningRate 0.1151 Epoch: 8 Global Step: 85340 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:03,093-Speed 5452.95 samples/sec Loss 6.6743 LearningRate 0.1151 Epoch: 8 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:10,529-Speed 5508.41 samples/sec Loss 6.6544 LearningRate 0.1151 Epoch: 8 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:17,977-Speed 5500.69 samples/sec Loss 6.6225 LearningRate 0.1151 Epoch: 8 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:25,482-Speed 5458.07 samples/sec Loss 6.6257 LearningRate 0.1150 Epoch: 8 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:32,962-Speed 5476.94 samples/sec Loss 6.6041 LearningRate 0.1150 Epoch: 8 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:40,474-Speed 5453.18 samples/sec Loss 6.6351 LearningRate 0.1150 Epoch: 8 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:48,063-Speed 5398.60 samples/sec Loss 6.6061 LearningRate 0.1150 Epoch: 8 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:05:55,655-Speed 5395.55 samples/sec Loss 6.5946 LearningRate 0.1150 Epoch: 8 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:06:03,122-Speed 5486.26 samples/sec Loss 6.5891 LearningRate 0.1149 Epoch: 8 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:06:10,639-Speed 5449.89 samples/sec Loss 6.6104 LearningRate 0.1149 Epoch: 8 Global Step: 85440 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:06:18,200-Speed 5417.61 samples/sec Loss 6.5990 LearningRate 0.1149 Epoch: 8 Global Step: 85450 Fp16 Grad Scale: 131072 Required: 28 hours 
Training: 2022-01-08 14:06:25,654-Speed 5496.27 samples/sec Loss 6.6013 LearningRate 0.1149 Epoch: 8 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:06:33,129-Speed 5480.35 samples/sec Loss 6.6100 LearningRate 0.1149 Epoch: 8 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:06:40,690-Speed 5417.63 samples/sec Loss 6.6041 LearningRate 0.1149 Epoch: 8 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:06:48,162-Speed 5482.99 samples/sec Loss 6.5650 LearningRate 0.1148 Epoch: 8 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:06:55,635-Speed 5481.49 samples/sec Loss 6.5526 LearningRate 0.1148 Epoch: 8 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:07:03,260-Speed 5372.80 samples/sec Loss 6.6491 LearningRate 0.1148 Epoch: 8 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:07:10,837-Speed 5406.43 samples/sec Loss 6.5901 LearningRate 0.1148 Epoch: 8 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:07:18,453-Speed 5379.14 samples/sec Loss 6.5953 LearningRate 0.1148 Epoch: 8 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:07:26,057-Speed 5387.30 samples/sec Loss 6.6504 LearningRate 0.1147 Epoch: 8 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:07:33,517-Speed 5491.35 samples/sec Loss 6.6399 LearningRate 0.1147 Epoch: 8 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:07:41,046-Speed 5440.83 samples/sec Loss 6.6101 LearningRate 0.1147 Epoch: 8 Global Step: 85560 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:07:48,794-Speed 5287.37 samples/sec Loss 6.6206 LearningRate 0.1147 Epoch: 8 Global Step: 85570 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:07:56,305-Speed 5453.68 
samples/sec Loss 6.6191 LearningRate 0.1147 Epoch: 8 Global Step: 85580 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:08:03,819-Speed 5452.27 samples/sec Loss 6.5498 LearningRate 0.1146 Epoch: 8 Global Step: 85590 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:08:11,269-Speed 5498.23 samples/sec Loss 6.6459 LearningRate 0.1146 Epoch: 8 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:08:18,813-Speed 5430.30 samples/sec Loss 6.6465 LearningRate 0.1146 Epoch: 8 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:08:26,326-Speed 5452.65 samples/sec Loss 6.6723 LearningRate 0.1146 Epoch: 8 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:08:33,931-Speed 5387.09 samples/sec Loss 6.5721 LearningRate 0.1146 Epoch: 8 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:08:41,654-Speed 5303.99 samples/sec Loss 6.5831 LearningRate 0.1146 Epoch: 8 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:08:49,200-Speed 5428.67 samples/sec Loss 6.6914 LearningRate 0.1145 Epoch: 8 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:08:56,712-Speed 5453.55 samples/sec Loss 6.6657 LearningRate 0.1145 Epoch: 8 Global Step: 85660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:04,293-Speed 5403.69 samples/sec Loss 6.6270 LearningRate 0.1145 Epoch: 8 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:11,959-Speed 5343.50 samples/sec Loss 6.6019 LearningRate 0.1145 Epoch: 8 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:19,426-Speed 5486.41 samples/sec Loss 6.6154 LearningRate 0.1145 Epoch: 8 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:26,952-Speed 5442.94 samples/sec Loss 6.5736 LearningRate 0.1144 Epoch: 8 
Global Step: 85700 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:09:34,357-Speed 5532.66 samples/sec Loss 6.5747 LearningRate 0.1144 Epoch: 8 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:41,873-Speed 5449.84 samples/sec Loss 6.5323 LearningRate 0.1144 Epoch: 8 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:49,356-Speed 5474.82 samples/sec Loss 6.5779 LearningRate 0.1144 Epoch: 8 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:09:56,929-Speed 5408.99 samples/sec Loss 6.6295 LearningRate 0.1144 Epoch: 8 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:10:04,445-Speed 5450.61 samples/sec Loss 6.6227 LearningRate 0.1143 Epoch: 8 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:10:11,980-Speed 5437.04 samples/sec Loss 6.6486 LearningRate 0.1143 Epoch: 8 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:10:19,599-Speed 5376.69 samples/sec Loss 6.6223 LearningRate 0.1143 Epoch: 8 Global Step: 85770 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:10:27,096-Speed 5463.86 samples/sec Loss 6.6199 LearningRate 0.1143 Epoch: 8 Global Step: 85780 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:10:34,731-Speed 5365.68 samples/sec Loss 6.6067 LearningRate 0.1143 Epoch: 8 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:10:42,214-Speed 5474.74 samples/sec Loss 6.6109 LearningRate 0.1143 Epoch: 8 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:10:49,747-Speed 5438.15 samples/sec Loss 6.5998 LearningRate 0.1142 Epoch: 8 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:10:57,275-Speed 5442.04 samples/sec Loss 6.6522 LearningRate 0.1142 Epoch: 8 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 28 
hours Training: 2022-01-08 14:11:04,865-Speed 5396.87 samples/sec Loss 6.6028 LearningRate 0.1142 Epoch: 8 Global Step: 85830 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:11:12,416-Speed 5425.28 samples/sec Loss 6.5670 LearningRate 0.1142 Epoch: 8 Global Step: 85840 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:11:19,922-Speed 5457.64 samples/sec Loss 6.6922 LearningRate 0.1142 Epoch: 8 Global Step: 85850 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:11:27,416-Speed 5466.59 samples/sec Loss 6.6007 LearningRate 0.1141 Epoch: 8 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:11:34,887-Speed 5483.07 samples/sec Loss 6.5650 LearningRate 0.1141 Epoch: 8 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:11:42,413-Speed 5443.73 samples/sec Loss 6.6010 LearningRate 0.1141 Epoch: 8 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:11:49,913-Speed 5461.53 samples/sec Loss 6.5969 LearningRate 0.1141 Epoch: 8 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:11:57,529-Speed 5379.08 samples/sec Loss 6.5985 LearningRate 0.1141 Epoch: 8 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:05,073-Speed 5430.28 samples/sec Loss 6.6449 LearningRate 0.1140 Epoch: 8 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:12,638-Speed 5414.93 samples/sec Loss 6.6295 LearningRate 0.1140 Epoch: 8 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:20,171-Speed 5438.18 samples/sec Loss 6.6713 LearningRate 0.1140 Epoch: 8 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:27,758-Speed 5399.72 samples/sec Loss 6.6258 LearningRate 0.1140 Epoch: 8 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:35,518-Speed 5279.09 
samples/sec Loss 6.5884 LearningRate 0.1140 Epoch: 8 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:43,032-Speed 5451.64 samples/sec Loss 6.6518 LearningRate 0.1140 Epoch: 8 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:50,613-Speed 5403.68 samples/sec Loss 6.6567 LearningRate 0.1139 Epoch: 8 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:12:58,185-Speed 5410.60 samples/sec Loss 6.6062 LearningRate 0.1139 Epoch: 8 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:13:05,727-Speed 5431.32 samples/sec Loss 6.6085 LearningRate 0.1139 Epoch: 8 Global Step: 85990 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:13:13,222-Speed 5466.01 samples/sec Loss 6.5295 LearningRate 0.1139 Epoch: 8 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:13:57,266-[lfw][86000]XNorm: 22.979678 Training: 2022-01-08 14:13:57,267-[lfw][86000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-01-08 14:13:57,268-[lfw][86000]Accuracy-Highest: 0.99817 Training: 2022-01-08 14:14:48,332-[cfp_fp][86000]XNorm: 20.765421 Training: 2022-01-08 14:14:48,333-[cfp_fp][86000]Accuracy-Flip: 0.98671+-0.00811 Training: 2022-01-08 14:14:48,333-[cfp_fp][86000]Accuracy-Highest: 0.98814 Training: 2022-01-08 14:15:33,898-[agedb_30][86000]XNorm: 22.846304 Training: 2022-01-08 14:15:33,899-[agedb_30][86000]Accuracy-Flip: 0.97600+-0.00700 Training: 2022-01-08 14:15:33,899-[agedb_30][86000]Accuracy-Highest: 0.97667 Training: 2022-01-08 14:15:41,169-Speed 276.86 samples/sec Loss 6.5563 LearningRate 0.1139 Epoch: 8 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:15:48,760-Speed 5397.11 samples/sec Loss 6.6396 LearningRate 0.1138 Epoch: 8 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:15:56,323-Speed 5416.77 samples/sec Loss 6.6102 LearningRate 
0.1138 Epoch: 8 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:03,990-Speed 5343.52 samples/sec Loss 6.6289 LearningRate 0.1138 Epoch: 8 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:11,504-Speed 5452.26 samples/sec Loss 6.5637 LearningRate 0.1138 Epoch: 8 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:19,090-Speed 5400.18 samples/sec Loss 6.5569 LearningRate 0.1138 Epoch: 8 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:26,604-Speed 5452.33 samples/sec Loss 6.6015 LearningRate 0.1137 Epoch: 8 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:34,083-Speed 5477.11 samples/sec Loss 6.5902 LearningRate 0.1137 Epoch: 8 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:41,611-Speed 5440.80 samples/sec Loss 6.5898 LearningRate 0.1137 Epoch: 8 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:49,109-Speed 5463.97 samples/sec Loss 6.5686 LearningRate 0.1137 Epoch: 8 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:16:56,659-Speed 5426.01 samples/sec Loss 6.6322 LearningRate 0.1137 Epoch: 8 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:17:04,211-Speed 5424.04 samples/sec Loss 6.5806 LearningRate 0.1137 Epoch: 8 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:17:11,721-Speed 5455.11 samples/sec Loss 6.5742 LearningRate 0.1136 Epoch: 8 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:17:19,209-Speed 5470.80 samples/sec Loss 6.5818 LearningRate 0.1136 Epoch: 8 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:17:26,683-Speed 5481.31 samples/sec Loss 6.6351 LearningRate 0.1136 Epoch: 8 Global Step: 86150 Fp16 Grad Scale: 
32768 Required: 28 hours Training: 2022-01-08 14:17:34,225-Speed 5430.95 samples/sec Loss 6.6104 LearningRate 0.1136 Epoch: 8 Global Step: 86160 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:17:41,750-Speed 5443.66 samples/sec Loss 6.6033 LearningRate 0.1136 Epoch: 8 Global Step: 86170 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:17:49,346-Speed 5393.56 samples/sec Loss 6.5982 LearningRate 0.1135 Epoch: 8 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:17:56,912-Speed 5414.45 samples/sec Loss 6.6266 LearningRate 0.1135 Epoch: 8 Global Step: 86190 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:18:04,486-Speed 5408.20 samples/sec Loss 6.5845 LearningRate 0.1135 Epoch: 8 Global Step: 86200 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:18:12,146-Speed 5347.55 samples/sec Loss 6.5876 LearningRate 0.1135 Epoch: 8 Global Step: 86210 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:18:19,896-Speed 5286.41 samples/sec Loss 6.5313 LearningRate 0.1135 Epoch: 8 Global Step: 86220 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:18:27,502-Speed 5386.22 samples/sec Loss 6.5755 LearningRate 0.1134 Epoch: 8 Global Step: 86230 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:18:35,169-Speed 5342.66 samples/sec Loss 6.6144 LearningRate 0.1134 Epoch: 8 Global Step: 86240 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:18:42,809-Speed 5361.70 samples/sec Loss 6.5775 LearningRate 0.1134 Epoch: 8 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:18:50,274-Speed 5488.10 samples/sec Loss 6.5691 LearningRate 0.1134 Epoch: 8 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:18:57,761-Speed 5471.48 samples/sec Loss 6.5938 LearningRate 0.1134 Epoch: 8 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 
14:19:05,347-Speed 5400.12 samples/sec Loss 6.5649 LearningRate 0.1134 Epoch: 8 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:12,887-Speed 5432.53 samples/sec Loss 6.5641 LearningRate 0.1133 Epoch: 8 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:20,437-Speed 5426.64 samples/sec Loss 6.5344 LearningRate 0.1133 Epoch: 8 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:27,905-Speed 5485.33 samples/sec Loss 6.5587 LearningRate 0.1133 Epoch: 8 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:35,348-Speed 5503.89 samples/sec Loss 6.6037 LearningRate 0.1133 Epoch: 8 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:42,851-Speed 5459.75 samples/sec Loss 6.5260 LearningRate 0.1133 Epoch: 8 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:50,406-Speed 5422.39 samples/sec Loss 6.5502 LearningRate 0.1132 Epoch: 8 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:19:58,026-Speed 5375.97 samples/sec Loss 6.5923 LearningRate 0.1132 Epoch: 8 Global Step: 86350 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:20:05,717-Speed 5327.08 samples/sec Loss 6.5624 LearningRate 0.1132 Epoch: 8 Global Step: 86360 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:20:13,324-Speed 5384.47 samples/sec Loss 6.5669 LearningRate 0.1132 Epoch: 8 Global Step: 86370 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:20:20,823-Speed 5463.20 samples/sec Loss 6.5674 LearningRate 0.1132 Epoch: 8 Global Step: 86380 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:20:28,331-Speed 5456.76 samples/sec Loss 6.5360 LearningRate 0.1131 Epoch: 8 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:20:35,896-Speed 5415.18 samples/sec Loss 6.5466 
LearningRate 0.1131 Epoch: 8 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:20:43,465-Speed 5411.98 samples/sec Loss 6.5258 LearningRate 0.1131 Epoch: 8 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:20:51,132-Speed 5343.12 samples/sec Loss 6.5610 LearningRate 0.1131 Epoch: 8 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:20:58,777-Speed 5357.94 samples/sec Loss 6.5281 LearningRate 0.1131 Epoch: 8 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:06,429-Speed 5353.91 samples/sec Loss 6.5732 LearningRate 0.1131 Epoch: 8 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:14,037-Speed 5385.00 samples/sec Loss 6.5941 LearningRate 0.1130 Epoch: 8 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:21,717-Speed 5333.49 samples/sec Loss 6.6196 LearningRate 0.1130 Epoch: 8 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:29,213-Speed 5465.20 samples/sec Loss 6.5997 LearningRate 0.1130 Epoch: 8 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:36,725-Speed 5453.35 samples/sec Loss 6.5409 LearningRate 0.1130 Epoch: 8 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:44,239-Speed 5451.44 samples/sec Loss 6.6006 LearningRate 0.1130 Epoch: 8 Global Step: 86490 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:21:51,696-Speed 5494.04 samples/sec Loss 6.5397 LearningRate 0.1129 Epoch: 8 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:21:59,142-Speed 5501.30 samples/sec Loss 6.5218 LearningRate 0.1129 Epoch: 8 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:06,691-Speed 5426.77 samples/sec Loss 6.5934 LearningRate 0.1129 Epoch: 8 Global Step: 86520 Fp16 
Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:14,337-Speed 5357.69 samples/sec Loss 6.5721 LearningRate 0.1129 Epoch: 8 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:22,108-Speed 5271.34 samples/sec Loss 6.5413 LearningRate 0.1129 Epoch: 8 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:29,599-Speed 5468.31 samples/sec Loss 6.5416 LearningRate 0.1128 Epoch: 8 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:37,194-Speed 5394.07 samples/sec Loss 6.5346 LearningRate 0.1128 Epoch: 8 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:44,634-Speed 5506.11 samples/sec Loss 6.5697 LearningRate 0.1128 Epoch: 8 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:52,105-Speed 5482.85 samples/sec Loss 6.5613 LearningRate 0.1128 Epoch: 8 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:22:59,536-Speed 5512.96 samples/sec Loss 6.5719 LearningRate 0.1128 Epoch: 8 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:23:07,100-Speed 5415.25 samples/sec Loss 6.5800 LearningRate 0.1128 Epoch: 8 Global Step: 86600 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:23:14,603-Speed 5460.57 samples/sec Loss 6.6131 LearningRate 0.1127 Epoch: 8 Global Step: 86610 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:23:22,184-Speed 5403.52 samples/sec Loss 6.5258 LearningRate 0.1127 Epoch: 8 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:23:29,852-Speed 5341.80 samples/sec Loss 6.5403 LearningRate 0.1127 Epoch: 8 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:23:37,415-Speed 5416.68 samples/sec Loss 6.5195 LearningRate 0.1127 Epoch: 8 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 
2022-01-08 14:23:44,828-Speed 5526.60 samples/sec Loss 6.5532 LearningRate 0.1127 Epoch: 8 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:23:52,253-Speed 5516.99 samples/sec Loss 6.5063 LearningRate 0.1126 Epoch: 8 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:23:59,800-Speed 5427.91 samples/sec Loss 6.4804 LearningRate 0.1126 Epoch: 8 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:24:07,280-Speed 5476.46 samples/sec Loss 6.5180 LearningRate 0.1126 Epoch: 8 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:24:14,754-Speed 5481.61 samples/sec Loss 6.5770 LearningRate 0.1126 Epoch: 8 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:24:22,301-Speed 5428.31 samples/sec Loss 6.5446 LearningRate 0.1126 Epoch: 8 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:24:29,799-Speed 5463.02 samples/sec Loss 6.5686 LearningRate 0.1125 Epoch: 8 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:24:37,400-Speed 5389.41 samples/sec Loss 6.5436 LearningRate 0.1125 Epoch: 8 Global Step: 86720 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:24:44,931-Speed 5439.19 samples/sec Loss 6.5260 LearningRate 0.1125 Epoch: 8 Global Step: 86730 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:24:52,401-Speed 5484.85 samples/sec Loss 6.5867 LearningRate 0.1125 Epoch: 8 Global Step: 86740 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:24:59,884-Speed 5474.24 samples/sec Loss 6.5263 LearningRate 0.1125 Epoch: 8 Global Step: 86750 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:07,420-Speed 5436.13 samples/sec Loss 6.6333 LearningRate 0.1125 Epoch: 8 Global Step: 86760 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:14,962-Speed 5431.54 samples/sec 
Loss 6.5863 LearningRate 0.1124 Epoch: 8 Global Step: 86770 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:22,526-Speed 5415.73 samples/sec Loss 6.5230 LearningRate 0.1124 Epoch: 8 Global Step: 86780 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:30,059-Speed 5437.89 samples/sec Loss 6.5364 LearningRate 0.1124 Epoch: 8 Global Step: 86790 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:37,660-Speed 5389.39 samples/sec Loss 6.5455 LearningRate 0.1124 Epoch: 8 Global Step: 86800 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:45,227-Speed 5414.27 samples/sec Loss 6.5188 LearningRate 0.1124 Epoch: 8 Global Step: 86810 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:25:52,869-Speed 5360.01 samples/sec Loss 6.5139 LearningRate 0.1123 Epoch: 8 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:26:00,420-Speed 5425.60 samples/sec Loss 6.5544 LearningRate 0.1123 Epoch: 8 Global Step: 86830 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-01-08 14:26:07,982-Speed 5417.16 samples/sec Loss 6.5744 LearningRate 0.1123 Epoch: 8 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:26:15,553-Speed 5411.09 samples/sec Loss 6.5436 LearningRate 0.1123 Epoch: 8 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:26:23,103-Speed 5425.60 samples/sec Loss 6.5187 LearningRate 0.1123 Epoch: 8 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:26:30,803-Speed 5320.29 samples/sec Loss 6.5088 LearningRate 0.1123 Epoch: 8 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:26:38,372-Speed 5412.74 samples/sec Loss 6.5493 LearningRate 0.1122 Epoch: 8 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:26:45,869-Speed 5464.45 samples/sec Loss 6.5725 LearningRate 0.1122 Epoch: 8 Global Step: 
86890 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:26:53,347-Speed 5478.02 samples/sec Loss 6.5601 LearningRate 0.1122 Epoch: 8 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:27:00,977-Speed 5368.46 samples/sec Loss 6.4556 LearningRate 0.1122 Epoch: 8 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:27:08,511-Speed 5437.79 samples/sec Loss 6.4469 LearningRate 0.1122 Epoch: 8 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:27:15,981-Speed 5483.66 samples/sec Loss 6.5879 LearningRate 0.1121 Epoch: 8 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:27:23,468-Speed 5471.80 samples/sec Loss 6.5859 LearningRate 0.1121 Epoch: 8 Global Step: 86940 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:27:31,028-Speed 5418.99 samples/sec Loss 6.5576 LearningRate 0.1121 Epoch: 8 Global Step: 86950 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:27:38,541-Speed 5452.84 samples/sec Loss 6.5761 LearningRate 0.1121 Epoch: 8 Global Step: 86960 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:27:46,050-Speed 5455.52 samples/sec Loss 6.5944 LearningRate 0.1121 Epoch: 8 Global Step: 86970 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:27:53,670-Speed 5375.44 samples/sec Loss 6.5158 LearningRate 0.1120 Epoch: 8 Global Step: 86980 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:28:01,228-Speed 5420.18 samples/sec Loss 6.4729 LearningRate 0.1120 Epoch: 8 Global Step: 86990 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:28:08,700-Speed 5483.08 samples/sec Loss 6.4808 LearningRate 0.1120 Epoch: 8 Global Step: 87000 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:28:16,094-Speed 5540.33 samples/sec Loss 6.5204 LearningRate 0.1120 Epoch: 8 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 28 hours 
Training: 2022-01-08 14:28:23,533-Speed 5506.61 samples/sec Loss 6.5357 LearningRate 0.1120 Epoch: 8 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:28:31,056-Speed 5445.34 samples/sec Loss 6.5431 LearningRate 0.1120 Epoch: 8 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:28:38,612-Speed 5422.04 samples/sec Loss 6.6078 LearningRate 0.1119 Epoch: 8 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:28:46,089-Speed 5479.19 samples/sec Loss 6.4954 LearningRate 0.1119 Epoch: 8 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:28:53,614-Speed 5443.43 samples/sec Loss 6.5551 LearningRate 0.1119 Epoch: 8 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:01,238-Speed 5373.54 samples/sec Loss 6.5791 LearningRate 0.1119 Epoch: 8 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:08,804-Speed 5414.11 samples/sec Loss 6.6358 LearningRate 0.1119 Epoch: 8 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:16,292-Speed 5471.07 samples/sec Loss 6.5703 LearningRate 0.1118 Epoch: 8 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:23,834-Speed 5431.84 samples/sec Loss 6.5423 LearningRate 0.1118 Epoch: 8 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:31,344-Speed 5454.80 samples/sec Loss 6.6079 LearningRate 0.1118 Epoch: 8 Global Step: 87110 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:29:38,794-Speed 5498.79 samples/sec Loss 6.4643 LearningRate 0.1118 Epoch: 8 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:46,179-Speed 5547.04 samples/sec Loss 6.5327 LearningRate 0.1118 Epoch: 8 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:29:53,611-Speed 5511.84 
samples/sec Loss 6.5204 LearningRate 0.1117 Epoch: 8 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:01,097-Speed 5472.29 samples/sec Loss 6.4250 LearningRate 0.1117 Epoch: 8 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:08,591-Speed 5466.48 samples/sec Loss 6.4492 LearningRate 0.1117 Epoch: 8 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:16,091-Speed 5461.71 samples/sec Loss 6.4994 LearningRate 0.1117 Epoch: 8 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:23,613-Speed 5446.37 samples/sec Loss 6.4958 LearningRate 0.1117 Epoch: 8 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:31,223-Speed 5383.45 samples/sec Loss 6.5020 LearningRate 0.1117 Epoch: 8 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:38,823-Speed 5389.86 samples/sec Loss 6.5312 LearningRate 0.1116 Epoch: 8 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:46,494-Speed 5340.44 samples/sec Loss 6.5205 LearningRate 0.1116 Epoch: 8 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:30:54,009-Speed 5451.47 samples/sec Loss 6.5057 LearningRate 0.1116 Epoch: 8 Global Step: 87220 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:31:01,720-Speed 5312.68 samples/sec Loss 6.5945 LearningRate 0.1116 Epoch: 8 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:31:09,246-Speed 5442.82 samples/sec Loss 6.4951 LearningRate 0.1116 Epoch: 8 Global Step: 87240 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:31:16,859-Speed 5381.34 samples/sec Loss 6.5164 LearningRate 0.1115 Epoch: 8 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:31:24,323-Speed 5488.37 samples/sec Loss 6.5187 LearningRate 0.1115 Epoch: 
8 Global Step: 87260 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:31:31,778-Speed 5495.10 samples/sec Loss 6.4611 LearningRate 0.1115 Epoch: 8 Global Step: 87270 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:31:39,384-Speed 5385.43 samples/sec Loss 6.6098 LearningRate 0.1115 Epoch: 8 Global Step: 87280 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:31:46,903-Speed 5448.06 samples/sec Loss 6.5308 LearningRate 0.1115 Epoch: 8 Global Step: 87290 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:31:54,389-Speed 5472.89 samples/sec Loss 6.5394 LearningRate 0.1115 Epoch: 8 Global Step: 87300 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:01,931-Speed 5431.46 samples/sec Loss 6.4782 LearningRate 0.1114 Epoch: 8 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:09,690-Speed 5279.75 samples/sec Loss 6.5320 LearningRate 0.1114 Epoch: 8 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:17,196-Speed 5457.28 samples/sec Loss 6.5071 LearningRate 0.1114 Epoch: 8 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:24,699-Speed 5460.43 samples/sec Loss 6.5851 LearningRate 0.1114 Epoch: 8 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:32,216-Speed 5449.60 samples/sec Loss 6.5002 LearningRate 0.1114 Epoch: 8 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:39,656-Speed 5506.03 samples/sec Loss 6.5565 LearningRate 0.1113 Epoch: 8 Global Step: 87360 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:32:47,128-Speed 5482.42 samples/sec Loss 6.4385 LearningRate 0.1113 Epoch: 8 Global Step: 87370 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-01-08 14:32:54,667-Speed 5434.26 samples/sec Loss 6.5946 LearningRate 0.1113 Epoch: 8 Global Step: 87380 Fp16 Grad Scale: 131072 
Required: 28 hours Training: 2022-01-08 14:33:02,167-Speed 5461.97 samples/sec Loss 6.5329 LearningRate 0.1113 Epoch: 8 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:33:09,661-Speed 5466.56 samples/sec Loss 6.5448 LearningRate 0.1113 Epoch: 8 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-01-08 14:33:17,159-Speed 5463.71 samples/sec Loss 6.4769 LearningRate 0.1112 Epoch: 8 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:33:24,560-Speed 5534.54 samples/sec Loss 6.5296 LearningRate 0.1112 Epoch: 8 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:33:32,118-Speed 5420.46 samples/sec Loss 6.5081 LearningRate 0.1112 Epoch: 8 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:33:39,602-Speed 5474.41 samples/sec Loss 6.5596 LearningRate 0.1112 Epoch: 8 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:33:47,071-Speed 5484.44 samples/sec Loss 6.5460 LearningRate 0.1112 Epoch: 8 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:33:54,528-Speed 5493.27 samples/sec Loss 6.5015 LearningRate 0.1112 Epoch: 8 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:34:01,976-Speed 5499.93 samples/sec Loss 6.4388 LearningRate 0.1111 Epoch: 8 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:34:09,447-Speed 5483.79 samples/sec Loss 6.4546 LearningRate 0.1111 Epoch: 8 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:34:16,973-Speed 5442.90 samples/sec Loss 6.4014 LearningRate 0.1111 Epoch: 8 Global Step: 87490 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:34:24,406-Speed 5511.64 samples/sec Loss 6.5205 LearningRate 0.1111 Epoch: 8 Global Step: 87500 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 
14:34:31,910-Speed 5458.76 samples/sec Loss 6.4658 LearningRate 0.1111 Epoch: 8 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:34:39,381-Speed 5484.16 samples/sec Loss 6.4696 LearningRate 0.1110 Epoch: 8 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:34:46,874-Speed 5466.66 samples/sec Loss 6.5542 LearningRate 0.1110 Epoch: 8 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:34:54,292-Speed 5522.79 samples/sec Loss 6.5198 LearningRate 0.1110 Epoch: 8 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:01,724-Speed 5511.84 samples/sec Loss 6.5048 LearningRate 0.1110 Epoch: 8 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:09,234-Speed 5455.39 samples/sec Loss 6.5135 LearningRate 0.1110 Epoch: 8 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:16,810-Speed 5407.30 samples/sec Loss 6.4934 LearningRate 0.1109 Epoch: 8 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:24,254-Speed 5503.14 samples/sec Loss 6.4770 LearningRate 0.1109 Epoch: 8 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:31,762-Speed 5455.91 samples/sec Loss 6.4936 LearningRate 0.1109 Epoch: 8 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:39,251-Speed 5470.23 samples/sec Loss 6.5100 LearningRate 0.1109 Epoch: 8 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:35:46,822-Speed 5411.03 samples/sec Loss 6.5049 LearningRate 0.1109 Epoch: 8 Global Step: 87610 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:35:54,312-Speed 5469.46 samples/sec Loss 6.5568 LearningRate 0.1109 Epoch: 8 Global Step: 87620 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:01,764-Speed 5497.06 samples/sec Loss 6.4914 
LearningRate 0.1108 Epoch: 8 Global Step: 87630 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:09,399-Speed 5365.92 samples/sec Loss 6.5042 LearningRate 0.1108 Epoch: 8 Global Step: 87640 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:17,160-Speed 5278.08 samples/sec Loss 6.5146 LearningRate 0.1108 Epoch: 8 Global Step: 87650 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:24,774-Speed 5380.79 samples/sec Loss 6.4296 LearningRate 0.1108 Epoch: 8 Global Step: 87660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:32,311-Speed 5434.65 samples/sec Loss 6.4777 LearningRate 0.1108 Epoch: 8 Global Step: 87670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:39,916-Speed 5387.53 samples/sec Loss 6.5373 LearningRate 0.1107 Epoch: 8 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:36:47,399-Speed 5474.22 samples/sec Loss 6.4894 LearningRate 0.1107 Epoch: 8 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:36:54,950-Speed 5424.86 samples/sec Loss 6.4816 LearningRate 0.1107 Epoch: 8 Global Step: 87700 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:02,499-Speed 5427.11 samples/sec Loss 6.5544 LearningRate 0.1107 Epoch: 8 Global Step: 87710 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:10,140-Speed 5361.04 samples/sec Loss 6.5143 LearningRate 0.1107 Epoch: 8 Global Step: 87720 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:17,858-Speed 5308.00 samples/sec Loss 6.5364 LearningRate 0.1107 Epoch: 8 Global Step: 87730 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:25,416-Speed 5419.87 samples/sec Loss 6.5567 LearningRate 0.1106 Epoch: 8 Global Step: 87740 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:33,002-Speed 5400.28 samples/sec Loss 6.4690 LearningRate 0.1106 Epoch: 8 Global Step: 87750 
Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:40,524-Speed 5446.62 samples/sec Loss 6.4855 LearningRate 0.1106 Epoch: 8 Global Step: 87760 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:48,012-Speed 5470.70 samples/sec Loss 6.4897 LearningRate 0.1106 Epoch: 8 Global Step: 87770 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:37:55,539-Speed 5442.65 samples/sec Loss 6.4411 LearningRate 0.1106 Epoch: 8 Global Step: 87780 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:38:03,190-Speed 5354.05 samples/sec Loss 6.4324 LearningRate 0.1105 Epoch: 8 Global Step: 87790 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:38:10,841-Speed 5354.13 samples/sec Loss 6.5131 LearningRate 0.1105 Epoch: 8 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:38:18,524-Speed 5332.31 samples/sec Loss 6.5238 LearningRate 0.1105 Epoch: 8 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:38:25,999-Speed 5479.91 samples/sec Loss 6.5248 LearningRate 0.1105 Epoch: 8 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:38:33,435-Speed 5509.22 samples/sec Loss 6.4973 LearningRate 0.1105 Epoch: 8 Global Step: 87830 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:38:40,897-Speed 5490.32 samples/sec Loss 6.5034 LearningRate 0.1105 Epoch: 8 Global Step: 87840 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:38:48,574-Speed 5335.66 samples/sec Loss 6.4885 LearningRate 0.1104 Epoch: 8 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:38:56,268-Speed 5324.65 samples/sec Loss 6.5831 LearningRate 0.1104 Epoch: 8 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:39:03,948-Speed 5334.26 samples/sec Loss 6.5015 LearningRate 0.1104 Epoch: 8 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 27 hours Training: 
2022-01-08 14:39:11,682-Speed 5296.30 samples/sec Loss 6.4384 LearningRate 0.1104 Epoch: 8 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:39:19,361-Speed 5334.91 samples/sec Loss 6.5189 LearningRate 0.1104 Epoch: 8 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:39:27,078-Speed 5308.63 samples/sec Loss 6.5050 LearningRate 0.1103 Epoch: 8 Global Step: 87900 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:39:34,616-Speed 5434.20 samples/sec Loss 6.4674 LearningRate 0.1103 Epoch: 8 Global Step: 87910 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:39:42,180-Speed 5415.98 samples/sec Loss 6.5106 LearningRate 0.1103 Epoch: 8 Global Step: 87920 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:39:49,761-Speed 5403.94 samples/sec Loss 6.4786 LearningRate 0.1103 Epoch: 8 Global Step: 87930 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:39:57,319-Speed 5419.74 samples/sec Loss 6.5092 LearningRate 0.1103 Epoch: 8 Global Step: 87940 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:40:04,786-Speed 5486.17 samples/sec Loss 6.5093 LearningRate 0.1102 Epoch: 8 Global Step: 87950 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:40:12,229-Speed 5504.15 samples/sec Loss 6.4786 LearningRate 0.1102 Epoch: 8 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:40:19,859-Speed 5369.13 samples/sec Loss 6.5096 LearningRate 0.1102 Epoch: 8 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:40:27,308-Speed 5499.31 samples/sec Loss 6.4696 LearningRate 0.1102 Epoch: 8 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:40:34,819-Speed 5453.94 samples/sec Loss 6.5140 LearningRate 0.1102 Epoch: 8 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:40:42,313-Speed 5465.99 samples/sec 
Loss 6.4865 LearningRate 0.1102 Epoch: 8 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:41:26,191-[lfw][88000]XNorm: 22.549660 Training: 2022-01-08 14:41:26,192-[lfw][88000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-01-08 14:41:26,192-[lfw][88000]Accuracy-Highest: 0.99817 Training: 2022-01-08 14:42:17,888-[cfp_fp][88000]XNorm: 20.955835 Training: 2022-01-08 14:42:17,890-[cfp_fp][88000]Accuracy-Flip: 0.98814+-0.00649 Training: 2022-01-08 14:42:17,890-[cfp_fp][88000]Accuracy-Highest: 0.98814 Training: 2022-01-08 14:43:04,039-[agedb_30][88000]XNorm: 22.499002 Training: 2022-01-08 14:43:04,040-[agedb_30][88000]Accuracy-Flip: 0.97617+-0.00757 Training: 2022-01-08 14:43:04,040-[agedb_30][88000]Accuracy-Highest: 0.97667 Training: 2022-01-08 14:43:11,656-Speed 274.27 samples/sec Loss 6.4254 LearningRate 0.1101 Epoch: 8 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:43:19,349-Speed 5325.90 samples/sec Loss 6.4497 LearningRate 0.1101 Epoch: 8 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:43:26,926-Speed 5407.72 samples/sec Loss 6.4636 LearningRate 0.1101 Epoch: 8 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:43:34,376-Speed 5502.39 samples/sec Loss 6.4747 LearningRate 0.1101 Epoch: 8 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:43:41,836-Speed 5491.62 samples/sec Loss 6.4934 LearningRate 0.1101 Epoch: 8 Global Step: 88050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:43:49,345-Speed 5456.24 samples/sec Loss 6.5079 LearningRate 0.1100 Epoch: 8 Global Step: 88060 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:43:56,830-Speed 5473.27 samples/sec Loss 6.5134 LearningRate 0.1100 Epoch: 8 Global Step: 88070 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:44:04,324-Speed 5466.64 samples/sec Loss 6.5256 LearningRate 0.1100 Epoch: 
8 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:11,841-Speed 5450.17 samples/sec Loss 6.4902 LearningRate 0.1100 Epoch: 8 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:19,560-Speed 5306.51 samples/sec Loss 6.5148 LearningRate 0.1100 Epoch: 8 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:27,097-Speed 5435.66 samples/sec Loss 6.4969 LearningRate 0.1100 Epoch: 8 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:34,642-Speed 5429.70 samples/sec Loss 6.4717 LearningRate 0.1099 Epoch: 8 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:42,254-Speed 5381.73 samples/sec Loss 6.4687 LearningRate 0.1099 Epoch: 8 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:49,768-Speed 5451.88 samples/sec Loss 6.5128 LearningRate 0.1099 Epoch: 8 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:44:57,601-Speed 5229.88 samples/sec Loss 6.4657 LearningRate 0.1099 Epoch: 8 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:05,273-Speed 5339.62 samples/sec Loss 6.5135 LearningRate 0.1099 Epoch: 8 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:12,781-Speed 5457.05 samples/sec Loss 6.4891 LearningRate 0.1098 Epoch: 8 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:20,315-Speed 5437.39 samples/sec Loss 6.4667 LearningRate 0.1098 Epoch: 8 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:27,786-Speed 5482.95 samples/sec Loss 6.4431 LearningRate 0.1098 Epoch: 8 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:35,278-Speed 5468.14 samples/sec Loss 6.4601 LearningRate 0.1098 Epoch: 8 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 
27 hours Training: 2022-01-08 14:45:42,783-Speed 5458.26 samples/sec Loss 6.4890 LearningRate 0.1098 Epoch: 8 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:50,257-Speed 5481.55 samples/sec Loss 6.5283 LearningRate 0.1097 Epoch: 8 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:45:57,848-Speed 5396.07 samples/sec Loss 6.4621 LearningRate 0.1097 Epoch: 8 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:46:05,504-Speed 5351.53 samples/sec Loss 6.5228 LearningRate 0.1097 Epoch: 8 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:46:13,256-Speed 5283.84 samples/sec Loss 6.4658 LearningRate 0.1097 Epoch: 8 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:46:20,841-Speed 5401.05 samples/sec Loss 6.4720 LearningRate 0.1097 Epoch: 8 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:46:28,398-Speed 5420.82 samples/sec Loss 6.4756 LearningRate 0.1097 Epoch: 8 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:46:35,863-Speed 5487.75 samples/sec Loss 6.5055 LearningRate 0.1096 Epoch: 8 Global Step: 88280 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:46:43,378-Speed 5451.10 samples/sec Loss 6.4524 LearningRate 0.1096 Epoch: 8 Global Step: 88290 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:46:50,989-Speed 5382.20 samples/sec Loss 6.4151 LearningRate 0.1096 Epoch: 8 Global Step: 88300 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:46:58,598-Speed 5383.90 samples/sec Loss 6.4476 LearningRate 0.1096 Epoch: 8 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:06,117-Speed 5448.43 samples/sec Loss 6.4878 LearningRate 0.1096 Epoch: 8 Global Step: 88320 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:13,628-Speed 
5453.81 samples/sec Loss 6.4546 LearningRate 0.1095 Epoch: 8 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:21,059-Speed 5512.54 samples/sec Loss 6.4686 LearningRate 0.1095 Epoch: 8 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:28,600-Speed 5432.82 samples/sec Loss 6.4621 LearningRate 0.1095 Epoch: 8 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:36,174-Speed 5408.08 samples/sec Loss 6.4167 LearningRate 0.1095 Epoch: 8 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:43,692-Speed 5449.10 samples/sec Loss 6.4290 LearningRate 0.1095 Epoch: 8 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:51,240-Speed 5427.41 samples/sec Loss 6.4395 LearningRate 0.1095 Epoch: 8 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:47:58,702-Speed 5490.22 samples/sec Loss 6.4060 LearningRate 0.1094 Epoch: 8 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:48:06,142-Speed 5505.60 samples/sec Loss 6.4112 LearningRate 0.1094 Epoch: 8 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:48:13,668-Speed 5443.19 samples/sec Loss 6.4438 LearningRate 0.1094 Epoch: 8 Global Step: 88410 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:48:21,191-Speed 5445.80 samples/sec Loss 6.4655 LearningRate 0.1094 Epoch: 8 Global Step: 88420 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:48:28,660-Speed 5484.48 samples/sec Loss 6.4693 LearningRate 0.1094 Epoch: 8 Global Step: 88430 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:48:36,088-Speed 5515.04 samples/sec Loss 6.5211 LearningRate 0.1093 Epoch: 8 Global Step: 88440 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:48:43,629-Speed 5432.12 samples/sec Loss 6.4738 LearningRate 0.1093 
Epoch: 8 Global Step: 88450 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:48:51,234-Speed 5386.30 samples/sec Loss 6.4889 LearningRate 0.1093 Epoch: 8 Global Step: 88460 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:48:58,684-Speed 5499.38 samples/sec Loss 6.4430 LearningRate 0.1093 Epoch: 8 Global Step: 88470 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:49:06,160-Speed 5479.28 samples/sec Loss 6.4322 LearningRate 0.1093 Epoch: 8 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:49:13,597-Speed 5508.62 samples/sec Loss 6.4331 LearningRate 0.1093 Epoch: 8 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:49:21,103-Speed 5457.40 samples/sec Loss 6.4752 LearningRate 0.1092 Epoch: 8 Global Step: 88500 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:49:28,680-Speed 5406.12 samples/sec Loss 6.4343 LearningRate 0.1092 Epoch: 8 Global Step: 88510 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:49:36,129-Speed 5500.05 samples/sec Loss 6.4081 LearningRate 0.1092 Epoch: 8 Global Step: 88520 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:49:43,573-Speed 5503.23 samples/sec Loss 6.4773 LearningRate 0.1092 Epoch: 8 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:49:51,054-Speed 5475.73 samples/sec Loss 6.4383 LearningRate 0.1092 Epoch: 8 Global Step: 88540 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:49:58,506-Speed 5497.19 samples/sec Loss 6.4617 LearningRate 0.1091 Epoch: 8 Global Step: 88550 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:50:06,016-Speed 5454.89 samples/sec Loss 6.4731 LearningRate 0.1091 Epoch: 8 Global Step: 88560 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:50:13,563-Speed 5428.10 samples/sec Loss 6.4417 LearningRate 0.1091 Epoch: 8 Global Step: 88570 Fp16 Grad Scale: 
32768 Required: 27 hours Training: 2022-01-08 14:50:21,143-Speed 5404.41 samples/sec Loss 6.4855 LearningRate 0.1091 Epoch: 8 Global Step: 88580 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:50:28,651-Speed 5455.88 samples/sec Loss 6.4632 LearningRate 0.1091 Epoch: 8 Global Step: 88590 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:50:36,143-Speed 5468.38 samples/sec Loss 6.4491 LearningRate 0.1091 Epoch: 8 Global Step: 88600 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:50:43,762-Speed 5376.78 samples/sec Loss 6.4847 LearningRate 0.1090 Epoch: 8 Global Step: 88610 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:50:51,264-Speed 5460.90 samples/sec Loss 6.4710 LearningRate 0.1090 Epoch: 8 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:50:58,891-Speed 5370.84 samples/sec Loss 6.4398 LearningRate 0.1090 Epoch: 8 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:06,428-Speed 5435.07 samples/sec Loss 6.4351 LearningRate 0.1090 Epoch: 8 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:13,931-Speed 5460.18 samples/sec Loss 6.4224 LearningRate 0.1090 Epoch: 8 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:21,433-Speed 5460.38 samples/sec Loss 6.4299 LearningRate 0.1089 Epoch: 8 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:28,973-Speed 5432.75 samples/sec Loss 6.4157 LearningRate 0.1089 Epoch: 8 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:36,444-Speed 5482.96 samples/sec Loss 6.4658 LearningRate 0.1089 Epoch: 8 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:43,879-Speed 5510.05 samples/sec Loss 6.4258 LearningRate 0.1089 Epoch: 8 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 
14:51:51,396-Speed 5449.42 samples/sec Loss 6.4355 LearningRate 0.1089 Epoch: 8 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:51:58,865-Speed 5485.07 samples/sec Loss 6.4722 LearningRate 0.1088 Epoch: 8 Global Step: 88710 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:06,359-Speed 5466.39 samples/sec Loss 6.4979 LearningRate 0.1088 Epoch: 8 Global Step: 88720 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:13,876-Speed 5449.92 samples/sec Loss 6.4357 LearningRate 0.1088 Epoch: 8 Global Step: 88730 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:21,406-Speed 5440.31 samples/sec Loss 6.4828 LearningRate 0.1088 Epoch: 8 Global Step: 88740 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:28,937-Speed 5439.14 samples/sec Loss 6.4607 LearningRate 0.1088 Epoch: 8 Global Step: 88750 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:36,532-Speed 5393.87 samples/sec Loss 6.4329 LearningRate 0.1088 Epoch: 8 Global Step: 88760 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:44,009-Speed 5478.68 samples/sec Loss 6.4224 LearningRate 0.1087 Epoch: 8 Global Step: 88770 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:52:51,551-Speed 5431.94 samples/sec Loss 6.4631 LearningRate 0.1087 Epoch: 8 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:52:59,140-Speed 5397.86 samples/sec Loss 6.4299 LearningRate 0.1087 Epoch: 8 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:06,712-Speed 5409.89 samples/sec Loss 6.4882 LearningRate 0.1087 Epoch: 8 Global Step: 88800 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:14,247-Speed 5437.27 samples/sec Loss 6.4441 LearningRate 0.1087 Epoch: 8 Global Step: 88810 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:21,765-Speed 5448.92 samples/sec Loss 
6.3701 LearningRate 0.1086 Epoch: 8 Global Step: 88820 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:29,295-Speed 5440.17 samples/sec Loss 6.3729 LearningRate 0.1086 Epoch: 8 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:36,810-Speed 5450.94 samples/sec Loss 6.4285 LearningRate 0.1086 Epoch: 8 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:44,349-Speed 5433.75 samples/sec Loss 6.3561 LearningRate 0.1086 Epoch: 8 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:51,879-Speed 5440.32 samples/sec Loss 6.4592 LearningRate 0.1086 Epoch: 8 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:53:59,341-Speed 5490.07 samples/sec Loss 6.4488 LearningRate 0.1086 Epoch: 8 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:54:06,851-Speed 5454.55 samples/sec Loss 6.4374 LearningRate 0.1085 Epoch: 8 Global Step: 88880 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:54:14,396-Speed 5429.37 samples/sec Loss 6.4067 LearningRate 0.1085 Epoch: 8 Global Step: 88890 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:54:21,889-Speed 5467.12 samples/sec Loss 6.4604 LearningRate 0.1085 Epoch: 8 Global Step: 88900 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:54:29,337-Speed 5500.55 samples/sec Loss 6.4271 LearningRate 0.1085 Epoch: 8 Global Step: 88910 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:54:36,783-Speed 5501.18 samples/sec Loss 6.4697 LearningRate 0.1085 Epoch: 8 Global Step: 88920 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:54:44,239-Speed 5494.56 samples/sec Loss 6.4096 LearningRate 0.1084 Epoch: 8 Global Step: 88930 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:54:51,747-Speed 5456.57 samples/sec Loss 6.4158 LearningRate 0.1084 Epoch: 8 Global Step: 
88940 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:54:59,296-Speed 5426.32 samples/sec Loss 6.4616 LearningRate 0.1084 Epoch: 8 Global Step: 88950 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:06,990-Speed 5324.43 samples/sec Loss 6.4114 LearningRate 0.1084 Epoch: 8 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:14,501-Speed 5453.92 samples/sec Loss 6.4091 LearningRate 0.1084 Epoch: 8 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:22,005-Speed 5459.20 samples/sec Loss 6.5031 LearningRate 0.1084 Epoch: 8 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:29,532-Speed 5442.78 samples/sec Loss 6.4737 LearningRate 0.1083 Epoch: 8 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:37,050-Speed 5448.52 samples/sec Loss 6.4257 LearningRate 0.1083 Epoch: 8 Global Step: 89000 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:44,633-Speed 5401.84 samples/sec Loss 6.4046 LearningRate 0.1083 Epoch: 8 Global Step: 89010 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:52,135-Speed 5461.18 samples/sec Loss 6.4413 LearningRate 0.1083 Epoch: 8 Global Step: 89020 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:55:59,717-Speed 5403.35 samples/sec Loss 6.4026 LearningRate 0.1083 Epoch: 8 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:07,271-Speed 5422.77 samples/sec Loss 6.4552 LearningRate 0.1082 Epoch: 8 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:14,842-Speed 5410.13 samples/sec Loss 6.4768 LearningRate 0.1082 Epoch: 8 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:22,363-Speed 5447.52 samples/sec Loss 6.4669 LearningRate 0.1082 Epoch: 8 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 27 hours 
Training: 2022-01-08 14:56:29,946-Speed 5402.35 samples/sec Loss 6.4206 LearningRate 0.1082 Epoch: 8 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:37,373-Speed 5515.08 samples/sec Loss 6.4364 LearningRate 0.1082 Epoch: 8 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:44,813-Speed 5506.02 samples/sec Loss 6.4136 LearningRate 0.1082 Epoch: 8 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:52,277-Speed 5488.94 samples/sec Loss 6.4471 LearningRate 0.1081 Epoch: 8 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:56:59,800-Speed 5445.29 samples/sec Loss 6.4242 LearningRate 0.1081 Epoch: 8 Global Step: 89110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:57:07,316-Speed 5450.52 samples/sec Loss 6.3847 LearningRate 0.1081 Epoch: 8 Global Step: 89120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:57:14,808-Speed 5467.44 samples/sec Loss 6.4057 LearningRate 0.1081 Epoch: 8 Global Step: 89130 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:57:22,312-Speed 5459.38 samples/sec Loss 6.4144 LearningRate 0.1081 Epoch: 8 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:57:29,822-Speed 5454.95 samples/sec Loss 6.4757 LearningRate 0.1080 Epoch: 8 Global Step: 89150 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:57:37,325-Speed 5460.11 samples/sec Loss 6.3540 LearningRate 0.1080 Epoch: 8 Global Step: 89160 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:57:44,832-Speed 5456.82 samples/sec Loss 6.4082 LearningRate 0.1080 Epoch: 8 Global Step: 89170 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:57:52,345-Speed 5452.42 samples/sec Loss 6.4029 LearningRate 0.1080 Epoch: 8 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:57:59,847-Speed 5460.42 
samples/sec Loss 6.4151 LearningRate 0.1080 Epoch: 8 Global Step: 89190 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:58:07,276-Speed 5514.42 samples/sec Loss 6.3998 LearningRate 0.1080 Epoch: 8 Global Step: 89200 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:14,892-Speed 5378.47 samples/sec Loss 6.4071 LearningRate 0.1079 Epoch: 8 Global Step: 89210 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:22,487-Speed 5394.38 samples/sec Loss 6.3815 LearningRate 0.1079 Epoch: 8 Global Step: 89220 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:29,979-Speed 5467.99 samples/sec Loss 6.3675 LearningRate 0.1079 Epoch: 8 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:37,447-Speed 5485.41 samples/sec Loss 6.4078 LearningRate 0.1079 Epoch: 8 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:44,867-Speed 5520.19 samples/sec Loss 6.4233 LearningRate 0.1079 Epoch: 8 Global Step: 89250 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:52,408-Speed 5432.79 samples/sec Loss 6.3854 LearningRate 0.1078 Epoch: 8 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:58:59,936-Speed 5442.21 samples/sec Loss 6.3858 LearningRate 0.1078 Epoch: 8 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:59:07,500-Speed 5415.79 samples/sec Loss 6.3811 LearningRate 0.1078 Epoch: 8 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:59:14,975-Speed 5479.94 samples/sec Loss 6.4029 LearningRate 0.1078 Epoch: 8 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 14:59:22,500-Speed 5444.61 samples/sec Loss 6.3941 LearningRate 0.1078 Epoch: 8 Global Step: 89300 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:59:30,093-Speed 5394.90 samples/sec Loss 6.4273 LearningRate 0.1078 Epoch: 8 
Global Step: 89310 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 14:59:37,595-Speed 5460.43 samples/sec Loss 6.4112 LearningRate 0.1077 Epoch: 8 Global Step: 89320 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:59:45,063-Speed 5485.33 samples/sec Loss 6.4124 LearningRate 0.1077 Epoch: 8 Global Step: 89330 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 14:59:52,607-Speed 5430.79 samples/sec Loss 6.4071 LearningRate 0.1077 Epoch: 8 Global Step: 89340 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:00,126-Speed 5448.05 samples/sec Loss 6.3861 LearningRate 0.1077 Epoch: 8 Global Step: 89350 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:07,643-Speed 5449.53 samples/sec Loss 6.3482 LearningRate 0.1077 Epoch: 8 Global Step: 89360 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:15,140-Speed 5463.98 samples/sec Loss 6.3601 LearningRate 0.1076 Epoch: 8 Global Step: 89370 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:22,612-Speed 5482.87 samples/sec Loss 6.4595 LearningRate 0.1076 Epoch: 8 Global Step: 89380 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:30,084-Speed 5483.24 samples/sec Loss 6.4344 LearningRate 0.1076 Epoch: 8 Global Step: 89390 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:37,629-Speed 5429.26 samples/sec Loss 6.3933 LearningRate 0.1076 Epoch: 8 Global Step: 89400 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:45,197-Speed 5412.54 samples/sec Loss 6.3880 LearningRate 0.1076 Epoch: 8 Global Step: 89410 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:00:52,720-Speed 5445.65 samples/sec Loss 6.3341 LearningRate 0.1075 Epoch: 8 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:00,214-Speed 5466.91 samples/sec Loss 6.3826 LearningRate 0.1075 Epoch: 8 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 27 
hours Training: 2022-01-08 15:01:07,670-Speed 5494.28 samples/sec Loss 6.3709 LearningRate 0.1075 Epoch: 8 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:15,172-Speed 5460.64 samples/sec Loss 6.4135 LearningRate 0.1075 Epoch: 8 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:22,651-Speed 5477.51 samples/sec Loss 6.3834 LearningRate 0.1075 Epoch: 8 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:30,161-Speed 5455.05 samples/sec Loss 6.4166 LearningRate 0.1075 Epoch: 8 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:37,614-Speed 5495.96 samples/sec Loss 6.3495 LearningRate 0.1074 Epoch: 8 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:45,106-Speed 5467.74 samples/sec Loss 6.3576 LearningRate 0.1074 Epoch: 8 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:01:52,726-Speed 5376.81 samples/sec Loss 6.4389 LearningRate 0.1074 Epoch: 8 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:02:00,264-Speed 5434.45 samples/sec Loss 6.4464 LearningRate 0.1074 Epoch: 8 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:02:07,835-Speed 5411.02 samples/sec Loss 6.3840 LearningRate 0.1074 Epoch: 8 Global Step: 89520 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:02:15,327-Speed 5467.02 samples/sec Loss 6.3994 LearningRate 0.1073 Epoch: 8 Global Step: 89530 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:02:22,854-Speed 5442.79 samples/sec Loss 6.3826 LearningRate 0.1073 Epoch: 8 Global Step: 89540 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:02:30,300-Speed 5502.05 samples/sec Loss 6.3630 LearningRate 0.1073 Epoch: 8 Global Step: 89550 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:02:37,812-Speed 
5453.14 samples/sec Loss 6.3198 LearningRate 0.1073 Epoch: 8 Global Step: 89560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:02:45,375-Speed 5417.01 samples/sec Loss 6.3964 LearningRate 0.1073 Epoch: 8 Global Step: 89570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:02:52,977-Speed 5388.85 samples/sec Loss 6.4135 LearningRate 0.1073 Epoch: 8 Global Step: 89580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:00,515-Speed 5434.75 samples/sec Loss 6.4021 LearningRate 0.1072 Epoch: 8 Global Step: 89590 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:08,024-Speed 5455.15 samples/sec Loss 6.3364 LearningRate 0.1072 Epoch: 8 Global Step: 89600 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:15,500-Speed 5479.36 samples/sec Loss 6.3678 LearningRate 0.1072 Epoch: 8 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:23,091-Speed 5396.55 samples/sec Loss 6.3409 LearningRate 0.1072 Epoch: 8 Global Step: 89620 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:30,580-Speed 5470.55 samples/sec Loss 6.4530 LearningRate 0.1072 Epoch: 8 Global Step: 89630 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:38,057-Speed 5478.92 samples/sec Loss 6.4023 LearningRate 0.1071 Epoch: 8 Global Step: 89640 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:45,614-Speed 5420.65 samples/sec Loss 6.4812 LearningRate 0.1071 Epoch: 8 Global Step: 89650 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:03:53,234-Speed 5376.20 samples/sec Loss 6.4000 LearningRate 0.1071 Epoch: 8 Global Step: 89660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:00,888-Speed 5352.60 samples/sec Loss 6.3886 LearningRate 0.1071 Epoch: 8 Global Step: 89670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:08,424-Speed 5436.28 samples/sec Loss 6.3715 LearningRate 0.1071 
Epoch: 8 Global Step: 89680 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:15,965-Speed 5432.06 samples/sec Loss 6.3963 LearningRate 0.1071 Epoch: 8 Global Step: 89690 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:23,496-Speed 5439.32 samples/sec Loss 6.3437 LearningRate 0.1070 Epoch: 8 Global Step: 89700 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:31,021-Speed 5443.69 samples/sec Loss 6.3988 LearningRate 0.1070 Epoch: 8 Global Step: 89710 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:38,522-Speed 5461.58 samples/sec Loss 6.4325 LearningRate 0.1070 Epoch: 8 Global Step: 89720 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:46,039-Speed 5449.93 samples/sec Loss 6.3708 LearningRate 0.1070 Epoch: 8 Global Step: 89730 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:04:53,684-Speed 5358.10 samples/sec Loss 6.3532 LearningRate 0.1070 Epoch: 8 Global Step: 89740 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:05:01,139-Speed 5495.06 samples/sec Loss 6.3392 LearningRate 0.1069 Epoch: 8 Global Step: 89750 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:05:08,673-Speed 5437.60 samples/sec Loss 6.4639 LearningRate 0.1069 Epoch: 8 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:05:16,181-Speed 5456.68 samples/sec Loss 6.3796 LearningRate 0.1069 Epoch: 8 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:05:23,674-Speed 5466.79 samples/sec Loss 6.4247 LearningRate 0.1069 Epoch: 8 Global Step: 89780 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:05:31,132-Speed 5492.52 samples/sec Loss 6.3639 LearningRate 0.1069 Epoch: 8 Global Step: 89790 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:05:38,689-Speed 5421.29 samples/sec Loss 6.3495 LearningRate 0.1069 Epoch: 8 Global Step: 89800 Fp16 Grad Scale: 
65536 Required: 27 hours Training: 2022-01-08 15:05:46,153-Speed 5488.55 samples/sec Loss 6.3642 LearningRate 0.1068 Epoch: 8 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:05:53,709-Speed 5420.77 samples/sec Loss 6.4081 LearningRate 0.1068 Epoch: 8 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:06:01,291-Speed 5403.43 samples/sec Loss 6.3738 LearningRate 0.1068 Epoch: 8 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:06:08,803-Speed 5453.45 samples/sec Loss 6.3720 LearningRate 0.1068 Epoch: 8 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:06:16,324-Speed 5446.56 samples/sec Loss 6.3849 LearningRate 0.1068 Epoch: 8 Global Step: 89850 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:06:23,791-Speed 5486.01 samples/sec Loss 6.4089 LearningRate 0.1067 Epoch: 8 Global Step: 89860 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:06:31,385-Speed 5394.91 samples/sec Loss 6.3789 LearningRate 0.1067 Epoch: 8 Global Step: 89870 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:06:38,824-Speed 5506.88 samples/sec Loss 6.3288 LearningRate 0.1067 Epoch: 8 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:06:46,358-Speed 5437.36 samples/sec Loss 6.3717 LearningRate 0.1067 Epoch: 8 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:06:53,996-Speed 5363.33 samples/sec Loss 6.4259 LearningRate 0.1067 Epoch: 8 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:07:01,560-Speed 5416.02 samples/sec Loss 6.3720 LearningRate 0.1067 Epoch: 8 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:07:09,009-Speed 5499.07 samples/sec Loss 6.4074 LearningRate 0.1066 Epoch: 8 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 
15:07:16,654-Speed 5358.79 samples/sec Loss 6.3809 LearningRate 0.1066 Epoch: 8 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:07:24,144-Speed 5468.95 samples/sec Loss 6.4517 LearningRate 0.1066 Epoch: 8 Global Step: 89940 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-01-08 15:07:31,683-Speed 5434.18 samples/sec Loss 6.4408 LearningRate 0.1066 Epoch: 8 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:07:39,178-Speed 5465.55 samples/sec Loss 6.3435 LearningRate 0.1066 Epoch: 8 Global Step: 89960 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:07:46,727-Speed 5426.72 samples/sec Loss 6.3637 LearningRate 0.1065 Epoch: 8 Global Step: 89970 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:07:54,183-Speed 5494.63 samples/sec Loss 6.4197 LearningRate 0.1065 Epoch: 8 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:08:01,726-Speed 5431.15 samples/sec Loss 6.3457 LearningRate 0.1065 Epoch: 8 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:08:09,332-Speed 5386.04 samples/sec Loss 6.3363 LearningRate 0.1065 Epoch: 8 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:08:53,779-[lfw][90000]XNorm: 23.255685 Training: 2022-01-08 15:08:53,780-[lfw][90000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-01-08 15:08:53,781-[lfw][90000]Accuracy-Highest: 0.99817 Training: 2022-01-08 15:09:45,467-[cfp_fp][90000]XNorm: 20.862160 Training: 2022-01-08 15:09:45,468-[cfp_fp][90000]Accuracy-Flip: 0.98814+-0.00561 Training: 2022-01-08 15:09:45,468-[cfp_fp][90000]Accuracy-Highest: 0.98814 Training: 2022-01-08 15:10:31,192-[agedb_30][90000]XNorm: 22.981298 Training: 2022-01-08 15:10:31,193-[agedb_30][90000]Accuracy-Flip: 0.97833+-0.00898 Training: 2022-01-08 15:10:31,194-[agedb_30][90000]Accuracy-Highest: 0.97833 Training: 2022-01-08 15:10:39,000-Speed 273.67 samples/sec 
Loss 6.3452 LearningRate 0.1065 Epoch: 8 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:10:46,574-Speed 5409.16 samples/sec Loss 6.3746 LearningRate 0.1065 Epoch: 8 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:10:54,142-Speed 5413.09 samples/sec Loss 6.2902 LearningRate 0.1064 Epoch: 8 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:11:01,702-Speed 5419.12 samples/sec Loss 6.3753 LearningRate 0.1064 Epoch: 8 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:11:09,168-Speed 5487.01 samples/sec Loss 6.3624 LearningRate 0.1064 Epoch: 8 Global Step: 90050 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:11:16,682-Speed 5451.77 samples/sec Loss 6.3515 LearningRate 0.1064 Epoch: 8 Global Step: 90060 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:11:24,400-Speed 5307.36 samples/sec Loss 6.3791 LearningRate 0.1064 Epoch: 8 Global Step: 90070 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:11:31,923-Speed 5445.39 samples/sec Loss 6.2899 LearningRate 0.1063 Epoch: 8 Global Step: 90080 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:11:39,599-Speed 5337.11 samples/sec Loss 6.3140 LearningRate 0.1063 Epoch: 8 Global Step: 90090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:11:47,083-Speed 5473.61 samples/sec Loss 6.2826 LearningRate 0.1063 Epoch: 8 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:11:54,547-Speed 5487.81 samples/sec Loss 6.3622 LearningRate 0.1063 Epoch: 8 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:02,087-Speed 5433.61 samples/sec Loss 6.3358 LearningRate 0.1063 Epoch: 8 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:09,617-Speed 5440.76 samples/sec Loss 6.4184 LearningRate 0.1063 Epoch: 8 Global 
Step: 90130 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:17,045-Speed 5514.27 samples/sec Loss 6.4360 LearningRate 0.1062 Epoch: 8 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:24,460-Speed 5524.53 samples/sec Loss 6.4104 LearningRate 0.1062 Epoch: 8 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:31,944-Speed 5474.40 samples/sec Loss 6.3747 LearningRate 0.1062 Epoch: 8 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:39,348-Speed 5533.21 samples/sec Loss 6.3470 LearningRate 0.1062 Epoch: 8 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:46,785-Speed 5507.89 samples/sec Loss 6.3291 LearningRate 0.1062 Epoch: 8 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:12:54,226-Speed 5504.92 samples/sec Loss 6.3336 LearningRate 0.1062 Epoch: 8 Global Step: 90190 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:13:01,698-Speed 5482.92 samples/sec Loss 6.3329 LearningRate 0.1061 Epoch: 8 Global Step: 90200 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:13:09,331-Speed 5367.51 samples/sec Loss 6.3444 LearningRate 0.1061 Epoch: 8 Global Step: 90210 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:13:16,763-Speed 5511.65 samples/sec Loss 6.3900 LearningRate 0.1061 Epoch: 8 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:13:24,261-Speed 5463.57 samples/sec Loss 6.3918 LearningRate 0.1061 Epoch: 8 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:13:31,732-Speed 5483.04 samples/sec Loss 6.2464 LearningRate 0.1061 Epoch: 8 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:13:39,251-Speed 5448.31 samples/sec Loss 6.3708 LearningRate 0.1060 Epoch: 8 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 27 
hours Training: 2022-01-08 15:13:46,749-Speed 5463.32 samples/sec Loss 6.4156 LearningRate 0.1060 Epoch: 8 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:13:54,169-Speed 5520.80 samples/sec Loss 6.3272 LearningRate 0.1060 Epoch: 8 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:01,727-Speed 5420.68 samples/sec Loss 6.3815 LearningRate 0.1060 Epoch: 8 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:09,215-Speed 5471.03 samples/sec Loss 6.3406 LearningRate 0.1060 Epoch: 8 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:16,771-Speed 5420.97 samples/sec Loss 6.3088 LearningRate 0.1060 Epoch: 8 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:24,269-Speed 5463.97 samples/sec Loss 6.3658 LearningRate 0.1059 Epoch: 8 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:31,780-Speed 5453.67 samples/sec Loss 6.3960 LearningRate 0.1059 Epoch: 8 Global Step: 90320 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:14:39,380-Speed 5390.93 samples/sec Loss 6.2892 LearningRate 0.1059 Epoch: 8 Global Step: 90330 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:46,890-Speed 5454.47 samples/sec Loss 6.3221 LearningRate 0.1059 Epoch: 8 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:14:54,555-Speed 5344.78 samples/sec Loss 6.3578 LearningRate 0.1059 Epoch: 8 Global Step: 90350 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:02,074-Speed 5447.68 samples/sec Loss 6.3630 LearningRate 0.1058 Epoch: 8 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:09,513-Speed 5507.20 samples/sec Loss 6.3999 LearningRate 0.1058 Epoch: 8 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:17,032-Speed 5448.50 
samples/sec Loss 6.3220 LearningRate 0.1058 Epoch: 8 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:24,465-Speed 5511.50 samples/sec Loss 6.3393 LearningRate 0.1058 Epoch: 8 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:31,964-Speed 5462.72 samples/sec Loss 6.3505 LearningRate 0.1058 Epoch: 8 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:39,423-Speed 5492.19 samples/sec Loss 6.3802 LearningRate 0.1058 Epoch: 8 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:46,917-Speed 5466.81 samples/sec Loss 6.3551 LearningRate 0.1057 Epoch: 8 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:15:54,443-Speed 5442.70 samples/sec Loss 6.3927 LearningRate 0.1057 Epoch: 8 Global Step: 90430 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:16:01,962-Speed 5448.71 samples/sec Loss 6.3473 LearningRate 0.1057 Epoch: 8 Global Step: 90440 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:16:09,397-Speed 5510.19 samples/sec Loss 6.3131 LearningRate 0.1057 Epoch: 8 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:16:16,927-Speed 5440.52 samples/sec Loss 6.3459 LearningRate 0.1057 Epoch: 8 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:16:24,373-Speed 5501.05 samples/sec Loss 6.3706 LearningRate 0.1056 Epoch: 8 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:16:31,796-Speed 5518.54 samples/sec Loss 6.3643 LearningRate 0.1056 Epoch: 8 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:16:39,368-Speed 5410.37 samples/sec Loss 6.3781 LearningRate 0.1056 Epoch: 8 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:16:46,880-Speed 5453.67 samples/sec Loss 6.3256 LearningRate 0.1056 Epoch: 8 
Global Step: 90500 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:16:54,432-Speed 5424.32 samples/sec Loss 6.3696 LearningRate 0.1056 Epoch: 8 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:01,949-Speed 5449.93 samples/sec Loss 6.3220 LearningRate 0.1056 Epoch: 8 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:09,551-Speed 5388.81 samples/sec Loss 6.3445 LearningRate 0.1055 Epoch: 8 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:17,062-Speed 5453.98 samples/sec Loss 6.3724 LearningRate 0.1055 Epoch: 8 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:24,870-Speed 5246.43 samples/sec Loss 6.3242 LearningRate 0.1055 Epoch: 8 Global Step: 90550 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:17:32,514-Speed 5359.14 samples/sec Loss 6.3584 LearningRate 0.1055 Epoch: 8 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:40,032-Speed 5449.77 samples/sec Loss 6.2874 LearningRate 0.1055 Epoch: 8 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:47,692-Speed 5347.92 samples/sec Loss 6.2505 LearningRate 0.1054 Epoch: 8 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:17:55,531-Speed 5225.81 samples/sec Loss 6.3352 LearningRate 0.1054 Epoch: 8 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:18:03,099-Speed 5413.15 samples/sec Loss 6.3023 LearningRate 0.1054 Epoch: 8 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:18:10,605-Speed 5457.85 samples/sec Loss 6.3072 LearningRate 0.1054 Epoch: 8 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:18:18,100-Speed 5465.82 samples/sec Loss 6.4032 LearningRate 0.1054 Epoch: 8 Global Step: 90620 Fp16 Grad Scale: 65536 Required: 27 
hours Training: 2022-01-08 15:18:25,620-Speed 5447.21 samples/sec Loss 6.3771 LearningRate 0.1054 Epoch: 8 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:18:33,225-Speed 5387.13 samples/sec Loss 6.3449 LearningRate 0.1053 Epoch: 8 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:18:40,839-Speed 5379.77 samples/sec Loss 6.4204 LearningRate 0.1053 Epoch: 8 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:18:48,306-Speed 5486.65 samples/sec Loss 6.2851 LearningRate 0.1053 Epoch: 8 Global Step: 90660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:18:55,762-Speed 5494.37 samples/sec Loss 6.3137 LearningRate 0.1053 Epoch: 8 Global Step: 90670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:19:03,203-Speed 5505.27 samples/sec Loss 6.2575 LearningRate 0.1053 Epoch: 8 Global Step: 90680 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:19:10,692-Speed 5470.64 samples/sec Loss 6.2902 LearningRate 0.1052 Epoch: 8 Global Step: 90690 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:19:18,152-Speed 5491.28 samples/sec Loss 6.2806 LearningRate 0.1052 Epoch: 8 Global Step: 90700 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:19:25,631-Speed 5477.12 samples/sec Loss 6.3334 LearningRate 0.1052 Epoch: 8 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:19:33,120-Speed 5469.97 samples/sec Loss 6.3050 LearningRate 0.1052 Epoch: 8 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:19:40,729-Speed 5383.90 samples/sec Loss 6.2830 LearningRate 0.1052 Epoch: 8 Global Step: 90730 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:19:48,250-Speed 5447.17 samples/sec Loss 6.3148 LearningRate 0.1052 Epoch: 8 Global Step: 90740 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:19:55,743-Speed 
5467.26 samples/sec Loss 6.3356 LearningRate 0.1051 Epoch: 8 Global Step: 90750 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:20:03,188-Speed 5501.96 samples/sec Loss 6.3581 LearningRate 0.1051 Epoch: 8 Global Step: 90760 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:20:10,656-Speed 5485.83 samples/sec Loss 6.2625 LearningRate 0.1051 Epoch: 8 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:20:18,133-Speed 5479.17 samples/sec Loss 6.3098 LearningRate 0.1051 Epoch: 8 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:20:25,637-Speed 5459.23 samples/sec Loss 6.3058 LearningRate 0.1051 Epoch: 8 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:20:33,073-Speed 5509.19 samples/sec Loss 6.2819 LearningRate 0.1050 Epoch: 8 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:20:40,484-Speed 5526.98 samples/sec Loss 6.3386 LearningRate 0.1050 Epoch: 8 Global Step: 90810 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:20:47,868-Speed 5548.63 samples/sec Loss 6.2967 LearningRate 0.1050 Epoch: 8 Global Step: 90820 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:20:55,327-Speed 5491.66 samples/sec Loss 6.3526 LearningRate 0.1050 Epoch: 8 Global Step: 90830 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:21:02,914-Speed 5399.61 samples/sec Loss 6.3293 LearningRate 0.1050 Epoch: 8 Global Step: 90840 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:21:10,448-Speed 5437.12 samples/sec Loss 6.3278 LearningRate 0.1050 Epoch: 8 Global Step: 90850 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:21:17,812-Speed 5563.36 samples/sec Loss 6.3178 LearningRate 0.1049 Epoch: 8 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:21:25,490-Speed 5335.81 samples/sec Loss 6.3089 LearningRate 
0.1049 Epoch: 8 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:21:32,948-Speed 5492.40 samples/sec Loss 6.3197 LearningRate 0.1049 Epoch: 8 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:21:40,610-Speed 5346.46 samples/sec Loss 6.2640 LearningRate 0.1049 Epoch: 8 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:21:48,116-Speed 5458.24 samples/sec Loss 6.3681 LearningRate 0.1049 Epoch: 8 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:21:55,546-Speed 5513.99 samples/sec Loss 6.3520 LearningRate 0.1049 Epoch: 8 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:22:03,103-Speed 5420.17 samples/sec Loss 6.3399 LearningRate 0.1048 Epoch: 8 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:22:10,689-Speed 5400.65 samples/sec Loss 6.2985 LearningRate 0.1048 Epoch: 8 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:22:18,100-Speed 5527.20 samples/sec Loss 6.3509 LearningRate 0.1048 Epoch: 8 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:22:25,555-Speed 5495.77 samples/sec Loss 6.3236 LearningRate 0.1048 Epoch: 8 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:22:33,081-Speed 5443.08 samples/sec Loss 6.3061 LearningRate 0.1048 Epoch: 8 Global Step: 90960 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:22:40,541-Speed 5490.92 samples/sec Loss 6.2942 LearningRate 0.1047 Epoch: 8 Global Step: 90970 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:22:47,975-Speed 5510.53 samples/sec Loss 6.2979 LearningRate 0.1047 Epoch: 8 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:22:55,479-Speed 5459.82 samples/sec Loss 6.3374 LearningRate 0.1047 Epoch: 8 Global Step: 90990 Fp16 Grad Scale: 
65536 Required: 27 hours Training: 2022-01-08 15:23:02,897-Speed 5522.11 samples/sec Loss 6.3189 LearningRate 0.1047 Epoch: 8 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:10,330-Speed 5511.18 samples/sec Loss 6.3186 LearningRate 0.1047 Epoch: 8 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:17,778-Speed 5500.44 samples/sec Loss 6.3274 LearningRate 0.1047 Epoch: 8 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:25,215-Speed 5508.53 samples/sec Loss 6.3365 LearningRate 0.1046 Epoch: 8 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:32,660-Speed 5502.39 samples/sec Loss 6.2798 LearningRate 0.1046 Epoch: 8 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:40,071-Speed 5527.18 samples/sec Loss 6.3000 LearningRate 0.1046 Epoch: 8 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:47,517-Speed 5502.35 samples/sec Loss 6.3145 LearningRate 0.1046 Epoch: 8 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:23:54,929-Speed 5526.83 samples/sec Loss 6.2854 LearningRate 0.1046 Epoch: 8 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:24:02,384-Speed 5495.14 samples/sec Loss 6.3048 LearningRate 0.1045 Epoch: 8 Global Step: 91080 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:24:09,807-Speed 5518.18 samples/sec Loss 6.2779 LearningRate 0.1045 Epoch: 8 Global Step: 91090 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:24:17,203-Speed 5539.12 samples/sec Loss 6.3090 LearningRate 0.1045 Epoch: 8 Global Step: 91100 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:24:24,574-Speed 5557.86 samples/sec Loss 6.2912 LearningRate 0.1045 Epoch: 8 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 
15:24:31,965-Speed 5542.71 samples/sec Loss 6.2864 LearningRate 0.1045 Epoch: 8 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:24:39,383-Speed 5522.36 samples/sec Loss 6.3078 LearningRate 0.1045 Epoch: 8 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:24:46,807-Speed 5517.98 samples/sec Loss 6.2804 LearningRate 0.1044 Epoch: 8 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:24:54,257-Speed 5498.80 samples/sec Loss 6.3397 LearningRate 0.1044 Epoch: 8 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:01,714-Speed 5494.00 samples/sec Loss 6.3092 LearningRate 0.1044 Epoch: 8 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:09,118-Speed 5532.15 samples/sec Loss 6.2745 LearningRate 0.1044 Epoch: 8 Global Step: 91170 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:16,570-Speed 5497.54 samples/sec Loss 6.2381 LearningRate 0.1044 Epoch: 8 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:23,943-Speed 5555.87 samples/sec Loss 6.3105 LearningRate 0.1043 Epoch: 8 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:31,344-Speed 5535.75 samples/sec Loss 6.2773 LearningRate 0.1043 Epoch: 8 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:38,751-Speed 5530.56 samples/sec Loss 6.3379 LearningRate 0.1043 Epoch: 8 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:46,136-Speed 5547.44 samples/sec Loss 6.2506 LearningRate 0.1043 Epoch: 8 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:25:53,751-Speed 5378.90 samples/sec Loss 6.3453 LearningRate 0.1043 Epoch: 8 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:01,223-Speed 5483.58 samples/sec Loss 6.3038 
LearningRate 0.1043 Epoch: 8 Global Step: 91240 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:08,625-Speed 5533.80 samples/sec Loss 6.2639 LearningRate 0.1042 Epoch: 8 Global Step: 91250 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:16,000-Speed 5554.81 samples/sec Loss 6.2487 LearningRate 0.1042 Epoch: 8 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:23,409-Speed 5529.20 samples/sec Loss 6.2601 LearningRate 0.1042 Epoch: 8 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:30,966-Speed 5421.04 samples/sec Loss 6.3619 LearningRate 0.1042 Epoch: 8 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:38,377-Speed 5527.63 samples/sec Loss 6.4110 LearningRate 0.1042 Epoch: 8 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:45,920-Speed 5430.93 samples/sec Loss 6.3204 LearningRate 0.1041 Epoch: 8 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:26:53,478-Speed 5420.69 samples/sec Loss 6.2689 LearningRate 0.1041 Epoch: 8 Global Step: 91310 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:00,933-Speed 5495.21 samples/sec Loss 6.3546 LearningRate 0.1041 Epoch: 8 Global Step: 91320 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:08,372-Speed 5506.87 samples/sec Loss 6.2657 LearningRate 0.1041 Epoch: 8 Global Step: 91330 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:15,864-Speed 5467.61 samples/sec Loss 6.2774 LearningRate 0.1041 Epoch: 8 Global Step: 91340 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:23,297-Speed 5510.72 samples/sec Loss 6.2442 LearningRate 0.1041 Epoch: 8 Global Step: 91350 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:30,809-Speed 5454.53 samples/sec Loss 6.2770 LearningRate 0.1040 Epoch: 8 Global Step: 91360 
Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:38,251-Speed 5505.23 samples/sec Loss 6.2990 LearningRate 0.1040 Epoch: 8 Global Step: 91370 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:45,670-Speed 5521.61 samples/sec Loss 6.3120 LearningRate 0.1040 Epoch: 8 Global Step: 91380 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:27:53,104-Speed 5510.31 samples/sec Loss 6.2560 LearningRate 0.1040 Epoch: 8 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:00,527-Speed 5518.67 samples/sec Loss 6.2076 LearningRate 0.1040 Epoch: 8 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:07,952-Speed 5517.76 samples/sec Loss 6.2169 LearningRate 0.1040 Epoch: 8 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:15,365-Speed 5525.60 samples/sec Loss 6.2184 LearningRate 0.1039 Epoch: 8 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:22,765-Speed 5535.77 samples/sec Loss 6.2977 LearningRate 0.1039 Epoch: 8 Global Step: 91430 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:30,172-Speed 5530.99 samples/sec Loss 6.3111 LearningRate 0.1039 Epoch: 8 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:37,659-Speed 5472.10 samples/sec Loss 6.2831 LearningRate 0.1039 Epoch: 8 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:45,072-Speed 5526.01 samples/sec Loss 6.2973 LearningRate 0.1039 Epoch: 8 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:52,499-Speed 5515.84 samples/sec Loss 6.3061 LearningRate 0.1038 Epoch: 8 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:28:59,938-Speed 5506.88 samples/sec Loss 6.2954 LearningRate 0.1038 Epoch: 8 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 27 hours Training: 
2022-01-08 15:29:07,419-Speed 5476.27 samples/sec Loss 6.2270 LearningRate 0.1038 Epoch: 8 Global Step: 91490 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:29:14,884-Speed 5487.52 samples/sec Loss 6.3000 LearningRate 0.1038 Epoch: 8 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:29:22,292-Speed 5529.97 samples/sec Loss 6.2513 LearningRate 0.1038 Epoch: 8 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:29:29,715-Speed 5518.56 samples/sec Loss 6.2636 LearningRate 0.1038 Epoch: 8 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:29:37,181-Speed 5487.21 samples/sec Loss 6.2848 LearningRate 0.1037 Epoch: 8 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:29:44,613-Speed 5511.81 samples/sec Loss 6.2905 LearningRate 0.1037 Epoch: 8 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:29:52,036-Speed 5518.65 samples/sec Loss 6.3115 LearningRate 0.1037 Epoch: 8 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:29:59,516-Speed 5476.92 samples/sec Loss 6.3707 LearningRate 0.1037 Epoch: 8 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:30:06,944-Speed 5515.02 samples/sec Loss 6.2532 LearningRate 0.1037 Epoch: 8 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:30:14,417-Speed 5481.61 samples/sec Loss 6.2800 LearningRate 0.1036 Epoch: 8 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:30:21,969-Speed 5424.52 samples/sec Loss 6.2478 LearningRate 0.1036 Epoch: 8 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:30:29,384-Speed 5524.90 samples/sec Loss 6.2577 LearningRate 0.1036 Epoch: 8 Global Step: 91600 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:30:36,803-Speed 5522.24 samples/sec 
Loss 6.2241 LearningRate 0.1036 Epoch: 8 Global Step: 91610 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-01-08 15:30:44,194-Speed 5542.58 samples/sec Loss 6.2363 LearningRate 0.1036 Epoch: 8 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:30:51,583-Speed 5544.27 samples/sec Loss 6.2836 LearningRate 0.1036 Epoch: 8 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:30:59,111-Speed 5441.54 samples/sec Loss 6.3297 LearningRate 0.1035 Epoch: 8 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:06,513-Speed 5534.10 samples/sec Loss 6.2977 LearningRate 0.1035 Epoch: 8 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:13,918-Speed 5532.44 samples/sec Loss 6.2006 LearningRate 0.1035 Epoch: 8 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:21,339-Speed 5520.40 samples/sec Loss 6.2383 LearningRate 0.1035 Epoch: 8 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:28,761-Speed 5519.16 samples/sec Loss 6.2726 LearningRate 0.1035 Epoch: 8 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:36,176-Speed 5524.80 samples/sec Loss 6.3024 LearningRate 0.1035 Epoch: 8 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:43,602-Speed 5516.54 samples/sec Loss 6.2806 LearningRate 0.1034 Epoch: 8 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:50,986-Speed 5548.38 samples/sec Loss 6.2568 LearningRate 0.1034 Epoch: 8 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-01-08 15:31:58,432-Speed 5501.60 samples/sec Loss 6.2321 LearningRate 0.1034 Epoch: 8 Global Step: 91720 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:32:06,009-Speed 5406.23 samples/sec Loss 6.2570 LearningRate 0.1034 Epoch: 8 Global Step: 
91730 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:32:13,408-Speed 5536.75 samples/sec Loss 6.3544 LearningRate 0.1034 Epoch: 8 Global Step: 91740 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:32:20,826-Speed 5522.60 samples/sec Loss 6.2821 LearningRate 0.1033 Epoch: 8 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:32:28,257-Speed 5513.06 samples/sec Loss 6.3301 LearningRate 0.1033 Epoch: 8 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:32:35,666-Speed 5528.66 samples/sec Loss 6.2232 LearningRate 0.1033 Epoch: 8 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:32:43,139-Speed 5482.31 samples/sec Loss 6.3090 LearningRate 0.1033 Epoch: 8 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:32:50,567-Speed 5514.66 samples/sec Loss 6.2921 LearningRate 0.1033 Epoch: 8 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:32:58,022-Speed 5495.39 samples/sec Loss 6.2909 LearningRate 0.1033 Epoch: 8 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:33:05,541-Speed 5448.02 samples/sec Loss 6.1989 LearningRate 0.1032 Epoch: 8 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:33:12,933-Speed 5542.51 samples/sec Loss 6.2928 LearningRate 0.1032 Epoch: 8 Global Step: 91820 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:33:20,387-Speed 5495.57 samples/sec Loss 6.2368 LearningRate 0.1032 Epoch: 8 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:33:27,811-Speed 5517.82 samples/sec Loss 6.3148 LearningRate 0.1032 Epoch: 8 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:33:35,261-Speed 5498.87 samples/sec Loss 6.2618 LearningRate 0.1032 Epoch: 8 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 26 hours 
Training: 2022-01-08 15:33:42,720-Speed 5492.02 samples/sec Loss 6.2479 LearningRate 0.1031 Epoch: 8 Global Step: 91860 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:33:50,139-Speed 5521.92 samples/sec Loss 6.3346 LearningRate 0.1031 Epoch: 8 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:33:57,584-Speed 5501.68 samples/sec Loss 6.2762 LearningRate 0.1031 Epoch: 8 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:34:05,037-Speed 5497.06 samples/sec Loss 6.2637 LearningRate 0.1031 Epoch: 8 Global Step: 91890 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:34:12,527-Speed 5469.14 samples/sec Loss 6.2077 LearningRate 0.1031 Epoch: 8 Global Step: 91900 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:34:19,941-Speed 5525.63 samples/sec Loss 6.2496 LearningRate 0.1031 Epoch: 8 Global Step: 91910 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:34:27,364-Speed 5518.62 samples/sec Loss 6.1925 LearningRate 0.1030 Epoch: 8 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:34:34,788-Speed 5518.01 samples/sec Loss 6.2479 LearningRate 0.1030 Epoch: 8 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:34:42,462-Speed 5338.25 samples/sec Loss 6.2613 LearningRate 0.1030 Epoch: 8 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:34:49,873-Speed 5527.82 samples/sec Loss 6.2601 LearningRate 0.1030 Epoch: 8 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:34:57,294-Speed 5520.08 samples/sec Loss 6.2287 LearningRate 0.1030 Epoch: 8 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:35:04,728-Speed 5510.46 samples/sec Loss 6.2270 LearningRate 0.1030 Epoch: 8 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:35:12,170-Speed 5504.91 
samples/sec Loss 6.2496 LearningRate 0.1029 Epoch: 8 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:35:19,576-Speed 5531.59 samples/sec Loss 6.2186 LearningRate 0.1029 Epoch: 8 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:35:27,171-Speed 5393.36 samples/sec Loss 6.2122 LearningRate 0.1029 Epoch: 8 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:36:11,446-[lfw][92000]XNorm: 23.362174 Training: 2022-01-08 15:36:11,447-[lfw][92000]Accuracy-Flip: 0.99733+-0.00260 Training: 2022-01-08 15:36:11,447-[lfw][92000]Accuracy-Highest: 0.99817 Training: 2022-01-08 15:37:03,307-[cfp_fp][92000]XNorm: 21.384648 Training: 2022-01-08 15:37:03,307-[cfp_fp][92000]Accuracy-Flip: 0.98800+-0.00466 Training: 2022-01-08 15:37:03,308-[cfp_fp][92000]Accuracy-Highest: 0.98814 Training: 2022-01-08 15:37:49,100-[agedb_30][92000]XNorm: 23.008124 Training: 2022-01-08 15:37:49,101-[agedb_30][92000]Accuracy-Flip: 0.97750+-0.00676 Training: 2022-01-08 15:37:49,102-[agedb_30][92000]Accuracy-Highest: 0.97833 Training: 2022-01-08 15:37:56,589-Speed 274.14 samples/sec Loss 6.2756 LearningRate 0.1029 Epoch: 8 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:38:03,998-Speed 5529.26 samples/sec Loss 6.2044 LearningRate 0.1029 Epoch: 8 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:38:11,386-Speed 5545.01 samples/sec Loss 6.2701 LearningRate 0.1028 Epoch: 8 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:38:18,757-Speed 5558.25 samples/sec Loss 6.2542 LearningRate 0.1028 Epoch: 8 Global Step: 92040 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:38:26,150-Speed 5540.83 samples/sec Loss 6.2458 LearningRate 0.1028 Epoch: 8 Global Step: 92050 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:38:33,555-Speed 5531.98 samples/sec Loss 6.2544 LearningRate 
0.1028 Epoch: 8 Global Step: 92060 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:38:40,991-Speed 5509.31 samples/sec Loss 6.2489 LearningRate 0.1028 Epoch: 8 Global Step: 92070 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:38:48,497-Speed 5457.66 samples/sec Loss 6.2244 LearningRate 0.1028 Epoch: 8 Global Step: 92080 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:38:55,900-Speed 5533.68 samples/sec Loss 6.1725 LearningRate 0.1027 Epoch: 8 Global Step: 92090 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:39:03,416-Speed 5450.99 samples/sec Loss 6.2601 LearningRate 0.1027 Epoch: 8 Global Step: 92100 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:39:10,812-Speed 5539.31 samples/sec Loss 6.2976 LearningRate 0.1027 Epoch: 8 Global Step: 92110 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:39:18,193-Speed 5549.20 samples/sec Loss 6.2722 LearningRate 0.1027 Epoch: 8 Global Step: 92120 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:39:25,616-Speed 5519.41 samples/sec Loss 6.2951 LearningRate 0.1027 Epoch: 8 Global Step: 92130 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:39:33,028-Speed 5526.62 samples/sec Loss 6.2833 LearningRate 0.1026 Epoch: 8 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:39:40,457-Speed 5514.31 samples/sec Loss 6.2625 LearningRate 0.1026 Epoch: 8 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:39:48,003-Speed 5428.55 samples/sec Loss 6.2666 LearningRate 0.1026 Epoch: 8 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:39:55,463-Speed 5491.81 samples/sec Loss 6.2344 LearningRate 0.1026 Epoch: 8 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:40:03,082-Speed 5376.79 samples/sec Loss 6.2526 LearningRate 0.1026 Epoch: 8 Global Step: 92180 Fp16 Grad Scale: 
65536 Required: 26 hours Training: 2022-01-08 15:40:10,525-Speed 5504.42 samples/sec Loss 6.2552 LearningRate 0.1026 Epoch: 8 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:40:17,967-Speed 5504.22 samples/sec Loss 6.2046 LearningRate 0.1025 Epoch: 8 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:40:25,385-Speed 5522.03 samples/sec Loss 6.2740 LearningRate 0.1025 Epoch: 8 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:40:32,857-Speed 5483.01 samples/sec Loss 6.2386 LearningRate 0.1025 Epoch: 8 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:40:40,388-Speed 5439.68 samples/sec Loss 6.2194 LearningRate 0.1025 Epoch: 8 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:40:47,880-Speed 5468.04 samples/sec Loss 6.2733 LearningRate 0.1025 Epoch: 8 Global Step: 92240 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:40:55,461-Speed 5403.86 samples/sec Loss 6.2406 LearningRate 0.1025 Epoch: 8 Global Step: 92250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:41:03,066-Speed 5387.33 samples/sec Loss 6.1955 LearningRate 0.1024 Epoch: 8 Global Step: 92260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:41:10,559-Speed 5467.58 samples/sec Loss 6.2620 LearningRate 0.1024 Epoch: 8 Global Step: 92270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:41:18,088-Speed 5440.26 samples/sec Loss 6.2542 LearningRate 0.1024 Epoch: 8 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:41:25,608-Speed 5447.70 samples/sec Loss 6.2994 LearningRate 0.1024 Epoch: 8 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:41:33,062-Speed 5495.88 samples/sec Loss 6.2479 LearningRate 0.1024 Epoch: 8 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 
15:41:40,736-Speed 5338.35 samples/sec Loss 6.2063 LearningRate 0.1023 Epoch: 8 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:41:48,183-Speed 5500.87 samples/sec Loss 6.2487 LearningRate 0.1023 Epoch: 8 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:41:55,600-Speed 5522.84 samples/sec Loss 6.2418 LearningRate 0.1023 Epoch: 8 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:42:03,079-Speed 5477.48 samples/sec Loss 6.3139 LearningRate 0.1023 Epoch: 8 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:42:10,518-Speed 5507.10 samples/sec Loss 6.2614 LearningRate 0.1023 Epoch: 8 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:42:17,959-Speed 5505.15 samples/sec Loss 6.1794 LearningRate 0.1023 Epoch: 8 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:42:25,331-Speed 5556.84 samples/sec Loss 6.2759 LearningRate 0.1022 Epoch: 8 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:42:32,830-Speed 5462.68 samples/sec Loss 6.2662 LearningRate 0.1022 Epoch: 8 Global Step: 92380 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:42:40,322-Speed 5468.45 samples/sec Loss 6.1969 LearningRate 0.1022 Epoch: 8 Global Step: 92390 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:42:47,750-Speed 5515.19 samples/sec Loss 6.2643 LearningRate 0.1022 Epoch: 8 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:42:55,157-Speed 5530.49 samples/sec Loss 6.2480 LearningRate 0.1022 Epoch: 8 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:43:02,556-Speed 5536.70 samples/sec Loss 6.1921 LearningRate 0.1021 Epoch: 8 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:43:10,019-Speed 5489.10 samples/sec Loss 6.2319 
LearningRate 0.1021 Epoch: 8 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:43:17,420-Speed 5535.67 samples/sec Loss 6.2139 LearningRate 0.1021 Epoch: 8 Global Step: 92440 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:43:24,838-Speed 5522.24 samples/sec Loss 6.2780 LearningRate 0.1021 Epoch: 8 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:43:32,260-Speed 5518.98 samples/sec Loss 6.2116 LearningRate 0.1021 Epoch: 8 Global Step: 92460 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:43:39,694-Speed 5511.21 samples/sec Loss 6.1688 LearningRate 0.1021 Epoch: 8 Global Step: 92470 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:43:47,167-Speed 5481.76 samples/sec Loss 6.2342 LearningRate 0.1020 Epoch: 8 Global Step: 92480 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:43:54,748-Speed 5403.75 samples/sec Loss 6.2426 LearningRate 0.1020 Epoch: 8 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:44:02,215-Speed 5485.86 samples/sec Loss 6.1802 LearningRate 0.1020 Epoch: 8 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:44:09,701-Speed 5472.43 samples/sec Loss 6.2384 LearningRate 0.1020 Epoch: 8 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:44:17,083-Speed 5549.33 samples/sec Loss 6.2530 LearningRate 0.1020 Epoch: 8 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:44:24,511-Speed 5515.26 samples/sec Loss 6.1435 LearningRate 0.1020 Epoch: 8 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:44:31,941-Speed 5513.46 samples/sec Loss 6.2235 LearningRate 0.1019 Epoch: 8 Global Step: 92540 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:44:39,439-Speed 5463.50 samples/sec Loss 6.3148 LearningRate 0.1019 Epoch: 8 Global Step: 92550 Fp16 
Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:44:46,942-Speed 5460.09 samples/sec Loss 6.2224 LearningRate 0.1019 Epoch: 8 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:44:54,412-Speed 5484.14 samples/sec Loss 6.2189 LearningRate 0.1019 Epoch: 8 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:01,882-Speed 5483.60 samples/sec Loss 6.2702 LearningRate 0.1019 Epoch: 8 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:09,331-Speed 5499.05 samples/sec Loss 6.1466 LearningRate 0.1018 Epoch: 8 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:16,786-Speed 5495.83 samples/sec Loss 6.2030 LearningRate 0.1018 Epoch: 8 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:24,201-Speed 5524.88 samples/sec Loss 6.1671 LearningRate 0.1018 Epoch: 8 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:31,684-Speed 5474.18 samples/sec Loss 6.1632 LearningRate 0.1018 Epoch: 8 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:39,136-Speed 5496.84 samples/sec Loss 6.1869 LearningRate 0.1018 Epoch: 8 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:45:46,581-Speed 5502.81 samples/sec Loss 6.2046 LearningRate 0.1018 Epoch: 8 Global Step: 92640 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:45:54,145-Speed 5415.76 samples/sec Loss 6.1935 LearningRate 0.1017 Epoch: 8 Global Step: 92650 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:46:01,560-Speed 5525.02 samples/sec Loss 6.2388 LearningRate 0.1017 Epoch: 8 Global Step: 92660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:46:09,066-Speed 5457.56 samples/sec Loss 6.1962 LearningRate 0.1017 Epoch: 8 Global Step: 92670 Fp16 Grad Scale: 131072 Required: 26 hours Training: 
2022-01-08 15:46:16,645-Speed 5405.12 samples/sec Loss 6.1839 LearningRate 0.1017 Epoch: 8 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:46:24,158-Speed 5452.68 samples/sec Loss 6.2402 LearningRate 0.1017 Epoch: 8 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:46:31,677-Speed 5447.88 samples/sec Loss 6.2209 LearningRate 0.1017 Epoch: 8 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:46:39,085-Speed 5530.46 samples/sec Loss 6.2271 LearningRate 0.1016 Epoch: 8 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:46:46,473-Speed 5544.69 samples/sec Loss 6.2241 LearningRate 0.1016 Epoch: 8 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:46:53,883-Speed 5528.80 samples/sec Loss 6.2454 LearningRate 0.1016 Epoch: 8 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:01,268-Speed 5546.66 samples/sec Loss 6.2448 LearningRate 0.1016 Epoch: 8 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:08,730-Speed 5489.96 samples/sec Loss 6.2100 LearningRate 0.1016 Epoch: 8 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:16,143-Speed 5526.30 samples/sec Loss 6.2395 LearningRate 0.1015 Epoch: 8 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:23,563-Speed 5520.77 samples/sec Loss 6.2399 LearningRate 0.1015 Epoch: 8 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:31,013-Speed 5499.10 samples/sec Loss 6.2365 LearningRate 0.1015 Epoch: 8 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:38,409-Speed 5539.06 samples/sec Loss 6.2242 LearningRate 0.1015 Epoch: 8 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:45,839-Speed 5512.93 samples/sec Loss 
6.1713 LearningRate 0.1015 Epoch: 8 Global Step: 92800 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:47:53,367-Speed 5442.59 samples/sec Loss 6.2161 LearningRate 0.1015 Epoch: 8 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:00,833-Speed 5486.79 samples/sec Loss 6.2398 LearningRate 0.1014 Epoch: 8 Global Step: 92820 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:08,343-Speed 5454.44 samples/sec Loss 6.2677 LearningRate 0.1014 Epoch: 8 Global Step: 92830 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:15,769-Speed 5516.23 samples/sec Loss 6.2198 LearningRate 0.1014 Epoch: 8 Global Step: 92840 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:23,216-Speed 5501.65 samples/sec Loss 6.1988 LearningRate 0.1014 Epoch: 8 Global Step: 92850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:30,676-Speed 5491.03 samples/sec Loss 6.1951 LearningRate 0.1014 Epoch: 8 Global Step: 92860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:38,268-Speed 5395.46 samples/sec Loss 6.2436 LearningRate 0.1014 Epoch: 8 Global Step: 92870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:45,732-Speed 5489.06 samples/sec Loss 6.2154 LearningRate 0.1013 Epoch: 8 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:48:53,153-Speed 5520.27 samples/sec Loss 6.2579 LearningRate 0.1013 Epoch: 8 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:00,589-Speed 5509.14 samples/sec Loss 6.2109 LearningRate 0.1013 Epoch: 8 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:07,993-Speed 5532.57 samples/sec Loss 6.2246 LearningRate 0.1013 Epoch: 8 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:15,422-Speed 5514.50 samples/sec Loss 6.2047 LearningRate 0.1013 Epoch: 8 Global Step: 92920 
Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:22,851-Speed 5513.98 samples/sec Loss 6.2211 LearningRate 0.1012 Epoch: 8 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:30,320-Speed 5485.36 samples/sec Loss 6.2332 LearningRate 0.1012 Epoch: 8 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:37,765-Speed 5501.98 samples/sec Loss 6.1909 LearningRate 0.1012 Epoch: 8 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:45,198-Speed 5511.31 samples/sec Loss 6.1936 LearningRate 0.1012 Epoch: 8 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:49:52,635-Speed 5507.97 samples/sec Loss 6.2138 LearningRate 0.1012 Epoch: 8 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:00,098-Speed 5489.91 samples/sec Loss 6.2080 LearningRate 0.1012 Epoch: 8 Global Step: 92980 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:50:07,582-Speed 5473.24 samples/sec Loss 6.2877 LearningRate 0.1011 Epoch: 8 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:15,027-Speed 5502.32 samples/sec Loss 6.1995 LearningRate 0.1011 Epoch: 8 Global Step: 93000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:22,520-Speed 5467.50 samples/sec Loss 6.1979 LearningRate 0.1011 Epoch: 8 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:29,950-Speed 5514.08 samples/sec Loss 6.2157 LearningRate 0.1011 Epoch: 8 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:37,421-Speed 5482.77 samples/sec Loss 6.2433 LearningRate 0.1011 Epoch: 8 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:44,788-Speed 5560.51 samples/sec Loss 6.2616 LearningRate 0.1011 Epoch: 8 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 26 hours Training: 
2022-01-08 15:50:52,188-Speed 5536.48 samples/sec Loss 6.1631 LearningRate 0.1010 Epoch: 8 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:50:59,561-Speed 5556.29 samples/sec Loss 6.1922 LearningRate 0.1010 Epoch: 8 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:51:06,965-Speed 5533.01 samples/sec Loss 6.1849 LearningRate 0.1010 Epoch: 8 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:51:14,366-Speed 5534.60 samples/sec Loss 6.1784 LearningRate 0.1010 Epoch: 8 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:51:21,828-Speed 5489.57 samples/sec Loss 6.2383 LearningRate 0.1010 Epoch: 8 Global Step: 93090 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:51:29,311-Speed 5475.00 samples/sec Loss 6.2376 LearningRate 0.1009 Epoch: 8 Global Step: 93100 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:51:36,701-Speed 5543.89 samples/sec Loss 6.1708 LearningRate 0.1009 Epoch: 8 Global Step: 93110 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:51:44,154-Speed 5496.13 samples/sec Loss 6.1590 LearningRate 0.1009 Epoch: 8 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:51:51,581-Speed 5515.24 samples/sec Loss 6.1857 LearningRate 0.1009 Epoch: 8 Global Step: 93130 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:51:59,051-Speed 5484.32 samples/sec Loss 6.1947 LearningRate 0.1009 Epoch: 8 Global Step: 93140 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:06,493-Speed 5504.90 samples/sec Loss 6.1821 LearningRate 0.1009 Epoch: 8 Global Step: 93150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:13,921-Speed 5515.08 samples/sec Loss 6.2036 LearningRate 0.1008 Epoch: 8 Global Step: 93160 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:21,375-Speed 5495.79 samples/sec 
Loss 6.2568 LearningRate 0.1008 Epoch: 8 Global Step: 93170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:28,860-Speed 5472.88 samples/sec Loss 6.1668 LearningRate 0.1008 Epoch: 8 Global Step: 93180 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:36,292-Speed 5512.03 samples/sec Loss 6.2083 LearningRate 0.1008 Epoch: 8 Global Step: 93190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:43,755-Speed 5488.96 samples/sec Loss 6.1841 LearningRate 0.1008 Epoch: 8 Global Step: 93200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:51,180-Speed 5517.33 samples/sec Loss 6.1954 LearningRate 0.1007 Epoch: 8 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:52:58,646-Speed 5487.00 samples/sec Loss 6.2418 LearningRate 0.1007 Epoch: 8 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:53:06,057-Speed 5528.43 samples/sec Loss 6.1692 LearningRate 0.1007 Epoch: 8 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:13,501-Speed 5502.66 samples/sec Loss 6.1336 LearningRate 0.1007 Epoch: 8 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:20,935-Speed 5510.51 samples/sec Loss 6.2212 LearningRate 0.1007 Epoch: 8 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:28,421-Speed 5472.59 samples/sec Loss 6.2281 LearningRate 0.1007 Epoch: 8 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:35,916-Speed 5465.87 samples/sec Loss 6.1515 LearningRate 0.1006 Epoch: 8 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:43,403-Speed 5471.58 samples/sec Loss 6.1919 LearningRate 0.1006 Epoch: 8 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:50,852-Speed 5499.08 samples/sec Loss 6.1837 LearningRate 0.1006 Epoch: 8 Global Step: 
93290 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:53:58,318-Speed 5487.07 samples/sec Loss 6.1565 LearningRate 0.1006 Epoch: 8 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:54:05,871-Speed 5424.13 samples/sec Loss 6.1884 LearningRate 0.1006 Epoch: 8 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:54:13,283-Speed 5526.85 samples/sec Loss 6.1881 LearningRate 0.1006 Epoch: 8 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:54:37,674-Speed 1679.33 samples/sec Loss 6.1676 LearningRate 0.1005 Epoch: 9 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:54:45,137-Speed 5489.27 samples/sec Loss 6.1681 LearningRate 0.1005 Epoch: 9 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:54:52,590-Speed 5496.90 samples/sec Loss 6.1440 LearningRate 0.1005 Epoch: 9 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:55:00,024-Speed 5510.23 samples/sec Loss 6.1849 LearningRate 0.1005 Epoch: 9 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:55:07,489-Speed 5488.22 samples/sec Loss 6.1957 LearningRate 0.1005 Epoch: 9 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:55:14,916-Speed 5515.84 samples/sec Loss 6.1764 LearningRate 0.1005 Epoch: 9 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:55:22,344-Speed 5515.22 samples/sec Loss 6.1794 LearningRate 0.1004 Epoch: 9 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:55:29,775-Speed 5512.37 samples/sec Loss 6.1768 LearningRate 0.1004 Epoch: 9 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:55:37,182-Speed 5530.73 samples/sec Loss 6.1743 LearningRate 0.1004 Epoch: 9 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 26 hours 
Training: 2022-01-08 15:55:44,622-Speed 5506.70 samples/sec Loss 6.1646 LearningRate 0.1004 Epoch: 9 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:55:52,189-Speed 5413.60 samples/sec Loss 6.1472 LearningRate 0.1004 Epoch: 9 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:55:59,690-Speed 5460.81 samples/sec Loss 6.1888 LearningRate 0.1003 Epoch: 9 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:56:07,147-Speed 5493.58 samples/sec Loss 6.1823 LearningRate 0.1003 Epoch: 9 Global Step: 93450 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:56:14,584-Speed 5508.61 samples/sec Loss 6.1793 LearningRate 0.1003 Epoch: 9 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:56:22,143-Speed 5419.69 samples/sec Loss 6.2060 LearningRate 0.1003 Epoch: 9 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:56:29,595-Speed 5497.06 samples/sec Loss 6.2294 LearningRate 0.1003 Epoch: 9 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:56:37,034-Speed 5506.94 samples/sec Loss 6.1404 LearningRate 0.1003 Epoch: 9 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:56:44,616-Speed 5402.89 samples/sec Loss 6.1324 LearningRate 0.1002 Epoch: 9 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:56:52,245-Speed 5370.39 samples/sec Loss 6.1264 LearningRate 0.1002 Epoch: 9 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:56:59,865-Speed 5375.69 samples/sec Loss 6.1029 LearningRate 0.1002 Epoch: 9 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:57:07,424-Speed 5419.14 samples/sec Loss 6.1518 LearningRate 0.1002 Epoch: 9 Global Step: 93530 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:57:15,010-Speed 5400.28 
samples/sec Loss 6.0935 LearningRate 0.1002 Epoch: 9 Global Step: 93540 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:57:22,583-Speed 5409.39 samples/sec Loss 6.1408 LearningRate 0.1002 Epoch: 9 Global Step: 93550 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:57:30,179-Speed 5393.07 samples/sec Loss 6.1636 LearningRate 0.1001 Epoch: 9 Global Step: 93560 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:57:37,652-Speed 5481.96 samples/sec Loss 6.1781 LearningRate 0.1001 Epoch: 9 Global Step: 93570 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:57:45,198-Speed 5428.08 samples/sec Loss 6.1298 LearningRate 0.1001 Epoch: 9 Global Step: 93580 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:57:52,855-Speed 5350.96 samples/sec Loss 6.1463 LearningRate 0.1001 Epoch: 9 Global Step: 93590 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:58:00,434-Speed 5404.88 samples/sec Loss 6.2238 LearningRate 0.1001 Epoch: 9 Global Step: 93600 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:58:08,109-Speed 5337.60 samples/sec Loss 6.0986 LearningRate 0.1000 Epoch: 9 Global Step: 93610 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:58:15,871-Speed 5277.37 samples/sec Loss 6.2121 LearningRate 0.1000 Epoch: 9 Global Step: 93620 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 15:58:23,532-Speed 5347.34 samples/sec Loss 6.2374 LearningRate 0.1000 Epoch: 9 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:58:31,214-Speed 5332.69 samples/sec Loss 6.1838 LearningRate 0.1000 Epoch: 9 Global Step: 93640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:58:38,848-Speed 5365.83 samples/sec Loss 6.1381 LearningRate 0.1000 Epoch: 9 Global Step: 93650 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:58:46,508-Speed 5348.09 samples/sec Loss 6.1735 LearningRate 0.1000 Epoch: 9 
Global Step: 93660 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:58:54,140-Speed 5367.89 samples/sec Loss 6.1107 LearningRate 0.0999 Epoch: 9 Global Step: 93670 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:01,850-Speed 5312.90 samples/sec Loss 6.1504 LearningRate 0.0999 Epoch: 9 Global Step: 93680 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:09,372-Speed 5446.04 samples/sec Loss 6.1070 LearningRate 0.0999 Epoch: 9 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:16,914-Speed 5431.51 samples/sec Loss 6.1154 LearningRate 0.0999 Epoch: 9 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:24,384-Speed 5483.93 samples/sec Loss 6.1136 LearningRate 0.0999 Epoch: 9 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:31,922-Speed 5435.97 samples/sec Loss 6.1861 LearningRate 0.0999 Epoch: 9 Global Step: 93720 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:39,372-Speed 5498.18 samples/sec Loss 6.1381 LearningRate 0.0998 Epoch: 9 Global Step: 93730 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 15:59:46,873-Speed 5460.99 samples/sec Loss 6.1340 LearningRate 0.0998 Epoch: 9 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 15:59:54,444-Speed 5411.18 samples/sec Loss 6.1459 LearningRate 0.0998 Epoch: 9 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:02,080-Speed 5364.59 samples/sec Loss 6.1918 LearningRate 0.0998 Epoch: 9 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:09,564-Speed 5474.01 samples/sec Loss 6.0939 LearningRate 0.0998 Epoch: 9 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:17,176-Speed 5381.21 samples/sec Loss 6.1286 LearningRate 0.0997 Epoch: 9 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 26 
hours Training: 2022-01-08 16:00:24,703-Speed 5442.37 samples/sec Loss 6.0957 LearningRate 0.0997 Epoch: 9 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:32,231-Speed 5442.42 samples/sec Loss 6.1700 LearningRate 0.0997 Epoch: 9 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:39,737-Speed 5457.63 samples/sec Loss 6.1732 LearningRate 0.0997 Epoch: 9 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:47,223-Speed 5472.00 samples/sec Loss 6.1559 LearningRate 0.0997 Epoch: 9 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:00:54,677-Speed 5496.01 samples/sec Loss 6.1580 LearningRate 0.0997 Epoch: 9 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:01:02,230-Speed 5423.56 samples/sec Loss 6.1436 LearningRate 0.0996 Epoch: 9 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:01:09,745-Speed 5451.28 samples/sec Loss 6.1530 LearningRate 0.0996 Epoch: 9 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:01:17,209-Speed 5488.08 samples/sec Loss 6.1471 LearningRate 0.0996 Epoch: 9 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:01:24,777-Speed 5413.21 samples/sec Loss 6.1958 LearningRate 0.0996 Epoch: 9 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:01:32,239-Speed 5490.73 samples/sec Loss 6.1550 LearningRate 0.0996 Epoch: 9 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:01:39,705-Speed 5486.72 samples/sec Loss 6.2068 LearningRate 0.0996 Epoch: 9 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:01:47,286-Speed 5403.48 samples/sec Loss 6.1176 LearningRate 0.0995 Epoch: 9 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:01:54,851-Speed 5415.06 
samples/sec Loss 6.1448 LearningRate 0.0995 Epoch: 9 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:02:02,411-Speed 5418.99 samples/sec Loss 6.1311 LearningRate 0.0995 Epoch: 9 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:02:09,929-Speed 5449.12 samples/sec Loss 6.1709 LearningRate 0.0995 Epoch: 9 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:02:17,433-Speed 5458.63 samples/sec Loss 6.1654 LearningRate 0.0995 Epoch: 9 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:02:24,970-Speed 5435.11 samples/sec Loss 6.2109 LearningRate 0.0994 Epoch: 9 Global Step: 93950 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:02:32,726-Speed 5281.73 samples/sec Loss 6.1414 LearningRate 0.0994 Epoch: 9 Global Step: 93960 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:02:40,384-Speed 5349.94 samples/sec Loss 6.1296 LearningRate 0.0994 Epoch: 9 Global Step: 93970 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:02:47,866-Speed 5474.83 samples/sec Loss 6.1763 LearningRate 0.0994 Epoch: 9 Global Step: 93980 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:02:55,448-Speed 5402.81 samples/sec Loss 6.1362 LearningRate 0.0994 Epoch: 9 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:03:03,122-Speed 5338.08 samples/sec Loss 6.0984 LearningRate 0.0994 Epoch: 9 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:03:47,050-[lfw][94000]XNorm: 23.219754 Training: 2022-01-08 16:03:47,051-[lfw][94000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-01-08 16:03:47,051-[lfw][94000]Accuracy-Highest: 0.99817 Training: 2022-01-08 16:04:38,930-[cfp_fp][94000]XNorm: 21.137395 Training: 2022-01-08 16:04:38,931-[cfp_fp][94000]Accuracy-Flip: 0.98800+-0.00600 Training: 2022-01-08 
16:04:38,932-[cfp_fp][94000]Accuracy-Highest: 0.98814 Training: 2022-01-08 16:05:24,882-[agedb_30][94000]XNorm: 22.578027 Training: 2022-01-08 16:05:24,884-[agedb_30][94000]Accuracy-Flip: 0.97533+-0.00730 Training: 2022-01-08 16:05:24,884-[agedb_30][94000]Accuracy-Highest: 0.97833 Training: 2022-01-08 16:05:32,469-Speed 274.27 samples/sec Loss 6.1263 LearningRate 0.0993 Epoch: 9 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:05:39,943-Speed 5481.69 samples/sec Loss 6.1672 LearningRate 0.0993 Epoch: 9 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:05:47,413-Speed 5484.52 samples/sec Loss 6.1518 LearningRate 0.0993 Epoch: 9 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:05:54,877-Speed 5489.57 samples/sec Loss 6.2016 LearningRate 0.0993 Epoch: 9 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:06:02,342-Speed 5488.16 samples/sec Loss 6.1367 LearningRate 0.0993 Epoch: 9 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:06:09,927-Speed 5402.06 samples/sec Loss 6.1719 LearningRate 0.0993 Epoch: 9 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:06:17,429-Speed 5461.22 samples/sec Loss 6.1579 LearningRate 0.0992 Epoch: 9 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:06:24,928-Speed 5462.71 samples/sec Loss 6.1808 LearningRate 0.0992 Epoch: 9 Global Step: 94080 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:06:32,381-Speed 5496.98 samples/sec Loss 6.1626 LearningRate 0.0992 Epoch: 9 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:06:39,865-Speed 5474.23 samples/sec Loss 6.1935 LearningRate 0.0992 Epoch: 9 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:06:47,293-Speed 5514.66 samples/sec Loss 6.1007 LearningRate 0.0992 
Epoch: 9 Global Step: 94110 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:06:55,027-Speed 5297.49 samples/sec Loss 6.1218 LearningRate 0.0992 Epoch: 9 Global Step: 94120 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:07:02,634-Speed 5386.07 samples/sec Loss 6.1792 LearningRate 0.0991 Epoch: 9 Global Step: 94130 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:07:10,064-Speed 5512.60 samples/sec Loss 6.1486 LearningRate 0.0991 Epoch: 9 Global Step: 94140 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:07:17,562-Speed 5464.33 samples/sec Loss 6.1231 LearningRate 0.0991 Epoch: 9 Global Step: 94150 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:07:25,093-Speed 5439.42 samples/sec Loss 6.1228 LearningRate 0.0991 Epoch: 9 Global Step: 94160 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:07:32,496-Speed 5534.21 samples/sec Loss 6.1331 LearningRate 0.0991 Epoch: 9 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:07:39,907-Speed 5527.19 samples/sec Loss 6.1488 LearningRate 0.0990 Epoch: 9 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:07:47,469-Speed 5417.56 samples/sec Loss 6.2034 LearningRate 0.0990 Epoch: 9 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:07:54,984-Speed 5451.51 samples/sec Loss 6.2091 LearningRate 0.0990 Epoch: 9 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:08:02,530-Speed 5428.56 samples/sec Loss 6.1512 LearningRate 0.0990 Epoch: 9 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:08:10,078-Speed 5427.40 samples/sec Loss 6.1610 LearningRate 0.0990 Epoch: 9 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:08:17,528-Speed 5498.48 samples/sec Loss 6.1595 LearningRate 0.0990 Epoch: 9 Global Step: 94230 Fp16 Grad Scale: 
65536 Required: 26 hours Training: 2022-01-08 16:08:25,002-Speed 5481.15 samples/sec Loss 6.1411 LearningRate 0.0989 Epoch: 9 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:08:32,468-Speed 5486.74 samples/sec Loss 6.1512 LearningRate 0.0989 Epoch: 9 Global Step: 94250 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:08:39,868-Speed 5536.56 samples/sec Loss 6.1744 LearningRate 0.0989 Epoch: 9 Global Step: 94260 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:08:47,422-Speed 5422.85 samples/sec Loss 6.1110 LearningRate 0.0989 Epoch: 9 Global Step: 94270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:08:54,842-Speed 5520.58 samples/sec Loss 6.1243 LearningRate 0.0989 Epoch: 9 Global Step: 94280 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:09:02,299-Speed 5494.06 samples/sec Loss 6.1089 LearningRate 0.0989 Epoch: 9 Global Step: 94290 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:09:09,758-Speed 5491.74 samples/sec Loss 6.1205 LearningRate 0.0988 Epoch: 9 Global Step: 94300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:09:17,211-Speed 5497.41 samples/sec Loss 6.1658 LearningRate 0.0988 Epoch: 9 Global Step: 94310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:09:24,901-Speed 5326.56 samples/sec Loss 6.1135 LearningRate 0.0988 Epoch: 9 Global Step: 94320 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:09:32,432-Speed 5439.81 samples/sec Loss 6.1176 LearningRate 0.0988 Epoch: 9 Global Step: 94330 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:09:39,834-Speed 5534.52 samples/sec Loss 6.1056 LearningRate 0.0988 Epoch: 9 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:09:47,246-Speed 5526.78 samples/sec Loss 6.1062 LearningRate 0.0987 Epoch: 9 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 
16:09:54,959-Speed 5311.58 samples/sec Loss 6.1418 LearningRate 0.0987 Epoch: 9 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:02,548-Speed 5397.72 samples/sec Loss 6.1028 LearningRate 0.0987 Epoch: 9 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:10,090-Speed 5431.99 samples/sec Loss 6.1149 LearningRate 0.0987 Epoch: 9 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:17,604-Speed 5451.78 samples/sec Loss 6.0728 LearningRate 0.0987 Epoch: 9 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:25,216-Speed 5381.95 samples/sec Loss 6.0875 LearningRate 0.0987 Epoch: 9 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:32,639-Speed 5518.49 samples/sec Loss 6.0878 LearningRate 0.0986 Epoch: 9 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:40,131-Speed 5467.90 samples/sec Loss 6.1932 LearningRate 0.0986 Epoch: 9 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:47,588-Speed 5493.57 samples/sec Loss 6.0749 LearningRate 0.0986 Epoch: 9 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:10:55,088-Speed 5462.16 samples/sec Loss 6.1191 LearningRate 0.0986 Epoch: 9 Global Step: 94440 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:11:02,574-Speed 5473.05 samples/sec Loss 6.1010 LearningRate 0.0986 Epoch: 9 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:11:10,131-Speed 5420.69 samples/sec Loss 6.0994 LearningRate 0.0986 Epoch: 9 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:11:17,683-Speed 5424.65 samples/sec Loss 6.1199 LearningRate 0.0985 Epoch: 9 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:11:25,158-Speed 5480.45 samples/sec Loss 6.0858 
LearningRate 0.0985 Epoch: 9 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:11:32,596-Speed 5507.74 samples/sec Loss 6.0832 LearningRate 0.0985 Epoch: 9 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:11:40,078-Speed 5474.98 samples/sec Loss 6.1610 LearningRate 0.0985 Epoch: 9 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:11:47,607-Speed 5441.30 samples/sec Loss 6.1821 LearningRate 0.0985 Epoch: 9 Global Step: 94510 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:11:55,085-Speed 5477.78 samples/sec Loss 6.1298 LearningRate 0.0985 Epoch: 9 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:12:02,536-Speed 5498.26 samples/sec Loss 6.0916 LearningRate 0.0984 Epoch: 9 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:12:09,987-Speed 5497.64 samples/sec Loss 6.1150 LearningRate 0.0984 Epoch: 9 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:12:17,486-Speed 5462.89 samples/sec Loss 6.1563 LearningRate 0.0984 Epoch: 9 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:12:24,983-Speed 5464.50 samples/sec Loss 6.1623 LearningRate 0.0984 Epoch: 9 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:12:32,453-Speed 5483.87 samples/sec Loss 6.1614 LearningRate 0.0984 Epoch: 9 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:12:39,907-Speed 5495.88 samples/sec Loss 6.1470 LearningRate 0.0983 Epoch: 9 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:12:47,389-Speed 5475.48 samples/sec Loss 6.1175 LearningRate 0.0983 Epoch: 9 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:12:54,791-Speed 5534.17 samples/sec Loss 6.1358 LearningRate 0.0983 Epoch: 9 Global Step: 94600 Fp16 
Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:13:02,333-Speed 5431.88 samples/sec Loss 6.1171 LearningRate 0.0983 Epoch: 9 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:13:09,877-Speed 5430.41 samples/sec Loss 6.1154 LearningRate 0.0983 Epoch: 9 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:13:17,404-Speed 5442.34 samples/sec Loss 6.0347 LearningRate 0.0983 Epoch: 9 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:13:24,812-Speed 5530.02 samples/sec Loss 6.0873 LearningRate 0.0982 Epoch: 9 Global Step: 94640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:13:32,298-Speed 5472.33 samples/sec Loss 6.0570 LearningRate 0.0982 Epoch: 9 Global Step: 94650 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:13:39,789-Speed 5468.68 samples/sec Loss 6.1250 LearningRate 0.0982 Epoch: 9 Global Step: 94660 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:13:47,200-Speed 5527.47 samples/sec Loss 6.0990 LearningRate 0.0982 Epoch: 9 Global Step: 94670 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:13:54,654-Speed 5495.98 samples/sec Loss 6.1206 LearningRate 0.0982 Epoch: 9 Global Step: 94680 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:02,114-Speed 5491.39 samples/sec Loss 6.1095 LearningRate 0.0982 Epoch: 9 Global Step: 94690 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:09,604-Speed 5469.60 samples/sec Loss 6.0939 LearningRate 0.0981 Epoch: 9 Global Step: 94700 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:17,050-Speed 5501.83 samples/sec Loss 6.0287 LearningRate 0.0981 Epoch: 9 Global Step: 94710 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:24,516-Speed 5486.79 samples/sec Loss 6.1136 LearningRate 0.0981 Epoch: 9 Global Step: 94720 Fp16 Grad Scale: 131072 Required: 26 hours Training: 
2022-01-08 16:14:31,947-Speed 5512.58 samples/sec Loss 6.0627 LearningRate 0.0981 Epoch: 9 Global Step: 94730 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:39,591-Speed 5359.54 samples/sec Loss 6.0941 LearningRate 0.0981 Epoch: 9 Global Step: 94740 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:47,076-Speed 5472.69 samples/sec Loss 6.0464 LearningRate 0.0981 Epoch: 9 Global Step: 94750 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:14:54,474-Speed 5537.59 samples/sec Loss 6.0621 LearningRate 0.0980 Epoch: 9 Global Step: 94760 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:15:01,927-Speed 5496.70 samples/sec Loss 6.1435 LearningRate 0.0980 Epoch: 9 Global Step: 94770 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:09,434-Speed 5456.40 samples/sec Loss 6.1008 LearningRate 0.0980 Epoch: 9 Global Step: 94780 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:16,878-Speed 5503.13 samples/sec Loss 6.1326 LearningRate 0.0980 Epoch: 9 Global Step: 94790 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:24,428-Speed 5426.31 samples/sec Loss 6.1192 LearningRate 0.0980 Epoch: 9 Global Step: 94800 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:31,866-Speed 5507.74 samples/sec Loss 6.0359 LearningRate 0.0979 Epoch: 9 Global Step: 94810 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:39,391-Speed 5443.58 samples/sec Loss 6.1056 LearningRate 0.0979 Epoch: 9 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:46,972-Speed 5404.08 samples/sec Loss 6.1144 LearningRate 0.0979 Epoch: 9 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:15:54,524-Speed 5424.31 samples/sec Loss 6.1161 LearningRate 0.0979 Epoch: 9 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:16:02,056-Speed 5439.20 samples/sec 
Loss 6.1358 LearningRate 0.0979 Epoch: 9 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:16:09,592-Speed 5435.53 samples/sec Loss 6.0472 LearningRate 0.0979 Epoch: 9 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:16:17,161-Speed 5412.56 samples/sec Loss 6.1548 LearningRate 0.0978 Epoch: 9 Global Step: 94870 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:16:24,667-Speed 5457.35 samples/sec Loss 6.0017 LearningRate 0.0978 Epoch: 9 Global Step: 94880 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:16:32,109-Speed 5505.56 samples/sec Loss 6.0621 LearningRate 0.0978 Epoch: 9 Global Step: 94890 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:16:39,608-Speed 5462.33 samples/sec Loss 6.1026 LearningRate 0.0978 Epoch: 9 Global Step: 94900 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:16:47,054-Speed 5502.00 samples/sec Loss 6.0965 LearningRate 0.0978 Epoch: 9 Global Step: 94910 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:16:54,609-Speed 5422.35 samples/sec Loss 6.0593 LearningRate 0.0978 Epoch: 9 Global Step: 94920 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:17:02,139-Speed 5440.40 samples/sec Loss 6.0751 LearningRate 0.0977 Epoch: 9 Global Step: 94930 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:17:09,603-Speed 5488.63 samples/sec Loss 6.1680 LearningRate 0.0977 Epoch: 9 Global Step: 94940 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:17:17,113-Speed 5454.33 samples/sec Loss 6.0785 LearningRate 0.0977 Epoch: 9 Global Step: 94950 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:17:24,641-Speed 5441.63 samples/sec Loss 6.1529 LearningRate 0.0977 Epoch: 9 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:17:32,129-Speed 5471.80 samples/sec Loss 6.0516 LearningRate 0.0977 Epoch: 9 
Global Step: 94970 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:17:39,590-Speed 5490.27 samples/sec Loss 6.0779 LearningRate 0.0977 Epoch: 9 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:17:47,048-Speed 5492.68 samples/sec Loss 6.0425 LearningRate 0.0976 Epoch: 9 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:17:54,736-Speed 5328.19 samples/sec Loss 6.1059 LearningRate 0.0976 Epoch: 9 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:18:02,196-Speed 5492.02 samples/sec Loss 6.1145 LearningRate 0.0976 Epoch: 9 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:18:09,656-Speed 5491.50 samples/sec Loss 6.1028 LearningRate 0.0976 Epoch: 9 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:18:17,079-Speed 5518.59 samples/sec Loss 6.0893 LearningRate 0.0976 Epoch: 9 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:18:24,551-Speed 5481.75 samples/sec Loss 6.1082 LearningRate 0.0975 Epoch: 9 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:18:31,996-Speed 5503.36 samples/sec Loss 6.0498 LearningRate 0.0975 Epoch: 9 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:18:39,458-Speed 5489.65 samples/sec Loss 6.0481 LearningRate 0.0975 Epoch: 9 Global Step: 95060 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:18:46,910-Speed 5496.75 samples/sec Loss 6.0374 LearningRate 0.0975 Epoch: 9 Global Step: 95070 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:18:54,389-Speed 5477.94 samples/sec Loss 6.0551 LearningRate 0.0975 Epoch: 9 Global Step: 95080 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:19:01,880-Speed 5468.52 samples/sec Loss 6.0773 LearningRate 0.0975 Epoch: 9 Global Step: 95090 Fp16 Grad Scale: 131072 Required: 
26 hours Training: 2022-01-08 16:19:09,279-Speed 5536.87 samples/sec Loss 6.1293 LearningRate 0.0974 Epoch: 9 Global Step: 95100 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:19:16,730-Speed 5497.62 samples/sec Loss 6.1351 LearningRate 0.0974 Epoch: 9 Global Step: 95110 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:19:24,236-Speed 5457.52 samples/sec Loss 6.1055 LearningRate 0.0974 Epoch: 9 Global Step: 95120 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:19:31,795-Speed 5420.04 samples/sec Loss 6.1326 LearningRate 0.0974 Epoch: 9 Global Step: 95130 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:19:39,219-Speed 5517.71 samples/sec Loss 6.0949 LearningRate 0.0974 Epoch: 9 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:19:46,764-Speed 5429.41 samples/sec Loss 6.0595 LearningRate 0.0974 Epoch: 9 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:19:54,235-Speed 5483.25 samples/sec Loss 6.0805 LearningRate 0.0973 Epoch: 9 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:20:01,697-Speed 5490.06 samples/sec Loss 6.0814 LearningRate 0.0973 Epoch: 9 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:20:09,153-Speed 5495.08 samples/sec Loss 6.0407 LearningRate 0.0973 Epoch: 9 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:20:16,668-Speed 5450.34 samples/sec Loss 6.0267 LearningRate 0.0973 Epoch: 9 Global Step: 95190 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:20:24,152-Speed 5473.84 samples/sec Loss 6.0697 LearningRate 0.0973 Epoch: 9 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:20:31,665-Speed 5452.80 samples/sec Loss 6.0952 LearningRate 0.0973 Epoch: 9 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:20:39,134-Speed 
5485.28 samples/sec Loss 6.0928 LearningRate 0.0972 Epoch: 9 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:20:46,674-Speed 5433.12 samples/sec Loss 6.1298 LearningRate 0.0972 Epoch: 9 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:20:54,107-Speed 5510.70 samples/sec Loss 6.0882 LearningRate 0.0972 Epoch: 9 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:21:01,575-Speed 5485.72 samples/sec Loss 6.0228 LearningRate 0.0972 Epoch: 9 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:21:09,059-Speed 5474.40 samples/sec Loss 6.0659 LearningRate 0.0972 Epoch: 9 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:21:16,735-Speed 5336.66 samples/sec Loss 6.0872 LearningRate 0.0971 Epoch: 9 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:21:24,225-Speed 5469.30 samples/sec Loss 6.0665 LearningRate 0.0971 Epoch: 9 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:21:31,735-Speed 5454.76 samples/sec Loss 6.0770 LearningRate 0.0971 Epoch: 9 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:21:39,231-Speed 5465.15 samples/sec Loss 6.1228 LearningRate 0.0971 Epoch: 9 Global Step: 95300 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:21:46,734-Speed 5459.75 samples/sec Loss 6.0512 LearningRate 0.0971 Epoch: 9 Global Step: 95310 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:21:54,241-Speed 5456.99 samples/sec Loss 6.1001 LearningRate 0.0971 Epoch: 9 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:01,680-Speed 5507.14 samples/sec Loss 6.0338 LearningRate 0.0970 Epoch: 9 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:09,087-Speed 5530.43 samples/sec Loss 6.0627 LearningRate 0.0970 
Epoch: 9 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:16,547-Speed 5491.47 samples/sec Loss 6.0959 LearningRate 0.0970 Epoch: 9 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:24,052-Speed 5458.37 samples/sec Loss 6.0946 LearningRate 0.0970 Epoch: 9 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:31,525-Speed 5481.20 samples/sec Loss 6.0489 LearningRate 0.0970 Epoch: 9 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:39,016-Speed 5469.11 samples/sec Loss 6.0902 LearningRate 0.0970 Epoch: 9 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:46,689-Speed 5339.03 samples/sec Loss 6.1263 LearningRate 0.0969 Epoch: 9 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:22:54,250-Speed 5417.34 samples/sec Loss 6.0177 LearningRate 0.0969 Epoch: 9 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:23:01,684-Speed 5510.95 samples/sec Loss 6.0722 LearningRate 0.0969 Epoch: 9 Global Step: 95410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:23:09,194-Speed 5455.11 samples/sec Loss 6.0676 LearningRate 0.0969 Epoch: 9 Global Step: 95420 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:23:16,669-Speed 5480.52 samples/sec Loss 6.0398 LearningRate 0.0969 Epoch: 9 Global Step: 95430 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:23:24,138-Speed 5484.43 samples/sec Loss 6.0675 LearningRate 0.0969 Epoch: 9 Global Step: 95440 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:23:31,619-Speed 5476.10 samples/sec Loss 6.0675 LearningRate 0.0968 Epoch: 9 Global Step: 95450 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:23:39,108-Speed 5470.08 samples/sec Loss 6.0292 LearningRate 0.0968 Epoch: 9 Global Step: 95460 Fp16 Grad Scale: 131072 
Required: 26 hours Training: 2022-01-08 16:23:46,586-Speed 5478.75 samples/sec Loss 6.1101 LearningRate 0.0968 Epoch: 9 Global Step: 95470 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:23:54,024-Speed 5506.80 samples/sec Loss 6.0903 LearningRate 0.0968 Epoch: 9 Global Step: 95480 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:01,539-Speed 5451.50 samples/sec Loss 6.1088 LearningRate 0.0968 Epoch: 9 Global Step: 95490 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:09,100-Speed 5417.51 samples/sec Loss 6.0161 LearningRate 0.0967 Epoch: 9 Global Step: 95500 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:16,602-Speed 5460.78 samples/sec Loss 6.0390 LearningRate 0.0967 Epoch: 9 Global Step: 95510 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:24,090-Speed 5470.98 samples/sec Loss 6.0092 LearningRate 0.0967 Epoch: 9 Global Step: 95520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:31,548-Speed 5492.93 samples/sec Loss 6.0207 LearningRate 0.0967 Epoch: 9 Global Step: 95530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:39,233-Speed 5330.47 samples/sec Loss 5.9704 LearningRate 0.0967 Epoch: 9 Global Step: 95540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:46,707-Speed 5481.26 samples/sec Loss 6.0408 LearningRate 0.0967 Epoch: 9 Global Step: 95550 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:24:54,174-Speed 5513.12 samples/sec Loss 6.0675 LearningRate 0.0966 Epoch: 9 Global Step: 95560 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:25:01,657-Speed 5474.18 samples/sec Loss 6.0567 LearningRate 0.0966 Epoch: 9 Global Step: 95570 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:25:09,133-Speed 5479.01 samples/sec Loss 6.0780 LearningRate 0.0966 Epoch: 9 Global Step: 95580 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 
16:25:16,610-Speed 5479.43 samples/sec Loss 6.0826 LearningRate 0.0966 Epoch: 9 Global Step: 95590 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:25:24,062-Speed 5497.61 samples/sec Loss 6.1142 LearningRate 0.0966 Epoch: 9 Global Step: 95600 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:25:31,540-Speed 5477.32 samples/sec Loss 6.0282 LearningRate 0.0966 Epoch: 9 Global Step: 95610 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:25:39,023-Speed 5474.68 samples/sec Loss 6.0741 LearningRate 0.0965 Epoch: 9 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:25:46,569-Speed 5429.49 samples/sec Loss 6.0449 LearningRate 0.0965 Epoch: 9 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:25:53,993-Speed 5517.93 samples/sec Loss 6.1039 LearningRate 0.0965 Epoch: 9 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:01,503-Speed 5454.51 samples/sec Loss 6.0781 LearningRate 0.0965 Epoch: 9 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:08,978-Speed 5480.00 samples/sec Loss 6.0008 LearningRate 0.0965 Epoch: 9 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:16,437-Speed 5492.57 samples/sec Loss 6.0649 LearningRate 0.0965 Epoch: 9 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:23,887-Speed 5499.41 samples/sec Loss 6.0004 LearningRate 0.0964 Epoch: 9 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:31,375-Speed 5470.09 samples/sec Loss 5.9887 LearningRate 0.0964 Epoch: 9 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:38,830-Speed 5495.18 samples/sec Loss 6.0618 LearningRate 0.0964 Epoch: 9 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:46,367-Speed 5435.49 samples/sec Loss 6.0540 
LearningRate 0.0964 Epoch: 9 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:26:53,843-Speed 5479.67 samples/sec Loss 6.0548 LearningRate 0.0964 Epoch: 9 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:01,442-Speed 5390.58 samples/sec Loss 6.0368 LearningRate 0.0964 Epoch: 9 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:09,016-Speed 5408.61 samples/sec Loss 6.0355 LearningRate 0.0963 Epoch: 9 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:16,499-Speed 5475.01 samples/sec Loss 6.0743 LearningRate 0.0963 Epoch: 9 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:24,192-Speed 5325.45 samples/sec Loss 6.0208 LearningRate 0.0963 Epoch: 9 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:31,673-Speed 5475.38 samples/sec Loss 6.0435 LearningRate 0.0963 Epoch: 9 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:39,143-Speed 5483.96 samples/sec Loss 6.0216 LearningRate 0.0963 Epoch: 9 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:46,599-Speed 5494.57 samples/sec Loss 6.0968 LearningRate 0.0962 Epoch: 9 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:27:54,191-Speed 5396.25 samples/sec Loss 6.0482 LearningRate 0.0962 Epoch: 9 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:28:01,660-Speed 5484.44 samples/sec Loss 5.9630 LearningRate 0.0962 Epoch: 9 Global Step: 95810 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:28:09,056-Speed 5538.36 samples/sec Loss 6.0505 LearningRate 0.0962 Epoch: 9 Global Step: 95820 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:28:16,569-Speed 5452.58 samples/sec Loss 6.0397 LearningRate 0.0962 Epoch: 9 Global Step: 95830 Fp16 
Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:28:24,069-Speed 5462.38 samples/sec Loss 6.0266 LearningRate 0.0962 Epoch: 9 Global Step: 95840 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:28:31,703-Speed 5366.15 samples/sec Loss 6.0406 LearningRate 0.0961 Epoch: 9 Global Step: 95850 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:28:39,234-Speed 5439.91 samples/sec Loss 6.0301 LearningRate 0.0961 Epoch: 9 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:28:46,783-Speed 5426.21 samples/sec Loss 6.0932 LearningRate 0.0961 Epoch: 9 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:28:54,353-Speed 5412.16 samples/sec Loss 6.0382 LearningRate 0.0961 Epoch: 9 Global Step: 95880 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:29:01,910-Speed 5420.36 samples/sec Loss 6.0792 LearningRate 0.0961 Epoch: 9 Global Step: 95890 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:29:09,540-Speed 5369.03 samples/sec Loss 6.0455 LearningRate 0.0961 Epoch: 9 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:29:16,975-Speed 5510.35 samples/sec Loss 6.0534 LearningRate 0.0960 Epoch: 9 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-01-08 16:29:24,488-Speed 5452.74 samples/sec Loss 6.0856 LearningRate 0.0960 Epoch: 9 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:29:32,026-Speed 5434.00 samples/sec Loss 6.1233 LearningRate 0.0960 Epoch: 9 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:29:39,647-Speed 5375.39 samples/sec Loss 6.0985 LearningRate 0.0960 Epoch: 9 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:29:47,057-Speed 5528.40 samples/sec Loss 6.0519 LearningRate 0.0960 Epoch: 9 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 26 hours Training: 
2022-01-08 16:29:54,526-Speed 5485.04 samples/sec Loss 6.0279 LearningRate 0.0960 Epoch: 9 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:30:02,034-Speed 5456.02 samples/sec Loss 5.9632 LearningRate 0.0959 Epoch: 9 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:30:09,570-Speed 5436.03 samples/sec Loss 5.9539 LearningRate 0.0959 Epoch: 9 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:30:17,138-Speed 5412.76 samples/sec Loss 6.0283 LearningRate 0.0959 Epoch: 9 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:30:24,690-Speed 5424.82 samples/sec Loss 6.0502 LearningRate 0.0959 Epoch: 9 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:31:08,759-[lfw][96000]XNorm: 22.075303 Training: 2022-01-08 16:31:08,759-[lfw][96000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-08 16:31:08,760-[lfw][96000]Accuracy-Highest: 0.99817 Training: 2022-01-08 16:32:00,775-[cfp_fp][96000]XNorm: 20.072877 Training: 2022-01-08 16:32:00,776-[cfp_fp][96000]Accuracy-Flip: 0.98914+-0.00535 Training: 2022-01-08 16:32:00,777-[cfp_fp][96000]Accuracy-Highest: 0.98914 Training: 2022-01-08 16:32:46,615-[agedb_30][96000]XNorm: 21.833672 Training: 2022-01-08 16:32:46,617-[agedb_30][96000]Accuracy-Flip: 0.97550+-0.00746 Training: 2022-01-08 16:32:46,617-[agedb_30][96000]Accuracy-Highest: 0.97833 Training: 2022-01-08 16:32:54,256-Speed 273.86 samples/sec Loss 6.0672 LearningRate 0.0959 Epoch: 9 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:33:01,857-Speed 5390.22 samples/sec Loss 6.0896 LearningRate 0.0959 Epoch: 9 Global Step: 96020 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:09,354-Speed 5464.25 samples/sec Loss 6.0522 LearningRate 0.0958 Epoch: 9 Global Step: 96030 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:16,842-Speed 5471.96 
samples/sec Loss 6.0537 LearningRate 0.0958 Epoch: 9 Global Step: 96040 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:24,289-Speed 5501.65 samples/sec Loss 6.0642 LearningRate 0.0958 Epoch: 9 Global Step: 96050 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:31,938-Speed 5356.37 samples/sec Loss 6.0613 LearningRate 0.0958 Epoch: 9 Global Step: 96060 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:39,429-Speed 5469.29 samples/sec Loss 6.0222 LearningRate 0.0958 Epoch: 9 Global Step: 96070 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:46,868-Speed 5506.98 samples/sec Loss 6.0432 LearningRate 0.0957 Epoch: 9 Global Step: 96080 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:33:54,355-Speed 5472.77 samples/sec Loss 5.9833 LearningRate 0.0957 Epoch: 9 Global Step: 96090 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:34:01,754-Speed 5536.47 samples/sec Loss 6.0829 LearningRate 0.0957 Epoch: 9 Global Step: 96100 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:09,240-Speed 5472.71 samples/sec Loss 6.0604 LearningRate 0.0957 Epoch: 9 Global Step: 96110 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:16,772-Speed 5438.39 samples/sec Loss 6.0034 LearningRate 0.0957 Epoch: 9 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:24,281-Speed 5456.54 samples/sec Loss 5.9802 LearningRate 0.0957 Epoch: 9 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:31,808-Speed 5442.09 samples/sec Loss 6.0560 LearningRate 0.0956 Epoch: 9 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:39,302-Speed 5466.24 samples/sec Loss 5.9774 LearningRate 0.0956 Epoch: 9 Global Step: 96150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:46,778-Speed 5479.36 samples/sec Loss 6.0546 LearningRate 0.0956 
Epoch: 9 Global Step: 96160 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:34:54,170-Speed 5542.11 samples/sec Loss 6.0298 LearningRate 0.0956 Epoch: 9 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:35:01,641-Speed 5482.99 samples/sec Loss 6.0126 LearningRate 0.0956 Epoch: 9 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:35:09,076-Speed 5509.50 samples/sec Loss 5.9951 LearningRate 0.0956 Epoch: 9 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:35:16,516-Speed 5506.48 samples/sec Loss 6.0178 LearningRate 0.0955 Epoch: 9 Global Step: 96200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-01-08 16:35:24,006-Speed 5469.73 samples/sec Loss 6.0064 LearningRate 0.0955 Epoch: 9 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-01-08 16:35:31,521-Speed 5450.57 samples/sec Loss 6.0217 LearningRate 0.0955 Epoch: 9 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:35:39,028-Speed 5457.38 samples/sec Loss 5.9929 LearningRate 0.0955 Epoch: 9 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:35:46,565-Speed 5435.04 samples/sec Loss 5.9779 LearningRate 0.0955 Epoch: 9 Global Step: 96240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:35:53,960-Speed 5539.71 samples/sec Loss 6.0170 LearningRate 0.0955 Epoch: 9 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:36:01,421-Speed 5490.66 samples/sec Loss 6.0528 LearningRate 0.0954 Epoch: 9 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:36:08,836-Speed 5524.48 samples/sec Loss 5.9855 LearningRate 0.0954 Epoch: 9 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:36:16,382-Speed 5428.54 samples/sec Loss 6.0257 LearningRate 0.0954 Epoch: 9 Global Step: 96280 Fp16 Grad Scale: 65536 
Required: 25 hours Training: 2022-01-08 16:36:23,937-Speed 5422.35 samples/sec Loss 6.0138 LearningRate 0.0954 Epoch: 9 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:36:31,396-Speed 5492.44 samples/sec Loss 6.0039 LearningRate 0.0954 Epoch: 9 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:36:38,978-Speed 5402.46 samples/sec Loss 5.9967 LearningRate 0.0954 Epoch: 9 Global Step: 96310 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:36:46,406-Speed 5514.59 samples/sec Loss 6.0279 LearningRate 0.0953 Epoch: 9 Global Step: 96320 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:36:53,949-Speed 5431.83 samples/sec Loss 6.0617 LearningRate 0.0953 Epoch: 9 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:37:01,420-Speed 5483.31 samples/sec Loss 6.0538 LearningRate 0.0953 Epoch: 9 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:37:08,916-Speed 5464.54 samples/sec Loss 6.0345 LearningRate 0.0953 Epoch: 9 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:37:16,413-Speed 5463.91 samples/sec Loss 6.0225 LearningRate 0.0953 Epoch: 9 Global Step: 96360 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:37:23,846-Speed 5511.96 samples/sec Loss 6.0374 LearningRate 0.0952 Epoch: 9 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:37:31,368-Speed 5446.06 samples/sec Loss 5.9915 LearningRate 0.0952 Epoch: 9 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:37:38,989-Speed 5375.13 samples/sec Loss 6.0282 LearningRate 0.0952 Epoch: 9 Global Step: 96390 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:37:46,432-Speed 5503.85 samples/sec Loss 6.0050 LearningRate 0.0952 Epoch: 9 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 
16:37:53,960-Speed 5441.61 samples/sec Loss 6.0460 LearningRate 0.0952 Epoch: 9 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:01,468-Speed 5456.30 samples/sec Loss 6.0081 LearningRate 0.0952 Epoch: 9 Global Step: 96420 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:08,924-Speed 5494.58 samples/sec Loss 6.0079 LearningRate 0.0951 Epoch: 9 Global Step: 96430 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:16,379-Speed 5495.24 samples/sec Loss 6.0207 LearningRate 0.0951 Epoch: 9 Global Step: 96440 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:23,807-Speed 5515.02 samples/sec Loss 6.0143 LearningRate 0.0951 Epoch: 9 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:31,261-Speed 5495.38 samples/sec Loss 6.0199 LearningRate 0.0951 Epoch: 9 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:38,756-Speed 5465.80 samples/sec Loss 6.0168 LearningRate 0.0951 Epoch: 9 Global Step: 96470 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:46,250-Speed 5466.27 samples/sec Loss 6.0368 LearningRate 0.0951 Epoch: 9 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:38:53,715-Speed 5488.25 samples/sec Loss 6.0043 LearningRate 0.0950 Epoch: 9 Global Step: 96490 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:01,192-Speed 5478.65 samples/sec Loss 6.0119 LearningRate 0.0950 Epoch: 9 Global Step: 96500 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:08,697-Speed 5458.55 samples/sec Loss 6.0148 LearningRate 0.0950 Epoch: 9 Global Step: 96510 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:16,124-Speed 5515.59 samples/sec Loss 5.9972 LearningRate 0.0950 Epoch: 9 Global Step: 96520 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:23,609-Speed 5473.08 samples/sec Loss 5.9040 
LearningRate 0.0950 Epoch: 9 Global Step: 96530 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:31,050-Speed 5505.74 samples/sec Loss 5.9121 LearningRate 0.0950 Epoch: 9 Global Step: 96540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:38,502-Speed 5496.93 samples/sec Loss 5.9386 LearningRate 0.0949 Epoch: 9 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:45,988-Speed 5472.51 samples/sec Loss 6.0167 LearningRate 0.0949 Epoch: 9 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:39:53,472-Speed 5473.55 samples/sec Loss 6.0376 LearningRate 0.0949 Epoch: 9 Global Step: 96570 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:40:00,873-Speed 5535.38 samples/sec Loss 6.0368 LearningRate 0.0949 Epoch: 9 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:40:08,338-Speed 5487.95 samples/sec Loss 6.0078 LearningRate 0.0949 Epoch: 9 Global Step: 96590 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:40:15,848-Speed 5454.29 samples/sec Loss 6.0195 LearningRate 0.0949 Epoch: 9 Global Step: 96600 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:40:23,276-Speed 5515.08 samples/sec Loss 6.0050 LearningRate 0.0948 Epoch: 9 Global Step: 96610 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:40:30,813-Speed 5435.49 samples/sec Loss 5.9825 LearningRate 0.0948 Epoch: 9 Global Step: 96620 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:40:38,207-Speed 5540.06 samples/sec Loss 6.0333 LearningRate 0.0948 Epoch: 9 Global Step: 96630 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:40:45,700-Speed 5467.51 samples/sec Loss 6.0130 LearningRate 0.0948 Epoch: 9 Global Step: 96640 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:40:53,148-Speed 5500.52 samples/sec Loss 6.0041 LearningRate 0.0948 Epoch: 9 Global Step: 96650 
Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:00,663-Speed 5450.74 samples/sec Loss 5.9723 LearningRate 0.0948 Epoch: 9 Global Step: 96660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:08,065-Speed 5534.46 samples/sec Loss 5.9739 LearningRate 0.0947 Epoch: 9 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:15,488-Speed 5519.11 samples/sec Loss 5.9344 LearningRate 0.0947 Epoch: 9 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:23,155-Speed 5343.04 samples/sec Loss 5.9945 LearningRate 0.0947 Epoch: 9 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:30,721-Speed 5414.33 samples/sec Loss 5.9502 LearningRate 0.0947 Epoch: 9 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:38,223-Speed 5460.31 samples/sec Loss 5.9925 LearningRate 0.0947 Epoch: 9 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:45,682-Speed 5492.02 samples/sec Loss 5.9538 LearningRate 0.0947 Epoch: 9 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:41:53,101-Speed 5521.70 samples/sec Loss 5.9576 LearningRate 0.0946 Epoch: 9 Global Step: 96730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:42:00,510-Speed 5529.95 samples/sec Loss 6.0170 LearningRate 0.0946 Epoch: 9 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:42:08,005-Speed 5465.50 samples/sec Loss 6.0091 LearningRate 0.0946 Epoch: 9 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:42:15,531-Speed 5443.25 samples/sec Loss 5.9562 LearningRate 0.0946 Epoch: 9 Global Step: 96760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:42:22,960-Speed 5514.38 samples/sec Loss 5.9474 LearningRate 0.0946 Epoch: 9 Global Step: 96770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 
2022-01-08 16:42:30,406-Speed 5501.94 samples/sec Loss 6.0652 LearningRate 0.0945 Epoch: 9 Global Step: 96780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:42:37,794-Speed 5544.76 samples/sec Loss 5.9995 LearningRate 0.0945 Epoch: 9 Global Step: 96790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:42:45,210-Speed 5523.81 samples/sec Loss 6.0104 LearningRate 0.0945 Epoch: 9 Global Step: 96800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:42:52,690-Speed 5476.45 samples/sec Loss 5.9527 LearningRate 0.0945 Epoch: 9 Global Step: 96810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:43:00,136-Speed 5502.57 samples/sec Loss 5.9726 LearningRate 0.0945 Epoch: 9 Global Step: 96820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:43:07,592-Speed 5493.97 samples/sec Loss 5.9234 LearningRate 0.0945 Epoch: 9 Global Step: 96830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:43:15,058-Speed 5486.61 samples/sec Loss 5.9566 LearningRate 0.0944 Epoch: 9 Global Step: 96840 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:43:22,441-Speed 5548.99 samples/sec Loss 5.9529 LearningRate 0.0944 Epoch: 9 Global Step: 96850 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:43:29,842-Speed 5535.36 samples/sec Loss 5.9627 LearningRate 0.0944 Epoch: 9 Global Step: 96860 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:43:37,382-Speed 5433.01 samples/sec Loss 6.0062 LearningRate 0.0944 Epoch: 9 Global Step: 96870 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:43:44,933-Speed 5425.13 samples/sec Loss 5.9716 LearningRate 0.0944 Epoch: 9 Global Step: 96880 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:43:52,357-Speed 5518.00 samples/sec Loss 6.0005 LearningRate 0.0944 Epoch: 9 Global Step: 96890 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:43:59,867-Speed 5454.39 samples/sec 
Loss 5.9791 LearningRate 0.0943 Epoch: 9 Global Step: 96900 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:44:07,341-Speed 5481.49 samples/sec Loss 6.0062 LearningRate 0.0943 Epoch: 9 Global Step: 96910 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:44:14,832-Speed 5468.90 samples/sec Loss 5.9865 LearningRate 0.0943 Epoch: 9 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:44:22,288-Speed 5494.00 samples/sec Loss 5.9836 LearningRate 0.0943 Epoch: 9 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:44:29,764-Speed 5479.18 samples/sec Loss 5.9458 LearningRate 0.0943 Epoch: 9 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:44:37,196-Speed 5512.56 samples/sec Loss 5.9860 LearningRate 0.0943 Epoch: 9 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:44:44,658-Speed 5490.39 samples/sec Loss 5.9953 LearningRate 0.0942 Epoch: 9 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:44:52,092-Speed 5509.59 samples/sec Loss 5.9328 LearningRate 0.0942 Epoch: 9 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:44:59,508-Speed 5524.57 samples/sec Loss 5.9508 LearningRate 0.0942 Epoch: 9 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:06,929-Speed 5521.38 samples/sec Loss 5.9648 LearningRate 0.0942 Epoch: 9 Global Step: 96990 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:22,035-Speed 2711.76 samples/sec Loss 5.9574 LearningRate 0.0942 Epoch: 9 Global Step: 97000 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:29,630-Speed 5393.82 samples/sec Loss 5.9221 LearningRate 0.0942 Epoch: 9 Global Step: 97010 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:37,099-Speed 5484.73 samples/sec Loss 5.9927 LearningRate 0.0941 Epoch: 9 Global Step: 
97020 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:44,597-Speed 5463.58 samples/sec Loss 5.9656 LearningRate 0.0941 Epoch: 9 Global Step: 97030 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:52,057-Speed 5491.28 samples/sec Loss 5.9655 LearningRate 0.0941 Epoch: 9 Global Step: 97040 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 16:45:59,528-Speed 5483.23 samples/sec Loss 5.9968 LearningRate 0.0941 Epoch: 9 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:07,013-Speed 5473.34 samples/sec Loss 6.0331 LearningRate 0.0941 Epoch: 9 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:14,521-Speed 5456.39 samples/sec Loss 6.0200 LearningRate 0.0941 Epoch: 9 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:22,013-Speed 5467.71 samples/sec Loss 5.9362 LearningRate 0.0940 Epoch: 9 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:29,546-Speed 5437.85 samples/sec Loss 6.0079 LearningRate 0.0940 Epoch: 9 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:37,054-Speed 5456.93 samples/sec Loss 5.9472 LearningRate 0.0940 Epoch: 9 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:44,491-Speed 5508.00 samples/sec Loss 6.0017 LearningRate 0.0940 Epoch: 9 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:51,981-Speed 5469.46 samples/sec Loss 6.0004 LearningRate 0.0940 Epoch: 9 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:46:59,444-Speed 5488.52 samples/sec Loss 5.9807 LearningRate 0.0940 Epoch: 9 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:47:06,881-Speed 5508.92 samples/sec Loss 6.0065 LearningRate 0.0939 Epoch: 9 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 25 hours 
Training: 2022-01-08 16:47:14,388-Speed 5457.51 samples/sec Loss 5.9418 LearningRate 0.0939 Epoch: 9 Global Step: 97150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:47:21,910-Speed 5445.26 samples/sec Loss 5.9018 LearningRate 0.0939 Epoch: 9 Global Step: 97160 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:47:29,375-Speed 5487.82 samples/sec Loss 5.9127 LearningRate 0.0939 Epoch: 9 Global Step: 97170 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:47:36,940-Speed 5415.26 samples/sec Loss 5.9636 LearningRate 0.0939 Epoch: 9 Global Step: 97180 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:47:44,426-Speed 5472.56 samples/sec Loss 5.9683 LearningRate 0.0938 Epoch: 9 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:47:51,928-Speed 5460.47 samples/sec Loss 5.9806 LearningRate 0.0938 Epoch: 9 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:47:59,490-Speed 5417.37 samples/sec Loss 5.9752 LearningRate 0.0938 Epoch: 9 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:06,973-Speed 5474.38 samples/sec Loss 5.9437 LearningRate 0.0938 Epoch: 9 Global Step: 97220 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:14,569-Speed 5393.50 samples/sec Loss 5.9016 LearningRate 0.0938 Epoch: 9 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:22,021-Speed 5497.05 samples/sec Loss 5.9487 LearningRate 0.0938 Epoch: 9 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:29,522-Speed 5461.48 samples/sec Loss 5.9398 LearningRate 0.0937 Epoch: 9 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:37,064-Speed 5431.57 samples/sec Loss 5.9604 LearningRate 0.0937 Epoch: 9 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:44,527-Speed 5489.55 
samples/sec Loss 5.9599 LearningRate 0.0937 Epoch: 9 Global Step: 97270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:51,984-Speed 5493.59 samples/sec Loss 5.9812 LearningRate 0.0937 Epoch: 9 Global Step: 97280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:48:59,462-Speed 5478.08 samples/sec Loss 5.9590 LearningRate 0.0937 Epoch: 9 Global Step: 97290 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:49:06,891-Speed 5514.57 samples/sec Loss 5.9467 LearningRate 0.0937 Epoch: 9 Global Step: 97300 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:49:14,352-Speed 5490.70 samples/sec Loss 6.0050 LearningRate 0.0936 Epoch: 9 Global Step: 97310 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:49:21,834-Speed 5474.96 samples/sec Loss 5.9773 LearningRate 0.0936 Epoch: 9 Global Step: 97320 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:49:29,261-Speed 5515.75 samples/sec Loss 5.9482 LearningRate 0.0936 Epoch: 9 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:49:36,753-Speed 5467.90 samples/sec Loss 5.9751 LearningRate 0.0936 Epoch: 9 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:49:44,191-Speed 5507.99 samples/sec Loss 5.9247 LearningRate 0.0936 Epoch: 9 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:49:51,676-Speed 5472.58 samples/sec Loss 5.9881 LearningRate 0.0936 Epoch: 9 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:49:59,174-Speed 5463.77 samples/sec Loss 6.0201 LearningRate 0.0935 Epoch: 9 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:50:06,630-Speed 5494.03 samples/sec Loss 6.0094 LearningRate 0.0935 Epoch: 9 Global Step: 97380 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:50:14,122-Speed 5468.04 samples/sec Loss 5.9599 LearningRate 0.0935 Epoch: 
9 Global Step: 97390 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:50:21,569-Speed 5501.23 samples/sec Loss 5.9433 LearningRate 0.0935 Epoch: 9 Global Step: 97400 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:50:29,042-Speed 5481.21 samples/sec Loss 5.9674 LearningRate 0.0935 Epoch: 9 Global Step: 97410 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:50:36,512-Speed 5484.07 samples/sec Loss 5.9628 LearningRate 0.0935 Epoch: 9 Global Step: 97420 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:50:43,958-Speed 5501.94 samples/sec Loss 5.9078 LearningRate 0.0934 Epoch: 9 Global Step: 97430 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:50:51,429-Speed 5483.92 samples/sec Loss 5.9156 LearningRate 0.0934 Epoch: 9 Global Step: 97440 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:50:58,945-Speed 5449.65 samples/sec Loss 5.9529 LearningRate 0.0934 Epoch: 9 Global Step: 97450 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:51:06,392-Speed 5501.32 samples/sec Loss 5.9559 LearningRate 0.0934 Epoch: 9 Global Step: 97460 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:51:13,805-Speed 5526.71 samples/sec Loss 5.9374 LearningRate 0.0934 Epoch: 9 Global Step: 97470 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:51:21,337-Speed 5438.33 samples/sec Loss 5.9477 LearningRate 0.0934 Epoch: 9 Global Step: 97480 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:51:28,883-Speed 5428.37 samples/sec Loss 6.0216 LearningRate 0.0933 Epoch: 9 Global Step: 97490 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:51:36,284-Speed 5535.25 samples/sec Loss 5.9152 LearningRate 0.0933 Epoch: 9 Global Step: 97500 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:51:43,874-Speed 5398.07 samples/sec Loss 5.9357 LearningRate 0.0933 Epoch: 9 Global Step: 97510 Fp16 Grad Scale: 65536 
Required: 25 hours Training: 2022-01-08 16:51:51,462-Speed 5398.39 samples/sec Loss 5.9377 LearningRate 0.0933 Epoch: 9 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:51:58,923-Speed 5490.28 samples/sec Loss 5.9294 LearningRate 0.0933 Epoch: 9 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:06,367-Speed 5503.61 samples/sec Loss 5.9095 LearningRate 0.0933 Epoch: 9 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:13,826-Speed 5491.88 samples/sec Loss 5.9156 LearningRate 0.0932 Epoch: 9 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:21,301-Speed 5480.41 samples/sec Loss 5.8903 LearningRate 0.0932 Epoch: 9 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:28,757-Speed 5494.63 samples/sec Loss 5.8986 LearningRate 0.0932 Epoch: 9 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:36,173-Speed 5523.19 samples/sec Loss 5.9842 LearningRate 0.0932 Epoch: 9 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:43,604-Speed 5513.14 samples/sec Loss 5.9698 LearningRate 0.0932 Epoch: 9 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:51,050-Speed 5501.92 samples/sec Loss 5.8670 LearningRate 0.0932 Epoch: 9 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:52:58,455-Speed 5532.28 samples/sec Loss 5.9572 LearningRate 0.0931 Epoch: 9 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:05,874-Speed 5521.54 samples/sec Loss 5.9260 LearningRate 0.0931 Epoch: 9 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:13,325-Speed 5497.93 samples/sec Loss 5.9569 LearningRate 0.0931 Epoch: 9 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 
16:53:20,781-Speed 5494.66 samples/sec Loss 5.9329 LearningRate 0.0931 Epoch: 9 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:28,573-Speed 5257.00 samples/sec Loss 5.9847 LearningRate 0.0931 Epoch: 9 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:35,982-Speed 5529.26 samples/sec Loss 5.9345 LearningRate 0.0930 Epoch: 9 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:43,500-Speed 5449.01 samples/sec Loss 5.8826 LearningRate 0.0930 Epoch: 9 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:50,947-Speed 5500.56 samples/sec Loss 5.9411 LearningRate 0.0930 Epoch: 9 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:53:58,403-Speed 5495.00 samples/sec Loss 5.9706 LearningRate 0.0930 Epoch: 9 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:54:05,897-Speed 5466.17 samples/sec Loss 6.0161 LearningRate 0.0930 Epoch: 9 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:54:13,385-Speed 5470.41 samples/sec Loss 5.9326 LearningRate 0.0930 Epoch: 9 Global Step: 97710 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:54:20,912-Speed 5442.95 samples/sec Loss 5.9343 LearningRate 0.0929 Epoch: 9 Global Step: 97720 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:54:28,581-Speed 5341.89 samples/sec Loss 5.9158 LearningRate 0.0929 Epoch: 9 Global Step: 97730 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:54:36,010-Speed 5514.41 samples/sec Loss 5.9878 LearningRate 0.0929 Epoch: 9 Global Step: 97740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:54:43,596-Speed 5399.65 samples/sec Loss 5.9162 LearningRate 0.0929 Epoch: 9 Global Step: 97750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:54:51,066-Speed 5483.95 samples/sec Loss 5.9876 
LearningRate 0.0929 Epoch: 9 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:54:58,479-Speed 5526.78 samples/sec Loss 5.9371 LearningRate 0.0929 Epoch: 9 Global Step: 97770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:05,932-Speed 5496.04 samples/sec Loss 5.9200 LearningRate 0.0928 Epoch: 9 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:13,382-Speed 5498.90 samples/sec Loss 5.9297 LearningRate 0.0928 Epoch: 9 Global Step: 97790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:20,946-Speed 5415.81 samples/sec Loss 5.9281 LearningRate 0.0928 Epoch: 9 Global Step: 97800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:28,475-Speed 5441.69 samples/sec Loss 5.9630 LearningRate 0.0928 Epoch: 9 Global Step: 97810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:36,083-Speed 5384.14 samples/sec Loss 5.9627 LearningRate 0.0928 Epoch: 9 Global Step: 97820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:43,697-Speed 5380.41 samples/sec Loss 5.9323 LearningRate 0.0928 Epoch: 9 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:55:51,471-Speed 5269.88 samples/sec Loss 5.9534 LearningRate 0.0927 Epoch: 9 Global Step: 97840 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:55:59,083-Speed 5381.67 samples/sec Loss 5.9328 LearningRate 0.0927 Epoch: 9 Global Step: 97850 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:06,727-Speed 5359.18 samples/sec Loss 5.9155 LearningRate 0.0927 Epoch: 9 Global Step: 97860 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:14,424-Speed 5321.82 samples/sec Loss 5.8856 LearningRate 0.0927 Epoch: 9 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:22,044-Speed 5376.65 samples/sec Loss 5.9303 LearningRate 0.0927 Epoch: 9 Global Step: 97880 Fp16 
Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:29,637-Speed 5394.75 samples/sec Loss 5.9170 LearningRate 0.0927 Epoch: 9 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:37,168-Speed 5439.56 samples/sec Loss 5.8718 LearningRate 0.0926 Epoch: 9 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:44,715-Speed 5428.16 samples/sec Loss 5.9488 LearningRate 0.0926 Epoch: 9 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:52,172-Speed 5493.50 samples/sec Loss 5.9306 LearningRate 0.0926 Epoch: 9 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:56:59,672-Speed 5461.87 samples/sec Loss 5.9543 LearningRate 0.0926 Epoch: 9 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:57:07,222-Speed 5426.24 samples/sec Loss 5.8943 LearningRate 0.0926 Epoch: 9 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:57:14,751-Speed 5441.33 samples/sec Loss 5.8884 LearningRate 0.0926 Epoch: 9 Global Step: 97950 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 16:57:22,279-Speed 5441.34 samples/sec Loss 5.9096 LearningRate 0.0925 Epoch: 9 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:57:29,784-Speed 5458.07 samples/sec Loss 5.9348 LearningRate 0.0925 Epoch: 9 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:57:37,237-Speed 5496.78 samples/sec Loss 5.9451 LearningRate 0.0925 Epoch: 9 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:57:44,760-Speed 5445.23 samples/sec Loss 5.9514 LearningRate 0.0925 Epoch: 9 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 16:57:52,324-Speed 5416.16 samples/sec Loss 5.9314 LearningRate 0.0925 Epoch: 9 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 25 hours Training: 
2022-01-08 16:58:36,657-[lfw][98000]XNorm: 21.642411 Training: 2022-01-08 16:58:36,658-[lfw][98000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-01-08 16:58:36,659-[lfw][98000]Accuracy-Highest: 0.99817 Training: 2022-01-08 16:59:27,973-[cfp_fp][98000]XNorm: 19.796313 Training: 2022-01-08 16:59:27,974-[cfp_fp][98000]Accuracy-Flip: 0.98700+-0.00614 Training: 2022-01-08 16:59:27,974-[cfp_fp][98000]Accuracy-Highest: 0.98914 Training: 2022-01-08 17:00:13,294-[agedb_30][98000]XNorm: 21.276060 Training: 2022-01-08 17:00:13,296-[agedb_30][98000]Accuracy-Flip: 0.97500+-0.00792 Training: 2022-01-08 17:00:13,296-[agedb_30][98000]Accuracy-Highest: 0.97833 Training: 2022-01-08 17:00:20,966-Speed 275.57 samples/sec Loss 5.9450 LearningRate 0.0925 Epoch: 9 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:00:28,609-Speed 5360.99 samples/sec Loss 5.9587 LearningRate 0.0924 Epoch: 9 Global Step: 98020 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:00:36,172-Speed 5417.05 samples/sec Loss 5.9542 LearningRate 0.0924 Epoch: 9 Global Step: 98030 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:00:43,609-Speed 5509.04 samples/sec Loss 5.9416 LearningRate 0.0924 Epoch: 9 Global Step: 98040 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:00:51,159-Speed 5426.94 samples/sec Loss 5.9300 LearningRate 0.0924 Epoch: 9 Global Step: 98050 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:00:58,608-Speed 5499.85 samples/sec Loss 5.9360 LearningRate 0.0924 Epoch: 9 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:01:06,148-Speed 5433.31 samples/sec Loss 5.9215 LearningRate 0.0924 Epoch: 9 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:01:13,754-Speed 5385.49 samples/sec Loss 5.9186 LearningRate 0.0923 Epoch: 9 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:01:21,457-Speed 5317.97 
samples/sec Loss 5.9625 LearningRate 0.0923 Epoch: 9 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:01:28,990-Speed 5438.60 samples/sec Loss 5.8819 LearningRate 0.0923 Epoch: 9 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:01:36,416-Speed 5516.43 samples/sec Loss 5.8942 LearningRate 0.0923 Epoch: 9 Global Step: 98110 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:01:43,907-Speed 5468.48 samples/sec Loss 5.9086 LearningRate 0.0923 Epoch: 9 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:01:51,397-Speed 5468.97 samples/sec Loss 5.8427 LearningRate 0.0923 Epoch: 9 Global Step: 98130 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:01:58,863-Speed 5486.75 samples/sec Loss 5.9523 LearningRate 0.0922 Epoch: 9 Global Step: 98140 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:06,311-Speed 5500.82 samples/sec Loss 5.8918 LearningRate 0.0922 Epoch: 9 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:13,770-Speed 5491.71 samples/sec Loss 5.9214 LearningRate 0.0922 Epoch: 9 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:21,186-Speed 5524.28 samples/sec Loss 5.9423 LearningRate 0.0922 Epoch: 9 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:28,721-Speed 5436.10 samples/sec Loss 5.9355 LearningRate 0.0922 Epoch: 9 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:36,167-Speed 5502.09 samples/sec Loss 5.8735 LearningRate 0.0922 Epoch: 9 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:43,610-Speed 5504.06 samples/sec Loss 5.8832 LearningRate 0.0921 Epoch: 9 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:51,130-Speed 5447.34 samples/sec Loss 5.9157 LearningRate 0.0921 Epoch: 9 
Global Step: 98210 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:02:58,532-Speed 5534.53 samples/sec Loss 6.0068 LearningRate 0.0921 Epoch: 9 Global Step: 98220 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:03:06,004-Speed 5482.35 samples/sec Loss 5.8947 LearningRate 0.0921 Epoch: 9 Global Step: 98230 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:03:13,501-Speed 5464.25 samples/sec Loss 5.8975 LearningRate 0.0921 Epoch: 9 Global Step: 98240 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:03:20,970-Speed 5484.98 samples/sec Loss 5.9148 LearningRate 0.0921 Epoch: 9 Global Step: 98250 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:03:28,463-Speed 5466.97 samples/sec Loss 5.9487 LearningRate 0.0920 Epoch: 9 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:03:35,931-Speed 5485.88 samples/sec Loss 5.9053 LearningRate 0.0920 Epoch: 9 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:03:43,475-Speed 5430.13 samples/sec Loss 5.9337 LearningRate 0.0920 Epoch: 9 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:03:50,943-Speed 5485.18 samples/sec Loss 5.9093 LearningRate 0.0920 Epoch: 9 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:03:58,371-Speed 5515.21 samples/sec Loss 5.9025 LearningRate 0.0920 Epoch: 9 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:04:05,831-Speed 5491.35 samples/sec Loss 5.8366 LearningRate 0.0919 Epoch: 9 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:04:13,248-Speed 5523.42 samples/sec Loss 5.9091 LearningRate 0.0919 Epoch: 9 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:04:20,772-Speed 5444.53 samples/sec Loss 5.8786 LearningRate 0.0919 Epoch: 9 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 
25 hours Training: 2022-01-08 17:04:28,324-Speed 5424.41 samples/sec Loss 5.9147 LearningRate 0.0919 Epoch: 9 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:04:35,885-Speed 5417.81 samples/sec Loss 5.8753 LearningRate 0.0919 Epoch: 9 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:04:43,444-Speed 5419.69 samples/sec Loss 5.8582 LearningRate 0.0919 Epoch: 9 Global Step: 98360 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:04:51,036-Speed 5395.52 samples/sec Loss 5.9478 LearningRate 0.0918 Epoch: 9 Global Step: 98370 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:04:58,564-Speed 5441.75 samples/sec Loss 5.8726 LearningRate 0.0918 Epoch: 9 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:05,959-Speed 5539.54 samples/sec Loss 5.8431 LearningRate 0.0918 Epoch: 9 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:13,403-Speed 5503.39 samples/sec Loss 5.9086 LearningRate 0.0918 Epoch: 9 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:20,811-Speed 5529.91 samples/sec Loss 5.8483 LearningRate 0.0918 Epoch: 9 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:28,277-Speed 5487.25 samples/sec Loss 5.8686 LearningRate 0.0918 Epoch: 9 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:35,741-Speed 5488.05 samples/sec Loss 5.9245 LearningRate 0.0917 Epoch: 9 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:43,233-Speed 5468.16 samples/sec Loss 5.9240 LearningRate 0.0917 Epoch: 9 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:50,651-Speed 5522.09 samples/sec Loss 5.9030 LearningRate 0.0917 Epoch: 9 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:05:58,140-Speed 
5470.62 samples/sec Loss 5.8758 LearningRate 0.0917 Epoch: 9 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:06:05,564-Speed 5517.47 samples/sec Loss 5.9487 LearningRate 0.0917 Epoch: 9 Global Step: 98470 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:06:12,994-Speed 5514.05 samples/sec Loss 5.8596 LearningRate 0.0917 Epoch: 9 Global Step: 98480 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:06:20,446-Speed 5496.83 samples/sec Loss 5.8287 LearningRate 0.0916 Epoch: 9 Global Step: 98490 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:06:27,922-Speed 5480.19 samples/sec Loss 5.9332 LearningRate 0.0916 Epoch: 9 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:06:35,333-Speed 5526.95 samples/sec Loss 5.9061 LearningRate 0.0916 Epoch: 9 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:06:42,987-Speed 5352.60 samples/sec Loss 5.9028 LearningRate 0.0916 Epoch: 9 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:06:50,514-Speed 5442.47 samples/sec Loss 5.9311 LearningRate 0.0916 Epoch: 9 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:06:58,080-Speed 5414.22 samples/sec Loss 5.8870 LearningRate 0.0916 Epoch: 9 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:07:05,605-Speed 5444.47 samples/sec Loss 5.8540 LearningRate 0.0915 Epoch: 9 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:07:13,116-Speed 5454.20 samples/sec Loss 5.8915 LearningRate 0.0915 Epoch: 9 Global Step: 98560 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:07:20,574-Speed 5492.54 samples/sec Loss 5.9646 LearningRate 0.0915 Epoch: 9 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:07:28,008-Speed 5510.55 samples/sec Loss 5.8784 LearningRate 0.0915 
Epoch: 9 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:07:35,528-Speed 5447.04 samples/sec Loss 5.8561 LearningRate 0.0915 Epoch: 9 Global Step: 98590 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:07:43,000-Speed 5482.82 samples/sec Loss 5.8536 LearningRate 0.0915 Epoch: 9 Global Step: 98600 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:07:50,473-Speed 5483.21 samples/sec Loss 5.8409 LearningRate 0.0914 Epoch: 9 Global Step: 98610 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:07:57,942-Speed 5484.07 samples/sec Loss 5.8537 LearningRate 0.0914 Epoch: 9 Global Step: 98620 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:05,619-Speed 5336.82 samples/sec Loss 5.8964 LearningRate 0.0914 Epoch: 9 Global Step: 98630 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:13,195-Speed 5407.08 samples/sec Loss 5.9179 LearningRate 0.0914 Epoch: 9 Global Step: 98640 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:20,621-Speed 5518.51 samples/sec Loss 5.9086 LearningRate 0.0914 Epoch: 9 Global Step: 98650 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:28,135-Speed 5451.33 samples/sec Loss 5.9379 LearningRate 0.0914 Epoch: 9 Global Step: 98660 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:35,664-Speed 5441.33 samples/sec Loss 5.8909 LearningRate 0.0913 Epoch: 9 Global Step: 98670 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:43,200-Speed 5435.86 samples/sec Loss 5.8611 LearningRate 0.0913 Epoch: 9 Global Step: 98680 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:50,645-Speed 5503.07 samples/sec Loss 5.8419 LearningRate 0.0913 Epoch: 9 Global Step: 98690 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:08:58,149-Speed 5458.52 samples/sec Loss 5.8846 LearningRate 0.0913 Epoch: 9 Global Step: 98700 Fp16 Grad Scale: 
65536 Required: 25 hours Training: 2022-01-08 17:09:05,628-Speed 5477.52 samples/sec Loss 5.8835 LearningRate 0.0913 Epoch: 9 Global Step: 98710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:13,071-Speed 5504.40 samples/sec Loss 5.8832 LearningRate 0.0913 Epoch: 9 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:20,623-Speed 5424.73 samples/sec Loss 5.9123 LearningRate 0.0912 Epoch: 9 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:28,108-Speed 5472.33 samples/sec Loss 5.8598 LearningRate 0.0912 Epoch: 9 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:35,617-Speed 5456.03 samples/sec Loss 5.9073 LearningRate 0.0912 Epoch: 9 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:43,140-Speed 5444.98 samples/sec Loss 5.9002 LearningRate 0.0912 Epoch: 9 Global Step: 98760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:50,686-Speed 5429.19 samples/sec Loss 5.8691 LearningRate 0.0912 Epoch: 9 Global Step: 98770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:09:58,233-Speed 5427.90 samples/sec Loss 5.8480 LearningRate 0.0912 Epoch: 9 Global Step: 98780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:10:05,639-Speed 5531.66 samples/sec Loss 5.9227 LearningRate 0.0911 Epoch: 9 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:10:13,065-Speed 5515.84 samples/sec Loss 5.8615 LearningRate 0.0911 Epoch: 9 Global Step: 98800 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:10:20,542-Speed 5479.26 samples/sec Loss 5.9560 LearningRate 0.0911 Epoch: 9 Global Step: 98810 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:10:28,004-Speed 5490.12 samples/sec Loss 5.8614 LearningRate 0.0911 Epoch: 9 Global Step: 98820 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 
17:10:35,469-Speed 5487.74 samples/sec Loss 5.9212 LearningRate 0.0911 Epoch: 9 Global Step: 98830 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:10:42,923-Speed 5494.82 samples/sec Loss 5.8276 LearningRate 0.0911 Epoch: 9 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:10:50,457-Speed 5437.91 samples/sec Loss 5.8978 LearningRate 0.0910 Epoch: 9 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:10:57,956-Speed 5463.11 samples/sec Loss 5.8388 LearningRate 0.0910 Epoch: 9 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:05,486-Speed 5439.79 samples/sec Loss 5.9307 LearningRate 0.0910 Epoch: 9 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:13,011-Speed 5444.22 samples/sec Loss 5.8540 LearningRate 0.0910 Epoch: 9 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:20,543-Speed 5439.11 samples/sec Loss 5.9104 LearningRate 0.0910 Epoch: 9 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:28,145-Speed 5389.04 samples/sec Loss 5.8753 LearningRate 0.0910 Epoch: 9 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:35,740-Speed 5393.31 samples/sec Loss 5.8880 LearningRate 0.0909 Epoch: 9 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:43,258-Speed 5449.40 samples/sec Loss 5.8476 LearningRate 0.0909 Epoch: 9 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:50,863-Speed 5386.70 samples/sec Loss 5.8282 LearningRate 0.0909 Epoch: 9 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:11:58,311-Speed 5500.44 samples/sec Loss 5.8813 LearningRate 0.0909 Epoch: 9 Global Step: 98940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:12:05,913-Speed 5388.84 samples/sec Loss 5.7937 
LearningRate 0.0909 Epoch: 9 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:13,447-Speed 5437.41 samples/sec Loss 5.9319 LearningRate 0.0909 Epoch: 9 Global Step: 98960 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:21,215-Speed 5273.82 samples/sec Loss 5.8384 LearningRate 0.0908 Epoch: 9 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:28,742-Speed 5442.28 samples/sec Loss 5.8475 LearningRate 0.0908 Epoch: 9 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:36,279-Speed 5435.11 samples/sec Loss 5.8426 LearningRate 0.0908 Epoch: 9 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:43,852-Speed 5409.06 samples/sec Loss 5.9387 LearningRate 0.0908 Epoch: 9 Global Step: 99000 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:51,301-Speed 5500.07 samples/sec Loss 5.9047 LearningRate 0.0908 Epoch: 9 Global Step: 99010 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:12:58,775-Speed 5480.82 samples/sec Loss 5.8145 LearningRate 0.0908 Epoch: 9 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:06,247-Speed 5482.69 samples/sec Loss 5.8358 LearningRate 0.0907 Epoch: 9 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:13,740-Speed 5466.65 samples/sec Loss 5.8635 LearningRate 0.0907 Epoch: 9 Global Step: 99040 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:21,155-Speed 5524.62 samples/sec Loss 5.8620 LearningRate 0.0907 Epoch: 9 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:28,726-Speed 5411.45 samples/sec Loss 5.9109 LearningRate 0.0907 Epoch: 9 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:36,224-Speed 5463.20 samples/sec Loss 5.8942 LearningRate 0.0907 Epoch: 9 Global Step: 99070 Fp16 
Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:43,776-Speed 5424.12 samples/sec Loss 5.8525 LearningRate 0.0907 Epoch: 9 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:51,246-Speed 5483.95 samples/sec Loss 5.8661 LearningRate 0.0906 Epoch: 9 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:13:58,736-Speed 5469.73 samples/sec Loss 5.8369 LearningRate 0.0906 Epoch: 9 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:14:06,223-Speed 5471.87 samples/sec Loss 5.8496 LearningRate 0.0906 Epoch: 9 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:14:13,701-Speed 5477.68 samples/sec Loss 5.8480 LearningRate 0.0906 Epoch: 9 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:14:21,379-Speed 5334.89 samples/sec Loss 5.8254 LearningRate 0.0906 Epoch: 9 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:14:29,040-Speed 5347.46 samples/sec Loss 5.8469 LearningRate 0.0906 Epoch: 9 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:14:36,481-Speed 5505.80 samples/sec Loss 5.8725 LearningRate 0.0905 Epoch: 9 Global Step: 99150 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:14:44,001-Speed 5447.25 samples/sec Loss 5.8910 LearningRate 0.0905 Epoch: 9 Global Step: 99160 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:14:51,573-Speed 5409.93 samples/sec Loss 5.9084 LearningRate 0.0905 Epoch: 9 Global Step: 99170 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:14:59,017-Speed 5503.95 samples/sec Loss 5.8766 LearningRate 0.0905 Epoch: 9 Global Step: 99180 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:15:06,500-Speed 5474.38 samples/sec Loss 5.8903 LearningRate 0.0905 Epoch: 9 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 25 hours Training: 
2022-01-08 17:15:14,021-Speed 5446.69 samples/sec Loss 5.8558 LearningRate 0.0905 Epoch: 9 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:15:21,534-Speed 5452.67 samples/sec Loss 5.8709 LearningRate 0.0904 Epoch: 9 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:15:29,031-Speed 5464.09 samples/sec Loss 5.8201 LearningRate 0.0904 Epoch: 9 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:15:36,522-Speed 5468.39 samples/sec Loss 5.8237 LearningRate 0.0904 Epoch: 9 Global Step: 99230 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:15:44,043-Speed 5446.84 samples/sec Loss 5.8645 LearningRate 0.0904 Epoch: 9 Global Step: 99240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:15:51,593-Speed 5425.64 samples/sec Loss 5.8910 LearningRate 0.0904 Epoch: 9 Global Step: 99250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:15:59,192-Speed 5391.52 samples/sec Loss 5.8770 LearningRate 0.0904 Epoch: 9 Global Step: 99260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:16:06,778-Speed 5399.92 samples/sec Loss 5.8422 LearningRate 0.0903 Epoch: 9 Global Step: 99270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:16:14,177-Speed 5536.78 samples/sec Loss 5.8748 LearningRate 0.0903 Epoch: 9 Global Step: 99280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:16:21,737-Speed 5418.60 samples/sec Loss 5.8486 LearningRate 0.0903 Epoch: 9 Global Step: 99290 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:16:29,259-Speed 5446.36 samples/sec Loss 5.8377 LearningRate 0.0903 Epoch: 9 Global Step: 99300 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:16:36,831-Speed 5410.31 samples/sec Loss 5.8743 LearningRate 0.0903 Epoch: 9 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:16:44,359-Speed 5441.46 samples/sec 
Loss 5.8269 LearningRate 0.0903 Epoch: 9 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:16:51,870-Speed 5453.35 samples/sec Loss 5.8458 LearningRate 0.0902 Epoch: 9 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:16:59,303-Speed 5511.84 samples/sec Loss 5.8650 LearningRate 0.0902 Epoch: 9 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:17:06,784-Speed 5476.08 samples/sec Loss 5.7909 LearningRate 0.0902 Epoch: 9 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:17:14,333-Speed 5426.85 samples/sec Loss 5.9110 LearningRate 0.0902 Epoch: 9 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:17:21,827-Speed 5465.72 samples/sec Loss 5.9122 LearningRate 0.0902 Epoch: 9 Global Step: 99370 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:17:29,332-Speed 5458.56 samples/sec Loss 5.8192 LearningRate 0.0902 Epoch: 9 Global Step: 99380 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:17:36,802-Speed 5483.66 samples/sec Loss 5.8567 LearningRate 0.0901 Epoch: 9 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:17:44,323-Speed 5447.41 samples/sec Loss 5.8652 LearningRate 0.0901 Epoch: 9 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:17:51,890-Speed 5413.15 samples/sec Loss 5.8877 LearningRate 0.0901 Epoch: 9 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:17:59,422-Speed 5438.39 samples/sec Loss 5.9192 LearningRate 0.0901 Epoch: 9 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:18:06,876-Speed 5495.66 samples/sec Loss 5.8283 LearningRate 0.0901 Epoch: 9 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:18:14,443-Speed 5414.22 samples/sec Loss 5.8564 LearningRate 0.0901 Epoch: 9 Global Step: 
99440 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:18:21,975-Speed 5438.74 samples/sec Loss 5.8348 LearningRate 0.0900 Epoch: 9 Global Step: 99450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:18:29,465-Speed 5468.90 samples/sec Loss 5.8025 LearningRate 0.0900 Epoch: 9 Global Step: 99460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:18:36,940-Speed 5480.73 samples/sec Loss 5.9184 LearningRate 0.0900 Epoch: 9 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:18:44,463-Speed 5445.31 samples/sec Loss 5.8598 LearningRate 0.0900 Epoch: 9 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:18:51,922-Speed 5492.08 samples/sec Loss 5.8317 LearningRate 0.0900 Epoch: 9 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:18:59,346-Speed 5517.94 samples/sec Loss 5.8730 LearningRate 0.0900 Epoch: 9 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:19:06,840-Speed 5466.26 samples/sec Loss 5.8872 LearningRate 0.0899 Epoch: 9 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:19:14,354-Speed 5451.91 samples/sec Loss 5.7859 LearningRate 0.0899 Epoch: 9 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:19:21,830-Speed 5479.76 samples/sec Loss 5.8381 LearningRate 0.0899 Epoch: 9 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:19:29,369-Speed 5433.45 samples/sec Loss 5.8585 LearningRate 0.0899 Epoch: 9 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:19:36,853-Speed 5474.06 samples/sec Loss 5.8373 LearningRate 0.0899 Epoch: 9 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:19:44,443-Speed 5397.52 samples/sec Loss 5.8549 LearningRate 0.0899 Epoch: 9 Global Step: 99560 Fp16 Grad Scale: 65536 Required: 25 hours 
Training: 2022-01-08 17:19:51,952-Speed 5455.35 samples/sec Loss 5.8423 LearningRate 0.0898 Epoch: 9 Global Step: 99570 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:19:59,484-Speed 5438.82 samples/sec Loss 5.8953 LearningRate 0.0898 Epoch: 9 Global Step: 99580 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:20:06,903-Speed 5521.55 samples/sec Loss 5.8338 LearningRate 0.0898 Epoch: 9 Global Step: 99590 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:20:14,355-Speed 5497.69 samples/sec Loss 5.7789 LearningRate 0.0898 Epoch: 9 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:20:21,835-Speed 5476.13 samples/sec Loss 5.8433 LearningRate 0.0898 Epoch: 9 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:20:29,306-Speed 5483.69 samples/sec Loss 5.8639 LearningRate 0.0898 Epoch: 9 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:20:36,817-Speed 5454.45 samples/sec Loss 5.8071 LearningRate 0.0897 Epoch: 9 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:20:44,414-Speed 5392.29 samples/sec Loss 5.8113 LearningRate 0.0897 Epoch: 9 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:20:51,993-Speed 5404.67 samples/sec Loss 5.8550 LearningRate 0.0897 Epoch: 9 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:20:59,542-Speed 5426.51 samples/sec Loss 5.8145 LearningRate 0.0897 Epoch: 9 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:21:07,067-Speed 5444.36 samples/sec Loss 5.8285 LearningRate 0.0897 Epoch: 9 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:21:14,612-Speed 5429.11 samples/sec Loss 5.7736 LearningRate 0.0897 Epoch: 9 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:21:22,255-Speed 5359.97 
samples/sec Loss 5.8604 LearningRate 0.0896 Epoch: 9 Global Step: 99690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:21:29,803-Speed 5426.88 samples/sec Loss 5.8281 LearningRate 0.0896 Epoch: 9 Global Step: 99700 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:21:37,345-Speed 5432.04 samples/sec Loss 5.8646 LearningRate 0.0896 Epoch: 9 Global Step: 99710 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:21:44,824-Speed 5477.29 samples/sec Loss 5.8077 LearningRate 0.0896 Epoch: 9 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:21:52,377-Speed 5423.45 samples/sec Loss 5.8111 LearningRate 0.0896 Epoch: 9 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:21:59,893-Speed 5450.18 samples/sec Loss 5.8231 LearningRate 0.0896 Epoch: 9 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:07,383-Speed 5469.66 samples/sec Loss 5.8113 LearningRate 0.0895 Epoch: 9 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:14,980-Speed 5392.28 samples/sec Loss 5.8394 LearningRate 0.0895 Epoch: 9 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:22,561-Speed 5403.52 samples/sec Loss 5.8597 LearningRate 0.0895 Epoch: 9 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:30,045-Speed 5473.66 samples/sec Loss 5.7229 LearningRate 0.0895 Epoch: 9 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:37,564-Speed 5449.09 samples/sec Loss 5.7852 LearningRate 0.0895 Epoch: 9 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:45,031-Speed 5485.76 samples/sec Loss 5.7637 LearningRate 0.0895 Epoch: 9 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:22:52,505-Speed 5480.78 samples/sec Loss 5.8304 LearningRate 0.0894 Epoch: 9 
Global Step: 99810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:23:00,014-Speed 5455.76 samples/sec Loss 5.8608 LearningRate 0.0894 Epoch: 9 Global Step: 99820 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:23:07,525-Speed 5454.47 samples/sec Loss 5.8328 LearningRate 0.0894 Epoch: 9 Global Step: 99830 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:23:15,139-Speed 5379.93 samples/sec Loss 5.8173 LearningRate 0.0894 Epoch: 9 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:23:22,714-Speed 5408.64 samples/sec Loss 5.8491 LearningRate 0.0894 Epoch: 9 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:23:30,212-Speed 5463.39 samples/sec Loss 5.8541 LearningRate 0.0894 Epoch: 9 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:23:37,791-Speed 5404.68 samples/sec Loss 5.8365 LearningRate 0.0893 Epoch: 9 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:23:45,422-Speed 5368.66 samples/sec Loss 5.7973 LearningRate 0.0893 Epoch: 9 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:23:52,918-Speed 5465.11 samples/sec Loss 5.8301 LearningRate 0.0893 Epoch: 9 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:24:00,454-Speed 5436.01 samples/sec Loss 5.7926 LearningRate 0.0893 Epoch: 9 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:24:07,980-Speed 5442.76 samples/sec Loss 5.8243 LearningRate 0.0893 Epoch: 9 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:24:15,482-Speed 5461.05 samples/sec Loss 5.7784 LearningRate 0.0893 Epoch: 9 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:24:22,979-Speed 5464.15 samples/sec Loss 5.7456 LearningRate 0.0892 Epoch: 9 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 
25 hours Training: 2022-01-08 17:24:30,450-Speed 5482.89 samples/sec Loss 5.7991 LearningRate 0.0892 Epoch: 9 Global Step: 99940 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:24:37,960-Speed 5454.70 samples/sec Loss 5.8123 LearningRate 0.0892 Epoch: 9 Global Step: 99950 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:24:45,399-Speed 5507.38 samples/sec Loss 5.7896 LearningRate 0.0892 Epoch: 9 Global Step: 99960 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:24:52,961-Speed 5417.11 samples/sec Loss 5.8116 LearningRate 0.0892 Epoch: 9 Global Step: 99970 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:25:00,641-Speed 5333.73 samples/sec Loss 5.8019 LearningRate 0.0892 Epoch: 9 Global Step: 99980 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:25:08,199-Speed 5420.50 samples/sec Loss 5.8438 LearningRate 0.0891 Epoch: 9 Global Step: 99990 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:25:15,680-Speed 5475.54 samples/sec Loss 5.7990 LearningRate 0.0891 Epoch: 9 Global Step: 100000 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:25:59,681-[lfw][100000]XNorm: 22.346219 Training: 2022-01-08 17:25:59,682-[lfw][100000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-01-08 17:25:59,683-[lfw][100000]Accuracy-Highest: 0.99817 Training: 2022-01-08 17:26:51,031-[cfp_fp][100000]XNorm: 20.419401 Training: 2022-01-08 17:26:51,032-[cfp_fp][100000]Accuracy-Flip: 0.98586+-0.00505 Training: 2022-01-08 17:26:51,032-[cfp_fp][100000]Accuracy-Highest: 0.98914 Training: 2022-01-08 17:27:36,859-[agedb_30][100000]XNorm: 22.253314 Training: 2022-01-08 17:27:36,861-[agedb_30][100000]Accuracy-Flip: 0.97633+-0.00795 Training: 2022-01-08 17:27:36,861-[agedb_30][100000]Accuracy-Highest: 0.97833 Training: 2022-01-08 17:27:44,452-Speed 275.32 samples/sec Loss 5.7759 LearningRate 0.0891 Epoch: 9 Global Step: 100010 Fp16 Grad Scale: 65536 Required: 25 hours Training: 
2022-01-08 17:27:52,120-Speed 5343.17 samples/sec Loss 5.7999 LearningRate 0.0891 Epoch: 9 Global Step: 100020 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:27:59,760-Speed 5362.26 samples/sec Loss 5.8647 LearningRate 0.0891 Epoch: 9 Global Step: 100030 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:07,436-Speed 5336.83 samples/sec Loss 5.8029 LearningRate 0.0891 Epoch: 9 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:14,994-Speed 5419.97 samples/sec Loss 5.7877 LearningRate 0.0890 Epoch: 9 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:22,574-Speed 5404.99 samples/sec Loss 5.7940 LearningRate 0.0890 Epoch: 9 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:30,044-Speed 5483.95 samples/sec Loss 5.8160 LearningRate 0.0890 Epoch: 9 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:37,587-Speed 5430.13 samples/sec Loss 5.7769 LearningRate 0.0890 Epoch: 9 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:45,214-Speed 5371.34 samples/sec Loss 5.8178 LearningRate 0.0890 Epoch: 9 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:28:52,755-Speed 5432.72 samples/sec Loss 5.8490 LearningRate 0.0890 Epoch: 9 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:00,226-Speed 5483.47 samples/sec Loss 5.7437 LearningRate 0.0889 Epoch: 9 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:07,767-Speed 5431.71 samples/sec Loss 5.8311 LearningRate 0.0889 Epoch: 9 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:15,309-Speed 5431.85 samples/sec Loss 5.8514 LearningRate 0.0889 Epoch: 9 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:22,827-Speed 5449.31 
samples/sec Loss 5.8480 LearningRate 0.0889 Epoch: 9 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:30,223-Speed 5538.46 samples/sec Loss 5.7887 LearningRate 0.0889 Epoch: 9 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:37,676-Speed 5496.85 samples/sec Loss 5.7897 LearningRate 0.0889 Epoch: 9 Global Step: 100160 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:29:45,144-Speed 5485.10 samples/sec Loss 5.8212 LearningRate 0.0888 Epoch: 9 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:29:52,717-Speed 5409.69 samples/sec Loss 5.7848 LearningRate 0.0888 Epoch: 9 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:00,233-Speed 5450.07 samples/sec Loss 5.8009 LearningRate 0.0888 Epoch: 9 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:07,761-Speed 5441.79 samples/sec Loss 5.8244 LearningRate 0.0888 Epoch: 9 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:15,284-Speed 5445.44 samples/sec Loss 5.7940 LearningRate 0.0888 Epoch: 9 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:22,818-Speed 5437.72 samples/sec Loss 5.7994 LearningRate 0.0888 Epoch: 9 Global Step: 100220 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:30,379-Speed 5417.75 samples/sec Loss 5.7730 LearningRate 0.0887 Epoch: 9 Global Step: 100230 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:37,931-Speed 5423.91 samples/sec Loss 5.7596 LearningRate 0.0887 Epoch: 9 Global Step: 100240 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:45,402-Speed 5483.53 samples/sec Loss 5.8241 LearningRate 0.0887 Epoch: 9 Global Step: 100250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:30:52,938-Speed 5436.14 samples/sec Loss 5.8622 LearningRate 
0.0887 Epoch: 9 Global Step: 100260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:31:00,520-Speed 5403.31 samples/sec Loss 5.7915 LearningRate 0.0887 Epoch: 9 Global Step: 100270 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:31:08,249-Speed 5299.62 samples/sec Loss 5.8023 LearningRate 0.0887 Epoch: 9 Global Step: 100280 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:31:15,675-Speed 5516.61 samples/sec Loss 5.7837 LearningRate 0.0886 Epoch: 9 Global Step: 100290 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:31:23,152-Speed 5478.98 samples/sec Loss 5.7377 LearningRate 0.0886 Epoch: 9 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:31:30,720-Speed 5413.41 samples/sec Loss 5.7351 LearningRate 0.0886 Epoch: 9 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:31:38,267-Speed 5427.56 samples/sec Loss 5.7783 LearningRate 0.0886 Epoch: 9 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:31:45,741-Speed 5481.65 samples/sec Loss 5.8210 LearningRate 0.0886 Epoch: 9 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:31:53,309-Speed 5412.56 samples/sec Loss 5.8419 LearningRate 0.0886 Epoch: 9 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:32:00,859-Speed 5426.23 samples/sec Loss 5.8307 LearningRate 0.0885 Epoch: 9 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:32:08,491-Speed 5367.55 samples/sec Loss 5.7687 LearningRate 0.0885 Epoch: 9 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:32:16,033-Speed 5432.24 samples/sec Loss 5.7681 LearningRate 0.0885 Epoch: 9 Global Step: 100370 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:32:23,502-Speed 5484.25 samples/sec Loss 5.7492 LearningRate 0.0885 Epoch: 9 Global Step: 100380 Fp16 
Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:32:30,966-Speed 5488.90 samples/sec Loss 5.7284 LearningRate 0.0885 Epoch: 9 Global Step: 100390 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:32:38,527-Speed 5417.65 samples/sec Loss 5.8007 LearningRate 0.0885 Epoch: 9 Global Step: 100400 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:32:46,024-Speed 5464.42 samples/sec Loss 5.8219 LearningRate 0.0884 Epoch: 9 Global Step: 100410 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:32:53,513-Speed 5470.17 samples/sec Loss 5.7997 LearningRate 0.0884 Epoch: 9 Global Step: 100420 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:33:01,030-Speed 5450.14 samples/sec Loss 5.7246 LearningRate 0.0884 Epoch: 9 Global Step: 100430 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-01-08 17:33:08,656-Speed 5371.58 samples/sec Loss 5.8210 LearningRate 0.0884 Epoch: 9 Global Step: 100440 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-01-08 17:33:16,100-Speed 5503.55 samples/sec Loss 6.1275 LearningRate 0.0884 Epoch: 9 Global Step: 100450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:33:23,629-Speed 5441.01 samples/sec Loss 5.9902 LearningRate 0.0884 Epoch: 9 Global Step: 100460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:33:31,105-Speed 5479.73 samples/sec Loss 5.9300 LearningRate 0.0883 Epoch: 9 Global Step: 100470 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:33:38,668-Speed 5416.17 samples/sec Loss 5.8268 LearningRate 0.0883 Epoch: 9 Global Step: 100480 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:33:46,300-Speed 5367.70 samples/sec Loss 5.8757 LearningRate 0.0883 Epoch: 9 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:33:54,129-Speed 5232.87 samples/sec Loss 5.8189 LearningRate 0.0883 Epoch: 9 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 25 hours 
Training: 2022-01-08 17:34:01,632-Speed 5460.04 samples/sec Loss 5.8259 LearningRate 0.0883 Epoch: 9 Global Step: 100510 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:34:09,119-Speed 5471.28 samples/sec Loss 5.8353 LearningRate 0.0883 Epoch: 9 Global Step: 100520 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:34:16,573-Speed 5495.83 samples/sec Loss 5.8057 LearningRate 0.0882 Epoch: 9 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-08 17:34:24,068-Speed 5466.10 samples/sec Loss 5.8870 LearningRate 0.0882 Epoch: 9 Global Step: 100540 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:34:31,518-Speed 5498.54 samples/sec Loss 5.7770 LearningRate 0.0882 Epoch: 9 Global Step: 100550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:34:38,977-Speed 5492.01 samples/sec Loss 5.8021 LearningRate 0.0882 Epoch: 9 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:34:46,470-Speed 5467.40 samples/sec Loss 5.7617 LearningRate 0.0882 Epoch: 9 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:34:53,972-Speed 5460.53 samples/sec Loss 5.8070 LearningRate 0.0882 Epoch: 9 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:01,472-Speed 5461.95 samples/sec Loss 5.7907 LearningRate 0.0881 Epoch: 9 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:08,976-Speed 5485.67 samples/sec Loss 5.7969 LearningRate 0.0881 Epoch: 9 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:16,521-Speed 5429.38 samples/sec Loss 5.7955 LearningRate 0.0881 Epoch: 9 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:24,082-Speed 5418.40 samples/sec Loss 5.8217 LearningRate 0.0881 Epoch: 9 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:31,644-Speed 
5417.39 samples/sec Loss 5.7421 LearningRate 0.0881 Epoch: 9 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:39,147-Speed 5459.91 samples/sec Loss 5.8311 LearningRate 0.0881 Epoch: 9 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:35:46,603-Speed 5493.59 samples/sec Loss 5.8138 LearningRate 0.0880 Epoch: 9 Global Step: 100650 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:35:54,107-Speed 5460.00 samples/sec Loss 5.7297 LearningRate 0.0880 Epoch: 9 Global Step: 100660 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:01,693-Speed 5399.71 samples/sec Loss 5.7530 LearningRate 0.0880 Epoch: 9 Global Step: 100670 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:09,350-Speed 5350.66 samples/sec Loss 5.8207 LearningRate 0.0880 Epoch: 9 Global Step: 100680 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:16,873-Speed 5444.92 samples/sec Loss 5.8098 LearningRate 0.0880 Epoch: 9 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:24,519-Speed 5357.59 samples/sec Loss 5.7773 LearningRate 0.0880 Epoch: 9 Global Step: 100700 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:31,968-Speed 5499.85 samples/sec Loss 5.7904 LearningRate 0.0879 Epoch: 9 Global Step: 100710 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:39,471-Speed 5460.27 samples/sec Loss 5.8059 LearningRate 0.0879 Epoch: 9 Global Step: 100720 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:46,988-Speed 5449.06 samples/sec Loss 5.8058 LearningRate 0.0879 Epoch: 9 Global Step: 100730 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:36:54,563-Speed 5408.18 samples/sec Loss 5.7529 LearningRate 0.0879 Epoch: 9 Global Step: 100740 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:37:02,026-Speed 5489.49 samples/sec Loss 5.7403 
LearningRate 0.0879 Epoch: 9 Global Step: 100750 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 17:37:09,496-Speed 5484.23 samples/sec Loss 5.7222 LearningRate 0.0879 Epoch: 9 Global Step: 100760 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:37:17,038-Speed 5430.96 samples/sec Loss 5.7780 LearningRate 0.0878 Epoch: 9 Global Step: 100770 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:37:24,658-Speed 5376.15 samples/sec Loss 5.8112 LearningRate 0.0878 Epoch: 9 Global Step: 100780 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:37:32,187-Speed 5441.43 samples/sec Loss 5.7591 LearningRate 0.0878 Epoch: 9 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:37:39,786-Speed 5390.36 samples/sec Loss 5.8237 LearningRate 0.0878 Epoch: 9 Global Step: 100800 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:37:47,264-Speed 5478.62 samples/sec Loss 5.6653 LearningRate 0.0878 Epoch: 9 Global Step: 100810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:37:54,760-Speed 5464.99 samples/sec Loss 5.7366 LearningRate 0.0878 Epoch: 9 Global Step: 100820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:02,315-Speed 5421.71 samples/sec Loss 5.7638 LearningRate 0.0878 Epoch: 9 Global Step: 100830 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:09,759-Speed 5503.36 samples/sec Loss 5.8252 LearningRate 0.0877 Epoch: 9 Global Step: 100840 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:17,212-Speed 5496.55 samples/sec Loss 5.7518 LearningRate 0.0877 Epoch: 9 Global Step: 100850 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:24,722-Speed 5454.78 samples/sec Loss 5.7309 LearningRate 0.0877 Epoch: 9 Global Step: 100860 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:32,157-Speed 5509.12 samples/sec Loss 5.7636 LearningRate 0.0877 Epoch: 9 Global 
Step: 100870 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:39,647-Speed 5470.08 samples/sec Loss 5.7297 LearningRate 0.0877 Epoch: 9 Global Step: 100880 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:47,200-Speed 5423.52 samples/sec Loss 5.7602 LearningRate 0.0877 Epoch: 9 Global Step: 100890 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:38:54,729-Speed 5440.57 samples/sec Loss 5.6918 LearningRate 0.0876 Epoch: 9 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:02,231-Speed 5461.10 samples/sec Loss 5.7572 LearningRate 0.0876 Epoch: 9 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:09,774-Speed 5430.69 samples/sec Loss 5.7683 LearningRate 0.0876 Epoch: 9 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:17,322-Speed 5427.87 samples/sec Loss 5.6999 LearningRate 0.0876 Epoch: 9 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:24,891-Speed 5411.97 samples/sec Loss 5.7674 LearningRate 0.0876 Epoch: 9 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:32,444-Speed 5423.33 samples/sec Loss 5.7512 LearningRate 0.0876 Epoch: 9 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:40,152-Speed 5315.05 samples/sec Loss 5.7340 LearningRate 0.0875 Epoch: 9 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:47,632-Speed 5476.38 samples/sec Loss 5.7369 LearningRate 0.0875 Epoch: 9 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:39:55,080-Speed 5500.51 samples/sec Loss 5.7360 LearningRate 0.0875 Epoch: 9 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:40:02,548-Speed 5485.01 samples/sec Loss 5.7116 LearningRate 0.0875 Epoch: 9 Global Step: 100990 Fp16 Grad Scale: 65536 
Required: 24 hours Training: 2022-01-08 17:40:10,059-Speed 5454.22 samples/sec Loss 5.7786 LearningRate 0.0875 Epoch: 9 Global Step: 101000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:40:17,560-Speed 5461.54 samples/sec Loss 5.7825 LearningRate 0.0875 Epoch: 9 Global Step: 101010 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:40:24,986-Speed 5516.39 samples/sec Loss 5.7358 LearningRate 0.0874 Epoch: 9 Global Step: 101020 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:40:32,604-Speed 5377.10 samples/sec Loss 5.7500 LearningRate 0.0874 Epoch: 9 Global Step: 101030 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:40:40,162-Speed 5420.57 samples/sec Loss 5.7810 LearningRate 0.0874 Epoch: 9 Global Step: 101040 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:40:47,727-Speed 5415.14 samples/sec Loss 5.7946 LearningRate 0.0874 Epoch: 9 Global Step: 101050 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:40:55,251-Speed 5444.28 samples/sec Loss 5.7353 LearningRate 0.0874 Epoch: 9 Global Step: 101060 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:41:02,670-Speed 5521.65 samples/sec Loss 5.7430 LearningRate 0.0874 Epoch: 9 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:41:10,167-Speed 5464.03 samples/sec Loss 5.7477 LearningRate 0.0873 Epoch: 9 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:41:17,666-Speed 5462.55 samples/sec Loss 5.7138 LearningRate 0.0873 Epoch: 9 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:41:25,209-Speed 5431.42 samples/sec Loss 5.7114 LearningRate 0.0873 Epoch: 9 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:41:32,716-Speed 5456.60 samples/sec Loss 5.6809 LearningRate 0.0873 Epoch: 9 Global Step: 101110 Fp16 Grad Scale: 65536 Required: 24 hours Training: 
2022-01-08 17:41:40,216-Speed 5462.00 samples/sec Loss 5.7440 LearningRate 0.0873 Epoch: 9 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:41:47,788-Speed 5409.94 samples/sec Loss 5.6606 LearningRate 0.0873 Epoch: 9 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:41:55,303-Speed 5451.23 samples/sec Loss 5.7451 LearningRate 0.0872 Epoch: 9 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:42:02,788-Speed 5473.26 samples/sec Loss 5.7265 LearningRate 0.0872 Epoch: 9 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:42:10,287-Speed 5462.81 samples/sec Loss 5.7281 LearningRate 0.0872 Epoch: 9 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:42:17,797-Speed 5454.96 samples/sec Loss 5.7071 LearningRate 0.0872 Epoch: 9 Global Step: 101170 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:42:25,310-Speed 5452.55 samples/sec Loss 5.7605 LearningRate 0.0872 Epoch: 9 Global Step: 101180 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:42:32,764-Speed 5495.16 samples/sec Loss 5.8253 LearningRate 0.0872 Epoch: 9 Global Step: 101190 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:42:40,282-Speed 5449.77 samples/sec Loss 5.7700 LearningRate 0.0871 Epoch: 9 Global Step: 101200 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:42:47,773-Speed 5468.39 samples/sec Loss 5.6165 LearningRate 0.0871 Epoch: 9 Global Step: 101210 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:42:55,194-Speed 5519.68 samples/sec Loss 5.7988 LearningRate 0.0871 Epoch: 9 Global Step: 101220 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:43:02,613-Speed 5522.02 samples/sec Loss 5.6884 LearningRate 0.0871 Epoch: 9 Global Step: 101230 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:43:10,114-Speed 
5461.47 samples/sec Loss 5.7416 LearningRate 0.0871 Epoch: 9 Global Step: 101240 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:43:17,714-Speed 5389.76 samples/sec Loss 5.7376 LearningRate 0.0871 Epoch: 9 Global Step: 101250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:43:25,210-Speed 5464.96 samples/sec Loss 5.7482 LearningRate 0.0870 Epoch: 9 Global Step: 101260 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:43:32,807-Speed 5392.83 samples/sec Loss 5.7026 LearningRate 0.0870 Epoch: 9 Global Step: 101270 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 17:43:40,289-Speed 5474.99 samples/sec Loss 5.8012 LearningRate 0.0870 Epoch: 9 Global Step: 101280 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:43:47,715-Speed 5516.34 samples/sec Loss 5.7853 LearningRate 0.0870 Epoch: 9 Global Step: 101290 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:43:55,197-Speed 5475.19 samples/sec Loss 5.7274 LearningRate 0.0870 Epoch: 9 Global Step: 101300 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:02,776-Speed 5405.41 samples/sec Loss 5.7286 LearningRate 0.0870 Epoch: 9 Global Step: 101310 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:10,432-Speed 5350.70 samples/sec Loss 5.6982 LearningRate 0.0869 Epoch: 9 Global Step: 101320 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:17,981-Speed 5426.72 samples/sec Loss 5.7659 LearningRate 0.0869 Epoch: 9 Global Step: 101330 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:25,619-Speed 5363.11 samples/sec Loss 5.7990 LearningRate 0.0869 Epoch: 9 Global Step: 101340 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:33,217-Speed 5391.94 samples/sec Loss 5.8025 LearningRate 0.0869 Epoch: 9 Global Step: 101350 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:41,097-Speed 5198.68 samples/sec Loss 5.7914 
LearningRate 0.0869 Epoch: 9 Global Step: 101360 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:48,594-Speed 5463.87 samples/sec Loss 5.7716 LearningRate 0.0869 Epoch: 9 Global Step: 101370 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:44:56,098-Speed 5459.51 samples/sec Loss 5.7501 LearningRate 0.0868 Epoch: 9 Global Step: 101380 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:03,677-Speed 5405.28 samples/sec Loss 5.7961 LearningRate 0.0868 Epoch: 9 Global Step: 101390 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:11,121-Speed 5503.11 samples/sec Loss 5.7766 LearningRate 0.0868 Epoch: 9 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:18,687-Speed 5414.53 samples/sec Loss 5.7821 LearningRate 0.0868 Epoch: 9 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:26,257-Speed 5411.22 samples/sec Loss 5.7537 LearningRate 0.0868 Epoch: 9 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:33,777-Speed 5447.67 samples/sec Loss 5.7129 LearningRate 0.0868 Epoch: 9 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:41,440-Speed 5345.90 samples/sec Loss 5.6986 LearningRate 0.0867 Epoch: 9 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:49,038-Speed 5391.95 samples/sec Loss 5.6346 LearningRate 0.0867 Epoch: 9 Global Step: 101450 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:45:56,576-Speed 5434.65 samples/sec Loss 5.7292 LearningRate 0.0867 Epoch: 9 Global Step: 101460 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:46:04,099-Speed 5444.95 samples/sec Loss 5.6616 LearningRate 0.0867 Epoch: 9 Global Step: 101470 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:46:11,654-Speed 5422.70 samples/sec Loss 5.8135 LearningRate 0.0867 Epoch: 9 Global Step: 
101480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:46:19,171-Speed 5449.36 samples/sec Loss 5.7115 LearningRate 0.0867 Epoch: 9 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:46:26,715-Speed 5430.32 samples/sec Loss 5.7351 LearningRate 0.0866 Epoch: 9 Global Step: 101500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:46:34,189-Speed 5481.31 samples/sec Loss 5.7021 LearningRate 0.0866 Epoch: 9 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:46:41,649-Speed 5491.74 samples/sec Loss 5.6848 LearningRate 0.0866 Epoch: 9 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:46:49,202-Speed 5423.71 samples/sec Loss 5.7480 LearningRate 0.0866 Epoch: 9 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:46:56,671-Speed 5484.33 samples/sec Loss 5.7501 LearningRate 0.0866 Epoch: 9 Global Step: 101540 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:47:04,118-Speed 5500.81 samples/sec Loss 5.7275 LearningRate 0.0866 Epoch: 9 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:47:11,609-Speed 5468.93 samples/sec Loss 5.7392 LearningRate 0.0866 Epoch: 9 Global Step: 101560 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:47:19,189-Speed 5404.42 samples/sec Loss 5.6881 LearningRate 0.0865 Epoch: 9 Global Step: 101570 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:47:26,780-Speed 5396.57 samples/sec Loss 5.6860 LearningRate 0.0865 Epoch: 9 Global Step: 101580 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:47:34,396-Speed 5378.36 samples/sec Loss 5.7010 LearningRate 0.0865 Epoch: 9 Global Step: 101590 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:47:41,869-Speed 5481.80 samples/sec Loss 5.6904 LearningRate 0.0865 Epoch: 9 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 
24 hours Training: 2022-01-08 17:47:49,457-Speed 5398.88 samples/sec Loss 5.7035 LearningRate 0.0865 Epoch: 9 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:47:56,912-Speed 5494.87 samples/sec Loss 5.6881 LearningRate 0.0865 Epoch: 9 Global Step: 101620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:04,436-Speed 5444.56 samples/sec Loss 5.7761 LearningRate 0.0864 Epoch: 9 Global Step: 101630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:11,876-Speed 5506.13 samples/sec Loss 5.6763 LearningRate 0.0864 Epoch: 9 Global Step: 101640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:19,355-Speed 5477.88 samples/sec Loss 5.7009 LearningRate 0.0864 Epoch: 9 Global Step: 101650 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:26,859-Speed 5458.75 samples/sec Loss 5.6750 LearningRate 0.0864 Epoch: 9 Global Step: 101660 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:34,345-Speed 5471.91 samples/sec Loss 5.7142 LearningRate 0.0864 Epoch: 9 Global Step: 101670 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:41,903-Speed 5420.31 samples/sec Loss 5.7290 LearningRate 0.0864 Epoch: 9 Global Step: 101680 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:49,427-Speed 5444.65 samples/sec Loss 5.7698 LearningRate 0.0863 Epoch: 9 Global Step: 101690 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:48:56,912-Speed 5473.27 samples/sec Loss 5.7148 LearningRate 0.0863 Epoch: 9 Global Step: 101700 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:49:04,475-Speed 5416.24 samples/sec Loss 5.7508 LearningRate 0.0863 Epoch: 9 Global Step: 101710 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:49:11,984-Speed 5455.57 samples/sec Loss 5.6730 LearningRate 0.0863 Epoch: 9 Global Step: 101720 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 
17:49:19,536-Speed 5424.17 samples/sec Loss 5.7083 LearningRate 0.0863 Epoch: 9 Global Step: 101730 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:49:26,953-Speed 5522.90 samples/sec Loss 5.7781 LearningRate 0.0863 Epoch: 9 Global Step: 101740 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:49:34,472-Speed 5448.14 samples/sec Loss 5.6791 LearningRate 0.0862 Epoch: 9 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:49:41,939-Speed 5486.33 samples/sec Loss 5.7285 LearningRate 0.0862 Epoch: 9 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:49:49,459-Speed 5447.93 samples/sec Loss 5.6689 LearningRate 0.0862 Epoch: 9 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:49:56,926-Speed 5486.33 samples/sec Loss 5.7153 LearningRate 0.0862 Epoch: 9 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:50:04,434-Speed 5456.08 samples/sec Loss 5.7251 LearningRate 0.0862 Epoch: 9 Global Step: 101790 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:50:12,016-Speed 5402.89 samples/sec Loss 5.6699 LearningRate 0.0862 Epoch: 9 Global Step: 101800 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:50:19,525-Speed 5455.94 samples/sec Loss 5.7279 LearningRate 0.0861 Epoch: 9 Global Step: 101810 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:50:27,097-Speed 5409.75 samples/sec Loss 5.6996 LearningRate 0.0861 Epoch: 9 Global Step: 101820 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:50:34,662-Speed 5414.63 samples/sec Loss 5.7623 LearningRate 0.0861 Epoch: 9 Global Step: 101830 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:50:42,240-Speed 5406.48 samples/sec Loss 5.7483 LearningRate 0.0861 Epoch: 9 Global Step: 101840 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:50:49,776-Speed 5435.92 samples/sec 
Loss 5.6840 LearningRate 0.0861 Epoch: 9 Global Step: 101850 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:50:57,319-Speed 5430.87 samples/sec Loss 5.6871 LearningRate 0.0861 Epoch: 9 Global Step: 101860 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:51:04,918-Speed 5390.69 samples/sec Loss 5.7051 LearningRate 0.0860 Epoch: 9 Global Step: 101870 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 17:51:12,398-Speed 5476.18 samples/sec Loss 5.6924 LearningRate 0.0860 Epoch: 9 Global Step: 101880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:51:19,925-Speed 5443.31 samples/sec Loss 5.7198 LearningRate 0.0860 Epoch: 9 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:51:27,435-Speed 5454.15 samples/sec Loss 5.6691 LearningRate 0.0860 Epoch: 9 Global Step: 101900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:51:34,932-Speed 5464.33 samples/sec Loss 5.6991 LearningRate 0.0860 Epoch: 9 Global Step: 101910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:51:42,447-Speed 5450.60 samples/sec Loss 5.6906 LearningRate 0.0860 Epoch: 9 Global Step: 101920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:51:49,937-Speed 5469.79 samples/sec Loss 5.6929 LearningRate 0.0859 Epoch: 9 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:51:57,461-Speed 5445.24 samples/sec Loss 5.6833 LearningRate 0.0859 Epoch: 9 Global Step: 101940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:52:04,949-Speed 5469.97 samples/sec Loss 5.7172 LearningRate 0.0859 Epoch: 9 Global Step: 101950 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:52:12,482-Speed 5437.92 samples/sec Loss 5.6871 LearningRate 0.0859 Epoch: 9 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:52:19,944-Speed 5490.27 samples/sec Loss 5.7098 LearningRate 0.0859 Epoch: 9 
Global Step: 101970 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:52:27,426-Speed 5475.50 samples/sec Loss 5.7178 LearningRate 0.0859 Epoch: 9 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:52:35,111-Speed 5330.41 samples/sec Loss 5.7218 LearningRate 0.0858 Epoch: 9 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:52:42,680-Speed 5412.20 samples/sec Loss 5.7095 LearningRate 0.0858 Epoch: 9 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:53:26,276-[lfw][102000]XNorm: 24.103609
Training: 2022-01-08 17:53:26,277-[lfw][102000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-08 17:53:26,277-[lfw][102000]Accuracy-Highest: 0.99817
Training: 2022-01-08 17:54:18,050-[cfp_fp][102000]XNorm: 22.032256
Training: 2022-01-08 17:54:18,051-[cfp_fp][102000]Accuracy-Flip: 0.98843+-0.00555
Training: 2022-01-08 17:54:18,052-[cfp_fp][102000]Accuracy-Highest: 0.98914
Training: 2022-01-08 17:55:03,544-[agedb_30][102000]XNorm: 24.000977
Training: 2022-01-08 17:55:03,545-[agedb_30][102000]Accuracy-Flip: 0.97917+-0.00696
Training: 2022-01-08 17:55:03,545-[agedb_30][102000]Accuracy-Highest: 0.97917
Training: 2022-01-08 17:55:11,111-Speed 275.96 samples/sec Loss 5.7522 LearningRate 0.0858 Epoch: 9 Global Step: 102010 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:55:18,531-Speed 5521.65 samples/sec Loss 5.6786 LearningRate 0.0858 Epoch: 9 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:55:26,028-Speed 5465.01 samples/sec Loss 5.6444 LearningRate 0.0858 Epoch: 9 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:55:33,714-Speed 5330.40 samples/sec Loss 5.6498 LearningRate 0.0858 Epoch: 9 Global Step: 102040 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-08 17:55:41,346-Speed 5368.27 samples/sec Loss 5.6467 LearningRate 0.0858 Epoch: 9 Global Step: 102050 Fp16
Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:55:49,015-Speed 5342.53 samples/sec Loss 5.6510 LearningRate 0.0857 Epoch: 9 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:55:56,486-Speed 5483.61 samples/sec Loss 5.6760 LearningRate 0.0857 Epoch: 9 Global Step: 102070 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:03,956-Speed 5484.20 samples/sec Loss 5.6673 LearningRate 0.0857 Epoch: 9 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:11,503-Speed 5429.17 samples/sec Loss 5.6490 LearningRate 0.0857 Epoch: 9 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:19,056-Speed 5423.75 samples/sec Loss 5.6935 LearningRate 0.0857 Epoch: 9 Global Step: 102100 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:26,714-Speed 5350.18 samples/sec Loss 5.7478 LearningRate 0.0857 Epoch: 9 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:34,284-Speed 5411.94 samples/sec Loss 5.6988 LearningRate 0.0856 Epoch: 9 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:41,860-Speed 5407.84 samples/sec Loss 5.7294 LearningRate 0.0856 Epoch: 9 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:49,475-Speed 5379.75 samples/sec Loss 5.7020 LearningRate 0.0856 Epoch: 9 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:56:56,941-Speed 5487.15 samples/sec Loss 5.7080 LearningRate 0.0856 Epoch: 9 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:57:04,430-Speed 5470.23 samples/sec Loss 5.6420 LearningRate 0.0856 Epoch: 9 Global Step: 102160 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:57:11,868-Speed 5507.55 samples/sec Loss 5.7484 LearningRate 0.0856 Epoch: 9 Global Step: 102170 Fp16 Grad Scale: 32768 Required: 24 hours 
Training: 2022-01-08 17:57:19,356-Speed 5470.57 samples/sec Loss 5.7124 LearningRate 0.0855 Epoch: 9 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:57:26,873-Speed 5449.35 samples/sec Loss 5.7272 LearningRate 0.0855 Epoch: 9 Global Step: 102190 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:57:34,329-Speed 5494.16 samples/sec Loss 5.7271 LearningRate 0.0855 Epoch: 9 Global Step: 102200 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:57:41,784-Speed 5495.13 samples/sec Loss 5.6731 LearningRate 0.0855 Epoch: 9 Global Step: 102210 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:57:49,320-Speed 5436.64 samples/sec Loss 5.6646 LearningRate 0.0855 Epoch: 9 Global Step: 102220 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:57:56,900-Speed 5403.83 samples/sec Loss 5.7085 LearningRate 0.0855 Epoch: 9 Global Step: 102230 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:58:04,397-Speed 5464.55 samples/sec Loss 5.6535 LearningRate 0.0854 Epoch: 9 Global Step: 102240 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:58:11,876-Speed 5477.20 samples/sec Loss 5.6595 LearningRate 0.0854 Epoch: 9 Global Step: 102250 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 17:58:19,383-Speed 5457.11 samples/sec Loss 5.6643 LearningRate 0.0854 Epoch: 9 Global Step: 102260 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:58:26,834-Speed 5498.05 samples/sec Loss 5.6957 LearningRate 0.0854 Epoch: 9 Global Step: 102270 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:58:34,263-Speed 5513.85 samples/sec Loss 5.7024 LearningRate 0.0854 Epoch: 9 Global Step: 102280 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:58:41,755-Speed 5468.28 samples/sec Loss 5.6987 LearningRate 0.0854 Epoch: 9 Global Step: 102290 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:58:49,187-Speed 
5511.69 samples/sec Loss 5.6870 LearningRate 0.0853 Epoch: 9 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:58:56,638-Speed 5498.15 samples/sec Loss 5.6716 LearningRate 0.0853 Epoch: 9 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:04,135-Speed 5464.07 samples/sec Loss 5.7614 LearningRate 0.0853 Epoch: 9 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:11,572-Speed 5508.65 samples/sec Loss 5.6735 LearningRate 0.0853 Epoch: 9 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:19,098-Speed 5442.75 samples/sec Loss 5.6487 LearningRate 0.0853 Epoch: 9 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:26,651-Speed 5423.93 samples/sec Loss 5.6544 LearningRate 0.0853 Epoch: 9 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:34,078-Speed 5516.18 samples/sec Loss 5.7238 LearningRate 0.0852 Epoch: 9 Global Step: 102360 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:41,616-Speed 5434.40 samples/sec Loss 5.7454 LearningRate 0.0852 Epoch: 9 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:49,042-Speed 5516.98 samples/sec Loss 5.6866 LearningRate 0.0852 Epoch: 9 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 17:59:56,533-Speed 5468.17 samples/sec Loss 5.6761 LearningRate 0.0852 Epoch: 9 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:04,037-Speed 5459.03 samples/sec Loss 5.6651 LearningRate 0.0852 Epoch: 9 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:11,575-Speed 5434.92 samples/sec Loss 5.6831 LearningRate 0.0852 Epoch: 9 Global Step: 102410 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:19,098-Speed 5445.87 samples/sec Loss 5.6723 
LearningRate 0.0852 Epoch: 9 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:26,782-Speed 5331.20 samples/sec Loss 5.6937 LearningRate 0.0851 Epoch: 9 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:34,281-Speed 5462.64 samples/sec Loss 5.6690 LearningRate 0.0851 Epoch: 9 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:41,820-Speed 5433.64 samples/sec Loss 5.6272 LearningRate 0.0851 Epoch: 9 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:00:49,257-Speed 5508.63 samples/sec Loss 5.6425 LearningRate 0.0851 Epoch: 9 Global Step: 102460 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:00:56,704-Speed 5501.09 samples/sec Loss 5.6892 LearningRate 0.0851 Epoch: 9 Global Step: 102470 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:04,185-Speed 5475.79 samples/sec Loss 5.6646 LearningRate 0.0851 Epoch: 9 Global Step: 102480 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:11,673-Speed 5470.66 samples/sec Loss 5.6864 LearningRate 0.0850 Epoch: 9 Global Step: 102490 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:19,186-Speed 5452.74 samples/sec Loss 5.7045 LearningRate 0.0850 Epoch: 9 Global Step: 102500 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:26,754-Speed 5412.67 samples/sec Loss 5.6240 LearningRate 0.0850 Epoch: 9 Global Step: 102510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:34,267-Speed 5452.93 samples/sec Loss 5.6552 LearningRate 0.0850 Epoch: 9 Global Step: 102520 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:41,867-Speed 5390.36 samples/sec Loss 5.6484 LearningRate 0.0850 Epoch: 9 Global Step: 102530 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:49,383-Speed 5450.38 samples/sec Loss 5.6957 LearningRate 0.0850 Epoch: 9 Global 
Step: 102540 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:01:56,873-Speed 5469.54 samples/sec Loss 5.6569 LearningRate 0.0849 Epoch: 9 Global Step: 102550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:04,459-Speed 5399.84 samples/sec Loss 5.6547 LearningRate 0.0849 Epoch: 9 Global Step: 102560 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:11,942-Speed 5474.58 samples/sec Loss 5.7073 LearningRate 0.0849 Epoch: 9 Global Step: 102570 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:19,372-Speed 5513.35 samples/sec Loss 5.6542 LearningRate 0.0849 Epoch: 9 Global Step: 102580 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:26,886-Speed 5452.16 samples/sec Loss 5.6771 LearningRate 0.0849 Epoch: 9 Global Step: 102590 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:34,438-Speed 5424.58 samples/sec Loss 5.6152 LearningRate 0.0849 Epoch: 9 Global Step: 102600 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:44,618-Speed 4024.03 samples/sec Loss 5.6807 LearningRate 0.0848 Epoch: 9 Global Step: 102610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:52,108-Speed 5469.12 samples/sec Loss 5.6445 LearningRate 0.0848 Epoch: 9 Global Step: 102620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:02:59,619-Speed 5454.38 samples/sec Loss 5.6729 LearningRate 0.0848 Epoch: 9 Global Step: 102630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:03:07,114-Speed 5464.95 samples/sec Loss 5.6096 LearningRate 0.0848 Epoch: 9 Global Step: 102640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:03:14,667-Speed 5424.18 samples/sec Loss 5.6540 LearningRate 0.0848 Epoch: 9 Global Step: 102650 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:03:22,252-Speed 5400.97 samples/sec Loss 5.6687 LearningRate 0.0848 Epoch: 9 Global Step: 102660 Fp16 Grad Scale: 32768 
Required: 24 hours Training: 2022-01-08 18:03:29,863-Speed 5382.14 samples/sec Loss 5.6943 LearningRate 0.0847 Epoch: 9 Global Step: 102670 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:03:37,331-Speed 5485.43 samples/sec Loss 5.6968 LearningRate 0.0847 Epoch: 9 Global Step: 102680 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:03:44,804-Speed 5481.24 samples/sec Loss 5.6851 LearningRate 0.0847 Epoch: 9 Global Step: 102690 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:03:52,343-Speed 5434.23 samples/sec Loss 5.6804 LearningRate 0.0847 Epoch: 9 Global Step: 102700 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:03:59,874-Speed 5439.84 samples/sec Loss 5.6837 LearningRate 0.0847 Epoch: 9 Global Step: 102710 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:04:07,439-Speed 5414.52 samples/sec Loss 5.6949 LearningRate 0.0847 Epoch: 9 Global Step: 102720 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:04:14,899-Speed 5491.10 samples/sec Loss 5.6358 LearningRate 0.0846 Epoch: 9 Global Step: 102730 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:04:22,394-Speed 5466.11 samples/sec Loss 5.7054 LearningRate 0.0846 Epoch: 9 Global Step: 102740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:04:29,864-Speed 5483.94 samples/sec Loss 5.6857 LearningRate 0.0846 Epoch: 9 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:04:37,344-Speed 5476.35 samples/sec Loss 5.6354 LearningRate 0.0846 Epoch: 9 Global Step: 102760 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:04:44,812-Speed 5485.47 samples/sec Loss 5.6796 LearningRate 0.0846 Epoch: 9 Global Step: 102770 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:04:52,289-Speed 5479.05 samples/sec Loss 5.6391 LearningRate 0.0846 Epoch: 9 Global Step: 102780 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 
18:04:59,743-Speed 5496.20 samples/sec Loss 5.6108 LearningRate 0.0846 Epoch: 9 Global Step: 102790 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:05:07,270-Speed 5442.34 samples/sec Loss 5.6510 LearningRate 0.0845 Epoch: 9 Global Step: 102800 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:05:14,731-Speed 5490.55 samples/sec Loss 5.6588 LearningRate 0.0845 Epoch: 9 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:05:22,215-Speed 5474.20 samples/sec Loss 5.6370 LearningRate 0.0845 Epoch: 9 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:05:29,785-Speed 5411.31 samples/sec Loss 5.5982 LearningRate 0.0845 Epoch: 9 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:05:37,303-Speed 5449.07 samples/sec Loss 5.6151 LearningRate 0.0845 Epoch: 9 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:05:44,736-Speed 5511.16 samples/sec Loss 5.6696 LearningRate 0.0845 Epoch: 9 Global Step: 102850 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:05:52,279-Speed 5431.04 samples/sec Loss 5.6185 LearningRate 0.0844 Epoch: 9 Global Step: 102860 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:05:59,776-Speed 5464.00 samples/sec Loss 5.6548 LearningRate 0.0844 Epoch: 9 Global Step: 102870 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:06:07,283-Speed 5456.69 samples/sec Loss 5.6607 LearningRate 0.0844 Epoch: 9 Global Step: 102880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:14,746-Speed 5489.54 samples/sec Loss 5.6702 LearningRate 0.0844 Epoch: 9 Global Step: 102890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:22,237-Speed 5468.27 samples/sec Loss 5.6704 LearningRate 0.0844 Epoch: 9 Global Step: 102900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:29,701-Speed 5488.86 samples/sec 
Loss 5.6638 LearningRate 0.0844 Epoch: 9 Global Step: 102910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:37,153-Speed 5497.42 samples/sec Loss 5.6181 LearningRate 0.0843 Epoch: 9 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:44,716-Speed 5416.38 samples/sec Loss 5.6114 LearningRate 0.0843 Epoch: 9 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:52,245-Speed 5440.65 samples/sec Loss 5.6460 LearningRate 0.0843 Epoch: 9 Global Step: 102940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:06:59,842-Speed 5392.62 samples/sec Loss 5.6444 LearningRate 0.0843 Epoch: 9 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:07:07,352-Speed 5454.56 samples/sec Loss 5.6855 LearningRate 0.0843 Epoch: 9 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:07:14,877-Speed 5444.55 samples/sec Loss 5.6507 LearningRate 0.0843 Epoch: 9 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:07:22,381-Speed 5458.75 samples/sec Loss 5.5752 LearningRate 0.0842 Epoch: 9 Global Step: 102980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:07:29,926-Speed 5429.50 samples/sec Loss 5.7149 LearningRate 0.0842 Epoch: 9 Global Step: 102990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:07:37,530-Speed 5387.84 samples/sec Loss 5.6278 LearningRate 0.0842 Epoch: 9 Global Step: 103000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:07:45,380-Speed 5218.01 samples/sec Loss 5.6564 LearningRate 0.0842 Epoch: 9 Global Step: 103010 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:07:52,924-Speed 5430.10 samples/sec Loss 5.6771 LearningRate 0.0842 Epoch: 9 Global Step: 103020 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:00,515-Speed 5396.93 samples/sec Loss 5.6344 LearningRate 0.0842 Epoch: 
9 Global Step: 103030 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:07,970-Speed 5495.09 samples/sec Loss 5.6440 LearningRate 0.0841 Epoch: 9 Global Step: 103040 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:15,505-Speed 5436.48 samples/sec Loss 5.6250 LearningRate 0.0841 Epoch: 9 Global Step: 103050 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:22,949-Speed 5503.36 samples/sec Loss 5.6701 LearningRate 0.0841 Epoch: 9 Global Step: 103060 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:30,447-Speed 5463.84 samples/sec Loss 5.6479 LearningRate 0.0841 Epoch: 9 Global Step: 103070 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:37,983-Speed 5435.52 samples/sec Loss 5.6079 LearningRate 0.0841 Epoch: 9 Global Step: 103080 Fp16 Grad Scale: 262144 Required: 24 hours Training: 2022-01-08 18:08:45,530-Speed 5428.14 samples/sec Loss 5.6394 LearningRate 0.0841 Epoch: 9 Global Step: 103090 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:08:53,225-Speed 5323.91 samples/sec Loss 5.6251 LearningRate 0.0841 Epoch: 9 Global Step: 103100 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:09:00,889-Speed 5345.53 samples/sec Loss 5.5571 LearningRate 0.0840 Epoch: 9 Global Step: 103110 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:09:08,422-Speed 5438.12 samples/sec Loss 5.5914 LearningRate 0.0840 Epoch: 9 Global Step: 103120 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:09:15,947-Speed 5443.24 samples/sec Loss 5.5792 LearningRate 0.0840 Epoch: 9 Global Step: 103130 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:09:23,422-Speed 5480.36 samples/sec Loss 5.6150 LearningRate 0.0840 Epoch: 9 Global Step: 103140 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:09:30,918-Speed 5465.92 samples/sec Loss 5.5765 LearningRate 0.0840 Epoch: 9 Global Step: 103150 Fp16 Grad 
Scale: 65536 Required: 24 hours Training: 2022-01-08 18:09:38,387-Speed 5484.16 samples/sec Loss 5.6111 LearningRate 0.0840 Epoch: 9 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:09:45,837-Speed 5498.87 samples/sec Loss 5.5847 LearningRate 0.0839 Epoch: 9 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:09:53,387-Speed 5425.90 samples/sec Loss 5.6061 LearningRate 0.0839 Epoch: 9 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:00,859-Speed 5483.08 samples/sec Loss 5.6746 LearningRate 0.0839 Epoch: 9 Global Step: 103190 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:08,379-Speed 5446.72 samples/sec Loss 5.5864 LearningRate 0.0839 Epoch: 9 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:15,854-Speed 5480.66 samples/sec Loss 5.6502 LearningRate 0.0839 Epoch: 9 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:23,340-Speed 5472.41 samples/sec Loss 5.6257 LearningRate 0.0839 Epoch: 9 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:30,949-Speed 5383.82 samples/sec Loss 5.6793 LearningRate 0.0838 Epoch: 9 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:38,488-Speed 5433.93 samples/sec Loss 5.6736 LearningRate 0.0838 Epoch: 9 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:10:45,967-Speed 5477.10 samples/sec Loss 5.6382 LearningRate 0.0838 Epoch: 9 Global Step: 103250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:10:53,436-Speed 5485.06 samples/sec Loss 5.6396 LearningRate 0.0838 Epoch: 9 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:01,053-Speed 5378.14 samples/sec Loss 5.7037 LearningRate 0.0838 Epoch: 9 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 24 hours Training: 
2022-01-08 18:11:08,644-Speed 5396.93 samples/sec Loss 5.6258 LearningRate 0.0838 Epoch: 9 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:16,215-Speed 5410.31 samples/sec Loss 5.6692 LearningRate 0.0837 Epoch: 9 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:23,778-Speed 5416.72 samples/sec Loss 5.6434 LearningRate 0.0837 Epoch: 9 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:31,284-Speed 5457.52 samples/sec Loss 5.5981 LearningRate 0.0837 Epoch: 9 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:38,742-Speed 5493.34 samples/sec Loss 5.6176 LearningRate 0.0837 Epoch: 9 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:46,231-Speed 5469.62 samples/sec Loss 5.5807 LearningRate 0.0837 Epoch: 9 Global Step: 103330 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:11:53,732-Speed 5460.99 samples/sec Loss 5.6416 LearningRate 0.0837 Epoch: 9 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:01,204-Speed 5483.07 samples/sec Loss 5.5994 LearningRate 0.0836 Epoch: 9 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:08,860-Speed 5351.10 samples/sec Loss 5.6299 LearningRate 0.0836 Epoch: 9 Global Step: 103360 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:12:16,354-Speed 5465.73 samples/sec Loss 5.7116 LearningRate 0.0836 Epoch: 9 Global Step: 103370 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:24,031-Speed 5335.93 samples/sec Loss 5.6290 LearningRate 0.0836 Epoch: 9 Global Step: 103380 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:31,708-Speed 5336.44 samples/sec Loss 5.6502 LearningRate 0.0836 Epoch: 9 Global Step: 103390 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:39,444-Speed 5295.55 
samples/sec Loss 5.6518 LearningRate 0.0836 Epoch: 9 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:46,942-Speed 5463.17 samples/sec Loss 5.6181 LearningRate 0.0836 Epoch: 9 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:12:54,688-Speed 5288.82 samples/sec Loss 5.6528 LearningRate 0.0835 Epoch: 9 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:13:02,172-Speed 5473.89 samples/sec Loss 5.6303 LearningRate 0.0835 Epoch: 9 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:13:09,706-Speed 5437.66 samples/sec Loss 5.6127 LearningRate 0.0835 Epoch: 9 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:13:19,553-Speed 4159.70 samples/sec Loss 5.5219 LearningRate 0.0835 Epoch: 9 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:13:27,084-Speed 5439.73 samples/sec Loss 5.5856 LearningRate 0.0835 Epoch: 9 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:13:34,603-Speed 5448.33 samples/sec Loss 5.6498 LearningRate 0.0835 Epoch: 9 Global Step: 103470 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:13:42,033-Speed 5513.36 samples/sec Loss 5.6369 LearningRate 0.0834 Epoch: 9 Global Step: 103480 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:13:49,689-Speed 5350.88 samples/sec Loss 5.6055 LearningRate 0.0834 Epoch: 9 Global Step: 103490 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:13:57,283-Speed 5394.98 samples/sec Loss 5.6486 LearningRate 0.0834 Epoch: 9 Global Step: 103500 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:14:04,771-Speed 5471.02 samples/sec Loss 5.6213 LearningRate 0.0834 Epoch: 9 Global Step: 103510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:14:12,373-Speed 5388.36 samples/sec Loss 5.5850 LearningRate 
0.0834 Epoch: 9 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:14:19,828-Speed 5495.25 samples/sec Loss 5.5870 LearningRate 0.0834 Epoch: 9 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:14:27,341-Speed 5452.88 samples/sec Loss 5.6132 LearningRate 0.0833 Epoch: 9 Global Step: 103540 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:14:34,883-Speed 5431.40 samples/sec Loss 5.6921 LearningRate 0.0833 Epoch: 9 Global Step: 103550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:14:42,427-Speed 5430.64 samples/sec Loss 5.6337 LearningRate 0.0833 Epoch: 9 Global Step: 103560 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:14:49,871-Speed 5503.16 samples/sec Loss 5.5689 LearningRate 0.0833 Epoch: 9 Global Step: 103570 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:14:57,321-Speed 5498.31 samples/sec Loss 5.5844 LearningRate 0.0833 Epoch: 9 Global Step: 103580 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:15:04,996-Speed 5337.89 samples/sec Loss 5.5964 LearningRate 0.0833 Epoch: 9 Global Step: 103590 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:15:12,558-Speed 5417.30 samples/sec Loss 5.6139 LearningRate 0.0832 Epoch: 9 Global Step: 103600 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:15:20,114-Speed 5421.36 samples/sec Loss 5.6334 LearningRate 0.0832 Epoch: 9 Global Step: 103610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:15:27,579-Speed 5487.83 samples/sec Loss 5.5316 LearningRate 0.0832 Epoch: 9 Global Step: 103620 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:15:35,137-Speed 5420.75 samples/sec Loss 5.5781 LearningRate 0.0832 Epoch: 9 Global Step: 103630 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:15:42,607-Speed 5483.50 samples/sec Loss 5.6514 LearningRate 0.0832 Epoch: 9 Global Step: 103640 Fp16 
Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:15:50,141-Speed 5437.22 samples/sec Loss 5.6320 LearningRate 0.0832 Epoch: 9 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:15:57,778-Speed 5364.41 samples/sec Loss 5.5927 LearningRate 0.0832 Epoch: 9 Global Step: 103660 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:16:05,347-Speed 5412.32 samples/sec Loss 5.5904 LearningRate 0.0831 Epoch: 9 Global Step: 103670 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:16:12,869-Speed 5445.99 samples/sec Loss 5.7069 LearningRate 0.0831 Epoch: 9 Global Step: 103680 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:16:20,280-Speed 5527.60 samples/sec Loss 5.6682 LearningRate 0.0831 Epoch: 9 Global Step: 103690 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:16:42,945-Speed 1807.30 samples/sec Loss 5.5986 LearningRate 0.0831 Epoch: 10 Global Step: 103700 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:16:50,382-Speed 5508.03 samples/sec Loss 5.6259 LearningRate 0.0831 Epoch: 10 Global Step: 103710 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:16:57,866-Speed 5474.37 samples/sec Loss 5.5833 LearningRate 0.0831 Epoch: 10 Global Step: 103720 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:17:05,358-Speed 5467.64 samples/sec Loss 5.6207 LearningRate 0.0830 Epoch: 10 Global Step: 103730 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:17:12,762-Speed 5532.85 samples/sec Loss 5.6011 LearningRate 0.0830 Epoch: 10 Global Step: 103740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:17:20,166-Speed 5532.48 samples/sec Loss 5.6077 LearningRate 0.0830 Epoch: 10 Global Step: 103750 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:17:27,759-Speed 5396.07 samples/sec Loss 5.6526 LearningRate 0.0830 Epoch: 10 Global Step: 103760 Fp16 Grad Scale: 32768 Required: 24 hours 
Training: 2022-01-08 18:17:35,196-Speed 5508.50 samples/sec Loss 5.5429 LearningRate 0.0830 Epoch: 10 Global Step: 103770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:17:42,617-Speed 5520.05 samples/sec Loss 5.5894 LearningRate 0.0830 Epoch: 10 Global Step: 103780 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:17:50,022-Speed 5531.81 samples/sec Loss 5.5818 LearningRate 0.0829 Epoch: 10 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:17:57,444-Speed 5519.83 samples/sec Loss 5.6205 LearningRate 0.0829 Epoch: 10 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:04,853-Speed 5529.31 samples/sec Loss 5.6116 LearningRate 0.0829 Epoch: 10 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:12,253-Speed 5535.96 samples/sec Loss 5.6013 LearningRate 0.0829 Epoch: 10 Global Step: 103820 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:19,788-Speed 5436.42 samples/sec Loss 5.5969 LearningRate 0.0829 Epoch: 10 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:27,277-Speed 5469.83 samples/sec Loss 5.5861 LearningRate 0.0829 Epoch: 10 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:34,674-Speed 5541.23 samples/sec Loss 5.5836 LearningRate 0.0828 Epoch: 10 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:42,282-Speed 5384.50 samples/sec Loss 5.5996 LearningRate 0.0828 Epoch: 10 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:49,836-Speed 5422.80 samples/sec Loss 5.4872 LearningRate 0.0828 Epoch: 10 Global Step: 103870 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:18:57,470-Speed 5366.51 samples/sec Loss 5.5840 LearningRate 0.0828 Epoch: 10 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 
18:19:05,066-Speed 5392.81 samples/sec Loss 5.5559 LearningRate 0.0828 Epoch: 10 Global Step: 103890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:12,727-Speed 5347.42 samples/sec Loss 5.5640 LearningRate 0.0828 Epoch: 10 Global Step: 103900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:20,319-Speed 5395.48 samples/sec Loss 5.5460 LearningRate 0.0828 Epoch: 10 Global Step: 103910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:27,910-Speed 5396.77 samples/sec Loss 5.5626 LearningRate 0.0827 Epoch: 10 Global Step: 103920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:35,542-Speed 5367.96 samples/sec Loss 5.6298 LearningRate 0.0827 Epoch: 10 Global Step: 103930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:43,135-Speed 5394.61 samples/sec Loss 5.5947 LearningRate 0.0827 Epoch: 10 Global Step: 103940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:50,726-Speed 5396.74 samples/sec Loss 5.5812 LearningRate 0.0827 Epoch: 10 Global Step: 103950 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:19:58,430-Speed 5317.37 samples/sec Loss 5.5849 LearningRate 0.0827 Epoch: 10 Global Step: 103960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:20:06,240-Speed 5245.06 samples/sec Loss 5.5689 LearningRate 0.0827 Epoch: 10 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:20:13,821-Speed 5403.90 samples/sec Loss 5.6072 LearningRate 0.0826 Epoch: 10 Global Step: 103980 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:20:21,416-Speed 5393.71 samples/sec Loss 5.5898 LearningRate 0.0826 Epoch: 10 Global Step: 103990 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:20:29,022-Speed 5386.11 samples/sec Loss 5.5938 LearningRate 0.0826 Epoch: 10 Global Step: 104000 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 
18:21:13,147-[lfw][104000]XNorm: 22.918996
Training: 2022-01-08 18:21:13,147-[lfw][104000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-01-08 18:21:13,148-[lfw][104000]Accuracy-Highest: 0.99817
Training: 2022-01-08 18:22:04,797-[cfp_fp][104000]XNorm: 20.986479
Training: 2022-01-08 18:22:04,799-[cfp_fp][104000]Accuracy-Flip: 0.98914+-0.00444
Training: 2022-01-08 18:22:04,799-[cfp_fp][104000]Accuracy-Highest: 0.98914
Training: 2022-01-08 18:22:50,294-[agedb_30][104000]XNorm: 22.826693
Training: 2022-01-08 18:22:50,295-[agedb_30][104000]Accuracy-Flip: 0.97650+-0.00555
Training: 2022-01-08 18:22:50,296-[agedb_30][104000]Accuracy-Highest: 0.97917
Training: 2022-01-08 18:22:57,920-Speed 275.09 samples/sec Loss 5.5904 LearningRate 0.0826 Epoch: 10 Global Step: 104010 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-01-08 18:23:05,557-Speed 5364.75 samples/sec Loss 5.5722 LearningRate 0.0826 Epoch: 10 Global Step: 104020 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-01-08 18:23:13,075-Speed 5449.96 samples/sec Loss 5.5740 LearningRate 0.0826 Epoch: 10 Global Step: 104030 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-01-08 18:23:20,515-Speed 5506.95 samples/sec Loss 5.5842 LearningRate 0.0825 Epoch: 10 Global Step: 104040 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-01-08 18:23:27,952-Speed 5509.27 samples/sec Loss 5.5599 LearningRate 0.0825 Epoch: 10 Global Step: 104050 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-01-08 18:23:35,322-Speed 5559.43 samples/sec Loss 5.5466 LearningRate 0.0825 Epoch: 10 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 24 hours
Training: 2022-01-08 18:23:42,768-Speed 5502.14 samples/sec Loss 5.5919 LearningRate 0.0825 Epoch: 10 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 24 hours
Training: 2022-01-08 18:23:50,223-Speed 5495.37 samples/sec Loss 5.5543 LearningRate 0.0825 Epoch: 10 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 24 hours
Training: 2022-01-08 
18:23:57,724-Speed 5462.18 samples/sec Loss 5.5938 LearningRate 0.0825 Epoch: 10 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:24:05,160-Speed 5509.42 samples/sec Loss 5.6096 LearningRate 0.0824 Epoch: 10 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:24:12,675-Speed 5451.30 samples/sec Loss 5.5818 LearningRate 0.0824 Epoch: 10 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:24:20,148-Speed 5482.78 samples/sec Loss 5.6027 LearningRate 0.0824 Epoch: 10 Global Step: 104120 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:24:27,660-Speed 5453.45 samples/sec Loss 5.5817 LearningRate 0.0824 Epoch: 10 Global Step: 104130 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:24:35,139-Speed 5478.43 samples/sec Loss 5.5842 LearningRate 0.0824 Epoch: 10 Global Step: 104140 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:24:42,547-Speed 5530.92 samples/sec Loss 5.5287 LearningRate 0.0824 Epoch: 10 Global Step: 104150 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:24:50,099-Speed 5424.37 samples/sec Loss 5.5240 LearningRate 0.0824 Epoch: 10 Global Step: 104160 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:24:57,591-Speed 5468.72 samples/sec Loss 5.5917 LearningRate 0.0823 Epoch: 10 Global Step: 104170 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:25:05,142-Speed 5425.47 samples/sec Loss 5.6116 LearningRate 0.0823 Epoch: 10 Global Step: 104180 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:25:12,272-Speed 5746.45 samples/sec Loss 5.6354 LearningRate 0.0823 Epoch: 10 Global Step: 104190 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:25:19,917-Speed 5359.26 samples/sec Loss 5.5380 LearningRate 0.0823 Epoch: 10 Global Step: 104200 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:25:27,426-Speed 5456.48 
samples/sec Loss 5.5834 LearningRate 0.0823 Epoch: 10 Global Step: 104210 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:25:35,077-Speed 5354.44 samples/sec Loss 5.5826 LearningRate 0.0823 Epoch: 10 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:25:42,529-Speed 5497.84 samples/sec Loss 5.6405 LearningRate 0.0822 Epoch: 10 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:25:50,141-Speed 5382.05 samples/sec Loss 5.5621 LearningRate 0.0822 Epoch: 10 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:25:57,481-Speed 5581.35 samples/sec Loss 5.5544 LearningRate 0.0822 Epoch: 10 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:04,542-Speed 5802.31 samples/sec Loss 5.5538 LearningRate 0.0822 Epoch: 10 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:11,988-Speed 5501.59 samples/sec Loss 5.5332 LearningRate 0.0822 Epoch: 10 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:19,484-Speed 5465.93 samples/sec Loss 5.5712 LearningRate 0.0822 Epoch: 10 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:27,074-Speed 5397.48 samples/sec Loss 5.5468 LearningRate 0.0821 Epoch: 10 Global Step: 104290 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:34,430-Speed 5569.73 samples/sec Loss 5.5796 LearningRate 0.0821 Epoch: 10 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:41,708-Speed 5628.93 samples/sec Loss 5.5689 LearningRate 0.0821 Epoch: 10 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:26:49,309-Speed 5390.25 samples/sec Loss 5.5484 LearningRate 0.0821 Epoch: 10 Global Step: 104320 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:26:56,430-Speed 5753.06 samples/sec Loss 5.6046 
LearningRate 0.0821 Epoch: 10 Global Step: 104330 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:03,479-Speed 5811.94 samples/sec Loss 5.5522 LearningRate 0.0821 Epoch: 10 Global Step: 104340 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:10,633-Speed 5726.99 samples/sec Loss 5.5803 LearningRate 0.0820 Epoch: 10 Global Step: 104350 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:18,064-Speed 5513.66 samples/sec Loss 5.5177 LearningRate 0.0820 Epoch: 10 Global Step: 104360 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:25,639-Speed 5407.87 samples/sec Loss 5.5919 LearningRate 0.0820 Epoch: 10 Global Step: 104370 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:33,154-Speed 5452.21 samples/sec Loss 5.5449 LearningRate 0.0820 Epoch: 10 Global Step: 104380 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:40,341-Speed 5700.72 samples/sec Loss 5.5532 LearningRate 0.0820 Epoch: 10 Global Step: 104390 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:47,540-Speed 5690.29 samples/sec Loss 5.5332 LearningRate 0.0820 Epoch: 10 Global Step: 104400 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:27:55,073-Speed 5439.39 samples/sec Loss 5.5747 LearningRate 0.0820 Epoch: 10 Global Step: 104410 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:28:02,502-Speed 5514.31 samples/sec Loss 5.5677 LearningRate 0.0819 Epoch: 10 Global Step: 104420 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:28:10,135-Speed 5367.73 samples/sec Loss 5.5528 LearningRate 0.0819 Epoch: 10 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:28:17,618-Speed 5474.82 samples/sec Loss 5.5513 LearningRate 0.0819 Epoch: 10 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:28:25,066-Speed 5500.13 samples/sec Loss 5.5839 LearningRate 0.0819 
Epoch: 10 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:28:32,586-Speed 5448.35 samples/sec Loss 5.5399 LearningRate 0.0819 Epoch: 10 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:28:40,139-Speed 5424.10 samples/sec Loss 5.5776 LearningRate 0.0819 Epoch: 10 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:28:47,620-Speed 5476.44 samples/sec Loss 5.5694 LearningRate 0.0818 Epoch: 10 Global Step: 104480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:28:54,922-Speed 5610.68 samples/sec Loss 5.5880 LearningRate 0.0818 Epoch: 10 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:29:02,229-Speed 5606.66 samples/sec Loss 5.5900 LearningRate 0.0818 Epoch: 10 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:29:09,727-Speed 5463.71 samples/sec Loss 5.5987 LearningRate 0.0818 Epoch: 10 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:29:17,098-Speed 5558.81 samples/sec Loss 5.5352 LearningRate 0.0818 Epoch: 10 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:29:24,696-Speed 5392.15 samples/sec Loss 5.5280 LearningRate 0.0818 Epoch: 10 Global Step: 104530 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:29:32,027-Speed 5588.35 samples/sec Loss 5.4868 LearningRate 0.0817 Epoch: 10 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:29:39,532-Speed 5459.05 samples/sec Loss 5.5622 LearningRate 0.0817 Epoch: 10 Global Step: 104550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:29:47,126-Speed 5394.64 samples/sec Loss 5.5682 LearningRate 0.0817 Epoch: 10 Global Step: 104560 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:29:54,564-Speed 5508.47 samples/sec Loss 5.5528 LearningRate 0.0817 Epoch: 10 Global Step: 104570 
Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:02,096-Speed 5438.97 samples/sec Loss 5.5680 LearningRate 0.0817 Epoch: 10 Global Step: 104580 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:09,608-Speed 5454.41 samples/sec Loss 5.5925 LearningRate 0.0817 Epoch: 10 Global Step: 104590 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:17,049-Speed 5505.78 samples/sec Loss 5.5690 LearningRate 0.0817 Epoch: 10 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:24,501-Speed 5497.00 samples/sec Loss 5.5737 LearningRate 0.0816 Epoch: 10 Global Step: 104610 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:31,947-Speed 5502.37 samples/sec Loss 5.5786 LearningRate 0.0816 Epoch: 10 Global Step: 104620 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:39,490-Speed 5431.37 samples/sec Loss 5.5453 LearningRate 0.0816 Epoch: 10 Global Step: 104630 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:46,953-Speed 5490.05 samples/sec Loss 5.5340 LearningRate 0.0816 Epoch: 10 Global Step: 104640 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:30:54,347-Speed 5540.61 samples/sec Loss 5.4915 LearningRate 0.0816 Epoch: 10 Global Step: 104650 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-08 18:31:01,758-Speed 5529.78 samples/sec Loss 5.5978 LearningRate 0.0816 Epoch: 10 Global Step: 104660 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:31:09,163-Speed 5532.27 samples/sec Loss 5.5779 LearningRate 0.0815 Epoch: 10 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:31:16,327-Speed 5718.54 samples/sec Loss 5.5254 LearningRate 0.0815 Epoch: 10 Global Step: 104680 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:31:23,762-Speed 5510.58 samples/sec Loss 5.5397 LearningRate 0.0815 Epoch: 10 Global Step: 104690 Fp16 Grad Scale: 65536 
Required: 24 hours Training: 2022-01-08 18:31:31,191-Speed 5514.69 samples/sec Loss 5.5817 LearningRate 0.0815 Epoch: 10 Global Step: 104700 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:31:38,484-Speed 5617.29 samples/sec Loss 5.5846 LearningRate 0.0815 Epoch: 10 Global Step: 104710 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:31:45,681-Speed 5693.05 samples/sec Loss 5.5340 LearningRate 0.0815 Epoch: 10 Global Step: 104720 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:31:52,838-Speed 5724.20 samples/sec Loss 5.4837 LearningRate 0.0814 Epoch: 10 Global Step: 104730 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:32:00,365-Speed 5442.25 samples/sec Loss 5.5342 LearningRate 0.0814 Epoch: 10 Global Step: 104740 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:32:07,886-Speed 5447.86 samples/sec Loss 5.5782 LearningRate 0.0814 Epoch: 10 Global Step: 104750 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:32:15,393-Speed 5457.10 samples/sec Loss 5.5776 LearningRate 0.0814 Epoch: 10 Global Step: 104760 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:32:22,673-Speed 5627.28 samples/sec Loss 5.5698 LearningRate 0.0814 Epoch: 10 Global Step: 104770 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:32:29,820-Speed 5731.85 samples/sec Loss 5.5697 LearningRate 0.0814 Epoch: 10 Global Step: 104780 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:32:37,002-Speed 5704.81 samples/sec Loss 5.5441 LearningRate 0.0813 Epoch: 10 Global Step: 104790 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:32:44,610-Speed 5386.30 samples/sec Loss 5.5064 LearningRate 0.0813 Epoch: 10 Global Step: 104800 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:32:52,041-Speed 5513.21 samples/sec Loss 5.5133 LearningRate 0.0813 Epoch: 10 Global Step: 104810 Fp16 Grad Scale: 131072 Required: 24 hours 
Training: 2022-01-08 18:32:59,486-Speed 5503.25 samples/sec Loss 5.5858 LearningRate 0.0813 Epoch: 10 Global Step: 104820 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-01-08 18:33:06,935-Speed 5499.58 samples/sec Loss 5.5153 LearningRate 0.0813 Epoch: 10 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:33:14,338-Speed 5534.18 samples/sec Loss 5.5086 LearningRate 0.0813 Epoch: 10 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:33:21,838-Speed 5462.74 samples/sec Loss 5.5613 LearningRate 0.0813 Epoch: 10 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:33:29,215-Speed 5553.50 samples/sec Loss 5.5487 LearningRate 0.0812 Epoch: 10 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-01-08 18:33:36,837-Speed 5375.05 samples/sec Loss 5.5171 LearningRate 0.0812 Epoch: 10 Global Step: 104870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:33:44,204-Speed 5561.30 samples/sec Loss 5.5381 LearningRate 0.0812 Epoch: 10 Global Step: 104880 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:33:51,381-Speed 5708.21 samples/sec Loss 5.5514 LearningRate 0.0812 Epoch: 10 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:33:58,892-Speed 5454.91 samples/sec Loss 5.5299 LearningRate 0.0812 Epoch: 10 Global Step: 104900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:34:06,377-Speed 5473.04 samples/sec Loss 5.4862 LearningRate 0.0812 Epoch: 10 Global Step: 104910 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:34:13,858-Speed 5476.46 samples/sec Loss 5.5425 LearningRate 0.0811 Epoch: 10 Global Step: 104920 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:34:21,412-Speed 5423.36 samples/sec Loss 5.4927 LearningRate 0.0811 Epoch: 10 Global Step: 104930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 
18:34:29,001-Speed 5398.60 samples/sec Loss 5.5771 LearningRate 0.0811 Epoch: 10 Global Step: 104940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:34:36,433-Speed 5513.02 samples/sec Loss 5.5171 LearningRate 0.0811 Epoch: 10 Global Step: 104950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:34:43,769-Speed 5584.52 samples/sec Loss 5.5427 LearningRate 0.0811 Epoch: 10 Global Step: 104960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:34:51,338-Speed 5413.19 samples/sec Loss 5.4662 LearningRate 0.0811 Epoch: 10 Global Step: 104970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:34:58,765-Speed 5516.13 samples/sec Loss 5.5327 LearningRate 0.0810 Epoch: 10 Global Step: 104980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:06,375-Speed 5383.39 samples/sec Loss 5.5165 LearningRate 0.0810 Epoch: 10 Global Step: 104990 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:13,682-Speed 5606.74 samples/sec Loss 5.5139 LearningRate 0.0810 Epoch: 10 Global Step: 105000 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:20,745-Speed 5800.25 samples/sec Loss 5.5531 LearningRate 0.0810 Epoch: 10 Global Step: 105010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:28,240-Speed 5466.08 samples/sec Loss 5.5548 LearningRate 0.0810 Epoch: 10 Global Step: 105020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:35,670-Speed 5514.53 samples/sec Loss 5.4938 LearningRate 0.0810 Epoch: 10 Global Step: 105030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:43,199-Speed 5441.10 samples/sec Loss 5.4709 LearningRate 0.0810 Epoch: 10 Global Step: 105040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:50,767-Speed 5414.26 samples/sec Loss 5.5385 LearningRate 0.0809 Epoch: 10 Global Step: 105050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:35:58,378-Speed 5382.74 
samples/sec Loss 5.5712 LearningRate 0.0809 Epoch: 10 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:06,097-Speed 5307.07 samples/sec Loss 5.4975 LearningRate 0.0809 Epoch: 10 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:13,810-Speed 5312.17 samples/sec Loss 5.4489 LearningRate 0.0809 Epoch: 10 Global Step: 105080 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:36:21,362-Speed 5424.40 samples/sec Loss 5.5552 LearningRate 0.0809 Epoch: 10 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:28,973-Speed 5383.10 samples/sec Loss 5.4961 LearningRate 0.0809 Epoch: 10 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:36,591-Speed 5377.88 samples/sec Loss 5.5582 LearningRate 0.0808 Epoch: 10 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:44,255-Speed 5345.85 samples/sec Loss 5.4937 LearningRate 0.0808 Epoch: 10 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:51,816-Speed 5418.08 samples/sec Loss 5.5229 LearningRate 0.0808 Epoch: 10 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:36:59,280-Speed 5488.81 samples/sec Loss 5.5206 LearningRate 0.0808 Epoch: 10 Global Step: 105140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:37:06,872-Speed 5396.33 samples/sec Loss 5.5540 LearningRate 0.0808 Epoch: 10 Global Step: 105150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:37:14,485-Speed 5381.74 samples/sec Loss 5.5198 LearningRate 0.0808 Epoch: 10 Global Step: 105160 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:37:21,726-Speed 5657.95 samples/sec Loss 5.4943 LearningRate 0.0807 Epoch: 10 Global Step: 105170 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:37:29,008-Speed 5625.84 samples/sec Loss 5.5590 
LearningRate 0.0807 Epoch: 10 Global Step: 105180 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:37:36,500-Speed 5468.96 samples/sec Loss 5.5455 LearningRate 0.0807 Epoch: 10 Global Step: 105190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:37:44,055-Speed 5422.79 samples/sec Loss 5.5727 LearningRate 0.0807 Epoch: 10 Global Step: 105200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:37:52,642-Speed 5702.35 samples/sec Loss 5.5463 LearningRate 0.0807 Epoch: 10 Global Step: 105210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:37:59,955-Speed 5601.91 samples/sec Loss 5.5015 LearningRate 0.0807 Epoch: 10 Global Step: 105220 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:07,501-Speed 5429.74 samples/sec Loss 5.5376 LearningRate 0.0807 Epoch: 10 Global Step: 105230 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:15,086-Speed 5400.93 samples/sec Loss 5.5589 LearningRate 0.0806 Epoch: 10 Global Step: 105240 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:22,591-Speed 5458.83 samples/sec Loss 5.5530 LearningRate 0.0806 Epoch: 10 Global Step: 105250 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:30,183-Speed 5396.24 samples/sec Loss 5.4896 LearningRate 0.0806 Epoch: 10 Global Step: 105260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:37,729-Speed 5430.91 samples/sec Loss 5.4582 LearningRate 0.0806 Epoch: 10 Global Step: 105270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:45,422-Speed 5324.91 samples/sec Loss 5.5160 LearningRate 0.0806 Epoch: 10 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:38:52,956-Speed 5438.36 samples/sec Loss 5.4947 LearningRate 0.0806 Epoch: 10 Global Step: 105290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:39:00,573-Speed 5378.95 samples/sec Loss 5.5015 LearningRate 0.0805 Epoch: 
10 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:39:08,070-Speed 5464.81 samples/sec Loss 5.4877 LearningRate 0.0805 Epoch: 10 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:39:15,636-Speed 5414.30 samples/sec Loss 5.5063 LearningRate 0.0805 Epoch: 10 Global Step: 105320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:39:23,442-Speed 5248.23 samples/sec Loss 5.5026 LearningRate 0.0805 Epoch: 10 Global Step: 105330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:39:31,015-Speed 5409.63 samples/sec Loss 5.5286 LearningRate 0.0805 Epoch: 10 Global Step: 105340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:39:38,553-Speed 5434.64 samples/sec Loss 5.5437 LearningRate 0.0805 Epoch: 10 Global Step: 105350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:39:46,180-Speed 5371.59 samples/sec Loss 5.5428 LearningRate 0.0804 Epoch: 10 Global Step: 105360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:39:53,665-Speed 5473.58 samples/sec Loss 5.4831 LearningRate 0.0804 Epoch: 10 Global Step: 105370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:40:01,251-Speed 5400.30 samples/sec Loss 5.5140 LearningRate 0.0804 Epoch: 10 Global Step: 105380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:40:08,805-Speed 5423.63 samples/sec Loss 5.4863 LearningRate 0.0804 Epoch: 10 Global Step: 105390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:40:16,207-Speed 5534.52 samples/sec Loss 5.5286 LearningRate 0.0804 Epoch: 10 Global Step: 105400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:40:23,676-Speed 5485.59 samples/sec Loss 5.5323 LearningRate 0.0804 Epoch: 10 Global Step: 105410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:40:31,142-Speed 5487.27 samples/sec Loss 5.4812 LearningRate 0.0804 Epoch: 10 Global Step: 105420 
Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:40:38,679-Speed 5435.36 samples/sec Loss 5.4823 LearningRate 0.0803 Epoch: 10 Global Step: 105430 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:40:46,097-Speed 5522.51 samples/sec Loss 5.4962 LearningRate 0.0803 Epoch: 10 Global Step: 105440 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:40:53,523-Speed 5517.63 samples/sec Loss 5.5273 LearningRate 0.0803 Epoch: 10 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:00,926-Speed 5533.97 samples/sec Loss 5.5108 LearningRate 0.0803 Epoch: 10 Global Step: 105460 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:08,221-Speed 5615.48 samples/sec Loss 5.5321 LearningRate 0.0803 Epoch: 10 Global Step: 105470 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:15,729-Speed 5458.68 samples/sec Loss 5.5002 LearningRate 0.0803 Epoch: 10 Global Step: 105480 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:23,167-Speed 5508.24 samples/sec Loss 5.5289 LearningRate 0.0802 Epoch: 10 Global Step: 105490 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:30,656-Speed 5470.24 samples/sec Loss 5.4721 LearningRate 0.0802 Epoch: 10 Global Step: 105500 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:38,136-Speed 5477.79 samples/sec Loss 5.4824 LearningRate 0.0802 Epoch: 10 Global Step: 105510 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:45,601-Speed 5487.46 samples/sec Loss 5.4967 LearningRate 0.0802 Epoch: 10 Global Step: 105520 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:41:53,159-Speed 5420.51 samples/sec Loss 5.5340 LearningRate 0.0802 Epoch: 10 Global Step: 105530 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:42:00,711-Speed 5424.49 samples/sec Loss 5.5441 LearningRate 0.0802 Epoch: 10 Global Step: 105540 Fp16 Grad Scale: 65536 
Required: 23 hours Training: 2022-01-08 18:42:08,176-Speed 5487.93 samples/sec Loss 5.5173 LearningRate 0.0801 Epoch: 10 Global Step: 105550 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:42:15,670-Speed 5466.61 samples/sec Loss 5.5552 LearningRate 0.0801 Epoch: 10 Global Step: 105560 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:42:23,143-Speed 5481.33 samples/sec Loss 5.5441 LearningRate 0.0801 Epoch: 10 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:42:30,622-Speed 5477.27 samples/sec Loss 5.5072 LearningRate 0.0801 Epoch: 10 Global Step: 105580 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:42:38,146-Speed 5445.20 samples/sec Loss 5.4885 LearningRate 0.0801 Epoch: 10 Global Step: 105590 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:42:45,561-Speed 5524.54 samples/sec Loss 5.4736 LearningRate 0.0801 Epoch: 10 Global Step: 105600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:42:53,068-Speed 5457.08 samples/sec Loss 5.5381 LearningRate 0.0801 Epoch: 10 Global Step: 105610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:43:00,500-Speed 5512.01 samples/sec Loss 5.4504 LearningRate 0.0800 Epoch: 10 Global Step: 105620 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:43:07,975-Speed 5480.39 samples/sec Loss 5.4734 LearningRate 0.0800 Epoch: 10 Global Step: 105630 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:43:15,440-Speed 5487.34 samples/sec Loss 5.5088 LearningRate 0.0800 Epoch: 10 Global Step: 105640 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:43:22,938-Speed 5463.53 samples/sec Loss 5.4371 LearningRate 0.0800 Epoch: 10 Global Step: 105650 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:43:30,420-Speed 5475.37 samples/sec Loss 5.4730 LearningRate 0.0800 Epoch: 10 Global Step: 105660 Fp16 Grad Scale: 65536 Required: 23 hours Training: 
2022-01-08 18:43:37,969-Speed 5426.31 samples/sec Loss 5.3926 LearningRate 0.0800 Epoch: 10 Global Step: 105670 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:43:45,409-Speed 5506.81 samples/sec Loss 5.4802 LearningRate 0.0799 Epoch: 10 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:43:52,905-Speed 5464.77 samples/sec Loss 5.4171 LearningRate 0.0799 Epoch: 10 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:44:00,518-Speed 5380.58 samples/sec Loss 5.4930 LearningRate 0.0799 Epoch: 10 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:44:08,020-Speed 5460.78 samples/sec Loss 5.5018 LearningRate 0.0799 Epoch: 10 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:44:15,505-Speed 5473.38 samples/sec Loss 5.4843 LearningRate 0.0799 Epoch: 10 Global Step: 105720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:44:22,945-Speed 5505.86 samples/sec Loss 5.4841 LearningRate 0.0799 Epoch: 10 Global Step: 105730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:44:30,386-Speed 5505.58 samples/sec Loss 5.4576 LearningRate 0.0798 Epoch: 10 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:44:37,839-Speed 5496.20 samples/sec Loss 5.4595 LearningRate 0.0798 Epoch: 10 Global Step: 105750 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:44:45,391-Speed 5424.36 samples/sec Loss 5.4660 LearningRate 0.0798 Epoch: 10 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:44:52,891-Speed 5462.20 samples/sec Loss 5.5013 LearningRate 0.0798 Epoch: 10 Global Step: 105770 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:00,364-Speed 5482.42 samples/sec Loss 5.4621 LearningRate 0.0798 Epoch: 10 Global Step: 105780 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:07,823-Speed 
5491.77 samples/sec Loss 5.5255 LearningRate 0.0798 Epoch: 10 Global Step: 105790 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:15,800-Speed 5135.12 samples/sec Loss 5.5786 LearningRate 0.0798 Epoch: 10 Global Step: 105800 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:23,283-Speed 5474.80 samples/sec Loss 5.4327 LearningRate 0.0797 Epoch: 10 Global Step: 105810 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:30,822-Speed 5433.59 samples/sec Loss 5.4394 LearningRate 0.0797 Epoch: 10 Global Step: 105820 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:38,316-Speed 5466.74 samples/sec Loss 5.4948 LearningRate 0.0797 Epoch: 10 Global Step: 105830 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:45:45,865-Speed 5426.49 samples/sec Loss 5.5096 LearningRate 0.0797 Epoch: 10 Global Step: 105840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:45:53,298-Speed 5510.92 samples/sec Loss 5.4967 LearningRate 0.0797 Epoch: 10 Global Step: 105850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:46:00,758-Speed 5491.88 samples/sec Loss 5.4969 LearningRate 0.0797 Epoch: 10 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:46:08,237-Speed 5477.44 samples/sec Loss 5.5303 LearningRate 0.0796 Epoch: 10 Global Step: 105870 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:46:15,687-Speed 5498.61 samples/sec Loss 5.4703 LearningRate 0.0796 Epoch: 10 Global Step: 105880 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:46:23,141-Speed 5495.84 samples/sec Loss 5.4773 LearningRate 0.0796 Epoch: 10 Global Step: 105890 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:46:30,654-Speed 5452.83 samples/sec Loss 5.4632 LearningRate 0.0796 Epoch: 10 Global Step: 105900 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:46:38,102-Speed 5500.09 samples/sec Loss 
5.4317 LearningRate 0.0796 Epoch: 10 Global Step: 105910 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:46:45,529-Speed 5515.95 samples/sec Loss 5.4666 LearningRate 0.0796 Epoch: 10 Global Step: 105920 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:46:53,010-Speed 5475.61 samples/sec Loss 5.5002 LearningRate 0.0796 Epoch: 10 Global Step: 105930 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:47:00,563-Speed 5423.86 samples/sec Loss 5.5083 LearningRate 0.0795 Epoch: 10 Global Step: 105940 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:47:08,184-Speed 5375.81 samples/sec Loss 5.5088 LearningRate 0.0795 Epoch: 10 Global Step: 105950 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:47:15,767-Speed 5401.88 samples/sec Loss 5.5120 LearningRate 0.0795 Epoch: 10 Global Step: 105960 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 18:47:23,254-Speed 5471.91 samples/sec Loss 5.4972 LearningRate 0.0795 Epoch: 10 Global Step: 105970 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:47:30,750-Speed 5464.69 samples/sec Loss 5.5286 LearningRate 0.0795 Epoch: 10 Global Step: 105980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:47:38,242-Speed 5468.41 samples/sec Loss 5.5160 LearningRate 0.0795 Epoch: 10 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:47:45,683-Speed 5504.83 samples/sec Loss 5.4761 LearningRate 0.0794 Epoch: 10 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-08 18:48:29,990-[lfw][106000]XNorm: 23.475606
Training: 2022-01-08 18:48:29,991-[lfw][106000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-01-08 18:48:29,991-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-01-08 18:49:21,301-[cfp_fp][106000]XNorm: 21.649124
Training: 2022-01-08 18:49:21,301-[cfp_fp][106000]Accuracy-Flip: 0.99043+-0.00429
Training: 2022-01-08 18:49:21,302-[cfp_fp][106000]Accuracy-Highest: 0.99043
Training: 2022-01-08 18:50:05,646-[agedb_30][106000]XNorm: 23.216536
Training: 2022-01-08 18:50:05,647-[agedb_30][106000]Accuracy-Flip: 0.97783+-0.00619
Training: 2022-01-08 18:50:05,647-[agedb_30][106000]Accuracy-Highest: 0.97917
Training: 2022-01-08 18:50:13,161-Speed 277.74 samples/sec Loss 5.5376 LearningRate 0.0794 Epoch: 10 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:50:20,667-Speed 5457.92 samples/sec Loss 5.4482 LearningRate 0.0794 Epoch: 10 Global Step: 106020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:50:28,138-Speed 5484.40 samples/sec Loss 5.5169 LearningRate 0.0794 Epoch: 10 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:50:35,578-Speed 5506.48 samples/sec Loss 5.4660 LearningRate 0.0794 Epoch: 10 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:50:43,025-Speed 5501.57 samples/sec Loss 5.4478 LearningRate 0.0794 Epoch: 10 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:50:50,468-Speed 5504.53 samples/sec Loss 5.5010 LearningRate 0.0793 Epoch: 10 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:50:57,915-Speed 5501.76 samples/sec Loss 5.4412 LearningRate 0.0793 Epoch: 10 Global Step: 106070 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:51:05,362-Speed 5500.96 samples/sec Loss 5.4483 LearningRate 0.0793 Epoch: 10 Global Step: 106080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:12,826-Speed 5488.82 samples/sec Loss 5.4830 LearningRate 0.0793 Epoch: 10 Global Step: 106090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:20,375-Speed 5427.51 samples/sec Loss 5.5091 LearningRate 0.0793 Epoch: 10 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:27,839-Speed 5489.30 samples/sec Loss 5.4965 
LearningRate 0.0793 Epoch: 10 Global Step: 106110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:35,054-Speed 5679.39 samples/sec Loss 5.4489 LearningRate 0.0793 Epoch: 10 Global Step: 106120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:42,597-Speed 5431.71 samples/sec Loss 5.4778 LearningRate 0.0792 Epoch: 10 Global Step: 106130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:50,147-Speed 5425.84 samples/sec Loss 5.4635 LearningRate 0.0792 Epoch: 10 Global Step: 106140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:51:57,557-Speed 5529.17 samples/sec Loss 5.4921 LearningRate 0.0792 Epoch: 10 Global Step: 106150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:52:05,023-Speed 5487.24 samples/sec Loss 5.4393 LearningRate 0.0792 Epoch: 10 Global Step: 106160 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:52:12,162-Speed 5738.89 samples/sec Loss 5.4254 LearningRate 0.0792 Epoch: 10 Global Step: 106170 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:52:19,671-Speed 5456.29 samples/sec Loss 5.4472 LearningRate 0.0792 Epoch: 10 Global Step: 106180 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:52:27,126-Speed 5495.73 samples/sec Loss 5.4386 LearningRate 0.0791 Epoch: 10 Global Step: 106190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:52:34,643-Speed 5449.88 samples/sec Loss 5.4464 LearningRate 0.0791 Epoch: 10 Global Step: 106200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:52:42,173-Speed 5440.92 samples/sec Loss 5.4625 LearningRate 0.0791 Epoch: 10 Global Step: 106210 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:52:49,639-Speed 5487.98 samples/sec Loss 5.4927 LearningRate 0.0791 Epoch: 10 Global Step: 106220 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:52:57,100-Speed 5491.23 samples/sec Loss 5.4768 LearningRate 0.0791 Epoch: 
10 Global Step: 106230 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:04,575-Speed 5480.45 samples/sec Loss 5.4820 LearningRate 0.0791 Epoch: 10 Global Step: 106240 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:11,998-Speed 5519.17 samples/sec Loss 5.4732 LearningRate 0.0790 Epoch: 10 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:19,442-Speed 5504.06 samples/sec Loss 5.5514 LearningRate 0.0790 Epoch: 10 Global Step: 106260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:26,983-Speed 5432.46 samples/sec Loss 5.4769 LearningRate 0.0790 Epoch: 10 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:34,489-Speed 5458.27 samples/sec Loss 5.4062 LearningRate 0.0790 Epoch: 10 Global Step: 106280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:41,951-Speed 5490.41 samples/sec Loss 5.4200 LearningRate 0.0790 Epoch: 10 Global Step: 106290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:49,350-Speed 5537.32 samples/sec Loss 5.4700 LearningRate 0.0790 Epoch: 10 Global Step: 106300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:53:56,774-Speed 5518.29 samples/sec Loss 5.4402 LearningRate 0.0790 Epoch: 10 Global Step: 106310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:54:04,184-Speed 5528.78 samples/sec Loss 5.4776 LearningRate 0.0789 Epoch: 10 Global Step: 106320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:11,579-Speed 5539.97 samples/sec Loss 5.4487 LearningRate 0.0789 Epoch: 10 Global Step: 106330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:19,084-Speed 5459.03 samples/sec Loss 5.4346 LearningRate 0.0789 Epoch: 10 Global Step: 106340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:26,458-Speed 5555.37 samples/sec Loss 5.4871 LearningRate 0.0789 Epoch: 10 Global Step: 106350 Fp16 
Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:33,889-Speed 5513.65 samples/sec Loss 5.4531 LearningRate 0.0789 Epoch: 10 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:41,178-Speed 5620.37 samples/sec Loss 5.4854 LearningRate 0.0789 Epoch: 10 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:48,684-Speed 5458.85 samples/sec Loss 5.4502 LearningRate 0.0788 Epoch: 10 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:54:56,102-Speed 5522.50 samples/sec Loss 5.4986 LearningRate 0.0788 Epoch: 10 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:55:03,440-Speed 5583.19 samples/sec Loss 5.4629 LearningRate 0.0788 Epoch: 10 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:55:10,465-Speed 5831.81 samples/sec Loss 5.4762 LearningRate 0.0788 Epoch: 10 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:55:17,877-Speed 5527.73 samples/sec Loss 5.4461 LearningRate 0.0788 Epoch: 10 Global Step: 106420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:55:25,683-Speed 5248.53 samples/sec Loss 5.4205 LearningRate 0.0788 Epoch: 10 Global Step: 106430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:55:33,319-Speed 5364.78 samples/sec Loss 5.4479 LearningRate 0.0788 Epoch: 10 Global Step: 106440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:55:40,735-Speed 5524.33 samples/sec Loss 5.4379 LearningRate 0.0787 Epoch: 10 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:55:48,318-Speed 5402.86 samples/sec Loss 5.4680 LearningRate 0.0787 Epoch: 10 Global Step: 106460 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:55:55,820-Speed 5461.54 samples/sec Loss 5.4469 LearningRate 0.0787 Epoch: 10 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 
23 hours Training: 2022-01-08 18:56:03,315-Speed 5466.07 samples/sec Loss 5.4273 LearningRate 0.0787 Epoch: 10 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:10,760-Speed 5502.51 samples/sec Loss 5.4279 LearningRate 0.0787 Epoch: 10 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:18,221-Speed 5490.85 samples/sec Loss 5.4883 LearningRate 0.0787 Epoch: 10 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:25,696-Speed 5480.72 samples/sec Loss 5.4908 LearningRate 0.0786 Epoch: 10 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:33,224-Speed 5442.96 samples/sec Loss 5.5101 LearningRate 0.0786 Epoch: 10 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:40,679-Speed 5494.65 samples/sec Loss 5.4508 LearningRate 0.0786 Epoch: 10 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:48,157-Speed 5477.68 samples/sec Loss 5.4556 LearningRate 0.0786 Epoch: 10 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:56:55,686-Speed 5441.49 samples/sec Loss 5.4545 LearningRate 0.0786 Epoch: 10 Global Step: 106550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:57:03,399-Speed 5311.15 samples/sec Loss 5.4490 LearningRate 0.0786 Epoch: 10 Global Step: 106560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:57:19,005-Speed 2624.82 samples/sec Loss 5.4241 LearningRate 0.0786 Epoch: 10 Global Step: 106570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:57:26,399-Speed 5540.53 samples/sec Loss 5.4505 LearningRate 0.0785 Epoch: 10 Global Step: 106580 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:57:33,871-Speed 5482.84 samples/sec Loss 5.4367 LearningRate 0.0785 Epoch: 10 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 23 hours Training: 
2022-01-08 18:57:41,355-Speed 5473.69 samples/sec Loss 5.4846 LearningRate 0.0785 Epoch: 10 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:57:48,945-Speed 5397.54 samples/sec Loss 5.4764 LearningRate 0.0785 Epoch: 10 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:57:56,385-Speed 5506.61 samples/sec Loss 5.4401 LearningRate 0.0785 Epoch: 10 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:58:03,840-Speed 5494.58 samples/sec Loss 5.4064 LearningRate 0.0785 Epoch: 10 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:58:11,288-Speed 5500.26 samples/sec Loss 5.4462 LearningRate 0.0784 Epoch: 10 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:58:18,831-Speed 5431.32 samples/sec Loss 5.4794 LearningRate 0.0784 Epoch: 10 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:58:26,330-Speed 5462.79 samples/sec Loss 5.4341 LearningRate 0.0784 Epoch: 10 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:58:33,744-Speed 5525.79 samples/sec Loss 5.4512 LearningRate 0.0784 Epoch: 10 Global Step: 106670 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:58:41,217-Speed 5481.22 samples/sec Loss 5.4438 LearningRate 0.0784 Epoch: 10 Global Step: 106680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:58:48,647-Speed 5514.20 samples/sec Loss 5.4363 LearningRate 0.0784 Epoch: 10 Global Step: 106690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 18:58:56,078-Speed 5512.43 samples/sec Loss 5.4485 LearningRate 0.0783 Epoch: 10 Global Step: 106700 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:03,518-Speed 5506.24 samples/sec Loss 5.4645 LearningRate 0.0783 Epoch: 10 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:11,028-Speed 
5454.75 samples/sec Loss 5.4383 LearningRate 0.0783 Epoch: 10 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:18,537-Speed 5455.59 samples/sec Loss 5.4117 LearningRate 0.0783 Epoch: 10 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:26,047-Speed 5454.92 samples/sec Loss 5.4278 LearningRate 0.0783 Epoch: 10 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:33,471-Speed 5517.84 samples/sec Loss 5.4206 LearningRate 0.0783 Epoch: 10 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:40,886-Speed 5524.10 samples/sec Loss 5.4467 LearningRate 0.0783 Epoch: 10 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:48,409-Speed 5446.19 samples/sec Loss 5.4756 LearningRate 0.0782 Epoch: 10 Global Step: 106770 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 18:59:55,886-Speed 5478.17 samples/sec Loss 5.4535 LearningRate 0.0782 Epoch: 10 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:00:03,318-Speed 5512.35 samples/sec Loss 5.4313 LearningRate 0.0782 Epoch: 10 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:00:10,785-Speed 5486.33 samples/sec Loss 5.4337 LearningRate 0.0782 Epoch: 10 Global Step: 106800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:00:18,257-Speed 5482.41 samples/sec Loss 5.4509 LearningRate 0.0782 Epoch: 10 Global Step: 106810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:00:25,768-Speed 5453.95 samples/sec Loss 5.4783 LearningRate 0.0782 Epoch: 10 Global Step: 106820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:00:33,251-Speed 5474.71 samples/sec Loss 5.4210 LearningRate 0.0781 Epoch: 10 Global Step: 106830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:00:40,726-Speed 5480.36 samples/sec Loss 
5.4043 LearningRate 0.0781 Epoch: 10 Global Step: 106840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:00:48,160-Speed 5510.31 samples/sec Loss 5.4343 LearningRate 0.0781 Epoch: 10 Global Step: 106850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:00:55,677-Speed 5449.83 samples/sec Loss 5.4602 LearningRate 0.0781 Epoch: 10 Global Step: 106860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:01:03,154-Speed 5479.16 samples/sec Loss 5.4240 LearningRate 0.0781 Epoch: 10 Global Step: 106870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:01:10,626-Speed 5482.48 samples/sec Loss 5.4648 LearningRate 0.0781 Epoch: 10 Global Step: 106880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:01:18,116-Speed 5469.72 samples/sec Loss 5.4503 LearningRate 0.0781 Epoch: 10 Global Step: 106890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:01:25,541-Speed 5517.38 samples/sec Loss 5.3629 LearningRate 0.0780 Epoch: 10 Global Step: 106900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:01:33,039-Speed 5463.31 samples/sec Loss 5.4023 LearningRate 0.0780 Epoch: 10 Global Step: 106910 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:01:40,527-Speed 5471.19 samples/sec Loss 5.4404 LearningRate 0.0780 Epoch: 10 Global Step: 106920 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:01:47,986-Speed 5491.70 samples/sec Loss 5.3870 LearningRate 0.0780 Epoch: 10 Global Step: 106930 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:01:55,450-Speed 5488.97 samples/sec Loss 5.4066 LearningRate 0.0780 Epoch: 10 Global Step: 106940 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:02,877-Speed 5515.41 samples/sec Loss 5.4331 LearningRate 0.0780 Epoch: 10 Global Step: 106950 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:10,403-Speed 5443.23 samples/sec Loss 5.4404 LearningRate 0.0779 
Epoch: 10 Global Step: 106960 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:17,937-Speed 5437.67 samples/sec Loss 5.4421 LearningRate 0.0779 Epoch: 10 Global Step: 106970 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:25,426-Speed 5469.75 samples/sec Loss 5.3935 LearningRate 0.0779 Epoch: 10 Global Step: 106980 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:32,875-Speed 5499.45 samples/sec Loss 5.3764 LearningRate 0.0779 Epoch: 10 Global Step: 106990 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:40,375-Speed 5462.37 samples/sec Loss 5.4239 LearningRate 0.0779 Epoch: 10 Global Step: 107000 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:02:47,893-Speed 5448.68 samples/sec Loss 5.4014 LearningRate 0.0779 Epoch: 10 Global Step: 107010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:02:55,346-Speed 5496.84 samples/sec Loss 5.4768 LearningRate 0.0779 Epoch: 10 Global Step: 107020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:02,822-Speed 5479.56 samples/sec Loss 5.4058 LearningRate 0.0778 Epoch: 10 Global Step: 107030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:10,271-Speed 5499.72 samples/sec Loss 5.4541 LearningRate 0.0778 Epoch: 10 Global Step: 107040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:17,752-Speed 5475.55 samples/sec Loss 5.3659 LearningRate 0.0778 Epoch: 10 Global Step: 107050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:25,369-Speed 5378.03 samples/sec Loss 5.3846 LearningRate 0.0778 Epoch: 10 Global Step: 107060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:32,894-Speed 5444.57 samples/sec Loss 5.3929 LearningRate 0.0778 Epoch: 10 Global Step: 107070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:40,540-Speed 5357.77 samples/sec Loss 5.4146 LearningRate 0.0778 Epoch: 10 Global Step: 107080 
Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:48,091-Speed 5425.08 samples/sec Loss 5.4354 LearningRate 0.0777 Epoch: 10 Global Step: 107090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:03:55,617-Speed 5443.34 samples/sec Loss 5.4312 LearningRate 0.0777 Epoch: 10 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:04:03,113-Speed 5464.35 samples/sec Loss 5.4658 LearningRate 0.0777 Epoch: 10 Global Step: 107110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:04:10,639-Speed 5443.72 samples/sec Loss 5.4496 LearningRate 0.0777 Epoch: 10 Global Step: 107120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:04:18,106-Speed 5486.25 samples/sec Loss 5.3899 LearningRate 0.0777 Epoch: 10 Global Step: 107130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:04:25,621-Speed 5451.23 samples/sec Loss 5.4112 LearningRate 0.0777 Epoch: 10 Global Step: 107140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:04:33,206-Speed 5400.71 samples/sec Loss 5.3492 LearningRate 0.0776 Epoch: 10 Global Step: 107150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:04:40,740-Speed 5437.13 samples/sec Loss 5.3583 LearningRate 0.0776 Epoch: 10 Global Step: 107160 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:04:48,151-Speed 5527.84 samples/sec Loss 5.4487 LearningRate 0.0776 Epoch: 10 Global Step: 107170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:04:55,639-Speed 5470.79 samples/sec Loss 5.4532 LearningRate 0.0776 Epoch: 10 Global Step: 107180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:03,093-Speed 5495.67 samples/sec Loss 5.3635 LearningRate 0.0776 Epoch: 10 Global Step: 107190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:10,562-Speed 5484.38 samples/sec Loss 5.4330 LearningRate 0.0776 Epoch: 10 Global Step: 107200 Fp16 Grad Scale: 32768 
Required: 23 hours Training: 2022-01-08 19:05:18,045-Speed 5474.72 samples/sec Loss 5.3867 LearningRate 0.0776 Epoch: 10 Global Step: 107210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:25,537-Speed 5468.32 samples/sec Loss 5.4135 LearningRate 0.0775 Epoch: 10 Global Step: 107220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:33,057-Speed 5447.42 samples/sec Loss 5.3486 LearningRate 0.0775 Epoch: 10 Global Step: 107230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:40,462-Speed 5531.98 samples/sec Loss 5.3703 LearningRate 0.0775 Epoch: 10 Global Step: 107240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:47,937-Speed 5480.13 samples/sec Loss 5.4330 LearningRate 0.0775 Epoch: 10 Global Step: 107250 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:05:55,381-Speed 5503.31 samples/sec Loss 5.4159 LearningRate 0.0775 Epoch: 10 Global Step: 107260 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:06:02,794-Speed 5526.36 samples/sec Loss 5.4095 LearningRate 0.0775 Epoch: 10 Global Step: 107270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:06:10,306-Speed 5453.19 samples/sec Loss 5.4386 LearningRate 0.0774 Epoch: 10 Global Step: 107280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:06:17,750-Speed 5503.35 samples/sec Loss 5.3776 LearningRate 0.0774 Epoch: 10 Global Step: 107290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:06:25,211-Speed 5490.97 samples/sec Loss 5.4685 LearningRate 0.0774 Epoch: 10 Global Step: 107300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:06:32,807-Speed 5392.87 samples/sec Loss 5.4210 LearningRate 0.0774 Epoch: 10 Global Step: 107310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:06:40,370-Speed 5416.18 samples/sec Loss 5.4276 LearningRate 0.0774 Epoch: 10 Global Step: 107320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 
2022-01-08 19:06:47,918-Speed 5427.87 samples/sec Loss 5.4158 LearningRate 0.0774 Epoch: 10 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:06:55,476-Speed 5420.11 samples/sec Loss 5.4155 LearningRate 0.0774 Epoch: 10 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:03,008-Speed 5438.84 samples/sec Loss 5.3900 LearningRate 0.0773 Epoch: 10 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:10,583-Speed 5408.01 samples/sec Loss 5.3468 LearningRate 0.0773 Epoch: 10 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:18,107-Speed 5444.68 samples/sec Loss 5.4155 LearningRate 0.0773 Epoch: 10 Global Step: 107370 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:25,694-Speed 5399.43 samples/sec Loss 5.4121 LearningRate 0.0773 Epoch: 10 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:33,244-Speed 5425.44 samples/sec Loss 5.4246 LearningRate 0.0773 Epoch: 10 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:40,730-Speed 5472.05 samples/sec Loss 5.3701 LearningRate 0.0773 Epoch: 10 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:48,192-Speed 5489.95 samples/sec Loss 5.3497 LearningRate 0.0772 Epoch: 10 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:07:55,615-Speed 5518.86 samples/sec Loss 5.3809 LearningRate 0.0772 Epoch: 10 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:08:03,091-Speed 5479.69 samples/sec Loss 5.3936 LearningRate 0.0772 Epoch: 10 Global Step: 107430 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:08:10,677-Speed 5400.38 samples/sec Loss 5.3644 LearningRate 0.0772 Epoch: 10 Global Step: 107440 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:08:18,143-Speed 
5486.40 samples/sec Loss 5.3917 LearningRate 0.0772 Epoch: 10 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:08:25,610-Speed 5486.81 samples/sec Loss 5.3602 LearningRate 0.0772 Epoch: 10 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:08:33,140-Speed 5440.38 samples/sec Loss 5.3650 LearningRate 0.0772 Epoch: 10 Global Step: 107470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:08:40,640-Speed 5461.55 samples/sec Loss 5.3833 LearningRate 0.0771 Epoch: 10 Global Step: 107480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:08:48,173-Speed 5437.90 samples/sec Loss 5.4360 LearningRate 0.0771 Epoch: 10 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:08:55,727-Speed 5423.25 samples/sec Loss 5.3551 LearningRate 0.0771 Epoch: 10 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:09:03,395-Speed 5342.75 samples/sec Loss 5.3781 LearningRate 0.0771 Epoch: 10 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:09:10,901-Speed 5456.97 samples/sec Loss 5.3735 LearningRate 0.0771 Epoch: 10 Global Step: 107520 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:09:18,465-Speed 5415.97 samples/sec Loss 5.4340 LearningRate 0.0771 Epoch: 10 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:09:25,992-Speed 5442.95 samples/sec Loss 5.4024 LearningRate 0.0770 Epoch: 10 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:09:33,754-Speed 5277.62 samples/sec Loss 5.3942 LearningRate 0.0770 Epoch: 10 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:09:41,314-Speed 5417.96 samples/sec Loss 5.3349 LearningRate 0.0770 Epoch: 10 Global Step: 107560 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:09:48,859-Speed 5430.08 samples/sec Loss 
5.3481 LearningRate 0.0770 Epoch: 10 Global Step: 107570 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:09:56,340-Speed 5475.57 samples/sec Loss 5.3956 LearningRate 0.0770 Epoch: 10 Global Step: 107580 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:03,798-Speed 5493.22 samples/sec Loss 5.3608 LearningRate 0.0770 Epoch: 10 Global Step: 107590 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:11,299-Speed 5461.36 samples/sec Loss 5.3564 LearningRate 0.0770 Epoch: 10 Global Step: 107600 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:18,799-Speed 5461.88 samples/sec Loss 5.3899 LearningRate 0.0769 Epoch: 10 Global Step: 107610 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:26,351-Speed 5423.94 samples/sec Loss 5.3722 LearningRate 0.0769 Epoch: 10 Global Step: 107620 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:33,859-Speed 5456.65 samples/sec Loss 5.3630 LearningRate 0.0769 Epoch: 10 Global Step: 107630 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:41,340-Speed 5475.88 samples/sec Loss 5.3609 LearningRate 0.0769 Epoch: 10 Global Step: 107640 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:48,819-Speed 5477.26 samples/sec Loss 5.3817 LearningRate 0.0769 Epoch: 10 Global Step: 107650 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:10:56,342-Speed 5445.40 samples/sec Loss 5.3209 LearningRate 0.0769 Epoch: 10 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:03,794-Speed 5497.76 samples/sec Loss 5.3009 LearningRate 0.0768 Epoch: 10 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:11,238-Speed 5502.65 samples/sec Loss 5.3954 LearningRate 0.0768 Epoch: 10 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:18,708-Speed 5484.55 samples/sec Loss 5.3435 LearningRate 0.0768 
Epoch: 10 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:26,168-Speed 5490.74 samples/sec Loss 5.3606 LearningRate 0.0768 Epoch: 10 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:33,605-Speed 5508.87 samples/sec Loss 5.4451 LearningRate 0.0768 Epoch: 10 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:40,997-Speed 5541.70 samples/sec Loss 5.4200 LearningRate 0.0768 Epoch: 10 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:48,525-Speed 5441.77 samples/sec Loss 5.3649 LearningRate 0.0768 Epoch: 10 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:11:55,929-Speed 5532.60 samples/sec Loss 5.3790 LearningRate 0.0767 Epoch: 10 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:12:03,455-Speed 5443.30 samples/sec Loss 5.3720 LearningRate 0.0767 Epoch: 10 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:12:10,928-Speed 5482.35 samples/sec Loss 5.4046 LearningRate 0.0767 Epoch: 10 Global Step: 107760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:12:18,407-Speed 5476.75 samples/sec Loss 5.3797 LearningRate 0.0767 Epoch: 10 Global Step: 107770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:12:25,914-Speed 5457.60 samples/sec Loss 5.3784 LearningRate 0.0767 Epoch: 10 Global Step: 107780 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:12:33,342-Speed 5514.75 samples/sec Loss 5.3790 LearningRate 0.0767 Epoch: 10 Global Step: 107790 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:12:40,789-Speed 5500.95 samples/sec Loss 5.3614 LearningRate 0.0766 Epoch: 10 Global Step: 107800 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:12:48,360-Speed 5410.64 samples/sec Loss 5.3468 LearningRate 0.0766 Epoch: 10 Global Step: 107810 
Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:12:55,793-Speed 5511.39 samples/sec Loss 5.4119 LearningRate 0.0766 Epoch: 10 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:13:03,310-Speed 5449.88 samples/sec Loss 5.3751 LearningRate 0.0766 Epoch: 10 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:13:10,786-Speed 5479.67 samples/sec Loss 5.3293 LearningRate 0.0766 Epoch: 10 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:13:18,308-Speed 5446.13 samples/sec Loss 5.3463 LearningRate 0.0766 Epoch: 10 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:13:25,773-Speed 5488.02 samples/sec Loss 5.3349 LearningRate 0.0766 Epoch: 10 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:13:33,235-Speed 5489.74 samples/sec Loss 5.3661 LearningRate 0.0765 Epoch: 10 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:13:40,733-Speed 5463.48 samples/sec Loss 5.3227 LearningRate 0.0765 Epoch: 10 Global Step: 107880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:13:48,239-Speed 5457.66 samples/sec Loss 5.4020 LearningRate 0.0765 Epoch: 10 Global Step: 107890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:13:55,746-Speed 5456.79 samples/sec Loss 5.3358 LearningRate 0.0765 Epoch: 10 Global Step: 107900 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:14:03,232-Speed 5472.78 samples/sec Loss 5.3645 LearningRate 0.0765 Epoch: 10 Global Step: 107910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:14:10,778-Speed 5428.71 samples/sec Loss 5.2709 LearningRate 0.0765 Epoch: 10 Global Step: 107920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:14:18,322-Speed 5430.47 samples/sec Loss 5.3379 LearningRate 0.0764 Epoch: 10 Global Step: 107930 Fp16 Grad Scale: 65536 
Required: 23 hours Training: 2022-01-08 19:14:25,902-Speed 5404.26 samples/sec Loss 5.3289 LearningRate 0.0764 Epoch: 10 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:14:33,540-Speed 5363.05 samples/sec Loss 5.3918 LearningRate 0.0764 Epoch: 10 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:14:41,100-Speed 5419.25 samples/sec Loss 5.3740 LearningRate 0.0764 Epoch: 10 Global Step: 107960 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:14:48,742-Speed 5360.63 samples/sec Loss 5.3083 LearningRate 0.0764 Epoch: 10 Global Step: 107970 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:14:56,180-Speed 5507.50 samples/sec Loss 5.3475 LearningRate 0.0764 Epoch: 10 Global Step: 107980 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:15:03,694-Speed 5451.59 samples/sec Loss 5.3314 LearningRate 0.0764 Epoch: 10 Global Step: 107990 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:15:11,115-Speed 5520.38 samples/sec Loss 5.3657 LearningRate 0.0763 Epoch: 10 Global Step: 108000 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:15:55,173-[lfw][108000]XNorm: 21.434070 Training: 2022-01-08 19:15:55,174-[lfw][108000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-08 19:15:55,175-[lfw][108000]Accuracy-Highest: 0.99817 Training: 2022-01-08 19:16:46,395-[cfp_fp][108000]XNorm: 19.724399 Training: 2022-01-08 19:16:46,396-[cfp_fp][108000]Accuracy-Flip: 0.98900+-0.00507 Training: 2022-01-08 19:16:46,397-[cfp_fp][108000]Accuracy-Highest: 0.99043 Training: 2022-01-08 19:17:30,557-[agedb_30][108000]XNorm: 21.423153 Training: 2022-01-08 19:17:30,558-[agedb_30][108000]Accuracy-Flip: 0.97867+-0.00557 Training: 2022-01-08 19:17:30,559-[agedb_30][108000]Accuracy-Highest: 0.97917 Training: 2022-01-08 19:17:37,783-Speed 279.27 samples/sec Loss 5.4095 LearningRate 0.0763 Epoch: 10 Global Step: 108010 Fp16 Grad Scale: 32768 Required: 23 
hours Training: 2022-01-08 19:17:45,179-Speed 5540.36 samples/sec Loss 5.4122 LearningRate 0.0763 Epoch: 10 Global Step: 108020 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:17:52,638-Speed 5491.64 samples/sec Loss 5.4180 LearningRate 0.0763 Epoch: 10 Global Step: 108030 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:18:00,059-Speed 5521.19 samples/sec Loss 5.3817 LearningRate 0.0763 Epoch: 10 Global Step: 108040 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:18:07,564-Speed 5459.19 samples/sec Loss 5.4186 LearningRate 0.0763 Epoch: 10 Global Step: 108050 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:18:15,141-Speed 5406.82 samples/sec Loss 5.3511 LearningRate 0.0762 Epoch: 10 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:18:22,579-Speed 5507.13 samples/sec Loss 5.3526 LearningRate 0.0762 Epoch: 10 Global Step: 108070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:18:29,983-Speed 5533.03 samples/sec Loss 5.3607 LearningRate 0.0762 Epoch: 10 Global Step: 108080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:18:37,549-Speed 5415.11 samples/sec Loss 5.3209 LearningRate 0.0762 Epoch: 10 Global Step: 108090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:18:45,000-Speed 5497.26 samples/sec Loss 5.3781 LearningRate 0.0762 Epoch: 10 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:18:52,526-Speed 5443.34 samples/sec Loss 5.3486 LearningRate 0.0762 Epoch: 10 Global Step: 108110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:19:00,034-Speed 5456.84 samples/sec Loss 5.3494 LearningRate 0.0762 Epoch: 10 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:19:07,596-Speed 5417.13 samples/sec Loss 5.3043 LearningRate 0.0761 Epoch: 10 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 
19:19:15,114-Speed 5448.61 samples/sec Loss 5.4219 LearningRate 0.0761 Epoch: 10 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:19:22,658-Speed 5430.49 samples/sec Loss 5.3793 LearningRate 0.0761 Epoch: 10 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:19:30,154-Speed 5464.80 samples/sec Loss 5.3417 LearningRate 0.0761 Epoch: 10 Global Step: 108160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:19:37,941-Speed 5260.82 samples/sec Loss 5.3272 LearningRate 0.0761 Epoch: 10 Global Step: 108170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:19:45,421-Speed 5476.28 samples/sec Loss 5.3395 LearningRate 0.0761 Epoch: 10 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:19:52,899-Speed 5478.38 samples/sec Loss 5.3283 LearningRate 0.0760 Epoch: 10 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:20:00,359-Speed 5491.67 samples/sec Loss 5.3691 LearningRate 0.0760 Epoch: 10 Global Step: 108200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:07,799-Speed 5506.02 samples/sec Loss 5.3747 LearningRate 0.0760 Epoch: 10 Global Step: 108210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:15,267-Speed 5485.20 samples/sec Loss 5.3385 LearningRate 0.0760 Epoch: 10 Global Step: 108220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:22,782-Speed 5451.00 samples/sec Loss 5.3021 LearningRate 0.0760 Epoch: 10 Global Step: 108230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:30,245-Speed 5488.92 samples/sec Loss 5.3634 LearningRate 0.0760 Epoch: 10 Global Step: 108240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:37,681-Speed 5509.84 samples/sec Loss 5.3582 LearningRate 0.0760 Epoch: 10 Global Step: 108250 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:45,276-Speed 5393.41 
samples/sec Loss 5.4142 LearningRate 0.0759 Epoch: 10 Global Step: 108260 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:20:52,722-Speed 5501.56 samples/sec Loss 5.4024 LearningRate 0.0759 Epoch: 10 Global Step: 108270 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:00,347-Speed 5373.40 samples/sec Loss 5.3392 LearningRate 0.0759 Epoch: 10 Global Step: 108280 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:07,804-Speed 5493.88 samples/sec Loss 5.3399 LearningRate 0.0759 Epoch: 10 Global Step: 108290 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:15,248-Speed 5503.24 samples/sec Loss 5.3671 LearningRate 0.0759 Epoch: 10 Global Step: 108300 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:22,863-Speed 5379.12 samples/sec Loss 5.2822 LearningRate 0.0759 Epoch: 10 Global Step: 108310 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:30,554-Speed 5326.59 samples/sec Loss 5.3376 LearningRate 0.0758 Epoch: 10 Global Step: 108320 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:38,085-Speed 5439.34 samples/sec Loss 5.3327 LearningRate 0.0758 Epoch: 10 Global Step: 108330 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:45,603-Speed 5448.80 samples/sec Loss 5.3252 LearningRate 0.0758 Epoch: 10 Global Step: 108340 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:21:53,185-Speed 5402.81 samples/sec Loss 5.3701 LearningRate 0.0758 Epoch: 10 Global Step: 108350 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:22:00,576-Speed 5543.09 samples/sec Loss 5.3435 LearningRate 0.0758 Epoch: 10 Global Step: 108360 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:22:08,008-Speed 5512.12 samples/sec Loss 5.3543 LearningRate 0.0758 Epoch: 10 Global Step: 108370 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:22:15,544-Speed 5438.36 samples/sec Loss 5.3168 
LearningRate 0.0758 Epoch: 10 Global Step: 108380 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:22:22,953-Speed 5528.67 samples/sec Loss 5.3810 LearningRate 0.0757 Epoch: 10 Global Step: 108390 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:22:30,405-Speed 5497.28 samples/sec Loss 5.3531 LearningRate 0.0757 Epoch: 10 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:22:37,771-Speed 5561.08 samples/sec Loss 5.3720 LearningRate 0.0757 Epoch: 10 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:22:45,322-Speed 5425.85 samples/sec Loss 5.3090 LearningRate 0.0757 Epoch: 10 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:22:52,871-Speed 5426.33 samples/sec Loss 5.3384 LearningRate 0.0757 Epoch: 10 Global Step: 108430 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:00,311-Speed 5505.79 samples/sec Loss 5.3481 LearningRate 0.0757 Epoch: 10 Global Step: 108440 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:07,908-Speed 5392.94 samples/sec Loss 5.3932 LearningRate 0.0756 Epoch: 10 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:15,400-Speed 5467.42 samples/sec Loss 5.3329 LearningRate 0.0756 Epoch: 10 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:22,923-Speed 5444.95 samples/sec Loss 5.3572 LearningRate 0.0756 Epoch: 10 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:30,553-Speed 5369.57 samples/sec Loss 5.3029 LearningRate 0.0756 Epoch: 10 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:38,050-Speed 5464.50 samples/sec Loss 5.3230 LearningRate 0.0756 Epoch: 10 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:23:45,533-Speed 5474.70 samples/sec Loss 5.3153 LearningRate 0.0756 Epoch: 10 
Global Step: 108500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:23:53,154-Speed 5374.87 samples/sec Loss 5.3352 LearningRate 0.0756 Epoch: 10 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:24:00,701-Speed 5428.49 samples/sec Loss 5.3503 LearningRate 0.0755 Epoch: 10 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:24:08,146-Speed 5502.16 samples/sec Loss 5.3913 LearningRate 0.0755 Epoch: 10 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:24:15,657-Speed 5453.75 samples/sec Loss 5.3357 LearningRate 0.0755 Epoch: 10 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:24:23,267-Speed 5383.24 samples/sec Loss 5.3694 LearningRate 0.0755 Epoch: 10 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:24:30,682-Speed 5524.15 samples/sec Loss 5.3657 LearningRate 0.0755 Epoch: 10 Global Step: 108560 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:24:38,186-Speed 5459.55 samples/sec Loss 5.3485 LearningRate 0.0755 Epoch: 10 Global Step: 108570 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:24:45,856-Speed 5340.77 samples/sec Loss 5.3582 LearningRate 0.0754 Epoch: 10 Global Step: 108580 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:24:53,383-Speed 5442.63 samples/sec Loss 5.3111 LearningRate 0.0754 Epoch: 10 Global Step: 108590 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:00,921-Speed 5433.95 samples/sec Loss 5.2937 LearningRate 0.0754 Epoch: 10 Global Step: 108600 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:08,363-Speed 5504.60 samples/sec Loss 5.2796 LearningRate 0.0754 Epoch: 10 Global Step: 108610 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:16,387-Speed 5105.67 samples/sec Loss 5.3194 LearningRate 0.0754 Epoch: 10 Global Step: 108620 Fp16 Grad 
Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:23,838-Speed 5497.53 samples/sec Loss 5.3384 LearningRate 0.0754 Epoch: 10 Global Step: 108630 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:31,366-Speed 5442.13 samples/sec Loss 5.3215 LearningRate 0.0754 Epoch: 10 Global Step: 108640 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:38,843-Speed 5478.73 samples/sec Loss 5.3441 LearningRate 0.0753 Epoch: 10 Global Step: 108650 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:25:46,302-Speed 5491.82 samples/sec Loss 5.3153 LearningRate 0.0753 Epoch: 10 Global Step: 108660 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:25:53,855-Speed 5423.72 samples/sec Loss 5.3712 LearningRate 0.0753 Epoch: 10 Global Step: 108670 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:01,355-Speed 5462.45 samples/sec Loss 5.3406 LearningRate 0.0753 Epoch: 10 Global Step: 108680 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:08,841-Speed 5472.26 samples/sec Loss 5.3425 LearningRate 0.0753 Epoch: 10 Global Step: 108690 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:16,352-Speed 5453.83 samples/sec Loss 5.3844 LearningRate 0.0753 Epoch: 10 Global Step: 108700 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:23,890-Speed 5434.48 samples/sec Loss 5.3207 LearningRate 0.0753 Epoch: 10 Global Step: 108710 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:31,389-Speed 5463.23 samples/sec Loss 5.3110 LearningRate 0.0752 Epoch: 10 Global Step: 108720 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:38,879-Speed 5469.30 samples/sec Loss 5.3431 LearningRate 0.0752 Epoch: 10 Global Step: 108730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:26:46,376-Speed 5464.29 samples/sec Loss 5.2860 LearningRate 0.0752 Epoch: 10 Global Step: 108740 Fp16 Grad Scale: 65536 Required: 23 hours 
Training: 2022-01-08 19:26:53,996-Speed 5375.94 samples/sec Loss 5.3118 LearningRate 0.0752 Epoch: 10 Global Step: 108750 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:01,463-Speed 5485.83 samples/sec Loss 5.2689 LearningRate 0.0752 Epoch: 10 Global Step: 108760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:27:08,934-Speed 5483.77 samples/sec Loss 5.3435 LearningRate 0.0752 Epoch: 10 Global Step: 108770 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:16,484-Speed 5426.01 samples/sec Loss 5.2759 LearningRate 0.0751 Epoch: 10 Global Step: 108780 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:24,020-Speed 5435.55 samples/sec Loss 5.3510 LearningRate 0.0751 Epoch: 10 Global Step: 108790 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:31,502-Speed 5475.99 samples/sec Loss 5.3368 LearningRate 0.0751 Epoch: 10 Global Step: 108800 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:39,071-Speed 5411.63 samples/sec Loss 5.3153 LearningRate 0.0751 Epoch: 10 Global Step: 108810 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:46,497-Speed 5516.57 samples/sec Loss 5.2879 LearningRate 0.0751 Epoch: 10 Global Step: 108820 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:27:53,961-Speed 5488.81 samples/sec Loss 5.3432 LearningRate 0.0751 Epoch: 10 Global Step: 108830 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:28:01,483-Speed 5446.43 samples/sec Loss 5.3130 LearningRate 0.0751 Epoch: 10 Global Step: 108840 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:28:08,863-Speed 5550.66 samples/sec Loss 5.3151 LearningRate 0.0750 Epoch: 10 Global Step: 108850 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:28:16,275-Speed 5526.30 samples/sec Loss 5.2989 LearningRate 0.0750 Epoch: 10 Global Step: 108860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 
19:28:23,781-Speed 5458.20 samples/sec Loss 5.3176 LearningRate 0.0750 Epoch: 10 Global Step: 108870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:28:31,273-Speed 5468.06 samples/sec Loss 5.3410 LearningRate 0.0750 Epoch: 10 Global Step: 108880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:28:38,666-Speed 5541.17 samples/sec Loss 5.3255 LearningRate 0.0750 Epoch: 10 Global Step: 108890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:28:46,105-Speed 5506.87 samples/sec Loss 5.3371 LearningRate 0.0750 Epoch: 10 Global Step: 108900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:28:53,544-Speed 5506.62 samples/sec Loss 5.3359 LearningRate 0.0749 Epoch: 10 Global Step: 108910 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:01,038-Speed 5466.39 samples/sec Loss 5.2829 LearningRate 0.0749 Epoch: 10 Global Step: 108920 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:08,569-Speed 5439.90 samples/sec Loss 5.2997 LearningRate 0.0749 Epoch: 10 Global Step: 108930 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:16,119-Speed 5425.68 samples/sec Loss 5.4006 LearningRate 0.0749 Epoch: 10 Global Step: 108940 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:23,632-Speed 5452.58 samples/sec Loss 5.3253 LearningRate 0.0749 Epoch: 10 Global Step: 108950 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:31,084-Speed 5497.96 samples/sec Loss 5.3100 LearningRate 0.0749 Epoch: 10 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:38,568-Speed 5473.68 samples/sec Loss 5.2993 LearningRate 0.0749 Epoch: 10 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:46,066-Speed 5463.38 samples/sec Loss 5.2936 LearningRate 0.0748 Epoch: 10 Global Step: 108980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:29:53,520-Speed 5495.63 
samples/sec Loss 5.2913 LearningRate 0.0748 Epoch: 10 Global Step: 108990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:30:01,084-Speed 5416.26 samples/sec Loss 5.3078 LearningRate 0.0748 Epoch: 10 Global Step: 109000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:30:08,539-Speed 5494.90 samples/sec Loss 5.4003 LearningRate 0.0748 Epoch: 10 Global Step: 109010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-01-08 19:30:16,044-Speed 5458.50 samples/sec Loss 5.3459 LearningRate 0.0748 Epoch: 10 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:30:23,472-Speed 5515.35 samples/sec Loss 5.3964 LearningRate 0.0748 Epoch: 10 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:30:30,930-Speed 5492.49 samples/sec Loss 5.3246 LearningRate 0.0747 Epoch: 10 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:30:38,377-Speed 5501.11 samples/sec Loss 5.3513 LearningRate 0.0747 Epoch: 10 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:30:45,806-Speed 5514.11 samples/sec Loss 5.2851 LearningRate 0.0747 Epoch: 10 Global Step: 109060 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:30:53,222-Speed 5523.93 samples/sec Loss 5.2606 LearningRate 0.0747 Epoch: 10 Global Step: 109070 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:00,699-Speed 5479.45 samples/sec Loss 5.2494 LearningRate 0.0747 Epoch: 10 Global Step: 109080 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:08,155-Speed 5494.43 samples/sec Loss 5.2790 LearningRate 0.0747 Epoch: 10 Global Step: 109090 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:15,594-Speed 5506.62 samples/sec Loss 5.2405 LearningRate 0.0747 Epoch: 10 Global Step: 109100 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:23,020-Speed 5516.18 samples/sec Loss 5.2399 
LearningRate 0.0746 Epoch: 10 Global Step: 109110 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:30,442-Speed 5519.46 samples/sec Loss 5.2422 LearningRate 0.0746 Epoch: 10 Global Step: 109120 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:37,871-Speed 5514.37 samples/sec Loss 5.2315 LearningRate 0.0746 Epoch: 10 Global Step: 109130 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:45,350-Speed 5477.47 samples/sec Loss 5.2643 LearningRate 0.0746 Epoch: 10 Global Step: 109140 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:31:52,776-Speed 5516.30 samples/sec Loss 5.2619 LearningRate 0.0746 Epoch: 10 Global Step: 109150 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:32:00,392-Speed 5379.25 samples/sec Loss 5.2730 LearningRate 0.0746 Epoch: 10 Global Step: 109160 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:32:07,834-Speed 5505.18 samples/sec Loss 5.3102 LearningRate 0.0746 Epoch: 10 Global Step: 109170 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-01-08 19:32:15,241-Speed 5530.29 samples/sec Loss 5.3070 LearningRate 0.0745 Epoch: 10 Global Step: 109180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-08 19:32:22,720-Speed 5477.70 samples/sec Loss 5.3066 LearningRate 0.0745 Epoch: 10 Global Step: 109190 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:32:30,141-Speed 5520.17 samples/sec Loss 5.2949 LearningRate 0.0745 Epoch: 10 Global Step: 109200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:32:37,590-Speed 5499.10 samples/sec Loss 5.3239 LearningRate 0.0745 Epoch: 10 Global Step: 109210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:32:45,136-Speed 5429.54 samples/sec Loss 5.2739 LearningRate 0.0745 Epoch: 10 Global Step: 109220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:32:52,621-Speed 5472.56 samples/sec Loss 5.3262 LearningRate 0.0745 Epoch: 10 
Global Step: 109230 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:33:00,104-Speed 5474.14 samples/sec Loss 5.3630 LearningRate 0.0744 Epoch: 10 Global Step: 109240 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:33:07,543-Speed 5507.51 samples/sec Loss 5.3136 LearningRate 0.0744 Epoch: 10 Global Step: 109250 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:33:14,960-Speed 5523.55 samples/sec Loss 5.3017 LearningRate 0.0744 Epoch: 10 Global Step: 109260 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:33:22,384-Speed 5517.42 samples/sec Loss 5.2994 LearningRate 0.0744 Epoch: 10 Global Step: 109270 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:33:29,809-Speed 5517.32 samples/sec Loss 5.3266 LearningRate 0.0744 Epoch: 10 Global Step: 109280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:33:37,238-Speed 5514.24 samples/sec Loss 5.3516 LearningRate 0.0744 Epoch: 10 Global Step: 109290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:33:44,655-Speed 5523.84 samples/sec Loss 5.2775 LearningRate 0.0744 Epoch: 10 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:33:52,093-Speed 5507.38 samples/sec Loss 5.2920 LearningRate 0.0743 Epoch: 10 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:33:59,541-Speed 5500.33 samples/sec Loss 5.2882 LearningRate 0.0743 Epoch: 10 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:34:06,941-Speed 5535.99 samples/sec Loss 5.3032 LearningRate 0.0743 Epoch: 10 Global Step: 109330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:34:14,396-Speed 5494.80 samples/sec Loss 5.2713 LearningRate 0.0743 Epoch: 10 Global Step: 109340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:34:21,860-Speed 5488.38 samples/sec Loss 5.3524 LearningRate 0.0743 Epoch: 10 Global Step: 109350 Fp16 Grad 
Scale: 65536 Required: 22 hours Training: 2022-01-08 19:34:29,310-Speed 5498.63 samples/sec Loss 5.3518 LearningRate 0.0743 Epoch: 10 Global Step: 109360 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:34:36,725-Speed 5525.00 samples/sec Loss 5.2844 LearningRate 0.0742 Epoch: 10 Global Step: 109370 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:34:44,156-Speed 5512.69 samples/sec Loss 5.2512 LearningRate 0.0742 Epoch: 10 Global Step: 109380 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:34:51,604-Speed 5499.90 samples/sec Loss 5.3212 LearningRate 0.0742 Epoch: 10 Global Step: 109390 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:34:59,044-Speed 5506.27 samples/sec Loss 5.3354 LearningRate 0.0742 Epoch: 10 Global Step: 109400 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:35:06,511-Speed 5487.38 samples/sec Loss 5.2737 LearningRate 0.0742 Epoch: 10 Global Step: 109410 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:35:14,043-Speed 5438.81 samples/sec Loss 5.2822 LearningRate 0.0742 Epoch: 10 Global Step: 109420 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:35:21,492-Speed 5498.93 samples/sec Loss 5.2704 LearningRate 0.0742 Epoch: 10 Global Step: 109430 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:35:28,908-Speed 5523.99 samples/sec Loss 5.2524 LearningRate 0.0741 Epoch: 10 Global Step: 109440 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:35:36,326-Speed 5523.05 samples/sec Loss 5.3026 LearningRate 0.0741 Epoch: 10 Global Step: 109450 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:35:43,753-Speed 5515.31 samples/sec Loss 5.3056 LearningRate 0.0741 Epoch: 10 Global Step: 109460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:35:51,159-Speed 5531.18 samples/sec Loss 5.2992 LearningRate 0.0741 Epoch: 10 Global Step: 109470 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-01-08 19:35:58,673-Speed 5452.17 samples/sec Loss 5.2731 LearningRate 0.0741 Epoch: 10 Global Step: 109480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:06,172-Speed 5462.45 samples/sec Loss 5.2335 LearningRate 0.0741 Epoch: 10 Global Step: 109490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:13,624-Speed 5497.44 samples/sec Loss 5.3027 LearningRate 0.0741 Epoch: 10 Global Step: 109500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:21,058-Speed 5510.21 samples/sec Loss 5.3025 LearningRate 0.0740 Epoch: 10 Global Step: 109510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:28,568-Speed 5455.16 samples/sec Loss 5.3116 LearningRate 0.0740 Epoch: 10 Global Step: 109520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:36,018-Speed 5498.63 samples/sec Loss 5.2977 LearningRate 0.0740 Epoch: 10 Global Step: 109530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:43,462-Speed 5503.49 samples/sec Loss 5.2613 LearningRate 0.0740 Epoch: 10 Global Step: 109540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:50,864-Speed 5534.59 samples/sec Loss 5.3089 LearningRate 0.0740 Epoch: 10 Global Step: 109550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:36:58,287-Speed 5518.69 samples/sec Loss 5.2750 LearningRate 0.0740 Epoch: 10 Global Step: 109560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:37:05,825-Speed 5433.96 samples/sec Loss 5.2778 LearningRate 0.0739 Epoch: 10 Global Step: 109570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:37:13,275-Speed 5498.93 samples/sec Loss 5.2840 LearningRate 0.0739 Epoch: 10 Global Step: 109580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:37:20,723-Speed 5500.09 samples/sec Loss 5.3407 LearningRate 0.0739 Epoch: 10 Global Step: 109590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 
19:37:28,152-Speed 5514.38 samples/sec Loss 5.2284 LearningRate 0.0739 Epoch: 10 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:37:35,670-Speed 5448.63 samples/sec Loss 5.2552 LearningRate 0.0739 Epoch: 10 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:37:43,159-Speed 5471.02 samples/sec Loss 5.2699 LearningRate 0.0739 Epoch: 10 Global Step: 109620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:37:50,685-Speed 5442.96 samples/sec Loss 5.3026 LearningRate 0.0739 Epoch: 10 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:37:58,153-Speed 5485.06 samples/sec Loss 5.3081 LearningRate 0.0738 Epoch: 10 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:38:05,590-Speed 5508.77 samples/sec Loss 5.2435 LearningRate 0.0738 Epoch: 10 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:38:13,029-Speed 5507.03 samples/sec Loss 5.2528 LearningRate 0.0738 Epoch: 10 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:38:20,444-Speed 5524.44 samples/sec Loss 5.2653 LearningRate 0.0738 Epoch: 10 Global Step: 109670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:38:27,890-Speed 5501.85 samples/sec Loss 5.2234 LearningRate 0.0738 Epoch: 10 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:38:35,331-Speed 5504.57 samples/sec Loss 5.2898 LearningRate 0.0738 Epoch: 10 Global Step: 109690 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:38:42,788-Speed 5494.15 samples/sec Loss 5.2733 LearningRate 0.0737 Epoch: 10 Global Step: 109700 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:38:50,284-Speed 5464.61 samples/sec Loss 5.3489 LearningRate 0.0737 Epoch: 10 Global Step: 109710 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:38:57,758-Speed 5481.30 
samples/sec Loss 5.2969 LearningRate 0.0737 Epoch: 10 Global Step: 109720 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:05,289-Speed 5439.14 samples/sec Loss 5.2748 LearningRate 0.0737 Epoch: 10 Global Step: 109730 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:12,763-Speed 5481.22 samples/sec Loss 5.2576 LearningRate 0.0737 Epoch: 10 Global Step: 109740 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:20,346-Speed 5402.54 samples/sec Loss 5.3213 LearningRate 0.0737 Epoch: 10 Global Step: 109750 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:27,980-Speed 5366.45 samples/sec Loss 5.2253 LearningRate 0.0737 Epoch: 10 Global Step: 109760 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:35,542-Speed 5416.45 samples/sec Loss 5.2822 LearningRate 0.0736 Epoch: 10 Global Step: 109770 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:43,094-Speed 5424.69 samples/sec Loss 5.2897 LearningRate 0.0736 Epoch: 10 Global Step: 109780 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:39:50,561-Speed 5486.46 samples/sec Loss 5.2929 LearningRate 0.0736 Epoch: 10 Global Step: 109790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:39:57,986-Speed 5517.26 samples/sec Loss 5.2620 LearningRate 0.0736 Epoch: 10 Global Step: 109800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:05,447-Speed 5490.72 samples/sec Loss 5.2573 LearningRate 0.0736 Epoch: 10 Global Step: 109810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:12,918-Speed 5482.76 samples/sec Loss 5.2658 LearningRate 0.0736 Epoch: 10 Global Step: 109820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:20,378-Speed 5491.40 samples/sec Loss 5.3077 LearningRate 0.0736 Epoch: 10 Global Step: 109830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:27,852-Speed 5481.34 samples/sec Loss 5.3099 
LearningRate 0.0735 Epoch: 10 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:35,269-Speed 5523.34 samples/sec Loss 5.2751 LearningRate 0.0735 Epoch: 10 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:42,709-Speed 5505.82 samples/sec Loss 5.2295 LearningRate 0.0735 Epoch: 10 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:40:50,128-Speed 5521.68 samples/sec Loss 5.2378 LearningRate 0.0735 Epoch: 10 Global Step: 109870 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:40:57,566-Speed 5507.76 samples/sec Loss 5.2736 LearningRate 0.0735 Epoch: 10 Global Step: 109880 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:05,037-Speed 5482.70 samples/sec Loss 5.2576 LearningRate 0.0735 Epoch: 10 Global Step: 109890 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:12,548-Speed 5453.71 samples/sec Loss 5.2755 LearningRate 0.0734 Epoch: 10 Global Step: 109900 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:20,025-Speed 5479.09 samples/sec Loss 5.3048 LearningRate 0.0734 Epoch: 10 Global Step: 109910 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:27,545-Speed 5447.74 samples/sec Loss 5.2394 LearningRate 0.0734 Epoch: 10 Global Step: 109920 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:35,036-Speed 5468.52 samples/sec Loss 5.3107 LearningRate 0.0734 Epoch: 10 Global Step: 109930 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:42,491-Speed 5495.21 samples/sec Loss 5.2673 LearningRate 0.0734 Epoch: 10 Global Step: 109940 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:50,005-Speed 5451.97 samples/sec Loss 5.2102 LearningRate 0.0734 Epoch: 10 Global Step: 109950 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:41:57,519-Speed 5451.41 samples/sec Loss 5.2540 LearningRate 0.0734 Epoch: 10 
Global Step: 109960 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:42:05,227-Speed 5314.56 samples/sec Loss 5.2526 LearningRate 0.0733 Epoch: 10 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:42:12,703-Speed 5479.22 samples/sec Loss 5.3202 LearningRate 0.0733 Epoch: 10 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:42:20,138-Speed 5509.89 samples/sec Loss 5.2706 LearningRate 0.0733 Epoch: 10 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:42:27,554-Speed 5523.99 samples/sec Loss 5.2781 LearningRate 0.0733 Epoch: 10 Global Step: 110000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:43:11,251-[lfw][110000]XNorm: 23.009356 Training: 2022-01-08 19:43:11,252-[lfw][110000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-08 19:43:11,253-[lfw][110000]Accuracy-Highest: 0.99817 Training: 2022-01-08 19:44:03,337-[cfp_fp][110000]XNorm: 21.193388 Training: 2022-01-08 19:44:03,337-[cfp_fp][110000]Accuracy-Flip: 0.99014+-0.00509 Training: 2022-01-08 19:44:03,338-[cfp_fp][110000]Accuracy-Highest: 0.99043 Training: 2022-01-08 19:44:47,338-[agedb_30][110000]XNorm: 22.795291 Training: 2022-01-08 19:44:47,339-[agedb_30][110000]Accuracy-Flip: 0.97900+-0.00834 Training: 2022-01-08 19:44:47,339-[agedb_30][110000]Accuracy-Highest: 0.97917 Training: 2022-01-08 19:44:54,842-Speed 278.10 samples/sec Loss 5.2561 LearningRate 0.0733 Epoch: 10 Global Step: 110010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:02,375-Speed 5438.81 samples/sec Loss 5.2666 LearningRate 0.0733 Epoch: 10 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:09,806-Speed 5512.87 samples/sec Loss 5.2735 LearningRate 0.0733 Epoch: 10 Global Step: 110030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:17,284-Speed 5479.20 samples/sec Loss 5.3091 LearningRate 0.0732 Epoch: 10 Global Step: 
110040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:24,860-Speed 5408.96 samples/sec Loss 5.2312 LearningRate 0.0732 Epoch: 10 Global Step: 110050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:32,373-Speed 5452.34 samples/sec Loss 5.2104 LearningRate 0.0732 Epoch: 10 Global Step: 110060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:39,973-Speed 5390.23 samples/sec Loss 5.2567 LearningRate 0.0732 Epoch: 10 Global Step: 110070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:45:47,473-Speed 5462.53 samples/sec Loss 5.2220 LearningRate 0.0732 Epoch: 10 Global Step: 110080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:45:54,953-Speed 5476.51 samples/sec Loss 5.2116 LearningRate 0.0732 Epoch: 10 Global Step: 110090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:46:02,412-Speed 5492.08 samples/sec Loss 5.2558 LearningRate 0.0731 Epoch: 10 Global Step: 110100 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:46:10,018-Speed 5386.03 samples/sec Loss 5.2362 LearningRate 0.0731 Epoch: 10 Global Step: 110110 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:46:17,540-Speed 5445.86 samples/sec Loss 5.2413 LearningRate 0.0731 Epoch: 10 Global Step: 110120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:46:24,951-Speed 5527.96 samples/sec Loss 5.2053 LearningRate 0.0731 Epoch: 10 Global Step: 110130 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:46:32,442-Speed 5468.37 samples/sec Loss 5.2300 LearningRate 0.0731 Epoch: 10 Global Step: 110140 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:46:39,845-Speed 5533.53 samples/sec Loss 5.2865 LearningRate 0.0731 Epoch: 10 Global Step: 110150 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:46:47,199-Speed 5570.88 samples/sec Loss 5.2099 LearningRate 0.0731 Epoch: 10 Global Step: 110160 Fp16 Grad Scale: 32768 
Required: 22 hours Training: 2022-01-08 19:46:54,621-Speed 5519.46 samples/sec Loss 5.1427 LearningRate 0.0730 Epoch: 10 Global Step: 110170 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:47:02,041-Speed 5520.95 samples/sec Loss 5.1654 LearningRate 0.0730 Epoch: 10 Global Step: 110180 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:47:09,467-Speed 5516.14 samples/sec Loss 5.2126 LearningRate 0.0730 Epoch: 10 Global Step: 110190 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:47:16,953-Speed 5472.27 samples/sec Loss 5.2354 LearningRate 0.0730 Epoch: 10 Global Step: 110200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:47:24,415-Speed 5490.27 samples/sec Loss 5.2428 LearningRate 0.0730 Epoch: 10 Global Step: 110210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:47:31,812-Speed 5538.36 samples/sec Loss 5.2482 LearningRate 0.0730 Epoch: 10 Global Step: 110220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:47:39,351-Speed 5433.71 samples/sec Loss 5.2817 LearningRate 0.0730 Epoch: 10 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:47:46,798-Speed 5500.99 samples/sec Loss 5.2595 LearningRate 0.0729 Epoch: 10 Global Step: 110240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:47:54,299-Speed 5461.31 samples/sec Loss 5.3207 LearningRate 0.0729 Epoch: 10 Global Step: 110250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:01,785-Speed 5472.23 samples/sec Loss 5.2512 LearningRate 0.0729 Epoch: 10 Global Step: 110260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:09,296-Speed 5453.79 samples/sec Loss 5.2335 LearningRate 0.0729 Epoch: 10 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:16,797-Speed 5461.92 samples/sec Loss 5.1804 LearningRate 0.0729 Epoch: 10 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 
2022-01-08 19:48:24,248-Speed 5497.19 samples/sec Loss 5.2350 LearningRate 0.0729 Epoch: 10 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:31,702-Speed 5495.62 samples/sec Loss 5.2535 LearningRate 0.0728 Epoch: 10 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:39,149-Speed 5501.91 samples/sec Loss 5.2699 LearningRate 0.0728 Epoch: 10 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:46,565-Speed 5523.11 samples/sec Loss 5.2527 LearningRate 0.0728 Epoch: 10 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:48:53,983-Speed 5522.90 samples/sec Loss 5.1957 LearningRate 0.0728 Epoch: 10 Global Step: 110330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:49:01,495-Speed 5453.49 samples/sec Loss 5.2270 LearningRate 0.0728 Epoch: 10 Global Step: 110340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:49:09,055-Speed 5418.40 samples/sec Loss 5.2607 LearningRate 0.0728 Epoch: 10 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:49:16,480-Speed 5517.24 samples/sec Loss 5.1920 LearningRate 0.0728 Epoch: 10 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:49:23,916-Speed 5508.87 samples/sec Loss 5.2544 LearningRate 0.0727 Epoch: 10 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:49:31,310-Speed 5540.48 samples/sec Loss 5.2752 LearningRate 0.0727 Epoch: 10 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:49:38,709-Speed 5537.07 samples/sec Loss 5.1973 LearningRate 0.0727 Epoch: 10 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:49:46,136-Speed 5515.46 samples/sec Loss 5.2462 LearningRate 0.0727 Epoch: 10 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:49:53,590-Speed 
5496.09 samples/sec Loss 5.1898 LearningRate 0.0727 Epoch: 10 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:00,972-Speed 5549.42 samples/sec Loss 5.1940 LearningRate 0.0727 Epoch: 10 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:08,410-Speed 5507.02 samples/sec Loss 5.2250 LearningRate 0.0727 Epoch: 10 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:15,769-Speed 5567.04 samples/sec Loss 5.2450 LearningRate 0.0726 Epoch: 10 Global Step: 110440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:23,142-Speed 5556.09 samples/sec Loss 5.2336 LearningRate 0.0726 Epoch: 10 Global Step: 110450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:50:30,543-Speed 5534.95 samples/sec Loss 5.2527 LearningRate 0.0726 Epoch: 10 Global Step: 110460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:37,953-Speed 5528.62 samples/sec Loss 5.2504 LearningRate 0.0726 Epoch: 10 Global Step: 110470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:45,364-Speed 5528.18 samples/sec Loss 5.2215 LearningRate 0.0726 Epoch: 10 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:50:52,768-Speed 5532.37 samples/sec Loss 5.2232 LearningRate 0.0726 Epoch: 10 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:00,179-Speed 5528.22 samples/sec Loss 5.2065 LearningRate 0.0725 Epoch: 10 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:07,561-Speed 5549.17 samples/sec Loss 5.1977 LearningRate 0.0725 Epoch: 10 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:14,976-Speed 5524.42 samples/sec Loss 5.2040 LearningRate 0.0725 Epoch: 10 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:22,370-Speed 5540.38 samples/sec Loss 
5.2283 LearningRate 0.0725 Epoch: 10 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:29,814-Speed 5503.82 samples/sec Loss 5.2575 LearningRate 0.0725 Epoch: 10 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:37,213-Speed 5535.89 samples/sec Loss 5.1858 LearningRate 0.0725 Epoch: 10 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:44,713-Speed 5462.84 samples/sec Loss 5.2338 LearningRate 0.0725 Epoch: 10 Global Step: 110560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:51:52,098-Speed 5547.03 samples/sec Loss 5.1874 LearningRate 0.0724 Epoch: 10 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:51:59,554-Speed 5494.21 samples/sec Loss 5.2327 LearningRate 0.0724 Epoch: 10 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:06,959-Speed 5531.97 samples/sec Loss 5.2433 LearningRate 0.0724 Epoch: 10 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:14,419-Speed 5491.93 samples/sec Loss 5.2853 LearningRate 0.0724 Epoch: 10 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:21,891-Speed 5481.91 samples/sec Loss 5.2396 LearningRate 0.0724 Epoch: 10 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:29,344-Speed 5496.70 samples/sec Loss 5.2512 LearningRate 0.0724 Epoch: 10 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:36,761-Speed 5523.37 samples/sec Loss 5.2485 LearningRate 0.0724 Epoch: 10 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:44,279-Speed 5448.74 samples/sec Loss 5.1959 LearningRate 0.0723 Epoch: 10 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:51,679-Speed 5536.43 samples/sec Loss 5.2183 LearningRate 0.0723 
Epoch: 10 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:52:59,023-Speed 5578.07 samples/sec Loss 5.2537 LearningRate 0.0723 Epoch: 10 Global Step: 110660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:06,416-Speed 5540.63 samples/sec Loss 5.2583 LearningRate 0.0723 Epoch: 10 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:13,797-Speed 5550.78 samples/sec Loss 5.2380 LearningRate 0.0723 Epoch: 10 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:21,170-Speed 5556.31 samples/sec Loss 5.2384 LearningRate 0.0723 Epoch: 10 Global Step: 110690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:28,598-Speed 5514.25 samples/sec Loss 5.2399 LearningRate 0.0722 Epoch: 10 Global Step: 110700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:36,070-Speed 5483.17 samples/sec Loss 5.1821 LearningRate 0.0722 Epoch: 10 Global Step: 110710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:43,554-Speed 5473.17 samples/sec Loss 5.2673 LearningRate 0.0722 Epoch: 10 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:51,017-Speed 5489.50 samples/sec Loss 5.1963 LearningRate 0.0722 Epoch: 10 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:53:58,464-Speed 5500.57 samples/sec Loss 5.1766 LearningRate 0.0722 Epoch: 10 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:05,914-Speed 5498.89 samples/sec Loss 5.2076 LearningRate 0.0722 Epoch: 10 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:13,325-Speed 5527.46 samples/sec Loss 5.2086 LearningRate 0.0722 Epoch: 10 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:20,681-Speed 5570.63 samples/sec Loss 5.1826 LearningRate 0.0721 Epoch: 10 Global Step: 110770 
Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:28,061-Speed 5550.90 samples/sec Loss 5.1924 LearningRate 0.0721 Epoch: 10 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:35,437-Speed 5553.01 samples/sec Loss 5.1603 LearningRate 0.0721 Epoch: 10 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:42,888-Speed 5498.37 samples/sec Loss 5.2258 LearningRate 0.0721 Epoch: 10 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:50,323-Speed 5509.86 samples/sec Loss 5.2209 LearningRate 0.0721 Epoch: 10 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:54:57,719-Speed 5539.37 samples/sec Loss 5.1945 LearningRate 0.0721 Epoch: 10 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:55:05,145-Speed 5515.90 samples/sec Loss 5.2141 LearningRate 0.0721 Epoch: 10 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:55:12,539-Speed 5540.50 samples/sec Loss 5.2529 LearningRate 0.0720 Epoch: 10 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:55:20,059-Speed 5447.50 samples/sec Loss 5.2462 LearningRate 0.0720 Epoch: 10 Global Step: 110850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:55:27,485-Speed 5516.98 samples/sec Loss 5.1565 LearningRate 0.0720 Epoch: 10 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:55:34,910-Speed 5516.82 samples/sec Loss 5.2044 LearningRate 0.0720 Epoch: 10 Global Step: 110870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:55:42,349-Speed 5507.30 samples/sec Loss 5.2218 LearningRate 0.0720 Epoch: 10 Global Step: 110880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:55:49,805-Speed 5494.27 samples/sec Loss 5.2147 LearningRate 0.0720 Epoch: 10 Global Step: 110890 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-01-08 19:55:57,193-Speed 5544.94 samples/sec Loss 5.1860 LearningRate 0.0719 Epoch: 10 Global Step: 110900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:04,658-Speed 5487.56 samples/sec Loss 5.1893 LearningRate 0.0719 Epoch: 10 Global Step: 110910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:12,060-Speed 5534.84 samples/sec Loss 5.2053 LearningRate 0.0719 Epoch: 10 Global Step: 110920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:19,513-Speed 5496.61 samples/sec Loss 5.2144 LearningRate 0.0719 Epoch: 10 Global Step: 110930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:26,918-Speed 5532.32 samples/sec Loss 5.1881 LearningRate 0.0719 Epoch: 10 Global Step: 110940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:34,464-Speed 5428.63 samples/sec Loss 5.2057 LearningRate 0.0719 Epoch: 10 Global Step: 110950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:42,028-Speed 5415.97 samples/sec Loss 5.2069 LearningRate 0.0719 Epoch: 10 Global Step: 110960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:56:49,500-Speed 5482.51 samples/sec Loss 5.2017 LearningRate 0.0718 Epoch: 10 Global Step: 110970 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-08 19:56:56,997-Speed 5464.42 samples/sec Loss 5.2308 LearningRate 0.0718 Epoch: 10 Global Step: 110980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:57:04,424-Speed 5515.83 samples/sec Loss 5.1644 LearningRate 0.0718 Epoch: 10 Global Step: 110990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 19:57:14,008-Speed 5528.88 samples/sec Loss 5.2301 LearningRate 0.0718 Epoch: 10 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:57:21,656-Speed 5356.82 samples/sec Loss 5.1972 LearningRate 0.0718 Epoch: 10 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-01-08 19:57:29,119-Speed 5489.09 samples/sec Loss 5.2593 LearningRate 0.0718 Epoch: 10 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:57:36,529-Speed 5528.86 samples/sec Loss 5.1798 LearningRate 0.0718 Epoch: 10 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:57:44,093-Speed 5415.29 samples/sec Loss 5.2368 LearningRate 0.0717 Epoch: 10 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:57:51,572-Speed 5477.40 samples/sec Loss 5.2158 LearningRate 0.0717 Epoch: 10 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:57:59,106-Speed 5437.43 samples/sec Loss 5.2246 LearningRate 0.0717 Epoch: 10 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:58:06,550-Speed 5503.32 samples/sec Loss 5.2304 LearningRate 0.0717 Epoch: 10 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:58:14,037-Speed 5471.48 samples/sec Loss 5.1696 LearningRate 0.0717 Epoch: 10 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:58:21,538-Speed 5460.92 samples/sec Loss 5.2506 LearningRate 0.0717 Epoch: 10 Global Step: 111090 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:58:28,973-Speed 5509.80 samples/sec Loss 5.2223 LearningRate 0.0716 Epoch: 10 Global Step: 111100 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:58:36,481-Speed 5456.29 samples/sec Loss 5.1944 LearningRate 0.0716 Epoch: 10 Global Step: 111110 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:58:43,877-Speed 5538.82 samples/sec Loss 5.1521 LearningRate 0.0716 Epoch: 10 Global Step: 111120 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:58:51,349-Speed 5483.02 samples/sec Loss 5.1905 LearningRate 0.0716 Epoch: 10 Global Step: 111130 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 
19:58:58,818-Speed 5484.48 samples/sec Loss 5.0965 LearningRate 0.0716 Epoch: 10 Global Step: 111140 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:59:06,246-Speed 5514.86 samples/sec Loss 5.1679 LearningRate 0.0716 Epoch: 10 Global Step: 111150 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:59:13,650-Speed 5532.65 samples/sec Loss 5.1743 LearningRate 0.0716 Epoch: 10 Global Step: 111160 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:59:21,079-Speed 5514.34 samples/sec Loss 5.2182 LearningRate 0.0715 Epoch: 10 Global Step: 111170 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:59:28,496-Speed 5523.16 samples/sec Loss 5.1748 LearningRate 0.0715 Epoch: 10 Global Step: 111180 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 19:59:35,921-Speed 5517.45 samples/sec Loss 5.1095 LearningRate 0.0715 Epoch: 10 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:59:43,449-Speed 5441.62 samples/sec Loss 5.1928 LearningRate 0.0715 Epoch: 10 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:59:50,948-Speed 5462.53 samples/sec Loss 5.1959 LearningRate 0.0715 Epoch: 10 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 19:59:58,401-Speed 5496.51 samples/sec Loss 5.1899 LearningRate 0.0715 Epoch: 10 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:05,918-Speed 5449.98 samples/sec Loss 5.1339 LearningRate 0.0715 Epoch: 10 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:13,386-Speed 5485.29 samples/sec Loss 5.1522 LearningRate 0.0714 Epoch: 10 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:21,207-Speed 5238.47 samples/sec Loss 5.1555 LearningRate 0.0714 Epoch: 10 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:28,743-Speed 5436.19 
samples/sec Loss 5.1599 LearningRate 0.0714 Epoch: 10 Global Step: 111260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:36,279-Speed 5436.12 samples/sec Loss 5.1396 LearningRate 0.0714 Epoch: 10 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:43,741-Speed 5489.83 samples/sec Loss 5.1741 LearningRate 0.0714 Epoch: 10 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:51,253-Speed 5453.17 samples/sec Loss 5.1669 LearningRate 0.0714 Epoch: 10 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:00:58,667-Speed 5525.84 samples/sec Loss 5.1514 LearningRate 0.0714 Epoch: 10 Global Step: 111300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:06,325-Speed 5349.52 samples/sec Loss 5.1749 LearningRate 0.0713 Epoch: 10 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:13,772-Speed 5500.97 samples/sec Loss 5.1834 LearningRate 0.0713 Epoch: 10 Global Step: 111320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:21,196-Speed 5517.75 samples/sec Loss 5.1432 LearningRate 0.0713 Epoch: 10 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:28,676-Speed 5476.73 samples/sec Loss 5.1974 LearningRate 0.0713 Epoch: 10 Global Step: 111340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:36,135-Speed 5492.59 samples/sec Loss 5.1514 LearningRate 0.0713 Epoch: 10 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:43,641-Speed 5457.32 samples/sec Loss 5.1830 LearningRate 0.0713 Epoch: 10 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:51,267-Speed 5371.80 samples/sec Loss 5.1846 LearningRate 0.0712 Epoch: 10 Global Step: 111370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:01:58,795-Speed 5441.47 samples/sec Loss 5.2553 
LearningRate 0.0712 Epoch: 10 Global Step: 111380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:06,284-Speed 5470.17 samples/sec Loss 5.2156 LearningRate 0.0712 Epoch: 10 Global Step: 111390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:13,764-Speed 5476.95 samples/sec Loss 5.1618 LearningRate 0.0712 Epoch: 10 Global Step: 111400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:21,255-Speed 5467.96 samples/sec Loss 5.2002 LearningRate 0.0712 Epoch: 10 Global Step: 111410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:28,807-Speed 5425.27 samples/sec Loss 5.2006 LearningRate 0.0712 Epoch: 10 Global Step: 111420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:36,385-Speed 5406.46 samples/sec Loss 5.1831 LearningRate 0.0712 Epoch: 10 Global Step: 111430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:43,933-Speed 5426.89 samples/sec Loss 5.2137 LearningRate 0.0711 Epoch: 10 Global Step: 111440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:51,476-Speed 5430.51 samples/sec Loss 5.1943 LearningRate 0.0711 Epoch: 10 Global Step: 111450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:02:59,007-Speed 5440.55 samples/sec Loss 5.1493 LearningRate 0.0711 Epoch: 10 Global Step: 111460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:03:06,622-Speed 5379.49 samples/sec Loss 5.1663 LearningRate 0.0711 Epoch: 10 Global Step: 111470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:03:14,122-Speed 5462.23 samples/sec Loss 5.1265 LearningRate 0.0711 Epoch: 10 Global Step: 111480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:03:21,610-Speed 5470.24 samples/sec Loss 5.2157 LearningRate 0.0711 Epoch: 10 Global Step: 111490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:03:29,069-Speed 5492.26 samples/sec Loss 5.1319 LearningRate 0.0711 Epoch: 10 
Global Step: 111500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:03:36,548-Speed 5477.96 samples/sec Loss 5.2336 LearningRate 0.0710 Epoch: 10 Global Step: 111510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:03:44,021-Speed 5482.11 samples/sec Loss 5.1308 LearningRate 0.0710 Epoch: 10 Global Step: 111520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:03:51,611-Speed 5396.70 samples/sec Loss 5.1901 LearningRate 0.0710 Epoch: 10 Global Step: 111530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:03:59,072-Speed 5490.76 samples/sec Loss 5.1097 LearningRate 0.0710 Epoch: 10 Global Step: 111540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:04:06,582-Speed 5454.58 samples/sec Loss 5.1625 LearningRate 0.0710 Epoch: 10 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:14,157-Speed 5408.63 samples/sec Loss 5.1963 LearningRate 0.0710 Epoch: 10 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:21,605-Speed 5499.65 samples/sec Loss 5.1562 LearningRate 0.0710 Epoch: 10 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:29,079-Speed 5480.85 samples/sec Loss 5.2111 LearningRate 0.0709 Epoch: 10 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:36,533-Speed 5496.29 samples/sec Loss 5.1450 LearningRate 0.0709 Epoch: 10 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:44,169-Speed 5364.68 samples/sec Loss 5.1604 LearningRate 0.0709 Epoch: 10 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:51,822-Speed 5352.79 samples/sec Loss 5.1653 LearningRate 0.0709 Epoch: 10 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:04:59,373-Speed 5425.06 samples/sec Loss 5.1559 LearningRate 0.0709 Epoch: 10 Global Step: 111620 Fp16 
Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:05:06,821-Speed 5500.60 samples/sec Loss 5.2222 LearningRate 0.0709 Epoch: 10 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:05:14,345-Speed 5444.56 samples/sec Loss 5.1701 LearningRate 0.0708 Epoch: 10 Global Step: 111640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:05:21,815-Speed 5483.63 samples/sec Loss 5.1881 LearningRate 0.0708 Epoch: 10 Global Step: 111650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:05:29,270-Speed 5495.00 samples/sec Loss 5.1945 LearningRate 0.0708 Epoch: 10 Global Step: 111660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:05:36,707-Speed 5509.18 samples/sec Loss 5.1258 LearningRate 0.0708 Epoch: 10 Global Step: 111670 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:05:44,195-Speed 5470.28 samples/sec Loss 5.1404 LearningRate 0.0708 Epoch: 10 Global Step: 111680 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:05:51,658-Speed 5489.21 samples/sec Loss 5.1907 LearningRate 0.0708 Epoch: 10 Global Step: 111690 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:05:59,120-Speed 5489.75 samples/sec Loss 5.1749 LearningRate 0.0708 Epoch: 10 Global Step: 111700 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:06:06,566-Speed 5502.43 samples/sec Loss 5.1372 LearningRate 0.0707 Epoch: 10 Global Step: 111710 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:06:14,087-Speed 5446.45 samples/sec Loss 5.1630 LearningRate 0.0707 Epoch: 10 Global Step: 111720 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:06:21,566-Speed 5476.85 samples/sec Loss 5.1630 LearningRate 0.0707 Epoch: 10 Global Step: 111730 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:06:28,991-Speed 5517.43 samples/sec Loss 5.1760 LearningRate 0.0707 Epoch: 10 Global Step: 111740 Fp16 Grad Scale: 32768 Required: 22 
hours Training: 2022-01-08 20:06:36,403-Speed 5527.42 samples/sec Loss 5.1420 LearningRate 0.0707 Epoch: 10 Global Step: 111750 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:06:43,866-Speed 5489.03 samples/sec Loss 5.2180 LearningRate 0.0707 Epoch: 10 Global Step: 111760 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:06:51,311-Speed 5502.32 samples/sec Loss 5.1468 LearningRate 0.0707 Epoch: 10 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:06:58,744-Speed 5511.08 samples/sec Loss 5.1770 LearningRate 0.0706 Epoch: 10 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:06,293-Speed 5426.61 samples/sec Loss 5.1616 LearningRate 0.0706 Epoch: 10 Global Step: 111790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:13,762-Speed 5485.02 samples/sec Loss 5.1550 LearningRate 0.0706 Epoch: 10 Global Step: 111800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:21,410-Speed 5356.39 samples/sec Loss 5.1175 LearningRate 0.0706 Epoch: 10 Global Step: 111810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:28,894-Speed 5472.94 samples/sec Loss 5.1692 LearningRate 0.0706 Epoch: 10 Global Step: 111820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:36,376-Speed 5475.82 samples/sec Loss 5.1734 LearningRate 0.0706 Epoch: 10 Global Step: 111830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:43,863-Speed 5471.26 samples/sec Loss 5.1366 LearningRate 0.0706 Epoch: 10 Global Step: 111840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:07:51,302-Speed 5507.31 samples/sec Loss 5.1431 LearningRate 0.0705 Epoch: 10 Global Step: 111850 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:07:58,788-Speed 5471.88 samples/sec Loss 5.0931 LearningRate 0.0705 Epoch: 10 Global Step: 111860 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 
20:08:06,268-Speed 5477.00 samples/sec Loss 5.0906 LearningRate 0.0705 Epoch: 10 Global Step: 111870 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:13,725-Speed 5493.55 samples/sec Loss 5.1607 LearningRate 0.0705 Epoch: 10 Global Step: 111880 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:21,235-Speed 5454.68 samples/sec Loss 5.0932 LearningRate 0.0705 Epoch: 10 Global Step: 111890 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:28,829-Speed 5394.47 samples/sec Loss 5.1870 LearningRate 0.0705 Epoch: 10 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:36,323-Speed 5466.37 samples/sec Loss 5.1038 LearningRate 0.0704 Epoch: 10 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:43,778-Speed 5495.74 samples/sec Loss 5.1702 LearningRate 0.0704 Epoch: 10 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:51,289-Speed 5453.86 samples/sec Loss 5.1497 LearningRate 0.0704 Epoch: 10 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:08:58,758-Speed 5484.29 samples/sec Loss 5.1362 LearningRate 0.0704 Epoch: 10 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:09:06,243-Speed 5473.63 samples/sec Loss 5.1681 LearningRate 0.0704 Epoch: 10 Global Step: 111950 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:09:13,799-Speed 5421.71 samples/sec Loss 5.1343 LearningRate 0.0704 Epoch: 10 Global Step: 111960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:09:21,398-Speed 5390.81 samples/sec Loss 5.1418 LearningRate 0.0704 Epoch: 10 Global Step: 111970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:09:28,850-Speed 5496.93 samples/sec Loss 5.1259 LearningRate 0.0703 Epoch: 10 Global Step: 111980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:09:36,322-Speed 5482.51 
samples/sec Loss 5.1250 LearningRate 0.0703 Epoch: 10 Global Step: 111990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:09:43,843-Speed 5446.71 samples/sec Loss 5.1607 LearningRate 0.0703 Epoch: 10 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:10:27,648-[lfw][112000]XNorm: 23.461704 Training: 2022-01-08 20:10:27,648-[lfw][112000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-08 20:10:27,649-[lfw][112000]Accuracy-Highest: 0.99817 Training: 2022-01-08 20:11:18,659-[cfp_fp][112000]XNorm: 21.740752 Training: 2022-01-08 20:11:18,660-[cfp_fp][112000]Accuracy-Flip: 0.99057+-0.00415 Training: 2022-01-08 20:11:18,661-[cfp_fp][112000]Accuracy-Highest: 0.99057 Training: 2022-01-08 20:12:02,679-[agedb_30][112000]XNorm: 23.420655 Training: 2022-01-08 20:12:02,681-[agedb_30][112000]Accuracy-Flip: 0.97800+-0.00718 Training: 2022-01-08 20:12:02,681-[agedb_30][112000]Accuracy-Highest: 0.97917 Training: 2022-01-08 20:12:10,108-Speed 280.04 samples/sec Loss 5.1471 LearningRate 0.0703 Epoch: 10 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:12:17,621-Speed 5453.30 samples/sec Loss 5.1303 LearningRate 0.0703 Epoch: 10 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:12:25,050-Speed 5515.07 samples/sec Loss 5.1593 LearningRate 0.0703 Epoch: 10 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:12:32,541-Speed 5469.14 samples/sec Loss 5.1509 LearningRate 0.0703 Epoch: 10 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:12:40,257-Speed 5309.76 samples/sec Loss 5.1513 LearningRate 0.0702 Epoch: 10 Global Step: 112050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:12:47,882-Speed 5373.57 samples/sec Loss 5.1337 LearningRate 0.0702 Epoch: 10 Global Step: 112060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:12:55,450-Speed 5413.09 samples/sec 
Loss 5.1214 LearningRate 0.0702 Epoch: 10 Global Step: 112070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:13:02,930-Speed 5476.69 samples/sec Loss 5.1090 LearningRate 0.0702 Epoch: 10 Global Step: 112080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:13:10,414-Speed 5473.86 samples/sec Loss 5.1276 LearningRate 0.0702 Epoch: 10 Global Step: 112090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:13:17,881-Speed 5486.82 samples/sec Loss 5.1539 LearningRate 0.0702 Epoch: 10 Global Step: 112100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:13:25,349-Speed 5484.93 samples/sec Loss 5.1063 LearningRate 0.0702 Epoch: 10 Global Step: 112110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:13:32,800-Speed 5498.29 samples/sec Loss 5.0837 LearningRate 0.0701 Epoch: 10 Global Step: 112120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:13:40,197-Speed 5537.90 samples/sec Loss 5.1139 LearningRate 0.0701 Epoch: 10 Global Step: 112130 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:13:47,665-Speed 5485.85 samples/sec Loss 5.0939 LearningRate 0.0701 Epoch: 10 Global Step: 112140 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:13:55,076-Speed 5527.75 samples/sec Loss 5.1512 LearningRate 0.0701 Epoch: 10 Global Step: 112150 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:02,549-Speed 5481.64 samples/sec Loss 5.1688 LearningRate 0.0701 Epoch: 10 Global Step: 112160 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:09,983-Speed 5510.09 samples/sec Loss 5.1415 LearningRate 0.0701 Epoch: 10 Global Step: 112170 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:17,454-Speed 5483.62 samples/sec Loss 5.1278 LearningRate 0.0701 Epoch: 10 Global Step: 112180 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:24,995-Speed 5432.49 samples/sec Loss 5.1177 LearningRate 
0.0700 Epoch: 10 Global Step: 112190 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:32,441-Speed 5501.21 samples/sec Loss 5.1047 LearningRate 0.0700 Epoch: 10 Global Step: 112200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:39,942-Speed 5461.45 samples/sec Loss 5.1098 LearningRate 0.0700 Epoch: 10 Global Step: 112210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:47,378-Speed 5509.65 samples/sec Loss 5.1380 LearningRate 0.0700 Epoch: 10 Global Step: 112220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:14:54,834-Speed 5494.44 samples/sec Loss 5.1175 LearningRate 0.0700 Epoch: 10 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:02,268-Speed 5510.09 samples/sec Loss 5.1493 LearningRate 0.0700 Epoch: 10 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:09,755-Speed 5471.58 samples/sec Loss 5.1862 LearningRate 0.0699 Epoch: 10 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:17,292-Speed 5435.77 samples/sec Loss 5.1379 LearningRate 0.0699 Epoch: 10 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:24,764-Speed 5482.51 samples/sec Loss 5.1191 LearningRate 0.0699 Epoch: 10 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:32,240-Speed 5479.73 samples/sec Loss 5.1605 LearningRate 0.0699 Epoch: 10 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:39,822-Speed 5402.56 samples/sec Loss 5.1465 LearningRate 0.0699 Epoch: 10 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:47,352-Speed 5440.71 samples/sec Loss 5.1164 LearningRate 0.0699 Epoch: 10 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:15:55,064-Speed 5311.70 samples/sec Loss 5.1286 LearningRate 0.0699 Epoch: 10 Global Step: 
112310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:16:02,561-Speed 5464.55 samples/sec Loss 5.1492 LearningRate 0.0698 Epoch: 10 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:16:10,044-Speed 5473.98 samples/sec Loss 5.1747 LearningRate 0.0698 Epoch: 10 Global Step: 112330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:16:17,537-Speed 5467.79 samples/sec Loss 5.1099 LearningRate 0.0698 Epoch: 10 Global Step: 112340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:16:25,006-Speed 5484.48 samples/sec Loss 5.1366 LearningRate 0.0698 Epoch: 10 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:16:32,489-Speed 5474.92 samples/sec Loss 5.0841 LearningRate 0.0698 Epoch: 10 Global Step: 112360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:16:39,955-Speed 5486.35 samples/sec Loss 5.1324 LearningRate 0.0698 Epoch: 10 Global Step: 112370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:16:47,417-Speed 5489.97 samples/sec Loss 5.0603 LearningRate 0.0698 Epoch: 10 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:16:54,899-Speed 5475.49 samples/sec Loss 5.1238 LearningRate 0.0697 Epoch: 10 Global Step: 112390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:02,419-Speed 5447.68 samples/sec Loss 5.0864 LearningRate 0.0697 Epoch: 10 Global Step: 112400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:10,017-Speed 5391.39 samples/sec Loss 5.1322 LearningRate 0.0697 Epoch: 10 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:17,566-Speed 5426.10 samples/sec Loss 5.1009 LearningRate 0.0697 Epoch: 10 Global Step: 112420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:25,031-Speed 5488.17 samples/sec Loss 5.0915 LearningRate 0.0697 Epoch: 10 Global Step: 112430 Fp16 Grad Scale: 65536 
Required: 22 hours Training: 2022-01-08 20:17:32,618-Speed 5399.24 samples/sec Loss 5.1033 LearningRate 0.0697 Epoch: 10 Global Step: 112440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:40,068-Speed 5498.75 samples/sec Loss 5.1455 LearningRate 0.0697 Epoch: 10 Global Step: 112450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:47,555-Speed 5471.38 samples/sec Loss 5.1160 LearningRate 0.0696 Epoch: 10 Global Step: 112460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:17:54,999-Speed 5503.56 samples/sec Loss 5.0774 LearningRate 0.0696 Epoch: 10 Global Step: 112470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:18:02,509-Speed 5454.20 samples/sec Loss 5.0868 LearningRate 0.0696 Epoch: 10 Global Step: 112480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:18:09,991-Speed 5474.99 samples/sec Loss 5.1288 LearningRate 0.0696 Epoch: 10 Global Step: 112490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:18:17,816-Speed 5235.08 samples/sec Loss 5.1352 LearningRate 0.0696 Epoch: 10 Global Step: 112500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:18:25,296-Speed 5477.27 samples/sec Loss 5.0939 LearningRate 0.0696 Epoch: 10 Global Step: 112510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:18:32,742-Speed 5501.44 samples/sec Loss 5.1396 LearningRate 0.0696 Epoch: 10 Global Step: 112520 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:18:40,304-Speed 5417.33 samples/sec Loss 5.1567 LearningRate 0.0695 Epoch: 10 Global Step: 112530 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:18:47,899-Speed 5393.74 samples/sec Loss 5.1280 LearningRate 0.0695 Epoch: 10 Global Step: 112540 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:18:55,355-Speed 5494.13 samples/sec Loss 5.1133 LearningRate 0.0695 Epoch: 10 Global Step: 112550 Fp16 Grad Scale: 32768 Required: 22 hours Training: 
2022-01-08 20:19:02,912-Speed 5421.09 samples/sec Loss 5.1108 LearningRate 0.0695 Epoch: 10 Global Step: 112560 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:19:10,399-Speed 5471.88 samples/sec Loss 5.1157 LearningRate 0.0695 Epoch: 10 Global Step: 112570 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:19:18,000-Speed 5389.37 samples/sec Loss 5.1465 LearningRate 0.0695 Epoch: 10 Global Step: 112580 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:19:25,476-Speed 5479.07 samples/sec Loss 5.1311 LearningRate 0.0694 Epoch: 10 Global Step: 112590 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:19:32,926-Speed 5498.82 samples/sec Loss 5.0938 LearningRate 0.0694 Epoch: 10 Global Step: 112600 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:19:40,376-Speed 5499.33 samples/sec Loss 5.1216 LearningRate 0.0694 Epoch: 10 Global Step: 112610 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:19:47,801-Speed 5516.40 samples/sec Loss 5.1031 LearningRate 0.0694 Epoch: 10 Global Step: 112620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:19:55,235-Speed 5511.08 samples/sec Loss 5.1470 LearningRate 0.0694 Epoch: 10 Global Step: 112630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:02,658-Speed 5518.94 samples/sec Loss 5.0801 LearningRate 0.0694 Epoch: 10 Global Step: 112640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:10,113-Speed 5495.06 samples/sec Loss 5.1584 LearningRate 0.0694 Epoch: 10 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:17,547-Speed 5510.08 samples/sec Loss 5.0937 LearningRate 0.0693 Epoch: 10 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:24,997-Speed 5499.05 samples/sec Loss 5.1260 LearningRate 0.0693 Epoch: 10 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:32,424-Speed 
5515.87 samples/sec Loss 5.0867 LearningRate 0.0693 Epoch: 10 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:39,930-Speed 5457.67 samples/sec Loss 5.1047 LearningRate 0.0693 Epoch: 10 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:47,345-Speed 5524.85 samples/sec Loss 5.1151 LearningRate 0.0693 Epoch: 10 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:20:54,766-Speed 5520.20 samples/sec Loss 5.0683 LearningRate 0.0693 Epoch: 10 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:21:02,167-Speed 5535.25 samples/sec Loss 5.0684 LearningRate 0.0693 Epoch: 10 Global Step: 112720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:21:09,639-Speed 5482.38 samples/sec Loss 5.0511 LearningRate 0.0692 Epoch: 10 Global Step: 112730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:21:17,084-Speed 5502.80 samples/sec Loss 5.1081 LearningRate 0.0692 Epoch: 10 Global Step: 112740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:21:24,473-Speed 5544.36 samples/sec Loss 5.1251 LearningRate 0.0692 Epoch: 10 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:21:31,941-Speed 5485.31 samples/sec Loss 5.0515 LearningRate 0.0692 Epoch: 10 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:21:39,422-Speed 5476.43 samples/sec Loss 5.1337 LearningRate 0.0692 Epoch: 10 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:21:46,856-Speed 5510.51 samples/sec Loss 5.0651 LearningRate 0.0692 Epoch: 10 Global Step: 112780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:21:54,300-Speed 5502.93 samples/sec Loss 5.1324 LearningRate 0.0692 Epoch: 10 Global Step: 112790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:22:01,751-Speed 5498.28 samples/sec Loss 
5.1149 LearningRate 0.0691 Epoch: 10 Global Step: 112800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:22:09,231-Speed 5477.07 samples/sec Loss 5.1175 LearningRate 0.0691 Epoch: 10 Global Step: 112810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:22:16,696-Speed 5487.60 samples/sec Loss 5.1075 LearningRate 0.0691 Epoch: 10 Global Step: 112820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:22:24,115-Speed 5521.38 samples/sec Loss 5.1343 LearningRate 0.0691 Epoch: 10 Global Step: 112830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:22:31,531-Speed 5524.12 samples/sec Loss 5.0521 LearningRate 0.0691 Epoch: 10 Global Step: 112840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:22:38,955-Speed 5518.06 samples/sec Loss 5.1245 LearningRate 0.0691 Epoch: 10 Global Step: 112850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:22:46,473-Speed 5449.15 samples/sec Loss 5.1116 LearningRate 0.0691 Epoch: 10 Global Step: 112860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:22:53,945-Speed 5482.07 samples/sec Loss 5.1429 LearningRate 0.0690 Epoch: 10 Global Step: 112870 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:01,391-Speed 5502.05 samples/sec Loss 5.1190 LearningRate 0.0690 Epoch: 10 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:08,964-Speed 5409.64 samples/sec Loss 5.0755 LearningRate 0.0690 Epoch: 10 Global Step: 112890 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:16,556-Speed 5396.17 samples/sec Loss 5.0827 LearningRate 0.0690 Epoch: 10 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:24,026-Speed 5483.39 samples/sec Loss 5.0853 LearningRate 0.0690 Epoch: 10 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:31,591-Speed 5415.24 samples/sec Loss 5.0993 LearningRate 0.0690 
Epoch: 10 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:39,061-Speed 5484.17 samples/sec Loss 5.0597 LearningRate 0.0690 Epoch: 10 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:46,524-Speed 5488.93 samples/sec Loss 5.0840 LearningRate 0.0689 Epoch: 10 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:23:53,988-Speed 5488.11 samples/sec Loss 5.0909 LearningRate 0.0689 Epoch: 10 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:01,468-Speed 5477.05 samples/sec Loss 5.0889 LearningRate 0.0689 Epoch: 10 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:09,007-Speed 5434.02 samples/sec Loss 5.0528 LearningRate 0.0689 Epoch: 10 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:16,468-Speed 5490.71 samples/sec Loss 5.1101 LearningRate 0.0689 Epoch: 10 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:23,933-Speed 5487.75 samples/sec Loss 5.0890 LearningRate 0.0689 Epoch: 10 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:31,372-Speed 5507.12 samples/sec Loss 5.1051 LearningRate 0.0688 Epoch: 10 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:38,908-Speed 5436.13 samples/sec Loss 5.1073 LearningRate 0.0688 Epoch: 10 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:46,332-Speed 5517.81 samples/sec Loss 5.1040 LearningRate 0.0688 Epoch: 10 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:24:53,880-Speed 5427.36 samples/sec Loss 5.0746 LearningRate 0.0688 Epoch: 10 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:25:01,372-Speed 5467.46 samples/sec Loss 5.1224 LearningRate 0.0688 Epoch: 10 Global Step: 113040 
Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:25:08,844-Speed 5483.17 samples/sec Loss 5.1358 LearningRate 0.0688 Epoch: 10 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:25:16,370-Speed 5443.23 samples/sec Loss 5.1066 LearningRate 0.0688 Epoch: 10 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:25:23,920-Speed 5425.60 samples/sec Loss 5.1349 LearningRate 0.0687 Epoch: 10 Global Step: 113070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:25:31,401-Speed 5476.18 samples/sec Loss 5.0536 LearningRate 0.0687 Epoch: 10 Global Step: 113080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:25:38,840-Speed 5506.76 samples/sec Loss 5.0729 LearningRate 0.0687 Epoch: 10 Global Step: 113090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:25:46,312-Speed 5483.19 samples/sec Loss 5.0884 LearningRate 0.0687 Epoch: 10 Global Step: 113100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:25:53,789-Speed 5478.17 samples/sec Loss 5.0754 LearningRate 0.0687 Epoch: 10 Global Step: 113110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:26:01,346-Speed 5421.05 samples/sec Loss 5.1346 LearningRate 0.0687 Epoch: 10 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:26:08,730-Speed 5547.91 samples/sec Loss 5.1208 LearningRate 0.0687 Epoch: 10 Global Step: 113130 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:26:16,187-Speed 5493.89 samples/sec Loss 5.0745 LearningRate 0.0686 Epoch: 10 Global Step: 113140 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:26:23,633-Speed 5501.69 samples/sec Loss 5.0533 LearningRate 0.0686 Epoch: 10 Global Step: 113150 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:26:31,114-Speed 5475.16 samples/sec Loss 5.0666 LearningRate 0.0686 Epoch: 10 Global Step: 113160 Fp16 Grad Scale: 32768 
Required: 22 hours Training: 2022-01-08 20:26:38,584-Speed 5484.76 samples/sec Loss 5.0775 LearningRate 0.0686 Epoch: 10 Global Step: 113170 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:26:46,008-Speed 5517.45 samples/sec Loss 5.0785 LearningRate 0.0686 Epoch: 10 Global Step: 113180 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:26:53,503-Speed 5465.76 samples/sec Loss 5.0383 LearningRate 0.0686 Epoch: 10 Global Step: 113190 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:27:00,940-Speed 5508.60 samples/sec Loss 5.0921 LearningRate 0.0686 Epoch: 10 Global Step: 113200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:27:08,479-Speed 5433.64 samples/sec Loss 5.0924 LearningRate 0.0685 Epoch: 10 Global Step: 113210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:27:16,263-Speed 5263.11 samples/sec Loss 5.1224 LearningRate 0.0685 Epoch: 10 Global Step: 113220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-08 20:27:23,664-Speed 5535.12 samples/sec Loss 5.1001 LearningRate 0.0685 Epoch: 10 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:27:31,050-Speed 5546.10 samples/sec Loss 4.9954 LearningRate 0.0685 Epoch: 10 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:27:38,502-Speed 5497.39 samples/sec Loss 5.0387 LearningRate 0.0685 Epoch: 10 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:27:45,897-Speed 5539.52 samples/sec Loss 5.1091 LearningRate 0.0685 Epoch: 10 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:27:53,357-Speed 5491.68 samples/sec Loss 5.1266 LearningRate 0.0685 Epoch: 10 Global Step: 113270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:00,793-Speed 5508.83 samples/sec Loss 5.0960 LearningRate 0.0684 Epoch: 10 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 
2022-01-08 20:28:08,206-Speed 5526.26 samples/sec Loss 5.0893 LearningRate 0.0684 Epoch: 10 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:15,722-Speed 5450.07 samples/sec Loss 5.0915 LearningRate 0.0684 Epoch: 10 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:23,168-Speed 5502.23 samples/sec Loss 5.0238 LearningRate 0.0684 Epoch: 10 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:30,593-Speed 5517.09 samples/sec Loss 5.0461 LearningRate 0.0684 Epoch: 10 Global Step: 113320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:38,011-Speed 5522.26 samples/sec Loss 4.9855 LearningRate 0.0684 Epoch: 10 Global Step: 113330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:45,601-Speed 5397.34 samples/sec Loss 5.0444 LearningRate 0.0684 Epoch: 10 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:28:53,057-Speed 5494.95 samples/sec Loss 5.0712 LearningRate 0.0683 Epoch: 10 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:00,548-Speed 5468.21 samples/sec Loss 5.0829 LearningRate 0.0683 Epoch: 10 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:07,990-Speed 5504.37 samples/sec Loss 5.0448 LearningRate 0.0683 Epoch: 10 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:15,408-Speed 5522.81 samples/sec Loss 5.0257 LearningRate 0.0683 Epoch: 10 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:22,884-Speed 5480.04 samples/sec Loss 5.0872 LearningRate 0.0683 Epoch: 10 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:30,329-Speed 5501.68 samples/sec Loss 5.0372 LearningRate 0.0683 Epoch: 10 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:37,804-Speed 
5480.10 samples/sec Loss 5.0119 LearningRate 0.0683 Epoch: 10 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:45,305-Speed 5462.10 samples/sec Loss 5.0595 LearningRate 0.0682 Epoch: 10 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:29:52,733-Speed 5514.85 samples/sec Loss 5.1309 LearningRate 0.0682 Epoch: 10 Global Step: 113430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:30:00,179-Speed 5501.84 samples/sec Loss 5.1177 LearningRate 0.0682 Epoch: 10 Global Step: 113440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:30:07,682-Speed 5459.82 samples/sec Loss 5.0638 LearningRate 0.0682 Epoch: 10 Global Step: 113450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:30:15,167-Speed 5472.66 samples/sec Loss 5.0889 LearningRate 0.0682 Epoch: 10 Global Step: 113460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:30:22,627-Speed 5492.17 samples/sec Loss 5.0401 LearningRate 0.0682 Epoch: 10 Global Step: 113470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:30:30,045-Speed 5522.01 samples/sec Loss 5.0856 LearningRate 0.0682 Epoch: 10 Global Step: 113480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-08 20:30:37,487-Speed 5504.39 samples/sec Loss 5.0706 LearningRate 0.0681 Epoch: 10 Global Step: 113490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-08 20:30:44,905-Speed 5522.66 samples/sec Loss 5.0489 LearningRate 0.0681 Epoch: 10 Global Step: 113500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:30:52,418-Speed 5453.35 samples/sec Loss 5.1222 LearningRate 0.0681 Epoch: 10 Global Step: 113510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:30:59,844-Speed 5516.28 samples/sec Loss 5.0861 LearningRate 0.0681 Epoch: 10 Global Step: 113520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:07,297-Speed 5496.49 samples/sec Loss 
5.0636 LearningRate 0.0681 Epoch: 10 Global Step: 113530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:14,679-Speed 5548.96 samples/sec Loss 5.0631 LearningRate 0.0681 Epoch: 10 Global Step: 113540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:22,079-Speed 5536.27 samples/sec Loss 5.0747 LearningRate 0.0680 Epoch: 10 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:29,526-Speed 5500.68 samples/sec Loss 5.0343 LearningRate 0.0680 Epoch: 10 Global Step: 113560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:36,957-Speed 5512.57 samples/sec Loss 5.1321 LearningRate 0.0680 Epoch: 10 Global Step: 113570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:44,471-Speed 5452.11 samples/sec Loss 5.1165 LearningRate 0.0680 Epoch: 10 Global Step: 113580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:31:52,038-Speed 5413.74 samples/sec Loss 5.1198 LearningRate 0.0680 Epoch: 10 Global Step: 113590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:31:59,638-Speed 5390.55 samples/sec Loss 5.0956 LearningRate 0.0680 Epoch: 10 Global Step: 113600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:32:07,082-Speed 5502.79 samples/sec Loss 5.0457 LearningRate 0.0680 Epoch: 10 Global Step: 113610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:32:14,529-Speed 5501.19 samples/sec Loss 5.0712 LearningRate 0.0679 Epoch: 10 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:32:21,967-Speed 5507.27 samples/sec Loss 5.0838 LearningRate 0.0679 Epoch: 10 Global Step: 113630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:32:29,400-Speed 5511.84 samples/sec Loss 5.0543 LearningRate 0.0679 Epoch: 10 Global Step: 113640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:32:36,827-Speed 5515.38 samples/sec Loss 5.1030 LearningRate 0.0679 
Epoch: 10 Global Step: 113650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:32:44,303-Speed 5479.64 samples/sec Loss 5.0954 LearningRate 0.0679 Epoch: 10 Global Step: 113660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:32:51,803-Speed 5462.74 samples/sec Loss 5.0429 LearningRate 0.0679 Epoch: 10 Global Step: 113670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:32:59,312-Speed 5455.35 samples/sec Loss 5.0618 LearningRate 0.0679 Epoch: 10 Global Step: 113680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:33:06,808-Speed 5465.23 samples/sec Loss 5.0372 LearningRate 0.0678 Epoch: 10 Global Step: 113690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:33:14,278-Speed 5483.62 samples/sec Loss 5.0718 LearningRate 0.0678 Epoch: 10 Global Step: 113700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:33:21,732-Speed 5496.43 samples/sec Loss 5.0774 LearningRate 0.0678 Epoch: 10 Global Step: 113710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:33:29,239-Speed 5456.84 samples/sec Loss 5.0581 LearningRate 0.0678 Epoch: 10 Global Step: 113720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:33:36,720-Speed 5476.10 samples/sec Loss 5.0113 LearningRate 0.0678 Epoch: 10 Global Step: 113730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:33:44,163-Speed 5503.26 samples/sec Loss 5.0346 LearningRate 0.0678 Epoch: 10 Global Step: 113740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:33:51,608-Speed 5502.83 samples/sec Loss 5.0679 LearningRate 0.0678 Epoch: 10 Global Step: 113750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:33:59,046-Speed 5508.22 samples/sec Loss 5.0295 LearningRate 0.0677 Epoch: 10 Global Step: 113760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:34:06,468-Speed 5518.76 samples/sec Loss 5.0509 LearningRate 0.0677 Epoch: 10 Global Step: 113770 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:34:13,928-Speed 5491.97 samples/sec Loss 4.9956 LearningRate 0.0677 Epoch: 10 Global Step: 113780 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:34:21,421-Speed 5467.13 samples/sec Loss 5.0406 LearningRate 0.0677 Epoch: 10 Global Step: 113790 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:34:28,951-Speed 5440.61 samples/sec Loss 5.0565 LearningRate 0.0677 Epoch: 10 Global Step: 113800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:34:36,441-Speed 5468.68 samples/sec Loss 5.0633 LearningRate 0.0677 Epoch: 10 Global Step: 113810 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:34:43,962-Speed 5447.14 samples/sec Loss 5.0235 LearningRate 0.0677 Epoch: 10 Global Step: 113820 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:34:51,469-Speed 5456.47 samples/sec Loss 5.0599 LearningRate 0.0676 Epoch: 10 Global Step: 113830 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:34:58,947-Speed 5478.56 samples/sec Loss 5.0767 LearningRate 0.0676 Epoch: 10 Global Step: 113840 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:35:06,481-Speed 5437.52 samples/sec Loss 5.0373 LearningRate 0.0676 Epoch: 10 Global Step: 113850 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:35:13,961-Speed 5476.16 samples/sec Loss 5.0630 LearningRate 0.0676 Epoch: 10 Global Step: 113860 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:35:21,438-Speed 5478.81 samples/sec Loss 5.0428 LearningRate 0.0676 Epoch: 10 Global Step: 113870 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:35:28,911-Speed 5482.09 samples/sec Loss 5.0420 LearningRate 0.0676 Epoch: 10 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:35:36,405-Speed 5466.60 samples/sec Loss 5.0605 LearningRate 0.0676 Epoch: 10 Global Step: 113890 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-01-08 20:35:43,946-Speed 5432.44 samples/sec Loss 5.0743 LearningRate 0.0675 Epoch: 10 Global Step: 113900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:35:51,446-Speed 5461.72 samples/sec Loss 5.0886 LearningRate 0.0675 Epoch: 10 Global Step: 113910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:35:58,922-Speed 5479.90 samples/sec Loss 5.0298 LearningRate 0.0675 Epoch: 10 Global Step: 113920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:06,414-Speed 5468.41 samples/sec Loss 5.0092 LearningRate 0.0675 Epoch: 10 Global Step: 113930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:14,081-Speed 5342.79 samples/sec Loss 5.0924 LearningRate 0.0675 Epoch: 10 Global Step: 113940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:21,788-Speed 5315.33 samples/sec Loss 5.0390 LearningRate 0.0675 Epoch: 10 Global Step: 113950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:29,245-Speed 5493.86 samples/sec Loss 5.0876 LearningRate 0.0675 Epoch: 10 Global Step: 113960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:36,693-Speed 5500.49 samples/sec Loss 5.0655 LearningRate 0.0674 Epoch: 10 Global Step: 113970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:44,081-Speed 5544.12 samples/sec Loss 5.0217 LearningRate 0.0674 Epoch: 10 Global Step: 113980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:36:51,529-Speed 5500.30 samples/sec Loss 5.0927 LearningRate 0.0674 Epoch: 10 Global Step: 113990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:36:58,909-Speed 5551.57 samples/sec Loss 5.0728 LearningRate 0.0674 Epoch: 10 Global Step: 114000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:37:42,978-[lfw][114000]XNorm: 23.642814 Training: 2022-01-08 20:37:42,979-[lfw][114000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-01-08 
20:37:42,979-[lfw][114000]Accuracy-Highest: 0.99817 Training: 2022-01-08 20:38:34,330-[cfp_fp][114000]XNorm: 22.002958 Training: 2022-01-08 20:38:34,331-[cfp_fp][114000]Accuracy-Flip: 0.99043+-0.00495 Training: 2022-01-08 20:38:34,331-[cfp_fp][114000]Accuracy-Highest: 0.99057 Training: 2022-01-08 20:39:18,603-[agedb_30][114000]XNorm: 23.458689 Training: 2022-01-08 20:39:18,604-[agedb_30][114000]Accuracy-Flip: 0.97733+-0.00943 Training: 2022-01-08 20:39:18,605-[agedb_30][114000]Accuracy-Highest: 0.97917 Training: 2022-01-08 20:39:26,034-Speed 278.41 samples/sec Loss 5.0199 LearningRate 0.0674 Epoch: 10 Global Step: 114010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:39:33,478-Speed 5504.09 samples/sec Loss 5.0805 LearningRate 0.0674 Epoch: 10 Global Step: 114020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:39:40,915-Speed 5509.11 samples/sec Loss 5.0395 LearningRate 0.0674 Epoch: 10 Global Step: 114030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:39:48,354-Speed 5507.02 samples/sec Loss 5.0449 LearningRate 0.0673 Epoch: 10 Global Step: 114040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:39:55,777-Speed 5519.91 samples/sec Loss 5.0741 LearningRate 0.0673 Epoch: 10 Global Step: 114050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:40:03,191-Speed 5525.71 samples/sec Loss 5.0447 LearningRate 0.0673 Epoch: 10 Global Step: 114060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:40:26,587-Speed 1750.83 samples/sec Loss 5.0380 LearningRate 0.0673 Epoch: 11 Global Step: 114070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:40:34,058-Speed 5483.40 samples/sec Loss 5.0296 LearningRate 0.0673 Epoch: 11 Global Step: 114080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:40:41,564-Speed 5457.78 samples/sec Loss 5.0638 LearningRate 0.0673 Epoch: 11 Global Step: 114090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-01-08 20:40:49,021-Speed 5494.10 samples/sec Loss 4.9859 LearningRate 0.0673 Epoch: 11 Global Step: 114100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:40:56,419-Speed 5537.55 samples/sec Loss 4.9813 LearningRate 0.0672 Epoch: 11 Global Step: 114110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:41:03,851-Speed 5512.01 samples/sec Loss 5.0120 LearningRate 0.0672 Epoch: 11 Global Step: 114120 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:11,285-Speed 5509.91 samples/sec Loss 5.0429 LearningRate 0.0672 Epoch: 11 Global Step: 114130 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:18,741-Speed 5495.06 samples/sec Loss 4.9895 LearningRate 0.0672 Epoch: 11 Global Step: 114140 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:26,140-Speed 5536.43 samples/sec Loss 5.0340 LearningRate 0.0672 Epoch: 11 Global Step: 114150 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:33,605-Speed 5488.04 samples/sec Loss 5.0652 LearningRate 0.0672 Epoch: 11 Global Step: 114160 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:41,402-Speed 5253.83 samples/sec Loss 5.0171 LearningRate 0.0672 Epoch: 11 Global Step: 114170 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:49,093-Speed 5325.79 samples/sec Loss 5.0094 LearningRate 0.0671 Epoch: 11 Global Step: 114180 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:41:56,840-Speed 5288.13 samples/sec Loss 5.0225 LearningRate 0.0671 Epoch: 11 Global Step: 114190 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:42:04,577-Speed 5295.15 samples/sec Loss 5.0312 LearningRate 0.0671 Epoch: 11 Global Step: 114200 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:42:12,341-Speed 5276.06 samples/sec Loss 4.9800 LearningRate 0.0671 Epoch: 11 Global Step: 114210 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:42:20,070-Speed 
5300.75 samples/sec Loss 5.0054 LearningRate 0.0671 Epoch: 11 Global Step: 114220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:42:27,824-Speed 5282.72 samples/sec Loss 4.9886 LearningRate 0.0671 Epoch: 11 Global Step: 114230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:42:35,536-Speed 5311.71 samples/sec Loss 4.9686 LearningRate 0.0671 Epoch: 11 Global Step: 114240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:42:43,282-Speed 5288.62 samples/sec Loss 4.9683 LearningRate 0.0670 Epoch: 11 Global Step: 114250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:42:51,095-Speed 5243.69 samples/sec Loss 5.0407 LearningRate 0.0670 Epoch: 11 Global Step: 114260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:42:58,789-Speed 5324.02 samples/sec Loss 5.0153 LearningRate 0.0670 Epoch: 11 Global Step: 114270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:43:06,456-Speed 5343.24 samples/sec Loss 5.0232 LearningRate 0.0670 Epoch: 11 Global Step: 114280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:43:14,098-Speed 5360.25 samples/sec Loss 5.0189 LearningRate 0.0670 Epoch: 11 Global Step: 114290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:43:21,799-Speed 5319.50 samples/sec Loss 5.0222 LearningRate 0.0670 Epoch: 11 Global Step: 114300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:43:29,434-Speed 5365.54 samples/sec Loss 5.0263 LearningRate 0.0670 Epoch: 11 Global Step: 114310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:43:36,845-Speed 5527.70 samples/sec Loss 4.9916 LearningRate 0.0669 Epoch: 11 Global Step: 114320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:43:44,254-Speed 5528.76 samples/sec Loss 5.0307 LearningRate 0.0669 Epoch: 11 Global Step: 114330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:43:51,682-Speed 5515.43 samples/sec Loss 
5.0360 LearningRate 0.0669 Epoch: 11 Global Step: 114340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:43:59,103-Speed 5520.17 samples/sec Loss 4.9866 LearningRate 0.0669 Epoch: 11 Global Step: 114350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:06,596-Speed 5467.32 samples/sec Loss 4.9827 LearningRate 0.0669 Epoch: 11 Global Step: 114360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:14,066-Speed 5484.01 samples/sec Loss 4.9961 LearningRate 0.0669 Epoch: 11 Global Step: 114370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:21,504-Speed 5507.79 samples/sec Loss 5.0203 LearningRate 0.0669 Epoch: 11 Global Step: 114380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:29,025-Speed 5446.53 samples/sec Loss 5.0082 LearningRate 0.0668 Epoch: 11 Global Step: 114390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:36,476-Speed 5498.53 samples/sec Loss 5.0131 LearningRate 0.0668 Epoch: 11 Global Step: 114400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:43,900-Speed 5517.44 samples/sec Loss 5.0158 LearningRate 0.0668 Epoch: 11 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:51,342-Speed 5504.61 samples/sec Loss 4.9578 LearningRate 0.0668 Epoch: 11 Global Step: 114420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:44:58,788-Speed 5502.28 samples/sec Loss 4.9551 LearningRate 0.0668 Epoch: 11 Global Step: 114430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:45:06,273-Speed 5472.70 samples/sec Loss 4.9862 LearningRate 0.0668 Epoch: 11 Global Step: 114440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:45:13,737-Speed 5488.57 samples/sec Loss 4.9962 LearningRate 0.0668 Epoch: 11 Global Step: 114450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:45:21,328-Speed 5396.73 samples/sec Loss 4.9906 LearningRate 0.0667 
Epoch: 11 Global Step: 114460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:45:28,742-Speed 5525.29 samples/sec Loss 5.0096 LearningRate 0.0667 Epoch: 11 Global Step: 114470 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:45:36,328-Speed 5400.47 samples/sec Loss 4.9716 LearningRate 0.0667 Epoch: 11 Global Step: 114480 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:45:43,964-Speed 5364.96 samples/sec Loss 5.0139 LearningRate 0.0667 Epoch: 11 Global Step: 114490 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:45:51,553-Speed 5397.21 samples/sec Loss 4.9999 LearningRate 0.0667 Epoch: 11 Global Step: 114500 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:45:59,030-Speed 5479.30 samples/sec Loss 5.0246 LearningRate 0.0667 Epoch: 11 Global Step: 114510 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:46:06,589-Speed 5419.71 samples/sec Loss 5.0739 LearningRate 0.0666 Epoch: 11 Global Step: 114520 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:46:14,015-Speed 5516.12 samples/sec Loss 5.0386 LearningRate 0.0666 Epoch: 11 Global Step: 114530 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:46:21,509-Speed 5466.49 samples/sec Loss 5.0072 LearningRate 0.0666 Epoch: 11 Global Step: 114540 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:46:28,948-Speed 5506.84 samples/sec Loss 4.9597 LearningRate 0.0666 Epoch: 11 Global Step: 114550 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:46:36,440-Speed 5468.21 samples/sec Loss 4.9674 LearningRate 0.0666 Epoch: 11 Global Step: 114560 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:46:43,928-Speed 5470.13 samples/sec Loss 5.0168 LearningRate 0.0666 Epoch: 11 Global Step: 114570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:46:51,425-Speed 5464.25 samples/sec Loss 5.0354 LearningRate 0.0666 Epoch: 11 Global Step: 114580 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:46:58,979-Speed 5423.55 samples/sec Loss 5.0103 LearningRate 0.0665 Epoch: 11 Global Step: 114590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:47:06,417-Speed 5507.81 samples/sec Loss 4.9970 LearningRate 0.0665 Epoch: 11 Global Step: 114600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:47:13,842-Speed 5516.74 samples/sec Loss 4.9442 LearningRate 0.0665 Epoch: 11 Global Step: 114610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:47:21,326-Speed 5474.14 samples/sec Loss 4.9597 LearningRate 0.0665 Epoch: 11 Global Step: 114620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:47:28,800-Speed 5481.11 samples/sec Loss 5.0306 LearningRate 0.0665 Epoch: 11 Global Step: 114630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:47:36,256-Speed 5493.82 samples/sec Loss 4.9958 LearningRate 0.0665 Epoch: 11 Global Step: 114640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:47:43,720-Speed 5489.11 samples/sec Loss 4.9818 LearningRate 0.0665 Epoch: 11 Global Step: 114650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:47:51,169-Speed 5499.19 samples/sec Loss 5.0118 LearningRate 0.0664 Epoch: 11 Global Step: 114660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:47:58,612-Speed 5504.02 samples/sec Loss 4.9889 LearningRate 0.0664 Epoch: 11 Global Step: 114670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:06,170-Speed 5419.77 samples/sec Loss 5.0071 LearningRate 0.0664 Epoch: 11 Global Step: 114680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:13,746-Speed 5407.64 samples/sec Loss 4.9976 LearningRate 0.0664 Epoch: 11 Global Step: 114690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:21,280-Speed 5437.38 samples/sec Loss 4.9682 LearningRate 0.0664 Epoch: 11 Global Step: 114700 Fp16 Grad Scale: 32768 
Required: 21 hours Training: 2022-01-08 20:48:28,711-Speed 5512.82 samples/sec Loss 5.0013 LearningRate 0.0664 Epoch: 11 Global Step: 114710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:36,304-Speed 5394.72 samples/sec Loss 4.9767 LearningRate 0.0664 Epoch: 11 Global Step: 114720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:43,825-Speed 5446.74 samples/sec Loss 4.9683 LearningRate 0.0663 Epoch: 11 Global Step: 114730 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:51,315-Speed 5469.49 samples/sec Loss 5.0183 LearningRate 0.0663 Epoch: 11 Global Step: 114740 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:48:58,844-Speed 5441.23 samples/sec Loss 5.0169 LearningRate 0.0663 Epoch: 11 Global Step: 114750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:06,361-Speed 5450.10 samples/sec Loss 4.9905 LearningRate 0.0663 Epoch: 11 Global Step: 114760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:13,894-Speed 5437.65 samples/sec Loss 4.9788 LearningRate 0.0663 Epoch: 11 Global Step: 114770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:21,458-Speed 5415.98 samples/sec Loss 4.9860 LearningRate 0.0663 Epoch: 11 Global Step: 114780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:28,942-Speed 5473.40 samples/sec Loss 4.9623 LearningRate 0.0663 Epoch: 11 Global Step: 114790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:36,385-Speed 5504.37 samples/sec Loss 5.0557 LearningRate 0.0662 Epoch: 11 Global Step: 114800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:43,799-Speed 5524.81 samples/sec Loss 5.0233 LearningRate 0.0662 Epoch: 11 Global Step: 114810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:49:51,287-Speed 5471.42 samples/sec Loss 4.9641 LearningRate 0.0662 Epoch: 11 Global Step: 114820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-01-08 20:49:58,769-Speed 5475.31 samples/sec Loss 4.9581 LearningRate 0.0662 Epoch: 11 Global Step: 114830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:50:06,204-Speed 5509.67 samples/sec Loss 4.9797 LearningRate 0.0662 Epoch: 11 Global Step: 114840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:50:13,692-Speed 5470.59 samples/sec Loss 5.0229 LearningRate 0.0662 Epoch: 11 Global Step: 114850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:50:21,200-Speed 5456.25 samples/sec Loss 4.9704 LearningRate 0.0662 Epoch: 11 Global Step: 114860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:50:28,736-Speed 5436.09 samples/sec Loss 5.0139 LearningRate 0.0661 Epoch: 11 Global Step: 114870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:50:36,194-Speed 5493.32 samples/sec Loss 5.0326 LearningRate 0.0661 Epoch: 11 Global Step: 114880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:50:43,627-Speed 5511.14 samples/sec Loss 5.0253 LearningRate 0.0661 Epoch: 11 Global Step: 114890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:50:51,145-Speed 5448.84 samples/sec Loss 4.9310 LearningRate 0.0661 Epoch: 11 Global Step: 114900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:50:58,654-Speed 5455.65 samples/sec Loss 4.9672 LearningRate 0.0661 Epoch: 11 Global Step: 114910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:51:06,103-Speed 5498.93 samples/sec Loss 5.0189 LearningRate 0.0661 Epoch: 11 Global Step: 114920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:51:13,724-Speed 5375.69 samples/sec Loss 4.9731 LearningRate 0.0661 Epoch: 11 Global Step: 114930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:51:21,286-Speed 5417.46 samples/sec Loss 4.9492 LearningRate 0.0660 Epoch: 11 Global Step: 114940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 
20:51:28,787-Speed 5460.63 samples/sec Loss 4.9486 LearningRate 0.0660 Epoch: 11 Global Step: 114950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:51:36,265-Speed 5478.32 samples/sec Loss 5.0150 LearningRate 0.0660 Epoch: 11 Global Step: 114960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:51:43,779-Speed 5452.16 samples/sec Loss 5.0035 LearningRate 0.0660 Epoch: 11 Global Step: 114970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:51:51,269-Speed 5468.90 samples/sec Loss 4.9817 LearningRate 0.0660 Epoch: 11 Global Step: 114980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:51:58,852-Speed 5402.71 samples/sec Loss 4.9543 LearningRate 0.0660 Epoch: 11 Global Step: 114990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:52:06,350-Speed 5462.88 samples/sec Loss 5.0105 LearningRate 0.0660 Epoch: 11 Global Step: 115000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:52:13,861-Speed 5454.87 samples/sec Loss 4.9634 LearningRate 0.0659 Epoch: 11 Global Step: 115010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:52:21,326-Speed 5487.02 samples/sec Loss 4.9829 LearningRate 0.0659 Epoch: 11 Global Step: 115020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:52:28,760-Speed 5510.54 samples/sec Loss 5.0173 LearningRate 0.0659 Epoch: 11 Global Step: 115030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:52:36,211-Speed 5498.69 samples/sec Loss 4.9668 LearningRate 0.0659 Epoch: 11 Global Step: 115040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:52:43,587-Speed 5553.66 samples/sec Loss 4.9903 LearningRate 0.0659 Epoch: 11 Global Step: 115050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:52:51,070-Speed 5474.29 samples/sec Loss 4.9878 LearningRate 0.0659 Epoch: 11 Global Step: 115060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:52:58,574-Speed 5458.87 
samples/sec Loss 4.9830 LearningRate 0.0659 Epoch: 11 Global Step: 115070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:06,016-Speed 5504.95 samples/sec Loss 4.9111 LearningRate 0.0658 Epoch: 11 Global Step: 115080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:13,422-Speed 5531.50 samples/sec Loss 5.0044 LearningRate 0.0658 Epoch: 11 Global Step: 115090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:20,892-Speed 5483.70 samples/sec Loss 5.0019 LearningRate 0.0658 Epoch: 11 Global Step: 115100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:28,379-Speed 5471.79 samples/sec Loss 5.0080 LearningRate 0.0658 Epoch: 11 Global Step: 115110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:53:35,850-Speed 5483.02 samples/sec Loss 5.0452 LearningRate 0.0658 Epoch: 11 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:43,283-Speed 5511.62 samples/sec Loss 4.9887 LearningRate 0.0658 Epoch: 11 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:50,766-Speed 5474.23 samples/sec Loss 4.9519 LearningRate 0.0658 Epoch: 11 Global Step: 115140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:53:58,214-Speed 5500.39 samples/sec Loss 4.9795 LearningRate 0.0657 Epoch: 11 Global Step: 115150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:54:05,629-Speed 5524.71 samples/sec Loss 4.9970 LearningRate 0.0657 Epoch: 11 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:54:13,068-Speed 5506.76 samples/sec Loss 4.9740 LearningRate 0.0657 Epoch: 11 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:54:20,571-Speed 5460.10 samples/sec Loss 5.0148 LearningRate 0.0657 Epoch: 11 Global Step: 115180 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:54:28,026-Speed 5495.04 samples/sec Loss 4.9510 
LearningRate 0.0657 Epoch: 11 Global Step: 115190 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:54:35,499-Speed 5481.66 samples/sec Loss 4.9980 LearningRate 0.0657 Epoch: 11 Global Step: 115200 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:54:42,942-Speed 5503.80 samples/sec Loss 4.9937 LearningRate 0.0657 Epoch: 11 Global Step: 115210 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:54:50,457-Speed 5451.63 samples/sec Loss 5.0076 LearningRate 0.0656 Epoch: 11 Global Step: 115220 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:54:57,897-Speed 5505.92 samples/sec Loss 4.9858 LearningRate 0.0656 Epoch: 11 Global Step: 115230 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:55:05,372-Speed 5480.11 samples/sec Loss 4.9530 LearningRate 0.0656 Epoch: 11 Global Step: 115240 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:55:12,798-Speed 5517.04 samples/sec Loss 4.9736 LearningRate 0.0656 Epoch: 11 Global Step: 115250 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:55:20,292-Speed 5466.18 samples/sec Loss 4.9651 LearningRate 0.0656 Epoch: 11 Global Step: 115260 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:55:27,751-Speed 5491.72 samples/sec Loss 4.9436 LearningRate 0.0656 Epoch: 11 Global Step: 115270 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 20:55:35,190-Speed 5507.32 samples/sec Loss 4.9528 LearningRate 0.0656 Epoch: 11 Global Step: 115280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:55:42,666-Speed 5479.57 samples/sec Loss 4.9605 LearningRate 0.0655 Epoch: 11 Global Step: 115290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:55:50,111-Speed 5502.70 samples/sec Loss 5.0041 LearningRate 0.0655 Epoch: 11 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:55:57,535-Speed 5517.43 samples/sec Loss 4.9596 LearningRate 0.0655 Epoch: 11 
Global Step: 115310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:05,036-Speed 5461.48 samples/sec Loss 4.9898 LearningRate 0.0655 Epoch: 11 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:12,526-Speed 5469.49 samples/sec Loss 4.9739 LearningRate 0.0655 Epoch: 11 Global Step: 115330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:20,004-Speed 5477.90 samples/sec Loss 4.9493 LearningRate 0.0655 Epoch: 11 Global Step: 115340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:27,476-Speed 5482.47 samples/sec Loss 4.9565 LearningRate 0.0655 Epoch: 11 Global Step: 115350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:34,960-Speed 5473.75 samples/sec Loss 4.9533 LearningRate 0.0654 Epoch: 11 Global Step: 115360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:42,519-Speed 5419.82 samples/sec Loss 4.9114 LearningRate 0.0654 Epoch: 11 Global Step: 115370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:56:50,071-Speed 5423.97 samples/sec Loss 4.9720 LearningRate 0.0654 Epoch: 11 Global Step: 115380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:56:57,513-Speed 5504.62 samples/sec Loss 4.9400 LearningRate 0.0654 Epoch: 11 Global Step: 115390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:57:04,973-Speed 5490.91 samples/sec Loss 4.9152 LearningRate 0.0654 Epoch: 11 Global Step: 115400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:12,402-Speed 5514.80 samples/sec Loss 4.8933 LearningRate 0.0654 Epoch: 11 Global Step: 115410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:19,858-Speed 5494.47 samples/sec Loss 4.9304 LearningRate 0.0654 Epoch: 11 Global Step: 115420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:27,299-Speed 5505.38 samples/sec Loss 4.9344 LearningRate 0.0653 Epoch: 11 Global Step: 115430 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:34,816-Speed 5449.85 samples/sec Loss 4.9356 LearningRate 0.0653 Epoch: 11 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:42,290-Speed 5480.95 samples/sec Loss 4.9735 LearningRate 0.0653 Epoch: 11 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:49,800-Speed 5454.58 samples/sec Loss 4.9698 LearningRate 0.0653 Epoch: 11 Global Step: 115460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:57:57,368-Speed 5413.38 samples/sec Loss 4.9515 LearningRate 0.0653 Epoch: 11 Global Step: 115470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:04,884-Speed 5450.02 samples/sec Loss 4.9664 LearningRate 0.0653 Epoch: 11 Global Step: 115480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:12,349-Speed 5487.97 samples/sec Loss 4.9845 LearningRate 0.0653 Epoch: 11 Global Step: 115490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:19,832-Speed 5474.49 samples/sec Loss 4.9425 LearningRate 0.0653 Epoch: 11 Global Step: 115500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:58:27,256-Speed 5518.22 samples/sec Loss 4.9635 LearningRate 0.0652 Epoch: 11 Global Step: 115510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:34,744-Speed 5470.55 samples/sec Loss 4.8839 LearningRate 0.0652 Epoch: 11 Global Step: 115520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:42,186-Speed 5504.70 samples/sec Loss 4.9965 LearningRate 0.0652 Epoch: 11 Global Step: 115530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:49,649-Speed 5488.87 samples/sec Loss 4.9946 LearningRate 0.0652 Epoch: 11 Global Step: 115540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:58:57,140-Speed 5468.68 samples/sec Loss 4.9078 LearningRate 0.0652 Epoch: 11 Global Step: 115550 Fp16 Grad Scale: 65536 Required: 21 
hours Training: 2022-01-08 20:59:04,620-Speed 5476.58 samples/sec Loss 4.9597 LearningRate 0.0652 Epoch: 11 Global Step: 115560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:59:12,107-Speed 5471.81 samples/sec Loss 4.9413 LearningRate 0.0652 Epoch: 11 Global Step: 115570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:59:19,684-Speed 5406.27 samples/sec Loss 4.9427 LearningRate 0.0651 Epoch: 11 Global Step: 115580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:59:27,205-Speed 5447.31 samples/sec Loss 4.9425 LearningRate 0.0651 Epoch: 11 Global Step: 115590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:59:34,613-Speed 5529.49 samples/sec Loss 4.9179 LearningRate 0.0651 Epoch: 11 Global Step: 115600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:59:42,062-Speed 5499.27 samples/sec Loss 4.9299 LearningRate 0.0651 Epoch: 11 Global Step: 115610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 20:59:49,505-Speed 5504.60 samples/sec Loss 4.9656 LearningRate 0.0651 Epoch: 11 Global Step: 115620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 20:59:56,965-Speed 5491.01 samples/sec Loss 4.9835 LearningRate 0.0651 Epoch: 11 Global Step: 115630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:00:04,451-Speed 5472.03 samples/sec Loss 4.9165 LearningRate 0.0651 Epoch: 11 Global Step: 115640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:00:11,896-Speed 5502.65 samples/sec Loss 4.9345 LearningRate 0.0650 Epoch: 11 Global Step: 115650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:00:19,337-Speed 5505.07 samples/sec Loss 4.9567 LearningRate 0.0650 Epoch: 11 Global Step: 115660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:00:26,865-Speed 5442.30 samples/sec Loss 4.9206 LearningRate 0.0650 Epoch: 11 Global Step: 115670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 
21:00:34,394-Speed 5440.43 samples/sec Loss 4.9978 LearningRate 0.0650 Epoch: 11 Global Step: 115680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:00:41,970-Speed 5407.81 samples/sec Loss 4.9260 LearningRate 0.0650 Epoch: 11 Global Step: 115690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:00:49,443-Speed 5481.78 samples/sec Loss 4.9486 LearningRate 0.0650 Epoch: 11 Global Step: 115700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:00:56,903-Speed 5490.81 samples/sec Loss 4.9743 LearningRate 0.0650 Epoch: 11 Global Step: 115710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:01:04,391-Speed 5471.10 samples/sec Loss 4.8868 LearningRate 0.0649 Epoch: 11 Global Step: 115720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:01:11,812-Speed 5520.07 samples/sec Loss 4.9022 LearningRate 0.0649 Epoch: 11 Global Step: 115730 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:01:19,256-Speed 5503.05 samples/sec Loss 4.9634 LearningRate 0.0649 Epoch: 11 Global Step: 115740 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:01:26,762-Speed 5457.50 samples/sec Loss 4.8932 LearningRate 0.0649 Epoch: 11 Global Step: 115750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:01:34,280-Speed 5449.40 samples/sec Loss 4.9692 LearningRate 0.0649 Epoch: 11 Global Step: 115760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:01:41,790-Speed 5454.35 samples/sec Loss 4.8988 LearningRate 0.0649 Epoch: 11 Global Step: 115770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:01:49,337-Speed 5428.37 samples/sec Loss 4.9084 LearningRate 0.0649 Epoch: 11 Global Step: 115780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:01:56,877-Speed 5433.24 samples/sec Loss 4.9437 LearningRate 0.0648 Epoch: 11 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:02:04,483-Speed 5385.87 
samples/sec Loss 4.9539 LearningRate 0.0648 Epoch: 11 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:02:11,940-Speed 5492.78 samples/sec Loss 4.9816 LearningRate 0.0648 Epoch: 11 Global Step: 115810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:02:19,349-Speed 5529.40 samples/sec Loss 4.9540 LearningRate 0.0648 Epoch: 11 Global Step: 115820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:02:26,836-Speed 5471.51 samples/sec Loss 4.9819 LearningRate 0.0648 Epoch: 11 Global Step: 115830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:02:34,313-Speed 5479.28 samples/sec Loss 4.8895 LearningRate 0.0648 Epoch: 11 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:02:41,814-Speed 5461.33 samples/sec Loss 4.9664 LearningRate 0.0648 Epoch: 11 Global Step: 115850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:02:49,292-Speed 5478.06 samples/sec Loss 4.9157 LearningRate 0.0647 Epoch: 11 Global Step: 115860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:02:56,759-Speed 5486.20 samples/sec Loss 4.9460 LearningRate 0.0647 Epoch: 11 Global Step: 115870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:03:04,308-Speed 5426.33 samples/sec Loss 4.9830 LearningRate 0.0647 Epoch: 11 Global Step: 115880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:03:11,831-Speed 5445.47 samples/sec Loss 4.9563 LearningRate 0.0647 Epoch: 11 Global Step: 115890 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:03:19,448-Speed 5378.08 samples/sec Loss 4.9340 LearningRate 0.0647 Epoch: 11 Global Step: 115900 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:03:27,069-Speed 5375.64 samples/sec Loss 4.9565 LearningRate 0.0647 Epoch: 11 Global Step: 115910 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:03:34,523-Speed 5495.84 samples/sec Loss 4.9310 
LearningRate 0.0647 Epoch: 11 Global Step: 115920 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:03:41,988-Speed 5487.20 samples/sec Loss 4.9199 LearningRate 0.0646 Epoch: 11 Global Step: 115930 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:03:49,473-Speed 5473.14 samples/sec Loss 4.8589 LearningRate 0.0646 Epoch: 11 Global Step: 115940 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:03:57,038-Speed 5414.77 samples/sec Loss 4.8849 LearningRate 0.0646 Epoch: 11 Global Step: 115950 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:04:04,534-Speed 5465.39 samples/sec Loss 4.9086 LearningRate 0.0646 Epoch: 11 Global Step: 115960 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:04:12,044-Speed 5454.89 samples/sec Loss 4.9009 LearningRate 0.0646 Epoch: 11 Global Step: 115970 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:04:19,525-Speed 5475.72 samples/sec Loss 4.9285 LearningRate 0.0646 Epoch: 11 Global Step: 115980 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:04:27,071-Speed 5429.14 samples/sec Loss 4.9422 LearningRate 0.0646 Epoch: 11 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:04:34,656-Speed 5401.01 samples/sec Loss 4.8840 LearningRate 0.0645 Epoch: 11 Global Step: 116000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:05:18,372-[lfw][116000]XNorm: 22.277321 Training: 2022-01-08 21:05:18,372-[lfw][116000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-01-08 21:05:18,373-[lfw][116000]Accuracy-Highest: 0.99817 Training: 2022-01-08 21:06:09,474-[cfp_fp][116000]XNorm: 20.367996 Training: 2022-01-08 21:06:09,474-[cfp_fp][116000]Accuracy-Flip: 0.98986+-0.00621 Training: 2022-01-08 21:06:09,475-[cfp_fp][116000]Accuracy-Highest: 0.99057 Training: 2022-01-08 21:06:53,770-[agedb_30][116000]XNorm: 22.038671 Training: 2022-01-08 21:06:53,771-[agedb_30][116000]Accuracy-Flip: 0.97550+-0.00796 
Training: 2022-01-08 21:06:53,772-[agedb_30][116000]Accuracy-Highest: 0.97917 Training: 2022-01-08 21:07:00,982-Speed 279.92 samples/sec Loss 4.9729 LearningRate 0.0645 Epoch: 11 Global Step: 116010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:08,565-Speed 5401.90 samples/sec Loss 4.9270 LearningRate 0.0645 Epoch: 11 Global Step: 116020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:16,041-Speed 5479.64 samples/sec Loss 4.8882 LearningRate 0.0645 Epoch: 11 Global Step: 116030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:23,585-Speed 5431.38 samples/sec Loss 4.9377 LearningRate 0.0645 Epoch: 11 Global Step: 116040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:31,070-Speed 5473.61 samples/sec Loss 4.9531 LearningRate 0.0645 Epoch: 11 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:38,630-Speed 5419.76 samples/sec Loss 4.9500 LearningRate 0.0645 Epoch: 11 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:46,104-Speed 5480.89 samples/sec Loss 4.9379 LearningRate 0.0644 Epoch: 11 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:07:53,563-Speed 5492.59 samples/sec Loss 4.9239 LearningRate 0.0644 Epoch: 11 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:08:01,055-Speed 5467.57 samples/sec Loss 4.9721 LearningRate 0.0644 Epoch: 11 Global Step: 116090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:08,550-Speed 5466.04 samples/sec Loss 4.9267 LearningRate 0.0644 Epoch: 11 Global Step: 116100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:16,040-Speed 5469.67 samples/sec Loss 4.9075 LearningRate 0.0644 Epoch: 11 Global Step: 116110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:23,479-Speed 5506.88 samples/sec Loss 4.8758 LearningRate 0.0644 Epoch: 11 Global Step: 
116120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:30,931-Speed 5497.08 samples/sec Loss 4.9073 LearningRate 0.0644 Epoch: 11 Global Step: 116130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:38,474-Speed 5430.85 samples/sec Loss 4.9308 LearningRate 0.0643 Epoch: 11 Global Step: 116140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:45,948-Speed 5480.70 samples/sec Loss 4.9161 LearningRate 0.0643 Epoch: 11 Global Step: 116150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:08:53,453-Speed 5463.03 samples/sec Loss 4.8775 LearningRate 0.0643 Epoch: 11 Global Step: 116160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:09:00,995-Speed 5431.26 samples/sec Loss 4.9036 LearningRate 0.0643 Epoch: 11 Global Step: 116170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:09:08,456-Speed 5491.00 samples/sec Loss 4.8871 LearningRate 0.0643 Epoch: 11 Global Step: 116180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:09:15,894-Speed 5506.74 samples/sec Loss 4.8960 LearningRate 0.0643 Epoch: 11 Global Step: 116190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:09:23,358-Speed 5489.03 samples/sec Loss 4.9077 LearningRate 0.0643 Epoch: 11 Global Step: 116200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:09:30,801-Speed 5503.76 samples/sec Loss 4.9072 LearningRate 0.0642 Epoch: 11 Global Step: 116210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:09:38,261-Speed 5491.33 samples/sec Loss 4.8598 LearningRate 0.0642 Epoch: 11 Global Step: 116220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:09:45,708-Speed 5500.70 samples/sec Loss 4.8525 LearningRate 0.0642 Epoch: 11 Global Step: 116230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:09:53,246-Speed 5434.75 samples/sec Loss 4.8893 LearningRate 0.0642 Epoch: 11 Global Step: 116240 Fp16 Grad Scale: 
65536 Required: 21 hours Training: 2022-01-08 21:10:00,696-Speed 5499.27 samples/sec Loss 4.9590 LearningRate 0.0642 Epoch: 11 Global Step: 116250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:10:08,221-Speed 5443.86 samples/sec Loss 4.8654 LearningRate 0.0642 Epoch: 11 Global Step: 116260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:10:15,688-Speed 5486.34 samples/sec Loss 4.8911 LearningRate 0.0642 Epoch: 11 Global Step: 116270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:10:23,188-Speed 5461.22 samples/sec Loss 4.9235 LearningRate 0.0641 Epoch: 11 Global Step: 116280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:10:30,634-Speed 5502.36 samples/sec Loss 4.9029 LearningRate 0.0641 Epoch: 11 Global Step: 116290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:10:38,101-Speed 5486.61 samples/sec Loss 4.9118 LearningRate 0.0641 Epoch: 11 Global Step: 116300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:10:45,555-Speed 5495.21 samples/sec Loss 4.8929 LearningRate 0.0641 Epoch: 11 Global Step: 116310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:10:53,135-Speed 5404.77 samples/sec Loss 4.9213 LearningRate 0.0641 Epoch: 11 Global Step: 116320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:11:00,666-Speed 5439.70 samples/sec Loss 4.8730 LearningRate 0.0641 Epoch: 11 Global Step: 116330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:11:08,194-Speed 5441.69 samples/sec Loss 4.9035 LearningRate 0.0641 Epoch: 11 Global Step: 116340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:11:15,794-Speed 5390.17 samples/sec Loss 4.9561 LearningRate 0.0640 Epoch: 11 Global Step: 116350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:11:23,242-Speed 5499.76 samples/sec Loss 4.9497 LearningRate 0.0640 Epoch: 11 Global Step: 116360 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-01-08 21:11:30,767-Speed 5444.57 samples/sec Loss 4.9511 LearningRate 0.0640 Epoch: 11 Global Step: 116370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:11:38,215-Speed 5499.66 samples/sec Loss 4.9061 LearningRate 0.0640 Epoch: 11 Global Step: 116380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:11:45,743-Speed 5441.93 samples/sec Loss 4.9431 LearningRate 0.0640 Epoch: 11 Global Step: 116390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:11:53,237-Speed 5466.78 samples/sec Loss 4.9059 LearningRate 0.0640 Epoch: 11 Global Step: 116400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:00,735-Speed 5463.31 samples/sec Loss 4.8964 LearningRate 0.0640 Epoch: 11 Global Step: 116410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:08,290-Speed 5422.16 samples/sec Loss 4.9514 LearningRate 0.0640 Epoch: 11 Global Step: 116420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:15,812-Speed 5446.35 samples/sec Loss 4.8788 LearningRate 0.0639 Epoch: 11 Global Step: 116430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:23,255-Speed 5503.77 samples/sec Loss 4.9060 LearningRate 0.0639 Epoch: 11 Global Step: 116440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:30,730-Speed 5480.14 samples/sec Loss 4.9326 LearningRate 0.0639 Epoch: 11 Global Step: 116450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:38,258-Speed 5442.43 samples/sec Loss 4.9055 LearningRate 0.0639 Epoch: 11 Global Step: 116460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:45,759-Speed 5460.44 samples/sec Loss 4.9270 LearningRate 0.0639 Epoch: 11 Global Step: 116470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:12:53,296-Speed 5435.48 samples/sec Loss 4.9534 LearningRate 0.0639 Epoch: 11 Global Step: 116480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 
21:13:00,780-Speed 5473.94 samples/sec Loss 4.9288 LearningRate 0.0639 Epoch: 11 Global Step: 116490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:13:08,351-Speed 5412.56 samples/sec Loss 4.9029 LearningRate 0.0638 Epoch: 11 Global Step: 116500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:13:15,904-Speed 5423.22 samples/sec Loss 4.9015 LearningRate 0.0638 Epoch: 11 Global Step: 116510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:13:23,381-Speed 5479.09 samples/sec Loss 4.9191 LearningRate 0.0638 Epoch: 11 Global Step: 116520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:13:30,833-Speed 5496.86 samples/sec Loss 4.8863 LearningRate 0.0638 Epoch: 11 Global Step: 116530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:13:38,369-Speed 5436.46 samples/sec Loss 4.9091 LearningRate 0.0638 Epoch: 11 Global Step: 116540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:13:45,863-Speed 5466.82 samples/sec Loss 4.8996 LearningRate 0.0638 Epoch: 11 Global Step: 116550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:13:53,326-Speed 5488.37 samples/sec Loss 4.9121 LearningRate 0.0638 Epoch: 11 Global Step: 116560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:00,899-Speed 5409.99 samples/sec Loss 4.8932 LearningRate 0.0637 Epoch: 11 Global Step: 116570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:08,360-Speed 5490.47 samples/sec Loss 4.8736 LearningRate 0.0637 Epoch: 11 Global Step: 116580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:15,886-Speed 5442.87 samples/sec Loss 4.8877 LearningRate 0.0637 Epoch: 11 Global Step: 116590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:23,442-Speed 5421.86 samples/sec Loss 4.9216 LearningRate 0.0637 Epoch: 11 Global Step: 116600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:30,881-Speed 5507.21 
samples/sec Loss 4.9054 LearningRate 0.0637 Epoch: 11 Global Step: 116610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:38,285-Speed 5532.55 samples/sec Loss 4.8632 LearningRate 0.0637 Epoch: 11 Global Step: 116620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:45,750-Speed 5487.77 samples/sec Loss 4.8807 LearningRate 0.0637 Epoch: 11 Global Step: 116630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:14:53,203-Speed 5496.33 samples/sec Loss 4.8686 LearningRate 0.0636 Epoch: 11 Global Step: 116640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:15:00,645-Speed 5505.22 samples/sec Loss 4.8818 LearningRate 0.0636 Epoch: 11 Global Step: 116650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:15:08,054-Speed 5528.80 samples/sec Loss 4.8932 LearningRate 0.0636 Epoch: 11 Global Step: 116660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:15:15,511-Speed 5493.75 samples/sec Loss 4.8688 LearningRate 0.0636 Epoch: 11 Global Step: 116670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:15:22,961-Speed 5498.65 samples/sec Loss 4.8785 LearningRate 0.0636 Epoch: 11 Global Step: 116680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:15:30,428-Speed 5485.97 samples/sec Loss 4.9369 LearningRate 0.0636 Epoch: 11 Global Step: 116690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:15:37,903-Speed 5481.17 samples/sec Loss 4.8874 LearningRate 0.0636 Epoch: 11 Global Step: 116700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:15:45,348-Speed 5501.53 samples/sec Loss 4.9091 LearningRate 0.0635 Epoch: 11 Global Step: 116710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:15:52,783-Speed 5510.07 samples/sec Loss 4.8604 LearningRate 0.0635 Epoch: 11 Global Step: 116720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:00,237-Speed 5496.04 samples/sec Loss 4.8975 
LearningRate 0.0635 Epoch: 11 Global Step: 116730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:07,746-Speed 5455.47 samples/sec Loss 4.8324 LearningRate 0.0635 Epoch: 11 Global Step: 116740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:15,254-Speed 5456.43 samples/sec Loss 4.8814 LearningRate 0.0635 Epoch: 11 Global Step: 116750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:22,672-Speed 5522.06 samples/sec Loss 4.9180 LearningRate 0.0635 Epoch: 11 Global Step: 116760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:30,071-Speed 5536.93 samples/sec Loss 4.8511 LearningRate 0.0635 Epoch: 11 Global Step: 116770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:37,572-Speed 5461.45 samples/sec Loss 4.8140 LearningRate 0.0634 Epoch: 11 Global Step: 116780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:44,951-Speed 5551.33 samples/sec Loss 4.8897 LearningRate 0.0634 Epoch: 11 Global Step: 116790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:16:52,441-Speed 5469.74 samples/sec Loss 4.9275 LearningRate 0.0634 Epoch: 11 Global Step: 116800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:16:59,942-Speed 5461.22 samples/sec Loss 4.9040 LearningRate 0.0634 Epoch: 11 Global Step: 116810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:07,415-Speed 5481.98 samples/sec Loss 4.9297 LearningRate 0.0634 Epoch: 11 Global Step: 116820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:14,960-Speed 5429.66 samples/sec Loss 4.9058 LearningRate 0.0634 Epoch: 11 Global Step: 116830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:22,402-Speed 5504.47 samples/sec Loss 4.8632 LearningRate 0.0634 Epoch: 11 Global Step: 116840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:29,959-Speed 5421.05 samples/sec Loss 4.8569 LearningRate 0.0633 Epoch: 11 
Global Step: 116850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:37,526-Speed 5413.57 samples/sec Loss 4.8284 LearningRate 0.0633 Epoch: 11 Global Step: 116860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:45,082-Speed 5421.46 samples/sec Loss 4.8871 LearningRate 0.0633 Epoch: 11 Global Step: 116870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:17:52,623-Speed 5431.91 samples/sec Loss 4.8568 LearningRate 0.0633 Epoch: 11 Global Step: 116880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:18:00,188-Speed 5415.03 samples/sec Loss 4.9733 LearningRate 0.0633 Epoch: 11 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:18:07,710-Speed 5446.09 samples/sec Loss 4.9544 LearningRate 0.0633 Epoch: 11 Global Step: 116900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:18:15,230-Speed 5447.92 samples/sec Loss 4.8511 LearningRate 0.0633 Epoch: 11 Global Step: 116910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:18:22,752-Speed 5445.73 samples/sec Loss 4.9006 LearningRate 0.0632 Epoch: 11 Global Step: 116920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:18:30,355-Speed 5388.24 samples/sec Loss 4.8859 LearningRate 0.0632 Epoch: 11 Global Step: 116930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:18:37,857-Speed 5460.77 samples/sec Loss 4.8710 LearningRate 0.0632 Epoch: 11 Global Step: 116940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:18:45,373-Speed 5450.52 samples/sec Loss 4.8678 LearningRate 0.0632 Epoch: 11 Global Step: 116950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:18:52,821-Speed 5499.84 samples/sec Loss 4.8482 LearningRate 0.0632 Epoch: 11 Global Step: 116960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:00,242-Speed 5520.21 samples/sec Loss 4.9201 LearningRate 0.0632 Epoch: 11 Global Step: 116970 Fp16 
Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:07,751-Speed 5455.81 samples/sec Loss 4.8699 LearningRate 0.0632 Epoch: 11 Global Step: 116980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:15,212-Speed 5490.30 samples/sec Loss 4.8789 LearningRate 0.0632 Epoch: 11 Global Step: 116990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:22,682-Speed 5484.09 samples/sec Loss 4.9084 LearningRate 0.0631 Epoch: 11 Global Step: 117000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:30,144-Speed 5490.27 samples/sec Loss 4.8932 LearningRate 0.0631 Epoch: 11 Global Step: 117010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:37,635-Speed 5468.81 samples/sec Loss 4.9003 LearningRate 0.0631 Epoch: 11 Global Step: 117020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:45,088-Speed 5496.62 samples/sec Loss 4.8783 LearningRate 0.0631 Epoch: 11 Global Step: 117030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:19:52,607-Speed 5447.76 samples/sec Loss 4.8698 LearningRate 0.0631 Epoch: 11 Global Step: 117040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:20:00,124-Speed 5449.35 samples/sec Loss 4.8911 LearningRate 0.0631 Epoch: 11 Global Step: 117050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:20:07,586-Speed 5490.14 samples/sec Loss 4.8269 LearningRate 0.0631 Epoch: 11 Global Step: 117060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:20:15,130-Speed 5430.57 samples/sec Loss 4.8987 LearningRate 0.0630 Epoch: 11 Global Step: 117070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:20:22,693-Speed 5416.58 samples/sec Loss 4.8464 LearningRate 0.0630 Epoch: 11 Global Step: 117080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:20:30,181-Speed 5470.63 samples/sec Loss 4.8062 LearningRate 0.0630 Epoch: 11 Global Step: 117090 Fp16 Grad Scale: 131072 Required: 
21 hours Training: 2022-01-08 21:20:37,675-Speed 5465.83 samples/sec Loss 4.8945 LearningRate 0.0630 Epoch: 11 Global Step: 117100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:20:45,284-Speed 5384.12 samples/sec Loss 4.8137 LearningRate 0.0630 Epoch: 11 Global Step: 117110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:20:52,855-Speed 5411.48 samples/sec Loss 4.8606 LearningRate 0.0630 Epoch: 11 Global Step: 117120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:21:00,349-Speed 5465.81 samples/sec Loss 4.8195 LearningRate 0.0630 Epoch: 11 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:07,858-Speed 5455.53 samples/sec Loss 4.8928 LearningRate 0.0629 Epoch: 11 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:15,330-Speed 5482.64 samples/sec Loss 4.8668 LearningRate 0.0629 Epoch: 11 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:22,826-Speed 5465.50 samples/sec Loss 4.8149 LearningRate 0.0629 Epoch: 11 Global Step: 117160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:30,352-Speed 5442.63 samples/sec Loss 4.9023 LearningRate 0.0629 Epoch: 11 Global Step: 117170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:37,834-Speed 5475.25 samples/sec Loss 4.9190 LearningRate 0.0629 Epoch: 11 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:45,285-Speed 5498.54 samples/sec Loss 4.8276 LearningRate 0.0629 Epoch: 11 Global Step: 117190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:21:52,807-Speed 5446.15 samples/sec Loss 4.9002 LearningRate 0.0629 Epoch: 11 Global Step: 117200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:22:00,308-Speed 5460.48 samples/sec Loss 4.8892 LearningRate 0.0628 Epoch: 11 Global Step: 117210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-01-08 21:22:07,787-Speed 5477.47 samples/sec Loss 4.8914 LearningRate 0.0628 Epoch: 11 Global Step: 117220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:22:15,305-Speed 5449.90 samples/sec Loss 4.8929 LearningRate 0.0628 Epoch: 11 Global Step: 117230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:22:22,755-Speed 5498.05 samples/sec Loss 4.8602 LearningRate 0.0628 Epoch: 11 Global Step: 117240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:22:30,350-Speed 5394.18 samples/sec Loss 4.8544 LearningRate 0.0628 Epoch: 11 Global Step: 117250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:22:37,929-Speed 5404.77 samples/sec Loss 4.8313 LearningRate 0.0628 Epoch: 11 Global Step: 117260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:22:45,432-Speed 5459.60 samples/sec Loss 4.8600 LearningRate 0.0628 Epoch: 11 Global Step: 117270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:22:52,970-Speed 5434.65 samples/sec Loss 4.9040 LearningRate 0.0627 Epoch: 11 Global Step: 117280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:23:00,481-Speed 5454.05 samples/sec Loss 4.8976 LearningRate 0.0627 Epoch: 11 Global Step: 117290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:23:08,059-Speed 5405.69 samples/sec Loss 4.8614 LearningRate 0.0627 Epoch: 11 Global Step: 117300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:23:15,544-Speed 5473.13 samples/sec Loss 4.7723 LearningRate 0.0627 Epoch: 11 Global Step: 117310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:23:23,025-Speed 5476.01 samples/sec Loss 4.8754 LearningRate 0.0627 Epoch: 11 Global Step: 117320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:23:30,710-Speed 5330.43 samples/sec Loss 4.8449 LearningRate 0.0627 Epoch: 11 Global Step: 117330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:23:38,252-Speed 
5431.54 samples/sec Loss 4.8352 LearningRate 0.0627 Epoch: 11 Global Step: 117340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:23:45,817-Speed 5415.65 samples/sec Loss 4.8829 LearningRate 0.0626 Epoch: 11 Global Step: 117350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:23:53,309-Speed 5467.94 samples/sec Loss 4.8694 LearningRate 0.0626 Epoch: 11 Global Step: 117360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:24:00,789-Speed 5475.99 samples/sec Loss 4.8397 LearningRate 0.0626 Epoch: 11 Global Step: 117370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:24:08,264-Speed 5480.48 samples/sec Loss 4.8143 LearningRate 0.0626 Epoch: 11 Global Step: 117380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:24:15,733-Speed 5485.16 samples/sec Loss 4.8402 LearningRate 0.0626 Epoch: 11 Global Step: 117390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:24:23,235-Speed 5460.46 samples/sec Loss 4.7667 LearningRate 0.0626 Epoch: 11 Global Step: 117400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:24:30,743-Speed 5456.59 samples/sec Loss 4.7905 LearningRate 0.0626 Epoch: 11 Global Step: 117410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:24:38,273-Speed 5440.04 samples/sec Loss 4.8602 LearningRate 0.0626 Epoch: 11 Global Step: 117420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:24:45,869-Speed 5392.45 samples/sec Loss 4.8014 LearningRate 0.0625 Epoch: 11 Global Step: 117430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:24:53,362-Speed 5467.45 samples/sec Loss 4.8685 LearningRate 0.0625 Epoch: 11 Global Step: 117440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:00,879-Speed 5449.55 samples/sec Loss 4.8617 LearningRate 0.0625 Epoch: 11 Global Step: 117450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:08,374-Speed 5466.29 samples/sec Loss 
4.8636 LearningRate 0.0625 Epoch: 11 Global Step: 117460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:15,934-Speed 5418.59 samples/sec Loss 4.8606 LearningRate 0.0625 Epoch: 11 Global Step: 117470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:23,484-Speed 5425.55 samples/sec Loss 4.8550 LearningRate 0.0625 Epoch: 11 Global Step: 117480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:25:30,983-Speed 5463.45 samples/sec Loss 4.7889 LearningRate 0.0625 Epoch: 11 Global Step: 117490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:38,491-Speed 5455.36 samples/sec Loss 4.8254 LearningRate 0.0624 Epoch: 11 Global Step: 117500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:46,055-Speed 5416.47 samples/sec Loss 4.8895 LearningRate 0.0624 Epoch: 11 Global Step: 117510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:25:53,533-Speed 5478.38 samples/sec Loss 4.8463 LearningRate 0.0624 Epoch: 11 Global Step: 117520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:01,026-Speed 5466.72 samples/sec Loss 4.8182 LearningRate 0.0624 Epoch: 11 Global Step: 117530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:08,599-Speed 5409.73 samples/sec Loss 4.8976 LearningRate 0.0624 Epoch: 11 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:16,049-Speed 5498.50 samples/sec Loss 4.8653 LearningRate 0.0624 Epoch: 11 Global Step: 117550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:23,556-Speed 5457.04 samples/sec Loss 4.8293 LearningRate 0.0624 Epoch: 11 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:31,119-Speed 5416.16 samples/sec Loss 4.8603 LearningRate 0.0623 Epoch: 11 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:38,596-Speed 5479.30 samples/sec Loss 4.8716 LearningRate 0.0623 
Epoch: 11 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:26:46,075-Speed 5476.87 samples/sec Loss 4.8153 LearningRate 0.0623 Epoch: 11 Global Step: 117590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:26:53,616-Speed 5432.53 samples/sec Loss 4.8818 LearningRate 0.0623 Epoch: 11 Global Step: 117600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:27:01,120-Speed 5459.32 samples/sec Loss 4.7947 LearningRate 0.0623 Epoch: 11 Global Step: 117610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:27:08,632-Speed 5453.27 samples/sec Loss 4.8190 LearningRate 0.0623 Epoch: 11 Global Step: 117620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:27:16,099-Speed 5486.29 samples/sec Loss 4.8385 LearningRate 0.0623 Epoch: 11 Global Step: 117630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:27:23,688-Speed 5398.11 samples/sec Loss 4.7984 LearningRate 0.0622 Epoch: 11 Global Step: 117640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-08 21:27:31,199-Speed 5453.92 samples/sec Loss 4.7937 LearningRate 0.0622 Epoch: 11 Global Step: 117650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:27:38,764-Speed 5415.05 samples/sec Loss 4.8451 LearningRate 0.0622 Epoch: 11 Global Step: 117660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:27:46,263-Speed 5462.47 samples/sec Loss 4.8434 LearningRate 0.0622 Epoch: 11 Global Step: 117670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:27:54,103-Speed 5225.67 samples/sec Loss 4.8343 LearningRate 0.0622 Epoch: 11 Global Step: 117680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:01,582-Speed 5476.92 samples/sec Loss 4.8534 LearningRate 0.0622 Epoch: 11 Global Step: 117690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:09,240-Speed 5349.92 samples/sec Loss 4.8128 LearningRate 0.0622 Epoch: 11 Global Step: 
117700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:16,792-Speed 5424.08 samples/sec Loss 4.8037 LearningRate 0.0621 Epoch: 11 Global Step: 117710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:24,347-Speed 5422.19 samples/sec Loss 4.8125 LearningRate 0.0621 Epoch: 11 Global Step: 117720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:31,857-Speed 5455.28 samples/sec Loss 4.8117 LearningRate 0.0621 Epoch: 11 Global Step: 117730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:39,332-Speed 5479.85 samples/sec Loss 4.8622 LearningRate 0.0621 Epoch: 11 Global Step: 117740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:46,801-Speed 5484.42 samples/sec Loss 4.8520 LearningRate 0.0621 Epoch: 11 Global Step: 117750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-08 21:28:54,306-Speed 5458.83 samples/sec Loss 4.8600 LearningRate 0.0621 Epoch: 11 Global Step: 117760 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:01,809-Speed 5460.08 samples/sec Loss 4.8231 LearningRate 0.0621 Epoch: 11 Global Step: 117770 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:09,248-Speed 5507.17 samples/sec Loss 4.8423 LearningRate 0.0621 Epoch: 11 Global Step: 117780 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:16,738-Speed 5469.26 samples/sec Loss 4.8127 LearningRate 0.0620 Epoch: 11 Global Step: 117790 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:24,197-Speed 5492.33 samples/sec Loss 4.8448 LearningRate 0.0620 Epoch: 11 Global Step: 117800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:31,615-Speed 5522.58 samples/sec Loss 4.8343 LearningRate 0.0620 Epoch: 11 Global Step: 117810 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:39,157-Speed 5431.41 samples/sec Loss 4.8630 LearningRate 0.0620 Epoch: 11 Global Step: 117820 Fp16 Grad Scale: 32768 
Required: 21 hours Training: 2022-01-08 21:29:46,783-Speed 5371.97 samples/sec Loss 4.8514 LearningRate 0.0620 Epoch: 11 Global Step: 117830 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-08 21:29:54,390-Speed 5384.96 samples/sec Loss 4.8831 LearningRate 0.0620 Epoch: 11 Global Step: 117840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:30:01,922-Speed 5438.81 samples/sec Loss 4.8969 LearningRate 0.0620 Epoch: 11 Global Step: 117850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:30:09,557-Speed 5365.71 samples/sec Loss 4.8339 LearningRate 0.0619 Epoch: 11 Global Step: 117860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:30:17,168-Speed 5382.11 samples/sec Loss 4.8752 LearningRate 0.0619 Epoch: 11 Global Step: 117870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:30:24,655-Speed 5471.76 samples/sec Loss 4.8122 LearningRate 0.0619 Epoch: 11 Global Step: 117880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:30:32,074-Speed 5521.61 samples/sec Loss 4.7965 LearningRate 0.0619 Epoch: 11 Global Step: 117890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:30:39,617-Speed 5430.81 samples/sec Loss 4.7756 LearningRate 0.0619 Epoch: 11 Global Step: 117900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:30:47,106-Speed 5469.71 samples/sec Loss 4.8418 LearningRate 0.0619 Epoch: 11 Global Step: 117910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:30:54,593-Speed 5471.76 samples/sec Loss 4.8011 LearningRate 0.0619 Epoch: 11 Global Step: 117920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:31:02,081-Speed 5470.86 samples/sec Loss 4.7878 LearningRate 0.0618 Epoch: 11 Global Step: 117930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:31:09,614-Speed 5438.44 samples/sec Loss 4.7984 LearningRate 0.0618 Epoch: 11 Global Step: 117940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-01-08 21:31:17,068-Speed 5495.44 samples/sec Loss 4.7880 LearningRate 0.0618 Epoch: 11 Global Step: 117950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:31:24,613-Speed 5429.58 samples/sec Loss 4.7855 LearningRate 0.0618 Epoch: 11 Global Step: 117960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:31:32,089-Speed 5479.45 samples/sec Loss 4.7854 LearningRate 0.0618 Epoch: 11 Global Step: 117970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:31:39,555-Speed 5487.40 samples/sec Loss 4.8021 LearningRate 0.0618 Epoch: 11 Global Step: 117980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:31:47,021-Speed 5486.57 samples/sec Loss 4.7558 LearningRate 0.0618 Epoch: 11 Global Step: 117990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:31:54,510-Speed 5469.86 samples/sec Loss 4.8682 LearningRate 0.0617 Epoch: 11 Global Step: 118000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:32:38,059-[lfw][118000]XNorm: 23.955640 Training: 2022-01-08 21:32:38,060-[lfw][118000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-01-08 21:32:38,060-[lfw][118000]Accuracy-Highest: 0.99817 Training: 2022-01-08 21:33:29,355-[cfp_fp][118000]XNorm: 22.261557 Training: 2022-01-08 21:33:29,356-[cfp_fp][118000]Accuracy-Flip: 0.98986+-0.00440 Training: 2022-01-08 21:33:29,357-[cfp_fp][118000]Accuracy-Highest: 0.99057 Training: 2022-01-08 21:34:13,628-[agedb_30][118000]XNorm: 24.014558 Training: 2022-01-08 21:34:13,629-[agedb_30][118000]Accuracy-Flip: 0.97683+-0.00828 Training: 2022-01-08 21:34:13,629-[agedb_30][118000]Accuracy-Highest: 0.97917 Training: 2022-01-08 21:34:21,180-Speed 279.27 samples/sec Loss 4.7775 LearningRate 0.0617 Epoch: 11 Global Step: 118010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:34:28,649-Speed 5485.18 samples/sec Loss 4.8039 LearningRate 0.0617 Epoch: 11 Global Step: 118020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 
21:34:36,101-Speed 5498.02 samples/sec Loss 4.7949 LearningRate 0.0617 Epoch: 11 Global Step: 118030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:34:43,666-Speed 5416.43 samples/sec Loss 4.8215 LearningRate 0.0617 Epoch: 11 Global Step: 118040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:34:51,178-Speed 5453.97 samples/sec Loss 4.7745 LearningRate 0.0617 Epoch: 11 Global Step: 118050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:34:58,723-Speed 5429.48 samples/sec Loss 4.7933 LearningRate 0.0617 Epoch: 11 Global Step: 118060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:35:06,197-Speed 5482.11 samples/sec Loss 4.7831 LearningRate 0.0617 Epoch: 11 Global Step: 118070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:35:13,642-Speed 5502.88 samples/sec Loss 4.8524 LearningRate 0.0616 Epoch: 11 Global Step: 118080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:35:21,074-Speed 5512.70 samples/sec Loss 4.8235 LearningRate 0.0616 Epoch: 11 Global Step: 118090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:35:28,563-Speed 5470.48 samples/sec Loss 4.8219 LearningRate 0.0616 Epoch: 11 Global Step: 118100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:35:35,999-Speed 5509.25 samples/sec Loss 4.8140 LearningRate 0.0616 Epoch: 11 Global Step: 118110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:35:43,475-Speed 5480.12 samples/sec Loss 4.8188 LearningRate 0.0616 Epoch: 11 Global Step: 118120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:35:50,912-Speed 5508.37 samples/sec Loss 4.8074 LearningRate 0.0616 Epoch: 11 Global Step: 118130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:35:58,440-Speed 5441.49 samples/sec Loss 4.8289 LearningRate 0.0616 Epoch: 11 Global Step: 118140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:05,999-Speed 5419.46 
samples/sec Loss 4.8256 LearningRate 0.0615 Epoch: 11 Global Step: 118150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:13,516-Speed 5449.95 samples/sec Loss 4.8523 LearningRate 0.0615 Epoch: 11 Global Step: 118160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:21,053-Speed 5435.33 samples/sec Loss 4.7884 LearningRate 0.0615 Epoch: 11 Global Step: 118170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:28,564-Speed 5453.75 samples/sec Loss 4.8030 LearningRate 0.0615 Epoch: 11 Global Step: 118180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:36,086-Speed 5446.44 samples/sec Loss 4.7762 LearningRate 0.0615 Epoch: 11 Global Step: 118190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:43,548-Speed 5489.42 samples/sec Loss 4.7929 LearningRate 0.0615 Epoch: 11 Global Step: 118200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:36:51,088-Speed 5433.83 samples/sec Loss 4.7905 LearningRate 0.0615 Epoch: 11 Global Step: 118210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:36:58,476-Speed 5544.87 samples/sec Loss 4.7394 LearningRate 0.0614 Epoch: 11 Global Step: 118220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:06,027-Speed 5424.76 samples/sec Loss 4.8051 LearningRate 0.0614 Epoch: 11 Global Step: 118230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:13,502-Speed 5479.92 samples/sec Loss 4.7982 LearningRate 0.0614 Epoch: 11 Global Step: 118240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:20,922-Speed 5521.68 samples/sec Loss 4.7966 LearningRate 0.0614 Epoch: 11 Global Step: 118250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:28,411-Speed 5470.13 samples/sec Loss 4.7563 LearningRate 0.0614 Epoch: 11 Global Step: 118260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:35,894-Speed 5473.84 samples/sec Loss 4.8195 
LearningRate 0.0614 Epoch: 11 Global Step: 118270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:43,364-Speed 5484.27 samples/sec Loss 4.7960 LearningRate 0.0614 Epoch: 11 Global Step: 118280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:50,875-Speed 5453.74 samples/sec Loss 4.7882 LearningRate 0.0613 Epoch: 11 Global Step: 118290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:37:58,331-Speed 5494.76 samples/sec Loss 4.7637 LearningRate 0.0613 Epoch: 11 Global Step: 118300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:38:05,838-Speed 5457.08 samples/sec Loss 4.7827 LearningRate 0.0613 Epoch: 11 Global Step: 118310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:38:13,289-Speed 5497.41 samples/sec Loss 4.8071 LearningRate 0.0613 Epoch: 11 Global Step: 118320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:38:20,746-Speed 5493.67 samples/sec Loss 4.7912 LearningRate 0.0613 Epoch: 11 Global Step: 118330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:38:28,236-Speed 5469.35 samples/sec Loss 4.8118 LearningRate 0.0613 Epoch: 11 Global Step: 118340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:38:35,766-Speed 5440.13 samples/sec Loss 4.7681 LearningRate 0.0613 Epoch: 11 Global Step: 118350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:38:43,311-Speed 5429.51 samples/sec Loss 4.7393 LearningRate 0.0613 Epoch: 11 Global Step: 118360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:38:50,808-Speed 5464.52 samples/sec Loss 4.7677 LearningRate 0.0612 Epoch: 11 Global Step: 118370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:38:58,323-Speed 5451.26 samples/sec Loss 4.8546 LearningRate 0.0612 Epoch: 11 Global Step: 118380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:05,809-Speed 5472.03 samples/sec Loss 4.8097 LearningRate 0.0612 Epoch: 11 
Global Step: 118390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:13,361-Speed 5424.76 samples/sec Loss 4.7927 LearningRate 0.0612 Epoch: 11 Global Step: 118400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:20,804-Speed 5503.71 samples/sec Loss 4.8033 LearningRate 0.0612 Epoch: 11 Global Step: 118410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:28,373-Speed 5412.40 samples/sec Loss 4.8147 LearningRate 0.0612 Epoch: 11 Global Step: 118420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:39:35,899-Speed 5443.32 samples/sec Loss 4.7710 LearningRate 0.0612 Epoch: 11 Global Step: 118430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:43,443-Speed 5430.32 samples/sec Loss 4.7905 LearningRate 0.0611 Epoch: 11 Global Step: 118440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:51,182-Speed 5293.43 samples/sec Loss 4.8060 LearningRate 0.0611 Epoch: 11 Global Step: 118450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:39:58,676-Speed 5466.01 samples/sec Loss 4.7759 LearningRate 0.0611 Epoch: 11 Global Step: 118460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:40:06,199-Speed 5445.50 samples/sec Loss 4.8041 LearningRate 0.0611 Epoch: 11 Global Step: 118470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:40:13,673-Speed 5480.92 samples/sec Loss 4.8020 LearningRate 0.0611 Epoch: 11 Global Step: 118480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:40:21,123-Speed 5498.75 samples/sec Loss 4.8001 LearningRate 0.0611 Epoch: 11 Global Step: 118490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:40:28,551-Speed 5514.89 samples/sec Loss 4.8436 LearningRate 0.0611 Epoch: 11 Global Step: 118500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:40:36,005-Speed 5495.70 samples/sec Loss 4.7554 LearningRate 0.0610 Epoch: 11 Global Step: 118510 Fp16 Grad 
Scale: 32768 Required: 20 hours Training: 2022-01-08 21:40:43,444-Speed 5507.33 samples/sec Loss 4.8580 LearningRate 0.0610 Epoch: 11 Global Step: 118520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:40:51,010-Speed 5414.03 samples/sec Loss 4.7571 LearningRate 0.0610 Epoch: 11 Global Step: 118530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:40:58,551-Speed 5432.27 samples/sec Loss 4.7837 LearningRate 0.0610 Epoch: 11 Global Step: 118540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:41:06,140-Speed 5398.65 samples/sec Loss 4.8109 LearningRate 0.0610 Epoch: 11 Global Step: 118550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:41:13,654-Speed 5451.53 samples/sec Loss 4.7680 LearningRate 0.0610 Epoch: 11 Global Step: 118560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:41:21,164-Speed 5454.27 samples/sec Loss 4.7560 LearningRate 0.0610 Epoch: 11 Global Step: 118570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:41:28,724-Speed 5419.11 samples/sec Loss 4.7928 LearningRate 0.0609 Epoch: 11 Global Step: 118580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:41:36,190-Speed 5487.25 samples/sec Loss 4.8048 LearningRate 0.0609 Epoch: 11 Global Step: 118590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:41:43,914-Speed 5302.84 samples/sec Loss 4.7830 LearningRate 0.0609 Epoch: 11 Global Step: 118600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:41:51,518-Speed 5387.46 samples/sec Loss 4.7918 LearningRate 0.0609 Epoch: 11 Global Step: 118610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:41:59,054-Speed 5436.09 samples/sec Loss 4.7949 LearningRate 0.0609 Epoch: 11 Global Step: 118620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:42:06,527-Speed 5482.12 samples/sec Loss 4.8227 LearningRate 0.0609 Epoch: 11 Global Step: 118630 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-01-08 21:42:14,034-Speed 5456.29 samples/sec Loss 4.8209 LearningRate 0.0609 Epoch: 11 Global Step: 118640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:42:21,481-Speed 5501.37 samples/sec Loss 4.7619 LearningRate 0.0609 Epoch: 11 Global Step: 118650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:42:28,988-Speed 5457.03 samples/sec Loss 4.7964 LearningRate 0.0608 Epoch: 11 Global Step: 118660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:42:36,475-Speed 5471.42 samples/sec Loss 4.7636 LearningRate 0.0608 Epoch: 11 Global Step: 118670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:42:43,983-Speed 5456.00 samples/sec Loss 4.7838 LearningRate 0.0608 Epoch: 11 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:42:51,737-Speed 5283.40 samples/sec Loss 4.7982 LearningRate 0.0608 Epoch: 11 Global Step: 118690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:42:59,240-Speed 5459.41 samples/sec Loss 4.7804 LearningRate 0.0608 Epoch: 11 Global Step: 118700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:43:06,933-Speed 5325.08 samples/sec Loss 4.7909 LearningRate 0.0608 Epoch: 11 Global Step: 118710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:43:14,477-Speed 5430.07 samples/sec Loss 4.7831 LearningRate 0.0608 Epoch: 11 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:43:21,964-Speed 5471.72 samples/sec Loss 4.8002 LearningRate 0.0607 Epoch: 11 Global Step: 118730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:43:29,490-Speed 5443.20 samples/sec Loss 4.7573 LearningRate 0.0607 Epoch: 11 Global Step: 118740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:43:36,951-Speed 5490.74 samples/sec Loss 4.7687 LearningRate 0.0607 Epoch: 11 Global Step: 118750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 
21:43:44,389-Speed 5507.75 samples/sec Loss 4.8252 LearningRate 0.0607 Epoch: 11 Global Step: 118760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:43:51,919-Speed 5440.10 samples/sec Loss 4.7378 LearningRate 0.0607 Epoch: 11 Global Step: 118770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:43:59,409-Speed 5470.33 samples/sec Loss 4.8257 LearningRate 0.0607 Epoch: 11 Global Step: 118780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:44:06,983-Speed 5408.49 samples/sec Loss 4.8227 LearningRate 0.0607 Epoch: 11 Global Step: 118790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:44:14,607-Speed 5373.22 samples/sec Loss 4.7681 LearningRate 0.0606 Epoch: 11 Global Step: 118800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:44:22,132-Speed 5443.71 samples/sec Loss 4.8337 LearningRate 0.0606 Epoch: 11 Global Step: 118810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:44:29,754-Speed 5374.79 samples/sec Loss 4.7875 LearningRate 0.0606 Epoch: 11 Global Step: 118820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:44:37,374-Speed 5376.30 samples/sec Loss 4.7791 LearningRate 0.0606 Epoch: 11 Global Step: 118830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:44:44,970-Speed 5392.72 samples/sec Loss 4.8129 LearningRate 0.0606 Epoch: 11 Global Step: 118840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:44:52,469-Speed 5463.18 samples/sec Loss 4.7858 LearningRate 0.0606 Epoch: 11 Global Step: 118850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:44:59,912-Speed 5503.14 samples/sec Loss 4.7724 LearningRate 0.0606 Epoch: 11 Global Step: 118860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:45:07,380-Speed 5486.28 samples/sec Loss 4.7975 LearningRate 0.0606 Epoch: 11 Global Step: 118870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:45:14,905-Speed 
5443.59 samples/sec Loss 4.7350 LearningRate 0.0605 Epoch: 11 Global Step: 118880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:45:22,419-Speed 5451.83 samples/sec Loss 4.7569 LearningRate 0.0605 Epoch: 11 Global Step: 118890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:45:29,883-Speed 5488.23 samples/sec Loss 4.7873 LearningRate 0.0605 Epoch: 11 Global Step: 118900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:45:37,340-Speed 5493.70 samples/sec Loss 4.7632 LearningRate 0.0605 Epoch: 11 Global Step: 118910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:45:44,800-Speed 5491.72 samples/sec Loss 4.7663 LearningRate 0.0605 Epoch: 11 Global Step: 118920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:45:52,278-Speed 5477.98 samples/sec Loss 4.7783 LearningRate 0.0605 Epoch: 11 Global Step: 118930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:45:59,761-Speed 5474.38 samples/sec Loss 4.8109 LearningRate 0.0605 Epoch: 11 Global Step: 118940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:07,327-Speed 5414.67 samples/sec Loss 4.7857 LearningRate 0.0604 Epoch: 11 Global Step: 118950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:14,789-Speed 5490.11 samples/sec Loss 4.7287 LearningRate 0.0604 Epoch: 11 Global Step: 118960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:22,289-Speed 5461.43 samples/sec Loss 4.7202 LearningRate 0.0604 Epoch: 11 Global Step: 118970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:29,862-Speed 5409.40 samples/sec Loss 4.7352 LearningRate 0.0604 Epoch: 11 Global Step: 118980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:37,327-Speed 5488.58 samples/sec Loss 4.7375 LearningRate 0.0604 Epoch: 11 Global Step: 118990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:44,808-Speed 5475.07 samples/sec Loss 
4.7935 LearningRate 0.0604 Epoch: 11 Global Step: 119000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:52,302-Speed 5466.95 samples/sec Loss 4.8056 LearningRate 0.0604 Epoch: 11 Global Step: 119010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:46:59,831-Speed 5441.11 samples/sec Loss 4.7308 LearningRate 0.0603 Epoch: 11 Global Step: 119020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:07,347-Speed 5449.96 samples/sec Loss 4.7521 LearningRate 0.0603 Epoch: 11 Global Step: 119030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:14,807-Speed 5491.37 samples/sec Loss 4.7623 LearningRate 0.0603 Epoch: 11 Global Step: 119040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:22,286-Speed 5477.53 samples/sec Loss 4.7470 LearningRate 0.0603 Epoch: 11 Global Step: 119050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:29,819-Speed 5438.12 samples/sec Loss 4.8285 LearningRate 0.0603 Epoch: 11 Global Step: 119060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:37,339-Speed 5447.75 samples/sec Loss 4.7362 LearningRate 0.0603 Epoch: 11 Global Step: 119070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:44,893-Speed 5423.18 samples/sec Loss 4.7147 LearningRate 0.0603 Epoch: 11 Global Step: 119080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:52,375-Speed 5475.06 samples/sec Loss 4.7534 LearningRate 0.0603 Epoch: 11 Global Step: 119090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:47:59,803-Speed 5514.57 samples/sec Loss 4.7769 LearningRate 0.0602 Epoch: 11 Global Step: 119100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:48:07,311-Speed 5456.75 samples/sec Loss 4.7736 LearningRate 0.0602 Epoch: 11 Global Step: 119110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:48:14,766-Speed 5494.88 samples/sec Loss 4.8053 LearningRate 0.0602 
Epoch: 11 Global Step: 119120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:48:22,349-Speed 5401.92 samples/sec Loss 4.7288 LearningRate 0.0602 Epoch: 11 Global Step: 119130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:48:30,011-Speed 5346.44 samples/sec Loss 4.7424 LearningRate 0.0602 Epoch: 11 Global Step: 119140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:48:37,764-Speed 5283.58 samples/sec Loss 4.7684 LearningRate 0.0602 Epoch: 11 Global Step: 119150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:48:45,303-Speed 5433.95 samples/sec Loss 4.7504 LearningRate 0.0602 Epoch: 11 Global Step: 119160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:48:52,954-Speed 5354.15 samples/sec Loss 4.7817 LearningRate 0.0601 Epoch: 11 Global Step: 119170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:49:00,458-Speed 5459.16 samples/sec Loss 4.7506 LearningRate 0.0601 Epoch: 11 Global Step: 119180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:49:07,967-Speed 5455.28 samples/sec Loss 4.7714 LearningRate 0.0601 Epoch: 11 Global Step: 119190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:49:15,539-Speed 5410.23 samples/sec Loss 4.7264 LearningRate 0.0601 Epoch: 11 Global Step: 119200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:49:23,044-Speed 5458.62 samples/sec Loss 4.7332 LearningRate 0.0601 Epoch: 11 Global Step: 119210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:49:30,606-Speed 5417.34 samples/sec Loss 4.7381 LearningRate 0.0601 Epoch: 11 Global Step: 119220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:49:38,184-Speed 5405.52 samples/sec Loss 4.7842 LearningRate 0.0601 Epoch: 11 Global Step: 119230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:49:45,683-Speed 5462.91 samples/sec Loss 4.7618 LearningRate 0.0600 Epoch: 11 Global Step: 
119240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:49:53,185-Speed 5460.95 samples/sec Loss 4.7837 LearningRate 0.0600 Epoch: 11 Global Step: 119250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:50:00,709-Speed 5444.15 samples/sec Loss 4.7414 LearningRate 0.0600 Epoch: 11 Global Step: 119260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:50:08,156-Speed 5501.26 samples/sec Loss 4.7865 LearningRate 0.0600 Epoch: 11 Global Step: 119270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:50:15,645-Speed 5470.21 samples/sec Loss 4.7888 LearningRate 0.0600 Epoch: 11 Global Step: 119280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:50:23,174-Speed 5440.68 samples/sec Loss 4.7654 LearningRate 0.0600 Epoch: 11 Global Step: 119290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:50:30,645-Speed 5483.20 samples/sec Loss 4.7233 LearningRate 0.0600 Epoch: 11 Global Step: 119300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:50:38,109-Speed 5488.55 samples/sec Loss 4.7785 LearningRate 0.0600 Epoch: 11 Global Step: 119310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:50:45,583-Speed 5481.00 samples/sec Loss 4.7884 LearningRate 0.0599 Epoch: 11 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:50:53,068-Speed 5473.27 samples/sec Loss 4.7796 LearningRate 0.0599 Epoch: 11 Global Step: 119330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:51:00,661-Speed 5394.81 samples/sec Loss 4.7414 LearningRate 0.0599 Epoch: 11 Global Step: 119340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:51:08,156-Speed 5466.06 samples/sec Loss 4.7630 LearningRate 0.0599 Epoch: 11 Global Step: 119350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:51:15,706-Speed 5425.87 samples/sec Loss 4.7858 LearningRate 0.0599 Epoch: 11 Global Step: 119360 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-01-08 21:51:23,239-Speed 5437.64 samples/sec Loss 4.7293 LearningRate 0.0599 Epoch: 11 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:51:30,852-Speed 5381.44 samples/sec Loss 4.7499 LearningRate 0.0599 Epoch: 11 Global Step: 119380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:51:38,454-Speed 5388.82 samples/sec Loss 4.6809 LearningRate 0.0598 Epoch: 11 Global Step: 119390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:51:46,066-Speed 5381.87 samples/sec Loss 4.7578 LearningRate 0.0598 Epoch: 11 Global Step: 119400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:51:53,754-Speed 5328.22 samples/sec Loss 4.6598 LearningRate 0.0598 Epoch: 11 Global Step: 119410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:01,307-Speed 5423.95 samples/sec Loss 4.6648 LearningRate 0.0598 Epoch: 11 Global Step: 119420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:08,786-Speed 5477.12 samples/sec Loss 4.7437 LearningRate 0.0598 Epoch: 11 Global Step: 119430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:16,361-Speed 5407.80 samples/sec Loss 4.7303 LearningRate 0.0598 Epoch: 11 Global Step: 119440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:23,863-Speed 5461.02 samples/sec Loss 4.7167 LearningRate 0.0598 Epoch: 11 Global Step: 119450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:31,393-Speed 5439.76 samples/sec Loss 4.7295 LearningRate 0.0597 Epoch: 11 Global Step: 119460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:38,943-Speed 5426.13 samples/sec Loss 4.7273 LearningRate 0.0597 Epoch: 11 Global Step: 119470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:52:46,432-Speed 5470.12 samples/sec Loss 4.7227 LearningRate 0.0597 Epoch: 11 Global Step: 119480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-01-08 21:52:53,889-Speed 5494.11 samples/sec Loss 4.6945 LearningRate 0.0597 Epoch: 11 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:53:01,374-Speed 5472.15 samples/sec Loss 4.7369 LearningRate 0.0597 Epoch: 11 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:53:08,876-Speed 5461.00 samples/sec Loss 4.7004 LearningRate 0.0597 Epoch: 11 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:53:16,385-Speed 5455.70 samples/sec Loss 4.7055 LearningRate 0.0597 Epoch: 11 Global Step: 119520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:53:23,841-Speed 5494.21 samples/sec Loss 4.7670 LearningRate 0.0597 Epoch: 11 Global Step: 119530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:53:31,309-Speed 5485.65 samples/sec Loss 4.7674 LearningRate 0.0596 Epoch: 11 Global Step: 119540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:53:38,757-Speed 5500.26 samples/sec Loss 4.7156 LearningRate 0.0596 Epoch: 11 Global Step: 119550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:53:46,229-Speed 5482.42 samples/sec Loss 4.7459 LearningRate 0.0596 Epoch: 11 Global Step: 119560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:53:53,660-Speed 5512.93 samples/sec Loss 4.7632 LearningRate 0.0596 Epoch: 11 Global Step: 119570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:01,154-Speed 5466.31 samples/sec Loss 4.7657 LearningRate 0.0596 Epoch: 11 Global Step: 119580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:08,632-Speed 5477.66 samples/sec Loss 4.7552 LearningRate 0.0596 Epoch: 11 Global Step: 119590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:16,045-Speed 5526.89 samples/sec Loss 4.7811 LearningRate 0.0596 Epoch: 11 Global Step: 119600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:23,483-Speed 
5507.30 samples/sec Loss 4.7018 LearningRate 0.0595 Epoch: 11 Global Step: 119610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:30,948-Speed 5487.90 samples/sec Loss 4.7217 LearningRate 0.0595 Epoch: 11 Global Step: 119620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:38,433-Speed 5472.62 samples/sec Loss 4.7960 LearningRate 0.0595 Epoch: 11 Global Step: 119630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:54:45,932-Speed 5463.10 samples/sec Loss 4.7559 LearningRate 0.0595 Epoch: 11 Global Step: 119640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:54:53,422-Speed 5469.44 samples/sec Loss 4.7676 LearningRate 0.0595 Epoch: 11 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:00,893-Speed 5482.88 samples/sec Loss 4.7544 LearningRate 0.0595 Epoch: 11 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:08,490-Speed 5392.34 samples/sec Loss 4.7500 LearningRate 0.0595 Epoch: 11 Global Step: 119670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:16,149-Speed 5348.72 samples/sec Loss 4.7870 LearningRate 0.0594 Epoch: 11 Global Step: 119680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:23,619-Speed 5484.77 samples/sec Loss 4.7223 LearningRate 0.0594 Epoch: 11 Global Step: 119690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:31,165-Speed 5427.84 samples/sec Loss 4.7222 LearningRate 0.0594 Epoch: 11 Global Step: 119700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:38,650-Speed 5472.96 samples/sec Loss 4.7495 LearningRate 0.0594 Epoch: 11 Global Step: 119710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:46,161-Speed 5454.65 samples/sec Loss 4.7127 LearningRate 0.0594 Epoch: 11 Global Step: 119720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:55:53,738-Speed 5406.49 samples/sec Loss 4.7301 
LearningRate 0.0594 Epoch: 11 Global Step: 119730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:01,290-Speed 5424.40 samples/sec Loss 4.7751 LearningRate 0.0594 Epoch: 11 Global Step: 119740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 21:56:08,760-Speed 5483.64 samples/sec Loss 4.7307 LearningRate 0.0594 Epoch: 11 Global Step: 119750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:16,231-Speed 5483.28 samples/sec Loss 4.7443 LearningRate 0.0593 Epoch: 11 Global Step: 119760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:23,679-Speed 5500.79 samples/sec Loss 4.6514 LearningRate 0.0593 Epoch: 11 Global Step: 119770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:31,283-Speed 5387.41 samples/sec Loss 4.7000 LearningRate 0.0593 Epoch: 11 Global Step: 119780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:38,912-Speed 5369.74 samples/sec Loss 4.6975 LearningRate 0.0593 Epoch: 11 Global Step: 119790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:46,392-Speed 5476.16 samples/sec Loss 4.7552 LearningRate 0.0593 Epoch: 11 Global Step: 119800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:56:53,876-Speed 5474.32 samples/sec Loss 4.6939 LearningRate 0.0593 Epoch: 11 Global Step: 119810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:01,403-Speed 5442.27 samples/sec Loss 4.7295 LearningRate 0.0593 Epoch: 11 Global Step: 119820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:08,462-Speed 5803.18 samples/sec Loss 4.7095 LearningRate 0.0592 Epoch: 11 Global Step: 119830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:15,468-Speed 5847.16 samples/sec Loss 4.7142 LearningRate 0.0592 Epoch: 11 Global Step: 119840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:22,844-Speed 5553.85 samples/sec Loss 4.7863 LearningRate 0.0592 Epoch: 11 
Global Step: 119850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:30,357-Speed 5453.20 samples/sec Loss 4.7151 LearningRate 0.0592 Epoch: 11 Global Step: 119860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:37,699-Speed 5579.35 samples/sec Loss 4.7141 LearningRate 0.0592 Epoch: 11 Global Step: 119870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:45,172-Speed 5481.91 samples/sec Loss 4.7209 LearningRate 0.0592 Epoch: 11 Global Step: 119880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:57:52,718-Speed 5428.64 samples/sec Loss 4.6668 LearningRate 0.0592 Epoch: 11 Global Step: 119890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:58:00,218-Speed 5462.40 samples/sec Loss 4.7472 LearningRate 0.0592 Epoch: 11 Global Step: 119900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:58:07,743-Speed 5443.52 samples/sec Loss 4.7089 LearningRate 0.0591 Epoch: 11 Global Step: 119910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:58:15,241-Speed 5463.57 samples/sec Loss 4.7298 LearningRate 0.0591 Epoch: 11 Global Step: 119920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 21:58:22,682-Speed 5505.19 samples/sec Loss 4.6764 LearningRate 0.0591 Epoch: 11 Global Step: 119930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:58:30,148-Speed 5487.35 samples/sec Loss 4.7746 LearningRate 0.0591 Epoch: 11 Global Step: 119940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:58:37,696-Speed 5427.52 samples/sec Loss 4.7541 LearningRate 0.0591 Epoch: 11 Global Step: 119950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:58:45,229-Speed 5437.57 samples/sec Loss 4.7569 LearningRate 0.0591 Epoch: 11 Global Step: 119960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:58:52,706-Speed 5478.90 samples/sec Loss 4.7239 LearningRate 0.0591 Epoch: 11 Global Step: 119970 Fp16 Grad 
Scale: 32768 Required: 20 hours Training: 2022-01-08 21:59:00,170-Speed 5488.49 samples/sec Loss 4.7300 LearningRate 0.0590 Epoch: 11 Global Step: 119980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:59:07,725-Speed 5422.27 samples/sec Loss 4.6921 LearningRate 0.0590 Epoch: 11 Global Step: 119990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:59:15,282-Speed 5421.09 samples/sec Loss 4.6723 LearningRate 0.0590 Epoch: 11 Global Step: 120000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 21:59:59,272-[lfw][120000]XNorm: 21.978734 Training: 2022-01-08 21:59:59,273-[lfw][120000]Accuracy-Flip: 0.99767+-0.00318 Training: 2022-01-08 21:59:59,273-[lfw][120000]Accuracy-Highest: 0.99817 Training: 2022-01-08 22:00:51,080-[cfp_fp][120000]XNorm: 20.407298 Training: 2022-01-08 22:00:51,081-[cfp_fp][120000]Accuracy-Flip: 0.98886+-0.00428 Training: 2022-01-08 22:00:51,081-[cfp_fp][120000]Accuracy-Highest: 0.99057 Training: 2022-01-08 22:01:35,431-[agedb_30][120000]XNorm: 21.831277 Training: 2022-01-08 22:01:35,432-[agedb_30][120000]Accuracy-Flip: 0.97783+-0.00827 Training: 2022-01-08 22:01:35,433-[agedb_30][120000]Accuracy-Highest: 0.97917 Training: 2022-01-08 22:01:43,082-Speed 277.13 samples/sec Loss 4.7560 LearningRate 0.0590 Epoch: 11 Global Step: 120010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:01:50,637-Speed 5422.87 samples/sec Loss 4.7786 LearningRate 0.0590 Epoch: 11 Global Step: 120020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:01:58,079-Speed 5505.46 samples/sec Loss 4.7092 LearningRate 0.0590 Epoch: 11 Global Step: 120030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:05,579-Speed 5462.00 samples/sec Loss 4.7607 LearningRate 0.0590 Epoch: 11 Global Step: 120040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:13,064-Speed 5473.01 samples/sec Loss 4.7610 LearningRate 0.0589 Epoch: 11 Global Step: 120050 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-01-08 22:02:20,507-Speed 5504.89 samples/sec Loss 4.7086 LearningRate 0.0589 Epoch: 11 Global Step: 120060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:27,988-Speed 5475.57 samples/sec Loss 4.7042 LearningRate 0.0589 Epoch: 11 Global Step: 120070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:35,514-Speed 5442.98 samples/sec Loss 4.7212 LearningRate 0.0589 Epoch: 11 Global Step: 120080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:42,947-Speed 5511.59 samples/sec Loss 4.7005 LearningRate 0.0589 Epoch: 11 Global Step: 120090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:50,404-Speed 5493.42 samples/sec Loss 4.6705 LearningRate 0.0589 Epoch: 11 Global Step: 120100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:02:57,866-Speed 5489.89 samples/sec Loss 4.6825 LearningRate 0.0589 Epoch: 11 Global Step: 120110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:03:05,372-Speed 5457.87 samples/sec Loss 4.6969 LearningRate 0.0589 Epoch: 11 Global Step: 120120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:03:12,885-Speed 5452.50 samples/sec Loss 4.7095 LearningRate 0.0588 Epoch: 11 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:03:20,350-Speed 5487.94 samples/sec Loss 4.7314 LearningRate 0.0588 Epoch: 11 Global Step: 120140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:03:27,811-Speed 5490.46 samples/sec Loss 4.6850 LearningRate 0.0588 Epoch: 11 Global Step: 120150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:03:35,473-Speed 5346.43 samples/sec Loss 4.7304 LearningRate 0.0588 Epoch: 11 Global Step: 120160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:03:42,916-Speed 5503.69 samples/sec Loss 4.7261 LearningRate 0.0588 Epoch: 11 Global Step: 120170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-01-08 22:03:50,513-Speed 5392.43 samples/sec Loss 4.7133 LearningRate 0.0588 Epoch: 11 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:03:57,977-Speed 5488.71 samples/sec Loss 4.6645 LearningRate 0.0588 Epoch: 11 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:05,544-Speed 5413.38 samples/sec Loss 4.6438 LearningRate 0.0587 Epoch: 11 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:13,165-Speed 5375.05 samples/sec Loss 4.6784 LearningRate 0.0587 Epoch: 11 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:20,709-Speed 5430.56 samples/sec Loss 4.6606 LearningRate 0.0587 Epoch: 11 Global Step: 120220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:28,269-Speed 5418.65 samples/sec Loss 4.6739 LearningRate 0.0587 Epoch: 11 Global Step: 120230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:04:35,804-Speed 5436.68 samples/sec Loss 4.7063 LearningRate 0.0587 Epoch: 11 Global Step: 120240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:43,389-Speed 5400.80 samples/sec Loss 4.6697 LearningRate 0.0587 Epoch: 11 Global Step: 120250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:50,950-Speed 5417.88 samples/sec Loss 4.6784 LearningRate 0.0587 Epoch: 11 Global Step: 120260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:04:58,464-Speed 5452.01 samples/sec Loss 4.7324 LearningRate 0.0587 Epoch: 11 Global Step: 120270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:05,960-Speed 5464.43 samples/sec Loss 4.6827 LearningRate 0.0586 Epoch: 11 Global Step: 120280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:13,436-Speed 5479.57 samples/sec Loss 4.6998 LearningRate 0.0586 Epoch: 11 Global Step: 120290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:20,935-Speed 
5463.23 samples/sec Loss 4.6494 LearningRate 0.0586 Epoch: 11 Global Step: 120300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:28,337-Speed 5534.40 samples/sec Loss 4.7145 LearningRate 0.0586 Epoch: 11 Global Step: 120310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:35,275-Speed 5904.93 samples/sec Loss 4.6854 LearningRate 0.0586 Epoch: 11 Global Step: 120320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:42,341-Speed 5797.14 samples/sec Loss 4.6919 LearningRate 0.0586 Epoch: 11 Global Step: 120330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:05:50,007-Speed 5343.88 samples/sec Loss 4.7032 LearningRate 0.0586 Epoch: 11 Global Step: 120340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:05:57,823-Speed 5241.49 samples/sec Loss 4.7290 LearningRate 0.0585 Epoch: 11 Global Step: 120350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:05,320-Speed 5464.11 samples/sec Loss 4.6820 LearningRate 0.0585 Epoch: 11 Global Step: 120360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:12,798-Speed 5477.82 samples/sec Loss 4.6508 LearningRate 0.0585 Epoch: 11 Global Step: 120370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:20,324-Speed 5443.26 samples/sec Loss 4.6957 LearningRate 0.0585 Epoch: 11 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:27,792-Speed 5485.49 samples/sec Loss 4.6727 LearningRate 0.0585 Epoch: 11 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:35,337-Speed 5429.09 samples/sec Loss 4.7224 LearningRate 0.0585 Epoch: 11 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:42,849-Speed 5453.71 samples/sec Loss 4.6826 LearningRate 0.0585 Epoch: 11 Global Step: 120410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:50,353-Speed 5459.83 samples/sec Loss 
4.6825 LearningRate 0.0584 Epoch: 11 Global Step: 120420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:06:57,845-Speed 5467.20 samples/sec Loss 4.6881 LearningRate 0.0584 Epoch: 11 Global Step: 120430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:07:05,427-Speed 5402.97 samples/sec Loss 4.7222 LearningRate 0.0584 Epoch: 11 Global Step: 120440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:07:12,915-Speed 5471.42 samples/sec Loss 4.6772 LearningRate 0.0584 Epoch: 11 Global Step: 120450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:07:20,465-Speed 5425.40 samples/sec Loss 4.7225 LearningRate 0.0584 Epoch: 11 Global Step: 120460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:07:27,998-Speed 5438.38 samples/sec Loss 4.7126 LearningRate 0.0584 Epoch: 11 Global Step: 120470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:07:35,430-Speed 5511.63 samples/sec Loss 4.6751 LearningRate 0.0584 Epoch: 11 Global Step: 120480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:07:42,987-Speed 5421.15 samples/sec Loss 4.6655 LearningRate 0.0584 Epoch: 11 Global Step: 120490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:07:50,571-Speed 5401.66 samples/sec Loss 4.6730 LearningRate 0.0583 Epoch: 11 Global Step: 120500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:07:58,099-Speed 5441.85 samples/sec Loss 4.6625 LearningRate 0.0583 Epoch: 11 Global Step: 120510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:05,673-Speed 5408.99 samples/sec Loss 4.6419 LearningRate 0.0583 Epoch: 11 Global Step: 120520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:13,296-Speed 5373.46 samples/sec Loss 4.6823 LearningRate 0.0583 Epoch: 11 Global Step: 120530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:20,949-Speed 5353.07 samples/sec Loss 4.6670 LearningRate 0.0583 
Epoch: 11 Global Step: 120540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:28,424-Speed 5480.25 samples/sec Loss 4.7206 LearningRate 0.0583 Epoch: 11 Global Step: 120550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:35,915-Speed 5468.77 samples/sec Loss 4.6959 LearningRate 0.0583 Epoch: 11 Global Step: 120560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:43,465-Speed 5425.29 samples/sec Loss 4.6989 LearningRate 0.0582 Epoch: 11 Global Step: 120570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:08:51,062-Speed 5392.76 samples/sec Loss 4.6743 LearningRate 0.0582 Epoch: 11 Global Step: 120580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:08:58,692-Speed 5368.82 samples/sec Loss 4.6274 LearningRate 0.0582 Epoch: 11 Global Step: 120590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:06,223-Speed 5440.14 samples/sec Loss 4.7014 LearningRate 0.0582 Epoch: 11 Global Step: 120600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:13,806-Speed 5401.83 samples/sec Loss 4.6745 LearningRate 0.0582 Epoch: 11 Global Step: 120610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:21,374-Speed 5412.94 samples/sec Loss 4.6817 LearningRate 0.0582 Epoch: 11 Global Step: 120620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:28,927-Speed 5423.43 samples/sec Loss 4.6457 LearningRate 0.0582 Epoch: 11 Global Step: 120630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:36,547-Speed 5376.44 samples/sec Loss 4.6197 LearningRate 0.0582 Epoch: 11 Global Step: 120640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:44,031-Speed 5473.33 samples/sec Loss 4.6607 LearningRate 0.0581 Epoch: 11 Global Step: 120650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:51,516-Speed 5473.30 samples/sec Loss 4.6430 LearningRate 0.0581 Epoch: 11 Global Step: 120660 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:09:58,831-Speed 5599.77 samples/sec Loss 4.7050 LearningRate 0.0581 Epoch: 11 Global Step: 120670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:10:06,321-Speed 5469.80 samples/sec Loss 4.6741 LearningRate 0.0581 Epoch: 11 Global Step: 120680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:10:13,824-Speed 5460.03 samples/sec Loss 4.7039 LearningRate 0.0581 Epoch: 11 Global Step: 120690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:10:21,359-Speed 5436.80 samples/sec Loss 4.7121 LearningRate 0.0581 Epoch: 11 Global Step: 120700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:10:28,882-Speed 5445.09 samples/sec Loss 4.6344 LearningRate 0.0581 Epoch: 11 Global Step: 120710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:10:36,410-Speed 5441.78 samples/sec Loss 4.6638 LearningRate 0.0580 Epoch: 11 Global Step: 120720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:10:43,922-Speed 5453.38 samples/sec Loss 4.6503 LearningRate 0.0580 Epoch: 11 Global Step: 120730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:10:51,445-Speed 5444.60 samples/sec Loss 4.6958 LearningRate 0.0580 Epoch: 11 Global Step: 120740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:10:58,971-Speed 5443.52 samples/sec Loss 4.6796 LearningRate 0.0580 Epoch: 11 Global Step: 120750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:11:06,482-Speed 5454.07 samples/sec Loss 4.6402 LearningRate 0.0580 Epoch: 11 Global Step: 120760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:11:13,958-Speed 5479.81 samples/sec Loss 4.6019 LearningRate 0.0580 Epoch: 11 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:11:21,369-Speed 5527.65 samples/sec Loss 4.6333 LearningRate 0.0580 Epoch: 11 Global Step: 120780 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-01-08 22:11:28,850-Speed 5475.70 samples/sec Loss 4.6382 LearningRate 0.0580 Epoch: 11 Global Step: 120790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:11:36,383-Speed 5438.16 samples/sec Loss 4.6367 LearningRate 0.0579 Epoch: 11 Global Step: 120800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:11:43,872-Speed 5470.08 samples/sec Loss 4.7024 LearningRate 0.0579 Epoch: 11 Global Step: 120810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:11:51,440-Speed 5413.23 samples/sec Loss 4.6336 LearningRate 0.0579 Epoch: 11 Global Step: 120820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:11:58,882-Speed 5503.82 samples/sec Loss 4.6635 LearningRate 0.0579 Epoch: 11 Global Step: 120830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:06,438-Speed 5422.34 samples/sec Loss 4.6387 LearningRate 0.0579 Epoch: 11 Global Step: 120840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:13,921-Speed 5474.56 samples/sec Loss 4.6574 LearningRate 0.0579 Epoch: 11 Global Step: 120850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:21,486-Speed 5414.94 samples/sec Loss 4.6715 LearningRate 0.0579 Epoch: 11 Global Step: 120860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:29,015-Speed 5440.62 samples/sec Loss 4.6444 LearningRate 0.0578 Epoch: 11 Global Step: 120870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:36,506-Speed 5469.17 samples/sec Loss 4.6870 LearningRate 0.0578 Epoch: 11 Global Step: 120880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:44,017-Speed 5454.13 samples/sec Loss 4.6528 LearningRate 0.0578 Epoch: 11 Global Step: 120890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:12:51,497-Speed 5476.55 samples/sec Loss 4.6871 LearningRate 0.0578 Epoch: 11 Global Step: 120900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-01-08 22:12:58,989-Speed 5467.97 samples/sec Loss 4.7006 LearningRate 0.0578 Epoch: 11 Global Step: 120910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:13:06,434-Speed 5501.99 samples/sec Loss 4.6179 LearningRate 0.0578 Epoch: 11 Global Step: 120920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:13,917-Speed 5474.92 samples/sec Loss 4.5977 LearningRate 0.0578 Epoch: 11 Global Step: 120930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:21,447-Speed 5440.10 samples/sec Loss 4.6098 LearningRate 0.0578 Epoch: 11 Global Step: 120940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:29,056-Speed 5384.03 samples/sec Loss 4.7186 LearningRate 0.0577 Epoch: 11 Global Step: 120950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:36,585-Speed 5440.68 samples/sec Loss 4.6733 LearningRate 0.0577 Epoch: 11 Global Step: 120960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:44,185-Speed 5390.11 samples/sec Loss 4.6732 LearningRate 0.0577 Epoch: 11 Global Step: 120970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:51,726-Speed 5432.51 samples/sec Loss 4.6562 LearningRate 0.0577 Epoch: 11 Global Step: 120980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:13:59,251-Speed 5443.48 samples/sec Loss 4.6485 LearningRate 0.0577 Epoch: 11 Global Step: 120990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:14:06,814-Speed 5416.86 samples/sec Loss 4.6759 LearningRate 0.0577 Epoch: 11 Global Step: 121000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:14:14,276-Speed 5489.94 samples/sec Loss 4.6450 LearningRate 0.0577 Epoch: 11 Global Step: 121010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:14:21,748-Speed 5482.79 samples/sec Loss 4.7023 LearningRate 0.0576 Epoch: 11 Global Step: 121020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:14:29,258-Speed 
5454.66 samples/sec Loss 4.6494 LearningRate 0.0576 Epoch: 11 Global Step: 121030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:14:36,692-Speed 5510.45 samples/sec Loss 4.6698 LearningRate 0.0576 Epoch: 11 Global Step: 121040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:14:44,162-Speed 5483.90 samples/sec Loss 4.6003 LearningRate 0.0576 Epoch: 11 Global Step: 121050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:14:51,646-Speed 5473.54 samples/sec Loss 4.6473 LearningRate 0.0576 Epoch: 11 Global Step: 121060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:14:59,120-Speed 5481.11 samples/sec Loss 4.6128 LearningRate 0.0576 Epoch: 11 Global Step: 121070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:06,625-Speed 5458.41 samples/sec Loss 4.6355 LearningRate 0.0576 Epoch: 11 Global Step: 121080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:14,112-Speed 5471.91 samples/sec Loss 4.6284 LearningRate 0.0576 Epoch: 11 Global Step: 121090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:21,607-Speed 5466.20 samples/sec Loss 4.5932 LearningRate 0.0575 Epoch: 11 Global Step: 121100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:29,233-Speed 5371.19 samples/sec Loss 4.6348 LearningRate 0.0575 Epoch: 11 Global Step: 121110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:36,819-Speed 5399.92 samples/sec Loss 4.5950 LearningRate 0.0575 Epoch: 11 Global Step: 121120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:44,458-Speed 5363.20 samples/sec Loss 4.6247 LearningRate 0.0575 Epoch: 11 Global Step: 121130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:52,180-Speed 5304.64 samples/sec Loss 4.6299 LearningRate 0.0575 Epoch: 11 Global Step: 121140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:15:59,831-Speed 5354.91 samples/sec Loss 4.6701 
LearningRate 0.0575 Epoch: 11 Global Step: 121150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:16:07,371-Speed 5432.22 samples/sec Loss 4.6446 LearningRate 0.0575 Epoch: 11 Global Step: 121160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:16:14,831-Speed 5492.06 samples/sec Loss 4.6276 LearningRate 0.0574 Epoch: 11 Global Step: 121170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:16:22,374-Speed 5430.52 samples/sec Loss 4.6981 LearningRate 0.0574 Epoch: 11 Global Step: 121180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:16:29,924-Speed 5425.84 samples/sec Loss 4.7196 LearningRate 0.0574 Epoch: 11 Global Step: 121190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:16:37,485-Speed 5418.36 samples/sec Loss 4.6154 LearningRate 0.0574 Epoch: 11 Global Step: 121200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:16:45,083-Speed 5391.44 samples/sec Loss 4.6473 LearningRate 0.0574 Epoch: 11 Global Step: 121210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:16:52,722-Speed 5363.02 samples/sec Loss 4.6955 LearningRate 0.0574 Epoch: 11 Global Step: 121220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:00,375-Speed 5352.57 samples/sec Loss 4.6060 LearningRate 0.0574 Epoch: 11 Global Step: 121230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:07,977-Speed 5389.10 samples/sec Loss 4.6609 LearningRate 0.0574 Epoch: 11 Global Step: 121240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:15,514-Speed 5435.01 samples/sec Loss 4.6550 LearningRate 0.0573 Epoch: 11 Global Step: 121250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:23,005-Speed 5468.53 samples/sec Loss 4.6869 LearningRate 0.0573 Epoch: 11 Global Step: 121260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:30,575-Speed 5411.52 samples/sec Loss 4.6654 LearningRate 0.0573 Epoch: 11 
Global Step: 121270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:38,289-Speed 5310.67 samples/sec Loss 4.6195 LearningRate 0.0573 Epoch: 11 Global Step: 121280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:45,812-Speed 5444.98 samples/sec Loss 4.6172 LearningRate 0.0573 Epoch: 11 Global Step: 121290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:17:53,420-Speed 5384.58 samples/sec Loss 4.7099 LearningRate 0.0573 Epoch: 11 Global Step: 121300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:01,012-Speed 5395.99 samples/sec Loss 4.6321 LearningRate 0.0573 Epoch: 11 Global Step: 121310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:08,592-Speed 5404.06 samples/sec Loss 4.6712 LearningRate 0.0572 Epoch: 11 Global Step: 121320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:16,099-Speed 5456.90 samples/sec Loss 4.6729 LearningRate 0.0572 Epoch: 11 Global Step: 121330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:23,605-Speed 5458.40 samples/sec Loss 4.7028 LearningRate 0.0572 Epoch: 11 Global Step: 121340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:31,140-Speed 5435.85 samples/sec Loss 4.6413 LearningRate 0.0572 Epoch: 11 Global Step: 121350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:38,691-Speed 5425.43 samples/sec Loss 4.6235 LearningRate 0.0572 Epoch: 11 Global Step: 121360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:46,431-Speed 5292.55 samples/sec Loss 4.6079 LearningRate 0.0572 Epoch: 11 Global Step: 121370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:18:54,135-Speed 5317.94 samples/sec Loss 4.6556 LearningRate 0.0572 Epoch: 11 Global Step: 121380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:19:01,761-Speed 5371.32 samples/sec Loss 4.6266 LearningRate 0.0572 Epoch: 11 Global Step: 121390 Fp16 Grad 
Scale: 32768 Required: 20 hours Training: 2022-01-08 22:19:09,349-Speed 5399.03 samples/sec Loss 4.6549 LearningRate 0.0571 Epoch: 11 Global Step: 121400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:19:16,932-Speed 5401.82 samples/sec Loss 4.6787 LearningRate 0.0571 Epoch: 11 Global Step: 121410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:19:24,514-Speed 5403.26 samples/sec Loss 4.6042 LearningRate 0.0571 Epoch: 11 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:19:32,075-Speed 5417.71 samples/sec Loss 4.6218 LearningRate 0.0571 Epoch: 11 Global Step: 121430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:19:39,577-Speed 5460.96 samples/sec Loss 4.5656 LearningRate 0.0571 Epoch: 11 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:19:47,164-Speed 5399.46 samples/sec Loss 4.6005 LearningRate 0.0571 Epoch: 11 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:19:54,826-Speed 5346.75 samples/sec Loss 4.6025 LearningRate 0.0571 Epoch: 11 Global Step: 121460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:20:02,344-Speed 5448.76 samples/sec Loss 4.6461 LearningRate 0.0570 Epoch: 11 Global Step: 121470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:20:09,867-Speed 5445.04 samples/sec Loss 4.5830 LearningRate 0.0570 Epoch: 11 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:20:17,449-Speed 5403.49 samples/sec Loss 4.6413 LearningRate 0.0570 Epoch: 11 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:20:25,052-Speed 5388.03 samples/sec Loss 4.6250 LearningRate 0.0570 Epoch: 11 Global Step: 121500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:20:32,550-Speed 5462.82 samples/sec Loss 4.6367 LearningRate 0.0570 Epoch: 11 Global Step: 121510 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-01-08 22:20:40,054-Speed 5459.79 samples/sec Loss 4.5992 LearningRate 0.0570 Epoch: 11 Global Step: 121520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:20:47,572-Speed 5448.50 samples/sec Loss 4.6019 LearningRate 0.0570 Epoch: 11 Global Step: 121530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:20:55,048-Speed 5479.96 samples/sec Loss 4.6422 LearningRate 0.0570 Epoch: 11 Global Step: 121540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:02,576-Speed 5441.66 samples/sec Loss 4.7096 LearningRate 0.0569 Epoch: 11 Global Step: 121550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:10,099-Speed 5444.96 samples/sec Loss 4.6109 LearningRate 0.0569 Epoch: 11 Global Step: 121560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:17,725-Speed 5372.67 samples/sec Loss 4.6048 LearningRate 0.0569 Epoch: 11 Global Step: 121570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:25,296-Speed 5410.67 samples/sec Loss 4.5648 LearningRate 0.0569 Epoch: 11 Global Step: 121580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:32,832-Speed 5435.58 samples/sec Loss 4.6495 LearningRate 0.0569 Epoch: 11 Global Step: 121590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:40,412-Speed 5404.04 samples/sec Loss 4.6232 LearningRate 0.0569 Epoch: 11 Global Step: 121600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:47,951-Speed 5434.31 samples/sec Loss 4.5637 LearningRate 0.0569 Epoch: 11 Global Step: 121610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:21:55,479-Speed 5441.58 samples/sec Loss 4.5766 LearningRate 0.0568 Epoch: 11 Global Step: 121620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:22:03,091-Speed 5382.04 samples/sec Loss 4.5860 LearningRate 0.0568 Epoch: 11 Global Step: 121630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 
22:22:10,626-Speed 5436.47 samples/sec Loss 4.6070 LearningRate 0.0568 Epoch: 11 Global Step: 121640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:22:18,168-Speed 5431.63 samples/sec Loss 4.5926 LearningRate 0.0568 Epoch: 11 Global Step: 121650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:22:25,659-Speed 5468.63 samples/sec Loss 4.5640 LearningRate 0.0568 Epoch: 11 Global Step: 121660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:22:33,242-Speed 5401.86 samples/sec Loss 4.5763 LearningRate 0.0568 Epoch: 11 Global Step: 121670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:22:40,754-Speed 5453.51 samples/sec Loss 4.6221 LearningRate 0.0568 Epoch: 11 Global Step: 121680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:22:48,390-Speed 5364.84 samples/sec Loss 4.6412 LearningRate 0.0568 Epoch: 11 Global Step: 121690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:22:55,923-Speed 5437.89 samples/sec Loss 4.6404 LearningRate 0.0567 Epoch: 11 Global Step: 121700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:23:03,466-Speed 5430.97 samples/sec Loss 4.6130 LearningRate 0.0567 Epoch: 11 Global Step: 121710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:23:11,002-Speed 5435.63 samples/sec Loss 4.6001 LearningRate 0.0567 Epoch: 11 Global Step: 121720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:23:18,505-Speed 5460.16 samples/sec Loss 4.6022 LearningRate 0.0567 Epoch: 11 Global Step: 121730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:23:26,004-Speed 5463.01 samples/sec Loss 4.5716 LearningRate 0.0567 Epoch: 11 Global Step: 121740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:23:33,499-Speed 5465.73 samples/sec Loss 4.6150 LearningRate 0.0567 Epoch: 11 Global Step: 121750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:23:41,041-Speed 5431.40 
samples/sec Loss 4.5947 LearningRate 0.0567 Epoch: 11 Global Step: 121760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:23:48,559-Speed 5449.48 samples/sec Loss 4.6648 LearningRate 0.0566 Epoch: 11 Global Step: 121770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:23:56,098-Speed 5433.83 samples/sec Loss 4.6145 LearningRate 0.0566 Epoch: 11 Global Step: 121780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:03,669-Speed 5410.34 samples/sec Loss 4.5828 LearningRate 0.0566 Epoch: 11 Global Step: 121790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:11,257-Speed 5398.68 samples/sec Loss 4.6327 LearningRate 0.0566 Epoch: 11 Global Step: 121800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:18,835-Speed 5405.59 samples/sec Loss 4.6128 LearningRate 0.0566 Epoch: 11 Global Step: 121810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:26,387-Speed 5425.11 samples/sec Loss 4.6176 LearningRate 0.0566 Epoch: 11 Global Step: 121820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:34,018-Speed 5367.98 samples/sec Loss 4.5877 LearningRate 0.0566 Epoch: 11 Global Step: 121830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:41,604-Speed 5399.85 samples/sec Loss 4.5686 LearningRate 0.0566 Epoch: 11 Global Step: 121840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:24:49,111-Speed 5457.29 samples/sec Loss 4.5876 LearningRate 0.0565 Epoch: 11 Global Step: 121850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:24:56,754-Speed 5359.91 samples/sec Loss 4.6217 LearningRate 0.0565 Epoch: 11 Global Step: 121860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:04,543-Speed 5259.18 samples/sec Loss 4.6268 LearningRate 0.0565 Epoch: 11 Global Step: 121870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:12,165-Speed 5375.02 samples/sec Loss 4.5485 
LearningRate 0.0565 Epoch: 11 Global Step: 121880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:19,693-Speed 5441.12 samples/sec Loss 4.5878 LearningRate 0.0565 Epoch: 11 Global Step: 121890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:27,357-Speed 5345.63 samples/sec Loss 4.5825 LearningRate 0.0565 Epoch: 11 Global Step: 121900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:34,905-Speed 5426.88 samples/sec Loss 4.5903 LearningRate 0.0565 Epoch: 11 Global Step: 121910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:42,519-Speed 5380.96 samples/sec Loss 4.5572 LearningRate 0.0565 Epoch: 11 Global Step: 121920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:50,094-Speed 5407.61 samples/sec Loss 4.5832 LearningRate 0.0564 Epoch: 11 Global Step: 121930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:25:57,680-Speed 5400.34 samples/sec Loss 4.5767 LearningRate 0.0564 Epoch: 11 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:26:05,191-Speed 5453.63 samples/sec Loss 4.5973 LearningRate 0.0564 Epoch: 11 Global Step: 121950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:26:12,750-Speed 5419.92 samples/sec Loss 4.6112 LearningRate 0.0564 Epoch: 11 Global Step: 121960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:26:20,288-Speed 5434.12 samples/sec Loss 4.6176 LearningRate 0.0564 Epoch: 11 Global Step: 121970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-08 22:26:27,867-Speed 5405.43 samples/sec Loss 4.5718 LearningRate 0.0564 Epoch: 11 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:26:35,439-Speed 5409.98 samples/sec Loss 4.6090 LearningRate 0.0564 Epoch: 11 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:26:43,048-Speed 5383.87 samples/sec Loss 4.6209 LearningRate 0.0563 Epoch: 
11 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:27:27,442-[lfw][122000]XNorm: 23.173306 Training: 2022-01-08 22:27:27,443-[lfw][122000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-01-08 22:27:27,443-[lfw][122000]Accuracy-Highest: 0.99817 Training: 2022-01-08 22:28:19,314-[cfp_fp][122000]XNorm: 21.469200 Training: 2022-01-08 22:28:19,315-[cfp_fp][122000]Accuracy-Flip: 0.98929+-0.00614 Training: 2022-01-08 22:28:19,316-[cfp_fp][122000]Accuracy-Highest: 0.99057 Training: 2022-01-08 22:29:04,109-[agedb_30][122000]XNorm: 23.163223 Training: 2022-01-08 22:29:04,111-[agedb_30][122000]Accuracy-Flip: 0.98000+-0.00687 Training: 2022-01-08 22:29:04,111-[agedb_30][122000]Accuracy-Highest: 0.98000 Training: 2022-01-08 22:29:11,655-Speed 275.63 samples/sec Loss 4.6092 LearningRate 0.0563 Epoch: 11 Global Step: 122010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:29:19,235-Speed 5405.27 samples/sec Loss 4.6637 LearningRate 0.0563 Epoch: 11 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:29:26,733-Speed 5464.81 samples/sec Loss 4.6823 LearningRate 0.0563 Epoch: 11 Global Step: 122030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:29:34,189-Speed 5494.55 samples/sec Loss 4.6612 LearningRate 0.0563 Epoch: 11 Global Step: 122040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:29:41,706-Speed 5450.24 samples/sec Loss 4.5865 LearningRate 0.0563 Epoch: 11 Global Step: 122050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:29:49,220-Speed 5452.77 samples/sec Loss 4.5654 LearningRate 0.0563 Epoch: 11 Global Step: 122060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:29:56,668-Speed 5500.51 samples/sec Loss 4.5343 LearningRate 0.0563 Epoch: 11 Global Step: 122070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:04,274-Speed 5386.50 samples/sec Loss 4.6154 LearningRate 0.0562 Epoch: 11 Global Step: 
122080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:11,855-Speed 5404.02 samples/sec Loss 4.6566 LearningRate 0.0562 Epoch: 11 Global Step: 122090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:19,340-Speed 5473.09 samples/sec Loss 4.6084 LearningRate 0.0562 Epoch: 11 Global Step: 122100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:26,835-Speed 5466.00 samples/sec Loss 4.5869 LearningRate 0.0562 Epoch: 11 Global Step: 122110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:34,309-Speed 5480.78 samples/sec Loss 4.6414 LearningRate 0.0562 Epoch: 11 Global Step: 122120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:41,893-Speed 5401.88 samples/sec Loss 4.6353 LearningRate 0.0562 Epoch: 11 Global Step: 122130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:49,408-Speed 5450.82 samples/sec Loss 4.6051 LearningRate 0.0562 Epoch: 11 Global Step: 122140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:30:56,990-Speed 5402.98 samples/sec Loss 4.5894 LearningRate 0.0561 Epoch: 11 Global Step: 122150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:31:04,578-Speed 5398.54 samples/sec Loss 4.5737 LearningRate 0.0561 Epoch: 11 Global Step: 122160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:31:12,066-Speed 5470.91 samples/sec Loss 4.5634 LearningRate 0.0561 Epoch: 11 Global Step: 122170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:31:19,729-Speed 5346.12 samples/sec Loss 4.5963 LearningRate 0.0561 Epoch: 11 Global Step: 122180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:31:27,261-Speed 5438.81 samples/sec Loss 4.6170 LearningRate 0.0561 Epoch: 11 Global Step: 122190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:31:34,758-Speed 5463.89 samples/sec Loss 4.6250 LearningRate 0.0561 Epoch: 11 Global Step: 122200 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-01-08 22:31:42,325-Speed 5414.34 samples/sec Loss 4.5390 LearningRate 0.0561 Epoch: 11 Global Step: 122210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:31:49,835-Speed 5454.58 samples/sec Loss 4.5691 LearningRate 0.0561 Epoch: 11 Global Step: 122220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:31:57,452-Speed 5377.96 samples/sec Loss 4.5663 LearningRate 0.0560 Epoch: 11 Global Step: 122230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:32:04,979-Speed 5442.15 samples/sec Loss 4.5778 LearningRate 0.0560 Epoch: 11 Global Step: 122240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-08 22:32:12,551-Speed 5410.70 samples/sec Loss 4.5761 LearningRate 0.0560 Epoch: 11 Global Step: 122250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:32:20,051-Speed 5461.65 samples/sec Loss 4.5920 LearningRate 0.0560 Epoch: 11 Global Step: 122260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:32:27,585-Speed 5437.36 samples/sec Loss 4.5583 LearningRate 0.0560 Epoch: 11 Global Step: 122270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:32:35,094-Speed 5455.23 samples/sec Loss 4.5914 LearningRate 0.0560 Epoch: 11 Global Step: 122280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:32:42,631-Speed 5435.23 samples/sec Loss 4.5786 LearningRate 0.0560 Epoch: 11 Global Step: 122290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-08 22:32:50,155-Speed 5445.13 samples/sec Loss 4.5549 LearningRate 0.0559 Epoch: 11 Global Step: 122300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:32:57,721-Speed 5414.13 samples/sec Loss 4.5506 LearningRate 0.0559 Epoch: 11 Global Step: 122310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:33:05,260-Speed 5434.14 samples/sec Loss 4.6099 LearningRate 0.0559 Epoch: 11 Global Step: 122320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 
2022-01-08 22:33:12,843-Speed 5402.16 samples/sec Loss 4.6391 LearningRate 0.0559 Epoch: 11 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:33:20,347-Speed 5459.20 samples/sec Loss 4.5889 LearningRate 0.0559 Epoch: 11 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:33:27,856-Speed 5454.81 samples/sec Loss 4.5764 LearningRate 0.0559 Epoch: 11 Global Step: 122350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:33:35,412-Speed 5421.85 samples/sec Loss 4.5933 LearningRate 0.0559 Epoch: 11 Global Step: 122360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:33:42,967-Speed 5422.71 samples/sec Loss 4.5835 LearningRate 0.0559 Epoch: 11 Global Step: 122370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:33:50,617-Speed 5354.67 samples/sec Loss 4.5781 LearningRate 0.0558 Epoch: 11 Global Step: 122380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:33:58,131-Speed 5451.84 samples/sec Loss 4.5656 LearningRate 0.0558 Epoch: 11 Global Step: 122390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:05,740-Speed 5383.44 samples/sec Loss 4.5842 LearningRate 0.0558 Epoch: 11 Global Step: 122400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:13,279-Speed 5434.31 samples/sec Loss 4.5572 LearningRate 0.0558 Epoch: 11 Global Step: 122410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:20,838-Speed 5419.35 samples/sec Loss 4.5640 LearningRate 0.0558 Epoch: 11 Global Step: 122420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:28,389-Speed 5425.03 samples/sec Loss 4.5481 LearningRate 0.0558 Epoch: 11 Global Step: 122430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:35,910-Speed 5446.65 samples/sec Loss 4.5727 LearningRate 0.0558 Epoch: 11 Global Step: 122440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:43,411-Speed 
5461.69 samples/sec Loss 4.5553 LearningRate 0.0558 Epoch: 11 Global Step: 122450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:34:51,035-Speed 5372.97 samples/sec Loss 4.6201 LearningRate 0.0557 Epoch: 11 Global Step: 122460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:34:58,528-Speed 5467.53 samples/sec Loss 4.5670 LearningRate 0.0557 Epoch: 11 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:35:06,078-Speed 5425.00 samples/sec Loss 4.4967 LearningRate 0.0557 Epoch: 11 Global Step: 122480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:35:13,623-Speed 5429.82 samples/sec Loss 4.5083 LearningRate 0.0557 Epoch: 11 Global Step: 122490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:35:21,230-Speed 5385.44 samples/sec Loss 4.5902 LearningRate 0.0557 Epoch: 11 Global Step: 122500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:35:28,856-Speed 5372.01 samples/sec Loss 4.5468 LearningRate 0.0557 Epoch: 11 Global Step: 122510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:35:36,496-Speed 5361.42 samples/sec Loss 4.6076 LearningRate 0.0557 Epoch: 11 Global Step: 122520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:35:44,130-Speed 5366.44 samples/sec Loss 4.5703 LearningRate 0.0556 Epoch: 11 Global Step: 122530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:35:51,706-Speed 5407.03 samples/sec Loss 4.5569 LearningRate 0.0556 Epoch: 11 Global Step: 122540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:35:59,200-Speed 5466.75 samples/sec Loss 4.5728 LearningRate 0.0556 Epoch: 11 Global Step: 122550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:36:06,799-Speed 5390.84 samples/sec Loss 4.5806 LearningRate 0.0556 Epoch: 11 Global Step: 122560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:36:14,361-Speed 5417.00 samples/sec Loss 
4.5914 LearningRate 0.0556 Epoch: 11 Global Step: 122570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:36:21,926-Speed 5414.96 samples/sec Loss 4.5814 LearningRate 0.0556 Epoch: 11 Global Step: 122580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:36:29,421-Speed 5466.03 samples/sec Loss 4.6062 LearningRate 0.0556 Epoch: 11 Global Step: 122590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:36:36,923-Speed 5460.61 samples/sec Loss 4.6195 LearningRate 0.0556 Epoch: 11 Global Step: 122600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:36:44,414-Speed 5468.74 samples/sec Loss 4.5222 LearningRate 0.0555 Epoch: 11 Global Step: 122610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:36:51,884-Speed 5484.09 samples/sec Loss 4.5396 LearningRate 0.0555 Epoch: 11 Global Step: 122620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:36:59,502-Speed 5377.03 samples/sec Loss 4.5735 LearningRate 0.0555 Epoch: 11 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:06,927-Speed 5517.89 samples/sec Loss 4.5449 LearningRate 0.0555 Epoch: 11 Global Step: 122640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:14,494-Speed 5413.07 samples/sec Loss 4.5923 LearningRate 0.0555 Epoch: 11 Global Step: 122650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:21,980-Speed 5472.99 samples/sec Loss 4.5559 LearningRate 0.0555 Epoch: 11 Global Step: 122660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:29,466-Speed 5471.96 samples/sec Loss 4.5312 LearningRate 0.0555 Epoch: 11 Global Step: 122670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:36,969-Speed 5459.65 samples/sec Loss 4.6031 LearningRate 0.0555 Epoch: 11 Global Step: 122680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:44,458-Speed 5469.90 samples/sec Loss 4.5972 LearningRate 0.0554 
Epoch: 11 Global Step: 122690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:52,009-Speed 5425.91 samples/sec Loss 4.5824 LearningRate 0.0554 Epoch: 11 Global Step: 122700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:37:59,641-Speed 5367.32 samples/sec Loss 4.5450 LearningRate 0.0554 Epoch: 11 Global Step: 122710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:38:07,159-Speed 5449.04 samples/sec Loss 4.6315 LearningRate 0.0554 Epoch: 11 Global Step: 122720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:14,710-Speed 5424.41 samples/sec Loss 4.5692 LearningRate 0.0554 Epoch: 11 Global Step: 122730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:22,250-Speed 5433.69 samples/sec Loss 4.5267 LearningRate 0.0554 Epoch: 11 Global Step: 122740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:29,751-Speed 5460.81 samples/sec Loss 4.6186 LearningRate 0.0554 Epoch: 11 Global Step: 122750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:37,212-Speed 5490.59 samples/sec Loss 4.5369 LearningRate 0.0553 Epoch: 11 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:44,761-Speed 5426.88 samples/sec Loss 4.5084 LearningRate 0.0553 Epoch: 11 Global Step: 122770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:52,407-Speed 5357.93 samples/sec Loss 4.5813 LearningRate 0.0553 Epoch: 11 Global Step: 122780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:38:59,851-Speed 5503.43 samples/sec Loss 4.5342 LearningRate 0.0553 Epoch: 11 Global Step: 122790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:39:07,324-Speed 5481.53 samples/sec Loss 4.4856 LearningRate 0.0553 Epoch: 11 Global Step: 122800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:39:15,195-Speed 5204.55 samples/sec Loss 4.5064 LearningRate 0.0553 Epoch: 11 Global Step: 122810 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:39:22,680-Speed 5472.99 samples/sec Loss 4.5472 LearningRate 0.0553 Epoch: 11 Global Step: 122820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:39:30,105-Speed 5517.62 samples/sec Loss 4.5371 LearningRate 0.0553 Epoch: 11 Global Step: 122830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:39:37,667-Speed 5417.05 samples/sec Loss 4.5783 LearningRate 0.0552 Epoch: 11 Global Step: 122840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:39:45,209-Speed 5431.63 samples/sec Loss 4.5735 LearningRate 0.0552 Epoch: 11 Global Step: 122850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:39:52,736-Speed 5441.95 samples/sec Loss 4.5582 LearningRate 0.0552 Epoch: 11 Global Step: 122860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:00,252-Speed 5450.83 samples/sec Loss 4.5596 LearningRate 0.0552 Epoch: 11 Global Step: 122870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:07,829-Speed 5406.49 samples/sec Loss 4.5612 LearningRate 0.0552 Epoch: 11 Global Step: 122880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:15,479-Speed 5354.55 samples/sec Loss 4.5850 LearningRate 0.0552 Epoch: 11 Global Step: 122890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:23,094-Speed 5379.75 samples/sec Loss 4.5706 LearningRate 0.0552 Epoch: 11 Global Step: 122900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:30,615-Speed 5447.07 samples/sec Loss 4.5410 LearningRate 0.0551 Epoch: 11 Global Step: 122910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:38,091-Speed 5479.57 samples/sec Loss 4.5691 LearningRate 0.0551 Epoch: 11 Global Step: 122920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:40:45,653-Speed 5416.80 samples/sec Loss 4.5668 LearningRate 0.0551 Epoch: 11 Global Step: 122930 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-01-08 22:40:53,179-Speed 5443.08 samples/sec Loss 4.5590 LearningRate 0.0551 Epoch: 11 Global Step: 122940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:41:00,717-Speed 5434.58 samples/sec Loss 4.5235 LearningRate 0.0551 Epoch: 11 Global Step: 122950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:41:08,214-Speed 5464.61 samples/sec Loss 4.6185 LearningRate 0.0551 Epoch: 11 Global Step: 122960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:41:15,679-Speed 5487.30 samples/sec Loss 4.5529 LearningRate 0.0551 Epoch: 11 Global Step: 122970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:41:23,235-Speed 5421.81 samples/sec Loss 4.5125 LearningRate 0.0551 Epoch: 11 Global Step: 122980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:41:30,787-Speed 5425.06 samples/sec Loss 4.5469 LearningRate 0.0550 Epoch: 11 Global Step: 122990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:41:38,296-Speed 5454.88 samples/sec Loss 4.5568 LearningRate 0.0550 Epoch: 11 Global Step: 123000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:41:45,789-Speed 5467.53 samples/sec Loss 4.5659 LearningRate 0.0550 Epoch: 11 Global Step: 123010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:41:53,427-Speed 5363.01 samples/sec Loss 4.5444 LearningRate 0.0550 Epoch: 11 Global Step: 123020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:00,918-Speed 5468.78 samples/sec Loss 4.5377 LearningRate 0.0550 Epoch: 11 Global Step: 123030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:08,455-Speed 5434.90 samples/sec Loss 4.5649 LearningRate 0.0550 Epoch: 11 Global Step: 123040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:15,973-Speed 5449.34 samples/sec Loss 4.5348 LearningRate 0.0550 Epoch: 11 Global Step: 123050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 
2022-01-08 22:42:23,500-Speed 5442.17 samples/sec Loss 4.5110 LearningRate 0.0550 Epoch: 11 Global Step: 123060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:31,019-Speed 5448.48 samples/sec Loss 4.5426 LearningRate 0.0549 Epoch: 11 Global Step: 123070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:38,517-Speed 5463.09 samples/sec Loss 4.5251 LearningRate 0.0549 Epoch: 11 Global Step: 123080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:45,966-Speed 5499.76 samples/sec Loss 4.5227 LearningRate 0.0549 Epoch: 11 Global Step: 123090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:42:53,494-Speed 5441.73 samples/sec Loss 4.5787 LearningRate 0.0549 Epoch: 11 Global Step: 123100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:43:01,081-Speed 5399.19 samples/sec Loss 4.5704 LearningRate 0.0549 Epoch: 11 Global Step: 123110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:08,600-Speed 5448.54 samples/sec Loss 4.5011 LearningRate 0.0549 Epoch: 11 Global Step: 123120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:16,136-Speed 5435.49 samples/sec Loss 4.4967 LearningRate 0.0549 Epoch: 11 Global Step: 123130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:23,685-Speed 5426.54 samples/sec Loss 4.4924 LearningRate 0.0549 Epoch: 11 Global Step: 123140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:31,188-Speed 5459.98 samples/sec Loss 4.5197 LearningRate 0.0548 Epoch: 11 Global Step: 123150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:38,624-Speed 5509.42 samples/sec Loss 4.5442 LearningRate 0.0548 Epoch: 11 Global Step: 123160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:46,110-Speed 5472.31 samples/sec Loss 4.4377 LearningRate 0.0548 Epoch: 11 Global Step: 123170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:43:53,611-Speed 
5461.11 samples/sec Loss 4.5245 LearningRate 0.0548 Epoch: 11 Global Step: 123180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:44:01,049-Speed 5507.29 samples/sec Loss 4.5587 LearningRate 0.0548 Epoch: 11 Global Step: 123190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:44:08,513-Speed 5488.72 samples/sec Loss 4.5364 LearningRate 0.0548 Epoch: 11 Global Step: 123200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:44:15,972-Speed 5491.89 samples/sec Loss 4.5413 LearningRate 0.0548 Epoch: 11 Global Step: 123210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:44:23,448-Speed 5479.72 samples/sec Loss 4.5808 LearningRate 0.0547 Epoch: 11 Global Step: 123220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:44:30,910-Speed 5489.75 samples/sec Loss 4.4949 LearningRate 0.0547 Epoch: 11 Global Step: 123230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:44:38,497-Speed 5399.62 samples/sec Loss 4.5233 LearningRate 0.0547 Epoch: 11 Global Step: 123240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:44:45,996-Speed 5463.17 samples/sec Loss 4.5229 LearningRate 0.0547 Epoch: 11 Global Step: 123250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:44:53,534-Speed 5434.15 samples/sec Loss 4.5454 LearningRate 0.0547 Epoch: 11 Global Step: 123260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:01,022-Speed 5470.52 samples/sec Loss 4.5232 LearningRate 0.0547 Epoch: 11 Global Step: 123270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:08,558-Speed 5435.71 samples/sec Loss 4.4851 LearningRate 0.0547 Epoch: 11 Global Step: 123280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:16,081-Speed 5445.86 samples/sec Loss 4.5508 LearningRate 0.0547 Epoch: 11 Global Step: 123290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:23,549-Speed 5485.24 samples/sec Loss 
4.5763 LearningRate 0.0546 Epoch: 11 Global Step: 123300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:31,027-Speed 5477.83 samples/sec Loss 4.5028 LearningRate 0.0546 Epoch: 11 Global Step: 123310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:38,585-Speed 5420.75 samples/sec Loss 4.5042 LearningRate 0.0546 Epoch: 11 Global Step: 123320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:46,079-Speed 5466.57 samples/sec Loss 4.5013 LearningRate 0.0546 Epoch: 11 Global Step: 123330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:45:53,546-Speed 5485.69 samples/sec Loss 4.4910 LearningRate 0.0546 Epoch: 11 Global Step: 123340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:46:00,998-Speed 5497.33 samples/sec Loss 4.4902 LearningRate 0.0546 Epoch: 11 Global Step: 123350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:46:08,482-Speed 5473.52 samples/sec Loss 4.5570 LearningRate 0.0546 Epoch: 11 Global Step: 123360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:46:15,970-Speed 5470.80 samples/sec Loss 4.5509 LearningRate 0.0546 Epoch: 11 Global Step: 123370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:46:23,472-Speed 5460.64 samples/sec Loss 4.5522 LearningRate 0.0545 Epoch: 11 Global Step: 123380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:46:31,016-Speed 5430.33 samples/sec Loss 4.5065 LearningRate 0.0545 Epoch: 11 Global Step: 123390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:46:38,590-Speed 5408.37 samples/sec Loss 4.5384 LearningRate 0.0545 Epoch: 11 Global Step: 123400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:46:46,068-Speed 5478.48 samples/sec Loss 4.5214 LearningRate 0.0545 Epoch: 11 Global Step: 123410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:46:53,498-Speed 5513.87 samples/sec Loss 4.5116 LearningRate 0.0545 
Epoch: 11 Global Step: 123420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:00,950-Speed 5496.65 samples/sec Loss 4.5468 LearningRate 0.0545 Epoch: 11 Global Step: 123430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:08,453-Speed 5460.00 samples/sec Loss 4.5409 LearningRate 0.0545 Epoch: 11 Global Step: 123440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:15,965-Speed 5453.57 samples/sec Loss 4.5079 LearningRate 0.0544 Epoch: 11 Global Step: 123450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:23,443-Speed 5478.06 samples/sec Loss 4.4679 LearningRate 0.0544 Epoch: 11 Global Step: 123460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:30,954-Speed 5453.36 samples/sec Loss 4.5218 LearningRate 0.0544 Epoch: 11 Global Step: 123470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:38,580-Speed 5372.01 samples/sec Loss 4.5525 LearningRate 0.0544 Epoch: 11 Global Step: 123480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:46,172-Speed 5395.80 samples/sec Loss 4.5630 LearningRate 0.0544 Epoch: 11 Global Step: 123490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:47:53,677-Speed 5458.94 samples/sec Loss 4.5433 LearningRate 0.0544 Epoch: 11 Global Step: 123500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:01,202-Speed 5443.68 samples/sec Loss 4.5171 LearningRate 0.0544 Epoch: 11 Global Step: 123510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:08,698-Speed 5464.54 samples/sec Loss 4.5261 LearningRate 0.0544 Epoch: 11 Global Step: 123520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:16,192-Speed 5466.53 samples/sec Loss 4.5039 LearningRate 0.0543 Epoch: 11 Global Step: 123530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:24,456-Speed 4957.15 samples/sec Loss 4.5465 LearningRate 0.0543 Epoch: 11 Global Step: 123540 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:31,972-Speed 5450.97 samples/sec Loss 4.4850 LearningRate 0.0543 Epoch: 11 Global Step: 123550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:39,503-Speed 5438.84 samples/sec Loss 4.5687 LearningRate 0.0543 Epoch: 11 Global Step: 123560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:48:47,043-Speed 5433.78 samples/sec Loss 4.4812 LearningRate 0.0543 Epoch: 11 Global Step: 123570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:48:54,549-Speed 5457.20 samples/sec Loss 4.5674 LearningRate 0.0543 Epoch: 11 Global Step: 123580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:49:02,044-Speed 5465.97 samples/sec Loss 4.4995 LearningRate 0.0543 Epoch: 11 Global Step: 123590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:09,694-Speed 5354.98 samples/sec Loss 4.5180 LearningRate 0.0543 Epoch: 11 Global Step: 123600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:17,288-Speed 5394.56 samples/sec Loss 4.4857 LearningRate 0.0542 Epoch: 11 Global Step: 123610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:24,833-Speed 5429.34 samples/sec Loss 4.5401 LearningRate 0.0542 Epoch: 11 Global Step: 123620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:32,434-Speed 5389.59 samples/sec Loss 4.5097 LearningRate 0.0542 Epoch: 11 Global Step: 123630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:39,944-Speed 5454.26 samples/sec Loss 4.5380 LearningRate 0.0542 Epoch: 11 Global Step: 123640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:47,467-Speed 5445.50 samples/sec Loss 4.5250 LearningRate 0.0542 Epoch: 11 Global Step: 123650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:49:54,952-Speed 5472.71 samples/sec Loss 4.5170 LearningRate 0.0542 Epoch: 11 Global Step: 123660 Fp16 Grad Scale: 32768 
Required: 19 hours Training: 2022-01-08 22:50:02,390-Speed 5507.96 samples/sec Loss 4.5329 LearningRate 0.0542 Epoch: 11 Global Step: 123670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:50:09,928-Speed 5434.40 samples/sec Loss 4.4704 LearningRate 0.0541 Epoch: 11 Global Step: 123680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:50:17,421-Speed 5467.39 samples/sec Loss 4.5118 LearningRate 0.0541 Epoch: 11 Global Step: 123690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:50:24,907-Speed 5473.96 samples/sec Loss 4.5359 LearningRate 0.0541 Epoch: 11 Global Step: 123700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:50:32,525-Speed 5377.45 samples/sec Loss 4.5359 LearningRate 0.0541 Epoch: 11 Global Step: 123710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:50:40,055-Speed 5440.21 samples/sec Loss 4.5243 LearningRate 0.0541 Epoch: 11 Global Step: 123720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:50:47,609-Speed 5422.05 samples/sec Loss 4.4821 LearningRate 0.0541 Epoch: 11 Global Step: 123730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:50:55,069-Speed 5492.01 samples/sec Loss 4.4764 LearningRate 0.0541 Epoch: 11 Global Step: 123740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:51:02,592-Speed 5445.37 samples/sec Loss 4.5365 LearningRate 0.0541 Epoch: 11 Global Step: 123750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:51:10,242-Speed 5355.21 samples/sec Loss 4.5006 LearningRate 0.0540 Epoch: 11 Global Step: 123760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:51:17,740-Speed 5463.19 samples/sec Loss 4.4725 LearningRate 0.0540 Epoch: 11 Global Step: 123770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:51:25,181-Speed 5505.69 samples/sec Loss 4.5214 LearningRate 0.0540 Epoch: 11 Global Step: 123780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-01-08 22:51:32,642-Speed 5490.86 samples/sec Loss 4.4925 LearningRate 0.0540 Epoch: 11 Global Step: 123790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:51:40,205-Speed 5416.16 samples/sec Loss 4.5163 LearningRate 0.0540 Epoch: 11 Global Step: 123800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:51:47,676-Speed 5482.94 samples/sec Loss 4.4735 LearningRate 0.0540 Epoch: 11 Global Step: 123810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:51:55,165-Speed 5470.67 samples/sec Loss 4.5026 LearningRate 0.0540 Epoch: 11 Global Step: 123820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:02,578-Speed 5526.17 samples/sec Loss 4.4971 LearningRate 0.0540 Epoch: 11 Global Step: 123830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:10,148-Speed 5411.52 samples/sec Loss 4.4979 LearningRate 0.0539 Epoch: 11 Global Step: 123840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:17,580-Speed 5512.12 samples/sec Loss 4.4423 LearningRate 0.0539 Epoch: 11 Global Step: 123850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:25,177-Speed 5392.33 samples/sec Loss 4.5078 LearningRate 0.0539 Epoch: 11 Global Step: 123860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:32,779-Speed 5388.58 samples/sec Loss 4.5245 LearningRate 0.0539 Epoch: 11 Global Step: 123870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:40,332-Speed 5423.66 samples/sec Loss 4.4650 LearningRate 0.0539 Epoch: 11 Global Step: 123880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:47,773-Speed 5505.81 samples/sec Loss 4.5033 LearningRate 0.0539 Epoch: 11 Global Step: 123890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:52:55,314-Speed 5431.89 samples/sec Loss 4.5277 LearningRate 0.0539 Epoch: 11 Global Step: 123900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:53:02,926-Speed 
5381.76 samples/sec Loss 4.4842 LearningRate 0.0539 Epoch: 11 Global Step: 123910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:53:10,452-Speed 5443.31 samples/sec Loss 4.5194 LearningRate 0.0538 Epoch: 11 Global Step: 123920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:53:17,964-Speed 5453.69 samples/sec Loss 4.4780 LearningRate 0.0538 Epoch: 11 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:53:25,552-Speed 5398.25 samples/sec Loss 4.4936 LearningRate 0.0538 Epoch: 11 Global Step: 123940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:53:32,991-Speed 5507.39 samples/sec Loss 4.4663 LearningRate 0.0538 Epoch: 11 Global Step: 123950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:53:40,398-Speed 5530.79 samples/sec Loss 4.5087 LearningRate 0.0538 Epoch: 11 Global Step: 123960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:53:47,944-Speed 5428.49 samples/sec Loss 4.5463 LearningRate 0.0538 Epoch: 11 Global Step: 123970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:53:55,401-Speed 5492.98 samples/sec Loss 4.5007 LearningRate 0.0538 Epoch: 11 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:54:03,016-Speed 5380.31 samples/sec Loss 4.5592 LearningRate 0.0537 Epoch: 11 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:54:10,508-Speed 5467.79 samples/sec Loss 4.5316 LearningRate 0.0537 Epoch: 11 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:54:54,520-[lfw][124000]XNorm: 23.200261 Training: 2022-01-08 22:54:54,521-[lfw][124000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-08 22:54:54,521-[lfw][124000]Accuracy-Highest: 0.99817 Training: 2022-01-08 22:55:45,668-[cfp_fp][124000]XNorm: 21.354716 Training: 2022-01-08 22:55:45,668-[cfp_fp][124000]Accuracy-Flip: 0.98929+-0.00551 Training: 2022-01-08 
22:55:45,669-[cfp_fp][124000]Accuracy-Highest: 0.99057 Training: 2022-01-08 22:56:29,556-[agedb_30][124000]XNorm: 23.171578 Training: 2022-01-08 22:56:29,557-[agedb_30][124000]Accuracy-Flip: 0.97917+-0.00790 Training: 2022-01-08 22:56:29,557-[agedb_30][124000]Accuracy-Highest: 0.98000 Training: 2022-01-08 22:56:37,070-Speed 279.47 samples/sec Loss 4.5409 LearningRate 0.0537 Epoch: 11 Global Step: 124010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:56:44,650-Speed 5403.90 samples/sec Loss 4.5612 LearningRate 0.0537 Epoch: 11 Global Step: 124020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:56:52,079-Speed 5514.34 samples/sec Loss 4.4831 LearningRate 0.0537 Epoch: 11 Global Step: 124030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:56:59,564-Speed 5472.74 samples/sec Loss 4.4691 LearningRate 0.0537 Epoch: 11 Global Step: 124040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:07,112-Speed 5428.06 samples/sec Loss 4.5058 LearningRate 0.0537 Epoch: 11 Global Step: 124050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:14,550-Speed 5507.88 samples/sec Loss 4.4403 LearningRate 0.0537 Epoch: 11 Global Step: 124060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:21,980-Speed 5512.69 samples/sec Loss 4.4789 LearningRate 0.0536 Epoch: 11 Global Step: 124070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:29,428-Speed 5500.91 samples/sec Loss 4.5124 LearningRate 0.0536 Epoch: 11 Global Step: 124080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:37,087-Speed 5348.91 samples/sec Loss 4.4799 LearningRate 0.0536 Epoch: 11 Global Step: 124090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:44,652-Speed 5414.38 samples/sec Loss 4.5102 LearningRate 0.0536 Epoch: 11 Global Step: 124100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:52,132-Speed 5476.88 samples/sec Loss 4.5076 
LearningRate 0.0536 Epoch: 11 Global Step: 124110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:57:59,760-Speed 5370.17 samples/sec Loss 4.4724 LearningRate 0.0536 Epoch: 11 Global Step: 124120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:58:07,329-Speed 5412.78 samples/sec Loss 4.4935 LearningRate 0.0536 Epoch: 11 Global Step: 124130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:58:14,942-Speed 5380.72 samples/sec Loss 4.4544 LearningRate 0.0536 Epoch: 11 Global Step: 124140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 22:58:22,427-Speed 5473.24 samples/sec Loss 4.4592 LearningRate 0.0535 Epoch: 11 Global Step: 124150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:58:29,949-Speed 5445.79 samples/sec Loss 4.4792 LearningRate 0.0535 Epoch: 11 Global Step: 124160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:58:37,532-Speed 5402.32 samples/sec Loss 4.4665 LearningRate 0.0535 Epoch: 11 Global Step: 124170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:58:45,067-Speed 5436.83 samples/sec Loss 4.4653 LearningRate 0.0535 Epoch: 11 Global Step: 124180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:58:52,594-Speed 5442.27 samples/sec Loss 4.4907 LearningRate 0.0535 Epoch: 11 Global Step: 124190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 22:59:00,014-Speed 5520.71 samples/sec Loss 4.4841 LearningRate 0.0535 Epoch: 11 Global Step: 124200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:07,539-Speed 5444.48 samples/sec Loss 4.4741 LearningRate 0.0535 Epoch: 11 Global Step: 124210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:14,986-Speed 5500.73 samples/sec Loss 4.4986 LearningRate 0.0535 Epoch: 11 Global Step: 124220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:22,441-Speed 5495.22 samples/sec Loss 4.4993 LearningRate 0.0534 Epoch: 11 
Global Step: 124230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:29,983-Speed 5431.35 samples/sec Loss 4.4908 LearningRate 0.0534 Epoch: 11 Global Step: 124240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:37,496-Speed 5452.55 samples/sec Loss 4.5500 LearningRate 0.0534 Epoch: 11 Global Step: 124250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:45,008-Speed 5453.73 samples/sec Loss 4.4935 LearningRate 0.0534 Epoch: 11 Global Step: 124260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:52,505-Speed 5464.01 samples/sec Loss 4.4653 LearningRate 0.0534 Epoch: 11 Global Step: 124270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 22:59:59,950-Speed 5502.17 samples/sec Loss 4.4420 LearningRate 0.0534 Epoch: 11 Global Step: 124280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:00:07,470-Speed 5448.36 samples/sec Loss 4.4579 LearningRate 0.0534 Epoch: 11 Global Step: 124290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:00:15,072-Speed 5388.25 samples/sec Loss 4.4628 LearningRate 0.0533 Epoch: 11 Global Step: 124300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:00:22,508-Speed 5509.62 samples/sec Loss 4.4481 LearningRate 0.0533 Epoch: 11 Global Step: 124310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:00:30,025-Speed 5449.49 samples/sec Loss 4.4507 LearningRate 0.0533 Epoch: 11 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:00:37,471-Speed 5501.88 samples/sec Loss 4.5141 LearningRate 0.0533 Epoch: 11 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:00:44,956-Speed 5473.31 samples/sec Loss 4.4650 LearningRate 0.0533 Epoch: 11 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:00:52,395-Speed 5506.31 samples/sec Loss 4.4653 LearningRate 0.0533 Epoch: 11 Global Step: 124350 Fp16 Grad 
Scale: 65536 Required: 19 hours Training: 2022-01-08 23:00:59,934-Speed 5434.06 samples/sec Loss 4.5533 LearningRate 0.0533 Epoch: 11 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:01:07,376-Speed 5504.87 samples/sec Loss 4.5161 LearningRate 0.0533 Epoch: 11 Global Step: 124370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:01:14,837-Speed 5490.61 samples/sec Loss 4.4748 LearningRate 0.0532 Epoch: 11 Global Step: 124380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:01:22,300-Speed 5489.16 samples/sec Loss 4.4954 LearningRate 0.0532 Epoch: 11 Global Step: 124390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:01:29,791-Speed 5468.18 samples/sec Loss 4.4591 LearningRate 0.0532 Epoch: 11 Global Step: 124400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:01:37,241-Speed 5498.48 samples/sec Loss 4.4883 LearningRate 0.0532 Epoch: 11 Global Step: 124410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:01:44,750-Speed 5456.14 samples/sec Loss 4.4818 LearningRate 0.0532 Epoch: 11 Global Step: 124420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:01:52,234-Speed 5473.88 samples/sec Loss 4.4659 LearningRate 0.0532 Epoch: 11 Global Step: 124430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:02:14,463-Speed 1842.69 samples/sec Loss 4.5146 LearningRate 0.0532 Epoch: 12 Global Step: 124440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:02:21,923-Speed 5491.34 samples/sec Loss 4.4836 LearningRate 0.0532 Epoch: 12 Global Step: 124450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:02:29,427-Speed 5459.66 samples/sec Loss 4.5169 LearningRate 0.0531 Epoch: 12 Global Step: 124460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:02:36,966-Speed 5433.66 samples/sec Loss 4.4191 LearningRate 0.0531 Epoch: 12 Global Step: 124470 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-01-08 23:02:44,471-Speed 5458.71 samples/sec Loss 4.4290 LearningRate 0.0531 Epoch: 12 Global Step: 124480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:02:51,987-Speed 5450.20 samples/sec Loss 4.4471 LearningRate 0.0531 Epoch: 12 Global Step: 124490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:02:59,517-Speed 5440.29 samples/sec Loss 4.4524 LearningRate 0.0531 Epoch: 12 Global Step: 124500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:03:07,002-Speed 5472.81 samples/sec Loss 4.4882 LearningRate 0.0531 Epoch: 12 Global Step: 124510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:03:14,500-Speed 5464.13 samples/sec Loss 4.5026 LearningRate 0.0531 Epoch: 12 Global Step: 124520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:03:21,982-Speed 5474.58 samples/sec Loss 4.4929 LearningRate 0.0531 Epoch: 12 Global Step: 124530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:03:29,608-Speed 5371.90 samples/sec Loss 4.4954 LearningRate 0.0530 Epoch: 12 Global Step: 124540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:03:37,038-Speed 5514.14 samples/sec Loss 4.4623 LearningRate 0.0530 Epoch: 12 Global Step: 124550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:03:44,503-Speed 5486.83 samples/sec Loss 4.4283 LearningRate 0.0530 Epoch: 12 Global Step: 124560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:03:51,982-Speed 5477.57 samples/sec Loss 4.4651 LearningRate 0.0530 Epoch: 12 Global Step: 124570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:03:59,469-Speed 5472.25 samples/sec Loss 4.4542 LearningRate 0.0530 Epoch: 12 Global Step: 124580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:04:06,906-Speed 5508.10 samples/sec Loss 4.4565 LearningRate 0.0530 Epoch: 12 Global Step: 124590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 
23:04:14,498-Speed 5395.72 samples/sec Loss 4.4086 LearningRate 0.0530 Epoch: 12 Global Step: 124600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:04:21,958-Speed 5491.67 samples/sec Loss 4.4562 LearningRate 0.0530 Epoch: 12 Global Step: 124610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:04:29,432-Speed 5481.36 samples/sec Loss 4.3924 LearningRate 0.0529 Epoch: 12 Global Step: 124620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:04:36,887-Speed 5495.19 samples/sec Loss 4.4332 LearningRate 0.0529 Epoch: 12 Global Step: 124630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:04:44,462-Speed 5407.61 samples/sec Loss 4.4106 LearningRate 0.0529 Epoch: 12 Global Step: 124640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:04:52,194-Speed 5297.63 samples/sec Loss 4.4065 LearningRate 0.0529 Epoch: 12 Global Step: 124650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:04:59,883-Speed 5328.75 samples/sec Loss 4.4204 LearningRate 0.0529 Epoch: 12 Global Step: 124660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:07,616-Speed 5297.09 samples/sec Loss 4.3990 LearningRate 0.0529 Epoch: 12 Global Step: 124670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:15,294-Speed 5335.22 samples/sec Loss 4.3834 LearningRate 0.0529 Epoch: 12 Global Step: 124680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:22,994-Speed 5319.83 samples/sec Loss 4.4083 LearningRate 0.0529 Epoch: 12 Global Step: 124690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:30,684-Speed 5327.34 samples/sec Loss 4.4900 LearningRate 0.0528 Epoch: 12 Global Step: 124700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:38,162-Speed 5478.38 samples/sec Loss 4.4523 LearningRate 0.0528 Epoch: 12 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:45,686-Speed 5444.42 
samples/sec Loss 4.4528 LearningRate 0.0528 Epoch: 12 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:05:53,215-Speed 5441.08 samples/sec Loss 4.4492 LearningRate 0.0528 Epoch: 12 Global Step: 124730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:00,731-Speed 5450.65 samples/sec Loss 4.3566 LearningRate 0.0528 Epoch: 12 Global Step: 124740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:08,283-Speed 5424.37 samples/sec Loss 4.4774 LearningRate 0.0528 Epoch: 12 Global Step: 124750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:15,839-Speed 5421.16 samples/sec Loss 4.4675 LearningRate 0.0528 Epoch: 12 Global Step: 124760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:23,379-Speed 5433.00 samples/sec Loss 4.4054 LearningRate 0.0527 Epoch: 12 Global Step: 124770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:30,862-Speed 5475.15 samples/sec Loss 4.4043 LearningRate 0.0527 Epoch: 12 Global Step: 124780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:38,403-Speed 5432.11 samples/sec Loss 4.4408 LearningRate 0.0527 Epoch: 12 Global Step: 124790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:45,911-Speed 5456.56 samples/sec Loss 4.4206 LearningRate 0.0527 Epoch: 12 Global Step: 124800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:06:53,418-Speed 5456.45 samples/sec Loss 4.4896 LearningRate 0.0527 Epoch: 12 Global Step: 124810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:07:00,914-Speed 5465.08 samples/sec Loss 4.4908 LearningRate 0.0527 Epoch: 12 Global Step: 124820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:07:08,427-Speed 5453.07 samples/sec Loss 4.4479 LearningRate 0.0527 Epoch: 12 Global Step: 124830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:07:15,944-Speed 5449.14 samples/sec Loss 4.4710 
LearningRate 0.0527 Epoch: 12 Global Step: 124840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:07:23,514-Speed 5411.84 samples/sec Loss 4.4216 LearningRate 0.0526 Epoch: 12 Global Step: 124850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:07:31,068-Speed 5422.99 samples/sec Loss 4.4036 LearningRate 0.0526 Epoch: 12 Global Step: 124860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:07:38,606-Speed 5434.41 samples/sec Loss 4.4533 LearningRate 0.0526 Epoch: 12 Global Step: 124870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:07:46,265-Speed 5348.54 samples/sec Loss 4.4809 LearningRate 0.0526 Epoch: 12 Global Step: 124880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:07:53,906-Speed 5361.14 samples/sec Loss 4.4800 LearningRate 0.0526 Epoch: 12 Global Step: 124890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:08:01,434-Speed 5442.04 samples/sec Loss 4.4459 LearningRate 0.0526 Epoch: 12 Global Step: 124900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:08:08,912-Speed 5478.07 samples/sec Loss 4.4451 LearningRate 0.0526 Epoch: 12 Global Step: 124910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:08:16,779-Speed 5206.81 samples/sec Loss 4.4545 LearningRate 0.0526 Epoch: 12 Global Step: 124920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:08:24,292-Speed 5452.58 samples/sec Loss 4.4602 LearningRate 0.0525 Epoch: 12 Global Step: 124930 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:08:31,916-Speed 5372.92 samples/sec Loss 4.4754 LearningRate 0.0525 Epoch: 12 Global Step: 124940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:08:39,458-Speed 5432.29 samples/sec Loss 4.4779 LearningRate 0.0525 Epoch: 12 Global Step: 124950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:08:46,965-Speed 5457.04 samples/sec Loss 4.4365 LearningRate 0.0525 Epoch: 12 
Global Step: 124960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:08:54,521-Speed 5421.49 samples/sec Loss 4.4430 LearningRate 0.0525 Epoch: 12 Global Step: 124970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:09:02,003-Speed 5474.92 samples/sec Loss 4.4566 LearningRate 0.0525 Epoch: 12 Global Step: 124980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:09:09,541-Speed 5434.46 samples/sec Loss 4.4684 LearningRate 0.0525 Epoch: 12 Global Step: 124990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:09:17,019-Speed 5478.02 samples/sec Loss 4.4268 LearningRate 0.0525 Epoch: 12 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:09:24,519-Speed 5462.51 samples/sec Loss 4.4547 LearningRate 0.0524 Epoch: 12 Global Step: 125010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:09:32,008-Speed 5469.54 samples/sec Loss 4.4632 LearningRate 0.0524 Epoch: 12 Global Step: 125020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:09:39,605-Speed 5392.87 samples/sec Loss 4.4242 LearningRate 0.0524 Epoch: 12 Global Step: 125030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:09:47,130-Speed 5443.50 samples/sec Loss 4.4508 LearningRate 0.0524 Epoch: 12 Global Step: 125040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:09:54,741-Speed 5382.22 samples/sec Loss 4.4578 LearningRate 0.0524 Epoch: 12 Global Step: 125050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:10:02,239-Speed 5463.82 samples/sec Loss 4.4382 LearningRate 0.0524 Epoch: 12 Global Step: 125060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:10:09,686-Speed 5500.90 samples/sec Loss 4.4234 LearningRate 0.0524 Epoch: 12 Global Step: 125070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:10:17,105-Speed 5521.41 samples/sec Loss 4.4309 LearningRate 0.0524 Epoch: 12 Global Step: 125080 Fp16 Grad 
Scale: 32768 Required: 19 hours Training: 2022-01-08 23:10:24,594-Speed 5470.40 samples/sec Loss 4.4061 LearningRate 0.0523 Epoch: 12 Global Step: 125090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:10:32,057-Speed 5489.07 samples/sec Loss 4.3679 LearningRate 0.0523 Epoch: 12 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:10:39,505-Speed 5500.08 samples/sec Loss 4.4769 LearningRate 0.0523 Epoch: 12 Global Step: 125110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:10:46,948-Speed 5504.13 samples/sec Loss 4.4324 LearningRate 0.0523 Epoch: 12 Global Step: 125120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:10:54,419-Speed 5482.74 samples/sec Loss 4.4122 LearningRate 0.0523 Epoch: 12 Global Step: 125130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:02,004-Speed 5401.29 samples/sec Loss 4.4255 LearningRate 0.0523 Epoch: 12 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:09,506-Speed 5460.64 samples/sec Loss 4.4717 LearningRate 0.0523 Epoch: 12 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:16,966-Speed 5491.37 samples/sec Loss 4.3951 LearningRate 0.0523 Epoch: 12 Global Step: 125160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:24,454-Speed 5470.62 samples/sec Loss 4.4463 LearningRate 0.0522 Epoch: 12 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:31,896-Speed 5504.89 samples/sec Loss 4.4590 LearningRate 0.0522 Epoch: 12 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:39,404-Speed 5455.76 samples/sec Loss 4.4629 LearningRate 0.0522 Epoch: 12 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:11:46,900-Speed 5465.50 samples/sec Loss 4.4192 LearningRate 0.0522 Epoch: 12 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-01-08 23:11:54,361-Speed 5490.32 samples/sec Loss 4.3959 LearningRate 0.0522 Epoch: 12 Global Step: 125210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:12:01,803-Speed 5504.87 samples/sec Loss 4.3880 LearningRate 0.0522 Epoch: 12 Global Step: 125220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:09,380-Speed 5406.76 samples/sec Loss 4.3862 LearningRate 0.0522 Epoch: 12 Global Step: 125230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:16,824-Speed 5503.26 samples/sec Loss 4.4123 LearningRate 0.0521 Epoch: 12 Global Step: 125240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:24,307-Speed 5473.88 samples/sec Loss 4.4224 LearningRate 0.0521 Epoch: 12 Global Step: 125250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:31,767-Speed 5491.36 samples/sec Loss 4.4253 LearningRate 0.0521 Epoch: 12 Global Step: 125260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:39,263-Speed 5465.08 samples/sec Loss 4.3935 LearningRate 0.0521 Epoch: 12 Global Step: 125270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:46,738-Speed 5480.54 samples/sec Loss 4.4594 LearningRate 0.0521 Epoch: 12 Global Step: 125280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:12:54,238-Speed 5461.67 samples/sec Loss 4.4324 LearningRate 0.0521 Epoch: 12 Global Step: 125290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:13:01,717-Speed 5478.10 samples/sec Loss 4.4407 LearningRate 0.0521 Epoch: 12 Global Step: 125300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:13:09,225-Speed 5455.74 samples/sec Loss 4.4737 LearningRate 0.0521 Epoch: 12 Global Step: 125310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:13:16,727-Speed 5460.59 samples/sec Loss 4.4235 LearningRate 0.0520 Epoch: 12 Global Step: 125320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 
23:13:24,297-Speed 5411.74 samples/sec Loss 4.4331 LearningRate 0.0520 Epoch: 12 Global Step: 125330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:13:31,724-Speed 5515.70 samples/sec Loss 4.3792 LearningRate 0.0520 Epoch: 12 Global Step: 125340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:13:39,217-Speed 5467.54 samples/sec Loss 4.4340 LearningRate 0.0520 Epoch: 12 Global Step: 125350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:13:46,732-Speed 5451.42 samples/sec Loss 4.4039 LearningRate 0.0520 Epoch: 12 Global Step: 125360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:13:54,144-Speed 5526.20 samples/sec Loss 4.4159 LearningRate 0.0520 Epoch: 12 Global Step: 125370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:01,627-Speed 5474.76 samples/sec Loss 4.3996 LearningRate 0.0520 Epoch: 12 Global Step: 125380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:09,111-Speed 5473.44 samples/sec Loss 4.4115 LearningRate 0.0520 Epoch: 12 Global Step: 125390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:16,601-Speed 5469.78 samples/sec Loss 4.4080 LearningRate 0.0519 Epoch: 12 Global Step: 125400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:23,660-Speed 5803.22 samples/sec Loss 4.4012 LearningRate 0.0519 Epoch: 12 Global Step: 125410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:30,572-Speed 5926.63 samples/sec Loss 4.4336 LearningRate 0.0519 Epoch: 12 Global Step: 125420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:37,532-Speed 5885.34 samples/sec Loss 4.4624 LearningRate 0.0519 Epoch: 12 Global Step: 125430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:14:44,505-Speed 5875.54 samples/sec Loss 4.3711 LearningRate 0.0519 Epoch: 12 Global Step: 125440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:14:51,869-Speed 5563.13 
samples/sec Loss 4.4424 LearningRate 0.0519 Epoch: 12 Global Step: 125450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:14:59,447-Speed 5405.24 samples/sec Loss 4.4187 LearningRate 0.0519 Epoch: 12 Global Step: 125460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:15:06,949-Speed 5460.96 samples/sec Loss 4.3771 LearningRate 0.0519 Epoch: 12 Global Step: 125470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:15:14,488-Speed 5434.04 samples/sec Loss 4.3527 LearningRate 0.0518 Epoch: 12 Global Step: 125480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:15:22,039-Speed 5424.57 samples/sec Loss 4.4490 LearningRate 0.0518 Epoch: 12 Global Step: 125490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:15:29,604-Speed 5415.69 samples/sec Loss 4.3816 LearningRate 0.0518 Epoch: 12 Global Step: 125500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:15:37,227-Speed 5373.71 samples/sec Loss 4.4336 LearningRate 0.0518 Epoch: 12 Global Step: 125510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:15:44,775-Speed 5427.34 samples/sec Loss 4.4402 LearningRate 0.0518 Epoch: 12 Global Step: 125520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:15:52,307-Speed 5438.28 samples/sec Loss 4.4094 LearningRate 0.0518 Epoch: 12 Global Step: 125530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:15:59,763-Speed 5495.02 samples/sec Loss 4.4601 LearningRate 0.0518 Epoch: 12 Global Step: 125540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:16:07,515-Speed 5283.77 samples/sec Loss 4.4223 LearningRate 0.0518 Epoch: 12 Global Step: 125550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:16:14,972-Speed 5494.06 samples/sec Loss 4.3840 LearningRate 0.0517 Epoch: 12 Global Step: 125560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:16:22,494-Speed 5445.84 samples/sec Loss 4.4157 
LearningRate 0.0517 Epoch: 12 Global Step: 125570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:16:30,072-Speed 5406.08 samples/sec Loss 4.4282 LearningRate 0.0517 Epoch: 12 Global Step: 125580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:16:37,605-Speed 5438.37 samples/sec Loss 4.4207 LearningRate 0.0517 Epoch: 12 Global Step: 125590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:16:45,127-Speed 5445.60 samples/sec Loss 4.4152 LearningRate 0.0517 Epoch: 12 Global Step: 125600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:16:52,751-Speed 5373.83 samples/sec Loss 4.4006 LearningRate 0.0517 Epoch: 12 Global Step: 125610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:00,411-Speed 5347.42 samples/sec Loss 4.4197 LearningRate 0.0517 Epoch: 12 Global Step: 125620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:07,988-Speed 5406.79 samples/sec Loss 4.4462 LearningRate 0.0517 Epoch: 12 Global Step: 125630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:15,542-Speed 5422.75 samples/sec Loss 4.4208 LearningRate 0.0516 Epoch: 12 Global Step: 125640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:23,061-Speed 5448.76 samples/sec Loss 4.3638 LearningRate 0.0516 Epoch: 12 Global Step: 125650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:30,545-Speed 5473.69 samples/sec Loss 4.3602 LearningRate 0.0516 Epoch: 12 Global Step: 125660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:37,979-Speed 5510.09 samples/sec Loss 4.3754 LearningRate 0.0516 Epoch: 12 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:45,487-Speed 5456.98 samples/sec Loss 4.4544 LearningRate 0.0516 Epoch: 12 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:17:53,024-Speed 5434.82 samples/sec Loss 4.4085 LearningRate 0.0516 Epoch: 12 
Global Step: 125690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:18:00,499-Speed 5480.34 samples/sec Loss 4.3356 LearningRate 0.0516 Epoch: 12 Global Step: 125700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:18:08,009-Speed 5454.71 samples/sec Loss 4.3533 LearningRate 0.0516 Epoch: 12 Global Step: 125710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:18:15,556-Speed 5427.92 samples/sec Loss 4.4273 LearningRate 0.0515 Epoch: 12 Global Step: 125720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:18:23,190-Speed 5366.31 samples/sec Loss 4.3632 LearningRate 0.0515 Epoch: 12 Global Step: 125730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:18:30,661-Speed 5483.66 samples/sec Loss 4.4217 LearningRate 0.0515 Epoch: 12 Global Step: 125740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:18:38,147-Speed 5471.81 samples/sec Loss 4.4040 LearningRate 0.0515 Epoch: 12 Global Step: 125750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:18:45,585-Speed 5507.43 samples/sec Loss 4.4051 LearningRate 0.0515 Epoch: 12 Global Step: 125760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:18:53,214-Speed 5369.95 samples/sec Loss 4.3977 LearningRate 0.0515 Epoch: 12 Global Step: 125770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:19:00,909-Speed 5323.75 samples/sec Loss 4.3886 LearningRate 0.0515 Epoch: 12 Global Step: 125780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:19:08,457-Speed 5427.22 samples/sec Loss 4.4316 LearningRate 0.0515 Epoch: 12 Global Step: 125790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:19:16,054-Speed 5392.11 samples/sec Loss 4.3819 LearningRate 0.0514 Epoch: 12 Global Step: 125800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:19:23,556-Speed 5460.47 samples/sec Loss 4.3535 LearningRate 0.0514 Epoch: 12 Global Step: 125810 Fp16 Grad 
Scale: 32768 Required: 19 hours Training: 2022-01-08 23:19:31,109-Speed 5424.45 samples/sec Loss 4.4059 LearningRate 0.0514 Epoch: 12 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:19:38,618-Speed 5455.22 samples/sec Loss 4.4292 LearningRate 0.0514 Epoch: 12 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:19:46,110-Speed 5467.60 samples/sec Loss 4.3622 LearningRate 0.0514 Epoch: 12 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:19:53,730-Speed 5375.98 samples/sec Loss 4.3658 LearningRate 0.0514 Epoch: 12 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:20:01,278-Speed 5427.67 samples/sec Loss 4.4032 LearningRate 0.0514 Epoch: 12 Global Step: 125860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:20:08,791-Speed 5452.06 samples/sec Loss 4.3694 LearningRate 0.0514 Epoch: 12 Global Step: 125870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:20:16,369-Speed 5405.71 samples/sec Loss 4.4141 LearningRate 0.0513 Epoch: 12 Global Step: 125880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:20:23,876-Speed 5457.56 samples/sec Loss 4.3522 LearningRate 0.0513 Epoch: 12 Global Step: 125890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:20:31,513-Speed 5364.24 samples/sec Loss 4.4200 LearningRate 0.0513 Epoch: 12 Global Step: 125900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:20:39,046-Speed 5437.75 samples/sec Loss 4.3813 LearningRate 0.0513 Epoch: 12 Global Step: 125910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:20:46,527-Speed 5476.31 samples/sec Loss 4.3475 LearningRate 0.0513 Epoch: 12 Global Step: 125920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:20:53,961-Speed 5510.63 samples/sec Loss 4.4030 LearningRate 0.0513 Epoch: 12 Global Step: 125930 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-01-08 23:21:01,461-Speed 5461.89 samples/sec Loss 4.3750 LearningRate 0.0513 Epoch: 12 Global Step: 125940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:21:08,964-Speed 5459.61 samples/sec Loss 4.4234 LearningRate 0.0513 Epoch: 12 Global Step: 125950 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:21:16,576-Speed 5381.88 samples/sec Loss 4.3561 LearningRate 0.0512 Epoch: 12 Global Step: 125960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:21:24,231-Speed 5351.77 samples/sec Loss 4.4112 LearningRate 0.0512 Epoch: 12 Global Step: 125970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:21:31,720-Speed 5469.41 samples/sec Loss 4.4354 LearningRate 0.0512 Epoch: 12 Global Step: 125980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:21:39,194-Speed 5481.12 samples/sec Loss 4.4089 LearningRate 0.0512 Epoch: 12 Global Step: 125990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:21:46,636-Speed 5504.57 samples/sec Loss 4.4302 LearningRate 0.0512 Epoch: 12 Global Step: 126000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:22:30,614-[lfw][126000]XNorm: 23.421095 Training: 2022-01-08 23:22:30,615-[lfw][126000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-01-08 23:22:30,616-[lfw][126000]Accuracy-Highest: 0.99817 Training: 2022-01-08 23:23:22,301-[cfp_fp][126000]XNorm: 21.811621 Training: 2022-01-08 23:23:22,302-[cfp_fp][126000]Accuracy-Flip: 0.99129+-0.00416 Training: 2022-01-08 23:23:22,303-[cfp_fp][126000]Accuracy-Highest: 0.99129 Training: 2022-01-08 23:24:06,997-[agedb_30][126000]XNorm: 23.365409 Training: 2022-01-08 23:24:06,998-[agedb_30][126000]Accuracy-Flip: 0.97933+-0.00716 Training: 2022-01-08 23:24:06,998-[agedb_30][126000]Accuracy-Highest: 0.98000 Training: 2022-01-08 23:24:14,700-Speed 276.64 samples/sec Loss 4.3588 LearningRate 0.0512 Epoch: 12 Global Step: 126010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-01-08 23:24:22,188-Speed 5471.28 samples/sec Loss 4.3650 LearningRate 0.0512 Epoch: 12 Global Step: 126020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:24:29,661-Speed 5482.63 samples/sec Loss 4.3887 LearningRate 0.0512 Epoch: 12 Global Step: 126030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:24:37,147-Speed 5471.69 samples/sec Loss 4.3276 LearningRate 0.0511 Epoch: 12 Global Step: 126040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:24:44,752-Speed 5386.85 samples/sec Loss 4.4104 LearningRate 0.0511 Epoch: 12 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:24:52,233-Speed 5476.44 samples/sec Loss 4.3898 LearningRate 0.0511 Epoch: 12 Global Step: 126060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:24:59,765-Speed 5438.41 samples/sec Loss 4.3696 LearningRate 0.0511 Epoch: 12 Global Step: 126070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:25:07,421-Speed 5351.06 samples/sec Loss 4.3546 LearningRate 0.0511 Epoch: 12 Global Step: 126080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:25:14,928-Speed 5456.84 samples/sec Loss 4.3232 LearningRate 0.0511 Epoch: 12 Global Step: 126090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:25:22,445-Speed 5449.84 samples/sec Loss 4.4148 LearningRate 0.0511 Epoch: 12 Global Step: 126100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:25:29,969-Speed 5444.73 samples/sec Loss 4.3943 LearningRate 0.0511 Epoch: 12 Global Step: 126110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:25:37,517-Speed 5427.54 samples/sec Loss 4.3928 LearningRate 0.0510 Epoch: 12 Global Step: 126120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:25:44,964-Speed 5501.23 samples/sec Loss 4.3530 LearningRate 0.0510 Epoch: 12 Global Step: 126130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 
23:25:52,457-Speed 5466.99 samples/sec Loss 4.3875 LearningRate 0.0510 Epoch: 12 Global Step: 126140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:00,021-Speed 5415.61 samples/sec Loss 4.3911 LearningRate 0.0510 Epoch: 12 Global Step: 126150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:07,703-Speed 5332.93 samples/sec Loss 4.3857 LearningRate 0.0510 Epoch: 12 Global Step: 126160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:15,551-Speed 5220.12 samples/sec Loss 4.3474 LearningRate 0.0510 Epoch: 12 Global Step: 126170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:23,118-Speed 5413.65 samples/sec Loss 4.3767 LearningRate 0.0510 Epoch: 12 Global Step: 126180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:30,678-Speed 5418.59 samples/sec Loss 4.3783 LearningRate 0.0510 Epoch: 12 Global Step: 126190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:38,316-Speed 5363.07 samples/sec Loss 4.4130 LearningRate 0.0509 Epoch: 12 Global Step: 126200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:26:46,048-Speed 5298.53 samples/sec Loss 4.3771 LearningRate 0.0509 Epoch: 12 Global Step: 126210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:26:53,677-Speed 5369.92 samples/sec Loss 4.4006 LearningRate 0.0509 Epoch: 12 Global Step: 126220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:01,162-Speed 5472.07 samples/sec Loss 4.3734 LearningRate 0.0509 Epoch: 12 Global Step: 126230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:08,684-Speed 5446.85 samples/sec Loss 4.3962 LearningRate 0.0509 Epoch: 12 Global Step: 126240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:16,209-Speed 5443.31 samples/sec Loss 4.4365 LearningRate 0.0509 Epoch: 12 Global Step: 126250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:23,691-Speed 5475.62 
samples/sec Loss 4.3172 LearningRate 0.0509 Epoch: 12 Global Step: 126260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:31,231-Speed 5432.58 samples/sec Loss 4.3372 LearningRate 0.0508 Epoch: 12 Global Step: 126270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:38,690-Speed 5492.59 samples/sec Loss 4.4338 LearningRate 0.0508 Epoch: 12 Global Step: 126280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:46,277-Speed 5399.12 samples/sec Loss 4.3321 LearningRate 0.0508 Epoch: 12 Global Step: 126290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:27:53,977-Speed 5320.16 samples/sec Loss 4.3820 LearningRate 0.0508 Epoch: 12 Global Step: 126300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:01,480-Speed 5459.73 samples/sec Loss 4.3744 LearningRate 0.0508 Epoch: 12 Global Step: 126310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:28:08,890-Speed 5528.63 samples/sec Loss 4.3608 LearningRate 0.0508 Epoch: 12 Global Step: 126320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:16,346-Speed 5494.38 samples/sec Loss 4.3736 LearningRate 0.0508 Epoch: 12 Global Step: 126330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:23,768-Speed 5520.00 samples/sec Loss 4.3496 LearningRate 0.0508 Epoch: 12 Global Step: 126340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:31,317-Speed 5426.39 samples/sec Loss 4.3320 LearningRate 0.0507 Epoch: 12 Global Step: 126350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:38,841-Speed 5444.10 samples/sec Loss 4.3101 LearningRate 0.0507 Epoch: 12 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:46,354-Speed 5452.98 samples/sec Loss 4.3376 LearningRate 0.0507 Epoch: 12 Global Step: 126370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:28:53,828-Speed 5481.00 samples/sec Loss 4.3825 
LearningRate 0.0507 Epoch: 12 Global Step: 126380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:29:01,424-Speed 5393.36 samples/sec Loss 4.3432 LearningRate 0.0507 Epoch: 12 Global Step: 126390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:29:08,971-Speed 5427.76 samples/sec Loss 4.3407 LearningRate 0.0507 Epoch: 12 Global Step: 126400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:29:16,413-Speed 5505.34 samples/sec Loss 4.3408 LearningRate 0.0507 Epoch: 12 Global Step: 126410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:29:23,878-Speed 5487.78 samples/sec Loss 4.3859 LearningRate 0.0507 Epoch: 12 Global Step: 126420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:29:31,401-Speed 5445.00 samples/sec Loss 4.3628 LearningRate 0.0506 Epoch: 12 Global Step: 126430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:29:38,943-Speed 5431.27 samples/sec Loss 4.2956 LearningRate 0.0506 Epoch: 12 Global Step: 126440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:29:46,385-Speed 5505.02 samples/sec Loss 4.3345 LearningRate 0.0506 Epoch: 12 Global Step: 126450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:29:53,824-Speed 5506.58 samples/sec Loss 4.3672 LearningRate 0.0506 Epoch: 12 Global Step: 126460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:30:01,295-Speed 5483.34 samples/sec Loss 4.3140 LearningRate 0.0506 Epoch: 12 Global Step: 126470 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:30:12,393-Speed 3691.08 samples/sec Loss 4.3674 LearningRate 0.0506 Epoch: 12 Global Step: 126480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:30:20,020-Speed 5371.33 samples/sec Loss 4.3714 LearningRate 0.0506 Epoch: 12 Global Step: 126490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:30:27,539-Speed 5448.52 samples/sec Loss 4.3575 LearningRate 0.0506 Epoch: 12 
Global Step: 126500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-08 23:30:35,081-Speed 5431.77 samples/sec Loss 4.4059 LearningRate 0.0505 Epoch: 12 Global Step: 126510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:30:42,685-Speed 5386.69 samples/sec Loss 4.3110 LearningRate 0.0505 Epoch: 12 Global Step: 126520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:30:50,237-Speed 5424.89 samples/sec Loss 4.3551 LearningRate 0.0505 Epoch: 12 Global Step: 126530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:30:57,660-Speed 5518.74 samples/sec Loss 4.3491 LearningRate 0.0505 Epoch: 12 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:05,131-Speed 5483.38 samples/sec Loss 4.3403 LearningRate 0.0505 Epoch: 12 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:12,673-Speed 5431.63 samples/sec Loss 4.3621 LearningRate 0.0505 Epoch: 12 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:20,136-Speed 5489.10 samples/sec Loss 4.3820 LearningRate 0.0505 Epoch: 12 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:27,682-Speed 5428.96 samples/sec Loss 4.3115 LearningRate 0.0505 Epoch: 12 Global Step: 126580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:35,213-Speed 5439.51 samples/sec Loss 4.3591 LearningRate 0.0504 Epoch: 12 Global Step: 126590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:42,647-Speed 5510.52 samples/sec Loss 4.3449 LearningRate 0.0504 Epoch: 12 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:31:50,111-Speed 5488.82 samples/sec Loss 4.3260 LearningRate 0.0504 Epoch: 12 Global Step: 126610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-08 23:31:57,610-Speed 5462.25 samples/sec Loss 4.2898 LearningRate 0.0504 Epoch: 12 Global Step: 126620 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-01-08 23:32:05,093-Speed 5474.82 samples/sec Loss 4.3843 LearningRate 0.0504 Epoch: 12 Global Step: 126630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-08 23:32:12,600-Speed 5456.60 samples/sec Loss 4.2784 LearningRate 0.0504 Epoch: 12 Global Step: 126640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:32:20,045-Speed 5502.36 samples/sec Loss 4.3136 LearningRate 0.0504 Epoch: 12 Global Step: 126650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:32:27,576-Speed 5440.02 samples/sec Loss 4.3064 LearningRate 0.0504 Epoch: 12 Global Step: 126660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:32:35,026-Speed 5498.51 samples/sec Loss 4.3357 LearningRate 0.0503 Epoch: 12 Global Step: 126670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:32:42,486-Speed 5491.69 samples/sec Loss 4.4048 LearningRate 0.0503 Epoch: 12 Global Step: 126680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:32:49,955-Speed 5484.37 samples/sec Loss 4.3980 LearningRate 0.0503 Epoch: 12 Global Step: 126690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:32:57,412-Speed 5493.38 samples/sec Loss 4.4269 LearningRate 0.0503 Epoch: 12 Global Step: 126700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:04,860-Speed 5500.91 samples/sec Loss 4.3330 LearningRate 0.0503 Epoch: 12 Global Step: 126710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:12,368-Speed 5455.92 samples/sec Loss 4.3159 LearningRate 0.0503 Epoch: 12 Global Step: 126720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:19,967-Speed 5390.54 samples/sec Loss 4.3475 LearningRate 0.0503 Epoch: 12 Global Step: 126730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:27,563-Speed 5393.85 samples/sec Loss 4.3315 LearningRate 0.0503 Epoch: 12 Global Step: 126740 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-01-08 23:33:35,124-Speed 5417.67 samples/sec Loss 4.3471 LearningRate 0.0502 Epoch: 12 Global Step: 126750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:42,653-Speed 5441.20 samples/sec Loss 4.3280 LearningRate 0.0502 Epoch: 12 Global Step: 126760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:50,186-Speed 5437.37 samples/sec Loss 4.3544 LearningRate 0.0502 Epoch: 12 Global Step: 126770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:33:57,710-Speed 5445.42 samples/sec Loss 4.3735 LearningRate 0.0502 Epoch: 12 Global Step: 126780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:34:05,168-Speed 5492.61 samples/sec Loss 4.3684 LearningRate 0.0502 Epoch: 12 Global Step: 126790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:34:12,722-Speed 5423.02 samples/sec Loss 4.3665 LearningRate 0.0502 Epoch: 12 Global Step: 126800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:34:20,265-Speed 5430.60 samples/sec Loss 4.3679 LearningRate 0.0502 Epoch: 12 Global Step: 126810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:34:27,767-Speed 5460.70 samples/sec Loss 4.3398 LearningRate 0.0502 Epoch: 12 Global Step: 126820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:34:35,347-Speed 5404.35 samples/sec Loss 4.3568 LearningRate 0.0502 Epoch: 12 Global Step: 126830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:34:42,818-Speed 5482.98 samples/sec Loss 4.3205 LearningRate 0.0501 Epoch: 12 Global Step: 126840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:34:50,376-Speed 5420.64 samples/sec Loss 4.3547 LearningRate 0.0501 Epoch: 12 Global Step: 126850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:34:57,782-Speed 5531.23 samples/sec Loss 4.3231 LearningRate 0.0501 Epoch: 12 Global Step: 126860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 
23:35:05,359-Speed 5407.02 samples/sec Loss 4.3388 LearningRate 0.0501 Epoch: 12 Global Step: 126870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:35:12,839-Speed 5476.19 samples/sec Loss 4.3624 LearningRate 0.0501 Epoch: 12 Global Step: 126880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:35:20,355-Speed 5450.64 samples/sec Loss 4.3346 LearningRate 0.0501 Epoch: 12 Global Step: 126890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:35:27,840-Speed 5472.78 samples/sec Loss 4.3147 LearningRate 0.0501 Epoch: 12 Global Step: 126900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:35:35,365-Speed 5444.34 samples/sec Loss 4.2828 LearningRate 0.0501 Epoch: 12 Global Step: 126910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:35:42,858-Speed 5466.96 samples/sec Loss 4.3705 LearningRate 0.0500 Epoch: 12 Global Step: 126920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:35:50,432-Speed 5408.52 samples/sec Loss 4.3030 LearningRate 0.0500 Epoch: 12 Global Step: 126930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:35:58,197-Speed 5275.75 samples/sec Loss 4.3030 LearningRate 0.0500 Epoch: 12 Global Step: 126940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:05,719-Speed 5446.44 samples/sec Loss 4.3521 LearningRate 0.0500 Epoch: 12 Global Step: 126950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:13,201-Speed 5474.93 samples/sec Loss 4.3372 LearningRate 0.0500 Epoch: 12 Global Step: 126960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:20,803-Speed 5388.42 samples/sec Loss 4.3277 LearningRate 0.0500 Epoch: 12 Global Step: 126970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:28,391-Speed 5399.53 samples/sec Loss 4.3085 LearningRate 0.0500 Epoch: 12 Global Step: 126980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:35,886-Speed 5464.97 
samples/sec Loss 4.3662 LearningRate 0.0500 Epoch: 12 Global Step: 126990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:43,460-Speed 5408.96 samples/sec Loss 4.3372 LearningRate 0.0499 Epoch: 12 Global Step: 127000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:50,981-Speed 5446.27 samples/sec Loss 4.2958 LearningRate 0.0499 Epoch: 12 Global Step: 127010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:36:58,564-Speed 5403.26 samples/sec Loss 4.3834 LearningRate 0.0499 Epoch: 12 Global Step: 127020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:37:06,122-Speed 5419.39 samples/sec Loss 4.3334 LearningRate 0.0499 Epoch: 12 Global Step: 127030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:13,864-Speed 5291.76 samples/sec Loss 4.3362 LearningRate 0.0499 Epoch: 12 Global Step: 127040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:21,352-Speed 5470.17 samples/sec Loss 4.2933 LearningRate 0.0499 Epoch: 12 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:28,961-Speed 5384.09 samples/sec Loss 4.3156 LearningRate 0.0499 Epoch: 12 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:36,764-Speed 5250.04 samples/sec Loss 4.2812 LearningRate 0.0499 Epoch: 12 Global Step: 127070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:44,304-Speed 5432.82 samples/sec Loss 4.3194 LearningRate 0.0498 Epoch: 12 Global Step: 127080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:51,772-Speed 5485.31 samples/sec Loss 4.3064 LearningRate 0.0498 Epoch: 12 Global Step: 127090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:37:59,246-Speed 5481.56 samples/sec Loss 4.3467 LearningRate 0.0498 Epoch: 12 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:38:06,724-Speed 5478.07 samples/sec Loss 4.2980 
LearningRate 0.0498 Epoch: 12 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:38:14,180-Speed 5494.48 samples/sec Loss 4.3425 LearningRate 0.0498 Epoch: 12 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:38:21,699-Speed 5448.34 samples/sec Loss 4.2699 LearningRate 0.0498 Epoch: 12 Global Step: 127130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:38:29,282-Speed 5402.24 samples/sec Loss 4.2821 LearningRate 0.0498 Epoch: 12 Global Step: 127140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:38:36,727-Speed 5502.59 samples/sec Loss 4.3310 LearningRate 0.0498 Epoch: 12 Global Step: 127150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:38:44,234-Speed 5456.34 samples/sec Loss 4.3608 LearningRate 0.0497 Epoch: 12 Global Step: 127160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:38:51,675-Speed 5505.36 samples/sec Loss 4.3024 LearningRate 0.0497 Epoch: 12 Global Step: 127170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:38:59,240-Speed 5415.30 samples/sec Loss 4.3060 LearningRate 0.0497 Epoch: 12 Global Step: 127180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:06,763-Speed 5445.68 samples/sec Loss 4.2918 LearningRate 0.0497 Epoch: 12 Global Step: 127190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:14,236-Speed 5481.65 samples/sec Loss 4.3217 LearningRate 0.0497 Epoch: 12 Global Step: 127200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:21,649-Speed 5525.90 samples/sec Loss 4.3308 LearningRate 0.0497 Epoch: 12 Global Step: 127210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:29,111-Speed 5489.95 samples/sec Loss 4.3513 LearningRate 0.0497 Epoch: 12 Global Step: 127220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:36,606-Speed 5465.55 samples/sec Loss 4.3141 LearningRate 0.0497 Epoch: 12 
Global Step: 127230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:44,157-Speed 5425.09 samples/sec Loss 4.3089 LearningRate 0.0496 Epoch: 12 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:51,715-Speed 5420.14 samples/sec Loss 4.3246 LearningRate 0.0496 Epoch: 12 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:39:59,160-Speed 5502.26 samples/sec Loss 4.3314 LearningRate 0.0496 Epoch: 12 Global Step: 127260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:06,624-Speed 5488.85 samples/sec Loss 4.3176 LearningRate 0.0496 Epoch: 12 Global Step: 127270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:14,078-Speed 5495.23 samples/sec Loss 4.3356 LearningRate 0.0496 Epoch: 12 Global Step: 127280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:21,517-Speed 5506.55 samples/sec Loss 4.3192 LearningRate 0.0496 Epoch: 12 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:28,992-Speed 5480.44 samples/sec Loss 4.2939 LearningRate 0.0496 Epoch: 12 Global Step: 127300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:36,467-Speed 5480.96 samples/sec Loss 4.3246 LearningRate 0.0496 Epoch: 12 Global Step: 127310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:43,984-Speed 5449.12 samples/sec Loss 4.3400 LearningRate 0.0495 Epoch: 12 Global Step: 127320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:51,507-Speed 5445.36 samples/sec Loss 4.2977 LearningRate 0.0495 Epoch: 12 Global Step: 127330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:40:58,995-Speed 5470.78 samples/sec Loss 4.3157 LearningRate 0.0495 Epoch: 12 Global Step: 127340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:06,528-Speed 5438.16 samples/sec Loss 4.3258 LearningRate 0.0495 Epoch: 12 Global Step: 127350 Fp16 Grad 
Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:13,980-Speed 5497.52 samples/sec Loss 4.3192 LearningRate 0.0495 Epoch: 12 Global Step: 127360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:21,483-Speed 5459.51 samples/sec Loss 4.2934 LearningRate 0.0495 Epoch: 12 Global Step: 127370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:28,983-Speed 5462.05 samples/sec Loss 4.3138 LearningRate 0.0495 Epoch: 12 Global Step: 127380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:36,532-Speed 5426.70 samples/sec Loss 4.2643 LearningRate 0.0495 Epoch: 12 Global Step: 127390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:44,004-Speed 5482.85 samples/sec Loss 4.3361 LearningRate 0.0494 Epoch: 12 Global Step: 127400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:51,453-Speed 5498.79 samples/sec Loss 4.3220 LearningRate 0.0494 Epoch: 12 Global Step: 127410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:41:58,906-Speed 5496.36 samples/sec Loss 4.3420 LearningRate 0.0494 Epoch: 12 Global Step: 127420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:42:06,373-Speed 5486.33 samples/sec Loss 4.2551 LearningRate 0.0494 Epoch: 12 Global Step: 127430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:42:13,874-Speed 5461.78 samples/sec Loss 4.3327 LearningRate 0.0494 Epoch: 12 Global Step: 127440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:42:21,336-Speed 5489.27 samples/sec Loss 4.3385 LearningRate 0.0494 Epoch: 12 Global Step: 127450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:42:28,827-Speed 5468.68 samples/sec Loss 4.3296 LearningRate 0.0494 Epoch: 12 Global Step: 127460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:42:36,339-Speed 5453.72 samples/sec Loss 4.2856 LearningRate 0.0494 Epoch: 12 Global Step: 127470 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-01-08 23:42:43,859-Speed 5447.29 samples/sec Loss 4.2799 LearningRate 0.0493 Epoch: 12 Global Step: 127480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:42:51,416-Speed 5420.69 samples/sec Loss 4.3113 LearningRate 0.0493 Epoch: 12 Global Step: 127490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:42:58,994-Speed 5406.06 samples/sec Loss 4.3004 LearningRate 0.0493 Epoch: 12 Global Step: 127500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:06,704-Speed 5313.02 samples/sec Loss 4.3125 LearningRate 0.0493 Epoch: 12 Global Step: 127510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:14,194-Speed 5469.94 samples/sec Loss 4.2963 LearningRate 0.0493 Epoch: 12 Global Step: 127520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:21,676-Speed 5475.02 samples/sec Loss 4.2889 LearningRate 0.0493 Epoch: 12 Global Step: 127530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:29,301-Speed 5372.84 samples/sec Loss 4.3255 LearningRate 0.0493 Epoch: 12 Global Step: 127540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:36,787-Speed 5472.22 samples/sec Loss 4.3016 LearningRate 0.0493 Epoch: 12 Global Step: 127550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:44,276-Speed 5470.69 samples/sec Loss 4.2981 LearningRate 0.0492 Epoch: 12 Global Step: 127560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:43:51,752-Speed 5479.45 samples/sec Loss 4.2997 LearningRate 0.0492 Epoch: 12 Global Step: 127570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:43:59,213-Speed 5490.21 samples/sec Loss 4.3763 LearningRate 0.0492 Epoch: 12 Global Step: 127580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:06,706-Speed 5467.42 samples/sec Loss 4.3135 LearningRate 0.0492 Epoch: 12 Global Step: 127590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 
23:44:14,251-Speed 5429.43 samples/sec Loss 4.2853 LearningRate 0.0492 Epoch: 12 Global Step: 127600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:21,866-Speed 5379.41 samples/sec Loss 4.2855 LearningRate 0.0492 Epoch: 12 Global Step: 127610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:29,466-Speed 5390.20 samples/sec Loss 4.2878 LearningRate 0.0492 Epoch: 12 Global Step: 127620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:36,968-Speed 5460.77 samples/sec Loss 4.2695 LearningRate 0.0492 Epoch: 12 Global Step: 127630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:44,531-Speed 5416.42 samples/sec Loss 4.3248 LearningRate 0.0491 Epoch: 12 Global Step: 127640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:52,016-Speed 5472.87 samples/sec Loss 4.2851 LearningRate 0.0491 Epoch: 12 Global Step: 127650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:44:59,461-Speed 5502.36 samples/sec Loss 4.2484 LearningRate 0.0491 Epoch: 12 Global Step: 127660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:06,971-Speed 5454.92 samples/sec Loss 4.3263 LearningRate 0.0491 Epoch: 12 Global Step: 127670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:14,431-Speed 5491.38 samples/sec Loss 4.3045 LearningRate 0.0491 Epoch: 12 Global Step: 127680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:45:21,848-Speed 5523.43 samples/sec Loss 4.2741 LearningRate 0.0491 Epoch: 12 Global Step: 127690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:29,364-Speed 5450.09 samples/sec Loss 4.3288 LearningRate 0.0491 Epoch: 12 Global Step: 127700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:36,856-Speed 5468.10 samples/sec Loss 4.2985 LearningRate 0.0491 Epoch: 12 Global Step: 127710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:44,378-Speed 5446.38 
samples/sec Loss 4.2683 LearningRate 0.0490 Epoch: 12 Global Step: 127720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:52,074-Speed 5323.12 samples/sec Loss 4.2803 LearningRate 0.0490 Epoch: 12 Global Step: 127730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:45:59,655-Speed 5403.34 samples/sec Loss 4.3113 LearningRate 0.0490 Epoch: 12 Global Step: 127740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:07,093-Speed 5507.11 samples/sec Loss 4.2620 LearningRate 0.0490 Epoch: 12 Global Step: 127750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:14,692-Speed 5391.56 samples/sec Loss 4.2688 LearningRate 0.0490 Epoch: 12 Global Step: 127760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:22,254-Speed 5417.48 samples/sec Loss 4.2650 LearningRate 0.0490 Epoch: 12 Global Step: 127770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:29,734-Speed 5476.14 samples/sec Loss 4.3106 LearningRate 0.0490 Epoch: 12 Global Step: 127780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:37,220-Speed 5472.26 samples/sec Loss 4.3256 LearningRate 0.0490 Epoch: 12 Global Step: 127790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:44,720-Speed 5462.24 samples/sec Loss 4.3105 LearningRate 0.0489 Epoch: 12 Global Step: 127800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:52,181-Speed 5490.48 samples/sec Loss 4.2925 LearningRate 0.0489 Epoch: 12 Global Step: 127810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:46:59,750-Speed 5411.79 samples/sec Loss 4.2348 LearningRate 0.0489 Epoch: 12 Global Step: 127820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:07,224-Speed 5481.23 samples/sec Loss 4.2752 LearningRate 0.0489 Epoch: 12 Global Step: 127830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:14,782-Speed 5420.71 samples/sec Loss 4.3289 
LearningRate 0.0489 Epoch: 12 Global Step: 127840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:47:22,280-Speed 5463.62 samples/sec Loss 4.2421 LearningRate 0.0489 Epoch: 12 Global Step: 127850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:29,799-Speed 5448.03 samples/sec Loss 4.2717 LearningRate 0.0489 Epoch: 12 Global Step: 127860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:37,213-Speed 5525.09 samples/sec Loss 4.2365 LearningRate 0.0489 Epoch: 12 Global Step: 127870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:44,680-Speed 5486.01 samples/sec Loss 4.3013 LearningRate 0.0489 Epoch: 12 Global Step: 127880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:52,109-Speed 5514.81 samples/sec Loss 4.2988 LearningRate 0.0488 Epoch: 12 Global Step: 127890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:47:59,544-Speed 5509.81 samples/sec Loss 4.3191 LearningRate 0.0488 Epoch: 12 Global Step: 127900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:48:07,066-Speed 5445.46 samples/sec Loss 4.2838 LearningRate 0.0488 Epoch: 12 Global Step: 127910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:48:14,565-Speed 5463.52 samples/sec Loss 4.2588 LearningRate 0.0488 Epoch: 12 Global Step: 127920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:48:22,032-Speed 5485.97 samples/sec Loss 4.2544 LearningRate 0.0488 Epoch: 12 Global Step: 127930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:48:29,568-Speed 5436.46 samples/sec Loss 4.2666 LearningRate 0.0488 Epoch: 12 Global Step: 127940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:48:37,148-Speed 5404.09 samples/sec Loss 4.2737 LearningRate 0.0488 Epoch: 12 Global Step: 127950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:48:44,687-Speed 5434.07 samples/sec Loss 4.2527 LearningRate 0.0488 Epoch: 12 
Global Step: 127960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:48:52,186-Speed 5462.98 samples/sec Loss 4.2537 LearningRate 0.0487 Epoch: 12 Global Step: 127970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:48:59,687-Speed 5461.17 samples/sec Loss 4.2860 LearningRate 0.0487 Epoch: 12 Global Step: 127980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:49:07,143-Speed 5494.52 samples/sec Loss 4.2400 LearningRate 0.0487 Epoch: 12 Global Step: 127990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:49:14,663-Speed 5447.21 samples/sec Loss 4.2407 LearningRate 0.0487 Epoch: 12 Global Step: 128000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:49:58,279-[lfw][128000]XNorm: 21.962453 Training: 2022-01-08 23:49:58,280-[lfw][128000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-01-08 23:49:58,280-[lfw][128000]Accuracy-Highest: 0.99817 Training: 2022-01-08 23:50:49,158-[cfp_fp][128000]XNorm: 20.414131 Training: 2022-01-08 23:50:49,159-[cfp_fp][128000]Accuracy-Flip: 0.99157+-0.00480 Training: 2022-01-08 23:50:49,159-[cfp_fp][128000]Accuracy-Highest: 0.99157 Training: 2022-01-08 23:51:33,032-[agedb_30][128000]XNorm: 21.981309 Training: 2022-01-08 23:51:33,032-[agedb_30][128000]Accuracy-Flip: 0.97767+-0.00824 Training: 2022-01-08 23:51:33,033-[agedb_30][128000]Accuracy-Highest: 0.98000 Training: 2022-01-08 23:51:40,627-Speed 280.62 samples/sec Loss 4.2651 LearningRate 0.0487 Epoch: 12 Global Step: 128010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:51:48,142-Speed 5451.51 samples/sec Loss 4.2561 LearningRate 0.0487 Epoch: 12 Global Step: 128020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:51:55,719-Speed 5407.18 samples/sec Loss 4.2506 LearningRate 0.0487 Epoch: 12 Global Step: 128030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:52:03,233-Speed 5451.58 samples/sec Loss 4.2804 LearningRate 0.0487 Epoch: 12 Global Step: 
128040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:52:10,698-Speed 5487.50 samples/sec Loss 4.2709 LearningRate 0.0486 Epoch: 12 Global Step: 128050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:52:18,208-Speed 5455.36 samples/sec Loss 4.3037 LearningRate 0.0486 Epoch: 12 Global Step: 128060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:52:25,650-Speed 5504.32 samples/sec Loss 4.2571 LearningRate 0.0486 Epoch: 12 Global Step: 128070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:52:33,175-Speed 5443.47 samples/sec Loss 4.2536 LearningRate 0.0486 Epoch: 12 Global Step: 128080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:52:40,696-Speed 5447.13 samples/sec Loss 4.2698 LearningRate 0.0486 Epoch: 12 Global Step: 128090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:52:48,160-Speed 5488.31 samples/sec Loss 4.2636 LearningRate 0.0486 Epoch: 12 Global Step: 128100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:52:55,635-Speed 5480.96 samples/sec Loss 4.2877 LearningRate 0.0486 Epoch: 12 Global Step: 128110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:53:03,139-Speed 5458.63 samples/sec Loss 4.2993 LearningRate 0.0486 Epoch: 12 Global Step: 128120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:53:10,717-Speed 5405.90 samples/sec Loss 4.2572 LearningRate 0.0485 Epoch: 12 Global Step: 128130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:53:18,250-Speed 5438.55 samples/sec Loss 4.2319 LearningRate 0.0485 Epoch: 12 Global Step: 128140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:53:25,822-Speed 5410.05 samples/sec Loss 4.2817 LearningRate 0.0485 Epoch: 12 Global Step: 128150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:53:33,277-Speed 5494.80 samples/sec Loss 4.2331 LearningRate 0.0485 Epoch: 12 Global Step: 128160 Fp16 Grad Scale: 32768 
Required: 18 hours Training: 2022-01-08 23:53:40,740-Speed 5488.74 samples/sec Loss 4.2900 LearningRate 0.0485 Epoch: 12 Global Step: 128170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:53:48,255-Speed 5451.34 samples/sec Loss 4.2538 LearningRate 0.0485 Epoch: 12 Global Step: 128180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:53:55,724-Speed 5488.38 samples/sec Loss 4.2569 LearningRate 0.0485 Epoch: 12 Global Step: 128190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:03,236-Speed 5452.68 samples/sec Loss 4.2222 LearningRate 0.0485 Epoch: 12 Global Step: 128200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:10,731-Speed 5466.09 samples/sec Loss 4.2890 LearningRate 0.0484 Epoch: 12 Global Step: 128210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:18,352-Speed 5375.26 samples/sec Loss 4.2708 LearningRate 0.0484 Epoch: 12 Global Step: 128220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:26,026-Speed 5338.70 samples/sec Loss 4.2678 LearningRate 0.0484 Epoch: 12 Global Step: 128230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:33,562-Speed 5435.74 samples/sec Loss 4.2905 LearningRate 0.0484 Epoch: 12 Global Step: 128240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:41,118-Speed 5421.36 samples/sec Loss 4.2312 LearningRate 0.0484 Epoch: 12 Global Step: 128250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:54:48,700-Speed 5402.92 samples/sec Loss 4.2511 LearningRate 0.0484 Epoch: 12 Global Step: 128260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:54:56,409-Speed 5314.15 samples/sec Loss 4.2374 LearningRate 0.0484 Epoch: 12 Global Step: 128270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:04,001-Speed 5395.78 samples/sec Loss 4.1969 LearningRate 0.0484 Epoch: 12 Global Step: 128280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-01-08 23:55:11,535-Speed 5437.42 samples/sec Loss 4.2669 LearningRate 0.0483 Epoch: 12 Global Step: 128290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:19,182-Speed 5357.06 samples/sec Loss 4.1976 LearningRate 0.0483 Epoch: 12 Global Step: 128300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:26,689-Speed 5457.61 samples/sec Loss 4.2585 LearningRate 0.0483 Epoch: 12 Global Step: 128310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:34,313-Speed 5372.69 samples/sec Loss 4.2388 LearningRate 0.0483 Epoch: 12 Global Step: 128320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:41,824-Speed 5453.50 samples/sec Loss 4.2628 LearningRate 0.0483 Epoch: 12 Global Step: 128330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:49,474-Speed 5355.47 samples/sec Loss 4.2202 LearningRate 0.0483 Epoch: 12 Global Step: 128340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:55:56,987-Speed 5452.38 samples/sec Loss 4.2335 LearningRate 0.0483 Epoch: 12 Global Step: 128350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-08 23:56:04,452-Speed 5487.84 samples/sec Loss 4.1972 LearningRate 0.0483 Epoch: 12 Global Step: 128360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:12,040-Speed 5398.62 samples/sec Loss 4.2345 LearningRate 0.0483 Epoch: 12 Global Step: 128370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:19,597-Speed 5420.81 samples/sec Loss 4.2981 LearningRate 0.0482 Epoch: 12 Global Step: 128380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:27,160-Speed 5417.22 samples/sec Loss 4.2439 LearningRate 0.0482 Epoch: 12 Global Step: 128390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:34,697-Speed 5434.99 samples/sec Loss 4.2117 LearningRate 0.0482 Epoch: 12 Global Step: 128400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:42,232-Speed 
5436.83 samples/sec Loss 4.2450 LearningRate 0.0482 Epoch: 12 Global Step: 128410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:49,830-Speed 5391.28 samples/sec Loss 4.2750 LearningRate 0.0482 Epoch: 12 Global Step: 128420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:56:57,413-Speed 5402.91 samples/sec Loss 4.2569 LearningRate 0.0482 Epoch: 12 Global Step: 128430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:57:05,028-Speed 5379.66 samples/sec Loss 4.2170 LearningRate 0.0482 Epoch: 12 Global Step: 128440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:57:12,538-Speed 5454.44 samples/sec Loss 4.2486 LearningRate 0.0482 Epoch: 12 Global Step: 128450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:57:20,067-Speed 5440.82 samples/sec Loss 4.1789 LearningRate 0.0481 Epoch: 12 Global Step: 128460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:57:27,573-Speed 5458.29 samples/sec Loss 4.2594 LearningRate 0.0481 Epoch: 12 Global Step: 128470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:57:35,051-Speed 5478.13 samples/sec Loss 4.2071 LearningRate 0.0481 Epoch: 12 Global Step: 128480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:57:42,519-Speed 5485.24 samples/sec Loss 4.2370 LearningRate 0.0481 Epoch: 12 Global Step: 128490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:57:50,063-Speed 5430.18 samples/sec Loss 4.2570 LearningRate 0.0481 Epoch: 12 Global Step: 128500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:57:57,602-Speed 5434.01 samples/sec Loss 4.2566 LearningRate 0.0481 Epoch: 12 Global Step: 128510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:05,094-Speed 5468.54 samples/sec Loss 4.2826 LearningRate 0.0481 Epoch: 12 Global Step: 128520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:12,582-Speed 5470.28 samples/sec Loss 
4.2368 LearningRate 0.0481 Epoch: 12 Global Step: 128530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:20,033-Speed 5497.69 samples/sec Loss 4.2713 LearningRate 0.0480 Epoch: 12 Global Step: 128540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:27,598-Speed 5415.33 samples/sec Loss 4.3260 LearningRate 0.0480 Epoch: 12 Global Step: 128550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:35,135-Speed 5435.78 samples/sec Loss 4.2767 LearningRate 0.0480 Epoch: 12 Global Step: 128560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:42,675-Speed 5432.60 samples/sec Loss 4.2658 LearningRate 0.0480 Epoch: 12 Global Step: 128570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:50,186-Speed 5454.06 samples/sec Loss 4.2594 LearningRate 0.0480 Epoch: 12 Global Step: 128580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:58:57,714-Speed 5441.87 samples/sec Loss 4.2132 LearningRate 0.0480 Epoch: 12 Global Step: 128590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:59:05,195-Speed 5476.42 samples/sec Loss 4.2563 LearningRate 0.0480 Epoch: 12 Global Step: 128600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:59:12,678-Speed 5474.31 samples/sec Loss 4.3009 LearningRate 0.0480 Epoch: 12 Global Step: 128610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:59:20,150-Speed 5482.01 samples/sec Loss 4.2381 LearningRate 0.0479 Epoch: 12 Global Step: 128620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-08 23:59:27,675-Speed 5443.96 samples/sec Loss 4.1895 LearningRate 0.0479 Epoch: 12 Global Step: 128630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:59:35,203-Speed 5442.35 samples/sec Loss 4.2850 LearningRate 0.0479 Epoch: 12 Global Step: 128640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:59:42,747-Speed 5430.12 samples/sec Loss 4.2332 LearningRate 0.0479 
Epoch: 12 Global Step: 128650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:59:50,265-Speed 5448.75 samples/sec Loss 4.2528 LearningRate 0.0479 Epoch: 12 Global Step: 128660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-08 23:59:57,696-Speed 5512.70 samples/sec Loss 4.1849 LearningRate 0.0479 Epoch: 12 Global Step: 128670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:05,167-Speed 5483.97 samples/sec Loss 4.2528 LearningRate 0.0479 Epoch: 12 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:12,656-Speed 5469.44 samples/sec Loss 4.2192 LearningRate 0.0479 Epoch: 12 Global Step: 128690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:20,188-Speed 5439.00 samples/sec Loss 4.2159 LearningRate 0.0478 Epoch: 12 Global Step: 128700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:27,631-Speed 5504.07 samples/sec Loss 4.2423 LearningRate 0.0478 Epoch: 12 Global Step: 128710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:35,136-Speed 5458.49 samples/sec Loss 4.2241 LearningRate 0.0478 Epoch: 12 Global Step: 128720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:42,653-Speed 5449.80 samples/sec Loss 4.1821 LearningRate 0.0478 Epoch: 12 Global Step: 128730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:00:50,108-Speed 5495.09 samples/sec Loss 4.1909 LearningRate 0.0478 Epoch: 12 Global Step: 128740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:00:57,579-Speed 5483.47 samples/sec Loss 4.2361 LearningRate 0.0478 Epoch: 12 Global Step: 128750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:05,053-Speed 5481.40 samples/sec Loss 4.2613 LearningRate 0.0478 Epoch: 12 Global Step: 128760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:12,504-Speed 5497.90 samples/sec Loss 4.2227 LearningRate 0.0478 Epoch: 12 Global Step: 128770 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:20,002-Speed 5463.14 samples/sec Loss 4.2499 LearningRate 0.0478 Epoch: 12 Global Step: 128780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:27,737-Speed 5296.37 samples/sec Loss 4.2249 LearningRate 0.0477 Epoch: 12 Global Step: 128790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:35,281-Speed 5430.06 samples/sec Loss 4.2425 LearningRate 0.0477 Epoch: 12 Global Step: 128800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:42,741-Speed 5491.81 samples/sec Loss 4.2139 LearningRate 0.0477 Epoch: 12 Global Step: 128810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:50,355-Speed 5379.79 samples/sec Loss 4.1920 LearningRate 0.0477 Epoch: 12 Global Step: 128820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:01:57,797-Speed 5504.95 samples/sec Loss 4.2301 LearningRate 0.0477 Epoch: 12 Global Step: 128830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:02:05,300-Speed 5460.00 samples/sec Loss 4.2091 LearningRate 0.0477 Epoch: 12 Global Step: 128840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:02:12,756-Speed 5494.70 samples/sec Loss 4.2780 LearningRate 0.0477 Epoch: 12 Global Step: 128850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:02:20,240-Speed 5473.41 samples/sec Loss 4.2470 LearningRate 0.0477 Epoch: 12 Global Step: 128860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:02:27,691-Speed 5498.09 samples/sec Loss 4.1671 LearningRate 0.0476 Epoch: 12 Global Step: 128870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:02:35,226-Speed 5436.70 samples/sec Loss 4.1810 LearningRate 0.0476 Epoch: 12 Global Step: 128880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:02:42,923-Speed 5322.29 samples/sec Loss 4.2176 LearningRate 0.0476 Epoch: 12 Global Step: 128890 Fp16 Grad Scale: 32768 
Required: 18 hours Training: 2022-01-09 00:02:50,566-Speed 5359.93 samples/sec Loss 4.2397 LearningRate 0.0476 Epoch: 12 Global Step: 128900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:02:58,137-Speed 5410.99 samples/sec Loss 4.2306 LearningRate 0.0476 Epoch: 12 Global Step: 128910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:03:05,809-Speed 5339.39 samples/sec Loss 4.1908 LearningRate 0.0476 Epoch: 12 Global Step: 128920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:03:13,315-Speed 5458.05 samples/sec Loss 4.2175 LearningRate 0.0476 Epoch: 12 Global Step: 128930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:03:20,860-Speed 5429.14 samples/sec Loss 4.2786 LearningRate 0.0476 Epoch: 12 Global Step: 128940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:03:28,352-Speed 5467.77 samples/sec Loss 4.2467 LearningRate 0.0475 Epoch: 12 Global Step: 128950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:03:35,831-Speed 5477.38 samples/sec Loss 4.1829 LearningRate 0.0475 Epoch: 12 Global Step: 128960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:03:43,357-Speed 5443.75 samples/sec Loss 4.1519 LearningRate 0.0475 Epoch: 12 Global Step: 128970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:03:50,795-Speed 5507.71 samples/sec Loss 4.1957 LearningRate 0.0475 Epoch: 12 Global Step: 128980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:03:58,320-Speed 5443.51 samples/sec Loss 4.2380 LearningRate 0.0475 Epoch: 12 Global Step: 128990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:05,832-Speed 5452.85 samples/sec Loss 4.1797 LearningRate 0.0475 Epoch: 12 Global Step: 129000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:13,330-Speed 5464.06 samples/sec Loss 4.2135 LearningRate 0.0475 Epoch: 12 Global Step: 129010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-01-09 00:04:20,793-Speed 5489.02 samples/sec Loss 4.1972 LearningRate 0.0475 Epoch: 12 Global Step: 129020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:28,341-Speed 5427.01 samples/sec Loss 4.1944 LearningRate 0.0474 Epoch: 12 Global Step: 129030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:35,872-Speed 5439.30 samples/sec Loss 4.1952 LearningRate 0.0474 Epoch: 12 Global Step: 129040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:43,364-Speed 5468.36 samples/sec Loss 4.2280 LearningRate 0.0474 Epoch: 12 Global Step: 129050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:50,896-Speed 5439.21 samples/sec Loss 4.2227 LearningRate 0.0474 Epoch: 12 Global Step: 129060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:04:58,407-Speed 5454.08 samples/sec Loss 4.2274 LearningRate 0.0474 Epoch: 12 Global Step: 129070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:05:06,002-Speed 5392.96 samples/sec Loss 4.2439 LearningRate 0.0474 Epoch: 12 Global Step: 129080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:05:13,488-Speed 5472.54 samples/sec Loss 4.1579 LearningRate 0.0474 Epoch: 12 Global Step: 129090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:05:21,164-Speed 5336.63 samples/sec Loss 4.1994 LearningRate 0.0474 Epoch: 12 Global Step: 129100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:05:28,664-Speed 5462.40 samples/sec Loss 4.1819 LearningRate 0.0474 Epoch: 12 Global Step: 129110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:05:36,138-Speed 5481.18 samples/sec Loss 4.2196 LearningRate 0.0473 Epoch: 12 Global Step: 129120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:05:43,621-Speed 5474.52 samples/sec Loss 4.2139 LearningRate 0.0473 Epoch: 12 Global Step: 129130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:05:51,053-Speed 
5512.11 samples/sec Loss 4.2275 LearningRate 0.0473 Epoch: 12 Global Step: 129140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:05:58,539-Speed 5471.62 samples/sec Loss 4.2418 LearningRate 0.0473 Epoch: 12 Global Step: 129150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:06:06,053-Speed 5452.30 samples/sec Loss 4.2211 LearningRate 0.0473 Epoch: 12 Global Step: 129160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:06:13,512-Speed 5491.53 samples/sec Loss 4.2080 LearningRate 0.0473 Epoch: 12 Global Step: 129170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:06:21,035-Speed 5445.32 samples/sec Loss 4.2956 LearningRate 0.0473 Epoch: 12 Global Step: 129180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:06:28,525-Speed 5469.60 samples/sec Loss 4.2118 LearningRate 0.0473 Epoch: 12 Global Step: 129190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:06:36,082-Speed 5420.95 samples/sec Loss 4.2156 LearningRate 0.0472 Epoch: 12 Global Step: 129200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:06:43,498-Speed 5524.00 samples/sec Loss 4.1902 LearningRate 0.0472 Epoch: 12 Global Step: 129210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:06:50,943-Speed 5502.09 samples/sec Loss 4.1861 LearningRate 0.0472 Epoch: 12 Global Step: 129220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:06:58,379-Speed 5509.02 samples/sec Loss 4.2300 LearningRate 0.0472 Epoch: 12 Global Step: 129230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:06,047-Speed 5342.67 samples/sec Loss 4.1758 LearningRate 0.0472 Epoch: 12 Global Step: 129240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:13,537-Speed 5468.86 samples/sec Loss 4.1852 LearningRate 0.0472 Epoch: 12 Global Step: 129250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:20,948-Speed 5528.31 samples/sec Loss 4.2074 
LearningRate 0.0472 Epoch: 12 Global Step: 129260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:28,448-Speed 5461.64 samples/sec Loss 4.2131 LearningRate 0.0472 Epoch: 12 Global Step: 129270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:35,970-Speed 5446.00 samples/sec Loss 4.1513 LearningRate 0.0471 Epoch: 12 Global Step: 129280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:43,556-Speed 5400.27 samples/sec Loss 4.1953 LearningRate 0.0471 Epoch: 12 Global Step: 129290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:51,126-Speed 5412.24 samples/sec Loss 4.1797 LearningRate 0.0471 Epoch: 12 Global Step: 129300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:07:58,641-Speed 5451.29 samples/sec Loss 4.2183 LearningRate 0.0471 Epoch: 12 Global Step: 129310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:06,120-Speed 5476.95 samples/sec Loss 4.2457 LearningRate 0.0471 Epoch: 12 Global Step: 129320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:13,651-Speed 5439.45 samples/sec Loss 4.1560 LearningRate 0.0471 Epoch: 12 Global Step: 129330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:21,100-Speed 5499.86 samples/sec Loss 4.1993 LearningRate 0.0471 Epoch: 12 Global Step: 129340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:28,600-Speed 5461.41 samples/sec Loss 4.1741 LearningRate 0.0471 Epoch: 12 Global Step: 129350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:36,080-Speed 5476.74 samples/sec Loss 4.1648 LearningRate 0.0470 Epoch: 12 Global Step: 129360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:43,536-Speed 5494.52 samples/sec Loss 4.2128 LearningRate 0.0470 Epoch: 12 Global Step: 129370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:51,123-Speed 5399.18 samples/sec Loss 4.1618 LearningRate 0.0470 Epoch: 12 
Global Step: 129380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:08:58,743-Speed 5376.72 samples/sec Loss 4.2199 LearningRate 0.0470 Epoch: 12 Global Step: 129390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:06,291-Speed 5426.82 samples/sec Loss 4.1730 LearningRate 0.0470 Epoch: 12 Global Step: 129400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:13,790-Speed 5462.49 samples/sec Loss 4.2190 LearningRate 0.0470 Epoch: 12 Global Step: 129410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:21,312-Speed 5445.93 samples/sec Loss 4.1833 LearningRate 0.0470 Epoch: 12 Global Step: 129420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:28,927-Speed 5380.36 samples/sec Loss 4.2524 LearningRate 0.0470 Epoch: 12 Global Step: 129430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:36,455-Speed 5441.27 samples/sec Loss 4.2249 LearningRate 0.0470 Epoch: 12 Global Step: 129440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:44,016-Speed 5417.65 samples/sec Loss 4.2031 LearningRate 0.0469 Epoch: 12 Global Step: 129450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:51,689-Speed 5339.16 samples/sec Loss 4.2148 LearningRate 0.0469 Epoch: 12 Global Step: 129460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:09:59,129-Speed 5506.30 samples/sec Loss 4.1931 LearningRate 0.0469 Epoch: 12 Global Step: 129470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:06,599-Speed 5483.52 samples/sec Loss 4.1698 LearningRate 0.0469 Epoch: 12 Global Step: 129480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:14,094-Speed 5466.15 samples/sec Loss 4.2150 LearningRate 0.0469 Epoch: 12 Global Step: 129490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:21,549-Speed 5495.02 samples/sec Loss 4.1608 LearningRate 0.0469 Epoch: 12 Global Step: 129500 Fp16 Grad 
Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:29,000-Speed 5497.60 samples/sec Loss 4.2178 LearningRate 0.0469 Epoch: 12 Global Step: 129510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:36,481-Speed 5476.24 samples/sec Loss 4.1792 LearningRate 0.0469 Epoch: 12 Global Step: 129520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:44,102-Speed 5374.93 samples/sec Loss 4.2170 LearningRate 0.0468 Epoch: 12 Global Step: 129530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:51,633-Speed 5439.70 samples/sec Loss 4.1657 LearningRate 0.0468 Epoch: 12 Global Step: 129540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:10:59,162-Speed 5441.07 samples/sec Loss 4.2486 LearningRate 0.0468 Epoch: 12 Global Step: 129550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:11:06,733-Speed 5410.72 samples/sec Loss 4.1582 LearningRate 0.0468 Epoch: 12 Global Step: 129560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:11:14,143-Speed 5528.79 samples/sec Loss 4.1867 LearningRate 0.0468 Epoch: 12 Global Step: 129570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:11:21,600-Speed 5492.66 samples/sec Loss 4.2168 LearningRate 0.0468 Epoch: 12 Global Step: 129580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:11:29,089-Speed 5471.13 samples/sec Loss 4.1983 LearningRate 0.0468 Epoch: 12 Global Step: 129590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:11:36,570-Speed 5475.47 samples/sec Loss 4.2064 LearningRate 0.0468 Epoch: 12 Global Step: 129600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:11:44,004-Speed 5510.39 samples/sec Loss 4.1849 LearningRate 0.0467 Epoch: 12 Global Step: 129610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:11:51,646-Speed 5360.90 samples/sec Loss 4.1962 LearningRate 0.0467 Epoch: 12 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-01-09 00:11:59,064-Speed 5522.28 samples/sec Loss 4.1969 LearningRate 0.0467 Epoch: 12 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:06,480-Speed 5524.11 samples/sec Loss 4.2159 LearningRate 0.0467 Epoch: 12 Global Step: 129640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:13,942-Speed 5489.73 samples/sec Loss 4.2573 LearningRate 0.0467 Epoch: 12 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:21,404-Speed 5489.98 samples/sec Loss 4.1852 LearningRate 0.0467 Epoch: 12 Global Step: 129660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:28,834-Speed 5513.23 samples/sec Loss 4.1708 LearningRate 0.0467 Epoch: 12 Global Step: 129670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:36,255-Speed 5520.47 samples/sec Loss 4.1902 LearningRate 0.0467 Epoch: 12 Global Step: 129680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:43,676-Speed 5520.51 samples/sec Loss 4.1331 LearningRate 0.0467 Epoch: 12 Global Step: 129690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:51,185-Speed 5455.13 samples/sec Loss 4.1717 LearningRate 0.0466 Epoch: 12 Global Step: 129700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:12:58,661-Speed 5479.50 samples/sec Loss 4.1524 LearningRate 0.0466 Epoch: 12 Global Step: 129710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:13:06,191-Speed 5440.80 samples/sec Loss 4.1594 LearningRate 0.0466 Epoch: 12 Global Step: 129720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:13:13,666-Speed 5480.51 samples/sec Loss 4.1857 LearningRate 0.0466 Epoch: 12 Global Step: 129730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:13:21,087-Speed 5520.18 samples/sec Loss 4.1780 LearningRate 0.0466 Epoch: 12 Global Step: 129740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 
00:13:28,583-Speed 5464.83 samples/sec Loss 4.1921 LearningRate 0.0466 Epoch: 12 Global Step: 129750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:13:36,129-Speed 5429.02 samples/sec Loss 4.1780 LearningRate 0.0466 Epoch: 12 Global Step: 129760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:13:43,642-Speed 5452.36 samples/sec Loss 4.1677 LearningRate 0.0466 Epoch: 12 Global Step: 129770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:13:51,284-Speed 5360.54 samples/sec Loss 4.2015 LearningRate 0.0465 Epoch: 12 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:13:58,737-Speed 5496.36 samples/sec Loss 4.1457 LearningRate 0.0465 Epoch: 12 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:14:06,247-Speed 5454.94 samples/sec Loss 4.1865 LearningRate 0.0465 Epoch: 12 Global Step: 129800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:14:13,851-Speed 5387.39 samples/sec Loss 4.1994 LearningRate 0.0465 Epoch: 12 Global Step: 129810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:14:21,405-Speed 5423.44 samples/sec Loss 4.1851 LearningRate 0.0465 Epoch: 12 Global Step: 129820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:14:28,880-Speed 5479.64 samples/sec Loss 4.1367 LearningRate 0.0465 Epoch: 12 Global Step: 129830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:14:36,451-Speed 5411.19 samples/sec Loss 4.1884 LearningRate 0.0465 Epoch: 12 Global Step: 129840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:14:43,992-Speed 5432.67 samples/sec Loss 4.1884 LearningRate 0.0465 Epoch: 12 Global Step: 129850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:14:51,488-Speed 5464.50 samples/sec Loss 4.1315 LearningRate 0.0464 Epoch: 12 Global Step: 129860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:14:58,979-Speed 5468.35 
samples/sec Loss 4.1678 LearningRate 0.0464 Epoch: 12 Global Step: 129870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:06,553-Speed 5409.04 samples/sec Loss 4.1425 LearningRate 0.0464 Epoch: 12 Global Step: 129880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:14,065-Speed 5453.48 samples/sec Loss 4.1518 LearningRate 0.0464 Epoch: 12 Global Step: 129890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:21,528-Speed 5488.99 samples/sec Loss 4.2375 LearningRate 0.0464 Epoch: 12 Global Step: 129900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:28,977-Speed 5499.37 samples/sec Loss 4.1748 LearningRate 0.0464 Epoch: 12 Global Step: 129910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:36,499-Speed 5446.57 samples/sec Loss 4.1474 LearningRate 0.0464 Epoch: 12 Global Step: 129920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:43,984-Speed 5472.80 samples/sec Loss 4.1904 LearningRate 0.0464 Epoch: 12 Global Step: 129930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:15:51,471-Speed 5471.39 samples/sec Loss 4.1680 LearningRate 0.0464 Epoch: 12 Global Step: 129940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:15:59,066-Speed 5393.46 samples/sec Loss 4.1841 LearningRate 0.0463 Epoch: 12 Global Step: 129950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:16:06,613-Speed 5428.63 samples/sec Loss 4.1937 LearningRate 0.0463 Epoch: 12 Global Step: 129960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:16:14,075-Speed 5489.81 samples/sec Loss 4.1614 LearningRate 0.0463 Epoch: 12 Global Step: 129970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:16:21,490-Speed 5524.64 samples/sec Loss 4.1907 LearningRate 0.0463 Epoch: 12 Global Step: 129980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:16:28,953-Speed 5489.21 samples/sec Loss 4.1533 
LearningRate 0.0463 Epoch: 12 Global Step: 129990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:16:36,486-Speed 5438.00 samples/sec Loss 4.1715 LearningRate 0.0463 Epoch: 12 Global Step: 130000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:17:20,781-[lfw][130000]XNorm: 23.324596 Training: 2022-01-09 00:17:20,782-[lfw][130000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-01-09 00:17:20,782-[lfw][130000]Accuracy-Highest: 0.99817 Training: 2022-01-09 00:18:12,282-[cfp_fp][130000]XNorm: 21.926012 Training: 2022-01-09 00:18:12,283-[cfp_fp][130000]Accuracy-Flip: 0.98914+-0.00420 Training: 2022-01-09 00:18:12,283-[cfp_fp][130000]Accuracy-Highest: 0.99157 Training: 2022-01-09 00:18:56,575-[agedb_30][130000]XNorm: 23.240002 Training: 2022-01-09 00:18:56,576-[agedb_30][130000]Accuracy-Flip: 0.97833+-0.00703 Training: 2022-01-09 00:18:56,577-[agedb_30][130000]Accuracy-Highest: 0.98000 Training: 2022-01-09 00:19:04,167-Speed 277.36 samples/sec Loss 4.1150 LearningRate 0.0463 Epoch: 12 Global Step: 130010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:19:11,637-Speed 5484.33 samples/sec Loss 4.1609 LearningRate 0.0463 Epoch: 12 Global Step: 130020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:19:19,153-Speed 5450.32 samples/sec Loss 4.1701 LearningRate 0.0462 Epoch: 12 Global Step: 130030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:19:26,659-Speed 5457.50 samples/sec Loss 4.1989 LearningRate 0.0462 Epoch: 12 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:19:34,245-Speed 5399.92 samples/sec Loss 4.1574 LearningRate 0.0462 Epoch: 12 Global Step: 130050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:19:41,849-Speed 5387.83 samples/sec Loss 4.2265 LearningRate 0.0462 Epoch: 12 Global Step: 130060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:19:49,446-Speed 5391.72 samples/sec Loss 4.1613 LearningRate 
0.0462 Epoch: 12 Global Step: 130070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:19:56,905-Speed 5492.19 samples/sec Loss 4.1974 LearningRate 0.0462 Epoch: 12 Global Step: 130080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:04,379-Speed 5480.64 samples/sec Loss 4.1668 LearningRate 0.0462 Epoch: 12 Global Step: 130090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:11,934-Speed 5423.14 samples/sec Loss 4.1612 LearningRate 0.0462 Epoch: 12 Global Step: 130100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:19,432-Speed 5463.74 samples/sec Loss 4.1581 LearningRate 0.0461 Epoch: 12 Global Step: 130110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:27,029-Speed 5392.04 samples/sec Loss 4.1751 LearningRate 0.0461 Epoch: 12 Global Step: 130120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:34,584-Speed 5421.66 samples/sec Loss 4.1487 LearningRate 0.0461 Epoch: 12 Global Step: 130130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:42,140-Speed 5422.11 samples/sec Loss 4.1610 LearningRate 0.0461 Epoch: 12 Global Step: 130140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:20:49,632-Speed 5467.80 samples/sec Loss 4.1545 LearningRate 0.0461 Epoch: 12 Global Step: 130150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:20:57,159-Speed 5442.42 samples/sec Loss 4.1779 LearningRate 0.0461 Epoch: 12 Global Step: 130160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:04,678-Speed 5448.24 samples/sec Loss 4.1965 LearningRate 0.0461 Epoch: 12 Global Step: 130170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:12,197-Speed 5448.58 samples/sec Loss 4.1820 LearningRate 0.0461 Epoch: 12 Global Step: 130180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:19,715-Speed 5448.75 samples/sec Loss 4.1465 LearningRate 0.0461 Epoch: 12 Global Step: 
130190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:27,180-Speed 5488.00 samples/sec Loss 4.2097 LearningRate 0.0460 Epoch: 12 Global Step: 130200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:34,670-Speed 5468.75 samples/sec Loss 4.1553 LearningRate 0.0460 Epoch: 12 Global Step: 130210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:42,081-Speed 5527.85 samples/sec Loss 4.1559 LearningRate 0.0460 Epoch: 12 Global Step: 130220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:49,555-Speed 5481.39 samples/sec Loss 4.1768 LearningRate 0.0460 Epoch: 12 Global Step: 130230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:21:56,962-Speed 5530.08 samples/sec Loss 4.1839 LearningRate 0.0460 Epoch: 12 Global Step: 130240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:22:04,437-Speed 5480.87 samples/sec Loss 4.1247 LearningRate 0.0460 Epoch: 12 Global Step: 130250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:22:11,978-Speed 5432.55 samples/sec Loss 4.1711 LearningRate 0.0460 Epoch: 12 Global Step: 130260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:22:19,448-Speed 5483.40 samples/sec Loss 4.1627 LearningRate 0.0460 Epoch: 12 Global Step: 130270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:22:26,938-Speed 5469.68 samples/sec Loss 4.1745 LearningRate 0.0459 Epoch: 12 Global Step: 130280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:22:34,507-Speed 5411.93 samples/sec Loss 4.1338 LearningRate 0.0459 Epoch: 12 Global Step: 130290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:22:42,034-Speed 5442.73 samples/sec Loss 4.1567 LearningRate 0.0459 Epoch: 12 Global Step: 130300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:22:49,613-Speed 5405.20 samples/sec Loss 4.1257 LearningRate 0.0459 Epoch: 12 Global Step: 130310 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-01-09 00:22:57,201-Speed 5398.61 samples/sec Loss 4.1456 LearningRate 0.0459 Epoch: 12 Global Step: 130320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:23:04,794-Speed 5395.01 samples/sec Loss 4.1461 LearningRate 0.0459 Epoch: 12 Global Step: 130330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:23:12,295-Speed 5461.39 samples/sec Loss 4.1741 LearningRate 0.0459 Epoch: 12 Global Step: 130340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:23:19,791-Speed 5464.81 samples/sec Loss 4.1518 LearningRate 0.0459 Epoch: 12 Global Step: 130350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:23:27,315-Speed 5444.67 samples/sec Loss 4.1308 LearningRate 0.0459 Epoch: 12 Global Step: 130360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:23:34,765-Speed 5499.07 samples/sec Loss 4.1370 LearningRate 0.0458 Epoch: 12 Global Step: 130370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:23:42,274-Speed 5455.49 samples/sec Loss 4.1245 LearningRate 0.0458 Epoch: 12 Global Step: 130380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:23:49,826-Speed 5424.30 samples/sec Loss 4.1715 LearningRate 0.0458 Epoch: 12 Global Step: 130390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:23:57,448-Speed 5374.79 samples/sec Loss 4.1778 LearningRate 0.0458 Epoch: 12 Global Step: 130400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:04,923-Speed 5479.92 samples/sec Loss 4.1944 LearningRate 0.0458 Epoch: 12 Global Step: 130410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:12,475-Speed 5424.58 samples/sec Loss 4.1756 LearningRate 0.0458 Epoch: 12 Global Step: 130420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:20,158-Speed 5331.76 samples/sec Loss 4.1597 LearningRate 0.0458 Epoch: 12 Global Step: 130430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-01-09 00:24:27,706-Speed 5427.78 samples/sec Loss 4.1251 LearningRate 0.0458 Epoch: 12 Global Step: 130440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:35,293-Speed 5398.82 samples/sec Loss 4.1717 LearningRate 0.0457 Epoch: 12 Global Step: 130450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:42,924-Speed 5368.77 samples/sec Loss 4.1235 LearningRate 0.0457 Epoch: 12 Global Step: 130460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:50,481-Speed 5420.98 samples/sec Loss 4.1752 LearningRate 0.0457 Epoch: 12 Global Step: 130470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:24:58,057-Speed 5407.54 samples/sec Loss 4.1505 LearningRate 0.0457 Epoch: 12 Global Step: 130480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:25:05,655-Speed 5390.90 samples/sec Loss 4.1321 LearningRate 0.0457 Epoch: 12 Global Step: 130490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:25:13,254-Speed 5390.82 samples/sec Loss 4.1439 LearningRate 0.0457 Epoch: 12 Global Step: 130500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:25:20,788-Speed 5438.58 samples/sec Loss 4.1205 LearningRate 0.0457 Epoch: 12 Global Step: 130510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:25:28,321-Speed 5438.30 samples/sec Loss 4.1289 LearningRate 0.0457 Epoch: 12 Global Step: 130520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:25:35,886-Speed 5414.64 samples/sec Loss 4.1609 LearningRate 0.0456 Epoch: 12 Global Step: 130530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:25:43,393-Speed 5457.17 samples/sec Loss 4.2003 LearningRate 0.0456 Epoch: 12 Global Step: 130540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:25:50,978-Speed 5400.64 samples/sec Loss 4.1387 LearningRate 0.0456 Epoch: 12 Global Step: 130550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:25:58,466-Speed 
5470.85 samples/sec Loss 4.1759 LearningRate 0.0456 Epoch: 12 Global Step: 130560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:26:06,159-Speed 5324.89 samples/sec Loss 4.1146 LearningRate 0.0456 Epoch: 12 Global Step: 130570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:26:13,723-Speed 5415.81 samples/sec Loss 4.1373 LearningRate 0.0456 Epoch: 12 Global Step: 130580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:26:21,268-Speed 5429.29 samples/sec Loss 4.0956 LearningRate 0.0456 Epoch: 12 Global Step: 130590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:26:28,985-Speed 5309.15 samples/sec Loss 4.1440 LearningRate 0.0456 Epoch: 12 Global Step: 130600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:26:36,605-Speed 5375.28 samples/sec Loss 4.1489 LearningRate 0.0456 Epoch: 12 Global Step: 130610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:26:44,122-Speed 5450.05 samples/sec Loss 4.1725 LearningRate 0.0455 Epoch: 12 Global Step: 130620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:26:51,657-Speed 5436.56 samples/sec Loss 4.0923 LearningRate 0.0455 Epoch: 12 Global Step: 130630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:26:59,305-Speed 5356.58 samples/sec Loss 4.1105 LearningRate 0.0455 Epoch: 12 Global Step: 130640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:07,014-Speed 5313.64 samples/sec Loss 4.1248 LearningRate 0.0455 Epoch: 12 Global Step: 130650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:14,546-Speed 5438.78 samples/sec Loss 4.1739 LearningRate 0.0455 Epoch: 12 Global Step: 130660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:22,095-Speed 5426.44 samples/sec Loss 4.1222 LearningRate 0.0455 Epoch: 12 Global Step: 130670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:29,583-Speed 5471.07 samples/sec Loss 4.1581 
LearningRate 0.0455 Epoch: 12 Global Step: 130680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:37,097-Speed 5451.87 samples/sec Loss 4.1414 LearningRate 0.0455 Epoch: 12 Global Step: 130690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:44,603-Speed 5457.75 samples/sec Loss 4.1382 LearningRate 0.0454 Epoch: 12 Global Step: 130700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:27:52,313-Speed 5313.45 samples/sec Loss 4.1736 LearningRate 0.0454 Epoch: 12 Global Step: 130710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:27:59,822-Speed 5455.21 samples/sec Loss 4.1370 LearningRate 0.0454 Epoch: 12 Global Step: 130720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:07,323-Speed 5461.19 samples/sec Loss 4.1827 LearningRate 0.0454 Epoch: 12 Global Step: 130730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:14,859-Speed 5436.66 samples/sec Loss 4.1427 LearningRate 0.0454 Epoch: 12 Global Step: 130740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:22,348-Speed 5470.30 samples/sec Loss 4.1131 LearningRate 0.0454 Epoch: 12 Global Step: 130750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:30,036-Speed 5328.43 samples/sec Loss 4.1198 LearningRate 0.0454 Epoch: 12 Global Step: 130760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:37,672-Speed 5364.90 samples/sec Loss 4.1142 LearningRate 0.0454 Epoch: 12 Global Step: 130770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:45,249-Speed 5406.08 samples/sec Loss 4.1313 LearningRate 0.0454 Epoch: 12 Global Step: 130780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:28:52,771-Speed 5446.49 samples/sec Loss 4.1299 LearningRate 0.0453 Epoch: 12 Global Step: 130790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:29:00,306-Speed 5436.95 samples/sec Loss 4.1392 LearningRate 0.0453 Epoch: 12 
Global Step: 130800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:29:07,857-Speed 5424.75 samples/sec Loss 4.1612 LearningRate 0.0453 Epoch: 12 Global Step: 130810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:29:15,422-Speed 5415.29 samples/sec Loss 4.0909 LearningRate 0.0453 Epoch: 12 Global Step: 130820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-09 00:29:22,892-Speed 5484.23 samples/sec Loss 4.0808 LearningRate 0.0453 Epoch: 12 Global Step: 130830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:29:30,535-Speed 5359.93 samples/sec Loss 4.0839 LearningRate 0.0453 Epoch: 12 Global Step: 130840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:29:37,988-Speed 5496.60 samples/sec Loss 4.0747 LearningRate 0.0453 Epoch: 12 Global Step: 130850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:29:45,639-Speed 5353.89 samples/sec Loss 4.0907 LearningRate 0.0453 Epoch: 12 Global Step: 130860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:29:53,332-Speed 5325.51 samples/sec Loss 4.1411 LearningRate 0.0452 Epoch: 12 Global Step: 130870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:00,904-Speed 5410.60 samples/sec Loss 4.1521 LearningRate 0.0452 Epoch: 12 Global Step: 130880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:08,544-Speed 5361.50 samples/sec Loss 4.1259 LearningRate 0.0452 Epoch: 12 Global Step: 130890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:16,068-Speed 5444.80 samples/sec Loss 4.0994 LearningRate 0.0452 Epoch: 12 Global Step: 130900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:23,525-Speed 5493.55 samples/sec Loss 4.1349 LearningRate 0.0452 Epoch: 12 Global Step: 130910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:30,962-Speed 5508.76 samples/sec Loss 4.1077 LearningRate 0.0452 Epoch: 12 Global Step: 130920 Fp16 Grad 
Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:38,511-Speed 5426.49 samples/sec Loss 4.1133 LearningRate 0.0452 Epoch: 12 Global Step: 130930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:46,041-Speed 5440.32 samples/sec Loss 4.1410 LearningRate 0.0452 Epoch: 12 Global Step: 130940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-01-09 00:30:53,544-Speed 5459.39 samples/sec Loss 4.0917 LearningRate 0.0452 Epoch: 12 Global Step: 130950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:31:01,012-Speed 5485.77 samples/sec Loss 4.1063 LearningRate 0.0451 Epoch: 12 Global Step: 130960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:31:08,579-Speed 5413.41 samples/sec Loss 4.1048 LearningRate 0.0451 Epoch: 12 Global Step: 130970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-09 00:31:16,167-Speed 5398.87 samples/sec Loss 4.0951 LearningRate 0.0451 Epoch: 12 Global Step: 130980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:31:23,657-Speed 5469.49 samples/sec Loss 4.1057 LearningRate 0.0451 Epoch: 12 Global Step: 130990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:31:31,272-Speed 5379.67 samples/sec Loss 4.1238 LearningRate 0.0451 Epoch: 12 Global Step: 131000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:31:38,845-Speed 5409.48 samples/sec Loss 4.1033 LearningRate 0.0451 Epoch: 12 Global Step: 131010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:31:46,419-Speed 5408.68 samples/sec Loss 4.1126 LearningRate 0.0451 Epoch: 12 Global Step: 131020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:31:54,122-Speed 5318.05 samples/sec Loss 4.0621 LearningRate 0.0451 Epoch: 12 Global Step: 131030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:01,715-Speed 5395.37 samples/sec Loss 4.1464 LearningRate 0.0450 Epoch: 12 Global Step: 131040 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-01-09 00:32:09,244-Speed 5440.38 samples/sec Loss 4.1061 LearningRate 0.0450 Epoch: 12 Global Step: 131050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:16,718-Speed 5481.35 samples/sec Loss 4.1063 LearningRate 0.0450 Epoch: 12 Global Step: 131060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:24,214-Speed 5464.98 samples/sec Loss 4.1007 LearningRate 0.0450 Epoch: 12 Global Step: 131070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:31,691-Speed 5478.67 samples/sec Loss 4.0737 LearningRate 0.0450 Epoch: 12 Global Step: 131080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:39,171-Speed 5476.96 samples/sec Loss 4.0854 LearningRate 0.0450 Epoch: 12 Global Step: 131090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:46,792-Speed 5374.74 samples/sec Loss 4.1046 LearningRate 0.0450 Epoch: 12 Global Step: 131100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:32:54,305-Speed 5453.37 samples/sec Loss 4.0984 LearningRate 0.0450 Epoch: 12 Global Step: 131110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:33:01,838-Speed 5437.75 samples/sec Loss 4.0607 LearningRate 0.0450 Epoch: 12 Global Step: 131120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:33:09,401-Speed 5416.50 samples/sec Loss 4.1372 LearningRate 0.0449 Epoch: 12 Global Step: 131130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:33:16,882-Speed 5476.13 samples/sec Loss 4.1374 LearningRate 0.0449 Epoch: 12 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:33:24,335-Speed 5496.09 samples/sec Loss 4.0810 LearningRate 0.0449 Epoch: 12 Global Step: 131150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:33:31,918-Speed 5402.13 samples/sec Loss 4.1336 LearningRate 0.0449 Epoch: 12 Global Step: 131160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 
00:33:39,451-Speed 5438.37 samples/sec Loss 4.0752 LearningRate 0.0449 Epoch: 12 Global Step: 131170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:33:46,978-Speed 5442.52 samples/sec Loss 4.1445 LearningRate 0.0449 Epoch: 12 Global Step: 131180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:33:54,488-Speed 5455.08 samples/sec Loss 4.0596 LearningRate 0.0449 Epoch: 12 Global Step: 131190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:34:02,053-Speed 5414.84 samples/sec Loss 4.0922 LearningRate 0.0449 Epoch: 12 Global Step: 131200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:34:09,607-Speed 5423.15 samples/sec Loss 4.1295 LearningRate 0.0448 Epoch: 12 Global Step: 131210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:34:17,201-Speed 5393.90 samples/sec Loss 4.1709 LearningRate 0.0448 Epoch: 12 Global Step: 131220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:34:24,772-Speed 5411.16 samples/sec Loss 4.0999 LearningRate 0.0448 Epoch: 12 Global Step: 131230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:34:32,308-Speed 5436.04 samples/sec Loss 4.0710 LearningRate 0.0448 Epoch: 12 Global Step: 131240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:34:39,866-Speed 5420.15 samples/sec Loss 4.1242 LearningRate 0.0448 Epoch: 12 Global Step: 131250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:34:47,451-Speed 5400.64 samples/sec Loss 4.1029 LearningRate 0.0448 Epoch: 12 Global Step: 131260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:34:54,979-Speed 5442.06 samples/sec Loss 4.0874 LearningRate 0.0448 Epoch: 12 Global Step: 131270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:35:02,494-Speed 5450.97 samples/sec Loss 4.1265 LearningRate 0.0448 Epoch: 12 Global Step: 131280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:10,076-Speed 5402.61 
samples/sec Loss 4.0552 LearningRate 0.0448 Epoch: 12 Global Step: 131290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:17,603-Speed 5442.44 samples/sec Loss 4.0900 LearningRate 0.0447 Epoch: 12 Global Step: 131300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:25,172-Speed 5412.21 samples/sec Loss 4.0987 LearningRate 0.0447 Epoch: 12 Global Step: 131310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:32,706-Speed 5437.67 samples/sec Loss 4.0707 LearningRate 0.0447 Epoch: 12 Global Step: 131320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:40,278-Speed 5410.03 samples/sec Loss 4.0141 LearningRate 0.0447 Epoch: 12 Global Step: 131330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:47,802-Speed 5444.27 samples/sec Loss 4.1368 LearningRate 0.0447 Epoch: 12 Global Step: 131340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:35:55,313-Speed 5454.37 samples/sec Loss 4.1201 LearningRate 0.0447 Epoch: 12 Global Step: 131350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:36:02,937-Speed 5373.28 samples/sec Loss 4.0701 LearningRate 0.0447 Epoch: 12 Global Step: 131360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:36:10,533-Speed 5392.93 samples/sec Loss 4.1048 LearningRate 0.0447 Epoch: 12 Global Step: 131370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:36:18,051-Speed 5448.37 samples/sec Loss 4.0609 LearningRate 0.0446 Epoch: 12 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:36:25,727-Speed 5337.45 samples/sec Loss 4.1031 LearningRate 0.0446 Epoch: 12 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:36:33,314-Speed 5398.89 samples/sec Loss 4.0687 LearningRate 0.0446 Epoch: 12 Global Step: 131400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:36:40,952-Speed 5363.57 samples/sec Loss 4.0583 
LearningRate 0.0446 Epoch: 12 Global Step: 131410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:36:48,474-Speed 5446.12 samples/sec Loss 4.1312 LearningRate 0.0446 Epoch: 12 Global Step: 131420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:36:56,012-Speed 5434.33 samples/sec Loss 4.1061 LearningRate 0.0446 Epoch: 12 Global Step: 131430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:03,531-Speed 5448.08 samples/sec Loss 4.0718 LearningRate 0.0446 Epoch: 12 Global Step: 131440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:11,082-Speed 5425.38 samples/sec Loss 4.0855 LearningRate 0.0446 Epoch: 12 Global Step: 131450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:18,698-Speed 5378.36 samples/sec Loss 4.1415 LearningRate 0.0446 Epoch: 12 Global Step: 131460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:26,347-Speed 5356.41 samples/sec Loss 4.0485 LearningRate 0.0445 Epoch: 12 Global Step: 131470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:33,965-Speed 5376.86 samples/sec Loss 4.0846 LearningRate 0.0445 Epoch: 12 Global Step: 131480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:37:41,463-Speed 5463.61 samples/sec Loss 4.0829 LearningRate 0.0445 Epoch: 12 Global Step: 131490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:48,961-Speed 5463.24 samples/sec Loss 4.0490 LearningRate 0.0445 Epoch: 12 Global Step: 131500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:37:56,488-Speed 5443.05 samples/sec Loss 4.0706 LearningRate 0.0445 Epoch: 12 Global Step: 131510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:38:04,091-Speed 5388.34 samples/sec Loss 4.0771 LearningRate 0.0445 Epoch: 12 Global Step: 131520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:38:11,521-Speed 5512.94 samples/sec Loss 4.0918 LearningRate 0.0445 Epoch: 12 
Global Step: 131530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:38:19,023-Speed 5460.46 samples/sec Loss 4.0779 LearningRate 0.0445 Epoch: 12 Global Step: 131540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:38:26,509-Speed 5472.80 samples/sec Loss 4.0747 LearningRate 0.0444 Epoch: 12 Global Step: 131550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:38:34,055-Speed 5428.49 samples/sec Loss 4.0614 LearningRate 0.0444 Epoch: 12 Global Step: 131560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:38:41,594-Speed 5434.17 samples/sec Loss 4.1162 LearningRate 0.0444 Epoch: 12 Global Step: 131570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:38:49,055-Speed 5490.28 samples/sec Loss 4.1203 LearningRate 0.0444 Epoch: 12 Global Step: 131580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:38:56,491-Speed 5509.31 samples/sec Loss 4.1070 LearningRate 0.0444 Epoch: 12 Global Step: 131590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:39:04,048-Speed 5420.73 samples/sec Loss 4.0499 LearningRate 0.0444 Epoch: 12 Global Step: 131600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:39:11,501-Speed 5496.39 samples/sec Loss 4.0267 LearningRate 0.0444 Epoch: 12 Global Step: 131610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:39:19,006-Speed 5458.39 samples/sec Loss 4.0899 LearningRate 0.0444 Epoch: 12 Global Step: 131620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:39:26,536-Speed 5440.57 samples/sec Loss 4.0915 LearningRate 0.0444 Epoch: 12 Global Step: 131630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:39:34,058-Speed 5446.47 samples/sec Loss 4.0571 LearningRate 0.0443 Epoch: 12 Global Step: 131640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:39:41,517-Speed 5491.76 samples/sec Loss 4.1019 LearningRate 0.0443 Epoch: 12 Global Step: 131650 Fp16 Grad 
Scale: 65536 Required: 17 hours Training: 2022-01-09 00:39:49,051-Speed 5437.25 samples/sec Loss 4.0819 LearningRate 0.0443 Epoch: 12 Global Step: 131660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:39:56,593-Speed 5431.93 samples/sec Loss 4.0659 LearningRate 0.0443 Epoch: 12 Global Step: 131670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:04,269-Speed 5336.90 samples/sec Loss 4.0906 LearningRate 0.0443 Epoch: 12 Global Step: 131680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:11,810-Speed 5432.30 samples/sec Loss 4.0166 LearningRate 0.0443 Epoch: 12 Global Step: 131690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:19,318-Speed 5456.43 samples/sec Loss 4.0766 LearningRate 0.0443 Epoch: 12 Global Step: 131700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:26,822-Speed 5459.09 samples/sec Loss 4.0760 LearningRate 0.0443 Epoch: 12 Global Step: 131710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:34,322-Speed 5461.72 samples/sec Loss 4.0554 LearningRate 0.0442 Epoch: 12 Global Step: 131720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:41,833-Speed 5454.11 samples/sec Loss 4.0688 LearningRate 0.0442 Epoch: 12 Global Step: 131730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:49,382-Speed 5426.56 samples/sec Loss 4.0893 LearningRate 0.0442 Epoch: 12 Global Step: 131740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:40:56,836-Speed 5495.86 samples/sec Loss 4.0372 LearningRate 0.0442 Epoch: 12 Global Step: 131750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:41:04,483-Speed 5357.16 samples/sec Loss 4.0986 LearningRate 0.0442 Epoch: 12 Global Step: 131760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:41:11,966-Speed 5474.48 samples/sec Loss 4.0743 LearningRate 0.0442 Epoch: 12 Global Step: 131770 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-01-09 00:41:19,503-Speed 5435.35 samples/sec Loss 4.0959 LearningRate 0.0442 Epoch: 12 Global Step: 131780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:41:27,081-Speed 5405.26 samples/sec Loss 4.0642 LearningRate 0.0442 Epoch: 12 Global Step: 131790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:41:34,669-Speed 5398.65 samples/sec Loss 4.0992 LearningRate 0.0442 Epoch: 12 Global Step: 131800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:41:42,243-Speed 5408.84 samples/sec Loss 4.0671 LearningRate 0.0441 Epoch: 12 Global Step: 131810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:41:49,723-Speed 5476.60 samples/sec Loss 4.0882 LearningRate 0.0441 Epoch: 12 Global Step: 131820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:41:57,192-Speed 5484.63 samples/sec Loss 4.0907 LearningRate 0.0441 Epoch: 12 Global Step: 131830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:04,719-Speed 5442.22 samples/sec Loss 3.9949 LearningRate 0.0441 Epoch: 12 Global Step: 131840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:12,200-Speed 5476.29 samples/sec Loss 4.0157 LearningRate 0.0441 Epoch: 12 Global Step: 131850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:19,679-Speed 5477.44 samples/sec Loss 4.0568 LearningRate 0.0441 Epoch: 12 Global Step: 131860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:27,218-Speed 5433.63 samples/sec Loss 4.0440 LearningRate 0.0441 Epoch: 12 Global Step: 131870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:34,688-Speed 5483.94 samples/sec Loss 4.0631 LearningRate 0.0441 Epoch: 12 Global Step: 131880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:42,223-Speed 5436.61 samples/sec Loss 4.0555 LearningRate 0.0440 Epoch: 12 Global Step: 131890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 
00:42:49,742-Speed 5448.61 samples/sec Loss 4.0287 LearningRate 0.0440 Epoch: 12 Global Step: 131900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:42:57,314-Speed 5409.81 samples/sec Loss 4.0524 LearningRate 0.0440 Epoch: 12 Global Step: 131910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:04,956-Speed 5360.46 samples/sec Loss 4.0645 LearningRate 0.0440 Epoch: 12 Global Step: 131920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:12,486-Speed 5440.21 samples/sec Loss 4.1447 LearningRate 0.0440 Epoch: 12 Global Step: 131930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:20,037-Speed 5425.37 samples/sec Loss 4.0613 LearningRate 0.0440 Epoch: 12 Global Step: 131940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:27,650-Speed 5380.91 samples/sec Loss 4.0958 LearningRate 0.0440 Epoch: 12 Global Step: 131950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:35,237-Speed 5399.71 samples/sec Loss 4.0567 LearningRate 0.0440 Epoch: 12 Global Step: 131960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:42,926-Speed 5327.88 samples/sec Loss 4.0659 LearningRate 0.0440 Epoch: 12 Global Step: 131970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:50,503-Speed 5406.56 samples/sec Loss 4.0749 LearningRate 0.0439 Epoch: 12 Global Step: 131980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:43:57,988-Speed 5472.67 samples/sec Loss 4.0723 LearningRate 0.0439 Epoch: 12 Global Step: 131990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:44:05,786-Speed 5253.83 samples/sec Loss 4.0366 LearningRate 0.0439 Epoch: 12 Global Step: 132000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:44:49,804-[lfw][132000]XNorm: 23.863765 Training: 2022-01-09 00:44:49,805-[lfw][132000]Accuracy-Flip: 0.99750+-0.00291 Training: 2022-01-09 00:44:49,805-[lfw][132000]Accuracy-Highest: 0.99817 
Training: 2022-01-09 00:45:41,109-[cfp_fp][132000]XNorm: 22.283643 Training: 2022-01-09 00:45:41,110-[cfp_fp][132000]Accuracy-Flip: 0.99100+-0.00523 Training: 2022-01-09 00:45:41,110-[cfp_fp][132000]Accuracy-Highest: 0.99157 Training: 2022-01-09 00:46:25,456-[agedb_30][132000]XNorm: 23.641398 Training: 2022-01-09 00:46:25,457-[agedb_30][132000]Accuracy-Flip: 0.98067+-0.00742 Training: 2022-01-09 00:46:25,458-[agedb_30][132000]Accuracy-Highest: 0.98067 Training: 2022-01-09 00:46:33,034-Speed 278.17 samples/sec Loss 4.0845 LearningRate 0.0439 Epoch: 12 Global Step: 132010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:46:40,502-Speed 5484.76 samples/sec Loss 4.0163 LearningRate 0.0439 Epoch: 12 Global Step: 132020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:46:48,112-Speed 5383.07 samples/sec Loss 4.0342 LearningRate 0.0439 Epoch: 12 Global Step: 132030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:46:55,614-Speed 5460.98 samples/sec Loss 4.0280 LearningRate 0.0439 Epoch: 12 Global Step: 132040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:03,107-Speed 5467.18 samples/sec Loss 4.0602 LearningRate 0.0439 Epoch: 12 Global Step: 132050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:10,597-Speed 5469.38 samples/sec Loss 4.0997 LearningRate 0.0438 Epoch: 12 Global Step: 132060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:18,085-Speed 5470.51 samples/sec Loss 3.9804 LearningRate 0.0438 Epoch: 12 Global Step: 132070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:25,707-Speed 5374.47 samples/sec Loss 3.9982 LearningRate 0.0438 Epoch: 12 Global Step: 132080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:33,279-Speed 5410.23 samples/sec Loss 4.0139 LearningRate 0.0438 Epoch: 12 Global Step: 132090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:40,832-Speed 5423.97 samples/sec Loss 
4.0292 LearningRate 0.0438 Epoch: 12 Global Step: 132100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:48,461-Speed 5369.30 samples/sec Loss 4.0294 LearningRate 0.0438 Epoch: 12 Global Step: 132110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:47:55,957-Speed 5465.07 samples/sec Loss 4.0704 LearningRate 0.0438 Epoch: 12 Global Step: 132120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:48:03,494-Speed 5435.47 samples/sec Loss 4.0337 LearningRate 0.0438 Epoch: 12 Global Step: 132130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:48:11,035-Speed 5432.22 samples/sec Loss 4.0535 LearningRate 0.0438 Epoch: 12 Global Step: 132140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:48:18,671-Speed 5364.93 samples/sec Loss 4.0565 LearningRate 0.0437 Epoch: 12 Global Step: 132150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:48:26,151-Speed 5476.76 samples/sec Loss 4.0650 LearningRate 0.0437 Epoch: 12 Global Step: 132160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:48:33,700-Speed 5426.57 samples/sec Loss 4.0296 LearningRate 0.0437 Epoch: 12 Global Step: 132170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:48:41,229-Speed 5441.11 samples/sec Loss 4.0465 LearningRate 0.0437 Epoch: 12 Global Step: 132180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:48:48,726-Speed 5464.12 samples/sec Loss 4.0644 LearningRate 0.0437 Epoch: 12 Global Step: 132190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:48:56,249-Speed 5445.23 samples/sec Loss 4.0698 LearningRate 0.0437 Epoch: 12 Global Step: 132200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:49:03,774-Speed 5444.03 samples/sec Loss 4.0304 LearningRate 0.0437 Epoch: 12 Global Step: 132210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:49:11,266-Speed 5468.65 samples/sec Loss 4.0288 LearningRate 0.0437 
Epoch: 12 Global Step: 132220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:49:18,898-Speed 5367.38 samples/sec Loss 4.0344 LearningRate 0.0437 Epoch: 12 Global Step: 132230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:49:26,435-Speed 5435.09 samples/sec Loss 4.0123 LearningRate 0.0436 Epoch: 12 Global Step: 132240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:49:34,062-Speed 5371.18 samples/sec Loss 4.0394 LearningRate 0.0436 Epoch: 12 Global Step: 132250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:49:41,780-Speed 5307.70 samples/sec Loss 4.0495 LearningRate 0.0436 Epoch: 12 Global Step: 132260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:49:49,532-Speed 5284.44 samples/sec Loss 4.0333 LearningRate 0.0436 Epoch: 12 Global Step: 132270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:49:57,089-Speed 5420.67 samples/sec Loss 4.0347 LearningRate 0.0436 Epoch: 12 Global Step: 132280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:50:04,729-Speed 5361.73 samples/sec Loss 4.0360 LearningRate 0.0436 Epoch: 12 Global Step: 132290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:50:12,173-Speed 5503.79 samples/sec Loss 4.0513 LearningRate 0.0436 Epoch: 12 Global Step: 132300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:50:19,734-Speed 5417.67 samples/sec Loss 4.0386 LearningRate 0.0436 Epoch: 12 Global Step: 132310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:50:29,541-Speed 4176.93 samples/sec Loss 4.0809 LearningRate 0.0435 Epoch: 12 Global Step: 132320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:50:37,175-Speed 5366.84 samples/sec Loss 4.0570 LearningRate 0.0435 Epoch: 12 Global Step: 132330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:50:44,731-Speed 5421.54 samples/sec Loss 4.0533 LearningRate 0.0435 Epoch: 12 Global Step: 132340 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:50:52,253-Speed 5445.69 samples/sec Loss 4.0021 LearningRate 0.0435 Epoch: 12 Global Step: 132350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:50:59,762-Speed 5455.09 samples/sec Loss 4.0101 LearningRate 0.0435 Epoch: 12 Global Step: 132360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:51:07,354-Speed 5396.36 samples/sec Loss 4.0198 LearningRate 0.0435 Epoch: 12 Global Step: 132370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:51:14,924-Speed 5411.76 samples/sec Loss 4.0552 LearningRate 0.0435 Epoch: 12 Global Step: 132380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:51:22,455-Speed 5439.46 samples/sec Loss 4.0688 LearningRate 0.0435 Epoch: 12 Global Step: 132390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:51:30,075-Speed 5376.25 samples/sec Loss 4.0434 LearningRate 0.0435 Epoch: 12 Global Step: 132400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:51:37,605-Speed 5440.35 samples/sec Loss 4.0590 LearningRate 0.0434 Epoch: 12 Global Step: 132410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:51:45,147-Speed 5430.93 samples/sec Loss 4.0220 LearningRate 0.0434 Epoch: 12 Global Step: 132420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:51:52,774-Speed 5371.19 samples/sec Loss 3.9861 LearningRate 0.0434 Epoch: 12 Global Step: 132430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:52:00,371-Speed 5392.32 samples/sec Loss 4.0551 LearningRate 0.0434 Epoch: 12 Global Step: 132440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:52:07,879-Speed 5456.34 samples/sec Loss 4.0585 LearningRate 0.0434 Epoch: 12 Global Step: 132450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:52:15,550-Speed 5340.57 samples/sec Loss 4.0770 LearningRate 0.0434 Epoch: 12 Global Step: 132460 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-01-09 00:52:23,147-Speed 5392.31 samples/sec Loss 4.0583 LearningRate 0.0434 Epoch: 12 Global Step: 132470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:52:30,992-Speed 5221.45 samples/sec Loss 4.0203 LearningRate 0.0434 Epoch: 12 Global Step: 132480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:52:38,550-Speed 5420.30 samples/sec Loss 4.0270 LearningRate 0.0433 Epoch: 12 Global Step: 132490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:52:46,113-Speed 5416.72 samples/sec Loss 4.0747 LearningRate 0.0433 Epoch: 12 Global Step: 132500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:52:53,693-Speed 5404.84 samples/sec Loss 3.9836 LearningRate 0.0433 Epoch: 12 Global Step: 132510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:53:01,329-Speed 5364.53 samples/sec Loss 4.0450 LearningRate 0.0433 Epoch: 12 Global Step: 132520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:53:08,939-Speed 5383.72 samples/sec Loss 4.0634 LearningRate 0.0433 Epoch: 12 Global Step: 132530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:53:16,507-Speed 5412.90 samples/sec Loss 4.0324 LearningRate 0.0433 Epoch: 12 Global Step: 132540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:53:24,050-Speed 5430.51 samples/sec Loss 4.0656 LearningRate 0.0433 Epoch: 12 Global Step: 132550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:53:31,637-Speed 5399.50 samples/sec Loss 4.0088 LearningRate 0.0433 Epoch: 12 Global Step: 132560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:53:39,301-Speed 5345.04 samples/sec Loss 3.9835 LearningRate 0.0433 Epoch: 12 Global Step: 132570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:53:46,827-Speed 5443.74 samples/sec Loss 4.0636 LearningRate 0.0432 Epoch: 12 Global Step: 132580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-01-09 00:53:54,345-Speed 5448.48 samples/sec Loss 4.0147 LearningRate 0.0432 Epoch: 12 Global Step: 132590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:54:01,898-Speed 5423.87 samples/sec Loss 4.0071 LearningRate 0.0432 Epoch: 12 Global Step: 132600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:54:09,470-Speed 5409.81 samples/sec Loss 3.9989 LearningRate 0.0432 Epoch: 12 Global Step: 132610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:54:17,050-Speed 5405.27 samples/sec Loss 4.0005 LearningRate 0.0432 Epoch: 12 Global Step: 132620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:54:24,549-Speed 5462.04 samples/sec Loss 4.0082 LearningRate 0.0432 Epoch: 12 Global Step: 132630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:54:32,052-Speed 5460.07 samples/sec Loss 4.0385 LearningRate 0.0432 Epoch: 12 Global Step: 132640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:54:39,925-Speed 5203.75 samples/sec Loss 4.0482 LearningRate 0.0432 Epoch: 12 Global Step: 132650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:54:47,560-Speed 5365.03 samples/sec Loss 4.0271 LearningRate 0.0432 Epoch: 12 Global Step: 132660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:54:55,100-Speed 5432.87 samples/sec Loss 4.0400 LearningRate 0.0431 Epoch: 12 Global Step: 132670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:55:02,645-Speed 5429.70 samples/sec Loss 4.0372 LearningRate 0.0431 Epoch: 12 Global Step: 132680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:55:10,165-Speed 5447.74 samples/sec Loss 4.0147 LearningRate 0.0431 Epoch: 12 Global Step: 132690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:55:17,743-Speed 5406.15 samples/sec Loss 4.0522 LearningRate 0.0431 Epoch: 12 Global Step: 132700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:55:25,236-Speed 
5466.77 samples/sec Loss 3.9911 LearningRate 0.0431 Epoch: 12 Global Step: 132710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:55:32,808-Speed 5410.59 samples/sec Loss 4.0846 LearningRate 0.0431 Epoch: 12 Global Step: 132720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:55:40,383-Speed 5408.42 samples/sec Loss 4.0280 LearningRate 0.0431 Epoch: 12 Global Step: 132730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:55:47,999-Speed 5378.96 samples/sec Loss 4.0289 LearningRate 0.0431 Epoch: 12 Global Step: 132740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:55:55,467-Speed 5485.37 samples/sec Loss 4.0197 LearningRate 0.0430 Epoch: 12 Global Step: 132750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:03,035-Speed 5412.87 samples/sec Loss 4.0056 LearningRate 0.0430 Epoch: 12 Global Step: 132760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:10,566-Speed 5439.76 samples/sec Loss 4.0446 LearningRate 0.0430 Epoch: 12 Global Step: 132770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:18,006-Speed 5506.06 samples/sec Loss 3.9983 LearningRate 0.0430 Epoch: 12 Global Step: 132780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:25,486-Speed 5476.48 samples/sec Loss 4.0237 LearningRate 0.0430 Epoch: 12 Global Step: 132790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:32,993-Speed 5456.95 samples/sec Loss 4.0101 LearningRate 0.0430 Epoch: 12 Global Step: 132800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:40,528-Speed 5437.79 samples/sec Loss 4.0335 LearningRate 0.0430 Epoch: 12 Global Step: 132810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:47,994-Speed 5487.62 samples/sec Loss 4.0028 LearningRate 0.0430 Epoch: 12 Global Step: 132820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:56:55,500-Speed 5457.47 samples/sec Loss 3.9951 
LearningRate 0.0430 Epoch: 12 Global Step: 132830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:02,978-Speed 5477.69 samples/sec Loss 3.9972 LearningRate 0.0429 Epoch: 12 Global Step: 132840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:10,542-Speed 5415.68 samples/sec Loss 4.0102 LearningRate 0.0429 Epoch: 12 Global Step: 132850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:18,038-Speed 5465.36 samples/sec Loss 4.0317 LearningRate 0.0429 Epoch: 12 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:25,498-Speed 5491.64 samples/sec Loss 3.9783 LearningRate 0.0429 Epoch: 12 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:32,966-Speed 5485.42 samples/sec Loss 3.9889 LearningRate 0.0429 Epoch: 12 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:40,518-Speed 5424.40 samples/sec Loss 3.9870 LearningRate 0.0429 Epoch: 12 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:47,989-Speed 5483.30 samples/sec Loss 3.9864 LearningRate 0.0429 Epoch: 12 Global Step: 132900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:57:55,485-Speed 5464.94 samples/sec Loss 3.9910 LearningRate 0.0429 Epoch: 12 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:03,144-Speed 5348.20 samples/sec Loss 4.0173 LearningRate 0.0429 Epoch: 12 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:10,689-Speed 5429.77 samples/sec Loss 4.0106 LearningRate 0.0428 Epoch: 12 Global Step: 132930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 00:58:18,157-Speed 5485.20 samples/sec Loss 4.0116 LearningRate 0.0428 Epoch: 12 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:25,699-Speed 5432.02 samples/sec Loss 3.9847 LearningRate 0.0428 Epoch: 12 
Global Step: 132950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:33,267-Speed 5412.60 samples/sec Loss 3.9861 LearningRate 0.0428 Epoch: 12 Global Step: 132960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:40,772-Speed 5458.47 samples/sec Loss 4.0254 LearningRate 0.0428 Epoch: 12 Global Step: 132970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:48,379-Speed 5385.52 samples/sec Loss 4.0211 LearningRate 0.0428 Epoch: 12 Global Step: 132980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:58:55,967-Speed 5398.76 samples/sec Loss 3.9695 LearningRate 0.0428 Epoch: 12 Global Step: 132990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:59:03,463-Speed 5465.00 samples/sec Loss 3.9619 LearningRate 0.0428 Epoch: 12 Global Step: 133000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:59:11,013-Speed 5425.57 samples/sec Loss 3.9730 LearningRate 0.0427 Epoch: 12 Global Step: 133010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:59:18,588-Speed 5407.99 samples/sec Loss 3.9794 LearningRate 0.0427 Epoch: 12 Global Step: 133020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 00:59:26,100-Speed 5453.24 samples/sec Loss 4.0070 LearningRate 0.0427 Epoch: 12 Global Step: 133030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:59:33,603-Speed 5459.95 samples/sec Loss 3.9897 LearningRate 0.0427 Epoch: 12 Global Step: 133040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:59:41,100-Speed 5463.98 samples/sec Loss 3.9842 LearningRate 0.0427 Epoch: 12 Global Step: 133050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:59:48,616-Speed 5449.87 samples/sec Loss 3.9849 LearningRate 0.0427 Epoch: 12 Global Step: 133060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 00:59:56,175-Speed 5419.74 samples/sec Loss 4.0030 LearningRate 0.0427 Epoch: 12 Global Step: 133070 Fp16 Grad 
Scale: 32768 Required: 17 hours Training: 2022-01-09 01:00:03,701-Speed 5443.43 samples/sec Loss 4.0188 LearningRate 0.0427 Epoch: 12 Global Step: 133080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:00:11,166-Speed 5487.42 samples/sec Loss 4.0538 LearningRate 0.0427 Epoch: 12 Global Step: 133090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:00:18,749-Speed 5401.51 samples/sec Loss 4.0109 LearningRate 0.0426 Epoch: 12 Global Step: 133100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:00:26,283-Speed 5437.56 samples/sec Loss 4.0120 LearningRate 0.0426 Epoch: 12 Global Step: 133110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:00:34,007-Speed 5303.92 samples/sec Loss 3.9876 LearningRate 0.0426 Epoch: 12 Global Step: 133120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:00:41,173-Speed 5716.06 samples/sec Loss 3.9885 LearningRate 0.0426 Epoch: 12 Global Step: 133130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:00:48,730-Speed 5420.97 samples/sec Loss 3.9977 LearningRate 0.0426 Epoch: 12 Global Step: 133140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:00:56,369-Speed 5362.99 samples/sec Loss 3.9464 LearningRate 0.0426 Epoch: 12 Global Step: 133150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:01:03,903-Speed 5437.77 samples/sec Loss 3.9626 LearningRate 0.0426 Epoch: 12 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:01:11,505-Speed 5388.06 samples/sec Loss 3.9771 LearningRate 0.0426 Epoch: 12 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:01:18,994-Speed 5470.12 samples/sec Loss 4.0007 LearningRate 0.0426 Epoch: 12 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:01:26,590-Speed 5392.97 samples/sec Loss 3.9632 LearningRate 0.0425 Epoch: 12 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-01-09 01:01:34,064-Speed 5481.22 samples/sec Loss 4.0136 LearningRate 0.0425 Epoch: 12 Global Step: 133200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:01:41,649-Speed 5401.10 samples/sec Loss 4.0004 LearningRate 0.0425 Epoch: 12 Global Step: 133210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:01:49,126-Speed 5478.26 samples/sec Loss 4.0328 LearningRate 0.0425 Epoch: 12 Global Step: 133220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:01:56,702-Speed 5407.57 samples/sec Loss 3.9841 LearningRate 0.0425 Epoch: 12 Global Step: 133230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:04,338-Speed 5364.71 samples/sec Loss 3.9934 LearningRate 0.0425 Epoch: 12 Global Step: 133240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:11,894-Speed 5421.64 samples/sec Loss 3.9798 LearningRate 0.0425 Epoch: 12 Global Step: 133250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:19,372-Speed 5478.30 samples/sec Loss 3.9894 LearningRate 0.0425 Epoch: 12 Global Step: 133260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:26,878-Speed 5457.73 samples/sec Loss 3.9594 LearningRate 0.0425 Epoch: 12 Global Step: 133270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:34,357-Speed 5476.84 samples/sec Loss 3.9973 LearningRate 0.0424 Epoch: 12 Global Step: 133280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:41,817-Speed 5491.79 samples/sec Loss 4.0167 LearningRate 0.0424 Epoch: 12 Global Step: 133290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:02:49,288-Speed 5483.07 samples/sec Loss 4.0087 LearningRate 0.0424 Epoch: 12 Global Step: 133300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:02:56,853-Speed 5415.14 samples/sec Loss 3.9941 LearningRate 0.0424 Epoch: 12 Global Step: 133310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 
01:03:04,363-Speed 5454.92 samples/sec Loss 3.9775 LearningRate 0.0424 Epoch: 12 Global Step: 133320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:03:11,922-Speed 5418.93 samples/sec Loss 3.9760 LearningRate 0.0424 Epoch: 12 Global Step: 133330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:03:19,587-Speed 5344.38 samples/sec Loss 3.9771 LearningRate 0.0424 Epoch: 12 Global Step: 133340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:03:27,396-Speed 5245.97 samples/sec Loss 3.9719 LearningRate 0.0424 Epoch: 12 Global Step: 133350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:03:34,937-Speed 5432.22 samples/sec Loss 4.0166 LearningRate 0.0423 Epoch: 12 Global Step: 133360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:03:42,502-Speed 5415.29 samples/sec Loss 3.9837 LearningRate 0.0423 Epoch: 12 Global Step: 133370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:03:50,101-Speed 5391.01 samples/sec Loss 3.9718 LearningRate 0.0423 Epoch: 12 Global Step: 133380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:03:57,643-Speed 5431.51 samples/sec Loss 3.9875 LearningRate 0.0423 Epoch: 12 Global Step: 133390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:05,143-Speed 5461.75 samples/sec Loss 4.0198 LearningRate 0.0423 Epoch: 12 Global Step: 133400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:12,629-Speed 5472.38 samples/sec Loss 3.9385 LearningRate 0.0423 Epoch: 12 Global Step: 133410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:20,166-Speed 5435.18 samples/sec Loss 3.9050 LearningRate 0.0423 Epoch: 12 Global Step: 133420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:27,630-Speed 5488.20 samples/sec Loss 3.9619 LearningRate 0.0423 Epoch: 12 Global Step: 133430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:04:35,216-Speed 5400.76 
samples/sec Loss 3.9187 LearningRate 0.0423 Epoch: 12 Global Step: 133440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:42,774-Speed 5420.27 samples/sec Loss 3.9625 LearningRate 0.0422 Epoch: 12 Global Step: 133450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:50,364-Speed 5397.03 samples/sec Loss 3.9567 LearningRate 0.0422 Epoch: 12 Global Step: 133460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:04:57,862-Speed 5463.26 samples/sec Loss 3.9271 LearningRate 0.0422 Epoch: 12 Global Step: 133470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:05,375-Speed 5453.18 samples/sec Loss 3.9771 LearningRate 0.0422 Epoch: 12 Global Step: 133480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:12,884-Speed 5455.65 samples/sec Loss 3.9646 LearningRate 0.0422 Epoch: 12 Global Step: 133490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:20,427-Speed 5430.87 samples/sec Loss 3.9505 LearningRate 0.0422 Epoch: 12 Global Step: 133500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:28,039-Speed 5381.61 samples/sec Loss 4.0038 LearningRate 0.0422 Epoch: 12 Global Step: 133510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:35,515-Speed 5479.55 samples/sec Loss 3.9896 LearningRate 0.0422 Epoch: 12 Global Step: 133520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:43,131-Speed 5379.04 samples/sec Loss 4.0104 LearningRate 0.0422 Epoch: 12 Global Step: 133530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:05:50,678-Speed 5427.90 samples/sec Loss 3.8964 LearningRate 0.0421 Epoch: 12 Global Step: 133540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:05:58,184-Speed 5457.24 samples/sec Loss 3.9588 LearningRate 0.0421 Epoch: 12 Global Step: 133550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:05,733-Speed 5427.47 samples/sec Loss 4.0027 
LearningRate 0.0421 Epoch: 12 Global Step: 133560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:13,232-Speed 5462.12 samples/sec Loss 4.0007 LearningRate 0.0421 Epoch: 12 Global Step: 133570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:20,715-Speed 5474.87 samples/sec Loss 3.9818 LearningRate 0.0421 Epoch: 12 Global Step: 133580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:28,338-Speed 5373.29 samples/sec Loss 3.9629 LearningRate 0.0421 Epoch: 12 Global Step: 133590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:35,941-Speed 5388.77 samples/sec Loss 4.0087 LearningRate 0.0421 Epoch: 12 Global Step: 133600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:43,513-Speed 5409.97 samples/sec Loss 3.9790 LearningRate 0.0421 Epoch: 12 Global Step: 133610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:50,955-Speed 5504.59 samples/sec Loss 3.9494 LearningRate 0.0421 Epoch: 12 Global Step: 133620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:06:58,477-Speed 5445.80 samples/sec Loss 3.9575 LearningRate 0.0420 Epoch: 12 Global Step: 133630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:05,974-Speed 5464.23 samples/sec Loss 3.9999 LearningRate 0.0420 Epoch: 12 Global Step: 133640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:13,497-Speed 5445.93 samples/sec Loss 3.9738 LearningRate 0.0420 Epoch: 12 Global Step: 133650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:20,969-Speed 5482.55 samples/sec Loss 3.9756 LearningRate 0.0420 Epoch: 12 Global Step: 133660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:28,533-Speed 5415.59 samples/sec Loss 4.0230 LearningRate 0.0420 Epoch: 12 Global Step: 133670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:36,143-Speed 5383.41 samples/sec Loss 3.9874 LearningRate 0.0420 Epoch: 12 
Global Step: 133680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:43,697-Speed 5423.10 samples/sec Loss 3.9508 LearningRate 0.0420 Epoch: 12 Global Step: 133690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:51,224-Speed 5442.66 samples/sec Loss 3.9444 LearningRate 0.0420 Epoch: 12 Global Step: 133700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:07:58,758-Speed 5436.86 samples/sec Loss 3.9454 LearningRate 0.0419 Epoch: 12 Global Step: 133710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:06,250-Speed 5468.23 samples/sec Loss 3.9779 LearningRate 0.0419 Epoch: 12 Global Step: 133720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:13,815-Speed 5415.11 samples/sec Loss 3.9601 LearningRate 0.0419 Epoch: 12 Global Step: 133730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:21,390-Speed 5407.64 samples/sec Loss 3.9761 LearningRate 0.0419 Epoch: 12 Global Step: 133740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:28,929-Speed 5433.94 samples/sec Loss 3.9630 LearningRate 0.0419 Epoch: 12 Global Step: 133750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:36,442-Speed 5452.52 samples/sec Loss 3.9301 LearningRate 0.0419 Epoch: 12 Global Step: 133760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:43,971-Speed 5441.15 samples/sec Loss 3.9862 LearningRate 0.0419 Epoch: 12 Global Step: 133770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:51,475-Speed 5458.66 samples/sec Loss 3.9496 LearningRate 0.0419 Epoch: 12 Global Step: 133780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:08:59,000-Speed 5444.39 samples/sec Loss 3.9898 LearningRate 0.0419 Epoch: 12 Global Step: 133790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:09:06,576-Speed 5406.87 samples/sec Loss 3.9896 LearningRate 0.0418 Epoch: 12 Global Step: 133800 Fp16 Grad 
Scale: 65536 Required: 17 hours Training: 2022-01-09 01:09:14,047-Speed 5483.03 samples/sec Loss 3.9377 LearningRate 0.0418 Epoch: 12 Global Step: 133810 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:09:21,864-Speed 5240.98 samples/sec Loss 3.9379 LearningRate 0.0418 Epoch: 12 Global Step: 133820 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:09:29,365-Speed 5460.86 samples/sec Loss 3.9486 LearningRate 0.0418 Epoch: 12 Global Step: 133830 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:09:37,064-Speed 5320.92 samples/sec Loss 3.9763 LearningRate 0.0418 Epoch: 12 Global Step: 133840 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:09:44,704-Speed 5361.73 samples/sec Loss 4.0048 LearningRate 0.0418 Epoch: 12 Global Step: 133850 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:09:52,278-Speed 5408.55 samples/sec Loss 3.9392 LearningRate 0.0418 Epoch: 12 Global Step: 133860 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:09:59,791-Speed 5453.03 samples/sec Loss 3.9301 LearningRate 0.0418 Epoch: 12 Global Step: 133870 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:10:07,374-Speed 5402.09 samples/sec Loss 3.9267 LearningRate 0.0418 Epoch: 12 Global Step: 133880 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:10:14,928-Speed 5422.95 samples/sec Loss 3.9343 LearningRate 0.0417 Epoch: 12 Global Step: 133890 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:10:22,556-Speed 5370.11 samples/sec Loss 3.9684 LearningRate 0.0417 Epoch: 12 Global Step: 133900 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-09 01:10:30,193-Speed 5364.43 samples/sec Loss 3.9221 LearningRate 0.0417 Epoch: 12 Global Step: 133910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:10:37,878-Speed 5330.61 samples/sec Loss 3.9087 LearningRate 0.0417 Epoch: 12 Global Step: 133920 Fp16 Grad Scale: 32768 Required: 17 hours 
Training: 2022-01-09 01:10:45,358-Speed 5476.71 samples/sec Loss 3.9882 LearningRate 0.0417 Epoch: 12 Global Step: 133930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:10:52,887-Speed 5440.67 samples/sec Loss 3.9640 LearningRate 0.0417 Epoch: 12 Global Step: 133940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:11:00,499-Speed 5381.97 samples/sec Loss 3.9284 LearningRate 0.0417 Epoch: 12 Global Step: 133950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:11:07,950-Speed 5497.83 samples/sec Loss 3.9788 LearningRate 0.0417 Epoch: 12 Global Step: 133960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:11:15,426-Speed 5479.47 samples/sec Loss 3.9817 LearningRate 0.0417 Epoch: 12 Global Step: 133970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:11:22,962-Speed 5436.53 samples/sec Loss 3.9831 LearningRate 0.0416 Epoch: 12 Global Step: 133980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:11:30,393-Speed 5512.49 samples/sec Loss 3.9269 LearningRate 0.0416 Epoch: 12 Global Step: 133990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:11:37,887-Speed 5466.39 samples/sec Loss 3.9428 LearningRate 0.0416 Epoch: 12 Global Step: 134000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:12:22,025-[lfw][134000]XNorm: 24.295032 Training: 2022-01-09 01:12:22,026-[lfw][134000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-01-09 01:12:22,026-[lfw][134000]Accuracy-Highest: 0.99817 Training: 2022-01-09 01:13:13,800-[cfp_fp][134000]XNorm: 22.341688 Training: 2022-01-09 01:13:13,801-[cfp_fp][134000]Accuracy-Flip: 0.99157+-0.00386 Training: 2022-01-09 01:13:13,802-[cfp_fp][134000]Accuracy-Highest: 0.99157 Training: 2022-01-09 01:13:58,203-[agedb_30][134000]XNorm: 23.833087 Training: 2022-01-09 01:13:58,203-[agedb_30][134000]Accuracy-Flip: 0.97900+-0.00569 Training: 2022-01-09 01:13:58,204-[agedb_30][134000]Accuracy-Highest: 0.98067 Training: 
2022-01-09 01:14:05,827-Speed 276.87 samples/sec Loss 3.9396 LearningRate 0.0416 Epoch: 12 Global Step: 134010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:14:13,320-Speed 5467.55 samples/sec Loss 3.9851 LearningRate 0.0416 Epoch: 12 Global Step: 134020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:14:20,807-Speed 5471.06 samples/sec Loss 3.9810 LearningRate 0.0416 Epoch: 12 Global Step: 134030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:14:28,246-Speed 5507.45 samples/sec Loss 3.9674 LearningRate 0.0416 Epoch: 12 Global Step: 134040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:14:35,718-Speed 5481.93 samples/sec Loss 3.9290 LearningRate 0.0416 Epoch: 12 Global Step: 134050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:14:43,200-Speed 5475.72 samples/sec Loss 3.9697 LearningRate 0.0416 Epoch: 12 Global Step: 134060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:14:50,739-Speed 5433.98 samples/sec Loss 3.9216 LearningRate 0.0415 Epoch: 12 Global Step: 134070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:14:58,149-Speed 5528.32 samples/sec Loss 3.9248 LearningRate 0.0415 Epoch: 12 Global Step: 134080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:15:05,698-Speed 5426.64 samples/sec Loss 3.9387 LearningRate 0.0415 Epoch: 12 Global Step: 134090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:15:13,201-Speed 5460.05 samples/sec Loss 3.9264 LearningRate 0.0415 Epoch: 12 Global Step: 134100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:15:20,757-Speed 5421.23 samples/sec Loss 3.9363 LearningRate 0.0415 Epoch: 12 Global Step: 134110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:15:28,299-Speed 5431.40 samples/sec Loss 3.9350 LearningRate 0.0415 Epoch: 12 Global Step: 134120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:15:35,921-Speed 
5375.31 samples/sec Loss 3.9355 LearningRate 0.0415 Epoch: 12 Global Step: 134130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:15:43,392-Speed 5483.55 samples/sec Loss 3.9272 LearningRate 0.0415 Epoch: 12 Global Step: 134140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:15:50,911-Speed 5447.94 samples/sec Loss 3.8716 LearningRate 0.0414 Epoch: 12 Global Step: 134150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:15:58,482-Speed 5410.37 samples/sec Loss 3.9520 LearningRate 0.0414 Epoch: 12 Global Step: 134160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:16:06,066-Speed 5402.13 samples/sec Loss 3.9266 LearningRate 0.0414 Epoch: 12 Global Step: 134170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:16:13,708-Speed 5360.24 samples/sec Loss 3.9394 LearningRate 0.0414 Epoch: 12 Global Step: 134180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:16:21,175-Speed 5486.16 samples/sec Loss 3.8778 LearningRate 0.0414 Epoch: 12 Global Step: 134190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:16:28,750-Speed 5408.17 samples/sec Loss 3.9327 LearningRate 0.0414 Epoch: 12 Global Step: 134200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:16:36,227-Speed 5478.95 samples/sec Loss 3.9444 LearningRate 0.0414 Epoch: 12 Global Step: 134210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:16:43,674-Speed 5501.08 samples/sec Loss 3.9854 LearningRate 0.0414 Epoch: 12 Global Step: 134220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:16:51,238-Speed 5415.40 samples/sec Loss 3.9402 LearningRate 0.0414 Epoch: 12 Global Step: 134230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:16:58,700-Speed 5489.82 samples/sec Loss 3.9303 LearningRate 0.0413 Epoch: 12 Global Step: 134240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:06,217-Speed 5449.57 samples/sec Loss 3.9328 
LearningRate 0.0413 Epoch: 12 Global Step: 134250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:13,675-Speed 5493.31 samples/sec Loss 3.9139 LearningRate 0.0413 Epoch: 12 Global Step: 134260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:21,157-Speed 5475.13 samples/sec Loss 3.9034 LearningRate 0.0413 Epoch: 12 Global Step: 134270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:28,679-Speed 5445.56 samples/sec Loss 3.9146 LearningRate 0.0413 Epoch: 12 Global Step: 134280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:36,193-Speed 5452.05 samples/sec Loss 3.9661 LearningRate 0.0413 Epoch: 12 Global Step: 134290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:43,688-Speed 5465.55 samples/sec Loss 3.9457 LearningRate 0.0413 Epoch: 12 Global Step: 134300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:17:51,205-Speed 5450.12 samples/sec Loss 3.9403 LearningRate 0.0413 Epoch: 12 Global Step: 134310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:17:58,688-Speed 5474.44 samples/sec Loss 3.8955 LearningRate 0.0413 Epoch: 12 Global Step: 134320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:06,209-Speed 5446.51 samples/sec Loss 3.9197 LearningRate 0.0412 Epoch: 12 Global Step: 134330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:13,715-Speed 5457.43 samples/sec Loss 3.9112 LearningRate 0.0412 Epoch: 12 Global Step: 134340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:21,304-Speed 5398.45 samples/sec Loss 3.9280 LearningRate 0.0412 Epoch: 12 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:29,002-Speed 5321.14 samples/sec Loss 3.9366 LearningRate 0.0412 Epoch: 12 Global Step: 134360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:36,511-Speed 5455.46 samples/sec Loss 3.9335 LearningRate 0.0412 Epoch: 12 
Global Step: 134370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:44,052-Speed 5432.98 samples/sec Loss 3.9321 LearningRate 0.0412 Epoch: 12 Global Step: 134380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:51,558-Speed 5457.88 samples/sec Loss 3.9530 LearningRate 0.0412 Epoch: 12 Global Step: 134390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:18:59,118-Speed 5418.68 samples/sec Loss 3.9322 LearningRate 0.0412 Epoch: 12 Global Step: 134400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:06,581-Speed 5488.78 samples/sec Loss 3.9628 LearningRate 0.0412 Epoch: 12 Global Step: 134410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:14,099-Speed 5449.33 samples/sec Loss 3.9622 LearningRate 0.0411 Epoch: 12 Global Step: 134420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:21,572-Speed 5481.91 samples/sec Loss 3.9605 LearningRate 0.0411 Epoch: 12 Global Step: 134430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:29,040-Speed 5485.22 samples/sec Loss 3.9102 LearningRate 0.0411 Epoch: 12 Global Step: 134440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:36,554-Speed 5451.45 samples/sec Loss 3.9641 LearningRate 0.0411 Epoch: 12 Global Step: 134450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:44,094-Speed 5433.25 samples/sec Loss 3.9326 LearningRate 0.0411 Epoch: 12 Global Step: 134460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:51,628-Speed 5438.17 samples/sec Loss 3.9273 LearningRate 0.0411 Epoch: 12 Global Step: 134470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:19:59,207-Speed 5404.66 samples/sec Loss 3.9314 LearningRate 0.0411 Epoch: 12 Global Step: 134480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:06,660-Speed 5496.32 samples/sec Loss 3.9346 LearningRate 0.0411 Epoch: 12 Global Step: 134490 Fp16 Grad 
Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:14,202-Speed 5431.73 samples/sec Loss 3.9020 LearningRate 0.0411 Epoch: 12 Global Step: 134500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:21,846-Speed 5359.87 samples/sec Loss 3.8788 LearningRate 0.0410 Epoch: 12 Global Step: 134510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 01:20:29,424-Speed 5405.75 samples/sec Loss 3.9431 LearningRate 0.0410 Epoch: 12 Global Step: 134520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:36,944-Speed 5447.13 samples/sec Loss 3.9140 LearningRate 0.0410 Epoch: 12 Global Step: 134530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:44,458-Speed 5452.39 samples/sec Loss 3.9303 LearningRate 0.0410 Epoch: 12 Global Step: 134540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:52,056-Speed 5391.48 samples/sec Loss 3.9377 LearningRate 0.0410 Epoch: 12 Global Step: 134550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:20:59,613-Speed 5421.41 samples/sec Loss 3.8965 LearningRate 0.0410 Epoch: 12 Global Step: 134560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:21:07,153-Speed 5432.53 samples/sec Loss 3.9474 LearningRate 0.0410 Epoch: 12 Global Step: 134570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:21:14,622-Speed 5485.26 samples/sec Loss 3.9161 LearningRate 0.0410 Epoch: 12 Global Step: 134580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:21:22,158-Speed 5436.33 samples/sec Loss 3.9308 LearningRate 0.0410 Epoch: 12 Global Step: 134590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:21:29,763-Speed 5386.24 samples/sec Loss 3.9070 LearningRate 0.0409 Epoch: 12 Global Step: 134600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:21:37,351-Speed 5398.40 samples/sec Loss 3.9138 LearningRate 0.0409 Epoch: 12 Global Step: 134610 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-01-09 01:21:44,883-Speed 5439.68 samples/sec Loss 3.8699 LearningRate 0.0409 Epoch: 12 Global Step: 134620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 01:21:52,363-Speed 5476.09 samples/sec Loss 3.9345 LearningRate 0.0409 Epoch: 12 Global Step: 134630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:21:59,963-Speed 5390.28 samples/sec Loss 3.9178 LearningRate 0.0409 Epoch: 12 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:22:07,459-Speed 5464.89 samples/sec Loss 3.9045 LearningRate 0.0409 Epoch: 12 Global Step: 134650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:22:15,048-Speed 5398.17 samples/sec Loss 3.9256 LearningRate 0.0409 Epoch: 12 Global Step: 134660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:22:22,560-Speed 5453.53 samples/sec Loss 3.9077 LearningRate 0.0409 Epoch: 12 Global Step: 134670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:22:30,187-Speed 5370.97 samples/sec Loss 3.9145 LearningRate 0.0409 Epoch: 12 Global Step: 134680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:22:37,928-Speed 5291.77 samples/sec Loss 3.9366 LearningRate 0.0408 Epoch: 12 Global Step: 134690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:22:45,477-Speed 5426.54 samples/sec Loss 3.9407 LearningRate 0.0408 Epoch: 12 Global Step: 134700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:22:52,992-Speed 5451.45 samples/sec Loss 3.9004 LearningRate 0.0408 Epoch: 12 Global Step: 134710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:00,588-Speed 5392.86 samples/sec Loss 3.8958 LearningRate 0.0408 Epoch: 12 Global Step: 134720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:08,239-Speed 5353.98 samples/sec Loss 3.9070 LearningRate 0.0408 Epoch: 12 Global Step: 134730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 
01:23:15,772-Speed 5438.20 samples/sec Loss 3.9342 LearningRate 0.0408 Epoch: 12 Global Step: 134740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:23,236-Speed 5488.13 samples/sec Loss 3.9018 LearningRate 0.0408 Epoch: 12 Global Step: 134750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:30,953-Speed 5308.92 samples/sec Loss 3.8712 LearningRate 0.0408 Epoch: 12 Global Step: 134760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:38,534-Speed 5403.70 samples/sec Loss 3.9306 LearningRate 0.0408 Epoch: 12 Global Step: 134770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:46,138-Speed 5386.68 samples/sec Loss 3.9109 LearningRate 0.0407 Epoch: 12 Global Step: 134780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:23:53,619-Speed 5476.21 samples/sec Loss 3.9256 LearningRate 0.0407 Epoch: 12 Global Step: 134790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:24:01,115-Speed 5465.22 samples/sec Loss 3.9147 LearningRate 0.0407 Epoch: 12 Global Step: 134800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:24:24,850-Speed 1725.80 samples/sec Loss 3.8855 LearningRate 0.0407 Epoch: 13 Global Step: 134810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:24:32,349-Speed 5462.51 samples/sec Loss 3.9445 LearningRate 0.0407 Epoch: 13 Global Step: 134820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:24:39,827-Speed 5478.53 samples/sec Loss 3.9019 LearningRate 0.0407 Epoch: 13 Global Step: 134830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:24:47,348-Speed 5446.78 samples/sec Loss 3.9364 LearningRate 0.0407 Epoch: 13 Global Step: 134840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:24:54,839-Speed 5468.40 samples/sec Loss 3.9310 LearningRate 0.0407 Epoch: 13 Global Step: 134850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:25:02,350-Speed 5454.39 
samples/sec Loss 3.8764 LearningRate 0.0406 Epoch: 13 Global Step: 134860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:25:09,974-Speed 5373.07 samples/sec Loss 3.9357 LearningRate 0.0406 Epoch: 13 Global Step: 134870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:25:17,446-Speed 5482.47 samples/sec Loss 3.9385 LearningRate 0.0406 Epoch: 13 Global Step: 134880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:25:24,865-Speed 5521.06 samples/sec Loss 3.9028 LearningRate 0.0406 Epoch: 13 Global Step: 134890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:25:32,286-Speed 5520.77 samples/sec Loss 3.8888 LearningRate 0.0406 Epoch: 13 Global Step: 134900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:25:39,725-Speed 5507.00 samples/sec Loss 3.8591 LearningRate 0.0406 Epoch: 13 Global Step: 134910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:25:47,195-Speed 5483.86 samples/sec Loss 3.9079 LearningRate 0.0406 Epoch: 13 Global Step: 134920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:25:54,674-Speed 5476.74 samples/sec Loss 3.8940 LearningRate 0.0406 Epoch: 13 Global Step: 134930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:02,172-Speed 5463.62 samples/sec Loss 3.9098 LearningRate 0.0406 Epoch: 13 Global Step: 134940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:09,641-Speed 5485.38 samples/sec Loss 3.8776 LearningRate 0.0405 Epoch: 13 Global Step: 134950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:17,298-Speed 5349.84 samples/sec Loss 3.8686 LearningRate 0.0405 Epoch: 13 Global Step: 134960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:25,128-Speed 5231.62 samples/sec Loss 3.8720 LearningRate 0.0405 Epoch: 13 Global Step: 134970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:32,784-Speed 5350.84 samples/sec Loss 3.8711 
LearningRate 0.0405 Epoch: 13 Global Step: 134980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:40,473-Speed 5328.01 samples/sec Loss 3.8262 LearningRate 0.0405 Epoch: 13 Global Step: 134990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:48,143-Speed 5341.01 samples/sec Loss 3.9301 LearningRate 0.0405 Epoch: 13 Global Step: 135000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:26:55,842-Speed 5320.15 samples/sec Loss 3.8230 LearningRate 0.0405 Epoch: 13 Global Step: 135010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:03,478-Speed 5365.07 samples/sec Loss 3.8702 LearningRate 0.0405 Epoch: 13 Global Step: 135020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:11,140-Speed 5346.64 samples/sec Loss 3.8684 LearningRate 0.0405 Epoch: 13 Global Step: 135030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:18,709-Speed 5412.13 samples/sec Loss 3.8979 LearningRate 0.0404 Epoch: 13 Global Step: 135040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:26,236-Speed 5442.39 samples/sec Loss 3.9180 LearningRate 0.0404 Epoch: 13 Global Step: 135050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:33,773-Speed 5435.32 samples/sec Loss 3.8830 LearningRate 0.0404 Epoch: 13 Global Step: 135060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:41,415-Speed 5360.38 samples/sec Loss 3.9087 LearningRate 0.0404 Epoch: 13 Global Step: 135070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:48,999-Speed 5401.56 samples/sec Loss 3.8990 LearningRate 0.0404 Epoch: 13 Global Step: 135080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:27:56,611-Speed 5381.53 samples/sec Loss 3.8639 LearningRate 0.0404 Epoch: 13 Global Step: 135090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-09 01:28:04,157-Speed 5429.07 samples/sec Loss 3.8977 LearningRate 0.0404 Epoch: 13 
Global Step: 135100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:11,793-Speed 5364.96 samples/sec Loss 3.8681 LearningRate 0.0404 Epoch: 13 Global Step: 135110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:19,325-Speed 5438.77 samples/sec Loss 3.8842 LearningRate 0.0404 Epoch: 13 Global Step: 135120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:26,962-Speed 5363.65 samples/sec Loss 3.8706 LearningRate 0.0403 Epoch: 13 Global Step: 135130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:34,517-Speed 5422.75 samples/sec Loss 3.8466 LearningRate 0.0403 Epoch: 13 Global Step: 135140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:42,022-Speed 5458.37 samples/sec Loss 3.8460 LearningRate 0.0403 Epoch: 13 Global Step: 135150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:49,526-Speed 5459.16 samples/sec Loss 3.8842 LearningRate 0.0403 Epoch: 13 Global Step: 135160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:28:57,097-Speed 5410.70 samples/sec Loss 3.8718 LearningRate 0.0403 Epoch: 13 Global Step: 135170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:04,636-Speed 5433.96 samples/sec Loss 3.8034 LearningRate 0.0403 Epoch: 13 Global Step: 135180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:12,214-Speed 5405.58 samples/sec Loss 3.8668 LearningRate 0.0403 Epoch: 13 Global Step: 135190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:19,754-Speed 5432.88 samples/sec Loss 3.9135 LearningRate 0.0403 Epoch: 13 Global Step: 135200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:27,259-Speed 5458.41 samples/sec Loss 3.8740 LearningRate 0.0403 Epoch: 13 Global Step: 135210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:34,755-Speed 5464.96 samples/sec Loss 3.8986 LearningRate 0.0402 Epoch: 13 Global Step: 135220 Fp16 Grad 
Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:42,238-Speed 5474.27 samples/sec Loss 3.8588 LearningRate 0.0402 Epoch: 13 Global Step: 135230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:49,676-Speed 5507.86 samples/sec Loss 3.8966 LearningRate 0.0402 Epoch: 13 Global Step: 135240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:29:57,115-Speed 5507.02 samples/sec Loss 3.8526 LearningRate 0.0402 Epoch: 13 Global Step: 135250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:30:04,609-Speed 5466.40 samples/sec Loss 3.8800 LearningRate 0.0402 Epoch: 13 Global Step: 135260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:30:12,115-Speed 5458.01 samples/sec Loss 3.8810 LearningRate 0.0402 Epoch: 13 Global Step: 135270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:30:19,556-Speed 5505.48 samples/sec Loss 3.8713 LearningRate 0.0402 Epoch: 13 Global Step: 135280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:30:27,147-Speed 5396.23 samples/sec Loss 3.8723 LearningRate 0.0402 Epoch: 13 Global Step: 135290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:30:34,626-Speed 5477.54 samples/sec Loss 3.8629 LearningRate 0.0402 Epoch: 13 Global Step: 135300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 01:30:42,134-Speed 5456.20 samples/sec Loss 3.8707 LearningRate 0.0401 Epoch: 13 Global Step: 135310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 01:30:49,735-Speed 5389.19 samples/sec Loss 3.8665 LearningRate 0.0401 Epoch: 13 Global Step: 135320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-09 01:30:57,239-Speed 5459.44 samples/sec Loss 3.8359 LearningRate 0.0401 Epoch: 13 Global Step: 135330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-09 01:31:04,820-Speed 5403.71 samples/sec Loss 3.9137 LearningRate 0.0401 Epoch: 13 Global Step: 135340 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-01-09 01:31:12,347-Speed 5442.22 samples/sec Loss 3.8651 LearningRate 0.0401 Epoch: 13 Global Step: 135350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:31:19,825-Speed 5478.11 samples/sec Loss 3.8723 LearningRate 0.0401 Epoch: 13 Global Step: 135360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:31:27,305-Speed 5476.98 samples/sec Loss 3.8593 LearningRate 0.0401 Epoch: 13 Global Step: 135370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:31:34,868-Speed 5416.75 samples/sec Loss 3.8656 LearningRate 0.0401 Epoch: 13 Global Step: 135380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:31:42,379-Speed 5453.96 samples/sec Loss 3.8969 LearningRate 0.0401 Epoch: 13 Global Step: 135390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:31:49,892-Speed 5452.47 samples/sec Loss 3.8451 LearningRate 0.0400 Epoch: 13 Global Step: 135400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:31:57,364-Speed 5482.52 samples/sec Loss 3.8287 LearningRate 0.0400 Epoch: 13 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:32:04,817-Speed 5496.55 samples/sec Loss 3.8481 LearningRate 0.0400 Epoch: 13 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:32:12,378-Speed 5417.73 samples/sec Loss 3.8669 LearningRate 0.0400 Epoch: 13 Global Step: 135430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:32:19,898-Speed 5447.94 samples/sec Loss 3.8206 LearningRate 0.0400 Epoch: 13 Global Step: 135440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:32:27,472-Speed 5408.72 samples/sec Loss 3.8682 LearningRate 0.0400 Epoch: 13 Global Step: 135450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:32:34,950-Speed 5477.70 samples/sec Loss 3.9144 LearningRate 0.0400 Epoch: 13 Global Step: 135460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 
01:32:42,447-Speed 5464.30 samples/sec Loss 3.8984 LearningRate 0.0400 Epoch: 13 Global Step: 135470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:32:49,919-Speed 5482.11 samples/sec Loss 3.8964 LearningRate 0.0400 Epoch: 13 Global Step: 135480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:32:57,438-Speed 5448.26 samples/sec Loss 3.9086 LearningRate 0.0399 Epoch: 13 Global Step: 135490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:33:04,967-Speed 5441.63 samples/sec Loss 3.8410 LearningRate 0.0399 Epoch: 13 Global Step: 135500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:33:12,579-Speed 5380.84 samples/sec Loss 3.8912 LearningRate 0.0399 Epoch: 13 Global Step: 135510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:33:20,082-Speed 5460.81 samples/sec Loss 3.8815 LearningRate 0.0399 Epoch: 13 Global Step: 135520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:33:27,691-Speed 5383.45 samples/sec Loss 3.8732 LearningRate 0.0399 Epoch: 13 Global Step: 135530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:33:35,234-Speed 5430.92 samples/sec Loss 3.8711 LearningRate 0.0399 Epoch: 13 Global Step: 135540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:33:42,687-Speed 5496.29 samples/sec Loss 3.8904 LearningRate 0.0399 Epoch: 13 Global Step: 135550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:33:50,147-Speed 5491.55 samples/sec Loss 3.9092 LearningRate 0.0399 Epoch: 13 Global Step: 135560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:33:57,632-Speed 5472.67 samples/sec Loss 3.8242 LearningRate 0.0399 Epoch: 13 Global Step: 135570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:34:05,062-Speed 5513.36 samples/sec Loss 3.9079 LearningRate 0.0398 Epoch: 13 Global Step: 135580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:12,593-Speed 5439.31 
samples/sec Loss 3.8597 LearningRate 0.0398 Epoch: 13 Global Step: 135590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:20,074-Speed 5476.49 samples/sec Loss 3.8384 LearningRate 0.0398 Epoch: 13 Global Step: 135600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:27,606-Speed 5438.96 samples/sec Loss 3.8431 LearningRate 0.0398 Epoch: 13 Global Step: 135610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:35,094-Speed 5470.52 samples/sec Loss 3.8494 LearningRate 0.0398 Epoch: 13 Global Step: 135620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:42,566-Speed 5482.31 samples/sec Loss 3.8704 LearningRate 0.0398 Epoch: 13 Global Step: 135630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:50,099-Speed 5438.23 samples/sec Loss 3.8507 LearningRate 0.0398 Epoch: 13 Global Step: 135640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:34:57,642-Speed 5430.97 samples/sec Loss 3.8573 LearningRate 0.0398 Epoch: 13 Global Step: 135650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:05,088-Speed 5501.52 samples/sec Loss 3.8275 LearningRate 0.0398 Epoch: 13 Global Step: 135660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:12,579-Speed 5468.69 samples/sec Loss 3.8845 LearningRate 0.0397 Epoch: 13 Global Step: 135670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:20,176-Speed 5392.41 samples/sec Loss 3.8496 LearningRate 0.0397 Epoch: 13 Global Step: 135680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:35:27,676-Speed 5462.20 samples/sec Loss 3.8281 LearningRate 0.0397 Epoch: 13 Global Step: 135690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:35,110-Speed 5510.27 samples/sec Loss 3.8325 LearningRate 0.0397 Epoch: 13 Global Step: 135700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:42,561-Speed 5498.26 samples/sec Loss 3.8903 
LearningRate 0.0397 Epoch: 13 Global Step: 135710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:50,018-Speed 5493.31 samples/sec Loss 3.8216 LearningRate 0.0397 Epoch: 13 Global Step: 135720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:35:57,479-Speed 5491.18 samples/sec Loss 3.8656 LearningRate 0.0397 Epoch: 13 Global Step: 135730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:36:04,916-Speed 5508.07 samples/sec Loss 3.8864 LearningRate 0.0397 Epoch: 13 Global Step: 135740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:36:12,467-Speed 5425.01 samples/sec Loss 3.8672 LearningRate 0.0397 Epoch: 13 Global Step: 135750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:36:20,142-Speed 5337.26 samples/sec Loss 3.8841 LearningRate 0.0396 Epoch: 13 Global Step: 135760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:36:27,726-Speed 5402.18 samples/sec Loss 3.8893 LearningRate 0.0396 Epoch: 13 Global Step: 135770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:36:35,256-Speed 5439.93 samples/sec Loss 3.8515 LearningRate 0.0396 Epoch: 13 Global Step: 135780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:36:42,896-Speed 5362.39 samples/sec Loss 3.8473 LearningRate 0.0396 Epoch: 13 Global Step: 135790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:36:50,597-Speed 5319.06 samples/sec Loss 3.8377 LearningRate 0.0396 Epoch: 13 Global Step: 135800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:36:58,160-Speed 5417.28 samples/sec Loss 3.8452 LearningRate 0.0396 Epoch: 13 Global Step: 135810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:05,766-Speed 5385.55 samples/sec Loss 3.8933 LearningRate 0.0396 Epoch: 13 Global Step: 135820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:13,469-Speed 5318.41 samples/sec Loss 3.8921 LearningRate 0.0396 Epoch: 13 
Global Step: 135830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:21,214-Speed 5289.13 samples/sec Loss 3.8571 LearningRate 0.0396 Epoch: 13 Global Step: 135840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:28,760-Speed 5428.59 samples/sec Loss 3.8183 LearningRate 0.0395 Epoch: 13 Global Step: 135850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:36,306-Speed 5428.71 samples/sec Loss 3.8245 LearningRate 0.0395 Epoch: 13 Global Step: 135860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:43,928-Speed 5374.50 samples/sec Loss 3.8869 LearningRate 0.0395 Epoch: 13 Global Step: 135870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:37:51,548-Speed 5375.92 samples/sec Loss 3.8655 LearningRate 0.0395 Epoch: 13 Global Step: 135880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:37:59,144-Speed 5393.23 samples/sec Loss 3.8585 LearningRate 0.0395 Epoch: 13 Global Step: 135890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:06,686-Speed 5431.75 samples/sec Loss 3.8469 LearningRate 0.0395 Epoch: 13 Global Step: 135900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:14,226-Speed 5432.94 samples/sec Loss 3.8315 LearningRate 0.0395 Epoch: 13 Global Step: 135910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:21,759-Speed 5438.09 samples/sec Loss 3.8875 LearningRate 0.0395 Epoch: 13 Global Step: 135920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:29,339-Speed 5404.89 samples/sec Loss 3.8061 LearningRate 0.0395 Epoch: 13 Global Step: 135930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:36,957-Speed 5377.12 samples/sec Loss 3.8975 LearningRate 0.0394 Epoch: 13 Global Step: 135940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:44,543-Speed 5399.90 samples/sec Loss 3.8129 LearningRate 0.0394 Epoch: 13 Global Step: 135950 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:52,192-Speed 5355.14 samples/sec Loss 3.9036 LearningRate 0.0394 Epoch: 13 Global Step: 135960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:38:59,715-Speed 5446.04 samples/sec Loss 3.8247 LearningRate 0.0394 Epoch: 13 Global Step: 135970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:39:07,274-Speed 5419.46 samples/sec Loss 3.8042 LearningRate 0.0394 Epoch: 13 Global Step: 135980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:39:14,778-Speed 5458.78 samples/sec Loss 3.8883 LearningRate 0.0394 Epoch: 13 Global Step: 135990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:39:22,247-Speed 5484.61 samples/sec Loss 3.8274 LearningRate 0.0394 Epoch: 13 Global Step: 136000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:40:06,309-[lfw][136000]XNorm: 22.902075 Training: 2022-01-09 01:40:06,310-[lfw][136000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-09 01:40:06,311-[lfw][136000]Accuracy-Highest: 0.99817 Training: 2022-01-09 01:40:57,383-[cfp_fp][136000]XNorm: 21.327764 Training: 2022-01-09 01:40:57,384-[cfp_fp][136000]Accuracy-Flip: 0.99029+-0.00455 Training: 2022-01-09 01:40:57,384-[cfp_fp][136000]Accuracy-Highest: 0.99157 Training: 2022-01-09 01:41:41,288-[agedb_30][136000]XNorm: 23.001569 Training: 2022-01-09 01:41:41,289-[agedb_30][136000]Accuracy-Flip: 0.98033+-0.00839 Training: 2022-01-09 01:41:41,289-[agedb_30][136000]Accuracy-Highest: 0.98067 Training: 2022-01-09 01:41:48,868-Speed 279.36 samples/sec Loss 3.8001 LearningRate 0.0394 Epoch: 13 Global Step: 136010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:41:56,400-Speed 5439.19 samples/sec Loss 3.8782 LearningRate 0.0394 Epoch: 13 Global Step: 136020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:42:03,927-Speed 5442.88 samples/sec Loss 3.7791 LearningRate 0.0393 Epoch: 13 Global Step: 136030 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-01-09 01:42:11,442-Speed 5450.28 samples/sec Loss 3.8024 LearningRate 0.0393 Epoch: 13 Global Step: 136040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:42:18,905-Speed 5489.11 samples/sec Loss 3.8686 LearningRate 0.0393 Epoch: 13 Global Step: 136050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:42:26,391-Speed 5472.75 samples/sec Loss 3.8608 LearningRate 0.0393 Epoch: 13 Global Step: 136060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:42:33,918-Speed 5441.98 samples/sec Loss 3.8366 LearningRate 0.0393 Epoch: 13 Global Step: 136070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:42:41,439-Speed 5447.38 samples/sec Loss 3.8411 LearningRate 0.0393 Epoch: 13 Global Step: 136080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:42:48,960-Speed 5446.47 samples/sec Loss 3.8720 LearningRate 0.0393 Epoch: 13 Global Step: 136090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:42:56,596-Speed 5365.08 samples/sec Loss 3.8593 LearningRate 0.0393 Epoch: 13 Global Step: 136100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:04,179-Speed 5401.76 samples/sec Loss 3.8350 LearningRate 0.0393 Epoch: 13 Global Step: 136110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:11,733-Speed 5423.17 samples/sec Loss 3.8244 LearningRate 0.0392 Epoch: 13 Global Step: 136120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:19,207-Speed 5480.55 samples/sec Loss 3.8324 LearningRate 0.0392 Epoch: 13 Global Step: 136130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:26,679-Speed 5483.25 samples/sec Loss 3.8185 LearningRate 0.0392 Epoch: 13 Global Step: 136140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:34,163-Speed 5473.14 samples/sec Loss 3.9016 LearningRate 0.0392 Epoch: 13 Global Step: 136150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-01-09 01:43:41,673-Speed 5455.32 samples/sec Loss 3.8679 LearningRate 0.0392 Epoch: 13 Global Step: 136160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:49,228-Speed 5422.11 samples/sec Loss 3.8072 LearningRate 0.0392 Epoch: 13 Global Step: 136170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:43:56,768-Speed 5432.92 samples/sec Loss 3.7863 LearningRate 0.0392 Epoch: 13 Global Step: 136180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:04,308-Speed 5433.53 samples/sec Loss 3.7997 LearningRate 0.0392 Epoch: 13 Global Step: 136190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:44:11,873-Speed 5414.34 samples/sec Loss 3.7990 LearningRate 0.0392 Epoch: 13 Global Step: 136200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:19,451-Speed 5406.06 samples/sec Loss 3.8798 LearningRate 0.0392 Epoch: 13 Global Step: 136210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:26,967-Speed 5450.71 samples/sec Loss 3.8298 LearningRate 0.0391 Epoch: 13 Global Step: 136220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:34,414-Speed 5500.85 samples/sec Loss 3.8189 LearningRate 0.0391 Epoch: 13 Global Step: 136230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:41,971-Speed 5420.48 samples/sec Loss 3.8435 LearningRate 0.0391 Epoch: 13 Global Step: 136240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:49,489-Speed 5449.21 samples/sec Loss 3.8322 LearningRate 0.0391 Epoch: 13 Global Step: 136250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:44:57,081-Speed 5396.47 samples/sec Loss 3.7697 LearningRate 0.0391 Epoch: 13 Global Step: 136260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:04,699-Speed 5377.30 samples/sec Loss 3.8239 LearningRate 0.0391 Epoch: 13 Global Step: 136270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:12,200-Speed 
5461.03 samples/sec Loss 3.8464 LearningRate 0.0391 Epoch: 13 Global Step: 136280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:19,746-Speed 5428.71 samples/sec Loss 3.8626 LearningRate 0.0391 Epoch: 13 Global Step: 136290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:27,311-Speed 5415.87 samples/sec Loss 3.7912 LearningRate 0.0391 Epoch: 13 Global Step: 136300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:45:34,753-Speed 5504.22 samples/sec Loss 3.8404 LearningRate 0.0390 Epoch: 13 Global Step: 136310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:42,295-Speed 5431.51 samples/sec Loss 3.8562 LearningRate 0.0390 Epoch: 13 Global Step: 136320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:49,824-Speed 5441.44 samples/sec Loss 3.8084 LearningRate 0.0390 Epoch: 13 Global Step: 136330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:45:57,325-Speed 5461.01 samples/sec Loss 3.7875 LearningRate 0.0390 Epoch: 13 Global Step: 136340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:05,030-Speed 5317.42 samples/sec Loss 3.7794 LearningRate 0.0390 Epoch: 13 Global Step: 136350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:12,580-Speed 5425.71 samples/sec Loss 3.8117 LearningRate 0.0390 Epoch: 13 Global Step: 136360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:20,112-Speed 5438.96 samples/sec Loss 3.8303 LearningRate 0.0390 Epoch: 13 Global Step: 136370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:27,825-Speed 5311.35 samples/sec Loss 3.8259 LearningRate 0.0390 Epoch: 13 Global Step: 136380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:35,446-Speed 5374.84 samples/sec Loss 3.8229 LearningRate 0.0390 Epoch: 13 Global Step: 136390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:43,017-Speed 5410.68 samples/sec Loss 
3.7763 LearningRate 0.0389 Epoch: 13 Global Step: 136400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:46:50,561-Speed 5430.22 samples/sec Loss 3.7874 LearningRate 0.0389 Epoch: 13 Global Step: 136410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:46:58,158-Speed 5392.30 samples/sec Loss 3.7955 LearningRate 0.0389 Epoch: 13 Global Step: 136420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:05,658-Speed 5462.44 samples/sec Loss 3.8067 LearningRate 0.0389 Epoch: 13 Global Step: 136430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:13,217-Speed 5418.80 samples/sec Loss 3.8069 LearningRate 0.0389 Epoch: 13 Global Step: 136440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:20,742-Speed 5444.18 samples/sec Loss 3.8331 LearningRate 0.0389 Epoch: 13 Global Step: 136450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:28,303-Speed 5417.71 samples/sec Loss 3.8163 LearningRate 0.0389 Epoch: 13 Global Step: 136460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:35,821-Speed 5448.99 samples/sec Loss 3.8264 LearningRate 0.0389 Epoch: 13 Global Step: 136470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:43,343-Speed 5446.14 samples/sec Loss 3.7909 LearningRate 0.0389 Epoch: 13 Global Step: 136480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:50,946-Speed 5388.06 samples/sec Loss 3.7887 LearningRate 0.0388 Epoch: 13 Global Step: 136490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:47:58,560-Speed 5380.05 samples/sec Loss 3.7912 LearningRate 0.0388 Epoch: 13 Global Step: 136500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:06,086-Speed 5443.45 samples/sec Loss 3.8010 LearningRate 0.0388 Epoch: 13 Global Step: 136510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:13,628-Speed 5431.42 samples/sec Loss 3.8236 LearningRate 0.0388 
Epoch: 13 Global Step: 136520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:48:21,176-Speed 5427.54 samples/sec Loss 3.8344 LearningRate 0.0388 Epoch: 13 Global Step: 136530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:28,758-Speed 5402.68 samples/sec Loss 3.8310 LearningRate 0.0388 Epoch: 13 Global Step: 136540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:36,301-Speed 5431.45 samples/sec Loss 3.8233 LearningRate 0.0388 Epoch: 13 Global Step: 136550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:43,876-Speed 5407.75 samples/sec Loss 3.8213 LearningRate 0.0388 Epoch: 13 Global Step: 136560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:51,428-Speed 5424.59 samples/sec Loss 3.8000 LearningRate 0.0388 Epoch: 13 Global Step: 136570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:48:58,944-Speed 5449.73 samples/sec Loss 3.7620 LearningRate 0.0387 Epoch: 13 Global Step: 136580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:49:06,493-Speed 5427.37 samples/sec Loss 3.8281 LearningRate 0.0387 Epoch: 13 Global Step: 136590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:49:14,029-Speed 5435.50 samples/sec Loss 3.7824 LearningRate 0.0387 Epoch: 13 Global Step: 136600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:49:21,501-Speed 5482.42 samples/sec Loss 3.8093 LearningRate 0.0387 Epoch: 13 Global Step: 136610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:49:29,064-Speed 5416.36 samples/sec Loss 3.8308 LearningRate 0.0387 Epoch: 13 Global Step: 136620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:49:36,566-Speed 5460.98 samples/sec Loss 3.7947 LearningRate 0.0387 Epoch: 13 Global Step: 136630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:49:44,099-Speed 5437.64 samples/sec Loss 3.7933 LearningRate 0.0387 Epoch: 13 Global Step: 136640 
Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:49:51,648-Speed 5427.11 samples/sec Loss 3.7504 LearningRate 0.0387 Epoch: 13 Global Step: 136650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:49:59,204-Speed 5421.25 samples/sec Loss 3.7847 LearningRate 0.0387 Epoch: 13 Global Step: 136660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:06,714-Speed 5454.79 samples/sec Loss 3.8403 LearningRate 0.0386 Epoch: 13 Global Step: 136670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:14,251-Speed 5434.80 samples/sec Loss 3.7889 LearningRate 0.0386 Epoch: 13 Global Step: 136680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:21,820-Speed 5412.82 samples/sec Loss 3.8198 LearningRate 0.0386 Epoch: 13 Global Step: 136690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:29,423-Speed 5387.31 samples/sec Loss 3.8202 LearningRate 0.0386 Epoch: 13 Global Step: 136700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:36,955-Speed 5440.95 samples/sec Loss 3.8433 LearningRate 0.0386 Epoch: 13 Global Step: 136710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:44,670-Speed 5309.50 samples/sec Loss 3.7806 LearningRate 0.0386 Epoch: 13 Global Step: 136720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:50:52,323-Speed 5352.70 samples/sec Loss 3.8234 LearningRate 0.0386 Epoch: 13 Global Step: 136730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:50:59,888-Speed 5414.71 samples/sec Loss 3.7830 LearningRate 0.0386 Epoch: 13 Global Step: 136740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:51:07,433-Speed 5430.06 samples/sec Loss 3.7992 LearningRate 0.0386 Epoch: 13 Global Step: 136750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:51:14,989-Speed 5420.96 samples/sec Loss 3.7990 LearningRate 0.0385 Epoch: 13 Global Step: 136760 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-01-09 01:51:22,451-Speed 5490.29 samples/sec Loss 3.7776 LearningRate 0.0385 Epoch: 13 Global Step: 136770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:51:30,102-Speed 5353.54 samples/sec Loss 3.8365 LearningRate 0.0385 Epoch: 13 Global Step: 136780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:51:37,735-Speed 5367.15 samples/sec Loss 3.8325 LearningRate 0.0385 Epoch: 13 Global Step: 136790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:51:45,414-Speed 5334.91 samples/sec Loss 3.7528 LearningRate 0.0385 Epoch: 13 Global Step: 136800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:51:53,019-Speed 5387.00 samples/sec Loss 3.8177 LearningRate 0.0385 Epoch: 13 Global Step: 136810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:52:00,562-Speed 5430.22 samples/sec Loss 3.8049 LearningRate 0.0385 Epoch: 13 Global Step: 136820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:52:08,068-Speed 5457.67 samples/sec Loss 3.8293 LearningRate 0.0385 Epoch: 13 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:52:15,662-Speed 5394.74 samples/sec Loss 3.7654 LearningRate 0.0385 Epoch: 13 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:52:23,186-Speed 5444.17 samples/sec Loss 3.7501 LearningRate 0.0384 Epoch: 13 Global Step: 136850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:52:30,702-Speed 5450.62 samples/sec Loss 3.8242 LearningRate 0.0384 Epoch: 13 Global Step: 136860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:52:38,233-Speed 5439.35 samples/sec Loss 3.7520 LearningRate 0.0384 Epoch: 13 Global Step: 136870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:52:45,822-Speed 5397.66 samples/sec Loss 3.8127 LearningRate 0.0384 Epoch: 13 Global Step: 136880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-01-09 01:52:53,551-Speed 5300.57 samples/sec Loss 3.8179 LearningRate 0.0384 Epoch: 13 Global Step: 136890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:01,093-Speed 5431.28 samples/sec Loss 3.7895 LearningRate 0.0384 Epoch: 13 Global Step: 136900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:08,692-Speed 5391.07 samples/sec Loss 3.8253 LearningRate 0.0384 Epoch: 13 Global Step: 136910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:16,190-Speed 5463.79 samples/sec Loss 3.7737 LearningRate 0.0384 Epoch: 13 Global Step: 136920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:23,741-Speed 5425.07 samples/sec Loss 3.7704 LearningRate 0.0384 Epoch: 13 Global Step: 136930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:31,262-Speed 5446.38 samples/sec Loss 3.8242 LearningRate 0.0384 Epoch: 13 Global Step: 136940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:38,793-Speed 5439.71 samples/sec Loss 3.7498 LearningRate 0.0383 Epoch: 13 Global Step: 136950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:46,347-Speed 5422.94 samples/sec Loss 3.7327 LearningRate 0.0383 Epoch: 13 Global Step: 136960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:53:53,924-Speed 5406.45 samples/sec Loss 3.7613 LearningRate 0.0383 Epoch: 13 Global Step: 136970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 01:54:01,522-Speed 5391.17 samples/sec Loss 3.8032 LearningRate 0.0383 Epoch: 13 Global Step: 136980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:54:09,090-Speed 5413.09 samples/sec Loss 3.7838 LearningRate 0.0383 Epoch: 13 Global Step: 136990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:54:16,681-Speed 5397.13 samples/sec Loss 3.7543 LearningRate 0.0383 Epoch: 13 Global Step: 137000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:54:24,324-Speed 
5359.28 samples/sec Loss 3.7634 LearningRate 0.0383 Epoch: 13 Global Step: 137010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:54:31,887-Speed 5416.81 samples/sec Loss 3.7730 LearningRate 0.0383 Epoch: 13 Global Step: 137020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:54:39,345-Speed 5492.29 samples/sec Loss 3.8405 LearningRate 0.0383 Epoch: 13 Global Step: 137030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:54:46,908-Speed 5416.63 samples/sec Loss 3.7769 LearningRate 0.0382 Epoch: 13 Global Step: 137040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:54:54,512-Speed 5388.65 samples/sec Loss 3.8275 LearningRate 0.0382 Epoch: 13 Global Step: 137050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:02,058-Speed 5428.28 samples/sec Loss 3.7618 LearningRate 0.0382 Epoch: 13 Global Step: 137060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:09,590-Speed 5438.57 samples/sec Loss 3.8272 LearningRate 0.0382 Epoch: 13 Global Step: 137070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:17,118-Speed 5442.10 samples/sec Loss 3.7763 LearningRate 0.0382 Epoch: 13 Global Step: 137080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:24,656-Speed 5434.77 samples/sec Loss 3.7953 LearningRate 0.0382 Epoch: 13 Global Step: 137090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:32,263-Speed 5384.67 samples/sec Loss 3.8016 LearningRate 0.0382 Epoch: 13 Global Step: 137100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:39,765-Speed 5460.64 samples/sec Loss 3.7854 LearningRate 0.0382 Epoch: 13 Global Step: 137110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:47,239-Speed 5481.04 samples/sec Loss 3.7374 LearningRate 0.0382 Epoch: 13 Global Step: 137120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:55:54,802-Speed 5417.16 samples/sec Loss 3.7691 
LearningRate 0.0381 Epoch: 13 Global Step: 137130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:56:02,347-Speed 5429.15 samples/sec Loss 3.7527 LearningRate 0.0381 Epoch: 13 Global Step: 137140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:56:09,890-Speed 5430.53 samples/sec Loss 3.7479 LearningRate 0.0381 Epoch: 13 Global Step: 137150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:56:17,480-Speed 5397.65 samples/sec Loss 3.7705 LearningRate 0.0381 Epoch: 13 Global Step: 137160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:56:25,135-Speed 5351.15 samples/sec Loss 3.8179 LearningRate 0.0381 Epoch: 13 Global Step: 137170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 01:56:32,618-Speed 5474.49 samples/sec Loss 3.7625 LearningRate 0.0381 Epoch: 13 Global Step: 137180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:56:40,208-Speed 5396.76 samples/sec Loss 3.7926 LearningRate 0.0381 Epoch: 13 Global Step: 137190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:56:47,764-Speed 5422.18 samples/sec Loss 3.7390 LearningRate 0.0381 Epoch: 13 Global Step: 137200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:56:55,332-Speed 5413.15 samples/sec Loss 3.8115 LearningRate 0.0381 Epoch: 13 Global Step: 137210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:57:02,801-Speed 5484.76 samples/sec Loss 3.7665 LearningRate 0.0380 Epoch: 13 Global Step: 137220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:57:10,352-Speed 5425.05 samples/sec Loss 3.7756 LearningRate 0.0380 Epoch: 13 Global Step: 137230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:57:17,959-Speed 5384.88 samples/sec Loss 3.7745 LearningRate 0.0380 Epoch: 13 Global Step: 137240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:57:25,460-Speed 5461.68 samples/sec Loss 3.7997 LearningRate 0.0380 Epoch: 13 
Global Step: 137250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:57:32,939-Speed 5476.98 samples/sec Loss 3.7982 LearningRate 0.0380 Epoch: 13 Global Step: 137260 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:57:40,418-Speed 5477.27 samples/sec Loss 3.7780 LearningRate 0.0380 Epoch: 13 Global Step: 137270 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:57:47,933-Speed 5451.64 samples/sec Loss 3.7934 LearningRate 0.0380 Epoch: 13 Global Step: 137280 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:57:55,442-Speed 5455.78 samples/sec Loss 3.7438 LearningRate 0.0380 Epoch: 13 Global Step: 137290 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:03,036-Speed 5393.87 samples/sec Loss 3.7614 LearningRate 0.0380 Epoch: 13 Global Step: 137300 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:10,527-Speed 5468.69 samples/sec Loss 3.7399 LearningRate 0.0379 Epoch: 13 Global Step: 137310 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:18,091-Speed 5415.52 samples/sec Loss 3.7388 LearningRate 0.0379 Epoch: 13 Global Step: 137320 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:25,051-Speed 5886.10 samples/sec Loss 3.7954 LearningRate 0.0379 Epoch: 13 Global Step: 137330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:32,361-Speed 5604.30 samples/sec Loss 3.7850 LearningRate 0.0379 Epoch: 13 Global Step: 137340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:39,852-Speed 5468.44 samples/sec Loss 3.7401 LearningRate 0.0379 Epoch: 13 Global Step: 137350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-01-09 01:58:47,399-Speed 5428.26 samples/sec Loss 3.7597 LearningRate 0.0379 Epoch: 13 Global Step: 137360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:58:54,900-Speed 5460.82 samples/sec Loss 3.7965 LearningRate 0.0379 Epoch: 13 Global Step: 137370 Fp16 Grad 
Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:02,535-Speed 5365.79 samples/sec Loss 3.7451 LearningRate 0.0379 Epoch: 13 Global Step: 137380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:10,056-Speed 5446.20 samples/sec Loss 3.7975 LearningRate 0.0379 Epoch: 13 Global Step: 137390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:17,717-Speed 5347.91 samples/sec Loss 3.7738 LearningRate 0.0379 Epoch: 13 Global Step: 137400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:25,205-Speed 5470.86 samples/sec Loss 3.7810 LearningRate 0.0378 Epoch: 13 Global Step: 137410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:32,680-Speed 5480.43 samples/sec Loss 3.7118 LearningRate 0.0378 Epoch: 13 Global Step: 137420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:40,283-Speed 5387.47 samples/sec Loss 3.7303 LearningRate 0.0378 Epoch: 13 Global Step: 137430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:47,802-Speed 5448.62 samples/sec Loss 3.7739 LearningRate 0.0378 Epoch: 13 Global Step: 137440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 01:59:55,271-Speed 5484.67 samples/sec Loss 3.7598 LearningRate 0.0378 Epoch: 13 Global Step: 137450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:00:02,794-Speed 5444.94 samples/sec Loss 3.7184 LearningRate 0.0378 Epoch: 13 Global Step: 137460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:00:10,352-Speed 5420.64 samples/sec Loss 3.7208 LearningRate 0.0378 Epoch: 13 Global Step: 137470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:00:17,841-Speed 5469.57 samples/sec Loss 3.7314 LearningRate 0.0378 Epoch: 13 Global Step: 137480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:00:25,345-Speed 5459.20 samples/sec Loss 3.7859 LearningRate 0.0378 Epoch: 13 Global Step: 137490 Fp16 Grad Scale: 65536 Required: 16 hours 
Training: 2022-01-09 02:00:32,827-Speed 5475.12 samples/sec Loss 3.7727 LearningRate 0.0377 Epoch: 13 Global Step: 137500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:00:40,398-Speed 5411.18 samples/sec Loss 3.7118 LearningRate 0.0377 Epoch: 13 Global Step: 137510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:00:47,871-Speed 5481.41 samples/sec Loss 3.7010 LearningRate 0.0377 Epoch: 13 Global Step: 137520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:00:55,385-Speed 5451.88 samples/sec Loss 3.7792 LearningRate 0.0377 Epoch: 13 Global Step: 137530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:01:02,889-Speed 5459.53 samples/sec Loss 3.7673 LearningRate 0.0377 Epoch: 13 Global Step: 137540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:01:10,385-Speed 5465.01 samples/sec Loss 3.7796 LearningRate 0.0377 Epoch: 13 Global Step: 137550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:01:17,903-Speed 5448.49 samples/sec Loss 3.7758 LearningRate 0.0377 Epoch: 13 Global Step: 137560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:01:25,463-Speed 5419.08 samples/sec Loss 3.7572 LearningRate 0.0377 Epoch: 13 Global Step: 137570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:01:32,985-Speed 5446.76 samples/sec Loss 3.7325 LearningRate 0.0377 Epoch: 13 Global Step: 137580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:01:40,627-Speed 5359.71 samples/sec Loss 3.7123 LearningRate 0.0376 Epoch: 13 Global Step: 137590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:01:48,195-Speed 5413.27 samples/sec Loss 3.7107 LearningRate 0.0376 Epoch: 13 Global Step: 137600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:01:55,692-Speed 5464.25 samples/sec Loss 3.7445 LearningRate 0.0376 Epoch: 13 Global Step: 137610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 
02:02:03,190-Speed 5463.95 samples/sec Loss 3.7721 LearningRate 0.0376 Epoch: 13 Global Step: 137620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:02:10,665-Speed 5479.96 samples/sec Loss 3.7249 LearningRate 0.0376 Epoch: 13 Global Step: 137630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:02:18,217-Speed 5424.66 samples/sec Loss 3.7094 LearningRate 0.0376 Epoch: 13 Global Step: 137640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:02:25,805-Speed 5398.36 samples/sec Loss 3.7568 LearningRate 0.0376 Epoch: 13 Global Step: 137650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:02:33,291-Speed 5472.56 samples/sec Loss 3.7551 LearningRate 0.0376 Epoch: 13 Global Step: 137660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:02:40,810-Speed 5448.47 samples/sec Loss 3.7704 LearningRate 0.0376 Epoch: 13 Global Step: 137670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:02:48,408-Speed 5390.98 samples/sec Loss 3.7248 LearningRate 0.0375 Epoch: 13 Global Step: 137680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:02:55,969-Speed 5418.26 samples/sec Loss 3.8182 LearningRate 0.0375 Epoch: 13 Global Step: 137690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:03,535-Speed 5414.84 samples/sec Loss 3.7509 LearningRate 0.0375 Epoch: 13 Global Step: 137700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:11,218-Speed 5331.93 samples/sec Loss 3.7524 LearningRate 0.0375 Epoch: 13 Global Step: 137710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:18,831-Speed 5380.93 samples/sec Loss 3.6958 LearningRate 0.0375 Epoch: 13 Global Step: 137720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:26,369-Speed 5434.92 samples/sec Loss 3.7362 LearningRate 0.0375 Epoch: 13 Global Step: 137730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:33,816-Speed 5500.30 
samples/sec Loss 3.7946 LearningRate 0.0375 Epoch: 13 Global Step: 137740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:41,482-Speed 5343.69 samples/sec Loss 3.7402 LearningRate 0.0375 Epoch: 13 Global Step: 137750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:48,994-Speed 5453.69 samples/sec Loss 3.7114 LearningRate 0.0375 Epoch: 13 Global Step: 137760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:03:56,560-Speed 5414.10 samples/sec Loss 3.7470 LearningRate 0.0375 Epoch: 13 Global Step: 137770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:04:04,188-Speed 5370.90 samples/sec Loss 3.7073 LearningRate 0.0374 Epoch: 13 Global Step: 137780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:04:11,650-Speed 5489.87 samples/sec Loss 3.7406 LearningRate 0.0374 Epoch: 13 Global Step: 137790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:04:19,207-Speed 5420.87 samples/sec Loss 3.7449 LearningRate 0.0374 Epoch: 13 Global Step: 137800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:04:26,956-Speed 5286.48 samples/sec Loss 3.7804 LearningRate 0.0374 Epoch: 13 Global Step: 137810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:04:34,646-Speed 5326.84 samples/sec Loss 3.7159 LearningRate 0.0374 Epoch: 13 Global Step: 137820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:04:42,284-Speed 5363.22 samples/sec Loss 3.7583 LearningRate 0.0374 Epoch: 13 Global Step: 137830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:04:49,757-Speed 5481.51 samples/sec Loss 3.7238 LearningRate 0.0374 Epoch: 13 Global Step: 137840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:04:57,256-Speed 5463.15 samples/sec Loss 3.7821 LearningRate 0.0374 Epoch: 13 Global Step: 137850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:05:04,813-Speed 5421.45 samples/sec Loss 3.7683 
LearningRate 0.0374 Epoch: 13 Global Step: 137860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:05:12,421-Speed 5384.16 samples/sec Loss 3.7891 LearningRate 0.0373 Epoch: 13 Global Step: 137870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:05:20,042-Speed 5375.24 samples/sec Loss 3.7321 LearningRate 0.0373 Epoch: 13 Global Step: 137880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:05:27,747-Speed 5316.91 samples/sec Loss 3.7541 LearningRate 0.0373 Epoch: 13 Global Step: 137890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:05:35,364-Speed 5378.46 samples/sec Loss 3.6941 LearningRate 0.0373 Epoch: 13 Global Step: 137900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:05:42,847-Speed 5473.60 samples/sec Loss 3.7567 LearningRate 0.0373 Epoch: 13 Global Step: 137910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:05:50,428-Speed 5403.90 samples/sec Loss 3.7224 LearningRate 0.0373 Epoch: 13 Global Step: 137920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:05:58,054-Speed 5372.21 samples/sec Loss 3.7568 LearningRate 0.0373 Epoch: 13 Global Step: 137930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:06:05,639-Speed 5400.69 samples/sec Loss 3.7639 LearningRate 0.0373 Epoch: 13 Global Step: 137940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:06:13,165-Speed 5443.08 samples/sec Loss 3.7704 LearningRate 0.0373 Epoch: 13 Global Step: 137950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:06:20,733-Speed 5412.83 samples/sec Loss 3.7371 LearningRate 0.0372 Epoch: 13 Global Step: 137960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:06:28,268-Speed 5436.46 samples/sec Loss 3.7566 LearningRate 0.0372 Epoch: 13 Global Step: 137970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:06:35,844-Speed 5407.27 samples/sec Loss 3.7516 LearningRate 0.0372 Epoch: 13 
Global Step: 137980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:06:43,398-Speed 5423.38 samples/sec Loss 3.7388 LearningRate 0.0372 Epoch: 13 Global Step: 137990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:06:50,948-Speed 5425.56 samples/sec Loss 3.7260 LearningRate 0.0372 Epoch: 13 Global Step: 138000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:07:35,296-[lfw][138000]XNorm: 22.762273 Training: 2022-01-09 02:07:35,297-[lfw][138000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-09 02:07:35,298-[lfw][138000]Accuracy-Highest: 0.99817 Training: 2022-01-09 02:08:26,870-[cfp_fp][138000]XNorm: 21.471200 Training: 2022-01-09 02:08:26,870-[cfp_fp][138000]Accuracy-Flip: 0.99186+-0.00367 Training: 2022-01-09 02:08:26,871-[cfp_fp][138000]Accuracy-Highest: 0.99186 Training: 2022-01-09 02:09:11,247-[agedb_30][138000]XNorm: 22.569319 Training: 2022-01-09 02:09:11,248-[agedb_30][138000]Accuracy-Flip: 0.97950+-0.00843 Training: 2022-01-09 02:09:11,249-[agedb_30][138000]Accuracy-Highest: 0.98067 Training: 2022-01-09 02:09:18,796-Speed 277.04 samples/sec Loss 3.7616 LearningRate 0.0372 Epoch: 13 Global Step: 138010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:09:26,240-Speed 5503.08 samples/sec Loss 3.7489 LearningRate 0.0372 Epoch: 13 Global Step: 138020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:09:33,823-Speed 5402.14 samples/sec Loss 3.7245 LearningRate 0.0372 Epoch: 13 Global Step: 138030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:09:41,432-Speed 5384.40 samples/sec Loss 3.6847 LearningRate 0.0372 Epoch: 13 Global Step: 138040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:09:48,876-Speed 5502.40 samples/sec Loss 3.6946 LearningRate 0.0372 Epoch: 13 Global Step: 138050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:09:56,569-Speed 5325.41 samples/sec Loss 3.7292 LearningRate 0.0371 Epoch: 13 Global Step: 
138060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:04,118-Speed 5426.26 samples/sec Loss 3.7400 LearningRate 0.0371 Epoch: 13 Global Step: 138070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:11,702-Speed 5402.14 samples/sec Loss 3.7122 LearningRate 0.0371 Epoch: 13 Global Step: 138080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:19,330-Speed 5370.23 samples/sec Loss 3.7222 LearningRate 0.0371 Epoch: 13 Global Step: 138090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:10:26,896-Speed 5414.39 samples/sec Loss 3.7063 LearningRate 0.0371 Epoch: 13 Global Step: 138100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:34,368-Speed 5481.77 samples/sec Loss 3.7568 LearningRate 0.0371 Epoch: 13 Global Step: 138110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:41,876-Speed 5456.68 samples/sec Loss 3.7299 LearningRate 0.0371 Epoch: 13 Global Step: 138120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:49,375-Speed 5462.38 samples/sec Loss 3.7033 LearningRate 0.0371 Epoch: 13 Global Step: 138130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:10:56,883-Speed 5456.62 samples/sec Loss 3.7380 LearningRate 0.0371 Epoch: 13 Global Step: 138140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:11:04,345-Speed 5489.56 samples/sec Loss 3.7189 LearningRate 0.0370 Epoch: 13 Global Step: 138150 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:11:11,868-Speed 5445.91 samples/sec Loss 3.6471 LearningRate 0.0370 Epoch: 13 Global Step: 138160 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:11:19,431-Speed 5416.77 samples/sec Loss 3.7136 LearningRate 0.0370 Epoch: 13 Global Step: 138170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:11:26,901-Speed 5483.45 samples/sec Loss 3.7324 LearningRate 0.0370 Epoch: 13 Global Step: 138180 Fp16 Grad Scale: 32768 
Required: 16 hours Training: 2022-01-09 02:11:34,490-Speed 5398.12 samples/sec Loss 3.7531 LearningRate 0.0370 Epoch: 13 Global Step: 138190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:11:41,971-Speed 5475.87 samples/sec Loss 3.7238 LearningRate 0.0370 Epoch: 13 Global Step: 138200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:11:49,591-Speed 5375.80 samples/sec Loss 3.7240 LearningRate 0.0370 Epoch: 13 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:11:57,036-Speed 5502.67 samples/sec Loss 3.7188 LearningRate 0.0370 Epoch: 13 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:12:04,560-Speed 5444.31 samples/sec Loss 3.7506 LearningRate 0.0370 Epoch: 13 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:12:12,113-Speed 5424.55 samples/sec Loss 3.6995 LearningRate 0.0369 Epoch: 13 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:12:19,651-Speed 5434.46 samples/sec Loss 3.7111 LearningRate 0.0369 Epoch: 13 Global Step: 138250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:12:27,106-Speed 5495.15 samples/sec Loss 3.6959 LearningRate 0.0369 Epoch: 13 Global Step: 138260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:12:34,622-Speed 5450.26 samples/sec Loss 3.6966 LearningRate 0.0369 Epoch: 13 Global Step: 138270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:12:42,143-Speed 5446.87 samples/sec Loss 3.6945 LearningRate 0.0369 Epoch: 13 Global Step: 138280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:12:49,658-Speed 5450.65 samples/sec Loss 3.6992 LearningRate 0.0369 Epoch: 13 Global Step: 138290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:12:57,172-Speed 5451.88 samples/sec Loss 3.7443 LearningRate 0.0369 Epoch: 13 Global Step: 138300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 
2022-01-09 02:13:04,709-Speed 5435.24 samples/sec Loss 3.7014 LearningRate 0.0369 Epoch: 13 Global Step: 138310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:13:12,259-Speed 5426.48 samples/sec Loss 3.6649 LearningRate 0.0369 Epoch: 13 Global Step: 138320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:13:19,848-Speed 5397.63 samples/sec Loss 3.7135 LearningRate 0.0369 Epoch: 13 Global Step: 138330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:13:27,382-Speed 5437.54 samples/sec Loss 3.6951 LearningRate 0.0368 Epoch: 13 Global Step: 138340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:13:34,915-Speed 5437.95 samples/sec Loss 3.7134 LearningRate 0.0368 Epoch: 13 Global Step: 138350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:13:42,420-Speed 5457.99 samples/sec Loss 3.6808 LearningRate 0.0368 Epoch: 13 Global Step: 138360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:13:49,920-Speed 5462.81 samples/sec Loss 3.7418 LearningRate 0.0368 Epoch: 13 Global Step: 138370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:13:57,464-Speed 5429.70 samples/sec Loss 3.6837 LearningRate 0.0368 Epoch: 13 Global Step: 138380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:05,140-Speed 5337.60 samples/sec Loss 3.6750 LearningRate 0.0368 Epoch: 13 Global Step: 138390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:12,642-Speed 5459.88 samples/sec Loss 3.7820 LearningRate 0.0368 Epoch: 13 Global Step: 138400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:20,148-Speed 5458.30 samples/sec Loss 3.7148 LearningRate 0.0368 Epoch: 13 Global Step: 138410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:27,697-Speed 5426.43 samples/sec Loss 3.7586 LearningRate 0.0368 Epoch: 13 Global Step: 138420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:35,204-Speed 
5456.77 samples/sec Loss 3.7163 LearningRate 0.0367 Epoch: 13 Global Step: 138430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:42,774-Speed 5411.69 samples/sec Loss 3.7230 LearningRate 0.0367 Epoch: 13 Global Step: 138440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:14:50,356-Speed 5403.08 samples/sec Loss 3.7048 LearningRate 0.0367 Epoch: 13 Global Step: 138450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:14:57,853-Speed 5463.77 samples/sec Loss 3.7431 LearningRate 0.0367 Epoch: 13 Global Step: 138460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:15:05,370-Speed 5449.76 samples/sec Loss 3.6964 LearningRate 0.0367 Epoch: 13 Global Step: 138470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:12,871-Speed 5461.61 samples/sec Loss 3.6810 LearningRate 0.0367 Epoch: 13 Global Step: 138480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:20,392-Speed 5447.06 samples/sec Loss 3.6582 LearningRate 0.0367 Epoch: 13 Global Step: 138490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:28,111-Speed 5306.83 samples/sec Loss 3.7125 LearningRate 0.0367 Epoch: 13 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:35,609-Speed 5463.47 samples/sec Loss 3.7341 LearningRate 0.0367 Epoch: 13 Global Step: 138510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:43,203-Speed 5394.08 samples/sec Loss 3.7314 LearningRate 0.0367 Epoch: 13 Global Step: 138520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:50,727-Speed 5445.16 samples/sec Loss 3.6947 LearningRate 0.0366 Epoch: 13 Global Step: 138530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:15:58,305-Speed 5405.62 samples/sec Loss 3.7145 LearningRate 0.0366 Epoch: 13 Global Step: 138540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:16:05,920-Speed 5379.12 samples/sec Loss 
3.6691 LearningRate 0.0366 Epoch: 13 Global Step: 138550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:16:13,588-Speed 5342.65 samples/sec Loss 3.7316 LearningRate 0.0366 Epoch: 13 Global Step: 138560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:16:21,231-Speed 5360.46 samples/sec Loss 3.6982 LearningRate 0.0366 Epoch: 13 Global Step: 138570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:16:28,839-Speed 5384.31 samples/sec Loss 3.7210 LearningRate 0.0366 Epoch: 13 Global Step: 138580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:16:36,314-Speed 5480.55 samples/sec Loss 3.7602 LearningRate 0.0366 Epoch: 13 Global Step: 138590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:16:43,897-Speed 5401.77 samples/sec Loss 3.7083 LearningRate 0.0366 Epoch: 13 Global Step: 138600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:16:51,557-Speed 5348.53 samples/sec Loss 3.7008 LearningRate 0.0366 Epoch: 13 Global Step: 138610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:16:59,146-Speed 5397.67 samples/sec Loss 3.6501 LearningRate 0.0365 Epoch: 13 Global Step: 138620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:06,620-Speed 5481.52 samples/sec Loss 3.6647 LearningRate 0.0365 Epoch: 13 Global Step: 138630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:14,143-Speed 5445.18 samples/sec Loss 3.6740 LearningRate 0.0365 Epoch: 13 Global Step: 138640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:21,673-Speed 5439.74 samples/sec Loss 3.7293 LearningRate 0.0365 Epoch: 13 Global Step: 138650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:29,188-Speed 5451.26 samples/sec Loss 3.7128 LearningRate 0.0365 Epoch: 13 Global Step: 138660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:36,699-Speed 5454.25 samples/sec Loss 3.7042 LearningRate 0.0365 
Epoch: 13 Global Step: 138670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:44,179-Speed 5475.93 samples/sec Loss 3.7042 LearningRate 0.0365 Epoch: 13 Global Step: 138680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:51,646-Speed 5486.35 samples/sec Loss 3.6834 LearningRate 0.0365 Epoch: 13 Global Step: 138690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:17:59,111-Speed 5487.93 samples/sec Loss 3.6884 LearningRate 0.0365 Epoch: 13 Global Step: 138700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:18:06,651-Speed 5432.77 samples/sec Loss 3.7406 LearningRate 0.0364 Epoch: 13 Global Step: 138710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:18:14,174-Speed 5445.61 samples/sec Loss 3.6437 LearningRate 0.0364 Epoch: 13 Global Step: 138720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:18:21,776-Speed 5388.08 samples/sec Loss 3.6778 LearningRate 0.0364 Epoch: 13 Global Step: 138730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:18:29,299-Speed 5446.05 samples/sec Loss 3.6349 LearningRate 0.0364 Epoch: 13 Global Step: 138740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:18:36,820-Speed 5446.91 samples/sec Loss 3.7022 LearningRate 0.0364 Epoch: 13 Global Step: 138750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:18:44,282-Speed 5489.23 samples/sec Loss 3.7077 LearningRate 0.0364 Epoch: 13 Global Step: 138760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:18:51,797-Speed 5451.21 samples/sec Loss 3.6776 LearningRate 0.0364 Epoch: 13 Global Step: 138770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:18:59,425-Speed 5370.84 samples/sec Loss 3.7194 LearningRate 0.0364 Epoch: 13 Global Step: 138780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:19:06,984-Speed 5419.06 samples/sec Loss 3.6767 LearningRate 0.0364 Epoch: 13 Global Step: 138790 
Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:19:14,504-Speed 5447.36 samples/sec Loss 3.6582 LearningRate 0.0364 Epoch: 13 Global Step: 138800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:19:22,097-Speed 5395.45 samples/sec Loss 3.6955 LearningRate 0.0363 Epoch: 13 Global Step: 138810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:19:29,579-Speed 5474.87 samples/sec Loss 3.6985 LearningRate 0.0363 Epoch: 13 Global Step: 138820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:19:37,156-Speed 5407.39 samples/sec Loss 3.7107 LearningRate 0.0363 Epoch: 13 Global Step: 138830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:19:44,743-Speed 5398.85 samples/sec Loss 3.6884 LearningRate 0.0363 Epoch: 13 Global Step: 138840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:19:52,376-Speed 5366.94 samples/sec Loss 3.6432 LearningRate 0.0363 Epoch: 13 Global Step: 138850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:20:00,101-Speed 5303.88 samples/sec Loss 3.7118 LearningRate 0.0363 Epoch: 13 Global Step: 138860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:20:07,732-Speed 5368.08 samples/sec Loss 3.6537 LearningRate 0.0363 Epoch: 13 Global Step: 138870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:20:15,204-Speed 5482.63 samples/sec Loss 3.7023 LearningRate 0.0363 Epoch: 13 Global Step: 138880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:20:23,044-Speed 5224.83 samples/sec Loss 3.6495 LearningRate 0.0363 Epoch: 13 Global Step: 138890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:20:30,580-Speed 5436.01 samples/sec Loss 3.6841 LearningRate 0.0362 Epoch: 13 Global Step: 138900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:20:38,088-Speed 5456.36 samples/sec Loss 3.7245 LearningRate 0.0362 Epoch: 13 Global Step: 138910 Fp16 Grad Scale: 32768 
Required: 16 hours Training: 2022-01-09 02:20:45,603-Speed 5451.58 samples/sec Loss 3.6884 LearningRate 0.0362 Epoch: 13 Global Step: 138920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:20:53,081-Speed 5477.67 samples/sec Loss 3.6495 LearningRate 0.0362 Epoch: 13 Global Step: 138930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:21:00,570-Speed 5469.85 samples/sec Loss 3.6801 LearningRate 0.0362 Epoch: 13 Global Step: 138940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:21:08,091-Speed 5447.37 samples/sec Loss 3.6789 LearningRate 0.0362 Epoch: 13 Global Step: 138950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:21:15,634-Speed 5430.90 samples/sec Loss 3.6556 LearningRate 0.0362 Epoch: 13 Global Step: 138960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:21:23,185-Speed 5424.95 samples/sec Loss 3.6550 LearningRate 0.0362 Epoch: 13 Global Step: 138970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:21:30,781-Speed 5392.36 samples/sec Loss 3.6544 LearningRate 0.0362 Epoch: 13 Global Step: 138980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:21:38,316-Speed 5437.01 samples/sec Loss 3.7097 LearningRate 0.0362 Epoch: 13 Global Step: 138990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:21:45,828-Speed 5454.08 samples/sec Loss 3.6797 LearningRate 0.0361 Epoch: 13 Global Step: 139000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:21:53,293-Speed 5486.79 samples/sec Loss 3.6590 LearningRate 0.0361 Epoch: 13 Global Step: 139010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:22:00,862-Speed 5412.39 samples/sec Loss 3.6523 LearningRate 0.0361 Epoch: 13 Global Step: 139020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:22:08,332-Speed 5484.52 samples/sec Loss 3.6375 LearningRate 0.0361 Epoch: 13 Global Step: 139030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-01-09 02:22:15,785-Speed 5496.37 samples/sec Loss 3.6762 LearningRate 0.0361 Epoch: 13 Global Step: 139040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:22:23,235-Speed 5498.22 samples/sec Loss 3.6848 LearningRate 0.0361 Epoch: 13 Global Step: 139050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:22:30,667-Speed 5512.45 samples/sec Loss 3.7098 LearningRate 0.0361 Epoch: 13 Global Step: 139060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:22:38,161-Speed 5466.76 samples/sec Loss 3.6633 LearningRate 0.0361 Epoch: 13 Global Step: 139070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:22:45,767-Speed 5385.66 samples/sec Loss 3.6805 LearningRate 0.0361 Epoch: 13 Global Step: 139080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:22:53,303-Speed 5435.99 samples/sec Loss 3.6914 LearningRate 0.0360 Epoch: 13 Global Step: 139090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:23:00,728-Speed 5516.84 samples/sec Loss 3.6400 LearningRate 0.0360 Epoch: 13 Global Step: 139100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:23:08,274-Speed 5428.69 samples/sec Loss 3.6983 LearningRate 0.0360 Epoch: 13 Global Step: 139110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:23:15,762-Speed 5471.56 samples/sec Loss 3.6653 LearningRate 0.0360 Epoch: 13 Global Step: 139120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:23:23,361-Speed 5390.34 samples/sec Loss 3.6721 LearningRate 0.0360 Epoch: 13 Global Step: 139130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:23:30,958-Speed 5392.72 samples/sec Loss 3.7068 LearningRate 0.0360 Epoch: 13 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:23:38,471-Speed 5452.63 samples/sec Loss 3.6405 LearningRate 0.0360 Epoch: 13 Global Step: 139150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:23:45,957-Speed 
5472.25 samples/sec Loss 3.6292 LearningRate 0.0360 Epoch: 13 Global Step: 139160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:23:53,435-Speed 5478.69 samples/sec Loss 3.7053 LearningRate 0.0360 Epoch: 13 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:00,981-Speed 5428.08 samples/sec Loss 3.7123 LearningRate 0.0360 Epoch: 13 Global Step: 139180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:08,493-Speed 5453.20 samples/sec Loss 3.6664 LearningRate 0.0359 Epoch: 13 Global Step: 139190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:16,006-Speed 5453.24 samples/sec Loss 3.6988 LearningRate 0.0359 Epoch: 13 Global Step: 139200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:23,600-Speed 5394.08 samples/sec Loss 3.6448 LearningRate 0.0359 Epoch: 13 Global Step: 139210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:31,051-Speed 5498.08 samples/sec Loss 3.6762 LearningRate 0.0359 Epoch: 13 Global Step: 139220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:38,456-Speed 5531.86 samples/sec Loss 3.6188 LearningRate 0.0359 Epoch: 13 Global Step: 139230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:24:45,911-Speed 5495.44 samples/sec Loss 3.6852 LearningRate 0.0359 Epoch: 13 Global Step: 139240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:24:53,424-Speed 5452.90 samples/sec Loss 3.6611 LearningRate 0.0359 Epoch: 13 Global Step: 139250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:25:00,888-Speed 5487.96 samples/sec Loss 3.6913 LearningRate 0.0359 Epoch: 13 Global Step: 139260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:25:08,405-Speed 5450.04 samples/sec Loss 3.6906 LearningRate 0.0359 Epoch: 13 Global Step: 139270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:25:15,967-Speed 5417.71 samples/sec Loss 
3.5977 LearningRate 0.0358 Epoch: 13 Global Step: 139280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:25:23,641-Speed 5338.18 samples/sec Loss 3.6501 LearningRate 0.0358 Epoch: 13 Global Step: 139290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:25:31,225-Speed 5401.44 samples/sec Loss 3.6496 LearningRate 0.0358 Epoch: 13 Global Step: 139300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:25:38,819-Speed 5394.48 samples/sec Loss 3.6912 LearningRate 0.0358 Epoch: 13 Global Step: 139310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:25:46,356-Speed 5435.36 samples/sec Loss 3.6132 LearningRate 0.0358 Epoch: 13 Global Step: 139320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:25:53,803-Speed 5501.04 samples/sec Loss 3.6394 LearningRate 0.0358 Epoch: 13 Global Step: 139330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:26:01,349-Speed 5428.60 samples/sec Loss 3.6457 LearningRate 0.0358 Epoch: 13 Global Step: 139340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:26:08,914-Speed 5415.28 samples/sec Loss 3.6285 LearningRate 0.0358 Epoch: 13 Global Step: 139350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:26:16,492-Speed 5405.94 samples/sec Loss 3.6744 LearningRate 0.0358 Epoch: 13 Global Step: 139360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:26:24,091-Speed 5390.58 samples/sec Loss 3.6693 LearningRate 0.0358 Epoch: 13 Global Step: 139370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:26:31,555-Speed 5488.98 samples/sec Loss 3.6636 LearningRate 0.0357 Epoch: 13 Global Step: 139380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:26:39,106-Speed 5424.68 samples/sec Loss 3.6742 LearningRate 0.0357 Epoch: 13 Global Step: 139390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:26:46,629-Speed 5446.03 samples/sec Loss 3.6588 LearningRate 0.0357 
Epoch: 13 Global Step: 139400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:26:54,146-Speed 5449.58 samples/sec Loss 3.6797 LearningRate 0.0357 Epoch: 13 Global Step: 139410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:01,777-Speed 5367.79 samples/sec Loss 3.6311 LearningRate 0.0357 Epoch: 13 Global Step: 139420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:09,316-Speed 5433.92 samples/sec Loss 3.6340 LearningRate 0.0357 Epoch: 13 Global Step: 139430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:16,813-Speed 5464.24 samples/sec Loss 3.6420 LearningRate 0.0357 Epoch: 13 Global Step: 139440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:24,428-Speed 5379.41 samples/sec Loss 3.6525 LearningRate 0.0357 Epoch: 13 Global Step: 139450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:31,954-Speed 5442.94 samples/sec Loss 3.6903 LearningRate 0.0357 Epoch: 13 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:39,502-Speed 5427.13 samples/sec Loss 3.6816 LearningRate 0.0356 Epoch: 13 Global Step: 139470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-09 02:27:47,037-Speed 5437.19 samples/sec Loss 3.6413 LearningRate 0.0356 Epoch: 13 Global Step: 139480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:27:54,775-Speed 5294.00 samples/sec Loss 3.6058 LearningRate 0.0356 Epoch: 13 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:02,383-Speed 5384.59 samples/sec Loss 3.5981 LearningRate 0.0356 Epoch: 13 Global Step: 139500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:09,961-Speed 5405.62 samples/sec Loss 3.6586 LearningRate 0.0356 Epoch: 13 Global Step: 139510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:17,528-Speed 5414.15 samples/sec Loss 3.6577 LearningRate 0.0356 Epoch: 13 Global Step: 139520 
Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:25,071-Speed 5430.88 samples/sec Loss 3.6765 LearningRate 0.0356 Epoch: 13 Global Step: 139530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:32,674-Speed 5388.28 samples/sec Loss 3.6450 LearningRate 0.0356 Epoch: 13 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:40,238-Speed 5415.54 samples/sec Loss 3.6345 LearningRate 0.0356 Epoch: 13 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:28:47,777-Speed 5434.30 samples/sec Loss 3.6188 LearningRate 0.0356 Epoch: 13 Global Step: 139560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:28:55,441-Speed 5345.17 samples/sec Loss 3.6627 LearningRate 0.0355 Epoch: 13 Global Step: 139570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:02,962-Speed 5446.43 samples/sec Loss 3.6482 LearningRate 0.0355 Epoch: 13 Global Step: 139580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:10,542-Speed 5404.53 samples/sec Loss 3.6234 LearningRate 0.0355 Epoch: 13 Global Step: 139590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:18,004-Speed 5490.44 samples/sec Loss 3.6436 LearningRate 0.0355 Epoch: 13 Global Step: 139600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:25,595-Speed 5396.44 samples/sec Loss 3.6210 LearningRate 0.0355 Epoch: 13 Global Step: 139610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:33,132-Speed 5435.24 samples/sec Loss 3.6271 LearningRate 0.0355 Epoch: 13 Global Step: 139620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:40,769-Speed 5363.61 samples/sec Loss 3.6633 LearningRate 0.0355 Epoch: 13 Global Step: 139630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:29:48,204-Speed 5510.18 samples/sec Loss 3.7020 LearningRate 0.0355 Epoch: 13 Global Step: 139640 Fp16 Grad Scale: 32768 
Required: 16 hours Training: 2022-01-09 02:29:55,232-Speed 5829.17 samples/sec Loss 3.6238 LearningRate 0.0355 Epoch: 13 Global Step: 139650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-01-09 02:30:02,201-Speed 5877.87 samples/sec Loss 3.6354 LearningRate 0.0354 Epoch: 13 Global Step: 139660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:30:09,204-Speed 5849.69 samples/sec Loss 3.6361 LearningRate 0.0354 Epoch: 13 Global Step: 139670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:30:16,202-Speed 5854.08 samples/sec Loss 3.6708 LearningRate 0.0354 Epoch: 13 Global Step: 139680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-09 02:30:23,690-Speed 5471.16 samples/sec Loss 3.6663 LearningRate 0.0354 Epoch: 13 Global Step: 139690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:30:31,163-Speed 5481.46 samples/sec Loss 3.6083 LearningRate 0.0354 Epoch: 13 Global Step: 139700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:30:38,795-Speed 5367.38 samples/sec Loss 3.6108 LearningRate 0.0354 Epoch: 13 Global Step: 139710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:30:46,360-Speed 5415.14 samples/sec Loss 3.6475 LearningRate 0.0354 Epoch: 13 Global Step: 139720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:30:53,864-Speed 5459.32 samples/sec Loss 3.6996 LearningRate 0.0354 Epoch: 13 Global Step: 139730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:01,407-Speed 5430.94 samples/sec Loss 3.6197 LearningRate 0.0354 Epoch: 13 Global Step: 139740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:08,964-Speed 5420.63 samples/sec Loss 3.6132 LearningRate 0.0354 Epoch: 13 Global Step: 139750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:16,554-Speed 5397.42 samples/sec Loss 3.5903 LearningRate 0.0353 Epoch: 13 Global Step: 139760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 
2022-01-09 02:31:24,084-Speed 5440.00 samples/sec Loss 3.6159 LearningRate 0.0353 Epoch: 13 Global Step: 139770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:31,619-Speed 5437.20 samples/sec Loss 3.6343 LearningRate 0.0353 Epoch: 13 Global Step: 139780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:39,069-Speed 5498.11 samples/sec Loss 3.6287 LearningRate 0.0353 Epoch: 13 Global Step: 139790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:46,593-Speed 5445.00 samples/sec Loss 3.6431 LearningRate 0.0353 Epoch: 13 Global Step: 139800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:31:54,157-Speed 5416.28 samples/sec Loss 3.6185 LearningRate 0.0353 Epoch: 13 Global Step: 139810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:32:01,756-Speed 5390.31 samples/sec Loss 3.6773 LearningRate 0.0353 Epoch: 13 Global Step: 139820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:32:09,597-Speed 5224.76 samples/sec Loss 3.6337 LearningRate 0.0353 Epoch: 13 Global Step: 139830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:32:17,218-Speed 5375.07 samples/sec Loss 3.6080 LearningRate 0.0353 Epoch: 13 Global Step: 139840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:32:24,743-Speed 5444.17 samples/sec Loss 3.6640 LearningRate 0.0352 Epoch: 13 Global Step: 139850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:32:32,264-Speed 5447.14 samples/sec Loss 3.7000 LearningRate 0.0352 Epoch: 13 Global Step: 139860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:32:39,951-Speed 5328.88 samples/sec Loss 3.6216 LearningRate 0.0352 Epoch: 13 Global Step: 139870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:32:47,430-Speed 5477.49 samples/sec Loss 3.6282 LearningRate 0.0352 Epoch: 13 Global Step: 139880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:32:55,007-Speed 
5406.66 samples/sec Loss 3.6245 LearningRate 0.0352 Epoch: 13 Global Step: 139890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:02,481-Speed 5480.77 samples/sec Loss 3.6218 LearningRate 0.0352 Epoch: 13 Global Step: 139900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:10,424-Speed 5157.40 samples/sec Loss 3.6521 LearningRate 0.0352 Epoch: 13 Global Step: 139910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:18,057-Speed 5366.78 samples/sec Loss 3.6190 LearningRate 0.0352 Epoch: 13 Global Step: 139920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:25,750-Speed 5325.26 samples/sec Loss 3.5811 LearningRate 0.0352 Epoch: 13 Global Step: 139930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:33,341-Speed 5396.63 samples/sec Loss 3.6096 LearningRate 0.0352 Epoch: 13 Global Step: 139940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:40,922-Speed 5403.61 samples/sec Loss 3.6196 LearningRate 0.0351 Epoch: 13 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:48,504-Speed 5403.14 samples/sec Loss 3.6301 LearningRate 0.0351 Epoch: 13 Global Step: 139960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:33:56,217-Speed 5310.52 samples/sec Loss 3.6605 LearningRate 0.0351 Epoch: 13 Global Step: 139970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:34:03,837-Speed 5376.91 samples/sec Loss 3.6638 LearningRate 0.0351 Epoch: 13 Global Step: 139980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:34:11,346-Speed 5455.78 samples/sec Loss 3.6225 LearningRate 0.0351 Epoch: 13 Global Step: 139990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:34:18,878-Speed 5438.37 samples/sec Loss 3.6519 LearningRate 0.0351 Epoch: 13 Global Step: 140000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:35:02,570-[lfw][140000]XNorm: 22.488085 
Training: 2022-01-09 02:35:02,571-[lfw][140000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-09 02:35:02,572-[lfw][140000]Accuracy-Highest: 0.99817 Training: 2022-01-09 02:35:53,391-[cfp_fp][140000]XNorm: 21.077209 Training: 2022-01-09 02:35:53,392-[cfp_fp][140000]Accuracy-Flip: 0.99157+-0.00370 Training: 2022-01-09 02:35:53,392-[cfp_fp][140000]Accuracy-Highest: 0.99186 Training: 2022-01-09 02:36:37,207-[agedb_30][140000]XNorm: 22.369857 Training: 2022-01-09 02:36:37,208-[agedb_30][140000]Accuracy-Flip: 0.98067+-0.00620 Training: 2022-01-09 02:36:37,208-[agedb_30][140000]Accuracy-Highest: 0.98067 Training: 2022-01-09 02:36:44,854-Speed 280.60 samples/sec Loss 3.6122 LearningRate 0.0351 Epoch: 13 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:36:52,447-Speed 5394.71 samples/sec Loss 3.6320 LearningRate 0.0351 Epoch: 13 Global Step: 140020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:00,028-Speed 5403.33 samples/sec Loss 3.6185 LearningRate 0.0351 Epoch: 13 Global Step: 140030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:07,570-Speed 5432.50 samples/sec Loss 3.6573 LearningRate 0.0350 Epoch: 13 Global Step: 140040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:15,156-Speed 5399.59 samples/sec Loss 3.6381 LearningRate 0.0350 Epoch: 13 Global Step: 140050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:22,719-Speed 5416.79 samples/sec Loss 3.6063 LearningRate 0.0350 Epoch: 13 Global Step: 140060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:30,218-Speed 5462.45 samples/sec Loss 3.6520 LearningRate 0.0350 Epoch: 13 Global Step: 140070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:37,794-Speed 5407.15 samples/sec Loss 3.5989 LearningRate 0.0350 Epoch: 13 Global Step: 140080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:45,336-Speed 5431.93 samples/sec Loss 3.6559 
LearningRate 0.0350 Epoch: 13 Global Step: 140090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:37:52,790-Speed 5495.37 samples/sec Loss 3.6150 LearningRate 0.0350 Epoch: 13 Global Step: 140100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:00,297-Speed 5457.21 samples/sec Loss 3.5827 LearningRate 0.0350 Epoch: 13 Global Step: 140110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:07,790-Speed 5467.23 samples/sec Loss 3.6030 LearningRate 0.0350 Epoch: 13 Global Step: 140120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:15,357-Speed 5413.41 samples/sec Loss 3.6510 LearningRate 0.0350 Epoch: 13 Global Step: 140130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 02:38:22,933-Speed 5407.49 samples/sec Loss 3.6348 LearningRate 0.0349 Epoch: 13 Global Step: 140140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:30,494-Speed 5417.96 samples/sec Loss 3.6276 LearningRate 0.0349 Epoch: 13 Global Step: 140150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:38,028-Speed 5437.27 samples/sec Loss 3.5986 LearningRate 0.0349 Epoch: 13 Global Step: 140160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:45,511-Speed 5474.50 samples/sec Loss 3.5898 LearningRate 0.0349 Epoch: 13 Global Step: 140170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:38:53,023-Speed 5453.69 samples/sec Loss 3.5822 LearningRate 0.0349 Epoch: 13 Global Step: 140180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:00,496-Speed 5481.41 samples/sec Loss 3.5871 LearningRate 0.0349 Epoch: 13 Global Step: 140190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:08,073-Speed 5406.49 samples/sec Loss 3.6392 LearningRate 0.0349 Epoch: 13 Global Step: 140200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:15,617-Speed 5430.12 samples/sec Loss 3.5882 LearningRate 0.0349 Epoch: 13 
Global Step: 140210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:23,185-Speed 5413.03 samples/sec Loss 3.6116 LearningRate 0.0349 Epoch: 13 Global Step: 140220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:30,710-Speed 5443.54 samples/sec Loss 3.6193 LearningRate 0.0349 Epoch: 13 Global Step: 140230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:38,151-Speed 5505.93 samples/sec Loss 3.6305 LearningRate 0.0348 Epoch: 13 Global Step: 140240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 02:39:45,577-Speed 5516.22 samples/sec Loss 3.5911 LearningRate 0.0348 Epoch: 13 Global Step: 140250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:39:53,162-Speed 5400.91 samples/sec Loss 3.5735 LearningRate 0.0348 Epoch: 13 Global Step: 140260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:00,610-Speed 5500.20 samples/sec Loss 3.5891 LearningRate 0.0348 Epoch: 13 Global Step: 140270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:08,229-Speed 5376.82 samples/sec Loss 3.6176 LearningRate 0.0348 Epoch: 13 Global Step: 140280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:15,688-Speed 5491.53 samples/sec Loss 3.6032 LearningRate 0.0348 Epoch: 13 Global Step: 140290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:23,348-Speed 5348.19 samples/sec Loss 3.6157 LearningRate 0.0348 Epoch: 13 Global Step: 140300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:30,804-Speed 5494.75 samples/sec Loss 3.6170 LearningRate 0.0348 Epoch: 13 Global Step: 140310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:38,290-Speed 5472.22 samples/sec Loss 3.6144 LearningRate 0.0348 Epoch: 13 Global Step: 140320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:45,928-Speed 5363.25 samples/sec Loss 3.6037 LearningRate 0.0347 Epoch: 13 Global Step: 140330 Fp16 Grad 
Scale: 65536 Required: 15 hours Training: 2022-01-09 02:40:53,412-Speed 5473.78 samples/sec Loss 3.5988 LearningRate 0.0347 Epoch: 13 Global Step: 140340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:00,867-Speed 5494.81 samples/sec Loss 3.6214 LearningRate 0.0347 Epoch: 13 Global Step: 140350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:08,423-Speed 5420.89 samples/sec Loss 3.6007 LearningRate 0.0347 Epoch: 13 Global Step: 140360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:15,895-Speed 5483.02 samples/sec Loss 3.6097 LearningRate 0.0347 Epoch: 13 Global Step: 140370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:23,390-Speed 5465.38 samples/sec Loss 3.6195 LearningRate 0.0347 Epoch: 13 Global Step: 140380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:30,977-Speed 5399.49 samples/sec Loss 3.6208 LearningRate 0.0347 Epoch: 13 Global Step: 140390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:38,449-Speed 5482.85 samples/sec Loss 3.6001 LearningRate 0.0347 Epoch: 13 Global Step: 140400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:45,980-Speed 5439.27 samples/sec Loss 3.5547 LearningRate 0.0347 Epoch: 13 Global Step: 140410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:41:53,536-Speed 5421.62 samples/sec Loss 3.6147 LearningRate 0.0347 Epoch: 13 Global Step: 140420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:01,085-Speed 5426.76 samples/sec Loss 3.6561 LearningRate 0.0346 Epoch: 13 Global Step: 140430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:08,537-Speed 5497.12 samples/sec Loss 3.6180 LearningRate 0.0346 Epoch: 13 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:16,043-Speed 5457.20 samples/sec Loss 3.6208 LearningRate 0.0346 Epoch: 13 Global Step: 140450 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-01-09 02:42:23,642-Speed 5391.43 samples/sec Loss 3.6116 LearningRate 0.0346 Epoch: 13 Global Step: 140460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:31,232-Speed 5397.40 samples/sec Loss 3.5744 LearningRate 0.0346 Epoch: 13 Global Step: 140470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:38,714-Speed 5474.61 samples/sec Loss 3.6190 LearningRate 0.0346 Epoch: 13 Global Step: 140480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:46,260-Speed 5428.99 samples/sec Loss 3.6198 LearningRate 0.0346 Epoch: 13 Global Step: 140490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:42:53,808-Speed 5427.74 samples/sec Loss 3.6145 LearningRate 0.0346 Epoch: 13 Global Step: 140500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:43:01,411-Speed 5387.80 samples/sec Loss 3.5320 LearningRate 0.0346 Epoch: 13 Global Step: 140510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:43:08,937-Speed 5443.07 samples/sec Loss 3.6213 LearningRate 0.0346 Epoch: 13 Global Step: 140520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:43:16,453-Speed 5450.77 samples/sec Loss 3.6161 LearningRate 0.0345 Epoch: 13 Global Step: 140530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:43:23,934-Speed 5476.11 samples/sec Loss 3.5998 LearningRate 0.0345 Epoch: 13 Global Step: 140540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:43:31,496-Speed 5416.97 samples/sec Loss 3.5696 LearningRate 0.0345 Epoch: 13 Global Step: 140550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:43:39,040-Speed 5430.46 samples/sec Loss 3.6360 LearningRate 0.0345 Epoch: 13 Global Step: 140560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:43:46,572-Speed 5438.47 samples/sec Loss 3.5942 LearningRate 0.0345 Epoch: 13 Global Step: 140570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 
02:43:54,181-Speed 5384.03 samples/sec Loss 3.5604 LearningRate 0.0345 Epoch: 13 Global Step: 140580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:44:01,711-Speed 5440.77 samples/sec Loss 3.5962 LearningRate 0.0345 Epoch: 13 Global Step: 140590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:44:09,283-Speed 5410.35 samples/sec Loss 3.6152 LearningRate 0.0345 Epoch: 13 Global Step: 140600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:44:16,724-Speed 5505.40 samples/sec Loss 3.5913 LearningRate 0.0345 Epoch: 13 Global Step: 140610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:44:24,354-Speed 5368.68 samples/sec Loss 3.5931 LearningRate 0.0344 Epoch: 13 Global Step: 140620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:44:31,986-Speed 5367.52 samples/sec Loss 3.6037 LearningRate 0.0344 Epoch: 13 Global Step: 140630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:44:39,509-Speed 5445.73 samples/sec Loss 3.5683 LearningRate 0.0344 Epoch: 13 Global Step: 140640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:44:47,076-Speed 5413.64 samples/sec Loss 3.5855 LearningRate 0.0344 Epoch: 13 Global Step: 140650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:44:54,646-Speed 5411.47 samples/sec Loss 3.5468 LearningRate 0.0344 Epoch: 13 Global Step: 140660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:02,127-Speed 5476.20 samples/sec Loss 3.6147 LearningRate 0.0344 Epoch: 13 Global Step: 140670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:09,653-Speed 5443.15 samples/sec Loss 3.6046 LearningRate 0.0344 Epoch: 13 Global Step: 140680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:17,207-Speed 5422.71 samples/sec Loss 3.5373 LearningRate 0.0344 Epoch: 13 Global Step: 140690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:24,731-Speed 5445.32 
samples/sec Loss 3.5618 LearningRate 0.0344 Epoch: 13 Global Step: 140700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:32,411-Speed 5333.88 samples/sec Loss 3.6114 LearningRate 0.0344 Epoch: 13 Global Step: 140710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:39,978-Speed 5413.98 samples/sec Loss 3.5752 LearningRate 0.0343 Epoch: 13 Global Step: 140720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:47,494-Speed 5450.40 samples/sec Loss 3.6137 LearningRate 0.0343 Epoch: 13 Global Step: 140730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:45:54,955-Speed 5490.59 samples/sec Loss 3.6148 LearningRate 0.0343 Epoch: 13 Global Step: 140740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:46:02,457-Speed 5460.69 samples/sec Loss 3.5832 LearningRate 0.0343 Epoch: 13 Global Step: 140750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:46:10,012-Speed 5422.38 samples/sec Loss 3.6365 LearningRate 0.0343 Epoch: 13 Global Step: 140760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:46:17,597-Speed 5400.83 samples/sec Loss 3.5817 LearningRate 0.0343 Epoch: 13 Global Step: 140770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:46:25,206-Speed 5383.70 samples/sec Loss 3.5656 LearningRate 0.0343 Epoch: 13 Global Step: 140780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:46:32,723-Speed 5449.76 samples/sec Loss 3.5691 LearningRate 0.0343 Epoch: 13 Global Step: 140790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:46:40,415-Speed 5325.13 samples/sec Loss 3.6508 LearningRate 0.0343 Epoch: 13 Global Step: 140800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:46:47,925-Speed 5461.43 samples/sec Loss 3.5793 LearningRate 0.0343 Epoch: 13 Global Step: 140810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:46:55,494-Speed 5412.01 samples/sec Loss 3.5897 
LearningRate 0.0342 Epoch: 13 Global Step: 140820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:03,028-Speed 5437.17 samples/sec Loss 3.5730 LearningRate 0.0342 Epoch: 13 Global Step: 140830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:10,527-Speed 5462.75 samples/sec Loss 3.5434 LearningRate 0.0342 Epoch: 13 Global Step: 140840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:18,029-Speed 5460.79 samples/sec Loss 3.5568 LearningRate 0.0342 Epoch: 13 Global Step: 140850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:25,538-Speed 5455.25 samples/sec Loss 3.5678 LearningRate 0.0342 Epoch: 13 Global Step: 140860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:33,028-Speed 5469.25 samples/sec Loss 3.5674 LearningRate 0.0342 Epoch: 13 Global Step: 140870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:40,649-Speed 5375.72 samples/sec Loss 3.5903 LearningRate 0.0342 Epoch: 13 Global Step: 140880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:47:48,319-Speed 5340.89 samples/sec Loss 3.5898 LearningRate 0.0342 Epoch: 13 Global Step: 140890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:47:55,824-Speed 5458.69 samples/sec Loss 3.6212 LearningRate 0.0342 Epoch: 13 Global Step: 140900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:03,374-Speed 5425.52 samples/sec Loss 3.5812 LearningRate 0.0342 Epoch: 13 Global Step: 140910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:10,838-Speed 5488.80 samples/sec Loss 3.5941 LearningRate 0.0341 Epoch: 13 Global Step: 140920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:18,325-Speed 5471.24 samples/sec Loss 3.5575 LearningRate 0.0341 Epoch: 13 Global Step: 140930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:25,849-Speed 5444.66 samples/sec Loss 3.5843 LearningRate 0.0341 Epoch: 13 
Global Step: 140940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:33,364-Speed 5450.65 samples/sec Loss 3.5423 LearningRate 0.0341 Epoch: 13 Global Step: 140950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:40,882-Speed 5448.98 samples/sec Loss 3.5813 LearningRate 0.0341 Epoch: 13 Global Step: 140960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:48,422-Speed 5433.28 samples/sec Loss 3.6218 LearningRate 0.0341 Epoch: 13 Global Step: 140970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:48:55,958-Speed 5436.00 samples/sec Loss 3.5705 LearningRate 0.0341 Epoch: 13 Global Step: 140980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:03,541-Speed 5402.19 samples/sec Loss 3.5951 LearningRate 0.0341 Epoch: 13 Global Step: 140990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:11,101-Speed 5419.11 samples/sec Loss 3.5620 LearningRate 0.0341 Epoch: 13 Global Step: 141000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:49:18,600-Speed 5462.18 samples/sec Loss 3.5853 LearningRate 0.0340 Epoch: 13 Global Step: 141010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:26,204-Speed 5387.70 samples/sec Loss 3.5758 LearningRate 0.0340 Epoch: 13 Global Step: 141020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:33,763-Speed 5419.38 samples/sec Loss 3.5657 LearningRate 0.0340 Epoch: 13 Global Step: 141030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:41,334-Speed 5411.10 samples/sec Loss 3.5884 LearningRate 0.0340 Epoch: 13 Global Step: 141040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:48,967-Speed 5366.87 samples/sec Loss 3.5793 LearningRate 0.0340 Epoch: 13 Global Step: 141050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:49:56,607-Speed 5362.15 samples/sec Loss 3.5461 LearningRate 0.0340 Epoch: 13 Global Step: 141060 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-01-09 02:50:04,167-Speed 5418.66 samples/sec Loss 3.5465 LearningRate 0.0340 Epoch: 13 Global Step: 141070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:50:11,683-Speed 5450.51 samples/sec Loss 3.5537 LearningRate 0.0340 Epoch: 13 Global Step: 141080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:50:19,294-Speed 5382.27 samples/sec Loss 3.6032 LearningRate 0.0340 Epoch: 13 Global Step: 141090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:50:26,870-Speed 5407.67 samples/sec Loss 3.5372 LearningRate 0.0340 Epoch: 13 Global Step: 141100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:50:34,437-Speed 5413.52 samples/sec Loss 3.5995 LearningRate 0.0339 Epoch: 13 Global Step: 141110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:50:41,978-Speed 5432.59 samples/sec Loss 3.5503 LearningRate 0.0339 Epoch: 13 Global Step: 141120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:50:49,437-Speed 5491.46 samples/sec Loss 3.5666 LearningRate 0.0339 Epoch: 13 Global Step: 141130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:50:57,006-Speed 5412.67 samples/sec Loss 3.5726 LearningRate 0.0339 Epoch: 13 Global Step: 141140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:51:04,618-Speed 5381.59 samples/sec Loss 3.5855 LearningRate 0.0339 Epoch: 13 Global Step: 141150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:51:12,203-Speed 5401.13 samples/sec Loss 3.5441 LearningRate 0.0339 Epoch: 13 Global Step: 141160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:51:19,888-Speed 5330.20 samples/sec Loss 3.5664 LearningRate 0.0339 Epoch: 13 Global Step: 141170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:51:27,595-Speed 5315.12 samples/sec Loss 3.5269 LearningRate 0.0339 Epoch: 13 Global Step: 141180 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-01-09 02:51:35,113-Speed 5449.51 samples/sec Loss 3.5829 LearningRate 0.0339 Epoch: 13 Global Step: 141190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:51:42,718-Speed 5386.64 samples/sec Loss 3.5538 LearningRate 0.0339 Epoch: 13 Global Step: 141200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:51:50,243-Speed 5443.32 samples/sec Loss 3.5877 LearningRate 0.0338 Epoch: 13 Global Step: 141210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 02:51:57,731-Speed 5470.65 samples/sec Loss 3.5719 LearningRate 0.0338 Epoch: 13 Global Step: 141220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:05,242-Speed 5454.77 samples/sec Loss 3.5358 LearningRate 0.0338 Epoch: 13 Global Step: 141230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:12,752-Speed 5454.94 samples/sec Loss 3.4967 LearningRate 0.0338 Epoch: 13 Global Step: 141240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:20,305-Speed 5422.96 samples/sec Loss 3.5513 LearningRate 0.0338 Epoch: 13 Global Step: 141250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:27,791-Speed 5472.62 samples/sec Loss 3.5200 LearningRate 0.0338 Epoch: 13 Global Step: 141260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:35,271-Speed 5477.08 samples/sec Loss 3.5326 LearningRate 0.0338 Epoch: 13 Global Step: 141270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:42,792-Speed 5446.65 samples/sec Loss 3.5183 LearningRate 0.0338 Epoch: 13 Global Step: 141280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:50,374-Speed 5402.48 samples/sec Loss 3.5689 LearningRate 0.0338 Epoch: 13 Global Step: 141290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:52:57,981-Speed 5385.18 samples/sec Loss 3.5440 LearningRate 0.0338 Epoch: 13 Global Step: 141300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 
02:53:05,532-Speed 5425.63 samples/sec Loss 3.5882 LearningRate 0.0337 Epoch: 13 Global Step: 141310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:53:13,000-Speed 5484.93 samples/sec Loss 3.5241 LearningRate 0.0337 Epoch: 13 Global Step: 141320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 02:53:20,636-Speed 5364.49 samples/sec Loss 3.5723 LearningRate 0.0337 Epoch: 13 Global Step: 141330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:53:28,285-Speed 5356.40 samples/sec Loss 3.5486 LearningRate 0.0337 Epoch: 13 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:53:35,759-Speed 5480.40 samples/sec Loss 3.5685 LearningRate 0.0337 Epoch: 13 Global Step: 141350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:53:43,276-Speed 5450.05 samples/sec Loss 3.5648 LearningRate 0.0337 Epoch: 13 Global Step: 141360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:53:50,748-Speed 5482.63 samples/sec Loss 3.5176 LearningRate 0.0337 Epoch: 13 Global Step: 141370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:53:58,264-Speed 5450.63 samples/sec Loss 3.5240 LearningRate 0.0337 Epoch: 13 Global Step: 141380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:54:05,840-Speed 5407.15 samples/sec Loss 3.5246 LearningRate 0.0337 Epoch: 13 Global Step: 141390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:54:13,444-Speed 5387.41 samples/sec Loss 3.5648 LearningRate 0.0336 Epoch: 13 Global Step: 141400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:54:20,999-Speed 5422.51 samples/sec Loss 3.5240 LearningRate 0.0336 Epoch: 13 Global Step: 141410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:54:28,704-Speed 5316.41 samples/sec Loss 3.5313 LearningRate 0.0336 Epoch: 13 Global Step: 141420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:54:36,386-Speed 5333.32 
samples/sec Loss 3.5193 LearningRate 0.0336 Epoch: 13 Global Step: 141430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 02:54:43,996-Speed 5383.23 samples/sec Loss 3.5222 LearningRate 0.0336 Epoch: 13 Global Step: 141440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 02:54:51,527-Speed 5439.81 samples/sec Loss 3.5224 LearningRate 0.0336 Epoch: 13 Global Step: 141450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:54:59,053-Speed 5442.69 samples/sec Loss 3.5466 LearningRate 0.0336 Epoch: 13 Global Step: 141460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:55:06,538-Speed 5473.21 samples/sec Loss 3.5010 LearningRate 0.0336 Epoch: 13 Global Step: 141470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:55:14,059-Speed 5446.85 samples/sec Loss 3.5455 LearningRate 0.0336 Epoch: 13 Global Step: 141480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:55:21,527-Speed 5485.87 samples/sec Loss 3.5438 LearningRate 0.0336 Epoch: 13 Global Step: 141490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:55:29,029-Speed 5460.40 samples/sec Loss 3.5410 LearningRate 0.0335 Epoch: 13 Global Step: 141500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:55:36,523-Speed 5465.54 samples/sec Loss 3.5615 LearningRate 0.0335 Epoch: 13 Global Step: 141510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:55:44,088-Speed 5415.90 samples/sec Loss 3.5430 LearningRate 0.0335 Epoch: 13 Global Step: 141520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:55:51,697-Speed 5384.05 samples/sec Loss 3.5550 LearningRate 0.0335 Epoch: 13 Global Step: 141530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:55:59,196-Speed 5462.87 samples/sec Loss 3.5578 LearningRate 0.0335 Epoch: 13 Global Step: 141540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:56:06,697-Speed 5461.01 samples/sec Loss 3.5518 
LearningRate 0.0335 Epoch: 13 Global Step: 141550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:56:14,282-Speed 5401.36 samples/sec Loss 3.5599 LearningRate 0.0335 Epoch: 13 Global Step: 141560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:56:21,798-Speed 5450.24 samples/sec Loss 3.5867 LearningRate 0.0335 Epoch: 13 Global Step: 141570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:56:29,402-Speed 5386.88 samples/sec Loss 3.5581 LearningRate 0.0335 Epoch: 13 Global Step: 141580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:56:37,021-Speed 5376.84 samples/sec Loss 3.5092 LearningRate 0.0335 Epoch: 13 Global Step: 141590 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:56:44,572-Speed 5425.70 samples/sec Loss 3.4811 LearningRate 0.0334 Epoch: 13 Global Step: 141600 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:56:52,100-Speed 5441.86 samples/sec Loss 3.5063 LearningRate 0.0334 Epoch: 13 Global Step: 141610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:56:59,649-Speed 5426.02 samples/sec Loss 3.6013 LearningRate 0.0334 Epoch: 13 Global Step: 141620 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:57:07,212-Speed 5416.46 samples/sec Loss 3.5057 LearningRate 0.0334 Epoch: 13 Global Step: 141630 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:57:14,874-Speed 5346.90 samples/sec Loss 3.5491 LearningRate 0.0334 Epoch: 13 Global Step: 141640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:57:22,403-Speed 5441.29 samples/sec Loss 3.5051 LearningRate 0.0334 Epoch: 13 Global Step: 141650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 02:57:29,933-Speed 5439.89 samples/sec Loss 3.5224 LearningRate 0.0334 Epoch: 13 Global Step: 141660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:57:37,458-Speed 5443.95 samples/sec Loss 3.5452 LearningRate 0.0334 Epoch: 13 
Global Step: 141670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:57:44,965-Speed 5456.96 samples/sec Loss 3.5648 LearningRate 0.0334 Epoch: 13 Global Step: 141680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:57:52,487-Speed 5446.19 samples/sec Loss 3.5141 LearningRate 0.0334 Epoch: 13 Global Step: 141690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:57:59,977-Speed 5469.39 samples/sec Loss 3.5363 LearningRate 0.0333 Epoch: 13 Global Step: 141700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:58:07,578-Speed 5388.86 samples/sec Loss 3.4769 LearningRate 0.0333 Epoch: 13 Global Step: 141710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:58:15,137-Speed 5419.89 samples/sec Loss 3.5051 LearningRate 0.0333 Epoch: 13 Global Step: 141720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:58:22,768-Speed 5368.41 samples/sec Loss 3.4958 LearningRate 0.0333 Epoch: 13 Global Step: 141730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:58:30,283-Speed 5450.67 samples/sec Loss 3.4834 LearningRate 0.0333 Epoch: 13 Global Step: 141740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:58:37,829-Speed 5428.91 samples/sec Loss 3.5229 LearningRate 0.0333 Epoch: 13 Global Step: 141750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 02:58:45,453-Speed 5373.44 samples/sec Loss 3.5371 LearningRate 0.0333 Epoch: 13 Global Step: 141760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:58:52,932-Speed 5477.24 samples/sec Loss 3.5380 LearningRate 0.0333 Epoch: 13 Global Step: 141770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:00,543-Speed 5382.53 samples/sec Loss 3.5345 LearningRate 0.0333 Epoch: 13 Global Step: 141780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:08,103-Speed 5417.93 samples/sec Loss 3.5355 LearningRate 0.0333 Epoch: 13 Global Step: 141790 Fp16 Grad 
Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:15,650-Speed 5428.45 samples/sec Loss 3.5028 LearningRate 0.0332 Epoch: 13 Global Step: 141800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:23,177-Speed 5442.67 samples/sec Loss 3.5285 LearningRate 0.0332 Epoch: 13 Global Step: 141810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:30,844-Speed 5342.97 samples/sec Loss 3.4829 LearningRate 0.0332 Epoch: 13 Global Step: 141820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:38,369-Speed 5443.67 samples/sec Loss 3.5406 LearningRate 0.0332 Epoch: 13 Global Step: 141830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:45,974-Speed 5386.85 samples/sec Loss 3.5100 LearningRate 0.0332 Epoch: 13 Global Step: 141840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 02:59:53,480-Speed 5457.74 samples/sec Loss 3.5430 LearningRate 0.0332 Epoch: 13 Global Step: 141850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:00,978-Speed 5463.10 samples/sec Loss 3.4469 LearningRate 0.0332 Epoch: 13 Global Step: 141860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:08,559-Speed 5403.55 samples/sec Loss 3.5413 LearningRate 0.0332 Epoch: 13 Global Step: 141870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:16,104-Speed 5429.82 samples/sec Loss 3.5100 LearningRate 0.0332 Epoch: 13 Global Step: 141880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:23,880-Speed 5268.36 samples/sec Loss 3.5403 LearningRate 0.0332 Epoch: 13 Global Step: 141890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:31,451-Speed 5410.37 samples/sec Loss 3.5009 LearningRate 0.0331 Epoch: 13 Global Step: 141900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:39,071-Speed 5376.12 samples/sec Loss 3.5314 LearningRate 0.0331 Epoch: 13 Global Step: 141910 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-01-09 03:00:46,674-Speed 5388.13 samples/sec Loss 3.5337 LearningRate 0.0331 Epoch: 13 Global Step: 141920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:00:54,244-Speed 5411.87 samples/sec Loss 3.4758 LearningRate 0.0331 Epoch: 13 Global Step: 141930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:01:01,764-Speed 5447.20 samples/sec Loss 3.5342 LearningRate 0.0331 Epoch: 13 Global Step: 141940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:01:09,264-Speed 5462.45 samples/sec Loss 3.5496 LearningRate 0.0331 Epoch: 13 Global Step: 141950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:01:16,793-Speed 5440.56 samples/sec Loss 3.5011 LearningRate 0.0331 Epoch: 13 Global Step: 141960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:01:24,307-Speed 5452.25 samples/sec Loss 3.5256 LearningRate 0.0331 Epoch: 13 Global Step: 141970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:01:31,898-Speed 5397.12 samples/sec Loss 3.4880 LearningRate 0.0331 Epoch: 13 Global Step: 141980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:01:39,455-Speed 5420.64 samples/sec Loss 3.4975 LearningRate 0.0330 Epoch: 13 Global Step: 141990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:01:47,040-Speed 5400.61 samples/sec Loss 3.5195 LearningRate 0.0330 Epoch: 13 Global Step: 142000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:02:31,146-[lfw][142000]XNorm: 22.636040 Training: 2022-01-09 03:02:31,147-[lfw][142000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-01-09 03:02:31,147-[lfw][142000]Accuracy-Highest: 0.99817 Training: 2022-01-09 03:03:23,131-[cfp_fp][142000]XNorm: 21.083420 Training: 2022-01-09 03:03:23,131-[cfp_fp][142000]Accuracy-Flip: 0.99129+-0.00445 Training: 2022-01-09 03:03:23,132-[cfp_fp][142000]Accuracy-Highest: 0.99186 Training: 2022-01-09 03:04:07,614-[agedb_30][142000]XNorm: 22.439980 Training: 
2022-01-09 03:04:07,615-[agedb_30][142000]Accuracy-Flip: 0.98033+-0.00710 Training: 2022-01-09 03:04:07,615-[agedb_30][142000]Accuracy-Highest: 0.98067 Training: 2022-01-09 03:04:15,332-Speed 276.22 samples/sec Loss 3.5313 LearningRate 0.0330 Epoch: 13 Global Step: 142010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:04:22,941-Speed 5383.86 samples/sec Loss 3.5241 LearningRate 0.0330 Epoch: 13 Global Step: 142020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:04:30,489-Speed 5427.10 samples/sec Loss 3.5563 LearningRate 0.0330 Epoch: 13 Global Step: 142030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:04:38,105-Speed 5379.17 samples/sec Loss 3.4765 LearningRate 0.0330 Epoch: 13 Global Step: 142040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:04:45,654-Speed 5428.98 samples/sec Loss 3.5020 LearningRate 0.0330 Epoch: 13 Global Step: 142050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:04:53,187-Speed 5438.34 samples/sec Loss 3.5165 LearningRate 0.0330 Epoch: 13 Global Step: 142060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:00,679-Speed 5468.07 samples/sec Loss 3.5016 LearningRate 0.0330 Epoch: 13 Global Step: 142070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:08,245-Speed 5414.33 samples/sec Loss 3.5377 LearningRate 0.0330 Epoch: 13 Global Step: 142080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:15,817-Speed 5410.25 samples/sec Loss 3.5269 LearningRate 0.0329 Epoch: 13 Global Step: 142090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:23,453-Speed 5364.80 samples/sec Loss 3.4744 LearningRate 0.0329 Epoch: 13 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:31,112-Speed 5348.27 samples/sec Loss 3.4870 LearningRate 0.0329 Epoch: 13 Global Step: 142110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:38,588-Speed 
5479.72 samples/sec Loss 3.5061 LearningRate 0.0329 Epoch: 13 Global Step: 142120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:46,250-Speed 5346.36 samples/sec Loss 3.5432 LearningRate 0.0329 Epoch: 13 Global Step: 142130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:05:53,877-Speed 5370.85 samples/sec Loss 3.4939 LearningRate 0.0329 Epoch: 13 Global Step: 142140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:06:01,442-Speed 5415.33 samples/sec Loss 3.5022 LearningRate 0.0329 Epoch: 13 Global Step: 142150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:06:09,089-Speed 5356.90 samples/sec Loss 3.4962 LearningRate 0.0329 Epoch: 13 Global Step: 142160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:06:16,696-Speed 5385.37 samples/sec Loss 3.5406 LearningRate 0.0329 Epoch: 13 Global Step: 142170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:06:24,215-Speed 5448.20 samples/sec Loss 3.5298 LearningRate 0.0329 Epoch: 13 Global Step: 142180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:06:31,664-Speed 5499.84 samples/sec Loss 3.5233 LearningRate 0.0328 Epoch: 13 Global Step: 142190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:06:39,203-Speed 5433.59 samples/sec Loss 3.5098 LearningRate 0.0328 Epoch: 13 Global Step: 142200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:06:46,702-Speed 5462.59 samples/sec Loss 3.5065 LearningRate 0.0328 Epoch: 13 Global Step: 142210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:06:54,308-Speed 5386.04 samples/sec Loss 3.5015 LearningRate 0.0328 Epoch: 13 Global Step: 142220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:01,814-Speed 5458.09 samples/sec Loss 3.4777 LearningRate 0.0328 Epoch: 13 Global Step: 142230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:09,367-Speed 5424.00 samples/sec Loss 3.5322 
LearningRate 0.0328 Epoch: 13 Global Step: 142240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:16,838-Speed 5482.61 samples/sec Loss 3.5107 LearningRate 0.0328 Epoch: 13 Global Step: 142250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:24,371-Speed 5438.84 samples/sec Loss 3.4882 LearningRate 0.0328 Epoch: 13 Global Step: 142260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:31,851-Speed 5476.76 samples/sec Loss 3.5072 LearningRate 0.0328 Epoch: 13 Global Step: 142270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:39,361-Speed 5454.53 samples/sec Loss 3.5270 LearningRate 0.0328 Epoch: 13 Global Step: 142280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:07:46,945-Speed 5401.21 samples/sec Loss 3.4727 LearningRate 0.0327 Epoch: 13 Global Step: 142290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:07:54,480-Speed 5436.85 samples/sec Loss 3.5006 LearningRate 0.0327 Epoch: 13 Global Step: 142300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:01,974-Speed 5466.81 samples/sec Loss 3.4861 LearningRate 0.0327 Epoch: 13 Global Step: 142310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:09,531-Speed 5420.65 samples/sec Loss 3.5176 LearningRate 0.0327 Epoch: 13 Global Step: 142320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:17,081-Speed 5426.06 samples/sec Loss 3.4913 LearningRate 0.0327 Epoch: 13 Global Step: 142330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:24,592-Speed 5454.04 samples/sec Loss 3.5148 LearningRate 0.0327 Epoch: 13 Global Step: 142340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:32,069-Speed 5478.85 samples/sec Loss 3.5187 LearningRate 0.0327 Epoch: 13 Global Step: 142350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:39,545-Speed 5479.75 samples/sec Loss 3.4566 LearningRate 0.0327 Epoch: 13 
Global Step: 142360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:47,063-Speed 5448.38 samples/sec Loss 3.5406 LearningRate 0.0327 Epoch: 13 Global Step: 142370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:08:54,596-Speed 5437.95 samples/sec Loss 3.5229 LearningRate 0.0327 Epoch: 13 Global Step: 142380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:09:02,164-Speed 5413.47 samples/sec Loss 3.4885 LearningRate 0.0326 Epoch: 13 Global Step: 142390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 03:09:09,791-Speed 5371.30 samples/sec Loss 3.4761 LearningRate 0.0326 Epoch: 13 Global Step: 142400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 03:09:17,239-Speed 5500.01 samples/sec Loss 3.4635 LearningRate 0.0326 Epoch: 13 Global Step: 142410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:09:24,671-Speed 5511.50 samples/sec Loss 3.4529 LearningRate 0.0326 Epoch: 13 Global Step: 142420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:09:32,220-Speed 5426.85 samples/sec Loss 3.4611 LearningRate 0.0326 Epoch: 13 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:09:39,757-Speed 5435.13 samples/sec Loss 3.4836 LearningRate 0.0326 Epoch: 13 Global Step: 142440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:09:47,312-Speed 5422.26 samples/sec Loss 3.4668 LearningRate 0.0326 Epoch: 13 Global Step: 142450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:09:54,903-Speed 5396.30 samples/sec Loss 3.5194 LearningRate 0.0326 Epoch: 13 Global Step: 142460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:02,440-Speed 5435.17 samples/sec Loss 3.5112 LearningRate 0.0326 Epoch: 13 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:09,994-Speed 5423.60 samples/sec Loss 3.5039 LearningRate 0.0326 Epoch: 13 Global Step: 142480 Fp16 Grad 
Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:17,539-Speed 5429.18 samples/sec Loss 3.4488 LearningRate 0.0325 Epoch: 13 Global Step: 142490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:25,077-Speed 5434.23 samples/sec Loss 3.4953 LearningRate 0.0325 Epoch: 13 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:32,584-Speed 5457.36 samples/sec Loss 3.4744 LearningRate 0.0325 Epoch: 13 Global Step: 142510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 03:10:40,065-Speed 5475.63 samples/sec Loss 3.5033 LearningRate 0.0325 Epoch: 13 Global Step: 142520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:47,536-Speed 5483.46 samples/sec Loss 3.4426 LearningRate 0.0325 Epoch: 13 Global Step: 142530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:10:55,001-Speed 5487.30 samples/sec Loss 3.4897 LearningRate 0.0325 Epoch: 13 Global Step: 142540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:11:02,473-Speed 5482.77 samples/sec Loss 3.4300 LearningRate 0.0325 Epoch: 13 Global Step: 142550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:11:10,065-Speed 5396.38 samples/sec Loss 3.4457 LearningRate 0.0325 Epoch: 13 Global Step: 142560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:11:17,700-Speed 5365.08 samples/sec Loss 3.4843 LearningRate 0.0325 Epoch: 13 Global Step: 142570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:11:25,214-Speed 5452.05 samples/sec Loss 3.4545 LearningRate 0.0325 Epoch: 13 Global Step: 142580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:11:32,876-Speed 5345.87 samples/sec Loss 3.5238 LearningRate 0.0324 Epoch: 13 Global Step: 142590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:11:40,347-Speed 5483.90 samples/sec Loss 3.4852 LearningRate 0.0324 Epoch: 13 Global Step: 142600 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-01-09 03:11:47,871-Speed 5444.78 samples/sec Loss 3.4509 LearningRate 0.0324 Epoch: 13 Global Step: 142610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:11:55,339-Speed 5485.09 samples/sec Loss 3.4417 LearningRate 0.0324 Epoch: 13 Global Step: 142620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:12:02,874-Speed 5436.67 samples/sec Loss 3.5090 LearningRate 0.0324 Epoch: 13 Global Step: 142630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:12:10,466-Speed 5396.12 samples/sec Loss 3.4766 LearningRate 0.0324 Epoch: 13 Global Step: 142640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:12:17,941-Speed 5480.78 samples/sec Loss 3.4925 LearningRate 0.0324 Epoch: 13 Global Step: 142650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:12:25,482-Speed 5432.10 samples/sec Loss 3.4340 LearningRate 0.0324 Epoch: 13 Global Step: 142660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:12:33,023-Speed 5432.24 samples/sec Loss 3.4770 LearningRate 0.0324 Epoch: 13 Global Step: 142670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:12:40,622-Speed 5391.45 samples/sec Loss 3.4565 LearningRate 0.0324 Epoch: 13 Global Step: 142680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:12:48,089-Speed 5486.00 samples/sec Loss 3.4687 LearningRate 0.0323 Epoch: 13 Global Step: 142690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:12:55,573-Speed 5473.78 samples/sec Loss 3.5069 LearningRate 0.0323 Epoch: 13 Global Step: 142700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:03,231-Speed 5349.03 samples/sec Loss 3.4947 LearningRate 0.0323 Epoch: 13 Global Step: 142710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:10,899-Speed 5342.71 samples/sec Loss 3.4872 LearningRate 0.0323 Epoch: 13 Global Step: 142720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 
03:13:18,392-Speed 5466.60 samples/sec Loss 3.4969 LearningRate 0.0323 Epoch: 13 Global Step: 142730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:25,931-Speed 5433.70 samples/sec Loss 3.4282 LearningRate 0.0323 Epoch: 13 Global Step: 142740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:33,449-Speed 5449.10 samples/sec Loss 3.4640 LearningRate 0.0323 Epoch: 13 Global Step: 142750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:40,924-Speed 5480.55 samples/sec Loss 3.4203 LearningRate 0.0323 Epoch: 13 Global Step: 142760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:48,408-Speed 5473.75 samples/sec Loss 3.4907 LearningRate 0.0323 Epoch: 13 Global Step: 142770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:13:55,956-Speed 5427.12 samples/sec Loss 3.4920 LearningRate 0.0323 Epoch: 13 Global Step: 142780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:14:03,508-Speed 5424.10 samples/sec Loss 3.4577 LearningRate 0.0322 Epoch: 13 Global Step: 142790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:11,092-Speed 5402.26 samples/sec Loss 3.4854 LearningRate 0.0322 Epoch: 13 Global Step: 142800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:18,590-Speed 5463.18 samples/sec Loss 3.4785 LearningRate 0.0322 Epoch: 13 Global Step: 142810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:26,072-Speed 5475.09 samples/sec Loss 3.4509 LearningRate 0.0322 Epoch: 13 Global Step: 142820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:33,549-Speed 5478.28 samples/sec Loss 3.5226 LearningRate 0.0322 Epoch: 13 Global Step: 142830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:41,048-Speed 5463.85 samples/sec Loss 3.4898 LearningRate 0.0322 Epoch: 13 Global Step: 142840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:48,627-Speed 5404.72 
samples/sec Loss 3.5113 LearningRate 0.0322 Epoch: 13 Global Step: 142850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:14:56,224-Speed 5392.23 samples/sec Loss 3.4912 LearningRate 0.0322 Epoch: 13 Global Step: 142860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:15:03,754-Speed 5440.40 samples/sec Loss 3.4733 LearningRate 0.0322 Epoch: 13 Global Step: 142870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:15:11,236-Speed 5475.39 samples/sec Loss 3.4200 LearningRate 0.0322 Epoch: 13 Global Step: 142880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:15:18,786-Speed 5426.26 samples/sec Loss 3.4502 LearningRate 0.0321 Epoch: 13 Global Step: 142890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:15:26,271-Speed 5472.48 samples/sec Loss 3.4730 LearningRate 0.0321 Epoch: 13 Global Step: 142900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:15:33,774-Speed 5459.50 samples/sec Loss 3.4839 LearningRate 0.0321 Epoch: 13 Global Step: 142910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:15:41,276-Speed 5460.60 samples/sec Loss 3.4557 LearningRate 0.0321 Epoch: 13 Global Step: 142920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:15:48,771-Speed 5466.26 samples/sec Loss 3.4291 LearningRate 0.0321 Epoch: 13 Global Step: 142930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:15:56,284-Speed 5452.37 samples/sec Loss 3.4423 LearningRate 0.0321 Epoch: 13 Global Step: 142940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:03,914-Speed 5368.94 samples/sec Loss 3.4215 LearningRate 0.0321 Epoch: 13 Global Step: 142950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:11,386-Speed 5482.60 samples/sec Loss 3.4391 LearningRate 0.0321 Epoch: 13 Global Step: 142960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:18,845-Speed 5491.87 samples/sec Loss 3.5081 
LearningRate 0.0321 Epoch: 13 Global Step: 142970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:26,423-Speed 5406.02 samples/sec Loss 3.4811 LearningRate 0.0321 Epoch: 13 Global Step: 142980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:33,959-Speed 5435.99 samples/sec Loss 3.4696 LearningRate 0.0320 Epoch: 13 Global Step: 142990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:41,598-Speed 5362.45 samples/sec Loss 3.4411 LearningRate 0.0320 Epoch: 13 Global Step: 143000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:16:49,234-Speed 5365.52 samples/sec Loss 3.4388 LearningRate 0.0320 Epoch: 13 Global Step: 143010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:16:56,765-Speed 5439.92 samples/sec Loss 3.4725 LearningRate 0.0320 Epoch: 13 Global Step: 143020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:04,397-Speed 5367.15 samples/sec Loss 3.4149 LearningRate 0.0320 Epoch: 13 Global Step: 143030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:11,886-Speed 5470.52 samples/sec Loss 3.4676 LearningRate 0.0320 Epoch: 13 Global Step: 143040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:19,419-Speed 5438.28 samples/sec Loss 3.4645 LearningRate 0.0320 Epoch: 13 Global Step: 143050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:26,991-Speed 5409.99 samples/sec Loss 3.4993 LearningRate 0.0320 Epoch: 13 Global Step: 143060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:34,654-Speed 5346.15 samples/sec Loss 3.4370 LearningRate 0.0320 Epoch: 13 Global Step: 143070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:42,239-Speed 5401.13 samples/sec Loss 3.4581 LearningRate 0.0320 Epoch: 13 Global Step: 143080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:49,905-Speed 5343.87 samples/sec Loss 3.4467 LearningRate 0.0319 Epoch: 13 
Global Step: 143090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:17:57,409-Speed 5459.11 samples/sec Loss 3.4551 LearningRate 0.0319 Epoch: 13 Global Step: 143100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:04,885-Speed 5479.64 samples/sec Loss 3.4367 LearningRate 0.0319 Epoch: 13 Global Step: 143110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:12,550-Speed 5344.39 samples/sec Loss 3.4859 LearningRate 0.0319 Epoch: 13 Global Step: 143120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:18:20,046-Speed 5465.51 samples/sec Loss 3.4491 LearningRate 0.0319 Epoch: 13 Global Step: 143130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:27,612-Speed 5414.17 samples/sec Loss 3.3960 LearningRate 0.0319 Epoch: 13 Global Step: 143140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:35,125-Speed 5452.55 samples/sec Loss 3.3988 LearningRate 0.0319 Epoch: 13 Global Step: 143150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:42,681-Speed 5421.22 samples/sec Loss 3.4279 LearningRate 0.0319 Epoch: 13 Global Step: 143160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:50,341-Speed 5348.69 samples/sec Loss 3.4788 LearningRate 0.0319 Epoch: 13 Global Step: 143170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:18:57,949-Speed 5384.52 samples/sec Loss 3.4476 LearningRate 0.0319 Epoch: 13 Global Step: 143180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:19:05,558-Speed 5383.95 samples/sec Loss 3.4308 LearningRate 0.0318 Epoch: 13 Global Step: 143190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:19:13,033-Speed 5480.16 samples/sec Loss 3.4111 LearningRate 0.0318 Epoch: 13 Global Step: 143200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:19:20,580-Speed 5427.97 samples/sec Loss 3.4317 LearningRate 0.0318 Epoch: 13 Global Step: 143210 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-01-09 03:19:28,037-Speed 5493.60 samples/sec Loss 3.4621 LearningRate 0.0318 Epoch: 13 Global Step: 143220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:19:35,604-Speed 5413.88 samples/sec Loss 3.3980 LearningRate 0.0318 Epoch: 13 Global Step: 143230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:19:43,064-Speed 5491.65 samples/sec Loss 3.4066 LearningRate 0.0318 Epoch: 13 Global Step: 143240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:19:50,647-Speed 5402.46 samples/sec Loss 3.4572 LearningRate 0.0318 Epoch: 13 Global Step: 143250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:19:58,590-Speed 5157.14 samples/sec Loss 3.4548 LearningRate 0.0318 Epoch: 13 Global Step: 143260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:06,074-Speed 5474.06 samples/sec Loss 3.4458 LearningRate 0.0318 Epoch: 13 Global Step: 143270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:13,578-Speed 5459.35 samples/sec Loss 3.4429 LearningRate 0.0318 Epoch: 13 Global Step: 143280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:21,159-Speed 5403.95 samples/sec Loss 3.4141 LearningRate 0.0317 Epoch: 13 Global Step: 143290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:28,722-Speed 5416.14 samples/sec Loss 3.4680 LearningRate 0.0317 Epoch: 13 Global Step: 143300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:36,226-Speed 5459.38 samples/sec Loss 3.4433 LearningRate 0.0317 Epoch: 13 Global Step: 143310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:43,697-Speed 5483.11 samples/sec Loss 3.4450 LearningRate 0.0317 Epoch: 13 Global Step: 143320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:20:51,160-Speed 5489.41 samples/sec Loss 3.4210 LearningRate 0.0317 Epoch: 13 Global Step: 143330 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-01-09 03:20:58,628-Speed 5485.05 samples/sec Loss 3.4469 LearningRate 0.0317 Epoch: 13 Global Step: 143340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:21:06,167-Speed 5433.85 samples/sec Loss 3.4599 LearningRate 0.0317 Epoch: 13 Global Step: 143350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:13,664-Speed 5464.63 samples/sec Loss 3.4730 LearningRate 0.0317 Epoch: 13 Global Step: 143360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:21,314-Speed 5354.94 samples/sec Loss 3.4040 LearningRate 0.0317 Epoch: 13 Global Step: 143370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:28,968-Speed 5352.02 samples/sec Loss 3.4078 LearningRate 0.0317 Epoch: 13 Global Step: 143380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:36,626-Speed 5349.58 samples/sec Loss 3.4275 LearningRate 0.0316 Epoch: 13 Global Step: 143390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:44,219-Speed 5395.24 samples/sec Loss 3.4652 LearningRate 0.0316 Epoch: 13 Global Step: 143400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:51,706-Speed 5471.42 samples/sec Loss 3.4657 LearningRate 0.0316 Epoch: 13 Global Step: 143410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:21:59,295-Speed 5398.31 samples/sec Loss 3.4286 LearningRate 0.0316 Epoch: 13 Global Step: 143420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:22:06,871-Speed 5407.34 samples/sec Loss 3.4292 LearningRate 0.0316 Epoch: 13 Global Step: 143430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:22:14,506-Speed 5365.30 samples/sec Loss 3.4239 LearningRate 0.0316 Epoch: 13 Global Step: 143440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:22:22,126-Speed 5376.33 samples/sec Loss 3.4595 LearningRate 0.0316 Epoch: 13 Global Step: 143450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 
03:22:29,899-Speed 5270.44 samples/sec Loss 3.4360 LearningRate 0.0316 Epoch: 13 Global Step: 143460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:22:37,544-Speed 5358.52 samples/sec Loss 3.4638 LearningRate 0.0316 Epoch: 13 Global Step: 143470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:22:45,104-Speed 5418.47 samples/sec Loss 3.4832 LearningRate 0.0316 Epoch: 13 Global Step: 143480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:22:52,667-Speed 5416.94 samples/sec Loss 3.4714 LearningRate 0.0316 Epoch: 13 Global Step: 143490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:23:00,366-Speed 5321.12 samples/sec Loss 3.4026 LearningRate 0.0315 Epoch: 13 Global Step: 143500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:23:07,923-Speed 5420.75 samples/sec Loss 3.4390 LearningRate 0.0315 Epoch: 13 Global Step: 143510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:23:15,467-Speed 5429.85 samples/sec Loss 3.3857 LearningRate 0.0315 Epoch: 13 Global Step: 143520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:23:23,043-Speed 5407.44 samples/sec Loss 3.4520 LearningRate 0.0315 Epoch: 13 Global Step: 143530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:23:30,581-Speed 5434.64 samples/sec Loss 3.3733 LearningRate 0.0315 Epoch: 13 Global Step: 143540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:23:38,092-Speed 5454.04 samples/sec Loss 3.4200 LearningRate 0.0315 Epoch: 13 Global Step: 143550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-09 03:23:45,556-Speed 5487.74 samples/sec Loss 3.4071 LearningRate 0.0315 Epoch: 13 Global Step: 143560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:23:53,068-Speed 5453.83 samples/sec Loss 3.4300 LearningRate 0.0315 Epoch: 13 Global Step: 143570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:00,557-Speed 5469.94 
samples/sec Loss 3.4789 LearningRate 0.0315 Epoch: 13 Global Step: 143580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:08,114-Speed 5420.89 samples/sec Loss 3.4345 LearningRate 0.0315 Epoch: 13 Global Step: 143590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:15,629-Speed 5450.64 samples/sec Loss 3.4035 LearningRate 0.0314 Epoch: 13 Global Step: 143600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:23,204-Speed 5408.43 samples/sec Loss 3.4065 LearningRate 0.0314 Epoch: 13 Global Step: 143610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:30,783-Speed 5405.21 samples/sec Loss 3.4595 LearningRate 0.0314 Epoch: 13 Global Step: 143620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:38,285-Speed 5460.55 samples/sec Loss 3.4158 LearningRate 0.0314 Epoch: 13 Global Step: 143630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:45,886-Speed 5389.13 samples/sec Loss 3.4445 LearningRate 0.0314 Epoch: 13 Global Step: 143640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:24:53,469-Speed 5402.24 samples/sec Loss 3.4437 LearningRate 0.0314 Epoch: 13 Global Step: 143650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:25:00,906-Speed 5508.29 samples/sec Loss 3.4482 LearningRate 0.0314 Epoch: 13 Global Step: 143660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:08,361-Speed 5495.33 samples/sec Loss 3.4679 LearningRate 0.0314 Epoch: 13 Global Step: 143670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:15,879-Speed 5448.54 samples/sec Loss 3.4121 LearningRate 0.0314 Epoch: 13 Global Step: 143680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:23,346-Speed 5486.56 samples/sec Loss 3.4034 LearningRate 0.0314 Epoch: 13 Global Step: 143690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:30,865-Speed 5448.05 samples/sec Loss 3.4507 
LearningRate 0.0313 Epoch: 13 Global Step: 143700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:38,376-Speed 5454.50 samples/sec Loss 3.4084 LearningRate 0.0313 Epoch: 13 Global Step: 143710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:45,881-Speed 5457.63 samples/sec Loss 3.4588 LearningRate 0.0313 Epoch: 13 Global Step: 143720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:25:53,381-Speed 5462.23 samples/sec Loss 3.4282 LearningRate 0.0313 Epoch: 13 Global Step: 143730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:26:00,856-Speed 5480.39 samples/sec Loss 3.4497 LearningRate 0.0313 Epoch: 13 Global Step: 143740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:26:08,308-Speed 5497.23 samples/sec Loss 3.4352 LearningRate 0.0313 Epoch: 13 Global Step: 143750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:26:15,817-Speed 5455.66 samples/sec Loss 3.4037 LearningRate 0.0313 Epoch: 13 Global Step: 143760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:26:23,325-Speed 5455.77 samples/sec Loss 3.4280 LearningRate 0.0313 Epoch: 13 Global Step: 143770 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:26:30,929-Speed 5387.55 samples/sec Loss 3.4511 LearningRate 0.0313 Epoch: 13 Global Step: 143780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:26:38,456-Speed 5442.40 samples/sec Loss 3.4269 LearningRate 0.0313 Epoch: 13 Global Step: 143790 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:26:45,985-Speed 5440.64 samples/sec Loss 3.4508 LearningRate 0.0312 Epoch: 13 Global Step: 143800 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:26:53,508-Speed 5445.45 samples/sec Loss 3.4248 LearningRate 0.0312 Epoch: 13 Global Step: 143810 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:27:00,990-Speed 5475.39 samples/sec Loss 3.4311 LearningRate 0.0312 Epoch: 13 
Global Step: 143820 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:27:08,478-Speed 5470.44 samples/sec Loss 3.3968 LearningRate 0.0312 Epoch: 13 Global Step: 143830 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:27:15,964-Speed 5472.56 samples/sec Loss 3.4048 LearningRate 0.0312 Epoch: 13 Global Step: 143840 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-01-09 03:27:23,406-Speed 5504.92 samples/sec Loss 3.4132 LearningRate 0.0312 Epoch: 13 Global Step: 143850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:27:30,908-Speed 5460.11 samples/sec Loss 3.3981 LearningRate 0.0312 Epoch: 13 Global Step: 143860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:27:38,525-Speed 5378.35 samples/sec Loss 3.3877 LearningRate 0.0312 Epoch: 13 Global Step: 143870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:27:46,065-Speed 5432.69 samples/sec Loss 3.3868 LearningRate 0.0312 Epoch: 13 Global Step: 143880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:27:53,702-Speed 5364.39 samples/sec Loss 3.3617 LearningRate 0.0312 Epoch: 13 Global Step: 143890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:28:01,170-Speed 5485.46 samples/sec Loss 3.3935 LearningRate 0.0311 Epoch: 13 Global Step: 143900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:28:08,757-Speed 5399.10 samples/sec Loss 3.4009 LearningRate 0.0311 Epoch: 13 Global Step: 143910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:28:16,291-Speed 5437.63 samples/sec Loss 3.4047 LearningRate 0.0311 Epoch: 13 Global Step: 143920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:28:23,909-Speed 5377.40 samples/sec Loss 3.4320 LearningRate 0.0311 Epoch: 13 Global Step: 143930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:28:31,442-Speed 5438.47 samples/sec Loss 3.3987 LearningRate 0.0311 Epoch: 13 Global Step: 143940 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-01-09 03:28:39,182-Speed 5292.27 samples/sec Loss 3.3920 LearningRate 0.0311 Epoch: 13 Global Step: 143950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:28:46,617-Speed 5510.15 samples/sec Loss 3.4140 LearningRate 0.0311 Epoch: 13 Global Step: 143960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:28:54,173-Speed 5421.97 samples/sec Loss 3.4013 LearningRate 0.0311 Epoch: 13 Global Step: 143970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:29:01,664-Speed 5468.41 samples/sec Loss 3.4045 LearningRate 0.0311 Epoch: 13 Global Step: 143980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:29:09,139-Speed 5480.46 samples/sec Loss 3.4071 LearningRate 0.0311 Epoch: 13 Global Step: 143990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:29:16,726-Speed 5399.78 samples/sec Loss 3.3462 LearningRate 0.0310 Epoch: 13 Global Step: 144000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:30:00,778-[lfw][144000]XNorm: 23.763988 Training: 2022-01-09 03:30:00,778-[lfw][144000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-01-09 03:30:00,779-[lfw][144000]Accuracy-Highest: 0.99817 Training: 2022-01-09 03:30:52,312-[cfp_fp][144000]XNorm: 22.170782 Training: 2022-01-09 03:30:52,313-[cfp_fp][144000]Accuracy-Flip: 0.99271+-0.00341 Training: 2022-01-09 03:30:52,314-[cfp_fp][144000]Accuracy-Highest: 0.99271 Training: 2022-01-09 03:31:36,814-[agedb_30][144000]XNorm: 23.898059 Training: 2022-01-09 03:31:36,815-[agedb_30][144000]Accuracy-Flip: 0.98033+-0.00823 Training: 2022-01-09 03:31:36,816-[agedb_30][144000]Accuracy-Highest: 0.98067 Training: 2022-01-09 03:31:44,454-Speed 277.27 samples/sec Loss 3.3822 LearningRate 0.0310 Epoch: 13 Global Step: 144010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:31:51,971-Speed 5449.91 samples/sec Loss 3.4086 LearningRate 0.0310 Epoch: 13 Global Step: 144020 Fp16 Grad Scale: 32768 
Required: 15 hours Training: 2022-01-09 03:31:59,650-Speed 5334.95 samples/sec Loss 3.4000 LearningRate 0.0310 Epoch: 13 Global Step: 144030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:07,230-Speed 5404.03 samples/sec Loss 3.3529 LearningRate 0.0310 Epoch: 13 Global Step: 144040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:14,745-Speed 5451.75 samples/sec Loss 3.4107 LearningRate 0.0310 Epoch: 13 Global Step: 144050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:22,298-Speed 5423.58 samples/sec Loss 3.3985 LearningRate 0.0310 Epoch: 13 Global Step: 144060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:29,860-Speed 5417.15 samples/sec Loss 3.3871 LearningRate 0.0310 Epoch: 13 Global Step: 144070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:37,397-Speed 5435.34 samples/sec Loss 3.3837 LearningRate 0.0310 Epoch: 13 Global Step: 144080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:44,875-Speed 5478.59 samples/sec Loss 3.3713 LearningRate 0.0310 Epoch: 13 Global Step: 144090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-01-09 03:32:52,472-Speed 5392.38 samples/sec Loss 3.4575 LearningRate 0.0310 Epoch: 13 Global Step: 144100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:33:00,015-Speed 5430.41 samples/sec Loss 3.4036 LearningRate 0.0309 Epoch: 13 Global Step: 144110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-09 03:33:07,521-Speed 5457.59 samples/sec Loss 3.3861 LearningRate 0.0309 Epoch: 13 Global Step: 144120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:33:15,116-Speed 5394.39 samples/sec Loss 3.3730 LearningRate 0.0309 Epoch: 13 Global Step: 144130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:33:22,673-Speed 5420.38 samples/sec Loss 3.4039 LearningRate 0.0309 Epoch: 13 Global Step: 144140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-01-09 03:33:30,203-Speed 5440.39 samples/sec Loss 3.4242 LearningRate 0.0309 Epoch: 13 Global Step: 144150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:33:37,666-Speed 5488.56 samples/sec Loss 3.3926 LearningRate 0.0309 Epoch: 13 Global Step: 144160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:33:45,129-Speed 5489.52 samples/sec Loss 3.3838 LearningRate 0.0309 Epoch: 13 Global Step: 144170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:33:52,641-Speed 5453.43 samples/sec Loss 3.4120 LearningRate 0.0309 Epoch: 13 Global Step: 144180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:00,074-Speed 5511.02 samples/sec Loss 3.3582 LearningRate 0.0309 Epoch: 13 Global Step: 144190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:07,526-Speed 5497.47 samples/sec Loss 3.3788 LearningRate 0.0309 Epoch: 13 Global Step: 144200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:15,020-Speed 5466.20 samples/sec Loss 3.3859 LearningRate 0.0308 Epoch: 13 Global Step: 144210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:22,514-Speed 5466.59 samples/sec Loss 3.3972 LearningRate 0.0308 Epoch: 13 Global Step: 144220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:30,041-Speed 5442.27 samples/sec Loss 3.3889 LearningRate 0.0308 Epoch: 13 Global Step: 144230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:37,644-Speed 5388.67 samples/sec Loss 3.3942 LearningRate 0.0308 Epoch: 13 Global Step: 144240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:45,190-Speed 5428.93 samples/sec Loss 3.4177 LearningRate 0.0308 Epoch: 13 Global Step: 144250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:34:52,701-Speed 5454.41 samples/sec Loss 3.4030 LearningRate 0.0308 Epoch: 13 Global Step: 144260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:35:00,234-Speed 
5438.19 samples/sec Loss 3.4114 LearningRate 0.0308 Epoch: 13 Global Step: 144270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:07,822-Speed 5398.20 samples/sec Loss 3.3729 LearningRate 0.0308 Epoch: 13 Global Step: 144280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:15,407-Speed 5400.46 samples/sec Loss 3.3828 LearningRate 0.0308 Epoch: 13 Global Step: 144290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:22,868-Speed 5491.05 samples/sec Loss 3.3699 LearningRate 0.0308 Epoch: 13 Global Step: 144300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:30,366-Speed 5463.35 samples/sec Loss 3.4134 LearningRate 0.0307 Epoch: 13 Global Step: 144310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:37,870-Speed 5459.04 samples/sec Loss 3.3540 LearningRate 0.0307 Epoch: 13 Global Step: 144320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:45,313-Speed 5504.01 samples/sec Loss 3.3604 LearningRate 0.0307 Epoch: 13 Global Step: 144330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:35:52,833-Speed 5447.41 samples/sec Loss 3.3682 LearningRate 0.0307 Epoch: 13 Global Step: 144340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:36:00,381-Speed 5427.01 samples/sec Loss 3.3577 LearningRate 0.0307 Epoch: 13 Global Step: 144350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:36:07,914-Speed 5438.20 samples/sec Loss 3.3945 LearningRate 0.0307 Epoch: 13 Global Step: 144360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:36:15,431-Speed 5450.28 samples/sec Loss 3.3516 LearningRate 0.0307 Epoch: 13 Global Step: 144370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:36:22,879-Speed 5500.19 samples/sec Loss 3.3908 LearningRate 0.0307 Epoch: 13 Global Step: 144380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:36:30,337-Speed 5492.77 samples/sec Loss 3.3637 
LearningRate 0.0307 Epoch: 13 Global Step: 144390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:36:37,822-Speed 5472.91 samples/sec Loss 3.3659 LearningRate 0.0307 Epoch: 13 Global Step: 144400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:36:45,298-Speed 5479.41 samples/sec Loss 3.4068 LearningRate 0.0306 Epoch: 13 Global Step: 144410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:36:52,849-Speed 5425.98 samples/sec Loss 3.3662 LearningRate 0.0306 Epoch: 13 Global Step: 144420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:37:00,317-Speed 5485.26 samples/sec Loss 3.3794 LearningRate 0.0306 Epoch: 13 Global Step: 144430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:37:07,799-Speed 5475.18 samples/sec Loss 3.3695 LearningRate 0.0306 Epoch: 13 Global Step: 144440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:37:15,237-Speed 5507.76 samples/sec Loss 3.3885 LearningRate 0.0306 Epoch: 13 Global Step: 144450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:37:22,731-Speed 5466.26 samples/sec Loss 3.3992 LearningRate 0.0306 Epoch: 13 Global Step: 144460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:37:30,220-Speed 5470.21 samples/sec Loss 3.3791 LearningRate 0.0306 Epoch: 13 Global Step: 144470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:37:37,652-Speed 5512.22 samples/sec Loss 3.3996 LearningRate 0.0306 Epoch: 13 Global Step: 144480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:37:45,164-Speed 5453.33 samples/sec Loss 3.3545 LearningRate 0.0306 Epoch: 13 Global Step: 144490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:37:52,615-Speed 5498.30 samples/sec Loss 3.3687 LearningRate 0.0306 Epoch: 13 Global Step: 144500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:00,138-Speed 5445.72 samples/sec Loss 3.3764 LearningRate 0.0306 Epoch: 13 
Global Step: 144510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:07,738-Speed 5389.73 samples/sec Loss 3.3639 LearningRate 0.0305 Epoch: 13 Global Step: 144520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:15,174-Speed 5509.32 samples/sec Loss 3.3827 LearningRate 0.0305 Epoch: 13 Global Step: 144530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:22,743-Speed 5412.07 samples/sec Loss 3.3922 LearningRate 0.0305 Epoch: 13 Global Step: 144540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:30,195-Speed 5497.85 samples/sec Loss 3.3619 LearningRate 0.0305 Epoch: 13 Global Step: 144550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:37,726-Speed 5439.66 samples/sec Loss 3.3630 LearningRate 0.0305 Epoch: 13 Global Step: 144560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:45,343-Speed 5377.38 samples/sec Loss 3.3852 LearningRate 0.0305 Epoch: 13 Global Step: 144570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:38:52,918-Speed 5408.21 samples/sec Loss 3.3493 LearningRate 0.0305 Epoch: 13 Global Step: 144580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:00,517-Speed 5390.95 samples/sec Loss 3.3658 LearningRate 0.0305 Epoch: 13 Global Step: 144590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:08,024-Speed 5457.43 samples/sec Loss 3.3774 LearningRate 0.0305 Epoch: 13 Global Step: 144600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:15,525-Speed 5461.06 samples/sec Loss 3.3991 LearningRate 0.0305 Epoch: 13 Global Step: 144610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:23,027-Speed 5460.58 samples/sec Loss 3.3844 LearningRate 0.0304 Epoch: 13 Global Step: 144620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:30,504-Speed 5479.13 samples/sec Loss 3.3801 LearningRate 0.0304 Epoch: 13 Global Step: 144630 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:38,066-Speed 5417.28 samples/sec Loss 3.3715 LearningRate 0.0304 Epoch: 13 Global Step: 144640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:45,701-Speed 5365.24 samples/sec Loss 3.4147 LearningRate 0.0304 Epoch: 13 Global Step: 144650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:39:53,208-Speed 5457.71 samples/sec Loss 3.3599 LearningRate 0.0304 Epoch: 13 Global Step: 144660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:40:00,791-Speed 5401.97 samples/sec Loss 3.3763 LearningRate 0.0304 Epoch: 13 Global Step: 144670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:40:08,409-Speed 5377.29 samples/sec Loss 3.3218 LearningRate 0.0304 Epoch: 13 Global Step: 144680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 03:40:15,875-Speed 5486.77 samples/sec Loss 3.3724 LearningRate 0.0304 Epoch: 13 Global Step: 144690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 03:40:23,470-Speed 5394.16 samples/sec Loss 3.3750 LearningRate 0.0304 Epoch: 13 Global Step: 144700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:40:31,085-Speed 5380.35 samples/sec Loss 3.3672 LearningRate 0.0304 Epoch: 13 Global Step: 144710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:40:38,571-Speed 5472.81 samples/sec Loss 3.3605 LearningRate 0.0303 Epoch: 13 Global Step: 144720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:40:46,045-Speed 5480.34 samples/sec Loss 3.3958 LearningRate 0.0303 Epoch: 13 Global Step: 144730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:40:53,577-Speed 5439.22 samples/sec Loss 3.3786 LearningRate 0.0303 Epoch: 13 Global Step: 144740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:41:01,182-Speed 5386.51 samples/sec Loss 3.3999 LearningRate 0.0303 Epoch: 13 Global Step: 144750 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-01-09 03:41:08,586-Speed 5533.53 samples/sec Loss 3.3749 LearningRate 0.0303 Epoch: 13 Global Step: 144760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:41:16,025-Speed 5506.32 samples/sec Loss 3.3632 LearningRate 0.0303 Epoch: 13 Global Step: 144770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:41:23,730-Speed 5316.96 samples/sec Loss 3.3907 LearningRate 0.0303 Epoch: 13 Global Step: 144780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:41:31,142-Speed 5526.79 samples/sec Loss 3.3553 LearningRate 0.0303 Epoch: 13 Global Step: 144790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:41:38,713-Speed 5411.37 samples/sec Loss 3.3351 LearningRate 0.0303 Epoch: 13 Global Step: 144800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:41:46,270-Speed 5420.84 samples/sec Loss 3.3142 LearningRate 0.0303 Epoch: 13 Global Step: 144810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:41:53,767-Speed 5464.13 samples/sec Loss 3.3450 LearningRate 0.0303 Epoch: 13 Global Step: 144820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:42:01,245-Speed 5477.78 samples/sec Loss 3.3835 LearningRate 0.0302 Epoch: 13 Global Step: 144830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:42:08,703-Speed 5493.39 samples/sec Loss 3.3394 LearningRate 0.0302 Epoch: 13 Global Step: 144840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:42:16,206-Speed 5459.97 samples/sec Loss 3.3811 LearningRate 0.0302 Epoch: 13 Global Step: 144850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:42:23,721-Speed 5451.00 samples/sec Loss 3.3822 LearningRate 0.0302 Epoch: 13 Global Step: 144860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:42:31,184-Speed 5488.96 samples/sec Loss 3.3701 LearningRate 0.0302 Epoch: 13 Global Step: 144870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 
03:42:38,795-Speed 5382.36 samples/sec Loss 3.3425 LearningRate 0.0302 Epoch: 13 Global Step: 144880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:42:46,339-Speed 5430.60 samples/sec Loss 3.3651 LearningRate 0.0302 Epoch: 13 Global Step: 144890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:42:53,859-Speed 5447.42 samples/sec Loss 3.3808 LearningRate 0.0302 Epoch: 13 Global Step: 144900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:01,366-Speed 5457.41 samples/sec Loss 3.3185 LearningRate 0.0302 Epoch: 13 Global Step: 144910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:08,841-Speed 5479.75 samples/sec Loss 3.3863 LearningRate 0.0302 Epoch: 13 Global Step: 144920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:16,457-Speed 5379.11 samples/sec Loss 3.3951 LearningRate 0.0301 Epoch: 13 Global Step: 144930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:24,073-Speed 5378.88 samples/sec Loss 3.3457 LearningRate 0.0301 Epoch: 13 Global Step: 144940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:31,560-Speed 5471.69 samples/sec Loss 3.4020 LearningRate 0.0301 Epoch: 13 Global Step: 144950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:39,052-Speed 5468.13 samples/sec Loss 3.3679 LearningRate 0.0301 Epoch: 13 Global Step: 144960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:46,619-Speed 5413.71 samples/sec Loss 3.4080 LearningRate 0.0301 Epoch: 13 Global Step: 144970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:43:54,097-Speed 5477.37 samples/sec Loss 3.3568 LearningRate 0.0301 Epoch: 13 Global Step: 144980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:01,650-Speed 5423.86 samples/sec Loss 3.4152 LearningRate 0.0301 Epoch: 13 Global Step: 144990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:09,172-Speed 5445.96 
samples/sec Loss 3.3295 LearningRate 0.0301 Epoch: 13 Global Step: 145000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:16,643-Speed 5483.75 samples/sec Loss 3.3399 LearningRate 0.0301 Epoch: 13 Global Step: 145010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:24,268-Speed 5372.54 samples/sec Loss 3.3709 LearningRate 0.0301 Epoch: 13 Global Step: 145020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:31,838-Speed 5411.31 samples/sec Loss 3.3717 LearningRate 0.0300 Epoch: 13 Global Step: 145030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:39,346-Speed 5456.35 samples/sec Loss 3.3135 LearningRate 0.0300 Epoch: 13 Global Step: 145040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:46,855-Speed 5455.94 samples/sec Loss 3.3470 LearningRate 0.0300 Epoch: 13 Global Step: 145050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:44:54,372-Speed 5449.25 samples/sec Loss 3.3133 LearningRate 0.0300 Epoch: 13 Global Step: 145060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:45:01,893-Speed 5446.77 samples/sec Loss 3.3850 LearningRate 0.0300 Epoch: 13 Global Step: 145070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:45:09,385-Speed 5468.15 samples/sec Loss 3.3521 LearningRate 0.0300 Epoch: 13 Global Step: 145080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:45:16,930-Speed 5429.62 samples/sec Loss 3.3380 LearningRate 0.0300 Epoch: 13 Global Step: 145090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:45:24,430-Speed 5461.69 samples/sec Loss 3.3512 LearningRate 0.0300 Epoch: 13 Global Step: 145100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:45:32,029-Speed 5391.10 samples/sec Loss 3.3441 LearningRate 0.0300 Epoch: 13 Global Step: 145110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:45:39,667-Speed 5363.35 samples/sec Loss 3.3864 
LearningRate 0.0300 Epoch: 13 Global Step: 145120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:45:47,243-Speed 5407.38 samples/sec Loss 3.3465 LearningRate 0.0300 Epoch: 13 Global Step: 145130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:45:54,770-Speed 5442.32 samples/sec Loss 3.3863 LearningRate 0.0299 Epoch: 13 Global Step: 145140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:46:02,287-Speed 5449.92 samples/sec Loss 3.3941 LearningRate 0.0299 Epoch: 13 Global Step: 145150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:46:09,776-Speed 5470.47 samples/sec Loss 3.3928 LearningRate 0.0299 Epoch: 13 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:46:17,314-Speed 5434.45 samples/sec Loss 3.3364 LearningRate 0.0299 Epoch: 13 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:46:40,053-Speed 1801.42 samples/sec Loss 3.3820 LearningRate 0.0299 Epoch: 14 Global Step: 145180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:46:47,641-Speed 5398.66 samples/sec Loss 3.3693 LearningRate 0.0299 Epoch: 14 Global Step: 145190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:46:55,129-Speed 5471.18 samples/sec Loss 3.3467 LearningRate 0.0299 Epoch: 14 Global Step: 145200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:02,596-Speed 5486.66 samples/sec Loss 3.3330 LearningRate 0.0299 Epoch: 14 Global Step: 145210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:47:10,114-Speed 5448.50 samples/sec Loss 3.3657 LearningRate 0.0299 Epoch: 14 Global Step: 145220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:17,618-Speed 5459.74 samples/sec Loss 3.3513 LearningRate 0.0299 Epoch: 14 Global Step: 145230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:25,245-Speed 5370.82 samples/sec Loss 3.3500 LearningRate 0.0298 Epoch: 14 
Global Step: 145240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:32,716-Speed 5483.53 samples/sec Loss 3.3277 LearningRate 0.0298 Epoch: 14 Global Step: 145250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:40,272-Speed 5421.58 samples/sec Loss 3.3445 LearningRate 0.0298 Epoch: 14 Global Step: 145260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:47,741-Speed 5484.75 samples/sec Loss 3.3326 LearningRate 0.0298 Epoch: 14 Global Step: 145270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:47:55,196-Speed 5494.87 samples/sec Loss 3.3171 LearningRate 0.0298 Epoch: 14 Global Step: 145280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:48:08,750-Speed 3022.49 samples/sec Loss 3.3114 LearningRate 0.0298 Epoch: 14 Global Step: 145290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:48:16,199-Speed 5499.77 samples/sec Loss 3.3222 LearningRate 0.0298 Epoch: 14 Global Step: 145300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:48:23,694-Speed 5465.80 samples/sec Loss 3.3691 LearningRate 0.0298 Epoch: 14 Global Step: 145310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:48:31,159-Speed 5487.55 samples/sec Loss 3.3102 LearningRate 0.0298 Epoch: 14 Global Step: 145320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:48:38,658-Speed 5463.12 samples/sec Loss 3.2955 LearningRate 0.0298 Epoch: 14 Global Step: 145330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:48:46,084-Speed 5516.03 samples/sec Loss 3.3390 LearningRate 0.0297 Epoch: 14 Global Step: 145340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:48:53,614-Speed 5440.78 samples/sec Loss 3.3390 LearningRate 0.0297 Epoch: 14 Global Step: 145350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:01,185-Speed 5410.84 samples/sec Loss 3.3108 LearningRate 0.0297 Epoch: 14 Global Step: 145360 Fp16 Grad 
Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:08,632-Speed 5500.67 samples/sec Loss 3.3130 LearningRate 0.0297 Epoch: 14 Global Step: 145370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:16,124-Speed 5467.90 samples/sec Loss 3.2981 LearningRate 0.0297 Epoch: 14 Global Step: 145380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:23,557-Speed 5511.37 samples/sec Loss 3.3036 LearningRate 0.0297 Epoch: 14 Global Step: 145390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:31,008-Speed 5498.10 samples/sec Loss 3.3142 LearningRate 0.0297 Epoch: 14 Global Step: 145400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:38,526-Speed 5449.28 samples/sec Loss 3.2828 LearningRate 0.0297 Epoch: 14 Global Step: 145410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:46,035-Speed 5455.19 samples/sec Loss 3.2764 LearningRate 0.0297 Epoch: 14 Global Step: 145420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:49:53,533-Speed 5464.14 samples/sec Loss 3.3559 LearningRate 0.0297 Epoch: 14 Global Step: 145430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:50:01,085-Speed 5424.56 samples/sec Loss 3.3589 LearningRate 0.0297 Epoch: 14 Global Step: 145440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:50:08,689-Speed 5387.52 samples/sec Loss 3.3043 LearningRate 0.0296 Epoch: 14 Global Step: 145450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:50:16,165-Speed 5479.79 samples/sec Loss 3.3177 LearningRate 0.0296 Epoch: 14 Global Step: 145460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:50:23,731-Speed 5414.39 samples/sec Loss 3.3223 LearningRate 0.0296 Epoch: 14 Global Step: 145470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:50:31,199-Speed 5485.13 samples/sec Loss 3.3402 LearningRate 0.0296 Epoch: 14 Global Step: 145480 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-01-09 03:50:38,686-Speed 5471.37 samples/sec Loss 3.3303 LearningRate 0.0296 Epoch: 14 Global Step: 145490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:50:46,196-Speed 5454.99 samples/sec Loss 3.3286 LearningRate 0.0296 Epoch: 14 Global Step: 145500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:50:53,676-Speed 5477.21 samples/sec Loss 3.3241 LearningRate 0.0296 Epoch: 14 Global Step: 145510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:51:01,209-Speed 5437.91 samples/sec Loss 3.3257 LearningRate 0.0296 Epoch: 14 Global Step: 145520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:51:08,791-Speed 5402.68 samples/sec Loss 3.2891 LearningRate 0.0296 Epoch: 14 Global Step: 145530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:51:16,364-Speed 5409.61 samples/sec Loss 3.3078 LearningRate 0.0296 Epoch: 14 Global Step: 145540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:51:23,920-Speed 5422.04 samples/sec Loss 3.2878 LearningRate 0.0295 Epoch: 14 Global Step: 145550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:51:31,397-Speed 5478.71 samples/sec Loss 3.2982 LearningRate 0.0295 Epoch: 14 Global Step: 145560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:51:38,809-Speed 5526.31 samples/sec Loss 3.3362 LearningRate 0.0295 Epoch: 14 Global Step: 145570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:51:46,347-Speed 5435.15 samples/sec Loss 3.3203 LearningRate 0.0295 Epoch: 14 Global Step: 145580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:51:53,869-Speed 5446.46 samples/sec Loss 3.2790 LearningRate 0.0295 Epoch: 14 Global Step: 145590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:01,404-Speed 5436.38 samples/sec Loss 3.2714 LearningRate 0.0295 Epoch: 14 Global Step: 145600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 
03:52:08,896-Speed 5467.62 samples/sec Loss 3.3203 LearningRate 0.0295 Epoch: 14 Global Step: 145610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:16,426-Speed 5440.74 samples/sec Loss 3.3138 LearningRate 0.0295 Epoch: 14 Global Step: 145620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:24,125-Speed 5320.96 samples/sec Loss 3.3314 LearningRate 0.0295 Epoch: 14 Global Step: 145630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:31,728-Speed 5388.01 samples/sec Loss 3.2533 LearningRate 0.0295 Epoch: 14 Global Step: 145640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:39,157-Speed 5513.90 samples/sec Loss 3.3369 LearningRate 0.0295 Epoch: 14 Global Step: 145650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:46,609-Speed 5497.77 samples/sec Loss 3.2975 LearningRate 0.0294 Epoch: 14 Global Step: 145660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 03:52:54,125-Speed 5450.33 samples/sec Loss 3.2915 LearningRate 0.0294 Epoch: 14 Global Step: 145670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:01,643-Speed 5448.83 samples/sec Loss 3.2847 LearningRate 0.0294 Epoch: 14 Global Step: 145680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:09,205-Speed 5417.34 samples/sec Loss 3.3350 LearningRate 0.0294 Epoch: 14 Global Step: 145690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:16,673-Speed 5485.53 samples/sec Loss 3.2757 LearningRate 0.0294 Epoch: 14 Global Step: 145700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:24,110-Speed 5508.22 samples/sec Loss 3.3111 LearningRate 0.0294 Epoch: 14 Global Step: 145710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:31,597-Speed 5471.75 samples/sec Loss 3.3528 LearningRate 0.0294 Epoch: 14 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:39,062-Speed 5487.65 
samples/sec Loss 3.3247 LearningRate 0.0294 Epoch: 14 Global Step: 145730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:46,669-Speed 5385.45 samples/sec Loss 3.3269 LearningRate 0.0294 Epoch: 14 Global Step: 145740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:53:54,164-Speed 5465.99 samples/sec Loss 3.3085 LearningRate 0.0294 Epoch: 14 Global Step: 145750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:01,581-Speed 5522.97 samples/sec Loss 3.2934 LearningRate 0.0293 Epoch: 14 Global Step: 145760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:09,131-Speed 5426.02 samples/sec Loss 3.3158 LearningRate 0.0293 Epoch: 14 Global Step: 145770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 03:54:16,630-Speed 5462.46 samples/sec Loss 3.3353 LearningRate 0.0293 Epoch: 14 Global Step: 145780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:24,203-Speed 5409.53 samples/sec Loss 3.3020 LearningRate 0.0293 Epoch: 14 Global Step: 145790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:31,695-Speed 5468.05 samples/sec Loss 3.2848 LearningRate 0.0293 Epoch: 14 Global Step: 145800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:39,182-Speed 5471.42 samples/sec Loss 3.3286 LearningRate 0.0293 Epoch: 14 Global Step: 145810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:46,662-Speed 5476.52 samples/sec Loss 3.2908 LearningRate 0.0293 Epoch: 14 Global Step: 145820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:54:54,119-Speed 5494.03 samples/sec Loss 3.3046 LearningRate 0.0293 Epoch: 14 Global Step: 145830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:01,704-Speed 5401.05 samples/sec Loss 3.3225 LearningRate 0.0293 Epoch: 14 Global Step: 145840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:09,207-Speed 5460.01 samples/sec Loss 3.3383 
LearningRate 0.0293 Epoch: 14 Global Step: 145850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:16,687-Speed 5476.15 samples/sec Loss 3.3273 LearningRate 0.0293 Epoch: 14 Global Step: 145860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:24,248-Speed 5418.41 samples/sec Loss 3.2663 LearningRate 0.0292 Epoch: 14 Global Step: 145870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:31,720-Speed 5482.90 samples/sec Loss 3.2845 LearningRate 0.0292 Epoch: 14 Global Step: 145880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:39,228-Speed 5456.20 samples/sec Loss 3.3043 LearningRate 0.0292 Epoch: 14 Global Step: 145890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:46,759-Speed 5438.98 samples/sec Loss 3.3417 LearningRate 0.0292 Epoch: 14 Global Step: 145900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:55:54,208-Speed 5499.87 samples/sec Loss 3.2808 LearningRate 0.0292 Epoch: 14 Global Step: 145910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:01,638-Speed 5513.34 samples/sec Loss 3.2794 LearningRate 0.0292 Epoch: 14 Global Step: 145920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:09,095-Speed 5493.39 samples/sec Loss 3.2953 LearningRate 0.0292 Epoch: 14 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:16,548-Speed 5496.89 samples/sec Loss 3.3184 LearningRate 0.0292 Epoch: 14 Global Step: 145940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:24,032-Speed 5473.41 samples/sec Loss 3.3026 LearningRate 0.0292 Epoch: 14 Global Step: 145950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:31,518-Speed 5472.54 samples/sec Loss 3.2449 LearningRate 0.0292 Epoch: 14 Global Step: 145960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:38,977-Speed 5492.54 samples/sec Loss 3.3203 LearningRate 0.0291 Epoch: 14 
Global Step: 145970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:56:46,485-Speed 5455.88 samples/sec Loss 3.3001 LearningRate 0.0291 Epoch: 14 Global Step: 145980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 03:56:53,958-Speed 5482.12 samples/sec Loss 3.3064 LearningRate 0.0291 Epoch: 14 Global Step: 145990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:57:01,478-Speed 5447.63 samples/sec Loss 3.3109 LearningRate 0.0291 Epoch: 14 Global Step: 146000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:57:45,345-[lfw][146000]XNorm: 23.394129 Training: 2022-01-09 03:57:45,346-[lfw][146000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-01-09 03:57:45,346-[lfw][146000]Accuracy-Highest: 0.99817 Training: 2022-01-09 03:58:36,498-[cfp_fp][146000]XNorm: 22.015470 Training: 2022-01-09 03:58:36,499-[cfp_fp][146000]Accuracy-Flip: 0.99214+-0.00405 Training: 2022-01-09 03:58:36,499-[cfp_fp][146000]Accuracy-Highest: 0.99271 Training: 2022-01-09 03:59:20,596-[agedb_30][146000]XNorm: 23.441551 Training: 2022-01-09 03:59:20,597-[agedb_30][146000]Accuracy-Flip: 0.98050+-0.00715 Training: 2022-01-09 03:59:20,597-[agedb_30][146000]Accuracy-Highest: 0.98067 Training: 2022-01-09 03:59:28,251-Speed 279.07 samples/sec Loss 3.3025 LearningRate 0.0291 Epoch: 14 Global Step: 146010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:59:35,818-Speed 5413.82 samples/sec Loss 3.2827 LearningRate 0.0291 Epoch: 14 Global Step: 146020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:59:43,299-Speed 5475.66 samples/sec Loss 3.2648 LearningRate 0.0291 Epoch: 14 Global Step: 146030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:59:50,748-Speed 5500.00 samples/sec Loss 3.3068 LearningRate 0.0291 Epoch: 14 Global Step: 146040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 03:59:58,197-Speed 5499.65 samples/sec Loss 3.2924 LearningRate 0.0291 Epoch: 14 Global Step: 
146050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:05,684-Speed 5470.94 samples/sec Loss 3.3016 LearningRate 0.0291 Epoch: 14 Global Step: 146060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:13,229-Speed 5429.49 samples/sec Loss 3.3212 LearningRate 0.0291 Epoch: 14 Global Step: 146070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:20,726-Speed 5464.53 samples/sec Loss 3.3021 LearningRate 0.0290 Epoch: 14 Global Step: 146080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:28,276-Speed 5426.11 samples/sec Loss 3.2941 LearningRate 0.0290 Epoch: 14 Global Step: 146090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 04:00:35,725-Speed 5499.01 samples/sec Loss 3.3128 LearningRate 0.0290 Epoch: 14 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:43,160-Speed 5509.88 samples/sec Loss 3.3226 LearningRate 0.0290 Epoch: 14 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:50,622-Speed 5490.10 samples/sec Loss 3.2993 LearningRate 0.0290 Epoch: 14 Global Step: 146120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:00:58,077-Speed 5495.77 samples/sec Loss 3.2701 LearningRate 0.0290 Epoch: 14 Global Step: 146130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:05,577-Speed 5461.67 samples/sec Loss 3.2905 LearningRate 0.0290 Epoch: 14 Global Step: 146140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:13,064-Speed 5471.92 samples/sec Loss 3.2582 LearningRate 0.0290 Epoch: 14 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:20,530-Speed 5487.38 samples/sec Loss 3.3260 LearningRate 0.0290 Epoch: 14 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:28,005-Speed 5479.91 samples/sec Loss 3.2651 LearningRate 0.0290 Epoch: 14 Global Step: 146170 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-01-09 04:01:35,497-Speed 5467.90 samples/sec Loss 3.2841 LearningRate 0.0289 Epoch: 14 Global Step: 146180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:42,958-Speed 5490.64 samples/sec Loss 3.3017 LearningRate 0.0289 Epoch: 14 Global Step: 146190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:50,414-Speed 5494.46 samples/sec Loss 3.2621 LearningRate 0.0289 Epoch: 14 Global Step: 146200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:01:57,919-Speed 5458.56 samples/sec Loss 3.2507 LearningRate 0.0289 Epoch: 14 Global Step: 146210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:02:05,528-Speed 5383.71 samples/sec Loss 3.2737 LearningRate 0.0289 Epoch: 14 Global Step: 146220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:02:13,074-Speed 5429.00 samples/sec Loss 3.3255 LearningRate 0.0289 Epoch: 14 Global Step: 146230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:02:20,558-Speed 5473.50 samples/sec Loss 3.3166 LearningRate 0.0289 Epoch: 14 Global Step: 146240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:02:27,997-Speed 5507.07 samples/sec Loss 3.2369 LearningRate 0.0289 Epoch: 14 Global Step: 146250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:02:35,565-Speed 5412.81 samples/sec Loss 3.2778 LearningRate 0.0289 Epoch: 14 Global Step: 146260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:02:43,105-Speed 5432.71 samples/sec Loss 3.2951 LearningRate 0.0289 Epoch: 14 Global Step: 146270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:02:50,582-Speed 5479.14 samples/sec Loss 3.3055 LearningRate 0.0289 Epoch: 14 Global Step: 146280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:02:58,058-Speed 5479.61 samples/sec Loss 3.2507 LearningRate 0.0288 Epoch: 14 Global Step: 146290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 
2022-01-09 04:03:05,532-Speed 5481.46 samples/sec Loss 3.3003 LearningRate 0.0288 Epoch: 14 Global Step: 146300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:03:12,965-Speed 5511.10 samples/sec Loss 3.2590 LearningRate 0.0288 Epoch: 14 Global Step: 146310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:03:20,494-Speed 5441.18 samples/sec Loss 3.2875 LearningRate 0.0288 Epoch: 14 Global Step: 146320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:03:28,063-Speed 5412.38 samples/sec Loss 3.2568 LearningRate 0.0288 Epoch: 14 Global Step: 146330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:03:35,478-Speed 5524.48 samples/sec Loss 3.2656 LearningRate 0.0288 Epoch: 14 Global Step: 146340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:03:42,986-Speed 5456.46 samples/sec Loss 3.2250 LearningRate 0.0288 Epoch: 14 Global Step: 146350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:03:50,453-Speed 5486.45 samples/sec Loss 3.2988 LearningRate 0.0288 Epoch: 14 Global Step: 146360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:03:57,908-Speed 5495.63 samples/sec Loss 3.3256 LearningRate 0.0288 Epoch: 14 Global Step: 146370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:04:05,368-Speed 5490.97 samples/sec Loss 3.2759 LearningRate 0.0288 Epoch: 14 Global Step: 146380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:04:12,872-Speed 5459.23 samples/sec Loss 3.2490 LearningRate 0.0288 Epoch: 14 Global Step: 146390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:04:20,415-Speed 5430.14 samples/sec Loss 3.3011 LearningRate 0.0287 Epoch: 14 Global Step: 146400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:04:27,851-Speed 5509.51 samples/sec Loss 3.3114 LearningRate 0.0287 Epoch: 14 Global Step: 146410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:04:35,352-Speed 
5461.85 samples/sec Loss 3.2472 LearningRate 0.0287 Epoch: 14 Global Step: 146420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:04:42,890-Speed 5433.96 samples/sec Loss 3.2977 LearningRate 0.0287 Epoch: 14 Global Step: 146430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:04:50,347-Speed 5493.89 samples/sec Loss 3.2830 LearningRate 0.0287 Epoch: 14 Global Step: 146440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:04:57,850-Speed 5459.64 samples/sec Loss 3.2747 LearningRate 0.0287 Epoch: 14 Global Step: 146450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:05:05,324-Speed 5481.35 samples/sec Loss 3.2560 LearningRate 0.0287 Epoch: 14 Global Step: 146460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:05:12,910-Speed 5399.59 samples/sec Loss 3.2400 LearningRate 0.0287 Epoch: 14 Global Step: 146470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:05:20,424-Speed 5452.08 samples/sec Loss 3.2658 LearningRate 0.0287 Epoch: 14 Global Step: 146480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:05:27,875-Speed 5498.86 samples/sec Loss 3.2698 LearningRate 0.0287 Epoch: 14 Global Step: 146490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:05:35,346-Speed 5483.27 samples/sec Loss 3.2855 LearningRate 0.0286 Epoch: 14 Global Step: 146500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:05:42,844-Speed 5463.08 samples/sec Loss 3.2609 LearningRate 0.0286 Epoch: 14 Global Step: 146510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:05:50,491-Speed 5357.34 samples/sec Loss 3.3023 LearningRate 0.0286 Epoch: 14 Global Step: 146520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:05:57,913-Speed 5519.49 samples/sec Loss 3.2646 LearningRate 0.0286 Epoch: 14 Global Step: 146530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:05,392-Speed 5477.37 samples/sec Loss 3.2391 
LearningRate 0.0286 Epoch: 14 Global Step: 146540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:12,833-Speed 5505.57 samples/sec Loss 3.2539 LearningRate 0.0286 Epoch: 14 Global Step: 146550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:20,289-Speed 5493.85 samples/sec Loss 3.2305 LearningRate 0.0286 Epoch: 14 Global Step: 146560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:27,759-Speed 5484.69 samples/sec Loss 3.2471 LearningRate 0.0286 Epoch: 14 Global Step: 146570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:35,223-Speed 5488.05 samples/sec Loss 3.2496 LearningRate 0.0286 Epoch: 14 Global Step: 146580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:42,685-Speed 5490.34 samples/sec Loss 3.2496 LearningRate 0.0286 Epoch: 14 Global Step: 146590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:50,165-Speed 5476.28 samples/sec Loss 3.2463 LearningRate 0.0286 Epoch: 14 Global Step: 146600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:06:57,688-Speed 5445.75 samples/sec Loss 3.2570 LearningRate 0.0285 Epoch: 14 Global Step: 146610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 04:07:05,145-Speed 5493.52 samples/sec Loss 3.2900 LearningRate 0.0285 Epoch: 14 Global Step: 146620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:12,683-Speed 5434.29 samples/sec Loss 3.2254 LearningRate 0.0285 Epoch: 14 Global Step: 146630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:20,143-Speed 5491.14 samples/sec Loss 3.2388 LearningRate 0.0285 Epoch: 14 Global Step: 146640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:27,694-Speed 5424.84 samples/sec Loss 3.2583 LearningRate 0.0285 Epoch: 14 Global Step: 146650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:35,146-Speed 5497.92 samples/sec Loss 3.2322 LearningRate 0.0285 Epoch: 14 
Global Step: 146660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:42,709-Speed 5416.59 samples/sec Loss 3.2549 LearningRate 0.0285 Epoch: 14 Global Step: 146670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:50,187-Speed 5477.95 samples/sec Loss 3.2755 LearningRate 0.0285 Epoch: 14 Global Step: 146680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:07:57,705-Speed 5449.22 samples/sec Loss 3.2477 LearningRate 0.0285 Epoch: 14 Global Step: 146690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:05,206-Speed 5461.40 samples/sec Loss 3.2660 LearningRate 0.0285 Epoch: 14 Global Step: 146700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:12,716-Speed 5454.89 samples/sec Loss 3.2607 LearningRate 0.0285 Epoch: 14 Global Step: 146710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:20,229-Speed 5452.16 samples/sec Loss 3.2708 LearningRate 0.0284 Epoch: 14 Global Step: 146720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 04:08:27,710-Speed 5475.82 samples/sec Loss 3.2278 LearningRate 0.0284 Epoch: 14 Global Step: 146730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:35,205-Speed 5466.30 samples/sec Loss 3.2370 LearningRate 0.0284 Epoch: 14 Global Step: 146740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:42,751-Speed 5429.14 samples/sec Loss 3.2649 LearningRate 0.0284 Epoch: 14 Global Step: 146750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:50,325-Speed 5407.83 samples/sec Loss 3.2446 LearningRate 0.0284 Epoch: 14 Global Step: 146760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:08:57,833-Speed 5456.31 samples/sec Loss 3.2966 LearningRate 0.0284 Epoch: 14 Global Step: 146770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:05,391-Speed 5420.56 samples/sec Loss 3.2428 LearningRate 0.0284 Epoch: 14 Global Step: 146780 Fp16 Grad 
Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:12,868-Speed 5478.94 samples/sec Loss 3.2650 LearningRate 0.0284 Epoch: 14 Global Step: 146790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:20,362-Speed 5466.48 samples/sec Loss 3.2748 LearningRate 0.0284 Epoch: 14 Global Step: 146800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:27,818-Speed 5494.52 samples/sec Loss 3.2584 LearningRate 0.0284 Epoch: 14 Global Step: 146810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:35,286-Speed 5485.41 samples/sec Loss 3.2714 LearningRate 0.0283 Epoch: 14 Global Step: 146820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:42,783-Speed 5464.38 samples/sec Loss 3.2238 LearningRate 0.0283 Epoch: 14 Global Step: 146830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:50,478-Speed 5323.34 samples/sec Loss 3.2593 LearningRate 0.0283 Epoch: 14 Global Step: 146840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:09:58,037-Speed 5419.48 samples/sec Loss 3.2627 LearningRate 0.0283 Epoch: 14 Global Step: 146850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:10:05,566-Speed 5440.64 samples/sec Loss 3.2801 LearningRate 0.0283 Epoch: 14 Global Step: 146860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:10:12,995-Speed 5514.39 samples/sec Loss 3.2819 LearningRate 0.0283 Epoch: 14 Global Step: 146870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:10:20,465-Speed 5484.12 samples/sec Loss 3.2542 LearningRate 0.0283 Epoch: 14 Global Step: 146880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:10:27,924-Speed 5491.67 samples/sec Loss 3.2409 LearningRate 0.0283 Epoch: 14 Global Step: 146890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:10:35,416-Speed 5468.05 samples/sec Loss 3.2168 LearningRate 0.0283 Epoch: 14 Global Step: 146900 Fp16 Grad Scale: 32768 Required: 14 hours 
Training: 2022-01-09 04:10:43,032-Speed 5378.68 samples/sec Loss 3.2150 LearningRate 0.0283 Epoch: 14 Global Step: 146910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:10:50,537-Speed 5458.89 samples/sec Loss 3.2460 LearningRate 0.0283 Epoch: 14 Global Step: 146920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:10:58,114-Speed 5405.98 samples/sec Loss 3.2570 LearningRate 0.0282 Epoch: 14 Global Step: 146930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:11:05,633-Speed 5448.30 samples/sec Loss 3.2576 LearningRate 0.0282 Epoch: 14 Global Step: 146940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:11:13,159-Speed 5443.83 samples/sec Loss 3.2263 LearningRate 0.0282 Epoch: 14 Global Step: 146950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:11:20,665-Speed 5457.44 samples/sec Loss 3.2449 LearningRate 0.0282 Epoch: 14 Global Step: 146960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:11:28,163-Speed 5463.20 samples/sec Loss 3.2167 LearningRate 0.0282 Epoch: 14 Global Step: 146970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:11:35,632-Speed 5485.36 samples/sec Loss 3.2038 LearningRate 0.0282 Epoch: 14 Global Step: 146980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:11:43,063-Speed 5513.06 samples/sec Loss 3.2369 LearningRate 0.0282 Epoch: 14 Global Step: 146990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:11:50,614-Speed 5424.97 samples/sec Loss 3.2503 LearningRate 0.0282 Epoch: 14 Global Step: 147000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:11:58,175-Speed 5417.84 samples/sec Loss 3.2521 LearningRate 0.0282 Epoch: 14 Global Step: 147010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:05,701-Speed 5442.95 samples/sec Loss 3.2224 LearningRate 0.0282 Epoch: 14 Global Step: 147020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 
04:12:13,165-Speed 5488.70 samples/sec Loss 3.2248 LearningRate 0.0282 Epoch: 14 Global Step: 147030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:20,668-Speed 5459.72 samples/sec Loss 3.2207 LearningRate 0.0281 Epoch: 14 Global Step: 147040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:28,104-Speed 5509.39 samples/sec Loss 3.2667 LearningRate 0.0281 Epoch: 14 Global Step: 147050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:35,538-Speed 5510.25 samples/sec Loss 3.2480 LearningRate 0.0281 Epoch: 14 Global Step: 147060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:42,999-Speed 5491.19 samples/sec Loss 3.2632 LearningRate 0.0281 Epoch: 14 Global Step: 147070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:50,507-Speed 5456.40 samples/sec Loss 3.2427 LearningRate 0.0281 Epoch: 14 Global Step: 147080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:12:58,084-Speed 5405.78 samples/sec Loss 3.2296 LearningRate 0.0281 Epoch: 14 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:13:05,566-Speed 5475.43 samples/sec Loss 3.2649 LearningRate 0.0281 Epoch: 14 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:13:12,994-Speed 5515.35 samples/sec Loss 3.2366 LearningRate 0.0281 Epoch: 14 Global Step: 147110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:13:20,472-Speed 5478.01 samples/sec Loss 3.2256 LearningRate 0.0281 Epoch: 14 Global Step: 147120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:13:27,981-Speed 5455.38 samples/sec Loss 3.2405 LearningRate 0.0281 Epoch: 14 Global Step: 147130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:13:35,461-Speed 5476.55 samples/sec Loss 3.2450 LearningRate 0.0280 Epoch: 14 Global Step: 147140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:13:42,987-Speed 5443.94 
samples/sec Loss 3.1720 LearningRate 0.0280 Epoch: 14 Global Step: 147150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:13:50,531-Speed 5430.00 samples/sec Loss 3.2505 LearningRate 0.0280 Epoch: 14 Global Step: 147160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:13:58,042-Speed 5454.15 samples/sec Loss 3.2162 LearningRate 0.0280 Epoch: 14 Global Step: 147170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:14:05,531-Speed 5469.92 samples/sec Loss 3.1787 LearningRate 0.0280 Epoch: 14 Global Step: 147180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:14:13,038-Speed 5456.98 samples/sec Loss 3.2216 LearningRate 0.0280 Epoch: 14 Global Step: 147190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:14:20,564-Speed 5443.78 samples/sec Loss 3.2881 LearningRate 0.0280 Epoch: 14 Global Step: 147200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:14:28,062-Speed 5463.69 samples/sec Loss 3.2287 LearningRate 0.0280 Epoch: 14 Global Step: 147210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:14:35,586-Speed 5443.97 samples/sec Loss 3.2425 LearningRate 0.0280 Epoch: 14 Global Step: 147220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:14:43,086-Speed 5462.54 samples/sec Loss 3.1926 LearningRate 0.0280 Epoch: 14 Global Step: 147230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:14:50,643-Speed 5420.43 samples/sec Loss 3.2267 LearningRate 0.0280 Epoch: 14 Global Step: 147240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:14:58,080-Speed 5508.42 samples/sec Loss 3.2293 LearningRate 0.0279 Epoch: 14 Global Step: 147250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:05,620-Speed 5433.12 samples/sec Loss 3.2562 LearningRate 0.0279 Epoch: 14 Global Step: 147260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:13,105-Speed 5472.81 samples/sec Loss 3.2055 
LearningRate 0.0279 Epoch: 14 Global Step: 147270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:20,611-Speed 5458.19 samples/sec Loss 3.2041 LearningRate 0.0279 Epoch: 14 Global Step: 147280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:28,152-Speed 5432.01 samples/sec Loss 3.2147 LearningRate 0.0279 Epoch: 14 Global Step: 147290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:35,679-Speed 5442.97 samples/sec Loss 3.2193 LearningRate 0.0279 Epoch: 14 Global Step: 147300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:43,207-Speed 5441.33 samples/sec Loss 3.2031 LearningRate 0.0279 Epoch: 14 Global Step: 147310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:50,733-Speed 5443.58 samples/sec Loss 3.2323 LearningRate 0.0279 Epoch: 14 Global Step: 147320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:15:58,385-Speed 5353.44 samples/sec Loss 3.2585 LearningRate 0.0279 Epoch: 14 Global Step: 147330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:16:05,904-Speed 5448.21 samples/sec Loss 3.1961 LearningRate 0.0279 Epoch: 14 Global Step: 147340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:16:13,459-Speed 5422.10 samples/sec Loss 3.2399 LearningRate 0.0279 Epoch: 14 Global Step: 147350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:16:20,997-Speed 5434.26 samples/sec Loss 3.2355 LearningRate 0.0278 Epoch: 14 Global Step: 147360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:16:28,549-Speed 5425.10 samples/sec Loss 3.1976 LearningRate 0.0278 Epoch: 14 Global Step: 147370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:16:36,067-Speed 5448.63 samples/sec Loss 3.2093 LearningRate 0.0278 Epoch: 14 Global Step: 147380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:16:43,561-Speed 5466.36 samples/sec Loss 3.1443 LearningRate 0.0278 Epoch: 14 
Global Step: 147390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:16:51,062-Speed 5461.31 samples/sec Loss 3.1905 LearningRate 0.0278 Epoch: 14 Global Step: 147400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:16:58,642-Speed 5404.53 samples/sec Loss 3.2619 LearningRate 0.0278 Epoch: 14 Global Step: 147410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:17:06,105-Speed 5489.67 samples/sec Loss 3.2659 LearningRate 0.0278 Epoch: 14 Global Step: 147420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:17:13,661-Speed 5421.55 samples/sec Loss 3.2169 LearningRate 0.0278 Epoch: 14 Global Step: 147430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:17:21,229-Speed 5412.94 samples/sec Loss 3.1839 LearningRate 0.0278 Epoch: 14 Global Step: 147440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:17:28,673-Speed 5503.05 samples/sec Loss 3.2269 LearningRate 0.0278 Epoch: 14 Global Step: 147450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:17:36,149-Speed 5479.21 samples/sec Loss 3.2197 LearningRate 0.0278 Epoch: 14 Global Step: 147460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:17:43,625-Speed 5479.90 samples/sec Loss 3.1973 LearningRate 0.0277 Epoch: 14 Global Step: 147470 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:17:51,099-Speed 5480.82 samples/sec Loss 3.2034 LearningRate 0.0277 Epoch: 14 Global Step: 147480 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:17:58,554-Speed 5494.50 samples/sec Loss 3.2111 LearningRate 0.0277 Epoch: 14 Global Step: 147490 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:06,096-Speed 5432.09 samples/sec Loss 3.2199 LearningRate 0.0277 Epoch: 14 Global Step: 147500 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:13,551-Speed 5495.21 samples/sec Loss 3.2228 LearningRate 0.0277 Epoch: 14 Global Step: 147510 Fp16 Grad 
Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:20,988-Speed 5507.91 samples/sec Loss 3.2216 LearningRate 0.0277 Epoch: 14 Global Step: 147520 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:28,542-Speed 5423.49 samples/sec Loss 3.2383 LearningRate 0.0277 Epoch: 14 Global Step: 147530 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:36,052-Speed 5454.21 samples/sec Loss 3.1665 LearningRate 0.0277 Epoch: 14 Global Step: 147540 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:43,553-Speed 5461.68 samples/sec Loss 3.1874 LearningRate 0.0277 Epoch: 14 Global Step: 147550 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:51,082-Speed 5440.83 samples/sec Loss 3.2218 LearningRate 0.0277 Epoch: 14 Global Step: 147560 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-09 04:18:58,532-Speed 5499.11 samples/sec Loss 3.1744 LearningRate 0.0276 Epoch: 14 Global Step: 147570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:05,994-Speed 5489.65 samples/sec Loss 3.2009 LearningRate 0.0276 Epoch: 14 Global Step: 147580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:13,438-Speed 5503.12 samples/sec Loss 3.1807 LearningRate 0.0276 Epoch: 14 Global Step: 147590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:20,935-Speed 5464.88 samples/sec Loss 3.1690 LearningRate 0.0276 Epoch: 14 Global Step: 147600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:28,519-Speed 5401.04 samples/sec Loss 3.2320 LearningRate 0.0276 Epoch: 14 Global Step: 147610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:36,057-Speed 5434.70 samples/sec Loss 3.2028 LearningRate 0.0276 Epoch: 14 Global Step: 147620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:43,525-Speed 5485.29 samples/sec Loss 3.1746 LearningRate 0.0276 Epoch: 14 Global Step: 147630 Fp16 Grad Scale: 32768 Required: 14 hours 
Training: 2022-01-09 04:19:51,063-Speed 5434.95 samples/sec Loss 3.1917 LearningRate 0.0276 Epoch: 14 Global Step: 147640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:19:58,546-Speed 5474.41 samples/sec Loss 3.1927 LearningRate 0.0276 Epoch: 14 Global Step: 147650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:20:06,053-Speed 5457.03 samples/sec Loss 3.1979 LearningRate 0.0276 Epoch: 14 Global Step: 147660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:20:13,623-Speed 5411.50 samples/sec Loss 3.2121 LearningRate 0.0276 Epoch: 14 Global Step: 147670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:20:21,127-Speed 5459.21 samples/sec Loss 3.2171 LearningRate 0.0275 Epoch: 14 Global Step: 147680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:20:28,619-Speed 5468.09 samples/sec Loss 3.2113 LearningRate 0.0275 Epoch: 14 Global Step: 147690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:20:36,165-Speed 5428.30 samples/sec Loss 3.2177 LearningRate 0.0275 Epoch: 14 Global Step: 147700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:20:43,673-Speed 5456.22 samples/sec Loss 3.2298 LearningRate 0.0275 Epoch: 14 Global Step: 147710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:20:51,187-Speed 5452.29 samples/sec Loss 3.2182 LearningRate 0.0275 Epoch: 14 Global Step: 147720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:20:58,665-Speed 5477.95 samples/sec Loss 3.1551 LearningRate 0.0275 Epoch: 14 Global Step: 147730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:21:06,270-Speed 5386.50 samples/sec Loss 3.2037 LearningRate 0.0275 Epoch: 14 Global Step: 147740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:21:13,742-Speed 5482.85 samples/sec Loss 3.1839 LearningRate 0.0275 Epoch: 14 Global Step: 147750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 
04:21:21,305-Speed 5416.75 samples/sec Loss 3.2422 LearningRate 0.0275 Epoch: 14 Global Step: 147760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:21:28,791-Speed 5471.72 samples/sec Loss 3.2439 LearningRate 0.0275 Epoch: 14 Global Step: 147770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 04:21:36,305-Speed 5452.65 samples/sec Loss 3.1794 LearningRate 0.0275 Epoch: 14 Global Step: 147780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:21:43,838-Speed 5438.14 samples/sec Loss 3.2477 LearningRate 0.0274 Epoch: 14 Global Step: 147790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:21:51,463-Speed 5372.50 samples/sec Loss 3.2358 LearningRate 0.0274 Epoch: 14 Global Step: 147800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:21:59,015-Speed 5423.97 samples/sec Loss 3.1524 LearningRate 0.0274 Epoch: 14 Global Step: 147810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:06,467-Speed 5498.00 samples/sec Loss 3.1849 LearningRate 0.0274 Epoch: 14 Global Step: 147820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:13,963-Speed 5464.56 samples/sec Loss 3.1883 LearningRate 0.0274 Epoch: 14 Global Step: 147830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:21,420-Speed 5493.31 samples/sec Loss 3.1837 LearningRate 0.0274 Epoch: 14 Global Step: 147840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:28,858-Speed 5507.96 samples/sec Loss 3.1883 LearningRate 0.0274 Epoch: 14 Global Step: 147850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:36,407-Speed 5426.48 samples/sec Loss 3.2858 LearningRate 0.0274 Epoch: 14 Global Step: 147860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:43,990-Speed 5402.41 samples/sec Loss 3.2200 LearningRate 0.0274 Epoch: 14 Global Step: 147870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:51,536-Speed 5428.34 
samples/sec Loss 3.1823 LearningRate 0.0274 Epoch: 14 Global Step: 147880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:22:59,050-Speed 5452.43 samples/sec Loss 3.1987 LearningRate 0.0274 Epoch: 14 Global Step: 147890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:06,649-Speed 5390.79 samples/sec Loss 3.2102 LearningRate 0.0273 Epoch: 14 Global Step: 147900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:14,153-Speed 5459.01 samples/sec Loss 3.1636 LearningRate 0.0273 Epoch: 14 Global Step: 147910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:21,692-Speed 5433.66 samples/sec Loss 3.1979 LearningRate 0.0273 Epoch: 14 Global Step: 147920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:29,163-Speed 5483.43 samples/sec Loss 3.1929 LearningRate 0.0273 Epoch: 14 Global Step: 147930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:36,698-Speed 5436.31 samples/sec Loss 3.1377 LearningRate 0.0273 Epoch: 14 Global Step: 147940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:44,199-Speed 5461.69 samples/sec Loss 3.1630 LearningRate 0.0273 Epoch: 14 Global Step: 147950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:51,849-Speed 5355.17 samples/sec Loss 3.2197 LearningRate 0.0273 Epoch: 14 Global Step: 147960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:23:59,485-Speed 5364.41 samples/sec Loss 3.2178 LearningRate 0.0273 Epoch: 14 Global Step: 147970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:24:07,150-Speed 5344.65 samples/sec Loss 3.1995 LearningRate 0.0273 Epoch: 14 Global Step: 147980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:24:14,682-Speed 5438.47 samples/sec Loss 3.1938 LearningRate 0.0273 Epoch: 14 Global Step: 147990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:24:22,289-Speed 5385.60 samples/sec Loss 3.1860 
LearningRate 0.0273 Epoch: 14 Global Step: 148000 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:25:05,994-[lfw][148000]XNorm: 23.198030
Training: 2022-01-09 04:25:05,995-[lfw][148000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-01-09 04:25:05,995-[lfw][148000]Accuracy-Highest: 0.99817
Training: 2022-01-09 04:25:57,120-[cfp_fp][148000]XNorm: 21.742158
Training: 2022-01-09 04:25:57,121-[cfp_fp][148000]Accuracy-Flip: 0.99229+-0.00333
Training: 2022-01-09 04:25:57,121-[cfp_fp][148000]Accuracy-Highest: 0.99271
Training: 2022-01-09 04:26:41,007-[agedb_30][148000]XNorm: 23.122083
Training: 2022-01-09 04:26:41,008-[agedb_30][148000]Accuracy-Flip: 0.98150+-0.00677
Training: 2022-01-09 04:26:41,008-[agedb_30][148000]Accuracy-Highest: 0.98150
Training: 2022-01-09 04:26:48,582-Speed 279.99 samples/sec Loss 3.1586 LearningRate 0.0272 Epoch: 14 Global Step: 148010 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:26:56,092-Speed 5454.84 samples/sec Loss 3.2374 LearningRate 0.0272 Epoch: 14 Global Step: 148020 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:27:03,721-Speed 5370.02 samples/sec Loss 3.1988 LearningRate 0.0272 Epoch: 14 Global Step: 148030 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:27:11,280-Speed 5419.66 samples/sec Loss 3.1977 LearningRate 0.0272 Epoch: 14 Global Step: 148040 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:27:18,796-Speed 5450.55 samples/sec Loss 3.1662 LearningRate 0.0272 Epoch: 14 Global Step: 148050 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:27:26,318-Speed 5445.59 samples/sec Loss 3.2259 LearningRate 0.0272 Epoch: 14 Global Step: 148060 Fp16 Grad Scale: 32768 Required: 14 hours
Training: 2022-01-09 04:27:33,757-Speed 5506.85 samples/sec Loss 3.1468 LearningRate 0.0272 Epoch: 14 Global Step: 148070 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-01-09 04:27:41,303-Speed 5428.92 samples/sec Loss 3.1570 LearningRate
0.0272 Epoch: 14 Global Step: 148080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:27:48,820-Speed 5449.42 samples/sec Loss 3.1679 LearningRate 0.0272 Epoch: 14 Global Step: 148090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:27:56,296-Speed 5479.56 samples/sec Loss 3.1545 LearningRate 0.0272 Epoch: 14 Global Step: 148100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:03,797-Speed 5461.40 samples/sec Loss 3.1549 LearningRate 0.0272 Epoch: 14 Global Step: 148110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:11,243-Speed 5501.85 samples/sec Loss 3.1562 LearningRate 0.0271 Epoch: 14 Global Step: 148120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:18,739-Speed 5464.45 samples/sec Loss 3.1950 LearningRate 0.0271 Epoch: 14 Global Step: 148130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:26,322-Speed 5402.43 samples/sec Loss 3.1622 LearningRate 0.0271 Epoch: 14 Global Step: 148140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:33,998-Speed 5336.75 samples/sec Loss 3.1823 LearningRate 0.0271 Epoch: 14 Global Step: 148150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:41,443-Speed 5502.48 samples/sec Loss 3.1342 LearningRate 0.0271 Epoch: 14 Global Step: 148160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:28:48,984-Speed 5432.38 samples/sec Loss 3.1938 LearningRate 0.0271 Epoch: 14 Global Step: 148170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 04:28:56,403-Speed 5522.00 samples/sec Loss 3.1769 LearningRate 0.0271 Epoch: 14 Global Step: 148180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-09 04:29:03,868-Speed 5487.96 samples/sec Loss 3.1954 LearningRate 0.0271 Epoch: 14 Global Step: 148190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:11,340-Speed 5481.94 samples/sec Loss 3.1845 LearningRate 0.0271 Epoch: 14 Global Step: 
148200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:18,853-Speed 5452.83 samples/sec Loss 3.1745 LearningRate 0.0271 Epoch: 14 Global Step: 148210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:26,272-Speed 5521.47 samples/sec Loss 3.1972 LearningRate 0.0271 Epoch: 14 Global Step: 148220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:33,780-Speed 5456.62 samples/sec Loss 3.1978 LearningRate 0.0270 Epoch: 14 Global Step: 148230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:41,382-Speed 5388.28 samples/sec Loss 3.1679 LearningRate 0.0270 Epoch: 14 Global Step: 148240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:48,911-Speed 5441.21 samples/sec Loss 3.2175 LearningRate 0.0270 Epoch: 14 Global Step: 148250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:29:56,430-Speed 5448.64 samples/sec Loss 3.1814 LearningRate 0.0270 Epoch: 14 Global Step: 148260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:30:03,903-Speed 5481.44 samples/sec Loss 3.1460 LearningRate 0.0270 Epoch: 14 Global Step: 148270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:30:11,433-Speed 5440.20 samples/sec Loss 3.2015 LearningRate 0.0270 Epoch: 14 Global Step: 148280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:30:18,908-Speed 5480.56 samples/sec Loss 3.1495 LearningRate 0.0270 Epoch: 14 Global Step: 148290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:30:26,474-Speed 5414.19 samples/sec Loss 3.1951 LearningRate 0.0270 Epoch: 14 Global Step: 148300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:30:33,935-Speed 5490.95 samples/sec Loss 3.1552 LearningRate 0.0270 Epoch: 14 Global Step: 148310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:30:41,479-Speed 5429.90 samples/sec Loss 3.1424 LearningRate 0.0270 Epoch: 14 Global Step: 148320 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-01-09 04:30:49,106-Speed 5371.32 samples/sec Loss 3.1654 LearningRate 0.0270 Epoch: 14 Global Step: 148330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:30:56,531-Speed 5517.00 samples/sec Loss 3.1815 LearningRate 0.0269 Epoch: 14 Global Step: 148340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:03,991-Speed 5491.18 samples/sec Loss 3.1822 LearningRate 0.0269 Epoch: 14 Global Step: 148350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:11,586-Speed 5394.22 samples/sec Loss 3.1471 LearningRate 0.0269 Epoch: 14 Global Step: 148360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:19,002-Speed 5523.15 samples/sec Loss 3.2062 LearningRate 0.0269 Epoch: 14 Global Step: 148370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:26,470-Speed 5485.97 samples/sec Loss 3.2147 LearningRate 0.0269 Epoch: 14 Global Step: 148380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:33,952-Speed 5474.87 samples/sec Loss 3.1807 LearningRate 0.0269 Epoch: 14 Global Step: 148390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:41,449-Speed 5464.24 samples/sec Loss 3.1914 LearningRate 0.0269 Epoch: 14 Global Step: 148400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:48,912-Speed 5489.72 samples/sec Loss 3.1464 LearningRate 0.0269 Epoch: 14 Global Step: 148410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:31:56,465-Speed 5423.52 samples/sec Loss 3.1286 LearningRate 0.0269 Epoch: 14 Global Step: 148420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:32:04,022-Speed 5420.54 samples/sec Loss 3.1671 LearningRate 0.0269 Epoch: 14 Global Step: 148430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-09 04:32:11,578-Speed 5421.63 samples/sec Loss 3.1613 LearningRate 0.0269 Epoch: 14 Global Step: 148440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-01-09 04:32:19,037-Speed 5492.27 samples/sec Loss 3.1407 LearningRate 0.0268 Epoch: 14 Global Step: 148450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:32:26,549-Speed 5453.05 samples/sec Loss 3.1510 LearningRate 0.0268 Epoch: 14 Global Step: 148460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-09 04:32:34,018-Speed 5484.61 samples/sec Loss 3.1396 LearningRate 0.0268 Epoch: 14 Global Step: 148470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:32:41,500-Speed 5475.18 samples/sec Loss 3.1706 LearningRate 0.0268 Epoch: 14 Global Step: 148480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:32:48,947-Speed 5501.10 samples/sec Loss 3.1487 LearningRate 0.0268 Epoch: 14 Global Step: 148490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:32:56,518-Speed 5410.66 samples/sec Loss 3.1265 LearningRate 0.0268 Epoch: 14 Global Step: 148500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:33:04,025-Speed 5472.00 samples/sec Loss 3.1718 LearningRate 0.0268 Epoch: 14 Global Step: 148510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:33:11,477-Speed 5497.38 samples/sec Loss 3.1728 LearningRate 0.0268 Epoch: 14 Global Step: 148520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:33:18,970-Speed 5467.23 samples/sec Loss 3.1281 LearningRate 0.0268 Epoch: 14 Global Step: 148530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:33:26,498-Speed 5441.50 samples/sec Loss 3.1848 LearningRate 0.0268 Epoch: 14 Global Step: 148540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:33:34,150-Speed 5353.63 samples/sec Loss 3.1854 LearningRate 0.0268 Epoch: 14 Global Step: 148550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:33:41,754-Speed 5387.17 samples/sec Loss 3.1117 LearningRate 0.0267 Epoch: 14 Global Step: 148560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:33:49,343-Speed 
5398.53 samples/sec Loss 3.1595 LearningRate 0.0267 Epoch: 14 Global Step: 148570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:33:56,847-Speed 5459.03 samples/sec Loss 3.1756 LearningRate 0.0267 Epoch: 14 Global Step: 148580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:04,397-Speed 5425.97 samples/sec Loss 3.1288 LearningRate 0.0267 Epoch: 14 Global Step: 148590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:11,886-Speed 5470.25 samples/sec Loss 3.0976 LearningRate 0.0267 Epoch: 14 Global Step: 148600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:19,365-Speed 5477.67 samples/sec Loss 3.1288 LearningRate 0.0267 Epoch: 14 Global Step: 148610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:26,890-Speed 5443.84 samples/sec Loss 3.1687 LearningRate 0.0267 Epoch: 14 Global Step: 148620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:34,347-Speed 5493.77 samples/sec Loss 3.1520 LearningRate 0.0267 Epoch: 14 Global Step: 148630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:41,936-Speed 5397.86 samples/sec Loss 3.1390 LearningRate 0.0267 Epoch: 14 Global Step: 148640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:49,461-Speed 5443.59 samples/sec Loss 3.2110 LearningRate 0.0267 Epoch: 14 Global Step: 148650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:34:57,001-Speed 5433.80 samples/sec Loss 3.1736 LearningRate 0.0267 Epoch: 14 Global Step: 148660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:35:04,442-Speed 5505.18 samples/sec Loss 3.1353 LearningRate 0.0266 Epoch: 14 Global Step: 148670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:35:11,953-Speed 5454.16 samples/sec Loss 3.1034 LearningRate 0.0266 Epoch: 14 Global Step: 148680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:35:19,527-Speed 5408.49 samples/sec Loss 3.1635 
LearningRate 0.0266 Epoch: 14 Global Step: 148690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:35:26,993-Speed 5487.01 samples/sec Loss 3.1312 LearningRate 0.0266 Epoch: 14 Global Step: 148700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:35:34,543-Speed 5426.36 samples/sec Loss 3.1461 LearningRate 0.0266 Epoch: 14 Global Step: 148710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:35:42,030-Speed 5471.18 samples/sec Loss 3.1632 LearningRate 0.0266 Epoch: 14 Global Step: 148720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:35:49,506-Speed 5479.57 samples/sec Loss 3.1531 LearningRate 0.0266 Epoch: 14 Global Step: 148730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:35:57,132-Speed 5372.32 samples/sec Loss 3.1587 LearningRate 0.0266 Epoch: 14 Global Step: 148740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:04,634-Speed 5460.42 samples/sec Loss 3.1827 LearningRate 0.0266 Epoch: 14 Global Step: 148750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:12,127-Speed 5467.54 samples/sec Loss 3.1179 LearningRate 0.0266 Epoch: 14 Global Step: 148760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:19,611-Speed 5473.55 samples/sec Loss 3.1606 LearningRate 0.0266 Epoch: 14 Global Step: 148770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:27,125-Speed 5452.20 samples/sec Loss 3.1507 LearningRate 0.0265 Epoch: 14 Global Step: 148780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:34,635-Speed 5454.50 samples/sec Loss 3.1970 LearningRate 0.0265 Epoch: 14 Global Step: 148790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:42,115-Speed 5477.28 samples/sec Loss 3.1699 LearningRate 0.0265 Epoch: 14 Global Step: 148800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:36:49,620-Speed 5458.31 samples/sec Loss 3.1318 LearningRate 0.0265 Epoch: 14 
Global Step: 148810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:36:57,152-Speed 5439.08 samples/sec Loss 3.1218 LearningRate 0.0265 Epoch: 14 Global Step: 148820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:04,656-Speed 5459.32 samples/sec Loss 3.1181 LearningRate 0.0265 Epoch: 14 Global Step: 148830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:12,184-Speed 5441.54 samples/sec Loss 3.1332 LearningRate 0.0265 Epoch: 14 Global Step: 148840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:19,673-Speed 5470.44 samples/sec Loss 3.1166 LearningRate 0.0265 Epoch: 14 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:27,157-Speed 5473.18 samples/sec Loss 3.1184 LearningRate 0.0265 Epoch: 14 Global Step: 148860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:34,626-Speed 5485.12 samples/sec Loss 3.1249 LearningRate 0.0265 Epoch: 14 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:42,130-Speed 5459.42 samples/sec Loss 3.0967 LearningRate 0.0265 Epoch: 14 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:37:49,708-Speed 5405.88 samples/sec Loss 3.1688 LearningRate 0.0264 Epoch: 14 Global Step: 148890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:37:57,264-Speed 5421.18 samples/sec Loss 3.1222 LearningRate 0.0264 Epoch: 14 Global Step: 148900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:04,900-Speed 5365.45 samples/sec Loss 3.1422 LearningRate 0.0264 Epoch: 14 Global Step: 148910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:12,450-Speed 5425.65 samples/sec Loss 3.1483 LearningRate 0.0264 Epoch: 14 Global Step: 148920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:19,984-Speed 5437.10 samples/sec Loss 3.1241 LearningRate 0.0264 Epoch: 14 Global Step: 148930 Fp16 Grad 
Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:27,439-Speed 5495.58 samples/sec Loss 3.1518 LearningRate 0.0264 Epoch: 14 Global Step: 148940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:34,931-Speed 5467.76 samples/sec Loss 3.1293 LearningRate 0.0264 Epoch: 14 Global Step: 148950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:42,532-Speed 5389.31 samples/sec Loss 3.1261 LearningRate 0.0264 Epoch: 14 Global Step: 148960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:50,048-Speed 5450.81 samples/sec Loss 3.1270 LearningRate 0.0264 Epoch: 14 Global Step: 148970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:38:57,728-Speed 5333.89 samples/sec Loss 3.1344 LearningRate 0.0264 Epoch: 14 Global Step: 148980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:39:05,202-Speed 5480.47 samples/sec Loss 3.1331 LearningRate 0.0264 Epoch: 14 Global Step: 148990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:39:12,776-Speed 5409.18 samples/sec Loss 3.1167 LearningRate 0.0263 Epoch: 14 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:39:20,300-Speed 5444.14 samples/sec Loss 3.0583 LearningRate 0.0263 Epoch: 14 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:39:27,823-Speed 5445.60 samples/sec Loss 3.1577 LearningRate 0.0263 Epoch: 14 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:39:35,381-Speed 5420.22 samples/sec Loss 3.1246 LearningRate 0.0263 Epoch: 14 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:39:42,920-Speed 5433.58 samples/sec Loss 3.0924 LearningRate 0.0263 Epoch: 14 Global Step: 149040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:39:50,419-Speed 5462.53 samples/sec Loss 3.1168 LearningRate 0.0263 Epoch: 14 Global Step: 149050 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-01-09 04:39:57,971-Speed 5425.08 samples/sec Loss 3.0969 LearningRate 0.0263 Epoch: 14 Global Step: 149060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:40:05,520-Speed 5426.45 samples/sec Loss 3.1213 LearningRate 0.0263 Epoch: 14 Global Step: 149070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:40:13,041-Speed 5446.89 samples/sec Loss 3.1291 LearningRate 0.0263 Epoch: 14 Global Step: 149080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:40:20,463-Speed 5519.35 samples/sec Loss 3.1398 LearningRate 0.0263 Epoch: 14 Global Step: 149090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:40:27,984-Speed 5446.92 samples/sec Loss 3.1116 LearningRate 0.0263 Epoch: 14 Global Step: 149100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:40:35,403-Speed 5521.60 samples/sec Loss 3.1303 LearningRate 0.0262 Epoch: 14 Global Step: 149110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:40:42,942-Speed 5434.07 samples/sec Loss 3.1050 LearningRate 0.0262 Epoch: 14 Global Step: 149120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:40:50,486-Speed 5429.70 samples/sec Loss 3.1273 LearningRate 0.0262 Epoch: 14 Global Step: 149130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:40:57,971-Speed 5473.54 samples/sec Loss 3.1280 LearningRate 0.0262 Epoch: 14 Global Step: 149140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:41:05,394-Speed 5518.85 samples/sec Loss 3.1363 LearningRate 0.0262 Epoch: 14 Global Step: 149150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:41:12,910-Speed 5450.20 samples/sec Loss 3.0962 LearningRate 0.0262 Epoch: 14 Global Step: 149160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:41:20,484-Speed 5408.53 samples/sec Loss 3.1245 LearningRate 0.0262 Epoch: 14 Global Step: 149170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 
04:41:27,926-Speed 5505.13 samples/sec Loss 3.1134 LearningRate 0.0262 Epoch: 14 Global Step: 149180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:41:35,447-Speed 5446.51 samples/sec Loss 3.1600 LearningRate 0.0262 Epoch: 14 Global Step: 149190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:41:42,915-Speed 5485.25 samples/sec Loss 3.1058 LearningRate 0.0262 Epoch: 14 Global Step: 149200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:41:50,468-Speed 5424.43 samples/sec Loss 3.1269 LearningRate 0.0262 Epoch: 14 Global Step: 149210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:41:57,969-Speed 5461.13 samples/sec Loss 3.0887 LearningRate 0.0261 Epoch: 14 Global Step: 149220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:05,428-Speed 5492.38 samples/sec Loss 3.0943 LearningRate 0.0261 Epoch: 14 Global Step: 149230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:12,964-Speed 5435.43 samples/sec Loss 3.1420 LearningRate 0.0261 Epoch: 14 Global Step: 149240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:20,430-Speed 5487.13 samples/sec Loss 3.1049 LearningRate 0.0261 Epoch: 14 Global Step: 149250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:28,104-Speed 5338.27 samples/sec Loss 3.1205 LearningRate 0.0261 Epoch: 14 Global Step: 149260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:35,598-Speed 5466.57 samples/sec Loss 3.1386 LearningRate 0.0261 Epoch: 14 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:43,206-Speed 5384.55 samples/sec Loss 3.1404 LearningRate 0.0261 Epoch: 14 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:50,780-Speed 5408.79 samples/sec Loss 3.0897 LearningRate 0.0261 Epoch: 14 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:42:58,266-Speed 5471.92 
samples/sec Loss 3.1276 LearningRate 0.0261 Epoch: 14 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:05,832-Speed 5415.06 samples/sec Loss 3.1310 LearningRate 0.0261 Epoch: 14 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:13,396-Speed 5415.74 samples/sec Loss 3.1199 LearningRate 0.0261 Epoch: 14 Global Step: 149320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:20,893-Speed 5463.92 samples/sec Loss 3.1088 LearningRate 0.0260 Epoch: 14 Global Step: 149330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:28,499-Speed 5386.03 samples/sec Loss 3.0875 LearningRate 0.0260 Epoch: 14 Global Step: 149340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:36,011-Speed 5453.60 samples/sec Loss 3.0945 LearningRate 0.0260 Epoch: 14 Global Step: 149350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:43,599-Speed 5398.19 samples/sec Loss 3.1238 LearningRate 0.0260 Epoch: 14 Global Step: 149360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:51,089-Speed 5469.41 samples/sec Loss 3.1467 LearningRate 0.0260 Epoch: 14 Global Step: 149370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:43:58,610-Speed 5446.47 samples/sec Loss 3.1536 LearningRate 0.0260 Epoch: 14 Global Step: 149380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:06,189-Speed 5405.22 samples/sec Loss 3.1121 LearningRate 0.0260 Epoch: 14 Global Step: 149390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:13,724-Speed 5437.12 samples/sec Loss 3.0825 LearningRate 0.0260 Epoch: 14 Global Step: 149400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-09 04:44:21,235-Speed 5453.20 samples/sec Loss 3.0867 LearningRate 0.0260 Epoch: 14 Global Step: 149410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:28,780-Speed 5429.55 samples/sec Loss 3.0950 
LearningRate 0.0260 Epoch: 14 Global Step: 149420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:36,304-Speed 5445.18 samples/sec Loss 3.0877 LearningRate 0.0260 Epoch: 14 Global Step: 149430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:43,789-Speed 5473.12 samples/sec Loss 3.0989 LearningRate 0.0259 Epoch: 14 Global Step: 149440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:51,316-Speed 5442.04 samples/sec Loss 3.0866 LearningRate 0.0259 Epoch: 14 Global Step: 149450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:44:58,841-Speed 5443.26 samples/sec Loss 3.1097 LearningRate 0.0259 Epoch: 14 Global Step: 149460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:45:06,379-Speed 5435.16 samples/sec Loss 3.1168 LearningRate 0.0259 Epoch: 14 Global Step: 149470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:45:13,942-Speed 5416.45 samples/sec Loss 3.0983 LearningRate 0.0259 Epoch: 14 Global Step: 149480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:45:21,454-Speed 5453.17 samples/sec Loss 3.0872 LearningRate 0.0259 Epoch: 14 Global Step: 149490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:45:28,959-Speed 5458.32 samples/sec Loss 3.0907 LearningRate 0.0259 Epoch: 14 Global Step: 149500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:45:36,507-Speed 5427.99 samples/sec Loss 3.0796 LearningRate 0.0259 Epoch: 14 Global Step: 149510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:45:44,050-Speed 5430.98 samples/sec Loss 3.0835 LearningRate 0.0259 Epoch: 14 Global Step: 149520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:45:51,633-Speed 5401.70 samples/sec Loss 3.1351 LearningRate 0.0259 Epoch: 14 Global Step: 149530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:45:59,147-Speed 5452.17 samples/sec Loss 3.1217 LearningRate 0.0259 Epoch: 14 
Global Step: 149540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:46:06,637-Speed 5469.88 samples/sec Loss 3.1181 LearningRate 0.0258 Epoch: 14 Global Step: 149550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:46:14,178-Speed 5432.02 samples/sec Loss 3.0910 LearningRate 0.0258 Epoch: 14 Global Step: 149560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:46:21,611-Speed 5511.36 samples/sec Loss 3.1088 LearningRate 0.0258 Epoch: 14 Global Step: 149570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:46:29,108-Speed 5463.85 samples/sec Loss 3.1216 LearningRate 0.0258 Epoch: 14 Global Step: 149580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:46:36,614-Speed 5458.42 samples/sec Loss 3.0517 LearningRate 0.0258 Epoch: 14 Global Step: 149590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:46:44,173-Speed 5419.63 samples/sec Loss 3.1067 LearningRate 0.0258 Epoch: 14 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:46:51,612-Speed 5506.89 samples/sec Loss 3.0836 LearningRate 0.0258 Epoch: 14 Global Step: 149610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:46:59,118-Speed 5457.05 samples/sec Loss 3.1149 LearningRate 0.0258 Epoch: 14 Global Step: 149620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:47:06,605-Speed 5472.14 samples/sec Loss 3.0769 LearningRate 0.0258 Epoch: 14 Global Step: 149630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:47:14,136-Speed 5439.22 samples/sec Loss 3.1122 LearningRate 0.0258 Epoch: 14 Global Step: 149640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:47:21,627-Speed 5468.88 samples/sec Loss 3.0800 LearningRate 0.0258 Epoch: 14 Global Step: 149650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:47:29,085-Speed 5492.69 samples/sec Loss 3.1079 LearningRate 0.0258 Epoch: 14 Global Step: 149660 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-01-09 04:47:36,550-Speed 5488.04 samples/sec Loss 3.0645 LearningRate 0.0257 Epoch: 14 Global Step: 149670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:47:44,003-Speed 5496.56 samples/sec Loss 3.1258 LearningRate 0.0257 Epoch: 14 Global Step: 149680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:47:51,520-Speed 5449.75 samples/sec Loss 3.0993 LearningRate 0.0257 Epoch: 14 Global Step: 149690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:47:59,008-Speed 5470.64 samples/sec Loss 3.0804 LearningRate 0.0257 Epoch: 14 Global Step: 149700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:06,578-Speed 5411.77 samples/sec Loss 3.0748 LearningRate 0.0257 Epoch: 14 Global Step: 149710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:14,135-Speed 5421.43 samples/sec Loss 3.0703 LearningRate 0.0257 Epoch: 14 Global Step: 149720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:21,567-Speed 5511.73 samples/sec Loss 3.0930 LearningRate 0.0257 Epoch: 14 Global Step: 149730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:29,096-Speed 5440.16 samples/sec Loss 3.0914 LearningRate 0.0257 Epoch: 14 Global Step: 149740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:36,652-Speed 5437.00 samples/sec Loss 3.0484 LearningRate 0.0257 Epoch: 14 Global Step: 149750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:46,257-Speed 5634.80 samples/sec Loss 3.0867 LearningRate 0.0257 Epoch: 14 Global Step: 149760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:48:53,799-Speed 5431.87 samples/sec Loss 3.1001 LearningRate 0.0257 Epoch: 14 Global Step: 149770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:49:01,307-Speed 5455.93 samples/sec Loss 3.0437 LearningRate 0.0256 Epoch: 14 Global Step: 149780 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-01-09 04:49:08,735-Speed 5515.74 samples/sec Loss 3.1309 LearningRate 0.0256 Epoch: 14 Global Step: 149790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:49:16,309-Speed 5408.31 samples/sec Loss 3.0607 LearningRate 0.0256 Epoch: 14 Global Step: 149800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:49:23,985-Speed 5337.00 samples/sec Loss 3.0474 LearningRate 0.0256 Epoch: 14 Global Step: 149810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:49:31,559-Speed 5408.75 samples/sec Loss 3.0946 LearningRate 0.0256 Epoch: 14 Global Step: 149820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:49:39,057-Speed 5463.93 samples/sec Loss 3.1186 LearningRate 0.0256 Epoch: 14 Global Step: 149830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:49:46,638-Speed 5403.48 samples/sec Loss 3.1036 LearningRate 0.0256 Epoch: 14 Global Step: 149840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:49:54,138-Speed 5462.30 samples/sec Loss 3.0799 LearningRate 0.0256 Epoch: 14 Global Step: 149850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:01,722-Speed 5401.74 samples/sec Loss 3.0784 LearningRate 0.0256 Epoch: 14 Global Step: 149860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:09,169-Speed 5500.49 samples/sec Loss 3.1115 LearningRate 0.0256 Epoch: 14 Global Step: 149870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:16,589-Speed 5521.42 samples/sec Loss 3.0575 LearningRate 0.0256 Epoch: 14 Global Step: 149880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:24,111-Speed 5445.86 samples/sec Loss 3.0867 LearningRate 0.0255 Epoch: 14 Global Step: 149890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:31,668-Speed 5420.78 samples/sec Loss 3.0571 LearningRate 0.0255 Epoch: 14 Global Step: 149900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 
04:50:39,197-Speed 5440.99 samples/sec Loss 3.0951 LearningRate 0.0255 Epoch: 14 Global Step: 149910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:46,833-Speed 5365.32 samples/sec Loss 3.0739 LearningRate 0.0255 Epoch: 14 Global Step: 149920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:50:54,384-Speed 5424.50 samples/sec Loss 3.0951 LearningRate 0.0255 Epoch: 14 Global Step: 149930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:01,890-Speed 5457.48 samples/sec Loss 3.0795 LearningRate 0.0255 Epoch: 14 Global Step: 149940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:09,445-Speed 5422.93 samples/sec Loss 3.0919 LearningRate 0.0255 Epoch: 14 Global Step: 149950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:16,975-Speed 5440.31 samples/sec Loss 3.1340 LearningRate 0.0255 Epoch: 14 Global Step: 149960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:24,561-Speed 5399.88 samples/sec Loss 3.0592 LearningRate 0.0255 Epoch: 14 Global Step: 149970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:32,059-Speed 5463.28 samples/sec Loss 3.0804 LearningRate 0.0255 Epoch: 14 Global Step: 149980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:39,584-Speed 5444.07 samples/sec Loss 3.0704 LearningRate 0.0255 Epoch: 14 Global Step: 149990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:51:47,021-Speed 5508.37 samples/sec Loss 3.0772 LearningRate 0.0254 Epoch: 14 Global Step: 150000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:52:30,981-[lfw][150000]XNorm: 23.833620 Training: 2022-01-09 04:52:30,981-[lfw][150000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-01-09 04:52:30,982-[lfw][150000]Accuracy-Highest: 0.99817 Training: 2022-01-09 04:53:22,280-[cfp_fp][150000]XNorm: 22.481064 Training: 2022-01-09 04:53:22,281-[cfp_fp][150000]Accuracy-Flip: 0.99314+-0.00360 Training: 
2022-01-09 04:53:22,281-[cfp_fp][150000]Accuracy-Highest: 0.99314 Training: 2022-01-09 04:54:06,466-[agedb_30][150000]XNorm: 23.921558 Training: 2022-01-09 04:54:06,467-[agedb_30][150000]Accuracy-Flip: 0.98017+-0.00790 Training: 2022-01-09 04:54:06,468-[agedb_30][150000]Accuracy-Highest: 0.98150 Training: 2022-01-09 04:54:14,083-Speed 278.52 samples/sec Loss 3.0471 LearningRate 0.0254 Epoch: 14 Global Step: 150010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:54:21,553-Speed 5484.20 samples/sec Loss 3.0723 LearningRate 0.0254 Epoch: 14 Global Step: 150020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:54:29,038-Speed 5473.18 samples/sec Loss 3.0821 LearningRate 0.0254 Epoch: 14 Global Step: 150030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:54:36,452-Speed 5525.18 samples/sec Loss 3.0581 LearningRate 0.0254 Epoch: 14 Global Step: 150040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:54:43,942-Speed 5468.92 samples/sec Loss 3.0515 LearningRate 0.0254 Epoch: 14 Global Step: 150050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:54:51,422-Speed 5477.00 samples/sec Loss 3.0960 LearningRate 0.0254 Epoch: 14 Global Step: 150060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:54:58,995-Speed 5409.36 samples/sec Loss 3.0715 LearningRate 0.0254 Epoch: 14 Global Step: 150070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:55:06,509-Speed 5451.81 samples/sec Loss 3.1050 LearningRate 0.0254 Epoch: 14 Global Step: 150080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:55:13,973-Speed 5488.38 samples/sec Loss 3.0532 LearningRate 0.0254 Epoch: 14 Global Step: 150090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:55:21,394-Speed 5520.31 samples/sec Loss 3.0820 LearningRate 0.0254 Epoch: 14 Global Step: 150100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:55:28,967-Speed 5410.27 samples/sec 
Loss 3.0679 LearningRate 0.0254 Epoch: 14 Global Step: 150110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:55:36,614-Speed 5356.59 samples/sec Loss 3.0818 LearningRate 0.0253 Epoch: 14 Global Step: 150120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:55:44,111-Speed 5464.20 samples/sec Loss 3.1001 LearningRate 0.0253 Epoch: 14 Global Step: 150130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:55:51,706-Speed 5393.46 samples/sec Loss 3.0691 LearningRate 0.0253 Epoch: 14 Global Step: 150140 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:55:59,159-Speed 5497.19 samples/sec Loss 3.0406 LearningRate 0.0253 Epoch: 14 Global Step: 150150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:56:06,660-Speed 5460.99 samples/sec Loss 3.1056 LearningRate 0.0253 Epoch: 14 Global Step: 150160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:56:14,163-Speed 5459.23 samples/sec Loss 3.1224 LearningRate 0.0253 Epoch: 14 Global Step: 150170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:56:21,674-Speed 5454.42 samples/sec Loss 3.0772 LearningRate 0.0253 Epoch: 14 Global Step: 150180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:56:29,208-Speed 5437.93 samples/sec Loss 3.0622 LearningRate 0.0253 Epoch: 14 Global Step: 150190 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:56:36,670-Speed 5489.69 samples/sec Loss 3.0940 LearningRate 0.0253 Epoch: 14 Global Step: 150200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 04:56:44,260-Speed 5397.20 samples/sec Loss 3.0641 LearningRate 0.0253 Epoch: 14 Global Step: 150210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:56:51,798-Speed 5434.31 samples/sec Loss 3.1085 LearningRate 0.0253 Epoch: 14 Global Step: 150220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:56:59,326-Speed 5441.89 samples/sec Loss 3.0595 LearningRate 0.0252 
Epoch: 14 Global Step: 150230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:06,859-Speed 5438.46 samples/sec Loss 3.0617 LearningRate 0.0252 Epoch: 14 Global Step: 150240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:14,359-Speed 5461.76 samples/sec Loss 3.0449 LearningRate 0.0252 Epoch: 14 Global Step: 150250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:21,968-Speed 5383.56 samples/sec Loss 3.0767 LearningRate 0.0252 Epoch: 14 Global Step: 150260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:29,466-Speed 5463.28 samples/sec Loss 3.0892 LearningRate 0.0252 Epoch: 14 Global Step: 150270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:36,968-Speed 5461.07 samples/sec Loss 3.0478 LearningRate 0.0252 Epoch: 14 Global Step: 150280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:44,544-Speed 5406.86 samples/sec Loss 3.0734 LearningRate 0.0252 Epoch: 14 Global Step: 150290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:52,118-Speed 5408.44 samples/sec Loss 3.0408 LearningRate 0.0252 Epoch: 14 Global Step: 150300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:57:59,628-Speed 5454.48 samples/sec Loss 3.0705 LearningRate 0.0252 Epoch: 14 Global Step: 150310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:58:07,077-Speed 5500.50 samples/sec Loss 3.0836 LearningRate 0.0252 Epoch: 14 Global Step: 150320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:58:14,552-Speed 5479.89 samples/sec Loss 3.0720 LearningRate 0.0252 Epoch: 14 Global Step: 150330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:58:22,039-Speed 5471.23 samples/sec Loss 3.0432 LearningRate 0.0251 Epoch: 14 Global Step: 150340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:58:29,529-Speed 5469.44 samples/sec Loss 3.0507 LearningRate 0.0251 Epoch: 14 Global Step: 150350 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:58:36,976-Speed 5501.84 samples/sec Loss 3.0633 LearningRate 0.0251 Epoch: 14 Global Step: 150360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:58:44,416-Speed 5505.30 samples/sec Loss 3.0978 LearningRate 0.0251 Epoch: 14 Global Step: 150370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:58:51,978-Speed 5417.98 samples/sec Loss 3.0887 LearningRate 0.0251 Epoch: 14 Global Step: 150380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:58:59,575-Speed 5392.26 samples/sec Loss 3.0450 LearningRate 0.0251 Epoch: 14 Global Step: 150390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:07,196-Speed 5375.69 samples/sec Loss 3.0520 LearningRate 0.0251 Epoch: 14 Global Step: 150400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:14,671-Speed 5480.53 samples/sec Loss 3.0480 LearningRate 0.0251 Epoch: 14 Global Step: 150410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:22,190-Speed 5448.04 samples/sec Loss 3.0698 LearningRate 0.0251 Epoch: 14 Global Step: 150420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:29,665-Speed 5480.11 samples/sec Loss 3.0481 LearningRate 0.0251 Epoch: 14 Global Step: 150430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:37,172-Speed 5457.56 samples/sec Loss 3.0580 LearningRate 0.0251 Epoch: 14 Global Step: 150440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:44,721-Speed 5426.19 samples/sec Loss 3.0338 LearningRate 0.0251 Epoch: 14 Global Step: 150450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 04:59:52,178-Speed 5494.07 samples/sec Loss 3.0266 LearningRate 0.0250 Epoch: 14 Global Step: 150460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 04:59:59,711-Speed 5437.93 samples/sec Loss 3.0656 LearningRate 0.0250 Epoch: 14 Global Step: 150470 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-01-09 05:00:07,274-Speed 5417.10 samples/sec Loss 3.0416 LearningRate 0.0250 Epoch: 14 Global Step: 150480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:00:14,743-Speed 5484.23 samples/sec Loss 3.0278 LearningRate 0.0250 Epoch: 14 Global Step: 150490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:00:22,227-Speed 5473.87 samples/sec Loss 3.0773 LearningRate 0.0250 Epoch: 14 Global Step: 150500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:00:29,750-Speed 5445.75 samples/sec Loss 3.0703 LearningRate 0.0250 Epoch: 14 Global Step: 150510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:00:37,256-Speed 5457.31 samples/sec Loss 3.0128 LearningRate 0.0250 Epoch: 14 Global Step: 150520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:00:44,695-Speed 5507.08 samples/sec Loss 3.0191 LearningRate 0.0250 Epoch: 14 Global Step: 150530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:00:52,171-Speed 5479.42 samples/sec Loss 3.0195 LearningRate 0.0250 Epoch: 14 Global Step: 150540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:00:59,634-Speed 5489.57 samples/sec Loss 3.1068 LearningRate 0.0250 Epoch: 14 Global Step: 150550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:01:07,164-Speed 5440.18 samples/sec Loss 3.0125 LearningRate 0.0250 Epoch: 14 Global Step: 150560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:01:14,670-Speed 5457.44 samples/sec Loss 3.0257 LearningRate 0.0249 Epoch: 14 Global Step: 150570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:01:22,092-Speed 5520.28 samples/sec Loss 3.0528 LearningRate 0.0249 Epoch: 14 Global Step: 150580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:01:29,554-Speed 5489.62 samples/sec Loss 3.0094 LearningRate 0.0249 Epoch: 14 Global Step: 150590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-01-09 05:01:36,978-Speed 5518.04 samples/sec Loss 3.0392 LearningRate 0.0249 Epoch: 14 Global Step: 150600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:01:44,453-Speed 5480.25 samples/sec Loss 3.0487 LearningRate 0.0249 Epoch: 14 Global Step: 150610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:01:51,912-Speed 5492.44 samples/sec Loss 3.0445 LearningRate 0.0249 Epoch: 14 Global Step: 150620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:01:59,329-Speed 5523.20 samples/sec Loss 3.0584 LearningRate 0.0249 Epoch: 14 Global Step: 150630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:02:06,790-Speed 5490.43 samples/sec Loss 3.0636 LearningRate 0.0249 Epoch: 14 Global Step: 150640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:02:14,258-Speed 5485.39 samples/sec Loss 3.0849 LearningRate 0.0249 Epoch: 14 Global Step: 150650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:02:21,739-Speed 5476.58 samples/sec Loss 3.0561 LearningRate 0.0249 Epoch: 14 Global Step: 150660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:02:29,213-Speed 5480.87 samples/sec Loss 3.0264 LearningRate 0.0249 Epoch: 14 Global Step: 150670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:02:36,741-Speed 5441.30 samples/sec Loss 3.0279 LearningRate 0.0248 Epoch: 14 Global Step: 150680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:02:44,209-Speed 5485.63 samples/sec Loss 3.0318 LearningRate 0.0248 Epoch: 14 Global Step: 150690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:02:51,735-Speed 5443.36 samples/sec Loss 3.0355 LearningRate 0.0248 Epoch: 14 Global Step: 150700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:02:59,352-Speed 5378.86 samples/sec Loss 2.9945 LearningRate 0.0248 Epoch: 14 Global Step: 150710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:06,819-Speed 
5486.13 samples/sec Loss 3.0525 LearningRate 0.0248 Epoch: 14 Global Step: 150720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:14,367-Speed 5427.30 samples/sec Loss 3.0386 LearningRate 0.0248 Epoch: 14 Global Step: 150730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:21,903-Speed 5436.22 samples/sec Loss 3.0357 LearningRate 0.0248 Epoch: 14 Global Step: 150740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:29,488-Speed 5400.55 samples/sec Loss 3.0483 LearningRate 0.0248 Epoch: 14 Global Step: 150750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:39,279-Speed 4184.25 samples/sec Loss 3.0581 LearningRate 0.0248 Epoch: 14 Global Step: 150760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:46,834-Speed 5422.47 samples/sec Loss 3.0384 LearningRate 0.0248 Epoch: 14 Global Step: 150770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:03:54,484-Speed 5354.81 samples/sec Loss 3.0704 LearningRate 0.0248 Epoch: 14 Global Step: 150780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:04:01,905-Speed 5519.93 samples/sec Loss 3.0164 LearningRate 0.0248 Epoch: 14 Global Step: 150790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:09,382-Speed 5479.11 samples/sec Loss 3.0494 LearningRate 0.0247 Epoch: 14 Global Step: 150800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:16,803-Speed 5520.52 samples/sec Loss 3.0084 LearningRate 0.0247 Epoch: 14 Global Step: 150810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:24,276-Speed 5481.52 samples/sec Loss 3.0442 LearningRate 0.0247 Epoch: 14 Global Step: 150820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:31,773-Speed 5463.93 samples/sec Loss 3.0267 LearningRate 0.0247 Epoch: 14 Global Step: 150830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:39,243-Speed 5483.87 samples/sec Loss 3.0366 
LearningRate 0.0247 Epoch: 14 Global Step: 150840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:46,707-Speed 5488.45 samples/sec Loss 3.0330 LearningRate 0.0247 Epoch: 14 Global Step: 150850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:04:54,132-Speed 5517.44 samples/sec Loss 3.0689 LearningRate 0.0247 Epoch: 14 Global Step: 150860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:01,619-Speed 5470.92 samples/sec Loss 3.0092 LearningRate 0.0247 Epoch: 14 Global Step: 150870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:09,098-Speed 5478.06 samples/sec Loss 3.0612 LearningRate 0.0247 Epoch: 14 Global Step: 150880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:16,558-Speed 5491.59 samples/sec Loss 3.0089 LearningRate 0.0247 Epoch: 14 Global Step: 150890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:24,021-Speed 5488.57 samples/sec Loss 3.0118 LearningRate 0.0247 Epoch: 14 Global Step: 150900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:31,442-Speed 5519.95 samples/sec Loss 3.0264 LearningRate 0.0246 Epoch: 14 Global Step: 150910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:38,908-Speed 5487.38 samples/sec Loss 3.0512 LearningRate 0.0246 Epoch: 14 Global Step: 150920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:46,356-Speed 5500.25 samples/sec Loss 2.9892 LearningRate 0.0246 Epoch: 14 Global Step: 150930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:05:53,807-Speed 5497.71 samples/sec Loss 3.0180 LearningRate 0.0246 Epoch: 14 Global Step: 150940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:06:01,229-Speed 5519.22 samples/sec Loss 3.0507 LearningRate 0.0246 Epoch: 14 Global Step: 150950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:06:08,686-Speed 5493.85 samples/sec Loss 3.0146 LearningRate 0.0246 Epoch: 14 
Global Step: 150960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:06:16,137-Speed 5498.20 samples/sec Loss 3.0404 LearningRate 0.0246 Epoch: 14 Global Step: 150970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:06:23,614-Speed 5478.98 samples/sec Loss 3.0008 LearningRate 0.0246 Epoch: 14 Global Step: 150980 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:06:31,062-Speed 5499.44 samples/sec Loss 3.0566 LearningRate 0.0246 Epoch: 14 Global Step: 150990 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:06:38,630-Speed 5413.35 samples/sec Loss 3.0533 LearningRate 0.0246 Epoch: 14 Global Step: 151000 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:06:46,089-Speed 5492.62 samples/sec Loss 3.0326 LearningRate 0.0246 Epoch: 14 Global Step: 151010 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:06:53,616-Speed 5442.05 samples/sec Loss 3.0038 LearningRate 0.0246 Epoch: 14 Global Step: 151020 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:07:01,061-Speed 5501.99 samples/sec Loss 3.0417 LearningRate 0.0245 Epoch: 14 Global Step: 151030 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:07:08,593-Speed 5438.97 samples/sec Loss 3.0429 LearningRate 0.0245 Epoch: 14 Global Step: 151040 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:07:16,056-Speed 5489.38 samples/sec Loss 3.0264 LearningRate 0.0245 Epoch: 14 Global Step: 151050 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:07:23,515-Speed 5492.08 samples/sec Loss 3.0363 LearningRate 0.0245 Epoch: 14 Global Step: 151060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:07:31,065-Speed 5425.49 samples/sec Loss 3.0154 LearningRate 0.0245 Epoch: 14 Global Step: 151070 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-09 05:07:38,538-Speed 5482.33 samples/sec Loss 3.0209 LearningRate 0.0245 Epoch: 14 Global Step: 151080 Fp16 Grad 
Scale: 32768 Required: 13 hours Training: 2022-01-09 05:07:46,034-Speed 5465.27 samples/sec Loss 3.0354 LearningRate 0.0245 Epoch: 14 Global Step: 151090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:07:53,535-Speed 5461.31 samples/sec Loss 3.0118 LearningRate 0.0245 Epoch: 14 Global Step: 151100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:01,049-Speed 5451.54 samples/sec Loss 3.0124 LearningRate 0.0245 Epoch: 14 Global Step: 151110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:08,482-Speed 5511.02 samples/sec Loss 3.0340 LearningRate 0.0245 Epoch: 14 Global Step: 151120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:16,031-Speed 5426.46 samples/sec Loss 3.0395 LearningRate 0.0245 Epoch: 14 Global Step: 151130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:23,560-Speed 5441.01 samples/sec Loss 3.0412 LearningRate 0.0244 Epoch: 14 Global Step: 151140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:31,047-Speed 5471.49 samples/sec Loss 3.0191 LearningRate 0.0244 Epoch: 14 Global Step: 151150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:38,560-Speed 5452.48 samples/sec Loss 3.0085 LearningRate 0.0244 Epoch: 14 Global Step: 151160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:46,104-Speed 5430.84 samples/sec Loss 3.0401 LearningRate 0.0244 Epoch: 14 Global Step: 151170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:08:53,630-Speed 5443.02 samples/sec Loss 2.9932 LearningRate 0.0244 Epoch: 14 Global Step: 151180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:09:01,124-Speed 5466.05 samples/sec Loss 3.0023 LearningRate 0.0244 Epoch: 14 Global Step: 151190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:09:08,629-Speed 5457.98 samples/sec Loss 3.0364 LearningRate 0.0244 Epoch: 14 Global Step: 151200 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-01-09 05:09:16,178-Speed 5426.99 samples/sec Loss 3.0410 LearningRate 0.0244 Epoch: 14 Global Step: 151210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:09:23,686-Speed 5456.20 samples/sec Loss 3.0293 LearningRate 0.0244 Epoch: 14 Global Step: 151220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:09:31,189-Speed 5459.90 samples/sec Loss 3.0196 LearningRate 0.0244 Epoch: 14 Global Step: 151230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:09:38,720-Speed 5439.70 samples/sec Loss 2.9674 LearningRate 0.0244 Epoch: 14 Global Step: 151240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:09:46,221-Speed 5461.36 samples/sec Loss 3.0328 LearningRate 0.0244 Epoch: 14 Global Step: 151250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:09:53,788-Speed 5413.96 samples/sec Loss 3.0319 LearningRate 0.0243 Epoch: 14 Global Step: 151260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:10:01,365-Speed 5406.49 samples/sec Loss 2.9892 LearningRate 0.0243 Epoch: 14 Global Step: 151270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:10:08,860-Speed 5465.88 samples/sec Loss 3.0233 LearningRate 0.0243 Epoch: 14 Global Step: 151280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:10:16,366-Speed 5457.53 samples/sec Loss 3.0167 LearningRate 0.0243 Epoch: 14 Global Step: 151290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:10:23,764-Speed 5537.59 samples/sec Loss 2.9742 LearningRate 0.0243 Epoch: 14 Global Step: 151300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:10:31,262-Speed 5463.18 samples/sec Loss 3.0258 LearningRate 0.0243 Epoch: 14 Global Step: 151310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:10:38,775-Speed 5452.81 samples/sec Loss 2.9944 LearningRate 0.0243 Epoch: 14 Global Step: 151320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 
05:10:46,381-Speed 5385.67 samples/sec Loss 3.0122 LearningRate 0.0243 Epoch: 14 Global Step: 151330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:10:53,858-Speed 5479.77 samples/sec Loss 3.0327 LearningRate 0.0243 Epoch: 14 Global Step: 151340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:11:01,393-Speed 5436.46 samples/sec Loss 2.9848 LearningRate 0.0243 Epoch: 14 Global Step: 151350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:11:08,862-Speed 5484.20 samples/sec Loss 3.0258 LearningRate 0.0243 Epoch: 14 Global Step: 151360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:11:16,362-Speed 5462.44 samples/sec Loss 3.0417 LearningRate 0.0242 Epoch: 14 Global Step: 151370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:11:24,034-Speed 5340.06 samples/sec Loss 3.0122 LearningRate 0.0242 Epoch: 14 Global Step: 151380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:11:31,527-Speed 5466.40 samples/sec Loss 2.9714 LearningRate 0.0242 Epoch: 14 Global Step: 151390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:11:39,070-Speed 5431.19 samples/sec Loss 3.0040 LearningRate 0.0242 Epoch: 14 Global Step: 151400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:11:46,617-Speed 5427.56 samples/sec Loss 2.9978 LearningRate 0.0242 Epoch: 14 Global Step: 151410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:11:54,253-Speed 5365.06 samples/sec Loss 2.9965 LearningRate 0.0242 Epoch: 14 Global Step: 151420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:12:01,837-Speed 5401.65 samples/sec Loss 3.0393 LearningRate 0.0242 Epoch: 14 Global Step: 151430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:12:09,371-Speed 5437.55 samples/sec Loss 2.9668 LearningRate 0.0242 Epoch: 14 Global Step: 151440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:12:16,843-Speed 5482.47 
samples/sec Loss 2.9752 LearningRate 0.0242 Epoch: 14 Global Step: 151450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:12:24,286-Speed 5504.46 samples/sec Loss 2.9716 LearningRate 0.0242 Epoch: 14 Global Step: 151460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:12:31,714-Speed 5514.57 samples/sec Loss 3.0253 LearningRate 0.0242 Epoch: 14 Global Step: 151470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:12:39,195-Speed 5476.12 samples/sec Loss 2.9771 LearningRate 0.0242 Epoch: 14 Global Step: 151480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:12:46,686-Speed 5468.72 samples/sec Loss 3.0202 LearningRate 0.0241 Epoch: 14 Global Step: 151490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:12:54,227-Speed 5432.39 samples/sec Loss 2.9922 LearningRate 0.0241 Epoch: 14 Global Step: 151500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:13:01,784-Speed 5421.08 samples/sec Loss 3.0602 LearningRate 0.0241 Epoch: 14 Global Step: 151510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:13:09,291-Speed 5456.71 samples/sec Loss 3.0142 LearningRate 0.0241 Epoch: 14 Global Step: 151520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:13:16,848-Speed 5420.64 samples/sec Loss 2.9527 LearningRate 0.0241 Epoch: 14 Global Step: 151530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:13:24,398-Speed 5426.43 samples/sec Loss 3.0080 LearningRate 0.0241 Epoch: 14 Global Step: 151540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:13:31,921-Speed 5445.06 samples/sec Loss 2.9860 LearningRate 0.0241 Epoch: 14 Global Step: 151550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:13:39,530-Speed 5383.93 samples/sec Loss 3.0131 LearningRate 0.0241 Epoch: 14 Global Step: 151560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:13:47,036-Speed 5457.35 samples/sec Loss 2.9734 
LearningRate 0.0241 Epoch: 14 Global Step: 151570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:13:54,587-Speed 5425.64 samples/sec Loss 2.9889 LearningRate 0.0241 Epoch: 14 Global Step: 151580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:14:02,130-Speed 5430.67 samples/sec Loss 2.9889 LearningRate 0.0241 Epoch: 14 Global Step: 151590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:14:09,649-Speed 5448.53 samples/sec Loss 3.0005 LearningRate 0.0240 Epoch: 14 Global Step: 151600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:14:17,277-Speed 5370.07 samples/sec Loss 2.9998 LearningRate 0.0240 Epoch: 14 Global Step: 151610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:14:24,870-Speed 5394.91 samples/sec Loss 2.9926 LearningRate 0.0240 Epoch: 14 Global Step: 151620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:14:32,460-Speed 5397.83 samples/sec Loss 2.9685 LearningRate 0.0240 Epoch: 14 Global Step: 151630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:14:40,035-Speed 5407.61 samples/sec Loss 2.9784 LearningRate 0.0240 Epoch: 14 Global Step: 151640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:14:47,575-Speed 5432.64 samples/sec Loss 2.9488 LearningRate 0.0240 Epoch: 14 Global Step: 151650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:14:55,093-Speed 5449.66 samples/sec Loss 2.9877 LearningRate 0.0240 Epoch: 14 Global Step: 151660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:15:02,673-Speed 5404.18 samples/sec Loss 2.9734 LearningRate 0.0240 Epoch: 14 Global Step: 151670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:15:10,177-Speed 5459.03 samples/sec Loss 2.9986 LearningRate 0.0240 Epoch: 14 Global Step: 151680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:15:17,721-Speed 5429.76 samples/sec Loss 2.9480 LearningRate 0.0240 Epoch: 14 
Global Step: 151690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:15:25,210-Speed 5470.77 samples/sec Loss 2.9631 LearningRate 0.0240 Epoch: 14 Global Step: 151700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:15:32,674-Speed 5487.90 samples/sec Loss 2.9813 LearningRate 0.0240 Epoch: 14 Global Step: 151710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:15:40,197-Speed 5445.71 samples/sec Loss 2.9841 LearningRate 0.0239 Epoch: 14 Global Step: 151720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:15:47,752-Speed 5422.02 samples/sec Loss 3.0099 LearningRate 0.0239 Epoch: 14 Global Step: 151730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:15:55,225-Speed 5481.85 samples/sec Loss 3.0079 LearningRate 0.0239 Epoch: 14 Global Step: 151740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:16:02,770-Speed 5429.73 samples/sec Loss 2.9618 LearningRate 0.0239 Epoch: 14 Global Step: 151750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:16:10,456-Speed 5329.71 samples/sec Loss 2.9662 LearningRate 0.0239 Epoch: 14 Global Step: 151760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:16:17,984-Speed 5441.38 samples/sec Loss 2.9740 LearningRate 0.0239 Epoch: 14 Global Step: 151770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:16:25,492-Speed 5456.38 samples/sec Loss 2.9812 LearningRate 0.0239 Epoch: 14 Global Step: 151780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:16:33,026-Speed 5437.51 samples/sec Loss 3.0048 LearningRate 0.0239 Epoch: 14 Global Step: 151790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:16:40,504-Speed 5477.90 samples/sec Loss 2.9917 LearningRate 0.0239 Epoch: 14 Global Step: 151800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:16:48,063-Speed 5419.55 samples/sec Loss 2.9827 LearningRate 0.0239 Epoch: 14 Global Step: 151810 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-01-09 05:16:55,553-Speed 5469.78 samples/sec Loss 2.9719 LearningRate 0.0239 Epoch: 14 Global Step: 151820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:17:03,156-Speed 5388.34 samples/sec Loss 3.0049 LearningRate 0.0239 Epoch: 14 Global Step: 151830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:10,800-Speed 5358.86 samples/sec Loss 3.0169 LearningRate 0.0238 Epoch: 14 Global Step: 151840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:18,330-Speed 5440.07 samples/sec Loss 3.0167 LearningRate 0.0238 Epoch: 14 Global Step: 151850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:25,813-Speed 5474.65 samples/sec Loss 2.9875 LearningRate 0.0238 Epoch: 14 Global Step: 151860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:33,345-Speed 5439.04 samples/sec Loss 2.9719 LearningRate 0.0238 Epoch: 14 Global Step: 151870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:40,874-Speed 5441.44 samples/sec Loss 2.9589 LearningRate 0.0238 Epoch: 14 Global Step: 151880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:48,434-Speed 5418.26 samples/sec Loss 3.0141 LearningRate 0.0238 Epoch: 14 Global Step: 151890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:17:55,967-Speed 5438.07 samples/sec Loss 3.0063 LearningRate 0.0238 Epoch: 14 Global Step: 151900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:18:03,540-Speed 5409.37 samples/sec Loss 2.9870 LearningRate 0.0238 Epoch: 14 Global Step: 151910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:18:11,057-Speed 5449.40 samples/sec Loss 2.9338 LearningRate 0.0238 Epoch: 14 Global Step: 151920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:18:18,584-Speed 5442.47 samples/sec Loss 2.9536 LearningRate 0.0238 Epoch: 14 Global Step: 151930 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-01-09 05:18:26,115-Speed 5440.00 samples/sec Loss 2.9510 LearningRate 0.0238 Epoch: 14 Global Step: 151940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:18:33,678-Speed 5416.07 samples/sec Loss 3.0068 LearningRate 0.0237 Epoch: 14 Global Step: 151950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:18:41,204-Speed 5443.59 samples/sec Loss 2.9605 LearningRate 0.0237 Epoch: 14 Global Step: 151960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:18:48,725-Speed 5446.63 samples/sec Loss 2.9945 LearningRate 0.0237 Epoch: 14 Global Step: 151970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:18:56,266-Speed 5432.24 samples/sec Loss 2.9767 LearningRate 0.0237 Epoch: 14 Global Step: 151980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:19:03,860-Speed 5394.85 samples/sec Loss 2.9611 LearningRate 0.0237 Epoch: 14 Global Step: 151990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:19:11,438-Speed 5405.93 samples/sec Loss 2.9764 LearningRate 0.0237 Epoch: 14 Global Step: 152000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:19:55,061-[lfw][152000]XNorm: 23.242325 Training: 2022-01-09 05:19:55,062-[lfw][152000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-09 05:19:55,062-[lfw][152000]Accuracy-Highest: 0.99817 Training: 2022-01-09 05:20:45,917-[cfp_fp][152000]XNorm: 21.870703 Training: 2022-01-09 05:20:45,918-[cfp_fp][152000]Accuracy-Flip: 0.99129+-0.00548 Training: 2022-01-09 05:20:45,919-[cfp_fp][152000]Accuracy-Highest: 0.99314 Training: 2022-01-09 05:21:29,704-[agedb_30][152000]XNorm: 23.497709 Training: 2022-01-09 05:21:29,705-[agedb_30][152000]Accuracy-Flip: 0.98217+-0.00837 Training: 2022-01-09 05:21:29,705-[agedb_30][152000]Accuracy-Highest: 0.98217 Training: 2022-01-09 05:21:37,310-Speed 280.80 samples/sec Loss 2.9679 LearningRate 0.0237 Epoch: 14 Global Step: 152010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 
2022-01-09 05:21:44,843-Speed 5438.32 samples/sec Loss 2.9601 LearningRate 0.0237 Epoch: 14 Global Step: 152020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:21:52,479-Speed 5364.51 samples/sec Loss 2.9773 LearningRate 0.0237 Epoch: 14 Global Step: 152030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:00,112-Speed 5366.89 samples/sec Loss 2.9684 LearningRate 0.0237 Epoch: 14 Global Step: 152040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:07,849-Speed 5295.23 samples/sec Loss 2.9360 LearningRate 0.0237 Epoch: 14 Global Step: 152050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:15,634-Speed 5261.27 samples/sec Loss 2.9837 LearningRate 0.0237 Epoch: 14 Global Step: 152060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:23,377-Speed 5291.05 samples/sec Loss 2.9738 LearningRate 0.0236 Epoch: 14 Global Step: 152070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:31,125-Speed 5287.70 samples/sec Loss 2.9405 LearningRate 0.0236 Epoch: 14 Global Step: 152080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:38,729-Speed 5386.76 samples/sec Loss 2.9603 LearningRate 0.0236 Epoch: 14 Global Step: 152090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:22:46,247-Speed 5449.14 samples/sec Loss 2.9790 LearningRate 0.0236 Epoch: 14 Global Step: 152100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:22:53,787-Speed 5432.85 samples/sec Loss 2.9789 LearningRate 0.0236 Epoch: 14 Global Step: 152110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:01,289-Speed 5460.53 samples/sec Loss 2.9021 LearningRate 0.0236 Epoch: 14 Global Step: 152120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:08,815-Speed 5443.38 samples/sec Loss 2.9383 LearningRate 0.0236 Epoch: 14 Global Step: 152130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:16,412-Speed 
5391.83 samples/sec Loss 2.9524 LearningRate 0.0236 Epoch: 14 Global Step: 152140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:23,988-Speed 5407.46 samples/sec Loss 2.9324 LearningRate 0.0236 Epoch: 14 Global Step: 152150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:31,518-Speed 5440.74 samples/sec Loss 2.9650 LearningRate 0.0236 Epoch: 14 Global Step: 152160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:39,096-Speed 5405.77 samples/sec Loss 2.9489 LearningRate 0.0236 Epoch: 14 Global Step: 152170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:46,578-Speed 5474.99 samples/sec Loss 2.9507 LearningRate 0.0236 Epoch: 14 Global Step: 152180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:23:54,151-Speed 5409.60 samples/sec Loss 2.9701 LearningRate 0.0235 Epoch: 14 Global Step: 152190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:24:01,675-Speed 5444.25 samples/sec Loss 2.9789 LearningRate 0.0235 Epoch: 14 Global Step: 152200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:24:09,242-Speed 5413.66 samples/sec Loss 2.9266 LearningRate 0.0235 Epoch: 14 Global Step: 152210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:24:16,821-Speed 5404.97 samples/sec Loss 2.9645 LearningRate 0.0235 Epoch: 14 Global Step: 152220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:24:24,366-Speed 5429.71 samples/sec Loss 2.9949 LearningRate 0.0235 Epoch: 14 Global Step: 152230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:24:32,023-Speed 5350.60 samples/sec Loss 2.9422 LearningRate 0.0235 Epoch: 14 Global Step: 152240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:24:39,545-Speed 5446.29 samples/sec Loss 2.9371 LearningRate 0.0235 Epoch: 14 Global Step: 152250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:24:47,031-Speed 5472.07 samples/sec Loss 2.9435 
LearningRate 0.0235 Epoch: 14 Global Step: 152260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:24:54,558-Speed 5442.40 samples/sec Loss 2.9694 LearningRate 0.0235 Epoch: 14 Global Step: 152270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:25:02,154-Speed 5393.16 samples/sec Loss 2.9316 LearningRate 0.0235 Epoch: 14 Global Step: 152280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:25:09,695-Speed 5432.21 samples/sec Loss 2.9750 LearningRate 0.0235 Epoch: 14 Global Step: 152290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:25:17,279-Speed 5401.95 samples/sec Loss 2.9355 LearningRate 0.0234 Epoch: 14 Global Step: 152300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:25:24,931-Speed 5353.32 samples/sec Loss 2.9483 LearningRate 0.0234 Epoch: 14 Global Step: 152310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:25:32,479-Speed 5427.51 samples/sec Loss 2.9544 LearningRate 0.0234 Epoch: 14 Global Step: 152320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:25:40,011-Speed 5439.38 samples/sec Loss 2.9623 LearningRate 0.0234 Epoch: 14 Global Step: 152330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:25:47,558-Speed 5427.66 samples/sec Loss 2.9340 LearningRate 0.0234 Epoch: 14 Global Step: 152340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:25:55,082-Speed 5444.70 samples/sec Loss 2.9222 LearningRate 0.0234 Epoch: 14 Global Step: 152350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:26:02,654-Speed 5410.17 samples/sec Loss 2.9252 LearningRate 0.0234 Epoch: 14 Global Step: 152360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:10,151-Speed 5464.30 samples/sec Loss 2.9371 LearningRate 0.0234 Epoch: 14 Global Step: 152370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:17,673-Speed 5445.98 samples/sec Loss 2.9587 LearningRate 0.0234 Epoch: 14 
Global Step: 152380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:25,222-Speed 5426.69 samples/sec Loss 2.9476 LearningRate 0.0234 Epoch: 14 Global Step: 152390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:32,813-Speed 5396.74 samples/sec Loss 2.9145 LearningRate 0.0234 Epoch: 14 Global Step: 152400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:40,298-Speed 5473.13 samples/sec Loss 2.9566 LearningRate 0.0234 Epoch: 14 Global Step: 152410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:47,761-Speed 5488.96 samples/sec Loss 2.9141 LearningRate 0.0233 Epoch: 14 Global Step: 152420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:26:55,280-Speed 5448.04 samples/sec Loss 2.9173 LearningRate 0.0233 Epoch: 14 Global Step: 152430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:27:02,841-Speed 5418.87 samples/sec Loss 2.9279 LearningRate 0.0233 Epoch: 14 Global Step: 152440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:27:10,338-Speed 5463.87 samples/sec Loss 2.9165 LearningRate 0.0233 Epoch: 14 Global Step: 152450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:27:17,861-Speed 5445.57 samples/sec Loss 2.9549 LearningRate 0.0233 Epoch: 14 Global Step: 152460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:27:25,394-Speed 5437.65 samples/sec Loss 2.9507 LearningRate 0.0233 Epoch: 14 Global Step: 152470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:27:32,850-Speed 5494.74 samples/sec Loss 2.9760 LearningRate 0.0233 Epoch: 14 Global Step: 152480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:27:40,391-Speed 5432.35 samples/sec Loss 2.9515 LearningRate 0.0233 Epoch: 14 Global Step: 152490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:27:47,893-Speed 5460.68 samples/sec Loss 2.8951 LearningRate 0.0233 Epoch: 14 Global Step: 152500 Fp16 Grad 
Scale: 32768 Required: 13 hours Training: 2022-01-09 05:27:55,430-Speed 5435.22 samples/sec Loss 2.9736 LearningRate 0.0233 Epoch: 14 Global Step: 152510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:02,947-Speed 5449.84 samples/sec Loss 2.9894 LearningRate 0.0233 Epoch: 14 Global Step: 152520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:10,462-Speed 5451.52 samples/sec Loss 2.9166 LearningRate 0.0233 Epoch: 14 Global Step: 152530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:17,964-Speed 5460.38 samples/sec Loss 2.9492 LearningRate 0.0232 Epoch: 14 Global Step: 152540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:25,514-Speed 5426.10 samples/sec Loss 2.9664 LearningRate 0.0232 Epoch: 14 Global Step: 152550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:33,023-Speed 5455.65 samples/sec Loss 2.9634 LearningRate 0.0232 Epoch: 14 Global Step: 152560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:40,547-Speed 5444.57 samples/sec Loss 2.9737 LearningRate 0.0232 Epoch: 14 Global Step: 152570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:28:48,082-Speed 5436.87 samples/sec Loss 2.9343 LearningRate 0.0232 Epoch: 14 Global Step: 152580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:28:55,608-Speed 5442.98 samples/sec Loss 2.9281 LearningRate 0.0232 Epoch: 14 Global Step: 152590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:29:03,179-Speed 5410.83 samples/sec Loss 2.9034 LearningRate 0.0232 Epoch: 14 Global Step: 152600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:29:10,658-Speed 5477.72 samples/sec Loss 2.9590 LearningRate 0.0232 Epoch: 14 Global Step: 152610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:29:18,190-Speed 5439.09 samples/sec Loss 2.8729 LearningRate 0.0232 Epoch: 14 Global Step: 152620 Fp16 Grad Scale: 32768 Required: 13 hours 
Training: 2022-01-09 05:29:25,742-Speed 5424.00 samples/sec Loss 2.9042 LearningRate 0.0232 Epoch: 14 Global Step: 152630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:29:33,297-Speed 5423.08 samples/sec Loss 2.9228 LearningRate 0.0232 Epoch: 14 Global Step: 152640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:29:40,825-Speed 5441.35 samples/sec Loss 2.9200 LearningRate 0.0232 Epoch: 14 Global Step: 152650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:29:48,364-Speed 5434.19 samples/sec Loss 2.9789 LearningRate 0.0231 Epoch: 14 Global Step: 152660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:29:55,912-Speed 5426.92 samples/sec Loss 2.9356 LearningRate 0.0231 Epoch: 14 Global Step: 152670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:30:03,440-Speed 5441.60 samples/sec Loss 2.9338 LearningRate 0.0231 Epoch: 14 Global Step: 152680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:30:11,002-Speed 5417.79 samples/sec Loss 2.9492 LearningRate 0.0231 Epoch: 14 Global Step: 152690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:30:18,609-Speed 5385.31 samples/sec Loss 2.9344 LearningRate 0.0231 Epoch: 14 Global Step: 152700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:30:26,108-Speed 5462.45 samples/sec Loss 2.8950 LearningRate 0.0231 Epoch: 14 Global Step: 152710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:30:33,624-Speed 5450.87 samples/sec Loss 2.9508 LearningRate 0.0231 Epoch: 14 Global Step: 152720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:30:41,111-Speed 5471.49 samples/sec Loss 2.9452 LearningRate 0.0231 Epoch: 14 Global Step: 152730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:30:48,585-Speed 5481.00 samples/sec Loss 2.8945 LearningRate 0.0231 Epoch: 14 Global Step: 152740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 
05:30:56,101-Speed 5450.63 samples/sec Loss 2.9491 LearningRate 0.0231 Epoch: 14 Global Step: 152750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:31:03,621-Speed 5447.65 samples/sec Loss 2.9419 LearningRate 0.0231 Epoch: 14 Global Step: 152760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:31:11,105-Speed 5474.31 samples/sec Loss 2.9522 LearningRate 0.0231 Epoch: 14 Global Step: 152770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-09 05:31:18,738-Speed 5366.65 samples/sec Loss 2.9568 LearningRate 0.0230 Epoch: 14 Global Step: 152780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:31:26,292-Speed 5422.86 samples/sec Loss 2.9295 LearningRate 0.0230 Epoch: 14 Global Step: 152790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:31:33,858-Speed 5414.46 samples/sec Loss 2.9189 LearningRate 0.0230 Epoch: 14 Global Step: 152800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:31:41,401-Speed 5431.13 samples/sec Loss 2.9651 LearningRate 0.0230 Epoch: 14 Global Step: 152810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-09 05:31:48,945-Speed 5430.54 samples/sec Loss 2.9344 LearningRate 0.0230 Epoch: 14 Global Step: 152820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:31:56,432-Speed 5471.40 samples/sec Loss 2.9057 LearningRate 0.0230 Epoch: 14 Global Step: 152830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:32:03,902-Speed 5484.28 samples/sec Loss 2.8979 LearningRate 0.0230 Epoch: 14 Global Step: 152840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:11,393-Speed 5468.38 samples/sec Loss 2.9022 LearningRate 0.0230 Epoch: 14 Global Step: 152850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:18,938-Speed 5429.77 samples/sec Loss 2.9222 LearningRate 0.0230 Epoch: 14 Global Step: 152860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:26,442-Speed 5458.83 
samples/sec Loss 2.9249 LearningRate 0.0230 Epoch: 14 Global Step: 152870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:33,985-Speed 5430.64 samples/sec Loss 2.9094 LearningRate 0.0230 Epoch: 14 Global Step: 152880 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:41,532-Speed 5428.55 samples/sec Loss 2.8908 LearningRate 0.0229 Epoch: 14 Global Step: 152890 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:49,016-Speed 5473.44 samples/sec Loss 2.8975 LearningRate 0.0229 Epoch: 14 Global Step: 152900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:32:56,515-Speed 5462.83 samples/sec Loss 2.9140 LearningRate 0.0229 Epoch: 14 Global Step: 152910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:33:04,028-Speed 5452.38 samples/sec Loss 2.8990 LearningRate 0.0229 Epoch: 14 Global Step: 152920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:33:11,535-Speed 5457.03 samples/sec Loss 2.8958 LearningRate 0.0229 Epoch: 14 Global Step: 152930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:33:19,040-Speed 5458.32 samples/sec Loss 2.9024 LearningRate 0.0229 Epoch: 14 Global Step: 152940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:33:26,639-Speed 5391.07 samples/sec Loss 2.8938 LearningRate 0.0229 Epoch: 14 Global Step: 152950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:33:34,188-Speed 5427.07 samples/sec Loss 2.9022 LearningRate 0.0229 Epoch: 14 Global Step: 152960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:33:41,778-Speed 5396.69 samples/sec Loss 2.9196 LearningRate 0.0229 Epoch: 14 Global Step: 152970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:33:49,321-Speed 5431.33 samples/sec Loss 2.9217 LearningRate 0.0229 Epoch: 14 Global Step: 152980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:33:56,868-Speed 5427.52 samples/sec Loss 2.9191 
LearningRate 0.0229 Epoch: 14 Global Step: 152990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:34:04,341-Speed 5482.06 samples/sec Loss 2.9387 LearningRate 0.0229 Epoch: 14 Global Step: 153000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:34:11,859-Speed 5448.68 samples/sec Loss 2.9211 LearningRate 0.0228 Epoch: 14 Global Step: 153010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:34:19,353-Speed 5466.59 samples/sec Loss 2.8966 LearningRate 0.0228 Epoch: 14 Global Step: 153020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:34:26,863-Speed 5454.65 samples/sec Loss 2.9017 LearningRate 0.0228 Epoch: 14 Global Step: 153030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:34:34,391-Speed 5441.89 samples/sec Loss 2.9233 LearningRate 0.0228 Epoch: 14 Global Step: 153040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:34:41,977-Speed 5400.21 samples/sec Loss 2.8807 LearningRate 0.0228 Epoch: 14 Global Step: 153050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:34:49,510-Speed 5437.82 samples/sec Loss 2.9071 LearningRate 0.0228 Epoch: 14 Global Step: 153060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:34:57,092-Speed 5403.33 samples/sec Loss 2.9220 LearningRate 0.0228 Epoch: 14 Global Step: 153070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:35:04,603-Speed 5453.48 samples/sec Loss 2.9106 LearningRate 0.0228 Epoch: 14 Global Step: 153080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:35:12,128-Speed 5443.94 samples/sec Loss 2.8937 LearningRate 0.0228 Epoch: 14 Global Step: 153090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:35:19,664-Speed 5436.30 samples/sec Loss 2.9104 LearningRate 0.0228 Epoch: 14 Global Step: 153100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:35:27,164-Speed 5461.70 samples/sec Loss 2.9035 LearningRate 0.0228 Epoch: 14 
Global Step: 153110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:35:34,689-Speed 5444.23 samples/sec Loss 2.9128 LearningRate 0.0228 Epoch: 14 Global Step: 153120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:35:42,194-Speed 5458.46 samples/sec Loss 2.9422 LearningRate 0.0227 Epoch: 14 Global Step: 153130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:35:49,736-Speed 5431.69 samples/sec Loss 2.9034 LearningRate 0.0227 Epoch: 14 Global Step: 153140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:35:57,198-Speed 5489.89 samples/sec Loss 2.9051 LearningRate 0.0227 Epoch: 14 Global Step: 153150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:36:04,748-Speed 5425.39 samples/sec Loss 2.9093 LearningRate 0.0227 Epoch: 14 Global Step: 153160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:36:12,289-Speed 5432.46 samples/sec Loss 2.9141 LearningRate 0.0227 Epoch: 14 Global Step: 153170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:36:19,764-Speed 5480.50 samples/sec Loss 2.9198 LearningRate 0.0227 Epoch: 14 Global Step: 153180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:36:27,292-Speed 5441.82 samples/sec Loss 2.9158 LearningRate 0.0227 Epoch: 14 Global Step: 153190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:36:34,782-Speed 5468.97 samples/sec Loss 2.9163 LearningRate 0.0227 Epoch: 14 Global Step: 153200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:36:42,241-Speed 5492.18 samples/sec Loss 2.8914 LearningRate 0.0227 Epoch: 14 Global Step: 153210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:36:49,784-Speed 5430.51 samples/sec Loss 2.8869 LearningRate 0.0227 Epoch: 14 Global Step: 153220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:36:57,334-Speed 5426.14 samples/sec Loss 2.8720 LearningRate 0.0227 Epoch: 14 Global Step: 153230 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:04,839-Speed 5458.63 samples/sec Loss 2.8879 LearningRate 0.0227 Epoch: 14 Global Step: 153240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:12,367-Speed 5442.01 samples/sec Loss 2.9316 LearningRate 0.0226 Epoch: 14 Global Step: 153250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:19,907-Speed 5433.03 samples/sec Loss 2.9304 LearningRate 0.0226 Epoch: 14 Global Step: 153260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:27,452-Speed 5429.49 samples/sec Loss 2.8974 LearningRate 0.0226 Epoch: 14 Global Step: 153270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:34,941-Speed 5469.80 samples/sec Loss 2.9040 LearningRate 0.0226 Epoch: 14 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:42,438-Speed 5464.59 samples/sec Loss 2.9188 LearningRate 0.0226 Epoch: 14 Global Step: 153290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:37:49,958-Speed 5447.91 samples/sec Loss 2.9086 LearningRate 0.0226 Epoch: 14 Global Step: 153300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 05:37:57,466-Speed 5455.99 samples/sec Loss 2.8969 LearningRate 0.0226 Epoch: 14 Global Step: 153310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:05,121-Speed 5351.54 samples/sec Loss 2.9514 LearningRate 0.0226 Epoch: 14 Global Step: 153320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:12,635-Speed 5451.24 samples/sec Loss 2.9442 LearningRate 0.0226 Epoch: 14 Global Step: 153330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:20,230-Speed 5394.15 samples/sec Loss 2.9168 LearningRate 0.0226 Epoch: 14 Global Step: 153340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:27,904-Speed 5337.87 samples/sec Loss 2.8776 LearningRate 0.0226 Epoch: 14 Global Step: 153350 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-01-09 05:38:35,528-Speed 5373.13 samples/sec Loss 2.9130 LearningRate 0.0226 Epoch: 14 Global Step: 153360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:43,074-Speed 5428.59 samples/sec Loss 2.8781 LearningRate 0.0225 Epoch: 14 Global Step: 153370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:50,579-Speed 5458.81 samples/sec Loss 2.8688 LearningRate 0.0225 Epoch: 14 Global Step: 153380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:38:58,117-Speed 5434.43 samples/sec Loss 2.8961 LearningRate 0.0225 Epoch: 14 Global Step: 153390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:05,688-Speed 5410.59 samples/sec Loss 2.9129 LearningRate 0.0225 Epoch: 14 Global Step: 153400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:13,219-Speed 5440.04 samples/sec Loss 2.9045 LearningRate 0.0225 Epoch: 14 Global Step: 153410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:20,772-Speed 5423.67 samples/sec Loss 2.8734 LearningRate 0.0225 Epoch: 14 Global Step: 153420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:28,271-Speed 5463.01 samples/sec Loss 2.8881 LearningRate 0.0225 Epoch: 14 Global Step: 153430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:35,840-Speed 5412.17 samples/sec Loss 2.8826 LearningRate 0.0225 Epoch: 14 Global Step: 153440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:43,359-Speed 5448.33 samples/sec Loss 2.9064 LearningRate 0.0225 Epoch: 14 Global Step: 153450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:50,833-Speed 5481.08 samples/sec Loss 2.9065 LearningRate 0.0225 Epoch: 14 Global Step: 153460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:39:58,331-Speed 5463.63 samples/sec Loss 2.8508 LearningRate 0.0225 Epoch: 14 Global Step: 153470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
05:40:05,827-Speed 5464.93 samples/sec Loss 2.8709 LearningRate 0.0225 Epoch: 14 Global Step: 153480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:13,342-Speed 5451.36 samples/sec Loss 2.8889 LearningRate 0.0224 Epoch: 14 Global Step: 153490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:20,815-Speed 5481.39 samples/sec Loss 2.9027 LearningRate 0.0224 Epoch: 14 Global Step: 153500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:28,288-Speed 5482.14 samples/sec Loss 2.9256 LearningRate 0.0224 Epoch: 14 Global Step: 153510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:35,713-Speed 5516.63 samples/sec Loss 2.8942 LearningRate 0.0224 Epoch: 14 Global Step: 153520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:43,135-Speed 5519.63 samples/sec Loss 2.8897 LearningRate 0.0224 Epoch: 14 Global Step: 153530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:50,635-Speed 5462.85 samples/sec Loss 2.8708 LearningRate 0.0224 Epoch: 14 Global Step: 153540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:40:58,171-Speed 5435.54 samples/sec Loss 2.9126 LearningRate 0.0224 Epoch: 14 Global Step: 153550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:05,696-Speed 5443.92 samples/sec Loss 2.8780 LearningRate 0.0224 Epoch: 14 Global Step: 153560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:13,218-Speed 5445.80 samples/sec Loss 2.9057 LearningRate 0.0224 Epoch: 14 Global Step: 153570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:20,729-Speed 5454.88 samples/sec Loss 2.8755 LearningRate 0.0224 Epoch: 14 Global Step: 153580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:28,273-Speed 5429.82 samples/sec Loss 2.8811 LearningRate 0.0224 Epoch: 14 Global Step: 153590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:35,821-Speed 5427.38 
samples/sec Loss 2.8467 LearningRate 0.0224 Epoch: 14 Global Step: 153600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:43,352-Speed 5439.61 samples/sec Loss 2.8735 LearningRate 0.0223 Epoch: 14 Global Step: 153610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:50,911-Speed 5419.65 samples/sec Loss 2.8758 LearningRate 0.0223 Epoch: 14 Global Step: 153620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:41:58,367-Speed 5493.99 samples/sec Loss 2.8432 LearningRate 0.0223 Epoch: 14 Global Step: 153630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:42:05,875-Speed 5456.44 samples/sec Loss 2.9041 LearningRate 0.0223 Epoch: 14 Global Step: 153640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:42:13,407-Speed 5438.73 samples/sec Loss 2.8708 LearningRate 0.0223 Epoch: 14 Global Step: 153650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:42:20,954-Speed 5428.24 samples/sec Loss 2.8918 LearningRate 0.0223 Epoch: 14 Global Step: 153660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:42:28,468-Speed 5452.29 samples/sec Loss 2.8790 LearningRate 0.0223 Epoch: 14 Global Step: 153670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:42:35,926-Speed 5492.18 samples/sec Loss 2.8829 LearningRate 0.0223 Epoch: 14 Global Step: 153680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:42:43,377-Speed 5498.50 samples/sec Loss 2.8556 LearningRate 0.0223 Epoch: 14 Global Step: 153690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:42:50,868-Speed 5468.51 samples/sec Loss 2.8761 LearningRate 0.0223 Epoch: 14 Global Step: 153700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:42:58,357-Speed 5470.07 samples/sec Loss 2.8409 LearningRate 0.0223 Epoch: 14 Global Step: 153710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:05,788-Speed 5512.54 samples/sec Loss 2.8468 
LearningRate 0.0223 Epoch: 14 Global Step: 153720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:13,335-Speed 5428.44 samples/sec Loss 2.8817 LearningRate 0.0222 Epoch: 14 Global Step: 153730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:20,788-Speed 5496.20 samples/sec Loss 2.8787 LearningRate 0.0222 Epoch: 14 Global Step: 153740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:28,293-Speed 5458.35 samples/sec Loss 2.8934 LearningRate 0.0222 Epoch: 14 Global Step: 153750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:35,823-Speed 5440.96 samples/sec Loss 2.8502 LearningRate 0.0222 Epoch: 14 Global Step: 153760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:43:43,349-Speed 5442.53 samples/sec Loss 2.8695 LearningRate 0.0222 Epoch: 14 Global Step: 153770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:50,800-Speed 5498.57 samples/sec Loss 2.8936 LearningRate 0.0222 Epoch: 14 Global Step: 153780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:43:58,309-Speed 5455.64 samples/sec Loss 2.8841 LearningRate 0.0222 Epoch: 14 Global Step: 153790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:05,870-Speed 5417.39 samples/sec Loss 2.8711 LearningRate 0.0222 Epoch: 14 Global Step: 153800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:13,372-Speed 5460.66 samples/sec Loss 2.8845 LearningRate 0.0222 Epoch: 14 Global Step: 153810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:20,833-Speed 5490.77 samples/sec Loss 2.8516 LearningRate 0.0222 Epoch: 14 Global Step: 153820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:28,319-Speed 5472.49 samples/sec Loss 2.9002 LearningRate 0.0222 Epoch: 14 Global Step: 153830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:35,772-Speed 5496.43 samples/sec Loss 2.8905 LearningRate 0.0222 Epoch: 14 
Global Step: 153840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:43,222-Speed 5498.66 samples/sec Loss 2.8568 LearningRate 0.0221 Epoch: 14 Global Step: 153850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:50,756-Speed 5438.01 samples/sec Loss 2.9019 LearningRate 0.0221 Epoch: 14 Global Step: 153860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:44:58,246-Speed 5469.61 samples/sec Loss 2.8809 LearningRate 0.0221 Epoch: 14 Global Step: 153870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:45:05,807-Speed 5417.86 samples/sec Loss 2.8422 LearningRate 0.0221 Epoch: 14 Global Step: 153880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:45:13,301-Speed 5466.12 samples/sec Loss 2.8451 LearningRate 0.0221 Epoch: 14 Global Step: 153890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:45:20,802-Speed 5461.25 samples/sec Loss 2.8952 LearningRate 0.0221 Epoch: 14 Global Step: 153900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:45:28,339-Speed 5435.74 samples/sec Loss 2.8473 LearningRate 0.0221 Epoch: 14 Global Step: 153910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:45:35,887-Speed 5426.92 samples/sec Loss 2.8918 LearningRate 0.0221 Epoch: 14 Global Step: 153920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:45:43,467-Speed 5404.51 samples/sec Loss 2.8439 LearningRate 0.0221 Epoch: 14 Global Step: 153930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:45:51,159-Speed 5325.85 samples/sec Loss 2.8845 LearningRate 0.0221 Epoch: 14 Global Step: 153940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:45:58,692-Speed 5438.23 samples/sec Loss 2.8827 LearningRate 0.0221 Epoch: 14 Global Step: 153950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:46:06,288-Speed 5392.94 samples/sec Loss 2.8644 LearningRate 0.0221 Epoch: 14 Global Step: 153960 Fp16 Grad 
Scale: 32768 Required: 12 hours Training: 2022-01-09 05:46:13,805-Speed 5449.99 samples/sec Loss 2.8564 LearningRate 0.0220 Epoch: 14 Global Step: 153970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:46:21,301-Speed 5464.76 samples/sec Loss 2.8887 LearningRate 0.0220 Epoch: 14 Global Step: 153980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:46:28,777-Speed 5479.52 samples/sec Loss 2.8717 LearningRate 0.0220 Epoch: 14 Global Step: 153990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:46:36,322-Speed 5429.94 samples/sec Loss 2.8760 LearningRate 0.0220 Epoch: 14 Global Step: 154000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:47:20,308-[lfw][154000]XNorm: 22.161902 Training: 2022-01-09 05:47:20,309-[lfw][154000]Accuracy-Flip: 0.99800+-0.00221 Training: 2022-01-09 05:47:20,310-[lfw][154000]Accuracy-Highest: 0.99817 Training: 2022-01-09 05:48:11,597-[cfp_fp][154000]XNorm: 21.247886 Training: 2022-01-09 05:48:11,598-[cfp_fp][154000]Accuracy-Flip: 0.99214+-0.00453 Training: 2022-01-09 05:48:11,599-[cfp_fp][154000]Accuracy-Highest: 0.99314 Training: 2022-01-09 05:48:55,760-[agedb_30][154000]XNorm: 22.316006 Training: 2022-01-09 05:48:55,761-[agedb_30][154000]Accuracy-Flip: 0.98217+-0.00615 Training: 2022-01-09 05:48:55,761-[agedb_30][154000]Accuracy-Highest: 0.98217 Training: 2022-01-09 05:49:03,445-Speed 278.41 samples/sec Loss 2.8560 LearningRate 0.0220 Epoch: 14 Global Step: 154010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:49:10,981-Speed 5435.79 samples/sec Loss 2.8659 LearningRate 0.0220 Epoch: 14 Global Step: 154020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:49:18,521-Speed 5433.07 samples/sec Loss 2.8748 LearningRate 0.0220 Epoch: 14 Global Step: 154030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:49:26,033-Speed 5453.49 samples/sec Loss 2.8628 LearningRate 0.0220 Epoch: 14 Global Step: 154040 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-01-09 05:49:33,539-Speed 5457.61 samples/sec Loss 2.8550 LearningRate 0.0220 Epoch: 14 Global Step: 154050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:49:41,070-Speed 5439.40 samples/sec Loss 2.8940 LearningRate 0.0220 Epoch: 14 Global Step: 154060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:49:48,582-Speed 5453.49 samples/sec Loss 2.8705 LearningRate 0.0220 Epoch: 14 Global Step: 154070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:49:56,149-Speed 5413.98 samples/sec Loss 2.9256 LearningRate 0.0220 Epoch: 14 Global Step: 154080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:50:03,747-Speed 5391.54 samples/sec Loss 2.8721 LearningRate 0.0219 Epoch: 14 Global Step: 154090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:11,302-Speed 5422.64 samples/sec Loss 2.8669 LearningRate 0.0219 Epoch: 14 Global Step: 154100 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:18,854-Speed 5424.06 samples/sec Loss 2.8243 LearningRate 0.0219 Epoch: 14 Global Step: 154110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:26,388-Speed 5437.23 samples/sec Loss 2.8978 LearningRate 0.0219 Epoch: 14 Global Step: 154120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:33,932-Speed 5431.21 samples/sec Loss 2.8100 LearningRate 0.0219 Epoch: 14 Global Step: 154130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:41,530-Speed 5391.32 samples/sec Loss 2.8514 LearningRate 0.0219 Epoch: 14 Global Step: 154140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:49,048-Speed 5449.15 samples/sec Loss 2.8886 LearningRate 0.0219 Epoch: 14 Global Step: 154150 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:50:56,580-Speed 5438.76 samples/sec Loss 2.8696 LearningRate 0.0219 Epoch: 14 Global Step: 154160 Fp16 Grad Scale: 16384 Required: 12 hours Training: 
2022-01-09 05:51:04,226-Speed 5358.12 samples/sec Loss 2.8113 LearningRate 0.0219 Epoch: 14 Global Step: 154170 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:51:11,748-Speed 5445.86 samples/sec Loss 2.8687 LearningRate 0.0219 Epoch: 14 Global Step: 154180 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-01-09 05:51:19,270-Speed 5445.84 samples/sec Loss 2.8659 LearningRate 0.0219 Epoch: 14 Global Step: 154190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:51:26,822-Speed 5424.97 samples/sec Loss 2.8464 LearningRate 0.0219 Epoch: 14 Global Step: 154200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:51:34,355-Speed 5438.59 samples/sec Loss 2.8447 LearningRate 0.0219 Epoch: 14 Global Step: 154210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:51:41,900-Speed 5429.03 samples/sec Loss 2.8125 LearningRate 0.0218 Epoch: 14 Global Step: 154220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:51:49,487-Speed 5399.44 samples/sec Loss 2.8013 LearningRate 0.0218 Epoch: 14 Global Step: 154230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:51:57,071-Speed 5401.98 samples/sec Loss 2.8354 LearningRate 0.0218 Epoch: 14 Global Step: 154240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:52:04,562-Speed 5468.34 samples/sec Loss 2.8385 LearningRate 0.0218 Epoch: 14 Global Step: 154250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:52:12,048-Speed 5471.91 samples/sec Loss 2.8500 LearningRate 0.0218 Epoch: 14 Global Step: 154260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:52:19,613-Speed 5415.25 samples/sec Loss 2.8642 LearningRate 0.0218 Epoch: 14 Global Step: 154270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:52:27,166-Speed 5423.94 samples/sec Loss 2.8282 LearningRate 0.0218 Epoch: 14 Global Step: 154280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:52:34,651-Speed 
5473.74 samples/sec Loss 2.8697 LearningRate 0.0218 Epoch: 14 Global Step: 154290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:52:42,167-Speed 5450.60 samples/sec Loss 2.8311 LearningRate 0.0218 Epoch: 14 Global Step: 154300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:52:49,699-Speed 5438.59 samples/sec Loss 2.8528 LearningRate 0.0218 Epoch: 14 Global Step: 154310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:52:57,219-Speed 5447.45 samples/sec Loss 2.8326 LearningRate 0.0218 Epoch: 14 Global Step: 154320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:53:04,717-Speed 5463.49 samples/sec Loss 2.8494 LearningRate 0.0218 Epoch: 14 Global Step: 154330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:12,260-Speed 5430.67 samples/sec Loss 2.8482 LearningRate 0.0217 Epoch: 14 Global Step: 154340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:19,778-Speed 5448.91 samples/sec Loss 2.8323 LearningRate 0.0217 Epoch: 14 Global Step: 154350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:27,294-Speed 5450.77 samples/sec Loss 2.8024 LearningRate 0.0217 Epoch: 14 Global Step: 154360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:34,934-Speed 5361.51 samples/sec Loss 2.8340 LearningRate 0.0217 Epoch: 14 Global Step: 154370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:42,489-Speed 5422.58 samples/sec Loss 2.8198 LearningRate 0.0217 Epoch: 14 Global Step: 154380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:50,081-Speed 5395.56 samples/sec Loss 2.8625 LearningRate 0.0217 Epoch: 14 Global Step: 154390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:53:57,649-Speed 5412.67 samples/sec Loss 2.8101 LearningRate 0.0217 Epoch: 14 Global Step: 154400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:54:05,192-Speed 5431.66 samples/sec Loss 2.7884 
LearningRate 0.0217 Epoch: 14 Global Step: 154410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:54:12,836-Speed 5359.13 samples/sec Loss 2.8522 LearningRate 0.0217 Epoch: 14 Global Step: 154420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:54:20,388-Speed 5423.62 samples/sec Loss 2.8299 LearningRate 0.0217 Epoch: 14 Global Step: 154430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:54:27,945-Speed 5420.97 samples/sec Loss 2.8686 LearningRate 0.0217 Epoch: 14 Global Step: 154440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:54:35,518-Speed 5409.64 samples/sec Loss 2.8100 LearningRate 0.0217 Epoch: 14 Global Step: 154450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:54:43,082-Speed 5416.34 samples/sec Loss 2.8510 LearningRate 0.0216 Epoch: 14 Global Step: 154460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:54:50,728-Speed 5357.11 samples/sec Loss 2.8434 LearningRate 0.0216 Epoch: 14 Global Step: 154470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:54:58,302-Speed 5409.30 samples/sec Loss 2.8526 LearningRate 0.0216 Epoch: 14 Global Step: 154480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:05,930-Speed 5370.08 samples/sec Loss 2.8715 LearningRate 0.0216 Epoch: 14 Global Step: 154490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:13,481-Speed 5425.46 samples/sec Loss 2.8417 LearningRate 0.0216 Epoch: 14 Global Step: 154500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:21,049-Speed 5413.10 samples/sec Loss 2.8512 LearningRate 0.0216 Epoch: 14 Global Step: 154510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:28,638-Speed 5397.74 samples/sec Loss 2.8562 LearningRate 0.0216 Epoch: 14 Global Step: 154520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:36,227-Speed 5398.34 samples/sec Loss 2.8624 LearningRate 0.0216 Epoch: 14 
Global Step: 154530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:43,789-Speed 5417.79 samples/sec Loss 2.8197 LearningRate 0.0216 Epoch: 14 Global Step: 154540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:51,340-Speed 5424.48 samples/sec Loss 2.8361 LearningRate 0.0216 Epoch: 14 Global Step: 154550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:55:58,905-Speed 5414.95 samples/sec Loss 2.8227 LearningRate 0.0216 Epoch: 14 Global Step: 154560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:06,505-Speed 5390.69 samples/sec Loss 2.8198 LearningRate 0.0216 Epoch: 14 Global Step: 154570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:14,048-Speed 5431.16 samples/sec Loss 2.8375 LearningRate 0.0215 Epoch: 14 Global Step: 154580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:21,579-Speed 5439.35 samples/sec Loss 2.8283 LearningRate 0.0215 Epoch: 14 Global Step: 154590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:29,136-Speed 5421.00 samples/sec Loss 2.8495 LearningRate 0.0215 Epoch: 14 Global Step: 154600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:36,706-Speed 5411.43 samples/sec Loss 2.8051 LearningRate 0.0215 Epoch: 14 Global Step: 154610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:44,276-Speed 5412.02 samples/sec Loss 2.8378 LearningRate 0.0215 Epoch: 14 Global Step: 154620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:51,923-Speed 5357.04 samples/sec Loss 2.8252 LearningRate 0.0215 Epoch: 14 Global Step: 154630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:56:59,575-Speed 5353.45 samples/sec Loss 2.7969 LearningRate 0.0215 Epoch: 14 Global Step: 154640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:57:07,245-Speed 5341.12 samples/sec Loss 2.8245 LearningRate 0.0215 Epoch: 14 Global Step: 154650 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-01-09 05:57:14,828-Speed 5402.36 samples/sec Loss 2.8400 LearningRate 0.0215 Epoch: 14 Global Step: 154660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 05:57:22,537-Speed 5313.46 samples/sec Loss 2.8255 LearningRate 0.0215 Epoch: 14 Global Step: 154670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:57:30,038-Speed 5461.19 samples/sec Loss 2.8514 LearningRate 0.0215 Epoch: 14 Global Step: 154680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:57:37,563-Speed 5444.28 samples/sec Loss 2.8358 LearningRate 0.0215 Epoch: 14 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:57:45,237-Speed 5338.84 samples/sec Loss 2.8481 LearningRate 0.0215 Epoch: 14 Global Step: 154700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:57:52,799-Speed 5416.67 samples/sec Loss 2.8418 LearningRate 0.0214 Epoch: 14 Global Step: 154710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:58:00,351-Speed 5423.99 samples/sec Loss 2.8391 LearningRate 0.0214 Epoch: 14 Global Step: 154720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:58:07,892-Speed 5432.62 samples/sec Loss 2.8223 LearningRate 0.0214 Epoch: 14 Global Step: 154730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:58:15,480-Speed 5398.96 samples/sec Loss 2.7836 LearningRate 0.0214 Epoch: 14 Global Step: 154740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:58:23,098-Speed 5377.38 samples/sec Loss 2.8131 LearningRate 0.0214 Epoch: 14 Global Step: 154750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:58:30,592-Speed 5466.38 samples/sec Loss 2.8400 LearningRate 0.0214 Epoch: 14 Global Step: 154760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:58:38,216-Speed 5372.96 samples/sec Loss 2.8037 LearningRate 0.0214 Epoch: 14 Global Step: 154770 Fp16 Grad Scale: 32768 Required: 12 
hours Training: 2022-01-09 05:58:45,731-Speed 5451.70 samples/sec Loss 2.8113 LearningRate 0.0214 Epoch: 14 Global Step: 154780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:58:53,306-Speed 5407.77 samples/sec Loss 2.7919 LearningRate 0.0214 Epoch: 14 Global Step: 154790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:59:00,839-Speed 5437.71 samples/sec Loss 2.8038 LearningRate 0.0214 Epoch: 14 Global Step: 154800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:59:08,448-Speed 5384.34 samples/sec Loss 2.8286 LearningRate 0.0214 Epoch: 14 Global Step: 154810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:59:16,067-Speed 5376.47 samples/sec Loss 2.8365 LearningRate 0.0214 Epoch: 14 Global Step: 154820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 05:59:23,578-Speed 5453.69 samples/sec Loss 2.8272 LearningRate 0.0213 Epoch: 14 Global Step: 154830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:59:31,215-Speed 5364.21 samples/sec Loss 2.7995 LearningRate 0.0213 Epoch: 14 Global Step: 154840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:59:38,841-Speed 5371.89 samples/sec Loss 2.7699 LearningRate 0.0213 Epoch: 14 Global Step: 154850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:59:46,348-Speed 5457.24 samples/sec Loss 2.8213 LearningRate 0.0213 Epoch: 14 Global Step: 154860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 05:59:53,915-Speed 5413.68 samples/sec Loss 2.8472 LearningRate 0.0213 Epoch: 14 Global Step: 154870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:00:01,497-Speed 5403.17 samples/sec Loss 2.8213 LearningRate 0.0213 Epoch: 14 Global Step: 154880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:00:09,067-Speed 5411.13 samples/sec Loss 2.8441 LearningRate 0.0213 Epoch: 14 Global Step: 154890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 
06:00:16,629-Speed 5417.61 samples/sec Loss 2.7874 LearningRate 0.0213 Epoch: 14 Global Step: 154900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:00:24,208-Speed 5405.43 samples/sec Loss 2.8521 LearningRate 0.0213 Epoch: 14 Global Step: 154910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:00:31,730-Speed 5446.08 samples/sec Loss 2.8405 LearningRate 0.0213 Epoch: 14 Global Step: 154920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:00:39,369-Speed 5362.40 samples/sec Loss 2.8112 LearningRate 0.0213 Epoch: 14 Global Step: 154930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:00:46,925-Speed 5422.01 samples/sec Loss 2.8315 LearningRate 0.0213 Epoch: 14 Global Step: 154940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:00:54,481-Speed 5421.79 samples/sec Loss 2.7878 LearningRate 0.0212 Epoch: 14 Global Step: 154950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:01:02,053-Speed 5409.82 samples/sec Loss 2.8191 LearningRate 0.0212 Epoch: 14 Global Step: 154960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:01:09,706-Speed 5352.71 samples/sec Loss 2.7607 LearningRate 0.0212 Epoch: 14 Global Step: 154970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:01:17,390-Speed 5331.53 samples/sec Loss 2.8218 LearningRate 0.0212 Epoch: 14 Global Step: 154980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:01:24,943-Speed 5424.05 samples/sec Loss 2.8284 LearningRate 0.0212 Epoch: 14 Global Step: 154990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:01:32,484-Speed 5431.94 samples/sec Loss 2.8273 LearningRate 0.0212 Epoch: 14 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:01:40,015-Speed 5439.52 samples/sec Loss 2.8137 LearningRate 0.0212 Epoch: 14 Global Step: 155010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:01:47,538-Speed 5445.35 
samples/sec Loss 2.8046 LearningRate 0.0212 Epoch: 14 Global Step: 155020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:01:55,246-Speed 5314.67 samples/sec Loss 2.8317 LearningRate 0.0212 Epoch: 14 Global Step: 155030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:02:02,741-Speed 5465.49 samples/sec Loss 2.8120 LearningRate 0.0212 Epoch: 14 Global Step: 155040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:02:10,262-Speed 5447.12 samples/sec Loss 2.8298 LearningRate 0.0212 Epoch: 14 Global Step: 155050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:02:17,862-Speed 5389.90 samples/sec Loss 2.8095 LearningRate 0.0212 Epoch: 14 Global Step: 155060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:02:25,422-Speed 5419.09 samples/sec Loss 2.8153 LearningRate 0.0211 Epoch: 14 Global Step: 155070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:02:32,972-Speed 5425.74 samples/sec Loss 2.8107 LearningRate 0.0211 Epoch: 14 Global Step: 155080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:02:40,507-Speed 5436.76 samples/sec Loss 2.7785 LearningRate 0.0211 Epoch: 14 Global Step: 155090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:02:48,031-Speed 5444.71 samples/sec Loss 2.7922 LearningRate 0.0211 Epoch: 14 Global Step: 155100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:02:55,570-Speed 5433.39 samples/sec Loss 2.8293 LearningRate 0.0211 Epoch: 14 Global Step: 155110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:03,144-Speed 5408.80 samples/sec Loss 2.8340 LearningRate 0.0211 Epoch: 14 Global Step: 155120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:10,678-Speed 5437.31 samples/sec Loss 2.8214 LearningRate 0.0211 Epoch: 14 Global Step: 155130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:18,199-Speed 5446.73 samples/sec Loss 2.7749 
LearningRate 0.0211 Epoch: 14 Global Step: 155140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:25,785-Speed 5400.35 samples/sec Loss 2.7814 LearningRate 0.0211 Epoch: 14 Global Step: 155150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:33,268-Speed 5474.75 samples/sec Loss 2.7925 LearningRate 0.0211 Epoch: 14 Global Step: 155160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:40,835-Speed 5413.08 samples/sec Loss 2.7873 LearningRate 0.0211 Epoch: 14 Global Step: 155170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:48,407-Speed 5410.96 samples/sec Loss 2.8140 LearningRate 0.0211 Epoch: 14 Global Step: 155180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:03:55,982-Speed 5408.47 samples/sec Loss 2.7842 LearningRate 0.0211 Epoch: 14 Global Step: 155190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:04:03,517-Speed 5436.14 samples/sec Loss 2.7955 LearningRate 0.0210 Epoch: 14 Global Step: 155200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:04:11,033-Speed 5451.08 samples/sec Loss 2.8157 LearningRate 0.0210 Epoch: 14 Global Step: 155210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:04:18,600-Speed 5413.54 samples/sec Loss 2.8542 LearningRate 0.0210 Epoch: 14 Global Step: 155220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:04:26,203-Speed 5387.99 samples/sec Loss 2.8024 LearningRate 0.0210 Epoch: 14 Global Step: 155230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:04:33,714-Speed 5454.00 samples/sec Loss 2.8086 LearningRate 0.0210 Epoch: 14 Global Step: 155240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:04:41,365-Speed 5354.61 samples/sec Loss 2.7545 LearningRate 0.0210 Epoch: 14 Global Step: 155250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:04:48,906-Speed 5432.50 samples/sec Loss 2.7718 LearningRate 0.0210 Epoch: 14 
Global Step: 155260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:04:56,491-Speed 5400.59 samples/sec Loss 2.7957 LearningRate 0.0210 Epoch: 14 Global Step: 155270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:04,024-Speed 5438.05 samples/sec Loss 2.8094 LearningRate 0.0210 Epoch: 14 Global Step: 155280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:11,652-Speed 5370.51 samples/sec Loss 2.8322 LearningRate 0.0210 Epoch: 14 Global Step: 155290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:19,272-Speed 5376.53 samples/sec Loss 2.7873 LearningRate 0.0210 Epoch: 14 Global Step: 155300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:26,778-Speed 5457.23 samples/sec Loss 2.8097 LearningRate 0.0210 Epoch: 14 Global Step: 155310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:34,331-Speed 5423.57 samples/sec Loss 2.7812 LearningRate 0.0209 Epoch: 14 Global Step: 155320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:41,814-Speed 5474.67 samples/sec Loss 2.8027 LearningRate 0.0209 Epoch: 14 Global Step: 155330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:05:49,373-Speed 5419.56 samples/sec Loss 2.7989 LearningRate 0.0209 Epoch: 14 Global Step: 155340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:05:56,930-Speed 5420.76 samples/sec Loss 2.7845 LearningRate 0.0209 Epoch: 14 Global Step: 155350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:04,462-Speed 5438.88 samples/sec Loss 2.7883 LearningRate 0.0209 Epoch: 14 Global Step: 155360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:11,979-Speed 5449.17 samples/sec Loss 2.7768 LearningRate 0.0209 Epoch: 14 Global Step: 155370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:19,560-Speed 5404.01 samples/sec Loss 2.8180 LearningRate 0.0209 Epoch: 14 Global Step: 155380 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:27,096-Speed 5436.34 samples/sec Loss 2.7856 LearningRate 0.0209 Epoch: 14 Global Step: 155390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:34,561-Speed 5487.08 samples/sec Loss 2.8145 LearningRate 0.0209 Epoch: 14 Global Step: 155400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:42,131-Speed 5411.40 samples/sec Loss 2.7863 LearningRate 0.0209 Epoch: 14 Global Step: 155410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:49,648-Speed 5450.47 samples/sec Loss 2.7581 LearningRate 0.0209 Epoch: 14 Global Step: 155420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:06:57,180-Speed 5438.71 samples/sec Loss 2.7673 LearningRate 0.0209 Epoch: 14 Global Step: 155430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:07:04,780-Speed 5389.73 samples/sec Loss 2.7376 LearningRate 0.0209 Epoch: 14 Global Step: 155440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:07:12,368-Speed 5399.04 samples/sec Loss 2.8083 LearningRate 0.0208 Epoch: 14 Global Step: 155450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:07:19,876-Speed 5456.35 samples/sec Loss 2.7851 LearningRate 0.0208 Epoch: 14 Global Step: 155460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:07:27,451-Speed 5407.38 samples/sec Loss 2.8199 LearningRate 0.0208 Epoch: 14 Global Step: 155470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:07:35,040-Speed 5397.98 samples/sec Loss 2.7929 LearningRate 0.0208 Epoch: 14 Global Step: 155480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:07:42,578-Speed 5434.41 samples/sec Loss 2.7954 LearningRate 0.0208 Epoch: 14 Global Step: 155490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:07:50,193-Speed 5380.29 samples/sec Loss 2.7978 LearningRate 0.0208 Epoch: 14 Global Step: 155500 Fp16 Grad Scale: 32768 Required: 12 hours 
Training: 2022-01-09 06:07:57,855-Speed 5346.09 samples/sec Loss 2.8038 LearningRate 0.0208 Epoch: 14 Global Step: 155510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:08:05,377-Speed 5446.10 samples/sec Loss 2.8216 LearningRate 0.0208 Epoch: 14 Global Step: 155520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:08:12,848-Speed 5483.71 samples/sec Loss 2.7433 LearningRate 0.0208 Epoch: 14 Global Step: 155530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:08:20,361-Speed 5452.53 samples/sec Loss 2.7725 LearningRate 0.0208 Epoch: 14 Global Step: 155540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:08:43,333-Speed 1783.18 samples/sec Loss 2.7553 LearningRate 0.0208 Epoch: 15 Global Step: 155550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:08:50,789-Speed 5493.65 samples/sec Loss 2.7819 LearningRate 0.0208 Epoch: 15 Global Step: 155560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:08:58,348-Speed 5419.82 samples/sec Loss 2.7993 LearningRate 0.0207 Epoch: 15 Global Step: 155570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:05,832-Speed 5474.03 samples/sec Loss 2.7789 LearningRate 0.0207 Epoch: 15 Global Step: 155580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:13,399-Speed 5413.20 samples/sec Loss 2.7597 LearningRate 0.0207 Epoch: 15 Global Step: 155590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:20,817-Speed 5522.88 samples/sec Loss 2.7940 LearningRate 0.0207 Epoch: 15 Global Step: 155600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:28,317-Speed 5462.06 samples/sec Loss 2.7686 LearningRate 0.0207 Epoch: 15 Global Step: 155610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:35,788-Speed 5483.24 samples/sec Loss 2.7728 LearningRate 0.0207 Epoch: 15 Global Step: 155620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 
06:09:43,250-Speed 5489.79 samples/sec Loss 2.7965 LearningRate 0.0207 Epoch: 15 Global Step: 155630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:50,727-Speed 5479.50 samples/sec Loss 2.7348 LearningRate 0.0207 Epoch: 15 Global Step: 155640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:09:58,211-Speed 5473.72 samples/sec Loss 2.7638 LearningRate 0.0207 Epoch: 15 Global Step: 155650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:10:05,643-Speed 5511.61 samples/sec Loss 2.7533 LearningRate 0.0207 Epoch: 15 Global Step: 155660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:10:13,065-Speed 5519.84 samples/sec Loss 2.7431 LearningRate 0.0207 Epoch: 15 Global Step: 155670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:10:20,563-Speed 5463.41 samples/sec Loss 2.7516 LearningRate 0.0207 Epoch: 15 Global Step: 155680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:10:28,277-Speed 5310.93 samples/sec Loss 2.7894 LearningRate 0.0207 Epoch: 15 Global Step: 155690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:10:35,912-Speed 5365.39 samples/sec Loss 2.7837 LearningRate 0.0206 Epoch: 15 Global Step: 155700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:10:43,589-Speed 5336.18 samples/sec Loss 2.7722 LearningRate 0.0206 Epoch: 15 Global Step: 155710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:10:51,284-Speed 5323.78 samples/sec Loss 2.7367 LearningRate 0.0206 Epoch: 15 Global Step: 155720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:10:58,983-Speed 5320.44 samples/sec Loss 2.7543 LearningRate 0.0206 Epoch: 15 Global Step: 155730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:06,660-Speed 5336.68 samples/sec Loss 2.7363 LearningRate 0.0206 Epoch: 15 Global Step: 155740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:14,349-Speed 5327.99 
samples/sec Loss 2.7522 LearningRate 0.0206 Epoch: 15 Global Step: 155750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:21,993-Speed 5359.27 samples/sec Loss 2.7669 LearningRate 0.0206 Epoch: 15 Global Step: 155760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:29,642-Speed 5355.15 samples/sec Loss 2.7644 LearningRate 0.0206 Epoch: 15 Global Step: 155770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:37,316-Speed 5338.18 samples/sec Loss 2.7420 LearningRate 0.0206 Epoch: 15 Global Step: 155780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:44,876-Speed 5418.86 samples/sec Loss 2.7554 LearningRate 0.0206 Epoch: 15 Global Step: 155790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:11:52,411-Speed 5436.44 samples/sec Loss 2.7313 LearningRate 0.0206 Epoch: 15 Global Step: 155800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:11:59,910-Speed 5463.41 samples/sec Loss 2.7181 LearningRate 0.0206 Epoch: 15 Global Step: 155810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:07,398-Speed 5470.70 samples/sec Loss 2.7464 LearningRate 0.0205 Epoch: 15 Global Step: 155820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:14,898-Speed 5462.02 samples/sec Loss 2.8014 LearningRate 0.0205 Epoch: 15 Global Step: 155830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:22,305-Speed 5530.47 samples/sec Loss 2.7304 LearningRate 0.0205 Epoch: 15 Global Step: 155840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:29,749-Speed 5503.00 samples/sec Loss 2.7878 LearningRate 0.0205 Epoch: 15 Global Step: 155850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:37,221-Speed 5482.58 samples/sec Loss 2.7332 LearningRate 0.0205 Epoch: 15 Global Step: 155860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:44,715-Speed 5466.07 samples/sec Loss 2.7669 
LearningRate 0.0205 Epoch: 15 Global Step: 155870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:52,238-Speed 5445.78 samples/sec Loss 2.7746 LearningRate 0.0205 Epoch: 15 Global Step: 155880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:12:59,678-Speed 5506.08 samples/sec Loss 2.7324 LearningRate 0.0205 Epoch: 15 Global Step: 155890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:07,222-Speed 5429.97 samples/sec Loss 2.7346 LearningRate 0.0205 Epoch: 15 Global Step: 155900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:14,721-Speed 5462.68 samples/sec Loss 2.7598 LearningRate 0.0205 Epoch: 15 Global Step: 155910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:22,233-Speed 5453.57 samples/sec Loss 2.7314 LearningRate 0.0205 Epoch: 15 Global Step: 155920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:29,703-Speed 5483.63 samples/sec Loss 2.7623 LearningRate 0.0205 Epoch: 15 Global Step: 155930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:37,221-Speed 5449.08 samples/sec Loss 2.7830 LearningRate 0.0205 Epoch: 15 Global Step: 155940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:44,725-Speed 5459.14 samples/sec Loss 2.7105 LearningRate 0.0204 Epoch: 15 Global Step: 155950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:52,174-Speed 5499.66 samples/sec Loss 2.7639 LearningRate 0.0204 Epoch: 15 Global Step: 155960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:13:59,731-Speed 5420.74 samples/sec Loss 2.7218 LearningRate 0.0204 Epoch: 15 Global Step: 155970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:14:07,410-Speed 5334.84 samples/sec Loss 2.7481 LearningRate 0.0204 Epoch: 15 Global Step: 155980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:14:14,966-Speed 5421.44 samples/sec Loss 2.7274 LearningRate 0.0204 Epoch: 15 
Global Step: 155990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:14:22,539-Speed 5410.01 samples/sec Loss 2.7169 LearningRate 0.0204 Epoch: 15 Global Step: 156000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:15:12,466-[lfw][156000]XNorm: 23.614685 Training: 2022-01-09 06:15:12,467-[lfw][156000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-01-09 06:15:12,467-[lfw][156000]Accuracy-Highest: 0.99817 Training: 2022-01-09 06:16:05,225-[cfp_fp][156000]XNorm: 22.312758 Training: 2022-01-09 06:16:05,226-[cfp_fp][156000]Accuracy-Flip: 0.99371+-0.00363 Training: 2022-01-09 06:16:05,226-[cfp_fp][156000]Accuracy-Highest: 0.99371 Training: 2022-01-09 06:16:49,927-[agedb_30][156000]XNorm: 23.813588 Training: 2022-01-09 06:16:49,928-[agedb_30][156000]Accuracy-Flip: 0.98150+-0.00724 Training: 2022-01-09 06:16:49,928-[agedb_30][156000]Accuracy-Highest: 0.98217 Training: 2022-01-09 06:16:57,544-Speed 264.25 samples/sec Loss 2.7393 LearningRate 0.0204 Epoch: 15 Global Step: 156010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:05,143-Speed 5390.80 samples/sec Loss 2.7506 LearningRate 0.0204 Epoch: 15 Global Step: 156020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:12,707-Speed 5415.67 samples/sec Loss 2.7186 LearningRate 0.0204 Epoch: 15 Global Step: 156030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:20,206-Speed 5463.23 samples/sec Loss 2.7567 LearningRate 0.0204 Epoch: 15 Global Step: 156040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:27,682-Speed 5479.66 samples/sec Loss 2.7678 LearningRate 0.0204 Epoch: 15 Global Step: 156050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:35,164-Speed 5474.65 samples/sec Loss 2.7411 LearningRate 0.0204 Epoch: 15 Global Step: 156060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:42,629-Speed 5488.21 samples/sec Loss 2.7285 LearningRate 0.0203 Epoch: 15 Global Step: 
156070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:17:50,159-Speed 5440.24 samples/sec Loss 2.7535 LearningRate 0.0203 Epoch: 15 Global Step: 156080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:17:57,714-Speed 5422.38 samples/sec Loss 2.7716 LearningRate 0.0203 Epoch: 15 Global Step: 156090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:05,228-Speed 5452.08 samples/sec Loss 2.7780 LearningRate 0.0203 Epoch: 15 Global Step: 156100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:12,688-Speed 5491.28 samples/sec Loss 2.7400 LearningRate 0.0203 Epoch: 15 Global Step: 156110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:20,250-Speed 5417.57 samples/sec Loss 2.7909 LearningRate 0.0203 Epoch: 15 Global Step: 156120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:27,827-Speed 5406.50 samples/sec Loss 2.7898 LearningRate 0.0203 Epoch: 15 Global Step: 156130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:35,389-Speed 5417.40 samples/sec Loss 2.7670 LearningRate 0.0203 Epoch: 15 Global Step: 156140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:43,048-Speed 5348.46 samples/sec Loss 2.7800 LearningRate 0.0203 Epoch: 15 Global Step: 156150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:50,691-Speed 5360.17 samples/sec Loss 2.7460 LearningRate 0.0203 Epoch: 15 Global Step: 156160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:18:58,187-Speed 5464.64 samples/sec Loss 2.7706 LearningRate 0.0203 Epoch: 15 Global Step: 156170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:19:05,766-Speed 5405.21 samples/sec Loss 2.7644 LearningRate 0.0203 Epoch: 15 Global Step: 156180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:19:13,247-Speed 5475.35 samples/sec Loss 2.7634 LearningRate 0.0203 Epoch: 15 Global Step: 156190 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-01-09 06:19:20,809-Speed 5418.16 samples/sec Loss 2.7410 LearningRate 0.0202 Epoch: 15 Global Step: 156200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:19:28,303-Speed 5466.42 samples/sec Loss 2.7562 LearningRate 0.0202 Epoch: 15 Global Step: 156210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:19:35,798-Speed 5465.07 samples/sec Loss 2.7772 LearningRate 0.0202 Epoch: 15 Global Step: 156220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:19:43,373-Speed 5407.86 samples/sec Loss 2.7275 LearningRate 0.0202 Epoch: 15 Global Step: 156230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:19:51,177-Speed 5249.67 samples/sec Loss 2.7534 LearningRate 0.0202 Epoch: 15 Global Step: 156240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:19:58,705-Speed 5441.96 samples/sec Loss 2.7617 LearningRate 0.0202 Epoch: 15 Global Step: 156250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:20:06,298-Speed 5395.16 samples/sec Loss 2.7311 LearningRate 0.0202 Epoch: 15 Global Step: 156260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:20:13,793-Speed 5465.37 samples/sec Loss 2.7484 LearningRate 0.0202 Epoch: 15 Global Step: 156270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:20:21,321-Speed 5441.68 samples/sec Loss 2.7393 LearningRate 0.0202 Epoch: 15 Global Step: 156280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:20:28,877-Speed 5421.94 samples/sec Loss 2.7328 LearningRate 0.0202 Epoch: 15 Global Step: 156290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:20:36,435-Speed 5419.97 samples/sec Loss 2.7588 LearningRate 0.0202 Epoch: 15 Global Step: 156300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:20:43,978-Speed 5430.97 samples/sec Loss 2.7294 LearningRate 0.0202 Epoch: 15 Global Step: 156310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 
2022-01-09 06:20:51,528-Speed 5425.92 samples/sec Loss 2.6783 LearningRate 0.0202 Epoch: 15 Global Step: 156320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:20:59,030-Speed 5460.90 samples/sec Loss 2.7661 LearningRate 0.0201 Epoch: 15 Global Step: 156330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:06,507-Speed 5478.69 samples/sec Loss 2.6876 LearningRate 0.0201 Epoch: 15 Global Step: 156340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:14,043-Speed 5435.48 samples/sec Loss 2.7378 LearningRate 0.0201 Epoch: 15 Global Step: 156350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:21,557-Speed 5452.53 samples/sec Loss 2.7621 LearningRate 0.0201 Epoch: 15 Global Step: 156360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:29,076-Speed 5448.27 samples/sec Loss 2.7086 LearningRate 0.0201 Epoch: 15 Global Step: 156370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:36,693-Speed 5378.32 samples/sec Loss 2.7099 LearningRate 0.0201 Epoch: 15 Global Step: 156380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:44,369-Speed 5336.62 samples/sec Loss 2.7486 LearningRate 0.0201 Epoch: 15 Global Step: 156390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:51,879-Speed 5455.43 samples/sec Loss 2.7710 LearningRate 0.0201 Epoch: 15 Global Step: 156400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:21:59,412-Speed 5438.07 samples/sec Loss 2.7691 LearningRate 0.0201 Epoch: 15 Global Step: 156410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:22:06,949-Speed 5435.49 samples/sec Loss 2.7294 LearningRate 0.0201 Epoch: 15 Global Step: 156420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:22:14,497-Speed 5426.45 samples/sec Loss 2.7407 LearningRate 0.0201 Epoch: 15 Global Step: 156430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:22:22,061-Speed 
5416.18 samples/sec Loss 2.7379 LearningRate 0.0201 Epoch: 15 Global Step: 156440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:22:29,541-Speed 5477.10 samples/sec Loss 2.7291 LearningRate 0.0200 Epoch: 15 Global Step: 156450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:22:37,052-Speed 5454.08 samples/sec Loss 2.7113 LearningRate 0.0200 Epoch: 15 Global Step: 156460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:22:44,550-Speed 5463.22 samples/sec Loss 2.7201 LearningRate 0.0200 Epoch: 15 Global Step: 156470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:22:52,103-Speed 5423.43 samples/sec Loss 2.7144 LearningRate 0.0200 Epoch: 15 Global Step: 156480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:22:59,643-Speed 5433.14 samples/sec Loss 2.7454 LearningRate 0.0200 Epoch: 15 Global Step: 156490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:23:07,104-Speed 5490.75 samples/sec Loss 2.7121 LearningRate 0.0200 Epoch: 15 Global Step: 156500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:23:14,599-Speed 5465.46 samples/sec Loss 2.7483 LearningRate 0.0200 Epoch: 15 Global Step: 156510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:23:22,120-Speed 5446.74 samples/sec Loss 2.7207 LearningRate 0.0200 Epoch: 15 Global Step: 156520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:23:29,605-Speed 5473.42 samples/sec Loss 2.7120 LearningRate 0.0200 Epoch: 15 Global Step: 156530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:23:37,166-Speed 5417.71 samples/sec Loss 2.7115 LearningRate 0.0200 Epoch: 15 Global Step: 156540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:23:44,734-Speed 5413.04 samples/sec Loss 2.7219 LearningRate 0.0200 Epoch: 15 Global Step: 156550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:23:52,220-Speed 5472.42 samples/sec Loss 2.7475 
LearningRate 0.0200 Epoch: 15 Global Step: 156560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:23:59,707-Speed 5471.52 samples/sec Loss 2.7449 LearningRate 0.0200 Epoch: 15 Global Step: 156570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:07,230-Speed 5445.24 samples/sec Loss 2.7367 LearningRate 0.0199 Epoch: 15 Global Step: 156580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:14,772-Speed 5431.48 samples/sec Loss 2.6993 LearningRate 0.0199 Epoch: 15 Global Step: 156590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:22,238-Speed 5486.84 samples/sec Loss 2.7451 LearningRate 0.0199 Epoch: 15 Global Step: 156600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:29,725-Speed 5471.63 samples/sec Loss 2.7287 LearningRate 0.0199 Epoch: 15 Global Step: 156610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:37,197-Speed 5482.32 samples/sec Loss 2.7185 LearningRate 0.0199 Epoch: 15 Global Step: 156620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:44,660-Speed 5489.13 samples/sec Loss 2.7468 LearningRate 0.0199 Epoch: 15 Global Step: 156630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:24:52,267-Speed 5384.87 samples/sec Loss 2.6992 LearningRate 0.0199 Epoch: 15 Global Step: 156640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 06:24:59,758-Speed 5468.84 samples/sec Loss 2.7067 LearningRate 0.0199 Epoch: 15 Global Step: 156650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:07,310-Speed 5424.26 samples/sec Loss 2.7730 LearningRate 0.0199 Epoch: 15 Global Step: 156660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:14,815-Speed 5459.02 samples/sec Loss 2.7010 LearningRate 0.0199 Epoch: 15 Global Step: 156670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:22,301-Speed 5472.07 samples/sec Loss 2.7186 LearningRate 0.0199 Epoch: 15 
Global Step: 156680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:29,830-Speed 5441.33 samples/sec Loss 2.7122 LearningRate 0.0199 Epoch: 15 Global Step: 156690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:37,296-Speed 5486.21 samples/sec Loss 2.6863 LearningRate 0.0199 Epoch: 15 Global Step: 156700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:44,859-Speed 5416.66 samples/sec Loss 2.6995 LearningRate 0.0198 Epoch: 15 Global Step: 156710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:25:52,355-Speed 5465.38 samples/sec Loss 2.7379 LearningRate 0.0198 Epoch: 15 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:26:00,010-Speed 5351.54 samples/sec Loss 2.7128 LearningRate 0.0198 Epoch: 15 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:26:07,599-Speed 5397.80 samples/sec Loss 2.6830 LearningRate 0.0198 Epoch: 15 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:26:15,130-Speed 5439.94 samples/sec Loss 2.7022 LearningRate 0.0198 Epoch: 15 Global Step: 156750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:26:22,689-Speed 5419.67 samples/sec Loss 2.6907 LearningRate 0.0198 Epoch: 15 Global Step: 156760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:26:30,229-Speed 5433.14 samples/sec Loss 2.7081 LearningRate 0.0198 Epoch: 15 Global Step: 156770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:26:37,731-Speed 5459.98 samples/sec Loss 2.7084 LearningRate 0.0198 Epoch: 15 Global Step: 156780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:26:45,246-Speed 5451.52 samples/sec Loss 2.7041 LearningRate 0.0198 Epoch: 15 Global Step: 156790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:26:52,793-Speed 5428.28 samples/sec Loss 2.7370 LearningRate 0.0198 Epoch: 15 Global Step: 156800 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-01-09 06:27:00,322-Speed 5441.17 samples/sec Loss 2.7328 LearningRate 0.0198 Epoch: 15 Global Step: 156810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:27:07,799-Speed 5478.36 samples/sec Loss 2.7235 LearningRate 0.0198 Epoch: 15 Global Step: 156820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:27:15,306-Speed 5457.15 samples/sec Loss 2.7225 LearningRate 0.0198 Epoch: 15 Global Step: 156830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:27:22,850-Speed 5430.56 samples/sec Loss 2.6845 LearningRate 0.0197 Epoch: 15 Global Step: 156840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:27:30,417-Speed 5413.41 samples/sec Loss 2.6895 LearningRate 0.0197 Epoch: 15 Global Step: 156850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:27:37,884-Speed 5486.06 samples/sec Loss 2.7145 LearningRate 0.0197 Epoch: 15 Global Step: 156860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:27:45,476-Speed 5396.36 samples/sec Loss 2.6971 LearningRate 0.0197 Epoch: 15 Global Step: 156870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:27:53,004-Speed 5441.80 samples/sec Loss 2.7067 LearningRate 0.0197 Epoch: 15 Global Step: 156880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:28:00,538-Speed 5437.25 samples/sec Loss 2.7056 LearningRate 0.0197 Epoch: 15 Global Step: 156890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:28:08,045-Speed 5457.22 samples/sec Loss 2.6676 LearningRate 0.0197 Epoch: 15 Global Step: 156900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:28:15,517-Speed 5482.61 samples/sec Loss 2.6838 LearningRate 0.0197 Epoch: 15 Global Step: 156910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:28:23,017-Speed 5461.66 samples/sec Loss 2.6728 LearningRate 0.0197 Epoch: 15 Global Step: 156920 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-01-09 06:28:30,610-Speed 5394.95 samples/sec Loss 2.6844 LearningRate 0.0197 Epoch: 15 Global Step: 156930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:28:38,096-Speed 5472.67 samples/sec Loss 2.7045 LearningRate 0.0197 Epoch: 15 Global Step: 156940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:28:45,573-Speed 5478.90 samples/sec Loss 2.7089 LearningRate 0.0197 Epoch: 15 Global Step: 156950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:28:53,121-Speed 5427.70 samples/sec Loss 2.6798 LearningRate 0.0196 Epoch: 15 Global Step: 156960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:29:00,720-Speed 5390.95 samples/sec Loss 2.6932 LearningRate 0.0196 Epoch: 15 Global Step: 156970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:29:08,302-Speed 5402.87 samples/sec Loss 2.6713 LearningRate 0.0196 Epoch: 15 Global Step: 156980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:29:15,858-Speed 5421.95 samples/sec Loss 2.7151 LearningRate 0.0196 Epoch: 15 Global Step: 156990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:29:23,334-Speed 5479.00 samples/sec Loss 2.6869 LearningRate 0.0196 Epoch: 15 Global Step: 157000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:29:30,822-Speed 5471.48 samples/sec Loss 2.7259 LearningRate 0.0196 Epoch: 15 Global Step: 157010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:29:38,307-Speed 5472.77 samples/sec Loss 2.7280 LearningRate 0.0196 Epoch: 15 Global Step: 157020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 06:29:45,899-Speed 5396.09 samples/sec Loss 2.7156 LearningRate 0.0196 Epoch: 15 Global Step: 157030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-09 06:29:53,422-Speed 5445.44 samples/sec Loss 2.6937 LearningRate 0.0196 Epoch: 15 Global Step: 157040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 
06:30:00,888-Speed 5486.66 samples/sec Loss 2.6860 LearningRate 0.0196 Epoch: 15 Global Step: 157050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:30:08,558-Speed 5341.12 samples/sec Loss 2.6911 LearningRate 0.0196 Epoch: 15 Global Step: 157060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:30:16,033-Speed 5480.68 samples/sec Loss 2.6842 LearningRate 0.0196 Epoch: 15 Global Step: 157070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:30:23,597-Speed 5415.32 samples/sec Loss 2.7061 LearningRate 0.0196 Epoch: 15 Global Step: 157080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:30:31,228-Speed 5368.98 samples/sec Loss 2.6925 LearningRate 0.0195 Epoch: 15 Global Step: 157090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-09 06:30:38,731-Speed 5459.32 samples/sec Loss 2.6568 LearningRate 0.0195 Epoch: 15 Global Step: 157100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:30:46,184-Speed 5496.97 samples/sec Loss 2.7137 LearningRate 0.0195 Epoch: 15 Global Step: 157110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:30:53,691-Speed 5456.94 samples/sec Loss 2.7285 LearningRate 0.0195 Epoch: 15 Global Step: 157120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:31:01,167-Speed 5479.92 samples/sec Loss 2.6652 LearningRate 0.0195 Epoch: 15 Global Step: 157130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:31:08,638-Speed 5482.73 samples/sec Loss 2.7071 LearningRate 0.0195 Epoch: 15 Global Step: 157140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:31:16,113-Speed 5480.54 samples/sec Loss 2.7020 LearningRate 0.0195 Epoch: 15 Global Step: 157150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:31:23,544-Speed 5513.39 samples/sec Loss 2.6968 LearningRate 0.0195 Epoch: 15 Global Step: 157160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:31:31,138-Speed 5393.79 
samples/sec Loss 2.7156 LearningRate 0.0195 Epoch: 15 Global Step: 157170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-09 06:31:38,715-Speed 5407.09 samples/sec Loss 2.6671 LearningRate 0.0195 Epoch: 15 Global Step: 157180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:31:46,229-Speed 5451.82 samples/sec Loss 2.7296 LearningRate 0.0195 Epoch: 15 Global Step: 157190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:31:53,690-Speed 5490.49 samples/sec Loss 2.6679 LearningRate 0.0195 Epoch: 15 Global Step: 157200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:01,187-Speed 5464.46 samples/sec Loss 2.6810 LearningRate 0.0195 Epoch: 15 Global Step: 157210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:08,740-Speed 5423.43 samples/sec Loss 2.6876 LearningRate 0.0194 Epoch: 15 Global Step: 157220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:16,307-Speed 5413.70 samples/sec Loss 2.6649 LearningRate 0.0194 Epoch: 15 Global Step: 157230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:23,808-Speed 5461.37 samples/sec Loss 2.6431 LearningRate 0.0194 Epoch: 15 Global Step: 157240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:31,414-Speed 5385.77 samples/sec Loss 2.6608 LearningRate 0.0194 Epoch: 15 Global Step: 157250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:38,974-Speed 5418.84 samples/sec Loss 2.7100 LearningRate 0.0194 Epoch: 15 Global Step: 157260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:46,482-Speed 5456.06 samples/sec Loss 2.6841 LearningRate 0.0194 Epoch: 15 Global Step: 157270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:32:54,056-Speed 5408.90 samples/sec Loss 2.6461 LearningRate 0.0194 Epoch: 15 Global Step: 157280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:33:01,594-Speed 5433.84 samples/sec Loss 2.7110 
LearningRate 0.0194 Epoch: 15 Global Step: 157290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:33:09,057-Speed 5489.86 samples/sec Loss 2.6930 LearningRate 0.0194 Epoch: 15 Global Step: 157300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:33:16,590-Speed 5437.62 samples/sec Loss 2.6955 LearningRate 0.0194 Epoch: 15 Global Step: 157310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:33:24,119-Speed 5441.19 samples/sec Loss 2.6709 LearningRate 0.0194 Epoch: 15 Global Step: 157320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:33:31,630-Speed 5453.92 samples/sec Loss 2.6685 LearningRate 0.0194 Epoch: 15 Global Step: 157330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:33:39,160-Speed 5440.35 samples/sec Loss 2.6696 LearningRate 0.0194 Epoch: 15 Global Step: 157340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:33:46,726-Speed 5414.71 samples/sec Loss 2.6795 LearningRate 0.0193 Epoch: 15 Global Step: 157350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:33:54,343-Speed 5378.23 samples/sec Loss 2.6387 LearningRate 0.0193 Epoch: 15 Global Step: 157360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:01,842-Speed 5462.30 samples/sec Loss 2.6927 LearningRate 0.0193 Epoch: 15 Global Step: 157370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:09,417-Speed 5408.33 samples/sec Loss 2.7032 LearningRate 0.0193 Epoch: 15 Global Step: 157380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:16,988-Speed 5410.96 samples/sec Loss 2.6454 LearningRate 0.0193 Epoch: 15 Global Step: 157390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:24,472-Speed 5473.97 samples/sec Loss 2.6507 LearningRate 0.0193 Epoch: 15 Global Step: 157400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:31,981-Speed 5454.99 samples/sec Loss 2.6542 LearningRate 0.0193 Epoch: 
15 Global Step: 157410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:39,500-Speed 5448.76 samples/sec Loss 2.6983 LearningRate 0.0193 Epoch: 15 Global Step: 157420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:46,959-Speed 5492.42 samples/sec Loss 2.6530 LearningRate 0.0193 Epoch: 15 Global Step: 157430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:34:54,578-Speed 5376.20 samples/sec Loss 2.6894 LearningRate 0.0193 Epoch: 15 Global Step: 157440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:35:02,105-Speed 5442.14 samples/sec Loss 2.6531 LearningRate 0.0193 Epoch: 15 Global Step: 157450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:35:09,580-Speed 5480.41 samples/sec Loss 2.6327 LearningRate 0.0193 Epoch: 15 Global Step: 157460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:35:17,140-Speed 5419.38 samples/sec Loss 2.6231 LearningRate 0.0193 Epoch: 15 Global Step: 157470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:35:24,693-Speed 5423.42 samples/sec Loss 2.7010 LearningRate 0.0192 Epoch: 15 Global Step: 157480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:35:32,253-Speed 5418.42 samples/sec Loss 2.6861 LearningRate 0.0192 Epoch: 15 Global Step: 157490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:35:39,772-Speed 5448.88 samples/sec Loss 2.6861 LearningRate 0.0192 Epoch: 15 Global Step: 157500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:35:47,358-Speed 5400.28 samples/sec Loss 2.6850 LearningRate 0.0192 Epoch: 15 Global Step: 157510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:35:54,867-Speed 5455.19 samples/sec Loss 2.6396 LearningRate 0.0192 Epoch: 15 Global Step: 157520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:02,288-Speed 5519.85 samples/sec Loss 2.6783 LearningRate 0.0192 Epoch: 15 Global Step: 157530 Fp16 
Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:09,751-Speed 5489.70 samples/sec Loss 2.6788 LearningRate 0.0192 Epoch: 15 Global Step: 157540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:17,261-Speed 5454.38 samples/sec Loss 2.6458 LearningRate 0.0192 Epoch: 15 Global Step: 157550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:24,696-Speed 5509.86 samples/sec Loss 2.6629 LearningRate 0.0192 Epoch: 15 Global Step: 157560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:32,230-Speed 5436.95 samples/sec Loss 2.6904 LearningRate 0.0192 Epoch: 15 Global Step: 157570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:39,789-Speed 5419.95 samples/sec Loss 2.6148 LearningRate 0.0192 Epoch: 15 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:47,230-Speed 5505.63 samples/sec Loss 2.6591 LearningRate 0.0192 Epoch: 15 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:36:54,700-Speed 5483.68 samples/sec Loss 2.6713 LearningRate 0.0192 Epoch: 15 Global Step: 157600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:37:02,222-Speed 5445.82 samples/sec Loss 2.6625 LearningRate 0.0191 Epoch: 15 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:37:09,761-Speed 5433.15 samples/sec Loss 2.6626 LearningRate 0.0191 Epoch: 15 Global Step: 157620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:37:17,221-Speed 5492.00 samples/sec Loss 2.6529 LearningRate 0.0191 Epoch: 15 Global Step: 157630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:37:24,679-Speed 5492.81 samples/sec Loss 2.6636 LearningRate 0.0191 Epoch: 15 Global Step: 157640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:37:32,150-Speed 5482.96 samples/sec Loss 2.6425 LearningRate 0.0191 Epoch: 15 Global Step: 157650 Fp16 Grad Scale: 32768 Required: 11 
hours Training: 2022-01-09 06:37:39,665-Speed 5450.72 samples/sec Loss 2.6635 LearningRate 0.0191 Epoch: 15 Global Step: 157660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:37:47,167-Speed 5461.25 samples/sec Loss 2.6435 LearningRate 0.0191 Epoch: 15 Global Step: 157670 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:37:54,681-Speed 5451.62 samples/sec Loss 2.6509 LearningRate 0.0191 Epoch: 15 Global Step: 157680 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:02,153-Speed 5482.84 samples/sec Loss 2.6523 LearningRate 0.0191 Epoch: 15 Global Step: 157690 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:09,694-Speed 5431.51 samples/sec Loss 2.6868 LearningRate 0.0191 Epoch: 15 Global Step: 157700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:17,213-Speed 5449.22 samples/sec Loss 2.6719 LearningRate 0.0191 Epoch: 15 Global Step: 157710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:24,783-Speed 5411.49 samples/sec Loss 2.6649 LearningRate 0.0191 Epoch: 15 Global Step: 157720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:32,290-Speed 5456.21 samples/sec Loss 2.6921 LearningRate 0.0191 Epoch: 15 Global Step: 157730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:39,781-Speed 5468.71 samples/sec Loss 2.6698 LearningRate 0.0190 Epoch: 15 Global Step: 157740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:47,292-Speed 5454.19 samples/sec Loss 2.6034 LearningRate 0.0190 Epoch: 15 Global Step: 157750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:38:54,899-Speed 5385.27 samples/sec Loss 2.6890 LearningRate 0.0190 Epoch: 15 Global Step: 157760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 06:39:02,419-Speed 5447.52 samples/sec Loss 2.6482 LearningRate 0.0190 Epoch: 15 Global Step: 157770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 
06:39:09,893-Speed 5481.02 samples/sec Loss 2.6940 LearningRate 0.0190 Epoch: 15 Global Step: 157780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:39:17,387-Speed 5466.51 samples/sec Loss 2.6821 LearningRate 0.0190 Epoch: 15 Global Step: 157790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:39:24,841-Speed 5496.09 samples/sec Loss 2.6425 LearningRate 0.0190 Epoch: 15 Global Step: 157800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:39:32,310-Speed 5484.52 samples/sec Loss 2.6290 LearningRate 0.0190 Epoch: 15 Global Step: 157810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:39:39,815-Speed 5457.83 samples/sec Loss 2.6664 LearningRate 0.0190 Epoch: 15 Global Step: 157820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:39:47,356-Speed 5432.71 samples/sec Loss 2.6543 LearningRate 0.0190 Epoch: 15 Global Step: 157830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:39:54,862-Speed 5457.84 samples/sec Loss 2.6452 LearningRate 0.0190 Epoch: 15 Global Step: 157840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:02,379-Speed 5449.55 samples/sec Loss 2.6161 LearningRate 0.0190 Epoch: 15 Global Step: 157850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:09,815-Speed 5508.55 samples/sec Loss 2.6655 LearningRate 0.0190 Epoch: 15 Global Step: 157860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:17,381-Speed 5414.47 samples/sec Loss 2.6310 LearningRate 0.0189 Epoch: 15 Global Step: 157870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:40:24,907-Speed 5443.54 samples/sec Loss 2.6280 LearningRate 0.0189 Epoch: 15 Global Step: 157880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:32,450-Speed 5431.17 samples/sec Loss 2.6671 LearningRate 0.0189 Epoch: 15 Global Step: 157890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:39,940-Speed 5468.83 
samples/sec Loss 2.6279 LearningRate 0.0189 Epoch: 15 Global Step: 157900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:47,415-Speed 5480.87 samples/sec Loss 2.6593 LearningRate 0.0189 Epoch: 15 Global Step: 157910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:40:54,937-Speed 5445.56 samples/sec Loss 2.6461 LearningRate 0.0189 Epoch: 15 Global Step: 157920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:41:02,486-Speed 5426.38 samples/sec Loss 2.6625 LearningRate 0.0189 Epoch: 15 Global Step: 157930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:41:10,006-Speed 5447.85 samples/sec Loss 2.6636 LearningRate 0.0189 Epoch: 15 Global Step: 157940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:41:17,524-Speed 5449.24 samples/sec Loss 2.6397 LearningRate 0.0189 Epoch: 15 Global Step: 157950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:41:25,051-Speed 5442.80 samples/sec Loss 2.6489 LearningRate 0.0189 Epoch: 15 Global Step: 157960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:41:32,640-Speed 5397.54 samples/sec Loss 2.6957 LearningRate 0.0189 Epoch: 15 Global Step: 157970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:41:40,127-Speed 5471.51 samples/sec Loss 2.6350 LearningRate 0.0189 Epoch: 15 Global Step: 157980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:41:47,813-Speed 5329.94 samples/sec Loss 2.6114 LearningRate 0.0189 Epoch: 15 Global Step: 157990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:41:55,325-Speed 5453.37 samples/sec Loss 2.6101 LearningRate 0.0188 Epoch: 15 Global Step: 158000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:42:39,136-[lfw][158000]XNorm: 23.097801 Training: 2022-01-09 06:42:39,136-[lfw][158000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 06:42:39,137-[lfw][158000]Accuracy-Highest: 0.99833 Training: 2022-01-09 
06:43:29,973-[cfp_fp][158000]XNorm: 21.881999 Training: 2022-01-09 06:43:29,974-[cfp_fp][158000]Accuracy-Flip: 0.99129+-0.00463 Training: 2022-01-09 06:43:29,974-[cfp_fp][158000]Accuracy-Highest: 0.99371 Training: 2022-01-09 06:44:13,606-[agedb_30][158000]XNorm: 23.452830 Training: 2022-01-09 06:44:13,607-[agedb_30][158000]Accuracy-Flip: 0.98167+-0.00796 Training: 2022-01-09 06:44:13,607-[agedb_30][158000]Accuracy-Highest: 0.98217 Training: 2022-01-09 06:44:21,219-Speed 280.75 samples/sec Loss 2.6181 LearningRate 0.0188 Epoch: 15 Global Step: 158010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:44:28,671-Speed 5497.01 samples/sec Loss 2.6485 LearningRate 0.0188 Epoch: 15 Global Step: 158020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:44:36,145-Speed 5480.83 samples/sec Loss 2.6247 LearningRate 0.0188 Epoch: 15 Global Step: 158030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:44:43,614-Speed 5485.20 samples/sec Loss 2.6345 LearningRate 0.0188 Epoch: 15 Global Step: 158040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:44:51,201-Speed 5399.19 samples/sec Loss 2.6478 LearningRate 0.0188 Epoch: 15 Global Step: 158050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:44:58,682-Speed 5476.08 samples/sec Loss 2.6161 LearningRate 0.0188 Epoch: 15 Global Step: 158060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:45:06,278-Speed 5393.16 samples/sec Loss 2.6512 LearningRate 0.0188 Epoch: 15 Global Step: 158070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:45:13,737-Speed 5491.89 samples/sec Loss 2.6512 LearningRate 0.0188 Epoch: 15 Global Step: 158080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 06:45:21,259-Speed 5446.43 samples/sec Loss 2.6361 LearningRate 0.0188 Epoch: 15 Global Step: 158090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:45:28,805-Speed 5428.79 samples/sec Loss 2.6319 LearningRate 
0.0188 Epoch: 15 Global Step: 158100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:45:36,277-Speed 5482.70 samples/sec Loss 2.6444 LearningRate 0.0188 Epoch: 15 Global Step: 158110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:45:43,807-Speed 5440.10 samples/sec Loss 2.6203 LearningRate 0.0188 Epoch: 15 Global Step: 158120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:45:51,289-Speed 5474.90 samples/sec Loss 2.6280 LearningRate 0.0187 Epoch: 15 Global Step: 158130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:45:58,813-Speed 5444.70 samples/sec Loss 2.6721 LearningRate 0.0187 Epoch: 15 Global Step: 158140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:06,438-Speed 5372.85 samples/sec Loss 2.6474 LearningRate 0.0187 Epoch: 15 Global Step: 158150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:14,059-Speed 5374.85 samples/sec Loss 2.6596 LearningRate 0.0187 Epoch: 15 Global Step: 158160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:21,644-Speed 5400.60 samples/sec Loss 2.6522 LearningRate 0.0187 Epoch: 15 Global Step: 158170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:29,148-Speed 5459.68 samples/sec Loss 2.6298 LearningRate 0.0187 Epoch: 15 Global Step: 158180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:36,691-Speed 5430.54 samples/sec Loss 2.6065 LearningRate 0.0187 Epoch: 15 Global Step: 158190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:44,150-Speed 5492.64 samples/sec Loss 2.6023 LearningRate 0.0187 Epoch: 15 Global Step: 158200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:51,670-Speed 5447.30 samples/sec Loss 2.6113 LearningRate 0.0187 Epoch: 15 Global Step: 158210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:46:59,191-Speed 5446.07 samples/sec Loss 2.6411 LearningRate 0.0187 Epoch: 15 Global Step: 
158220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:47:06,707-Speed 5450.64 samples/sec Loss 2.6494 LearningRate 0.0187 Epoch: 15 Global Step: 158230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:47:14,271-Speed 5416.52 samples/sec Loss 2.5733 LearningRate 0.0187 Epoch: 15 Global Step: 158240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:47:21,709-Speed 5507.53 samples/sec Loss 2.6325 LearningRate 0.0187 Epoch: 15 Global Step: 158250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:47:29,201-Speed 5466.94 samples/sec Loss 2.6233 LearningRate 0.0186 Epoch: 15 Global Step: 158260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:47:36,796-Speed 5393.86 samples/sec Loss 2.6176 LearningRate 0.0186 Epoch: 15 Global Step: 158270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:47:44,367-Speed 5411.41 samples/sec Loss 2.6728 LearningRate 0.0186 Epoch: 15 Global Step: 158280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:47:51,837-Speed 5483.98 samples/sec Loss 2.6393 LearningRate 0.0186 Epoch: 15 Global Step: 158290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:47:59,317-Speed 5476.70 samples/sec Loss 2.6160 LearningRate 0.0186 Epoch: 15 Global Step: 158300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:48:06,780-Speed 5488.90 samples/sec Loss 2.6011 LearningRate 0.0186 Epoch: 15 Global Step: 158310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:48:14,234-Speed 5496.21 samples/sec Loss 2.6202 LearningRate 0.0186 Epoch: 15 Global Step: 158320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:48:21,702-Speed 5485.13 samples/sec Loss 2.6365 LearningRate 0.0186 Epoch: 15 Global Step: 158330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:48:29,192-Speed 5469.32 samples/sec Loss 2.6114 LearningRate 0.0186 Epoch: 15 Global Step: 158340 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-01-09 06:48:36,687-Speed 5465.69 samples/sec Loss 2.6074 LearningRate 0.0186 Epoch: 15 Global Step: 158350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:48:44,165-Speed 5478.29 samples/sec Loss 2.6065 LearningRate 0.0186 Epoch: 15 Global Step: 158360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:48:51,669-Speed 5459.10 samples/sec Loss 2.6190 LearningRate 0.0186 Epoch: 15 Global Step: 158370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:48:59,112-Speed 5503.66 samples/sec Loss 2.6653 LearningRate 0.0186 Epoch: 15 Global Step: 158380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:06,596-Speed 5473.76 samples/sec Loss 2.6151 LearningRate 0.0186 Epoch: 15 Global Step: 158390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:14,111-Speed 5451.40 samples/sec Loss 2.5870 LearningRate 0.0185 Epoch: 15 Global Step: 158400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:21,518-Speed 5530.85 samples/sec Loss 2.6367 LearningRate 0.0185 Epoch: 15 Global Step: 158410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:29,025-Speed 5456.71 samples/sec Loss 2.5650 LearningRate 0.0185 Epoch: 15 Global Step: 158420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:36,523-Speed 5463.93 samples/sec Loss 2.6205 LearningRate 0.0185 Epoch: 15 Global Step: 158430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:43,983-Speed 5491.31 samples/sec Loss 2.5795 LearningRate 0.0185 Epoch: 15 Global Step: 158440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:51,419-Speed 5508.91 samples/sec Loss 2.6457 LearningRate 0.0185 Epoch: 15 Global Step: 158450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:49:58,885-Speed 5487.05 samples/sec Loss 2.5945 LearningRate 0.0185 Epoch: 15 Global Step: 158460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 
2022-01-09 06:50:06,373-Speed 5470.99 samples/sec Loss 2.6625 LearningRate 0.0185 Epoch: 15 Global Step: 158470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:50:13,857-Speed 5473.77 samples/sec Loss 2.6173 LearningRate 0.0185 Epoch: 15 Global Step: 158480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:50:21,390-Speed 5438.51 samples/sec Loss 2.6077 LearningRate 0.0185 Epoch: 15 Global Step: 158490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:50:28,978-Speed 5398.20 samples/sec Loss 2.6242 LearningRate 0.0185 Epoch: 15 Global Step: 158500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:50:39,586-Speed 5491.85 samples/sec Loss 2.5984 LearningRate 0.0185 Epoch: 15 Global Step: 158510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:50:47,144-Speed 5420.46 samples/sec Loss 2.6044 LearningRate 0.0185 Epoch: 15 Global Step: 158520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:50:54,669-Speed 5443.54 samples/sec Loss 2.6383 LearningRate 0.0184 Epoch: 15 Global Step: 158530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:51:02,252-Speed 5402.13 samples/sec Loss 2.5823 LearningRate 0.0184 Epoch: 15 Global Step: 158540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:51:09,814-Speed 5417.02 samples/sec Loss 2.6216 LearningRate 0.0184 Epoch: 15 Global Step: 158550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:51:17,311-Speed 5464.94 samples/sec Loss 2.5893 LearningRate 0.0184 Epoch: 15 Global Step: 158560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:51:24,773-Speed 5489.78 samples/sec Loss 2.6455 LearningRate 0.0184 Epoch: 15 Global Step: 158570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:51:32,354-Speed 5403.79 samples/sec Loss 2.6006 LearningRate 0.0184 Epoch: 15 Global Step: 158580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:51:39,879-Speed 
5443.89 samples/sec Loss 2.5976 LearningRate 0.0184 Epoch: 15 Global Step: 158590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:51:47,453-Speed 5408.41 samples/sec Loss 2.6152 LearningRate 0.0184 Epoch: 15 Global Step: 158600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:51:54,923-Speed 5484.33 samples/sec Loss 2.6183 LearningRate 0.0184 Epoch: 15 Global Step: 158610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:02,463-Speed 5432.55 samples/sec Loss 2.5950 LearningRate 0.0184 Epoch: 15 Global Step: 158620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:09,956-Speed 5467.29 samples/sec Loss 2.6590 LearningRate 0.0184 Epoch: 15 Global Step: 158630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:17,464-Speed 5456.15 samples/sec Loss 2.6316 LearningRate 0.0184 Epoch: 15 Global Step: 158640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:24,979-Speed 5451.22 samples/sec Loss 2.6218 LearningRate 0.0184 Epoch: 15 Global Step: 158650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:32,592-Speed 5380.91 samples/sec Loss 2.6567 LearningRate 0.0183 Epoch: 15 Global Step: 158660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:40,109-Speed 5450.43 samples/sec Loss 2.6563 LearningRate 0.0183 Epoch: 15 Global Step: 158670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:52:47,560-Speed 5497.56 samples/sec Loss 2.6340 LearningRate 0.0183 Epoch: 15 Global Step: 158680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:52:55,047-Speed 5471.74 samples/sec Loss 2.6436 LearningRate 0.0183 Epoch: 15 Global Step: 158690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:53:02,497-Speed 5498.53 samples/sec Loss 2.6491 LearningRate 0.0183 Epoch: 15 Global Step: 158700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:10,010-Speed 5452.51 samples/sec Loss 2.6120 
LearningRate 0.0183 Epoch: 15 Global Step: 158710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:17,555-Speed 5429.43 samples/sec Loss 2.6350 LearningRate 0.0183 Epoch: 15 Global Step: 158720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:25,034-Speed 5477.44 samples/sec Loss 2.6494 LearningRate 0.0183 Epoch: 15 Global Step: 158730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:32,161-Speed 5747.84 samples/sec Loss 2.5689 LearningRate 0.0183 Epoch: 15 Global Step: 158740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:39,231-Speed 5794.78 samples/sec Loss 2.5945 LearningRate 0.0183 Epoch: 15 Global Step: 158750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:46,576-Speed 5577.31 samples/sec Loss 2.6040 LearningRate 0.0183 Epoch: 15 Global Step: 158760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:53:54,246-Speed 5340.43 samples/sec Loss 2.6404 LearningRate 0.0183 Epoch: 15 Global Step: 158770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:54:01,814-Speed 5413.13 samples/sec Loss 2.6040 LearningRate 0.0183 Epoch: 15 Global Step: 158780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:54:09,503-Speed 5328.31 samples/sec Loss 2.5993 LearningRate 0.0182 Epoch: 15 Global Step: 158790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:54:17,017-Speed 5451.77 samples/sec Loss 2.6203 LearningRate 0.0182 Epoch: 15 Global Step: 158800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:54:24,640-Speed 5373.62 samples/sec Loss 2.5780 LearningRate 0.0182 Epoch: 15 Global Step: 158810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:54:32,162-Speed 5446.07 samples/sec Loss 2.5981 LearningRate 0.0182 Epoch: 15 Global Step: 158820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:54:39,685-Speed 5445.87 samples/sec Loss 2.6230 LearningRate 0.0182 Epoch: 15 
Global Step: 158830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:54:47,251-Speed 5413.56 samples/sec Loss 2.6034 LearningRate 0.0182 Epoch: 15 Global Step: 158840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:54:54,784-Speed 5438.78 samples/sec Loss 2.5549 LearningRate 0.0182 Epoch: 15 Global Step: 158850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:55:02,241-Speed 5493.26 samples/sec Loss 2.5928 LearningRate 0.0182 Epoch: 15 Global Step: 158860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:55:09,781-Speed 5432.86 samples/sec Loss 2.6184 LearningRate 0.0182 Epoch: 15 Global Step: 158870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:55:17,245-Speed 5488.15 samples/sec Loss 2.6160 LearningRate 0.0182 Epoch: 15 Global Step: 158880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:55:24,701-Speed 5494.88 samples/sec Loss 2.5998 LearningRate 0.0182 Epoch: 15 Global Step: 158890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:55:32,184-Speed 5474.60 samples/sec Loss 2.5483 LearningRate 0.0182 Epoch: 15 Global Step: 158900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:55:39,647-Speed 5489.12 samples/sec Loss 2.5805 LearningRate 0.0182 Epoch: 15 Global Step: 158910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:55:47,117-Speed 5483.53 samples/sec Loss 2.5780 LearningRate 0.0182 Epoch: 15 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:55:54,642-Speed 5443.54 samples/sec Loss 2.5677 LearningRate 0.0181 Epoch: 15 Global Step: 158930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:56:02,175-Speed 5438.96 samples/sec Loss 2.5641 LearningRate 0.0181 Epoch: 15 Global Step: 158940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:56:09,620-Speed 5501.73 samples/sec Loss 2.5975 LearningRate 0.0181 Epoch: 15 Global Step: 158950 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-01-09 06:56:17,176-Speed 5422.20 samples/sec Loss 2.6245 LearningRate 0.0181 Epoch: 15 Global Step: 158960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:56:24,636-Speed 5491.22 samples/sec Loss 2.6301 LearningRate 0.0181 Epoch: 15 Global Step: 158970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:56:32,064-Speed 5515.23 samples/sec Loss 2.5671 LearningRate 0.0181 Epoch: 15 Global Step: 158980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:56:39,549-Speed 5472.76 samples/sec Loss 2.6474 LearningRate 0.0181 Epoch: 15 Global Step: 158990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:56:47,037-Speed 5471.07 samples/sec Loss 2.5845 LearningRate 0.0181 Epoch: 15 Global Step: 159000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:56:54,523-Speed 5472.34 samples/sec Loss 2.6017 LearningRate 0.0181 Epoch: 15 Global Step: 159010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:01,996-Speed 5482.08 samples/sec Loss 2.5777 LearningRate 0.0181 Epoch: 15 Global Step: 159020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:09,446-Speed 5498.78 samples/sec Loss 2.6006 LearningRate 0.0181 Epoch: 15 Global Step: 159030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:16,936-Speed 5469.39 samples/sec Loss 2.5700 LearningRate 0.0181 Epoch: 15 Global Step: 159040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:24,524-Speed 5398.54 samples/sec Loss 2.5792 LearningRate 0.0181 Epoch: 15 Global Step: 159050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:32,082-Speed 5420.04 samples/sec Loss 2.5553 LearningRate 0.0180 Epoch: 15 Global Step: 159060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:39,697-Speed 5380.06 samples/sec Loss 2.5614 LearningRate 0.0180 Epoch: 15 Global Step: 159070 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-01-09 06:57:47,263-Speed 5414.32 samples/sec Loss 2.5815 LearningRate 0.0180 Epoch: 15 Global Step: 159080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:57:54,786-Speed 5445.76 samples/sec Loss 2.5703 LearningRate 0.0180 Epoch: 15 Global Step: 159090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:58:02,248-Speed 5489.22 samples/sec Loss 2.5887 LearningRate 0.0180 Epoch: 15 Global Step: 159100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:09,755-Speed 5457.22 samples/sec Loss 2.5462 LearningRate 0.0180 Epoch: 15 Global Step: 159110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:17,254-Speed 5462.53 samples/sec Loss 2.5856 LearningRate 0.0180 Epoch: 15 Global Step: 159120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:24,849-Speed 5394.37 samples/sec Loss 2.5109 LearningRate 0.0180 Epoch: 15 Global Step: 159130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:32,378-Speed 5440.67 samples/sec Loss 2.5899 LearningRate 0.0180 Epoch: 15 Global Step: 159140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:39,910-Speed 5439.37 samples/sec Loss 2.5509 LearningRate 0.0180 Epoch: 15 Global Step: 159150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:47,377-Speed 5486.15 samples/sec Loss 2.6033 LearningRate 0.0180 Epoch: 15 Global Step: 159160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:58:54,845-Speed 5484.93 samples/sec Loss 2.5759 LearningRate 0.0180 Epoch: 15 Global Step: 159170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:59:02,404-Speed 5419.35 samples/sec Loss 2.5539 LearningRate 0.0180 Epoch: 15 Global Step: 159180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:59:09,904-Speed 5462.64 samples/sec Loss 2.5827 LearningRate 0.0179 Epoch: 15 Global Step: 159190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 
06:59:17,443-Speed 5433.29 samples/sec Loss 2.5820 LearningRate 0.0179 Epoch: 15 Global Step: 159200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:59:24,962-Speed 5448.24 samples/sec Loss 2.5497 LearningRate 0.0179 Epoch: 15 Global Step: 159210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 06:59:32,394-Speed 5512.50 samples/sec Loss 2.5913 LearningRate 0.0179 Epoch: 15 Global Step: 159220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:59:39,869-Speed 5480.33 samples/sec Loss 2.5776 LearningRate 0.0179 Epoch: 15 Global Step: 159230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:59:47,361-Speed 5467.76 samples/sec Loss 2.6055 LearningRate 0.0179 Epoch: 15 Global Step: 159240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 06:59:54,904-Speed 5430.50 samples/sec Loss 2.5779 LearningRate 0.0179 Epoch: 15 Global Step: 159250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:02,405-Speed 5462.23 samples/sec Loss 2.5373 LearningRate 0.0179 Epoch: 15 Global Step: 159260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:10,083-Speed 5335.21 samples/sec Loss 2.5892 LearningRate 0.0179 Epoch: 15 Global Step: 159270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:17,554-Speed 5482.70 samples/sec Loss 2.5850 LearningRate 0.0179 Epoch: 15 Global Step: 159280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:25,046-Speed 5468.16 samples/sec Loss 2.6143 LearningRate 0.0179 Epoch: 15 Global Step: 159290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:32,547-Speed 5461.87 samples/sec Loss 2.5892 LearningRate 0.0179 Epoch: 15 Global Step: 159300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:40,051-Speed 5458.85 samples/sec Loss 2.5767 LearningRate 0.0179 Epoch: 15 Global Step: 159310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:00:47,579-Speed 5441.72 
samples/sec Loss 2.6176 LearningRate 0.0179 Epoch: 15 Global Step: 159320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:00:55,392-Speed 5243.12 samples/sec Loss 2.5821 LearningRate 0.0178 Epoch: 15 Global Step: 159330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:02,980-Speed 5398.77 samples/sec Loss 2.6179 LearningRate 0.0178 Epoch: 15 Global Step: 159340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:10,476-Speed 5465.18 samples/sec Loss 2.5338 LearningRate 0.0178 Epoch: 15 Global Step: 159350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:17,976-Speed 5462.03 samples/sec Loss 2.5461 LearningRate 0.0178 Epoch: 15 Global Step: 159360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:25,515-Speed 5433.28 samples/sec Loss 2.5435 LearningRate 0.0178 Epoch: 15 Global Step: 159370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:33,087-Speed 5410.19 samples/sec Loss 2.5684 LearningRate 0.0178 Epoch: 15 Global Step: 159380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:40,585-Speed 5463.55 samples/sec Loss 2.5574 LearningRate 0.0178 Epoch: 15 Global Step: 159390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:48,063-Speed 5478.41 samples/sec Loss 2.5939 LearningRate 0.0178 Epoch: 15 Global Step: 159400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:01:55,493-Speed 5513.04 samples/sec Loss 2.5558 LearningRate 0.0178 Epoch: 15 Global Step: 159410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:02:02,970-Speed 5478.80 samples/sec Loss 2.5358 LearningRate 0.0178 Epoch: 15 Global Step: 159420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:02:10,418-Speed 5500.46 samples/sec Loss 2.5737 LearningRate 0.0178 Epoch: 15 Global Step: 159430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:02:18,039-Speed 5375.10 samples/sec Loss 2.5521 
LearningRate 0.0178 Epoch: 15 Global Step: 159440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:02:25,543-Speed 5459.05 samples/sec Loss 2.5683 LearningRate 0.0178 Epoch: 15 Global Step: 159450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:02:32,988-Speed 5502.98 samples/sec Loss 2.5914 LearningRate 0.0177 Epoch: 15 Global Step: 159460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:02:40,538-Speed 5425.73 samples/sec Loss 2.5597 LearningRate 0.0177 Epoch: 15 Global Step: 159470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:02:48,041-Speed 5459.83 samples/sec Loss 2.5177 LearningRate 0.0177 Epoch: 15 Global Step: 159480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:02:55,597-Speed 5421.54 samples/sec Loss 2.5279 LearningRate 0.0177 Epoch: 15 Global Step: 159490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:03,127-Speed 5440.54 samples/sec Loss 2.5470 LearningRate 0.0177 Epoch: 15 Global Step: 159500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:10,702-Speed 5408.07 samples/sec Loss 2.5209 LearningRate 0.0177 Epoch: 15 Global Step: 159510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:18,210-Speed 5456.49 samples/sec Loss 2.5650 LearningRate 0.0177 Epoch: 15 Global Step: 159520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:25,656-Speed 5501.10 samples/sec Loss 2.5507 LearningRate 0.0177 Epoch: 15 Global Step: 159530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:33,328-Speed 5339.20 samples/sec Loss 2.5808 LearningRate 0.0177 Epoch: 15 Global Step: 159540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:40,840-Speed 5454.03 samples/sec Loss 2.5614 LearningRate 0.0177 Epoch: 15 Global Step: 159550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:48,344-Speed 5458.91 samples/sec Loss 2.5272 LearningRate 0.0177 Epoch: 15 
Global Step: 159560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:03:55,849-Speed 5458.19 samples/sec Loss 2.5361 LearningRate 0.0177 Epoch: 15 Global Step: 159570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:04:03,441-Speed 5395.52 samples/sec Loss 2.6053 LearningRate 0.0177 Epoch: 15 Global Step: 159580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:04:10,946-Speed 5458.72 samples/sec Loss 2.5960 LearningRate 0.0177 Epoch: 15 Global Step: 159590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:04:18,458-Speed 5453.65 samples/sec Loss 2.5378 LearningRate 0.0176 Epoch: 15 Global Step: 159600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:04:25,966-Speed 5455.76 samples/sec Loss 2.5592 LearningRate 0.0176 Epoch: 15 Global Step: 159610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:04:33,479-Speed 5452.18 samples/sec Loss 2.5714 LearningRate 0.0176 Epoch: 15 Global Step: 159620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:04:41,067-Speed 5398.85 samples/sec Loss 2.5590 LearningRate 0.0176 Epoch: 15 Global Step: 159630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:04:48,573-Speed 5457.68 samples/sec Loss 2.5401 LearningRate 0.0176 Epoch: 15 Global Step: 159640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:04:56,015-Speed 5504.86 samples/sec Loss 2.5467 LearningRate 0.0176 Epoch: 15 Global Step: 159650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:05:03,477-Speed 5489.38 samples/sec Loss 2.5956 LearningRate 0.0176 Epoch: 15 Global Step: 159660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:05:10,974-Speed 5464.38 samples/sec Loss 2.5997 LearningRate 0.0176 Epoch: 15 Global Step: 159670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:05:18,433-Speed 5492.53 samples/sec Loss 2.5565 LearningRate 0.0176 Epoch: 15 Global Step: 159680 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-01-09 07:05:25,898-Speed 5487.56 samples/sec Loss 2.5556 LearningRate 0.0176 Epoch: 15 Global Step: 159690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:05:33,374-Speed 5478.78 samples/sec Loss 2.5395 LearningRate 0.0176 Epoch: 15 Global Step: 159700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:05:40,857-Speed 5475.10 samples/sec Loss 2.5733 LearningRate 0.0176 Epoch: 15 Global Step: 159710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:05:48,317-Speed 5491.31 samples/sec Loss 2.5321 LearningRate 0.0176 Epoch: 15 Global Step: 159720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:05:55,818-Speed 5461.27 samples/sec Loss 2.5093 LearningRate 0.0175 Epoch: 15 Global Step: 159730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:03,346-Speed 5441.64 samples/sec Loss 2.5547 LearningRate 0.0175 Epoch: 15 Global Step: 159740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:10,889-Speed 5430.84 samples/sec Loss 2.5153 LearningRate 0.0175 Epoch: 15 Global Step: 159750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:18,475-Speed 5400.53 samples/sec Loss 2.5380 LearningRate 0.0175 Epoch: 15 Global Step: 159760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:25,950-Speed 5480.50 samples/sec Loss 2.5244 LearningRate 0.0175 Epoch: 15 Global Step: 159770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:33,458-Speed 5455.75 samples/sec Loss 2.5597 LearningRate 0.0175 Epoch: 15 Global Step: 159780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:40,904-Speed 5501.76 samples/sec Loss 2.5485 LearningRate 0.0175 Epoch: 15 Global Step: 159790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:06:48,401-Speed 5464.19 samples/sec Loss 2.5207 LearningRate 0.0175 Epoch: 15 Global Step: 159800 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-09 07:06:55,845-Speed 5503.06 samples/sec Loss 2.5861 LearningRate 0.0175 Epoch: 15 Global Step: 159810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:07:03,316-Speed 5483.57 samples/sec Loss 2.5754 LearningRate 0.0175 Epoch: 15 Global Step: 159820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:07:10,729-Speed 5525.95 samples/sec Loss 2.5337 LearningRate 0.0175 Epoch: 15 Global Step: 159830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:07:18,327-Speed 5391.80 samples/sec Loss 2.5601 LearningRate 0.0175 Epoch: 15 Global Step: 159840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:07:25,910-Speed 5402.22 samples/sec Loss 2.5623 LearningRate 0.0175 Epoch: 15 Global Step: 159850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:07:33,357-Speed 5500.89 samples/sec Loss 2.5684 LearningRate 0.0175 Epoch: 15 Global Step: 159860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:07:40,814-Speed 5493.59 samples/sec Loss 2.5308 LearningRate 0.0174 Epoch: 15 Global Step: 159870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:07:48,298-Speed 5473.74 samples/sec Loss 2.5709 LearningRate 0.0174 Epoch: 15 Global Step: 159880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:07:55,793-Speed 5466.15 samples/sec Loss 2.5462 LearningRate 0.0174 Epoch: 15 Global Step: 159890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:08:03,412-Speed 5376.44 samples/sec Loss 2.5624 LearningRate 0.0174 Epoch: 15 Global Step: 159900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:08:10,973-Speed 5417.89 samples/sec Loss 2.5516 LearningRate 0.0174 Epoch: 15 Global Step: 159910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:08:18,595-Speed 5374.50 samples/sec Loss 2.5693 LearningRate 0.0174 Epoch: 15 Global Step: 159920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 
07:08:26,158-Speed 5416.88 samples/sec Loss 2.5341 LearningRate 0.0174 Epoch: 15 Global Step: 159930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:08:33,712-Speed 5423.22 samples/sec Loss 2.5428 LearningRate 0.0174 Epoch: 15 Global Step: 159940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:08:41,334-Speed 5374.21 samples/sec Loss 2.5422 LearningRate 0.0174 Epoch: 15 Global Step: 159950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:08:48,879-Speed 5429.58 samples/sec Loss 2.5074 LearningRate 0.0174 Epoch: 15 Global Step: 159960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:08:56,400-Speed 5446.75 samples/sec Loss 2.5452 LearningRate 0.0174 Epoch: 15 Global Step: 159970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:09:03,868-Speed 5485.70 samples/sec Loss 2.5132 LearningRate 0.0174 Epoch: 15 Global Step: 159980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:09:11,420-Speed 5424.41 samples/sec Loss 2.5474 LearningRate 0.0174 Epoch: 15 Global Step: 159990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:09:19,043-Speed 5373.88 samples/sec Loss 2.5358 LearningRate 0.0174 Epoch: 15 Global Step: 160000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:10:02,497-[lfw][160000]XNorm: 22.309747 Training: 2022-01-09 07:10:02,498-[lfw][160000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 07:10:02,498-[lfw][160000]Accuracy-Highest: 0.99833 Training: 2022-01-09 07:10:53,121-[cfp_fp][160000]XNorm: 21.000020 Training: 2022-01-09 07:10:53,122-[cfp_fp][160000]Accuracy-Flip: 0.99214+-0.00301 Training: 2022-01-09 07:10:53,122-[cfp_fp][160000]Accuracy-Highest: 0.99371 Training: 2022-01-09 07:11:36,776-[agedb_30][160000]XNorm: 22.272978 Training: 2022-01-09 07:11:36,776-[agedb_30][160000]Accuracy-Flip: 0.98133+-0.00670 Training: 2022-01-09 07:11:36,777-[agedb_30][160000]Accuracy-Highest: 0.98217 Training: 2022-01-09 
07:11:44,052-Speed 282.47 samples/sec Loss 2.5251 LearningRate 0.0173 Epoch: 15 Global Step: 160010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:11:51,653-Speed 5389.71 samples/sec Loss 2.5380 LearningRate 0.0173 Epoch: 15 Global Step: 160020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:11:59,236-Speed 5402.58 samples/sec Loss 2.5441 LearningRate 0.0173 Epoch: 15 Global Step: 160030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:12:06,803-Speed 5413.29 samples/sec Loss 2.5343 LearningRate 0.0173 Epoch: 15 Global Step: 160040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:12:14,404-Speed 5389.75 samples/sec Loss 2.5359 LearningRate 0.0173 Epoch: 15 Global Step: 160050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:12:21,942-Speed 5434.37 samples/sec Loss 2.5564 LearningRate 0.0173 Epoch: 15 Global Step: 160060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:12:29,587-Speed 5358.76 samples/sec Loss 2.5517 LearningRate 0.0173 Epoch: 15 Global Step: 160070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:12:37,194-Speed 5385.66 samples/sec Loss 2.4695 LearningRate 0.0173 Epoch: 15 Global Step: 160080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:12:44,765-Speed 5410.63 samples/sec Loss 2.5183 LearningRate 0.0173 Epoch: 15 Global Step: 160090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:12:52,538-Speed 5270.31 samples/sec Loss 2.4992 LearningRate 0.0173 Epoch: 15 Global Step: 160100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:00,033-Speed 5465.60 samples/sec Loss 2.5386 LearningRate 0.0173 Epoch: 15 Global Step: 160110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:07,577-Speed 5430.05 samples/sec Loss 2.5469 LearningRate 0.0173 Epoch: 15 Global Step: 160120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:15,090-Speed 5453.02 
samples/sec Loss 2.5226 LearningRate 0.0173 Epoch: 15 Global Step: 160130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:22,764-Speed 5338.19 samples/sec Loss 2.5490 LearningRate 0.0172 Epoch: 15 Global Step: 160140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:30,446-Speed 5332.80 samples/sec Loss 2.5280 LearningRate 0.0172 Epoch: 15 Global Step: 160150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:38,023-Speed 5406.59 samples/sec Loss 2.5181 LearningRate 0.0172 Epoch: 15 Global Step: 160160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:45,728-Speed 5316.81 samples/sec Loss 2.5416 LearningRate 0.0172 Epoch: 15 Global Step: 160170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:13:53,372-Speed 5359.54 samples/sec Loss 2.5259 LearningRate 0.0172 Epoch: 15 Global Step: 160180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:14:00,888-Speed 5450.12 samples/sec Loss 2.5073 LearningRate 0.0172 Epoch: 15 Global Step: 160190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:14:08,449-Speed 5418.20 samples/sec Loss 2.4935 LearningRate 0.0172 Epoch: 15 Global Step: 160200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:14:16,122-Speed 5338.58 samples/sec Loss 2.5479 LearningRate 0.0172 Epoch: 15 Global Step: 160210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:14:23,753-Speed 5368.95 samples/sec Loss 2.5334 LearningRate 0.0172 Epoch: 15 Global Step: 160220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:14:31,319-Speed 5414.21 samples/sec Loss 2.5193 LearningRate 0.0172 Epoch: 15 Global Step: 160230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:14:38,918-Speed 5390.59 samples/sec Loss 2.5009 LearningRate 0.0172 Epoch: 15 Global Step: 160240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:14:46,610-Speed 5325.20 samples/sec Loss 2.5005 
LearningRate 0.0172 Epoch: 15 Global Step: 160250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:14:54,269-Speed 5349.03 samples/sec Loss 2.4973 LearningRate 0.0172 Epoch: 15 Global Step: 160260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:15:01,795-Speed 5443.21 samples/sec Loss 2.5340 LearningRate 0.0172 Epoch: 15 Global Step: 160270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:15:09,338-Speed 5430.78 samples/sec Loss 2.5314 LearningRate 0.0171 Epoch: 15 Global Step: 160280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:15:16,850-Speed 5453.55 samples/sec Loss 2.5098 LearningRate 0.0171 Epoch: 15 Global Step: 160290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:15:24,404-Speed 5422.71 samples/sec Loss 2.5203 LearningRate 0.0171 Epoch: 15 Global Step: 160300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 07:15:31,904-Speed 5462.05 samples/sec Loss 2.4985 LearningRate 0.0171 Epoch: 15 Global Step: 160310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 07:15:39,350-Speed 5501.37 samples/sec Loss 2.5263 LearningRate 0.0171 Epoch: 15 Global Step: 160320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:15:46,819-Speed 5484.84 samples/sec Loss 2.5159 LearningRate 0.0171 Epoch: 15 Global Step: 160330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:15:54,337-Speed 5448.77 samples/sec Loss 2.5110 LearningRate 0.0171 Epoch: 15 Global Step: 160340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:01,834-Speed 5464.90 samples/sec Loss 2.4978 LearningRate 0.0171 Epoch: 15 Global Step: 160350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:09,361-Speed 5442.04 samples/sec Loss 2.5523 LearningRate 0.0171 Epoch: 15 Global Step: 160360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:16,864-Speed 5459.77 samples/sec Loss 2.5253 LearningRate 0.0171 Epoch: 15 
Global Step: 160370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:24,279-Speed 5524.37 samples/sec Loss 2.5095 LearningRate 0.0171 Epoch: 15 Global Step: 160380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:31,769-Speed 5469.61 samples/sec Loss 2.4992 LearningRate 0.0171 Epoch: 15 Global Step: 160390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:39,216-Speed 5500.74 samples/sec Loss 2.5486 LearningRate 0.0171 Epoch: 15 Global Step: 160400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:46,693-Speed 5479.06 samples/sec Loss 2.5356 LearningRate 0.0171 Epoch: 15 Global Step: 160410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:16:54,238-Speed 5429.38 samples/sec Loss 2.4796 LearningRate 0.0170 Epoch: 15 Global Step: 160420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 07:17:01,792-Speed 5422.99 samples/sec Loss 2.5049 LearningRate 0.0170 Epoch: 15 Global Step: 160430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:17:09,269-Speed 5479.18 samples/sec Loss 2.5046 LearningRate 0.0170 Epoch: 15 Global Step: 160440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:17:16,734-Speed 5487.76 samples/sec Loss 2.5264 LearningRate 0.0170 Epoch: 15 Global Step: 160450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:17:24,325-Speed 5396.43 samples/sec Loss 2.5331 LearningRate 0.0170 Epoch: 15 Global Step: 160460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:17:31,835-Speed 5455.45 samples/sec Loss 2.5396 LearningRate 0.0170 Epoch: 15 Global Step: 160470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:17:39,441-Speed 5385.69 samples/sec Loss 2.4968 LearningRate 0.0170 Epoch: 15 Global Step: 160480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:17:46,996-Speed 5422.24 samples/sec Loss 2.5324 LearningRate 0.0170 Epoch: 15 Global Step: 160490 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-01-09 07:17:54,499-Speed 5459.92 samples/sec Loss 2.4951 LearningRate 0.0170 Epoch: 15 Global Step: 160500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:18:01,982-Speed 5474.63 samples/sec Loss 2.4606 LearningRate 0.0170 Epoch: 15 Global Step: 160510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:18:09,467-Speed 5472.52 samples/sec Loss 2.5144 LearningRate 0.0170 Epoch: 15 Global Step: 160520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:18:17,047-Speed 5404.76 samples/sec Loss 2.4892 LearningRate 0.0170 Epoch: 15 Global Step: 160530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:18:24,690-Speed 5359.37 samples/sec Loss 2.5157 LearningRate 0.0170 Epoch: 15 Global Step: 160540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:18:32,217-Speed 5443.10 samples/sec Loss 2.5137 LearningRate 0.0170 Epoch: 15 Global Step: 160550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:18:39,765-Speed 5427.23 samples/sec Loss 2.5310 LearningRate 0.0169 Epoch: 15 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:18:47,260-Speed 5465.72 samples/sec Loss 2.4695 LearningRate 0.0169 Epoch: 15 Global Step: 160570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:18:54,810-Speed 5425.59 samples/sec Loss 2.5029 LearningRate 0.0169 Epoch: 15 Global Step: 160580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:19:02,376-Speed 5414.80 samples/sec Loss 2.4934 LearningRate 0.0169 Epoch: 15 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:19:09,947-Speed 5410.58 samples/sec Loss 2.5044 LearningRate 0.0169 Epoch: 15 Global Step: 160600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:19:17,537-Speed 5397.10 samples/sec Loss 2.5122 LearningRate 0.0169 Epoch: 15 Global Step: 160610 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-01-09 07:19:25,070-Speed 5437.60 samples/sec Loss 2.4956 LearningRate 0.0169 Epoch: 15 Global Step: 160620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:19:32,600-Speed 5440.77 samples/sec Loss 2.4946 LearningRate 0.0169 Epoch: 15 Global Step: 160630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:19:40,094-Speed 5466.40 samples/sec Loss 2.5384 LearningRate 0.0169 Epoch: 15 Global Step: 160640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:19:47,570-Speed 5479.39 samples/sec Loss 2.5268 LearningRate 0.0169 Epoch: 15 Global Step: 160650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:19:55,076-Speed 5457.33 samples/sec Loss 2.5205 LearningRate 0.0169 Epoch: 15 Global Step: 160660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:20:02,529-Speed 5497.08 samples/sec Loss 2.5077 LearningRate 0.0169 Epoch: 15 Global Step: 160670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:20:10,199-Speed 5341.11 samples/sec Loss 2.4960 LearningRate 0.0169 Epoch: 15 Global Step: 160680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:20:17,713-Speed 5451.82 samples/sec Loss 2.4930 LearningRate 0.0168 Epoch: 15 Global Step: 160690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:20:25,243-Speed 5440.27 samples/sec Loss 2.4636 LearningRate 0.0168 Epoch: 15 Global Step: 160700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:20:32,709-Speed 5486.44 samples/sec Loss 2.5302 LearningRate 0.0168 Epoch: 15 Global Step: 160710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:20:40,272-Speed 5417.38 samples/sec Loss 2.5090 LearningRate 0.0168 Epoch: 15 Global Step: 160720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:20:47,766-Speed 5465.68 samples/sec Loss 2.5093 LearningRate 0.0168 Epoch: 15 Global Step: 160730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 
07:20:55,258-Speed 5468.08 samples/sec Loss 2.5115 LearningRate 0.0168 Epoch: 15 Global Step: 160740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:21:02,834-Speed 5407.24 samples/sec Loss 2.4925 LearningRate 0.0168 Epoch: 15 Global Step: 160750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:21:10,342-Speed 5456.46 samples/sec Loss 2.4835 LearningRate 0.0168 Epoch: 15 Global Step: 160760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:21:17,901-Speed 5419.48 samples/sec Loss 2.5025 LearningRate 0.0168 Epoch: 15 Global Step: 160770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:21:25,409-Speed 5456.04 samples/sec Loss 2.5362 LearningRate 0.0168 Epoch: 15 Global Step: 160780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:21:32,908-Speed 5462.14 samples/sec Loss 2.4484 LearningRate 0.0168 Epoch: 15 Global Step: 160790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:21:40,416-Speed 5456.91 samples/sec Loss 2.5030 LearningRate 0.0168 Epoch: 15 Global Step: 160800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:21:47,899-Speed 5474.07 samples/sec Loss 2.5094 LearningRate 0.0168 Epoch: 15 Global Step: 160810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:21:55,418-Speed 5448.37 samples/sec Loss 2.4731 LearningRate 0.0168 Epoch: 15 Global Step: 160820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:03,036-Speed 5377.87 samples/sec Loss 2.4856 LearningRate 0.0167 Epoch: 15 Global Step: 160830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:10,619-Speed 5402.74 samples/sec Loss 2.4557 LearningRate 0.0167 Epoch: 15 Global Step: 160840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:18,132-Speed 5452.48 samples/sec Loss 2.4952 LearningRate 0.0167 Epoch: 15 Global Step: 160850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:25,752-Speed 5375.58 
samples/sec Loss 2.5179 LearningRate 0.0167 Epoch: 15 Global Step: 160860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:33,339-Speed 5399.86 samples/sec Loss 2.4964 LearningRate 0.0167 Epoch: 15 Global Step: 160870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:40,830-Speed 5468.51 samples/sec Loss 2.5049 LearningRate 0.0167 Epoch: 15 Global Step: 160880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:22:48,304-Speed 5481.18 samples/sec Loss 2.5011 LearningRate 0.0167 Epoch: 15 Global Step: 160890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:22:55,738-Speed 5510.51 samples/sec Loss 2.4763 LearningRate 0.0167 Epoch: 15 Global Step: 160900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:03,254-Speed 5450.42 samples/sec Loss 2.4752 LearningRate 0.0167 Epoch: 15 Global Step: 160910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:10,831-Speed 5406.88 samples/sec Loss 2.4879 LearningRate 0.0167 Epoch: 15 Global Step: 160920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:18,484-Speed 5353.10 samples/sec Loss 2.4625 LearningRate 0.0167 Epoch: 15 Global Step: 160930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:26,035-Speed 5424.61 samples/sec Loss 2.4504 LearningRate 0.0167 Epoch: 15 Global Step: 160940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:33,530-Speed 5465.50 samples/sec Loss 2.4841 LearningRate 0.0167 Epoch: 15 Global Step: 160950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:41,216-Speed 5330.62 samples/sec Loss 2.4404 LearningRate 0.0167 Epoch: 15 Global Step: 160960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:48,706-Speed 5469.55 samples/sec Loss 2.4742 LearningRate 0.0166 Epoch: 15 Global Step: 160970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:23:56,382-Speed 5336.25 samples/sec Loss 2.4415 
LearningRate 0.0166 Epoch: 15 Global Step: 160980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:24:03,894-Speed 5453.16 samples/sec Loss 2.4719 LearningRate 0.0166 Epoch: 15 Global Step: 160990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:11,468-Speed 5409.54 samples/sec Loss 2.4544 LearningRate 0.0166 Epoch: 15 Global Step: 161000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:18,938-Speed 5483.90 samples/sec Loss 2.4796 LearningRate 0.0166 Epoch: 15 Global Step: 161010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:26,453-Speed 5451.17 samples/sec Loss 2.4472 LearningRate 0.0166 Epoch: 15 Global Step: 161020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:33,928-Speed 5480.23 samples/sec Loss 2.5242 LearningRate 0.0166 Epoch: 15 Global Step: 161030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:41,421-Speed 5466.77 samples/sec Loss 2.4572 LearningRate 0.0166 Epoch: 15 Global Step: 161040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:48,853-Speed 5512.22 samples/sec Loss 2.4709 LearningRate 0.0166 Epoch: 15 Global Step: 161050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:24:56,310-Speed 5493.81 samples/sec Loss 2.4966 LearningRate 0.0166 Epoch: 15 Global Step: 161060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:03,816-Speed 5457.28 samples/sec Loss 2.4739 LearningRate 0.0166 Epoch: 15 Global Step: 161070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:11,385-Speed 5412.39 samples/sec Loss 2.4455 LearningRate 0.0166 Epoch: 15 Global Step: 161080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:18,960-Speed 5408.60 samples/sec Loss 2.4426 LearningRate 0.0166 Epoch: 15 Global Step: 161090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-09 07:25:26,461-Speed 5461.12 samples/sec Loss 2.5267 LearningRate 0.0166 Epoch: 15 
Global Step: 161100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:33,954-Speed 5466.97 samples/sec Loss 2.5303 LearningRate 0.0165 Epoch: 15 Global Step: 161110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:41,489-Speed 5436.44 samples/sec Loss 2.4528 LearningRate 0.0165 Epoch: 15 Global Step: 161120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:49,140-Speed 5354.48 samples/sec Loss 2.4728 LearningRate 0.0165 Epoch: 15 Global Step: 161130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:25:56,730-Speed 5397.63 samples/sec Loss 2.5051 LearningRate 0.0165 Epoch: 15 Global Step: 161140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:26:04,268-Speed 5434.64 samples/sec Loss 2.4908 LearningRate 0.0165 Epoch: 15 Global Step: 161150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:26:11,758-Speed 5468.73 samples/sec Loss 2.4509 LearningRate 0.0165 Epoch: 15 Global Step: 161160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:26:19,262-Speed 5459.69 samples/sec Loss 2.5084 LearningRate 0.0165 Epoch: 15 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:26:26,749-Speed 5471.36 samples/sec Loss 2.4715 LearningRate 0.0165 Epoch: 15 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:26:34,315-Speed 5414.20 samples/sec Loss 2.4681 LearningRate 0.0165 Epoch: 15 Global Step: 161190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:26:41,838-Speed 5445.11 samples/sec Loss 2.4784 LearningRate 0.0165 Epoch: 15 Global Step: 161200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:26:49,413-Speed 5408.23 samples/sec Loss 2.4514 LearningRate 0.0165 Epoch: 15 Global Step: 161210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:26:56,971-Speed 5420.20 samples/sec Loss 2.5010 LearningRate 0.0165 Epoch: 15 Global Step: 161220 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-01-09 07:27:04,406-Speed 5509.89 samples/sec Loss 2.4815 LearningRate 0.0165 Epoch: 15 Global Step: 161230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:27:11,873-Speed 5485.65 samples/sec Loss 2.4739 LearningRate 0.0165 Epoch: 15 Global Step: 161240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:27:19,385-Speed 5453.42 samples/sec Loss 2.4869 LearningRate 0.0164 Epoch: 15 Global Step: 161250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:27:26,860-Speed 5481.17 samples/sec Loss 2.4823 LearningRate 0.0164 Epoch: 15 Global Step: 161260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:27:34,335-Speed 5479.62 samples/sec Loss 2.4661 LearningRate 0.0164 Epoch: 15 Global Step: 161270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:27:41,764-Speed 5514.29 samples/sec Loss 2.4991 LearningRate 0.0164 Epoch: 15 Global Step: 161280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:27:49,374-Speed 5383.41 samples/sec Loss 2.4526 LearningRate 0.0164 Epoch: 15 Global Step: 161290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:27:56,811-Speed 5508.06 samples/sec Loss 2.5219 LearningRate 0.0164 Epoch: 15 Global Step: 161300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:04,381-Speed 5411.78 samples/sec Loss 2.4406 LearningRate 0.0164 Epoch: 15 Global Step: 161310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:11,877-Speed 5464.34 samples/sec Loss 2.4652 LearningRate 0.0164 Epoch: 15 Global Step: 161320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:19,430-Speed 5423.48 samples/sec Loss 2.4511 LearningRate 0.0164 Epoch: 15 Global Step: 161330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:26,954-Speed 5445.11 samples/sec Loss 2.4804 LearningRate 0.0164 Epoch: 15 Global Step: 161340 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-09 07:28:34,404-Speed 5498.58 samples/sec Loss 2.4858 LearningRate 0.0164 Epoch: 15 Global Step: 161350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:41,819-Speed 5524.51 samples/sec Loss 2.4822 LearningRate 0.0164 Epoch: 15 Global Step: 161360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:49,318-Speed 5462.41 samples/sec Loss 2.4642 LearningRate 0.0164 Epoch: 15 Global Step: 161370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-09 07:28:56,776-Speed 5493.31 samples/sec Loss 2.4794 LearningRate 0.0164 Epoch: 15 Global Step: 161380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:29:04,296-Speed 5448.00 samples/sec Loss 2.4741 LearningRate 0.0163 Epoch: 15 Global Step: 161390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:29:11,822-Speed 5442.61 samples/sec Loss 2.4569 LearningRate 0.0163 Epoch: 15 Global Step: 161400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:29:19,444-Speed 5374.38 samples/sec Loss 2.4528 LearningRate 0.0163 Epoch: 15 Global Step: 161410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:29:26,976-Speed 5439.63 samples/sec Loss 2.4646 LearningRate 0.0163 Epoch: 15 Global Step: 161420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:29:34,480-Speed 5458.73 samples/sec Loss 2.4846 LearningRate 0.0163 Epoch: 15 Global Step: 161430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:29:42,031-Speed 5425.00 samples/sec Loss 2.4528 LearningRate 0.0163 Epoch: 15 Global Step: 161440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:29:49,546-Speed 5451.07 samples/sec Loss 2.4604 LearningRate 0.0163 Epoch: 15 Global Step: 161450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:29:57,089-Speed 5431.29 samples/sec Loss 2.4759 LearningRate 0.0163 Epoch: 15 Global Step: 161460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 
07:30:04,689-Speed 5389.85 samples/sec Loss 2.4412 LearningRate 0.0163 Epoch: 15 Global Step: 161470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:30:12,311-Speed 5375.17 samples/sec Loss 2.4726 LearningRate 0.0163 Epoch: 15 Global Step: 161480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:30:19,860-Speed 5425.89 samples/sec Loss 2.4797 LearningRate 0.0163 Epoch: 15 Global Step: 161490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:30:27,329-Speed 5484.72 samples/sec Loss 2.4447 LearningRate 0.0163 Epoch: 15 Global Step: 161500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:30:34,853-Speed 5445.05 samples/sec Loss 2.4327 LearningRate 0.0163 Epoch: 15 Global Step: 161510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:30:42,415-Speed 5417.15 samples/sec Loss 2.4449 LearningRate 0.0163 Epoch: 15 Global Step: 161520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-09 07:30:49,936-Speed 5446.65 samples/sec Loss 2.4369 LearningRate 0.0162 Epoch: 15 Global Step: 161530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-09 07:30:57,430-Speed 5466.72 samples/sec Loss 2.4805 LearningRate 0.0162 Epoch: 15 Global Step: 161540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:31:04,938-Speed 5456.07 samples/sec Loss 2.4674 LearningRate 0.0162 Epoch: 15 Global Step: 161550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:31:12,491-Speed 5424.03 samples/sec Loss 2.4505 LearningRate 0.0162 Epoch: 15 Global Step: 161560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:31:20,052-Speed 5417.64 samples/sec Loss 2.4550 LearningRate 0.0162 Epoch: 15 Global Step: 161570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:31:27,511-Speed 5491.97 samples/sec Loss 2.4703 LearningRate 0.0162 Epoch: 15 Global Step: 161580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:31:35,046-Speed 5436.85 
samples/sec Loss 2.4590 LearningRate 0.0162 Epoch: 15 Global Step: 161590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:31:42,615-Speed 5412.38 samples/sec Loss 2.4680 LearningRate 0.0162 Epoch: 15 Global Step: 161600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:31:50,095-Speed 5476.62 samples/sec Loss 2.4775 LearningRate 0.0162 Epoch: 15 Global Step: 161610 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:31:57,600-Speed 5458.15 samples/sec Loss 2.4687 LearningRate 0.0162 Epoch: 15 Global Step: 161620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:32:05,191-Speed 5397.07 samples/sec Loss 2.4488 LearningRate 0.0162 Epoch: 15 Global Step: 161630 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:32:12,752-Speed 5418.07 samples/sec Loss 2.4410 LearningRate 0.0162 Epoch: 15 Global Step: 161640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:32:20,327-Speed 5407.31 samples/sec Loss 2.4528 LearningRate 0.0162 Epoch: 15 Global Step: 161650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:32:27,799-Speed 5482.97 samples/sec Loss 2.4253 LearningRate 0.0162 Epoch: 15 Global Step: 161660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:32:35,443-Speed 5358.75 samples/sec Loss 2.4483 LearningRate 0.0161 Epoch: 15 Global Step: 161670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:32:43,108-Speed 5344.52 samples/sec Loss 2.4494 LearningRate 0.0161 Epoch: 15 Global Step: 161680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:32:50,804-Speed 5323.36 samples/sec Loss 2.4677 LearningRate 0.0161 Epoch: 15 Global Step: 161690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:32:58,320-Speed 5449.98 samples/sec Loss 2.4431 LearningRate 0.0161 Epoch: 15 Global Step: 161700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:05,815-Speed 5465.92 samples/sec Loss 2.4559 
LearningRate 0.0161 Epoch: 15 Global Step: 161710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:13,253-Speed 5507.41 samples/sec Loss 2.4450 LearningRate 0.0161 Epoch: 15 Global Step: 161720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:20,841-Speed 5398.37 samples/sec Loss 2.4665 LearningRate 0.0161 Epoch: 15 Global Step: 161730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:28,390-Speed 5426.88 samples/sec Loss 2.4004 LearningRate 0.0161 Epoch: 15 Global Step: 161740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:35,987-Speed 5392.68 samples/sec Loss 2.4143 LearningRate 0.0161 Epoch: 15 Global Step: 161750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:43,514-Speed 5442.74 samples/sec Loss 2.4499 LearningRate 0.0161 Epoch: 15 Global Step: 161760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:51,013-Speed 5462.47 samples/sec Loss 2.4439 LearningRate 0.0161 Epoch: 15 Global Step: 161770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:33:58,557-Speed 5430.28 samples/sec Loss 2.4473 LearningRate 0.0161 Epoch: 15 Global Step: 161780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:34:06,137-Speed 5404.55 samples/sec Loss 2.3968 LearningRate 0.0161 Epoch: 15 Global Step: 161790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:34:13,598-Speed 5490.95 samples/sec Loss 2.4481 LearningRate 0.0161 Epoch: 15 Global Step: 161800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:34:21,091-Speed 5467.00 samples/sec Loss 2.4414 LearningRate 0.0161 Epoch: 15 Global Step: 161810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:34:28,614-Speed 5445.83 samples/sec Loss 2.4412 LearningRate 0.0160 Epoch: 15 Global Step: 161820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:34:36,124-Speed 5454.64 samples/sec Loss 2.4356 LearningRate 0.0160 Epoch: 15 
Global Step: 161830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:34:43,731-Speed 5385.35 samples/sec Loss 2.4591 LearningRate 0.0160 Epoch: 15 Global Step: 161840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:34:51,249-Speed 5448.70 samples/sec Loss 2.4281 LearningRate 0.0160 Epoch: 15 Global Step: 161850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:34:58,888-Speed 5362.54 samples/sec Loss 2.4179 LearningRate 0.0160 Epoch: 15 Global Step: 161860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:35:06,575-Speed 5329.81 samples/sec Loss 2.4465 LearningRate 0.0160 Epoch: 15 Global Step: 161870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:35:14,160-Speed 5400.55 samples/sec Loss 2.4244 LearningRate 0.0160 Epoch: 15 Global Step: 161880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:35:21,726-Speed 5414.76 samples/sec Loss 2.4059 LearningRate 0.0160 Epoch: 15 Global Step: 161890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:35:29,304-Speed 5405.58 samples/sec Loss 2.4717 LearningRate 0.0160 Epoch: 15 Global Step: 161900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:35:36,850-Speed 5428.89 samples/sec Loss 2.4208 LearningRate 0.0160 Epoch: 15 Global Step: 161910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:35:44,475-Speed 5372.31 samples/sec Loss 2.4335 LearningRate 0.0160 Epoch: 15 Global Step: 161920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:35:52,007-Speed 5438.89 samples/sec Loss 2.4470 LearningRate 0.0160 Epoch: 15 Global Step: 161930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:35:59,472-Speed 5487.84 samples/sec Loss 2.4295 LearningRate 0.0160 Epoch: 15 Global Step: 161940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:36:06,992-Speed 5447.98 samples/sec Loss 2.4362 LearningRate 0.0160 Epoch: 15 Global Step: 161950 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-01-09 07:36:14,524-Speed 5438.75 samples/sec Loss 2.4228 LearningRate 0.0159 Epoch: 15 Global Step: 161960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:36:22,102-Speed 5405.97 samples/sec Loss 2.4450 LearningRate 0.0159 Epoch: 15 Global Step: 161970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:36:29,724-Speed 5374.34 samples/sec Loss 2.4210 LearningRate 0.0159 Epoch: 15 Global Step: 161980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:36:37,358-Speed 5366.70 samples/sec Loss 2.4492 LearningRate 0.0159 Epoch: 15 Global Step: 161990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:36:45,014-Speed 5350.45 samples/sec Loss 2.3755 LearningRate 0.0159 Epoch: 15 Global Step: 162000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:37:28,674-[lfw][162000]XNorm: 22.860283 Training: 2022-01-09 07:37:28,674-[lfw][162000]Accuracy-Flip: 0.99833+-0.00197 Training: 2022-01-09 07:37:28,675-[lfw][162000]Accuracy-Highest: 0.99833 Training: 2022-01-09 07:38:19,599-[cfp_fp][162000]XNorm: 21.701475 Training: 2022-01-09 07:38:19,600-[cfp_fp][162000]Accuracy-Flip: 0.99286+-0.00409 Training: 2022-01-09 07:38:19,601-[cfp_fp][162000]Accuracy-Highest: 0.99371 Training: 2022-01-09 07:39:03,439-[agedb_30][162000]XNorm: 23.361640 Training: 2022-01-09 07:39:03,440-[agedb_30][162000]Accuracy-Flip: 0.98333+-0.00695 Training: 2022-01-09 07:39:03,440-[agedb_30][162000]Accuracy-Highest: 0.98333 Training: 2022-01-09 07:39:11,145-Speed 280.30 samples/sec Loss 2.4089 LearningRate 0.0159 Epoch: 15 Global Step: 162010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:39:18,725-Speed 5404.74 samples/sec Loss 2.4423 LearningRate 0.0159 Epoch: 15 Global Step: 162020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:39:26,220-Speed 5465.44 samples/sec Loss 2.4469 LearningRate 0.0159 Epoch: 15 Global Step: 162030 Fp16 Grad Scale: 32768 
Required: 10 hours Training: 2022-01-09 07:39:33,673-Speed 5496.32 samples/sec Loss 2.4311 LearningRate 0.0159 Epoch: 15 Global Step: 162040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:39:41,194-Speed 5446.97 samples/sec Loss 2.3981 LearningRate 0.0159 Epoch: 15 Global Step: 162050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:39:48,684-Speed 5469.08 samples/sec Loss 2.4072 LearningRate 0.0159 Epoch: 15 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:39:56,241-Speed 5420.76 samples/sec Loss 2.4085 LearningRate 0.0159 Epoch: 15 Global Step: 162070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:40:03,721-Speed 5477.00 samples/sec Loss 2.4214 LearningRate 0.0159 Epoch: 15 Global Step: 162080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:11,217-Speed 5465.32 samples/sec Loss 2.4149 LearningRate 0.0159 Epoch: 15 Global Step: 162090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:18,793-Speed 5407.05 samples/sec Loss 2.4055 LearningRate 0.0158 Epoch: 15 Global Step: 162100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:26,312-Speed 5448.23 samples/sec Loss 2.4283 LearningRate 0.0158 Epoch: 15 Global Step: 162110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:33,794-Speed 5475.31 samples/sec Loss 2.4221 LearningRate 0.0158 Epoch: 15 Global Step: 162120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:41,267-Speed 5481.70 samples/sec Loss 2.4253 LearningRate 0.0158 Epoch: 15 Global Step: 162130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:48,751-Speed 5474.06 samples/sec Loss 2.4479 LearningRate 0.0158 Epoch: 15 Global Step: 162140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:40:56,248-Speed 5464.03 samples/sec Loss 2.3833 LearningRate 0.0158 Epoch: 15 Global Step: 162150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 
2022-01-09 07:41:03,794-Speed 5428.98 samples/sec Loss 2.3920 LearningRate 0.0158 Epoch: 15 Global Step: 162160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:41:11,486-Speed 5325.98 samples/sec Loss 2.3871 LearningRate 0.0158 Epoch: 15 Global Step: 162170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:41:19,070-Speed 5401.11 samples/sec Loss 2.4197 LearningRate 0.0158 Epoch: 15 Global Step: 162180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:41:26,640-Speed 5411.63 samples/sec Loss 2.4068 LearningRate 0.0158 Epoch: 15 Global Step: 162190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:41:34,203-Speed 5416.80 samples/sec Loss 2.4266 LearningRate 0.0158 Epoch: 15 Global Step: 162200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:41:41,915-Speed 5312.21 samples/sec Loss 2.4063 LearningRate 0.0158 Epoch: 15 Global Step: 162210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:41:49,559-Speed 5359.07 samples/sec Loss 2.4206 LearningRate 0.0158 Epoch: 15 Global Step: 162220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:41:57,169-Speed 5382.67 samples/sec Loss 2.4484 LearningRate 0.0158 Epoch: 15 Global Step: 162230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:42:04,693-Speed 5445.18 samples/sec Loss 2.4205 LearningRate 0.0157 Epoch: 15 Global Step: 162240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:42:12,193-Speed 5461.94 samples/sec Loss 2.3922 LearningRate 0.0157 Epoch: 15 Global Step: 162250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:42:19,759-Speed 5414.31 samples/sec Loss 2.4415 LearningRate 0.0157 Epoch: 15 Global Step: 162260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:42:27,251-Speed 5467.75 samples/sec Loss 2.3855 LearningRate 0.0157 Epoch: 15 Global Step: 162270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 07:42:34,763-Speed 
5453.88 samples/sec Loss 2.4254 LearningRate 0.0157 Epoch: 15 Global Step: 162280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:42:42,286-Speed 5445.70 samples/sec Loss 2.3889 LearningRate 0.0157 Epoch: 15 Global Step: 162290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:42:49,861-Speed 5407.19 samples/sec Loss 2.4149 LearningRate 0.0157 Epoch: 15 Global Step: 162300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:42:57,326-Speed 5487.85 samples/sec Loss 2.4434 LearningRate 0.0157 Epoch: 15 Global Step: 162310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:04,865-Speed 5433.91 samples/sec Loss 2.3915 LearningRate 0.0157 Epoch: 15 Global Step: 162320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:12,364-Speed 5463.26 samples/sec Loss 2.4476 LearningRate 0.0157 Epoch: 15 Global Step: 162330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:19,826-Speed 5489.62 samples/sec Loss 2.4205 LearningRate 0.0157 Epoch: 15 Global Step: 162340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:27,268-Speed 5504.58 samples/sec Loss 2.4334 LearningRate 0.0157 Epoch: 15 Global Step: 162350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:34,791-Speed 5446.09 samples/sec Loss 2.4230 LearningRate 0.0157 Epoch: 15 Global Step: 162360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:42,376-Speed 5400.77 samples/sec Loss 2.4225 LearningRate 0.0157 Epoch: 15 Global Step: 162370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:49,885-Speed 5455.73 samples/sec Loss 2.3964 LearningRate 0.0157 Epoch: 15 Global Step: 162380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:43:57,316-Speed 5512.31 samples/sec Loss 2.3939 LearningRate 0.0156 Epoch: 15 Global Step: 162390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:04,766-Speed 5498.85 samples/sec Loss 2.3978 
LearningRate 0.0156 Epoch: 15 Global Step: 162400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:12,361-Speed 5393.59 samples/sec Loss 2.4029 LearningRate 0.0156 Epoch: 15 Global Step: 162410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:19,938-Speed 5406.65 samples/sec Loss 2.3987 LearningRate 0.0156 Epoch: 15 Global Step: 162420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:27,619-Speed 5333.18 samples/sec Loss 2.4412 LearningRate 0.0156 Epoch: 15 Global Step: 162430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:35,078-Speed 5492.26 samples/sec Loss 2.3898 LearningRate 0.0156 Epoch: 15 Global Step: 162440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:42,489-Speed 5528.00 samples/sec Loss 2.4135 LearningRate 0.0156 Epoch: 15 Global Step: 162450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:50,055-Speed 5414.74 samples/sec Loss 2.4024 LearningRate 0.0156 Epoch: 15 Global Step: 162460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:44:57,607-Speed 5424.25 samples/sec Loss 2.3969 LearningRate 0.0156 Epoch: 15 Global Step: 162470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:45:05,201-Speed 5394.29 samples/sec Loss 2.4020 LearningRate 0.0156 Epoch: 15 Global Step: 162480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:12,678-Speed 5478.76 samples/sec Loss 2.3910 LearningRate 0.0156 Epoch: 15 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:20,175-Speed 5464.61 samples/sec Loss 2.4142 LearningRate 0.0156 Epoch: 15 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:27,721-Speed 5428.45 samples/sec Loss 2.4122 LearningRate 0.0156 Epoch: 15 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:35,265-Speed 5429.77 samples/sec Loss 2.3944 LearningRate 0.0156 Epoch: 15 
Global Step: 162520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:42,868-Speed 5388.68 samples/sec Loss 2.4050 LearningRate 0.0155 Epoch: 15 Global Step: 162530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:50,394-Speed 5443.77 samples/sec Loss 2.4092 LearningRate 0.0155 Epoch: 15 Global Step: 162540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:45:57,837-Speed 5503.46 samples/sec Loss 2.4073 LearningRate 0.0155 Epoch: 15 Global Step: 162550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:46:05,289-Speed 5496.83 samples/sec Loss 2.4266 LearningRate 0.0155 Epoch: 15 Global Step: 162560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:46:12,933-Speed 5359.38 samples/sec Loss 2.3968 LearningRate 0.0155 Epoch: 15 Global Step: 162570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:46:20,424-Speed 5468.88 samples/sec Loss 2.3781 LearningRate 0.0155 Epoch: 15 Global Step: 162580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:46:27,984-Speed 5418.04 samples/sec Loss 2.3754 LearningRate 0.0155 Epoch: 15 Global Step: 162590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:46:35,462-Speed 5477.96 samples/sec Loss 2.4121 LearningRate 0.0155 Epoch: 15 Global Step: 162600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:46:42,943-Speed 5475.89 samples/sec Loss 2.3932 LearningRate 0.0155 Epoch: 15 Global Step: 162610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:46:50,421-Speed 5478.45 samples/sec Loss 2.4383 LearningRate 0.0155 Epoch: 15 Global Step: 162620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:46:57,915-Speed 5466.66 samples/sec Loss 2.4526 LearningRate 0.0155 Epoch: 15 Global Step: 162630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:47:05,504-Speed 5397.71 samples/sec Loss 2.4040 LearningRate 0.0155 Epoch: 15 Global Step: 162640 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-01-09 07:47:12,998-Speed 5466.91 samples/sec Loss 2.4216 LearningRate 0.0155 Epoch: 15 Global Step: 162650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:47:20,444-Speed 5501.48 samples/sec Loss 2.3758 LearningRate 0.0155 Epoch: 15 Global Step: 162660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:47:27,903-Speed 5492.09 samples/sec Loss 2.3581 LearningRate 0.0155 Epoch: 15 Global Step: 162670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:47:35,391-Speed 5470.93 samples/sec Loss 2.3715 LearningRate 0.0154 Epoch: 15 Global Step: 162680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:47:42,979-Speed 5398.79 samples/sec Loss 2.4405 LearningRate 0.0154 Epoch: 15 Global Step: 162690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:47:50,488-Speed 5455.02 samples/sec Loss 2.3958 LearningRate 0.0154 Epoch: 15 Global Step: 162700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:47:58,094-Speed 5386.25 samples/sec Loss 2.4170 LearningRate 0.0154 Epoch: 15 Global Step: 162710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:48:05,717-Speed 5374.28 samples/sec Loss 2.3892 LearningRate 0.0154 Epoch: 15 Global Step: 162720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:48:13,305-Speed 5398.65 samples/sec Loss 2.4140 LearningRate 0.0154 Epoch: 15 Global Step: 162730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:48:20,818-Speed 5452.16 samples/sec Loss 2.3638 LearningRate 0.0154 Epoch: 15 Global Step: 162740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:48:28,367-Speed 5426.50 samples/sec Loss 2.3830 LearningRate 0.0154 Epoch: 15 Global Step: 162750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:48:35,835-Speed 5485.95 samples/sec Loss 2.4039 LearningRate 0.0154 Epoch: 15 Global Step: 162760 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-09 07:48:43,356-Speed 5446.92 samples/sec Loss 2.3871 LearningRate 0.0154 Epoch: 15 Global Step: 162770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:48:50,831-Speed 5479.54 samples/sec Loss 2.3947 LearningRate 0.0154 Epoch: 15 Global Step: 162780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:48:58,329-Speed 5463.62 samples/sec Loss 2.3612 LearningRate 0.0154 Epoch: 15 Global Step: 162790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:05,796-Speed 5486.08 samples/sec Loss 2.4042 LearningRate 0.0154 Epoch: 15 Global Step: 162800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:13,448-Speed 5354.21 samples/sec Loss 2.3906 LearningRate 0.0154 Epoch: 15 Global Step: 162810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:21,087-Speed 5362.39 samples/sec Loss 2.3910 LearningRate 0.0153 Epoch: 15 Global Step: 162820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:28,644-Speed 5421.54 samples/sec Loss 2.3674 LearningRate 0.0153 Epoch: 15 Global Step: 162830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:36,335-Speed 5326.14 samples/sec Loss 2.3930 LearningRate 0.0153 Epoch: 15 Global Step: 162840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:43,837-Speed 5460.79 samples/sec Loss 2.3973 LearningRate 0.0153 Epoch: 15 Global Step: 162850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:51,417-Speed 5404.33 samples/sec Loss 2.3860 LearningRate 0.0153 Epoch: 15 Global Step: 162860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:49:59,170-Speed 5283.85 samples/sec Loss 2.4045 LearningRate 0.0153 Epoch: 15 Global Step: 162870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:50:06,804-Speed 5366.13 samples/sec Loss 2.3957 LearningRate 0.0153 Epoch: 15 Global Step: 162880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 
07:50:14,290-Speed 5471.99 samples/sec Loss 2.3823 LearningRate 0.0153 Epoch: 15 Global Step: 162890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:50:21,728-Speed 5507.57 samples/sec Loss 2.3701 LearningRate 0.0153 Epoch: 15 Global Step: 162900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:50:29,209-Speed 5476.31 samples/sec Loss 2.3761 LearningRate 0.0153 Epoch: 15 Global Step: 162910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:50:36,765-Speed 5421.45 samples/sec Loss 2.3953 LearningRate 0.0153 Epoch: 15 Global Step: 162920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:50:44,277-Speed 5453.08 samples/sec Loss 2.3774 LearningRate 0.0153 Epoch: 15 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:50:51,789-Speed 5453.33 samples/sec Loss 2.3723 LearningRate 0.0153 Epoch: 15 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:50:59,327-Speed 5434.79 samples/sec Loss 2.3916 LearningRate 0.0153 Epoch: 15 Global Step: 162950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:51:06,915-Speed 5398.44 samples/sec Loss 2.3463 LearningRate 0.0153 Epoch: 15 Global Step: 162960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:51:14,429-Speed 5451.81 samples/sec Loss 2.3706 LearningRate 0.0152 Epoch: 15 Global Step: 162970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:51:22,041-Speed 5382.00 samples/sec Loss 2.3899 LearningRate 0.0152 Epoch: 15 Global Step: 162980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:51:29,506-Speed 5487.19 samples/sec Loss 2.3624 LearningRate 0.0152 Epoch: 15 Global Step: 162990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:51:37,044-Speed 5434.73 samples/sec Loss 2.3135 LearningRate 0.0152 Epoch: 15 Global Step: 163000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:51:44,522-Speed 5478.49 
samples/sec Loss 2.3674 LearningRate 0.0152 Epoch: 15 Global Step: 163010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:51:52,010-Speed 5470.08 samples/sec Loss 2.3620 LearningRate 0.0152 Epoch: 15 Global Step: 163020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:51:59,566-Speed 5422.13 samples/sec Loss 2.3911 LearningRate 0.0152 Epoch: 15 Global Step: 163030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:52:07,030-Speed 5487.98 samples/sec Loss 2.3611 LearningRate 0.0152 Epoch: 15 Global Step: 163040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:52:14,680-Speed 5354.93 samples/sec Loss 2.3467 LearningRate 0.0152 Epoch: 15 Global Step: 163050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:52:22,252-Speed 5410.00 samples/sec Loss 2.4262 LearningRate 0.0152 Epoch: 15 Global Step: 163060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:52:29,766-Speed 5452.28 samples/sec Loss 2.3608 LearningRate 0.0152 Epoch: 15 Global Step: 163070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:52:37,366-Speed 5390.17 samples/sec Loss 2.3484 LearningRate 0.0152 Epoch: 15 Global Step: 163080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:52:44,872-Speed 5457.93 samples/sec Loss 2.3593 LearningRate 0.0152 Epoch: 15 Global Step: 163090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:52:52,594-Speed 5305.00 samples/sec Loss 2.3327 LearningRate 0.0152 Epoch: 15 Global Step: 163100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:00,258-Speed 5344.87 samples/sec Loss 2.3665 LearningRate 0.0151 Epoch: 15 Global Step: 163110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:07,876-Speed 5377.52 samples/sec Loss 2.3539 LearningRate 0.0151 Epoch: 15 Global Step: 163120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:15,480-Speed 5388.01 samples/sec Loss 2.3879 
LearningRate 0.0151 Epoch: 15 Global Step: 163130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:23,166-Speed 5330.10 samples/sec Loss 2.3470 LearningRate 0.0151 Epoch: 15 Global Step: 163140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:30,729-Speed 5416.40 samples/sec Loss 2.3628 LearningRate 0.0151 Epoch: 15 Global Step: 163150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:38,257-Speed 5441.96 samples/sec Loss 2.3838 LearningRate 0.0151 Epoch: 15 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:45,816-Speed 5419.63 samples/sec Loss 2.3760 LearningRate 0.0151 Epoch: 15 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:53:53,378-Speed 5416.96 samples/sec Loss 2.3960 LearningRate 0.0151 Epoch: 15 Global Step: 163180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:54:01,027-Speed 5355.71 samples/sec Loss 2.3905 LearningRate 0.0151 Epoch: 15 Global Step: 163190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:54:08,601-Speed 5409.25 samples/sec Loss 2.3587 LearningRate 0.0151 Epoch: 15 Global Step: 163200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:54:16,255-Speed 5352.02 samples/sec Loss 2.3985 LearningRate 0.0151 Epoch: 15 Global Step: 163210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:54:23,817-Speed 5417.07 samples/sec Loss 2.3527 LearningRate 0.0151 Epoch: 15 Global Step: 163220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:54:31,276-Speed 5491.74 samples/sec Loss 2.3498 LearningRate 0.0151 Epoch: 15 Global Step: 163230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:54:38,782-Speed 5457.84 samples/sec Loss 2.3570 LearningRate 0.0151 Epoch: 15 Global Step: 163240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:54:46,268-Speed 5472.62 samples/sec Loss 2.3485 LearningRate 0.0151 Epoch: 15 
Global Step: 163250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:54:53,793-Speed 5443.49 samples/sec Loss 2.3677 LearningRate 0.0150 Epoch: 15 Global Step: 163260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:01,375-Speed 5403.01 samples/sec Loss 2.3802 LearningRate 0.0150 Epoch: 15 Global Step: 163270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:08,896-Speed 5446.59 samples/sec Loss 2.3202 LearningRate 0.0150 Epoch: 15 Global Step: 163280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:16,434-Speed 5435.20 samples/sec Loss 2.3533 LearningRate 0.0150 Epoch: 15 Global Step: 163290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:23,962-Speed 5441.19 samples/sec Loss 2.3775 LearningRate 0.0150 Epoch: 15 Global Step: 163300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:31,478-Speed 5450.46 samples/sec Loss 2.3203 LearningRate 0.0150 Epoch: 15 Global Step: 163310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:38,967-Speed 5469.74 samples/sec Loss 2.3474 LearningRate 0.0150 Epoch: 15 Global Step: 163320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:55:46,473-Speed 5458.30 samples/sec Loss 2.3576 LearningRate 0.0150 Epoch: 15 Global Step: 163330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:55:53,902-Speed 5513.57 samples/sec Loss 2.3698 LearningRate 0.0150 Epoch: 15 Global Step: 163340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:01,390-Speed 5471.10 samples/sec Loss 2.3188 LearningRate 0.0150 Epoch: 15 Global Step: 163350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:08,889-Speed 5462.69 samples/sec Loss 2.3276 LearningRate 0.0150 Epoch: 15 Global Step: 163360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:16,396-Speed 5457.24 samples/sec Loss 2.3583 LearningRate 0.0150 Epoch: 15 Global Step: 163370 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:23,944-Speed 5427.25 samples/sec Loss 2.2903 LearningRate 0.0150 Epoch: 15 Global Step: 163380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:31,522-Speed 5405.71 samples/sec Loss 2.3518 LearningRate 0.0150 Epoch: 15 Global Step: 163390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:39,059-Speed 5434.66 samples/sec Loss 2.3628 LearningRate 0.0150 Epoch: 15 Global Step: 163400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:46,574-Speed 5451.60 samples/sec Loss 2.3608 LearningRate 0.0149 Epoch: 15 Global Step: 163410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:56:54,047-Speed 5481.34 samples/sec Loss 2.3604 LearningRate 0.0149 Epoch: 15 Global Step: 163420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:57:01,566-Speed 5448.04 samples/sec Loss 2.3728 LearningRate 0.0149 Epoch: 15 Global Step: 163430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:57:09,067-Speed 5461.89 samples/sec Loss 2.3706 LearningRate 0.0149 Epoch: 15 Global Step: 163440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:57:16,545-Speed 5477.83 samples/sec Loss 2.3397 LearningRate 0.0149 Epoch: 15 Global Step: 163450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:57:24,028-Speed 5474.78 samples/sec Loss 2.3610 LearningRate 0.0149 Epoch: 15 Global Step: 163460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:57:31,455-Speed 5515.27 samples/sec Loss 2.3479 LearningRate 0.0149 Epoch: 15 Global Step: 163470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:57:39,007-Speed 5424.66 samples/sec Loss 2.3380 LearningRate 0.0149 Epoch: 15 Global Step: 163480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:57:46,556-Speed 5426.23 samples/sec Loss 2.3367 LearningRate 0.0149 Epoch: 15 Global Step: 163490 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-09 07:57:54,019-Speed 5489.26 samples/sec Loss 2.3597 LearningRate 0.0149 Epoch: 15 Global Step: 163500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:58:01,583-Speed 5415.85 samples/sec Loss 2.3567 LearningRate 0.0149 Epoch: 15 Global Step: 163510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:58:09,060-Speed 5478.68 samples/sec Loss 2.3387 LearningRate 0.0149 Epoch: 15 Global Step: 163520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:58:16,666-Speed 5386.13 samples/sec Loss 2.3412 LearningRate 0.0149 Epoch: 15 Global Step: 163530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:58:24,170-Speed 5459.58 samples/sec Loss 2.3473 LearningRate 0.0149 Epoch: 15 Global Step: 163540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:58:31,738-Speed 5412.71 samples/sec Loss 2.3214 LearningRate 0.0148 Epoch: 15 Global Step: 163550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:58:39,270-Speed 5438.18 samples/sec Loss 2.3682 LearningRate 0.0148 Epoch: 15 Global Step: 163560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:58:46,759-Speed 5470.01 samples/sec Loss 2.3233 LearningRate 0.0148 Epoch: 15 Global Step: 163570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:58:54,234-Speed 5481.30 samples/sec Loss 2.3385 LearningRate 0.0148 Epoch: 15 Global Step: 163580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:59:01,732-Speed 5462.96 samples/sec Loss 2.3372 LearningRate 0.0148 Epoch: 15 Global Step: 163590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:59:09,302-Speed 5411.05 samples/sec Loss 2.3526 LearningRate 0.0148 Epoch: 15 Global Step: 163600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 07:59:16,851-Speed 5426.79 samples/sec Loss 2.3310 LearningRate 0.0148 Epoch: 15 Global Step: 163610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 
07:59:24,382-Speed 5439.72 samples/sec Loss 2.3647 LearningRate 0.0148 Epoch: 15 Global Step: 163620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:59:31,841-Speed 5492.14 samples/sec Loss 2.3493 LearningRate 0.0148 Epoch: 15 Global Step: 163630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:59:39,303-Speed 5489.46 samples/sec Loss 2.3673 LearningRate 0.0148 Epoch: 15 Global Step: 163640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:59:46,771-Speed 5486.02 samples/sec Loss 2.3880 LearningRate 0.0148 Epoch: 15 Global Step: 163650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 07:59:54,240-Speed 5484.74 samples/sec Loss 2.3629 LearningRate 0.0148 Epoch: 15 Global Step: 163660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:00:01,696-Speed 5494.20 samples/sec Loss 2.3572 LearningRate 0.0148 Epoch: 15 Global Step: 163670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:09,189-Speed 5467.08 samples/sec Loss 2.3581 LearningRate 0.0148 Epoch: 15 Global Step: 163680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:16,724-Speed 5436.65 samples/sec Loss 2.3915 LearningRate 0.0148 Epoch: 15 Global Step: 163690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:24,218-Speed 5466.40 samples/sec Loss 2.3027 LearningRate 0.0147 Epoch: 15 Global Step: 163700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:31,707-Speed 5470.08 samples/sec Loss 2.3387 LearningRate 0.0147 Epoch: 15 Global Step: 163710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:39,280-Speed 5409.25 samples/sec Loss 2.3212 LearningRate 0.0147 Epoch: 15 Global Step: 163720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:46,777-Speed 5463.84 samples/sec Loss 2.3333 LearningRate 0.0147 Epoch: 15 Global Step: 163730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:00:54,312-Speed 5437.30 
samples/sec Loss 2.3417 LearningRate 0.0147 Epoch: 15 Global Step: 163740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:01,885-Speed 5408.95 samples/sec Loss 2.3181 LearningRate 0.0147 Epoch: 15 Global Step: 163750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:09,412-Speed 5442.67 samples/sec Loss 2.3527 LearningRate 0.0147 Epoch: 15 Global Step: 163760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:16,994-Speed 5403.18 samples/sec Loss 2.3167 LearningRate 0.0147 Epoch: 15 Global Step: 163770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:01:24,470-Speed 5479.54 samples/sec Loss 2.3334 LearningRate 0.0147 Epoch: 15 Global Step: 163780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:31,969-Speed 5462.53 samples/sec Loss 2.3696 LearningRate 0.0147 Epoch: 15 Global Step: 163790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:39,498-Speed 5440.66 samples/sec Loss 2.3006 LearningRate 0.0147 Epoch: 15 Global Step: 163800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:47,151-Speed 5353.16 samples/sec Loss 2.3355 LearningRate 0.0147 Epoch: 15 Global Step: 163810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:01:54,700-Speed 5427.13 samples/sec Loss 2.3424 LearningRate 0.0147 Epoch: 15 Global Step: 163820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:02:02,178-Speed 5478.17 samples/sec Loss 2.3342 LearningRate 0.0147 Epoch: 15 Global Step: 163830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:02:09,649-Speed 5483.03 samples/sec Loss 2.3533 LearningRate 0.0147 Epoch: 15 Global Step: 163840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:02:17,180-Speed 5439.38 samples/sec Loss 2.3416 LearningRate 0.0146 Epoch: 15 Global Step: 163850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:02:24,675-Speed 5465.90 samples/sec Loss 2.3389 
LearningRate 0.0146 Epoch: 15 Global Step: 163860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:02:32,183-Speed 5456.53 samples/sec Loss 2.3035 LearningRate 0.0146 Epoch: 15 Global Step: 163870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:02:39,661-Speed 5477.71 samples/sec Loss 2.3197 LearningRate 0.0146 Epoch: 15 Global Step: 163880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:02:47,356-Speed 5323.82 samples/sec Loss 2.2983 LearningRate 0.0146 Epoch: 15 Global Step: 163890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:02:54,855-Speed 5463.16 samples/sec Loss 2.3598 LearningRate 0.0146 Epoch: 15 Global Step: 163900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:02,379-Speed 5444.86 samples/sec Loss 2.3486 LearningRate 0.0146 Epoch: 15 Global Step: 163910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:09,843-Speed 5488.02 samples/sec Loss 2.3322 LearningRate 0.0146 Epoch: 15 Global Step: 163920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:17,269-Speed 5516.49 samples/sec Loss 2.3096 LearningRate 0.0146 Epoch: 15 Global Step: 163930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:24,766-Speed 5464.66 samples/sec Loss 2.3229 LearningRate 0.0146 Epoch: 15 Global Step: 163940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:32,171-Speed 5532.01 samples/sec Loss 2.3159 LearningRate 0.0146 Epoch: 15 Global Step: 163950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:39,645-Speed 5480.50 samples/sec Loss 2.3003 LearningRate 0.0146 Epoch: 15 Global Step: 163960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:47,112-Speed 5486.98 samples/sec Loss 2.3121 LearningRate 0.0146 Epoch: 15 Global Step: 163970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:03:54,681-Speed 5412.01 samples/sec Loss 2.3092 LearningRate 0.0146 Epoch: 15 
Global Step: 163980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:04:02,182-Speed 5461.83 samples/sec Loss 2.3304 LearningRate 0.0146 Epoch: 15 Global Step: 163990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:04:09,776-Speed 5394.34 samples/sec Loss 2.3588 LearningRate 0.0145 Epoch: 15 Global Step: 164000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:04:54,381-[lfw][164000]XNorm: 22.614009 Training: 2022-01-09 08:04:54,382-[lfw][164000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 08:04:54,382-[lfw][164000]Accuracy-Highest: 0.99833 Training: 2022-01-09 08:05:46,446-[cfp_fp][164000]XNorm: 21.485216 Training: 2022-01-09 08:05:46,447-[cfp_fp][164000]Accuracy-Flip: 0.99214+-0.00430 Training: 2022-01-09 08:05:46,447-[cfp_fp][164000]Accuracy-Highest: 0.99371 Training: 2022-01-09 08:06:30,871-[agedb_30][164000]XNorm: 22.797398 Training: 2022-01-09 08:06:30,872-[agedb_30][164000]Accuracy-Flip: 0.98183+-0.00751 Training: 2022-01-09 08:06:30,872-[agedb_30][164000]Accuracy-Highest: 0.98333 Training: 2022-01-09 08:06:38,523-Speed 275.37 samples/sec Loss 2.3075 LearningRate 0.0145 Epoch: 15 Global Step: 164010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:06:46,030-Speed 5456.70 samples/sec Loss 2.3199 LearningRate 0.0145 Epoch: 15 Global Step: 164020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:06:53,572-Speed 5431.89 samples/sec Loss 2.3144 LearningRate 0.0145 Epoch: 15 Global Step: 164030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:07:01,113-Speed 5432.27 samples/sec Loss 2.3459 LearningRate 0.0145 Epoch: 15 Global Step: 164040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:07:08,672-Speed 5419.05 samples/sec Loss 2.3401 LearningRate 0.0145 Epoch: 15 Global Step: 164050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:07:16,155-Speed 5474.50 samples/sec Loss 2.3077 LearningRate 0.0145 Epoch: 15 Global Step: 
164060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:07:23,607-Speed 5497.39 samples/sec Loss 2.3058 LearningRate 0.0145 Epoch: 15 Global Step: 164070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:07:31,211-Speed 5387.55 samples/sec Loss 2.3156 LearningRate 0.0145 Epoch: 15 Global Step: 164080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:07:38,690-Speed 5476.93 samples/sec Loss 2.3164 LearningRate 0.0145 Epoch: 15 Global Step: 164090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:07:46,201-Speed 5454.30 samples/sec Loss 2.3094 LearningRate 0.0145 Epoch: 15 Global Step: 164100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:07:53,703-Speed 5460.21 samples/sec Loss 2.3398 LearningRate 0.0145 Epoch: 15 Global Step: 164110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:01,319-Speed 5379.43 samples/sec Loss 2.3160 LearningRate 0.0145 Epoch: 15 Global Step: 164120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:08,811-Speed 5467.84 samples/sec Loss 2.3292 LearningRate 0.0145 Epoch: 15 Global Step: 164130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:16,364-Speed 5423.33 samples/sec Loss 2.3070 LearningRate 0.0145 Epoch: 15 Global Step: 164140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:23,950-Speed 5400.03 samples/sec Loss 2.3193 LearningRate 0.0144 Epoch: 15 Global Step: 164150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:31,582-Speed 5367.43 samples/sec Loss 2.3433 LearningRate 0.0144 Epoch: 15 Global Step: 164160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:39,075-Speed 5466.96 samples/sec Loss 2.2948 LearningRate 0.0144 Epoch: 15 Global Step: 164170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:08:46,525-Speed 5498.80 samples/sec Loss 2.3279 LearningRate 0.0144 Epoch: 15 Global Step: 164180 Fp16 Grad Scale: 32768 
Required: 10 hours Training: 2022-01-09 08:08:54,021-Speed 5465.09 samples/sec Loss 2.3414 LearningRate 0.0144 Epoch: 15 Global Step: 164190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:01,592-Speed 5411.34 samples/sec Loss 2.3320 LearningRate 0.0144 Epoch: 15 Global Step: 164200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:09,200-Speed 5384.40 samples/sec Loss 2.3177 LearningRate 0.0144 Epoch: 15 Global Step: 164210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:16,823-Speed 5373.44 samples/sec Loss 2.3048 LearningRate 0.0144 Epoch: 15 Global Step: 164220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:24,481-Speed 5349.74 samples/sec Loss 2.3072 LearningRate 0.0144 Epoch: 15 Global Step: 164230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:32,094-Speed 5381.29 samples/sec Loss 2.3149 LearningRate 0.0144 Epoch: 15 Global Step: 164240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:39,722-Speed 5369.90 samples/sec Loss 2.3116 LearningRate 0.0144 Epoch: 15 Global Step: 164250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:09:47,231-Speed 5455.29 samples/sec Loss 2.2914 LearningRate 0.0144 Epoch: 15 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:09:54,801-Speed 5412.15 samples/sec Loss 2.2647 LearningRate 0.0144 Epoch: 15 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:02,424-Speed 5373.44 samples/sec Loss 2.2985 LearningRate 0.0144 Epoch: 15 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:09,951-Speed 5442.48 samples/sec Loss 2.3100 LearningRate 0.0144 Epoch: 15 Global Step: 164290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:17,483-Speed 5439.18 samples/sec Loss 2.3140 LearningRate 0.0143 Epoch: 15 Global Step: 164300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 
2022-01-09 08:10:24,977-Speed 5466.57 samples/sec Loss 2.2965 LearningRate 0.0143 Epoch: 15 Global Step: 164310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:32,538-Speed 5417.80 samples/sec Loss 2.2793 LearningRate 0.0143 Epoch: 15 Global Step: 164320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:40,103-Speed 5414.98 samples/sec Loss 2.3119 LearningRate 0.0143 Epoch: 15 Global Step: 164330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:47,596-Speed 5467.39 samples/sec Loss 2.2913 LearningRate 0.0143 Epoch: 15 Global Step: 164340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:10:55,078-Speed 5474.85 samples/sec Loss 2.3134 LearningRate 0.0143 Epoch: 15 Global Step: 164350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:11:02,614-Speed 5436.30 samples/sec Loss 2.2949 LearningRate 0.0143 Epoch: 15 Global Step: 164360 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:13,478-Speed 3770.33 samples/sec Loss 2.2940 LearningRate 0.0143 Epoch: 15 Global Step: 164370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:21,028-Speed 5426.00 samples/sec Loss 2.3177 LearningRate 0.0143 Epoch: 15 Global Step: 164380 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:28,526-Speed 5463.77 samples/sec Loss 2.3167 LearningRate 0.0143 Epoch: 15 Global Step: 164390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:36,049-Speed 5444.89 samples/sec Loss 2.3438 LearningRate 0.0143 Epoch: 15 Global Step: 164400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:43,576-Speed 5442.57 samples/sec Loss 2.3423 LearningRate 0.0143 Epoch: 15 Global Step: 164410 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:51,126-Speed 5426.44 samples/sec Loss 2.3673 LearningRate 0.0143 Epoch: 15 Global Step: 164420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:11:58,651-Speed 
5443.63 samples/sec Loss 2.3258 LearningRate 0.0143 Epoch: 15 Global Step: 164430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:12:06,176-Speed 5443.68 samples/sec Loss 2.3085 LearningRate 0.0143 Epoch: 15 Global Step: 164440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:12:13,670-Speed 5466.57 samples/sec Loss 2.2811 LearningRate 0.0142 Epoch: 15 Global Step: 164450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-09 08:12:21,171-Speed 5461.34 samples/sec Loss 2.3177 LearningRate 0.0142 Epoch: 15 Global Step: 164460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:12:28,667-Speed 5465.28 samples/sec Loss 2.2982 LearningRate 0.0142 Epoch: 15 Global Step: 164470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:12:36,129-Speed 5489.93 samples/sec Loss 2.3074 LearningRate 0.0142 Epoch: 15 Global Step: 164480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:12:43,613-Speed 5473.55 samples/sec Loss 2.3122 LearningRate 0.0142 Epoch: 15 Global Step: 164490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:12:51,184-Speed 5410.87 samples/sec Loss 2.3064 LearningRate 0.0142 Epoch: 15 Global Step: 164500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:12:58,724-Speed 5433.76 samples/sec Loss 2.2706 LearningRate 0.0142 Epoch: 15 Global Step: 164510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:13:06,254-Speed 5440.04 samples/sec Loss 2.2742 LearningRate 0.0142 Epoch: 15 Global Step: 164520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:13:13,958-Speed 5317.63 samples/sec Loss 2.2922 LearningRate 0.0142 Epoch: 15 Global Step: 164530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:13:21,453-Speed 5465.26 samples/sec Loss 2.2795 LearningRate 0.0142 Epoch: 15 Global Step: 164540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:13:28,959-Speed 5458.59 samples/sec Loss 2.2812 
LearningRate 0.0142 Epoch: 15 Global Step: 164550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:13:36,497-Speed 5433.77 samples/sec Loss 2.2658 LearningRate 0.0142 Epoch: 15 Global Step: 164560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:13:43,924-Speed 5516.03 samples/sec Loss 2.3081 LearningRate 0.0142 Epoch: 15 Global Step: 164570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:13:51,534-Speed 5382.85 samples/sec Loss 2.3252 LearningRate 0.0142 Epoch: 15 Global Step: 164580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:13:59,056-Speed 5446.96 samples/sec Loss 2.3067 LearningRate 0.0142 Epoch: 15 Global Step: 164590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:06,564-Speed 5455.66 samples/sec Loss 2.3034 LearningRate 0.0141 Epoch: 15 Global Step: 164600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:14,116-Speed 5424.18 samples/sec Loss 2.2610 LearningRate 0.0141 Epoch: 15 Global Step: 164610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:21,644-Speed 5442.29 samples/sec Loss 2.2952 LearningRate 0.0141 Epoch: 15 Global Step: 164620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:29,231-Speed 5399.21 samples/sec Loss 2.2842 LearningRate 0.0141 Epoch: 15 Global Step: 164630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:36,748-Speed 5449.85 samples/sec Loss 2.2588 LearningRate 0.0141 Epoch: 15 Global Step: 164640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:44,315-Speed 5413.09 samples/sec Loss 2.2833 LearningRate 0.0141 Epoch: 15 Global Step: 164650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:51,836-Speed 5447.13 samples/sec Loss 2.2755 LearningRate 0.0141 Epoch: 15 Global Step: 164660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:14:59,531-Speed 5323.66 samples/sec Loss 2.2916 LearningRate 0.0141 Epoch: 15 
Global Step: 164670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:15:07,159-Speed 5370.46 samples/sec Loss 2.2741 LearningRate 0.0141 Epoch: 15 Global Step: 164680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:15:14,686-Speed 5441.85 samples/sec Loss 2.2925 LearningRate 0.0141 Epoch: 15 Global Step: 164690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:15:22,166-Speed 5476.46 samples/sec Loss 2.2677 LearningRate 0.0141 Epoch: 15 Global Step: 164700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:15:29,767-Speed 5390.32 samples/sec Loss 2.2903 LearningRate 0.0141 Epoch: 15 Global Step: 164710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:15:37,287-Speed 5447.29 samples/sec Loss 2.3011 LearningRate 0.0141 Epoch: 15 Global Step: 164720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:15:44,741-Speed 5495.12 samples/sec Loss 2.2701 LearningRate 0.0141 Epoch: 15 Global Step: 164730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:15:52,289-Speed 5427.89 samples/sec Loss 2.3066 LearningRate 0.0141 Epoch: 15 Global Step: 164740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:15:59,759-Speed 5483.88 samples/sec Loss 2.2971 LearningRate 0.0140 Epoch: 15 Global Step: 164750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:07,275-Speed 5450.63 samples/sec Loss 2.2775 LearningRate 0.0140 Epoch: 15 Global Step: 164760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:14,834-Speed 5418.68 samples/sec Loss 2.2661 LearningRate 0.0140 Epoch: 15 Global Step: 164770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:22,352-Speed 5449.00 samples/sec Loss 2.2719 LearningRate 0.0140 Epoch: 15 Global Step: 164780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:29,897-Speed 5429.73 samples/sec Loss 2.2942 LearningRate 0.0140 Epoch: 15 Global Step: 164790 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:37,453-Speed 5421.54 samples/sec Loss 2.2819 LearningRate 0.0140 Epoch: 15 Global Step: 164800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:44,912-Speed 5492.07 samples/sec Loss 2.3035 LearningRate 0.0140 Epoch: 15 Global Step: 164810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:16:52,608-Speed 5323.16 samples/sec Loss 2.2923 LearningRate 0.0140 Epoch: 15 Global Step: 164820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:00,127-Speed 5448.12 samples/sec Loss 2.2822 LearningRate 0.0140 Epoch: 15 Global Step: 164830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:17:07,653-Speed 5442.84 samples/sec Loss 2.2710 LearningRate 0.0140 Epoch: 15 Global Step: 164840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:15,192-Speed 5434.30 samples/sec Loss 2.2497 LearningRate 0.0140 Epoch: 15 Global Step: 164850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:22,777-Speed 5400.41 samples/sec Loss 2.3055 LearningRate 0.0140 Epoch: 15 Global Step: 164860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:30,242-Speed 5488.54 samples/sec Loss 2.3096 LearningRate 0.0140 Epoch: 15 Global Step: 164870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:37,844-Speed 5388.52 samples/sec Loss 2.2809 LearningRate 0.0140 Epoch: 15 Global Step: 164880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:45,492-Speed 5356.05 samples/sec Loss 2.3119 LearningRate 0.0140 Epoch: 15 Global Step: 164890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:17:53,115-Speed 5374.22 samples/sec Loss 2.2447 LearningRate 0.0139 Epoch: 15 Global Step: 164900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:18:00,695-Speed 5404.11 samples/sec Loss 2.2698 LearningRate 0.0139 Epoch: 15 Global Step: 164910 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-01-09 08:18:08,231-Speed 5436.73 samples/sec Loss 2.2949 LearningRate 0.0139 Epoch: 15 Global Step: 164920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:18:15,905-Speed 5337.81 samples/sec Loss 2.2726 LearningRate 0.0139 Epoch: 15 Global Step: 164930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:18:23,482-Speed 5406.67 samples/sec Loss 2.2792 LearningRate 0.0139 Epoch: 15 Global Step: 164940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:18:31,099-Speed 5377.87 samples/sec Loss 2.2714 LearningRate 0.0139 Epoch: 15 Global Step: 164950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:18:38,570-Speed 5483.85 samples/sec Loss 2.2796 LearningRate 0.0139 Epoch: 15 Global Step: 164960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:18:46,117-Speed 5427.59 samples/sec Loss 2.2940 LearningRate 0.0139 Epoch: 15 Global Step: 164970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:18:53,758-Speed 5361.53 samples/sec Loss 2.2566 LearningRate 0.0139 Epoch: 15 Global Step: 164980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:19:01,335-Speed 5406.74 samples/sec Loss 2.2460 LearningRate 0.0139 Epoch: 15 Global Step: 164990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:19:08,880-Speed 5429.76 samples/sec Loss 2.2477 LearningRate 0.0139 Epoch: 15 Global Step: 165000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:19:16,681-Speed 5251.00 samples/sec Loss 2.2238 LearningRate 0.0139 Epoch: 15 Global Step: 165010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:19:24,276-Speed 5393.84 samples/sec Loss 2.3060 LearningRate 0.0139 Epoch: 15 Global Step: 165020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:19:31,804-Speed 5441.24 samples/sec Loss 2.2709 LearningRate 0.0139 Epoch: 15 Global Step: 165030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 
08:19:39,296-Speed 5468.34 samples/sec Loss 2.2596 LearningRate 0.0139 Epoch: 15 Global Step: 165040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-09 08:19:46,710-Speed 5525.64 samples/sec Loss 2.2809 LearningRate 0.0138 Epoch: 15 Global Step: 165050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:19:54,223-Speed 5451.87 samples/sec Loss 2.2809 LearningRate 0.0138 Epoch: 15 Global Step: 165060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:20:01,808-Speed 5400.84 samples/sec Loss 2.2666 LearningRate 0.0138 Epoch: 15 Global Step: 165070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:20:09,296-Speed 5471.19 samples/sec Loss 2.2893 LearningRate 0.0138 Epoch: 15 Global Step: 165080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:20:16,835-Speed 5434.28 samples/sec Loss 2.2437 LearningRate 0.0138 Epoch: 15 Global Step: 165090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:20:24,389-Speed 5422.15 samples/sec Loss 2.3016 LearningRate 0.0138 Epoch: 15 Global Step: 165100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:20:31,934-Speed 5429.30 samples/sec Loss 2.2633 LearningRate 0.0138 Epoch: 15 Global Step: 165110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:20:39,367-Speed 5511.46 samples/sec Loss 2.2889 LearningRate 0.0138 Epoch: 15 Global Step: 165120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:20:46,989-Speed 5375.08 samples/sec Loss 2.2779 LearningRate 0.0138 Epoch: 15 Global Step: 165130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:20:54,574-Speed 5400.70 samples/sec Loss 2.2685 LearningRate 0.0138 Epoch: 15 Global Step: 165140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:21:02,088-Speed 5451.96 samples/sec Loss 2.2144 LearningRate 0.0138 Epoch: 15 Global Step: 165150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:21:09,598-Speed 5454.27 
samples/sec Loss 2.2763 LearningRate 0.0138 Epoch: 15 Global Step: 165160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:21:17,231-Speed 5367.08 samples/sec Loss 2.2578 LearningRate 0.0138 Epoch: 15 Global Step: 165170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:21:24,740-Speed 5456.04 samples/sec Loss 2.2182 LearningRate 0.0138 Epoch: 15 Global Step: 165180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:21:32,370-Speed 5368.56 samples/sec Loss 2.2535 LearningRate 0.0138 Epoch: 15 Global Step: 165190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:21:39,863-Speed 5467.08 samples/sec Loss 2.3030 LearningRate 0.0138 Epoch: 15 Global Step: 165200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:21:47,406-Speed 5431.15 samples/sec Loss 2.2580 LearningRate 0.0137 Epoch: 15 Global Step: 165210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:21:54,876-Speed 5483.92 samples/sec Loss 2.2639 LearningRate 0.0137 Epoch: 15 Global Step: 165220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:02,393-Speed 5449.74 samples/sec Loss 2.2375 LearningRate 0.0137 Epoch: 15 Global Step: 165230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:09,873-Speed 5477.02 samples/sec Loss 2.2310 LearningRate 0.0137 Epoch: 15 Global Step: 165240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:17,374-Speed 5461.70 samples/sec Loss 2.2755 LearningRate 0.0137 Epoch: 15 Global Step: 165250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:24,847-Speed 5481.82 samples/sec Loss 2.2426 LearningRate 0.0137 Epoch: 15 Global Step: 165260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:32,280-Speed 5510.79 samples/sec Loss 2.2664 LearningRate 0.0137 Epoch: 15 Global Step: 165270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:39,884-Speed 5387.57 samples/sec Loss 2.2991 
LearningRate 0.0137 Epoch: 15 Global Step: 165280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:47,382-Speed 5463.32 samples/sec Loss 2.2188 LearningRate 0.0137 Epoch: 15 Global Step: 165290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:22:54,871-Speed 5470.44 samples/sec Loss 2.2672 LearningRate 0.0137 Epoch: 15 Global Step: 165300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:02,366-Speed 5465.14 samples/sec Loss 2.2513 LearningRate 0.0137 Epoch: 15 Global Step: 165310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:09,916-Speed 5426.57 samples/sec Loss 2.2374 LearningRate 0.0137 Epoch: 15 Global Step: 165320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:17,489-Speed 5409.40 samples/sec Loss 2.2429 LearningRate 0.0137 Epoch: 15 Global Step: 165330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:25,144-Speed 5351.41 samples/sec Loss 2.2924 LearningRate 0.0137 Epoch: 15 Global Step: 165340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:32,713-Speed 5412.41 samples/sec Loss 2.2581 LearningRate 0.0137 Epoch: 15 Global Step: 165350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:40,257-Speed 5429.87 samples/sec Loss 2.2635 LearningRate 0.0136 Epoch: 15 Global Step: 165360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:47,729-Speed 5482.21 samples/sec Loss 2.2578 LearningRate 0.0136 Epoch: 15 Global Step: 165370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:23:55,202-Speed 5481.82 samples/sec Loss 2.2465 LearningRate 0.0136 Epoch: 15 Global Step: 165380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:24:02,690-Speed 5470.80 samples/sec Loss 2.2547 LearningRate 0.0136 Epoch: 15 Global Step: 165390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:24:10,277-Speed 5399.71 samples/sec Loss 2.2781 LearningRate 0.0136 Epoch: 15 
Global Step: 165400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:24:17,764-Speed 5471.34 samples/sec Loss 2.2675 LearningRate 0.0136 Epoch: 15 Global Step: 165410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:24:25,241-Speed 5478.65 samples/sec Loss 2.2409 LearningRate 0.0136 Epoch: 15 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:24:32,914-Speed 5339.16 samples/sec Loss 2.2458 LearningRate 0.0136 Epoch: 15 Global Step: 165430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:24:40,492-Speed 5405.56 samples/sec Loss 2.2698 LearningRate 0.0136 Epoch: 15 Global Step: 165440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:24:47,986-Speed 5466.23 samples/sec Loss 2.2128 LearningRate 0.0136 Epoch: 15 Global Step: 165450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:24:55,477-Speed 5469.33 samples/sec Loss 2.2337 LearningRate 0.0136 Epoch: 15 Global Step: 165460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:02,974-Speed 5464.13 samples/sec Loss 2.2667 LearningRate 0.0136 Epoch: 15 Global Step: 165470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:10,527-Speed 5423.69 samples/sec Loss 2.2467 LearningRate 0.0136 Epoch: 15 Global Step: 165480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:18,086-Speed 5419.54 samples/sec Loss 2.2220 LearningRate 0.0136 Epoch: 15 Global Step: 165490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:25,594-Speed 5456.08 samples/sec Loss 2.2552 LearningRate 0.0136 Epoch: 15 Global Step: 165500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:33,106-Speed 5452.99 samples/sec Loss 2.2271 LearningRate 0.0136 Epoch: 15 Global Step: 165510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:40,602-Speed 5465.37 samples/sec Loss 2.2577 LearningRate 0.0135 Epoch: 15 Global Step: 165520 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-01-09 08:25:48,125-Speed 5445.05 samples/sec Loss 2.2166 LearningRate 0.0135 Epoch: 15 Global Step: 165530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:25:55,598-Speed 5481.32 samples/sec Loss 2.2438 LearningRate 0.0135 Epoch: 15 Global Step: 165540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:26:03,142-Speed 5430.70 samples/sec Loss 2.2413 LearningRate 0.0135 Epoch: 15 Global Step: 165550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:26:10,662-Speed 5447.64 samples/sec Loss 2.2685 LearningRate 0.0135 Epoch: 15 Global Step: 165560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:26:18,162-Speed 5462.18 samples/sec Loss 2.2547 LearningRate 0.0135 Epoch: 15 Global Step: 165570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:26:25,648-Speed 5472.09 samples/sec Loss 2.2419 LearningRate 0.0135 Epoch: 15 Global Step: 165580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:26:33,185-Speed 5435.54 samples/sec Loss 2.2337 LearningRate 0.0135 Epoch: 15 Global Step: 165590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:26:40,796-Speed 5381.97 samples/sec Loss 2.2312 LearningRate 0.0135 Epoch: 15 Global Step: 165600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:26:48,286-Speed 5469.40 samples/sec Loss 2.2401 LearningRate 0.0135 Epoch: 15 Global Step: 165610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:26:55,741-Speed 5495.71 samples/sec Loss 2.2841 LearningRate 0.0135 Epoch: 15 Global Step: 165620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:27:03,351-Speed 5382.70 samples/sec Loss 2.2469 LearningRate 0.0135 Epoch: 15 Global Step: 165630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:27:10,904-Speed 5423.44 samples/sec Loss 2.2658 LearningRate 0.0135 Epoch: 15 Global Step: 165640 Fp16 Grad Scale: 32768 Required: 10 
hours Training: 2022-01-09 08:27:18,416-Speed 5454.09 samples/sec Loss 2.2525 LearningRate 0.0135 Epoch: 15 Global Step: 165650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:27:25,922-Speed 5457.03 samples/sec Loss 2.2847 LearningRate 0.0135 Epoch: 15 Global Step: 165660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:27:33,514-Speed 5395.69 samples/sec Loss 2.2453 LearningRate 0.0134 Epoch: 15 Global Step: 165670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:27:41,068-Speed 5422.67 samples/sec Loss 2.2384 LearningRate 0.0134 Epoch: 15 Global Step: 165680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:27:48,557-Speed 5470.27 samples/sec Loss 2.2253 LearningRate 0.0134 Epoch: 15 Global Step: 165690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:27:56,010-Speed 5497.29 samples/sec Loss 2.2296 LearningRate 0.0134 Epoch: 15 Global Step: 165700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:03,668-Speed 5348.85 samples/sec Loss 2.2355 LearningRate 0.0134 Epoch: 15 Global Step: 165710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:11,227-Speed 5419.36 samples/sec Loss 2.2464 LearningRate 0.0134 Epoch: 15 Global Step: 165720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:18,840-Speed 5380.81 samples/sec Loss 2.2414 LearningRate 0.0134 Epoch: 15 Global Step: 165730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:26,381-Speed 5432.64 samples/sec Loss 2.2507 LearningRate 0.0134 Epoch: 15 Global Step: 165740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:33,887-Speed 5457.35 samples/sec Loss 2.2141 LearningRate 0.0134 Epoch: 15 Global Step: 165750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:41,354-Speed 5486.24 samples/sec Loss 2.2425 LearningRate 0.0134 Epoch: 15 Global Step: 165760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 
08:28:48,795-Speed 5506.39 samples/sec Loss 2.2145 LearningRate 0.0134 Epoch: 15 Global Step: 165770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:28:56,333-Speed 5434.21 samples/sec Loss 2.2443 LearningRate 0.0134 Epoch: 15 Global Step: 165780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:03,923-Speed 5397.29 samples/sec Loss 2.2001 LearningRate 0.0134 Epoch: 15 Global Step: 165790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:11,396-Speed 5481.40 samples/sec Loss 2.2242 LearningRate 0.0134 Epoch: 15 Global Step: 165800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-09 08:29:18,897-Speed 5461.22 samples/sec Loss 2.2807 LearningRate 0.0134 Epoch: 15 Global Step: 165810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:26,477-Speed 5404.66 samples/sec Loss 2.2608 LearningRate 0.0134 Epoch: 15 Global Step: 165820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:34,173-Speed 5322.99 samples/sec Loss 2.2108 LearningRate 0.0133 Epoch: 15 Global Step: 165830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:41,685-Speed 5453.40 samples/sec Loss 2.2353 LearningRate 0.0133 Epoch: 15 Global Step: 165840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:49,187-Speed 5459.80 samples/sec Loss 2.2713 LearningRate 0.0133 Epoch: 15 Global Step: 165850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:29:56,668-Speed 5476.48 samples/sec Loss 2.2404 LearningRate 0.0133 Epoch: 15 Global Step: 165860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:30:04,238-Speed 5411.80 samples/sec Loss 2.2219 LearningRate 0.0133 Epoch: 15 Global Step: 165870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:30:11,780-Speed 5431.01 samples/sec Loss 2.2585 LearningRate 0.0133 Epoch: 15 Global Step: 165880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:30:19,331-Speed 5425.12 
samples/sec Loss 2.2447 LearningRate 0.0133 Epoch: 15 Global Step: 165890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-09 08:30:26,802-Speed 5483.36 samples/sec Loss 2.2248 LearningRate 0.0133 Epoch: 15 Global Step: 165900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:30:34,334-Speed 5439.30 samples/sec Loss 2.2243 LearningRate 0.0133 Epoch: 15 Global Step: 165910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:30:56,880-Speed 1816.77 samples/sec Loss 2.2215 LearningRate 0.0133 Epoch: 16 Global Step: 165920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:04,344-Speed 5488.79 samples/sec Loss 2.1997 LearningRate 0.0133 Epoch: 16 Global Step: 165930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:11,757-Speed 5526.14 samples/sec Loss 2.2263 LearningRate 0.0133 Epoch: 16 Global Step: 165940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:19,177-Speed 5520.82 samples/sec Loss 2.2306 LearningRate 0.0133 Epoch: 16 Global Step: 165950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:26,602-Speed 5517.43 samples/sec Loss 2.2228 LearningRate 0.0133 Epoch: 16 Global Step: 165960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:33,982-Speed 5550.89 samples/sec Loss 2.2080 LearningRate 0.0133 Epoch: 16 Global Step: 165970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:41,436-Speed 5496.10 samples/sec Loss 2.2253 LearningRate 0.0132 Epoch: 16 Global Step: 165980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:48,832-Speed 5538.74 samples/sec Loss 2.2085 LearningRate 0.0132 Epoch: 16 Global Step: 165990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:31:56,228-Speed 5539.03 samples/sec Loss 2.2101 LearningRate 0.0132 Epoch: 16 Global Step: 166000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:32:40,156-[lfw][166000]XNorm: 22.020784 Training: 2022-01-09 
08:32:40,156-[lfw][166000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 08:32:40,157-[lfw][166000]Accuracy-Highest: 0.99833 Training: 2022-01-09 08:33:31,765-[cfp_fp][166000]XNorm: 21.145533 Training: 2022-01-09 08:33:31,766-[cfp_fp][166000]Accuracy-Flip: 0.99214+-0.00508 Training: 2022-01-09 08:33:31,766-[cfp_fp][166000]Accuracy-Highest: 0.99371 Training: 2022-01-09 08:34:15,727-[agedb_30][166000]XNorm: 22.243697 Training: 2022-01-09 08:34:15,728-[agedb_30][166000]Accuracy-Flip: 0.98317+-0.00724 Training: 2022-01-09 08:34:15,728-[agedb_30][166000]Accuracy-Highest: 0.98333 Training: 2022-01-09 08:34:23,058-Speed 278.96 samples/sec Loss 2.1959 LearningRate 0.0132 Epoch: 16 Global Step: 166010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:34:30,585-Speed 5442.74 samples/sec Loss 2.2154 LearningRate 0.0132 Epoch: 16 Global Step: 166020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:34:38,045-Speed 5490.82 samples/sec Loss 2.2333 LearningRate 0.0132 Epoch: 16 Global Step: 166030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:34:45,505-Speed 5491.62 samples/sec Loss 2.2557 LearningRate 0.0132 Epoch: 16 Global Step: 166040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:34:52,965-Speed 5492.13 samples/sec Loss 2.2308 LearningRate 0.0132 Epoch: 16 Global Step: 166050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:00,420-Speed 5494.33 samples/sec Loss 2.1811 LearningRate 0.0132 Epoch: 16 Global Step: 166060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:07,913-Speed 5467.22 samples/sec Loss 2.1791 LearningRate 0.0132 Epoch: 16 Global Step: 166070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:15,412-Speed 5463.65 samples/sec Loss 2.2115 LearningRate 0.0132 Epoch: 16 Global Step: 166080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:22,883-Speed 5482.99 samples/sec Loss 2.2223 LearningRate 0.0132 Epoch: 16 Global 
Step: 166090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:30,397-Speed 5451.54 samples/sec Loss 2.2211 LearningRate 0.0132 Epoch: 16 Global Step: 166100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:37,869-Speed 5482.64 samples/sec Loss 2.1495 LearningRate 0.0132 Epoch: 16 Global Step: 166110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:45,341-Speed 5482.91 samples/sec Loss 2.2062 LearningRate 0.0132 Epoch: 16 Global Step: 166120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:35:52,835-Speed 5466.76 samples/sec Loss 2.2347 LearningRate 0.0132 Epoch: 16 Global Step: 166130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:36:00,268-Speed 5511.07 samples/sec Loss 2.1751 LearningRate 0.0131 Epoch: 16 Global Step: 166140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:07,780-Speed 5453.20 samples/sec Loss 2.1991 LearningRate 0.0131 Epoch: 16 Global Step: 166150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:15,277-Speed 5464.96 samples/sec Loss 2.2094 LearningRate 0.0131 Epoch: 16 Global Step: 166160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:22,920-Speed 5359.73 samples/sec Loss 2.2128 LearningRate 0.0131 Epoch: 16 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:30,474-Speed 5422.64 samples/sec Loss 2.1799 LearningRate 0.0131 Epoch: 16 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:38,045-Speed 5410.86 samples/sec Loss 2.2027 LearningRate 0.0131 Epoch: 16 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:45,521-Speed 5479.84 samples/sec Loss 2.2183 LearningRate 0.0131 Epoch: 16 Global Step: 166200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:36:53,061-Speed 5432.74 samples/sec Loss 2.2234 LearningRate 0.0131 Epoch: 16 Global Step: 166210 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-01-09 08:37:00,516-Speed 5495.30 samples/sec Loss 2.1805 LearningRate 0.0131 Epoch: 16 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:37:07,989-Speed 5481.70 samples/sec Loss 2.2037 LearningRate 0.0131 Epoch: 16 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:37:15,524-Speed 5436.76 samples/sec Loss 2.1948 LearningRate 0.0131 Epoch: 16 Global Step: 166240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:37:23,011-Speed 5471.59 samples/sec Loss 2.1553 LearningRate 0.0131 Epoch: 16 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:37:30,430-Speed 5521.36 samples/sec Loss 2.1657 LearningRate 0.0131 Epoch: 16 Global Step: 166260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:37:37,905-Speed 5480.59 samples/sec Loss 2.1964 LearningRate 0.0131 Epoch: 16 Global Step: 166270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:37:45,427-Speed 5445.64 samples/sec Loss 2.2094 LearningRate 0.0131 Epoch: 16 Global Step: 166280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:37:52,846-Speed 5522.63 samples/sec Loss 2.2165 LearningRate 0.0131 Epoch: 16 Global Step: 166290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:38:00,378-Speed 5438.97 samples/sec Loss 2.1979 LearningRate 0.0130 Epoch: 16 Global Step: 166300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:38:07,853-Speed 5479.94 samples/sec Loss 2.1687 LearningRate 0.0130 Epoch: 16 Global Step: 166310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:38:15,361-Speed 5455.96 samples/sec Loss 2.2148 LearningRate 0.0130 Epoch: 16 Global Step: 166320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:38:22,930-Speed 5412.53 samples/sec Loss 2.1638 LearningRate 0.0130 Epoch: 16 Global Step: 166330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
08:38:30,428-Speed 5463.66 samples/sec Loss 2.1973 LearningRate 0.0130 Epoch: 16 Global Step: 166340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:38:37,940-Speed 5453.47 samples/sec Loss 2.1796 LearningRate 0.0130 Epoch: 16 Global Step: 166350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:38:45,482-Speed 5431.84 samples/sec Loss 2.2122 LearningRate 0.0130 Epoch: 16 Global Step: 166360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:38:53,039-Speed 5421.01 samples/sec Loss 2.1762 LearningRate 0.0130 Epoch: 16 Global Step: 166370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:39:00,523-Speed 5474.24 samples/sec Loss 2.2133 LearningRate 0.0130 Epoch: 16 Global Step: 166380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:39:08,048-Speed 5443.48 samples/sec Loss 2.1917 LearningRate 0.0130 Epoch: 16 Global Step: 166390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:39:15,512-Speed 5488.36 samples/sec Loss 2.1936 LearningRate 0.0130 Epoch: 16 Global Step: 166400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:39:22,939-Speed 5516.42 samples/sec Loss 2.2211 LearningRate 0.0130 Epoch: 16 Global Step: 166410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:39:30,428-Speed 5469.94 samples/sec Loss 2.2365 LearningRate 0.0130 Epoch: 16 Global Step: 166420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:39:37,929-Speed 5460.81 samples/sec Loss 2.2044 LearningRate 0.0130 Epoch: 16 Global Step: 166430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:39:45,423-Speed 5466.88 samples/sec Loss 2.2037 LearningRate 0.0130 Epoch: 16 Global Step: 166440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:39:52,913-Speed 5469.12 samples/sec Loss 2.2340 LearningRate 0.0129 Epoch: 16 Global Step: 166450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:00,316-Speed 5533.84 samples/sec Loss 
2.1877 LearningRate 0.0129 Epoch: 16 Global Step: 166460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:07,843-Speed 5442.66 samples/sec Loss 2.1991 LearningRate 0.0129 Epoch: 16 Global Step: 166470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:15,371-Speed 5441.85 samples/sec Loss 2.1952 LearningRate 0.0129 Epoch: 16 Global Step: 166480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:22,963-Speed 5395.70 samples/sec Loss 2.1876 LearningRate 0.0129 Epoch: 16 Global Step: 166490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:30,430-Speed 5486.51 samples/sec Loss 2.2331 LearningRate 0.0129 Epoch: 16 Global Step: 166500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:37,885-Speed 5495.04 samples/sec Loss 2.1899 LearningRate 0.0129 Epoch: 16 Global Step: 166510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:45,401-Speed 5449.81 samples/sec Loss 2.1949 LearningRate 0.0129 Epoch: 16 Global Step: 166520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:40:52,899-Speed 5464.10 samples/sec Loss 2.1893 LearningRate 0.0129 Epoch: 16 Global Step: 166530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:00,454-Speed 5422.38 samples/sec Loss 2.2195 LearningRate 0.0129 Epoch: 16 Global Step: 166540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:07,942-Speed 5470.82 samples/sec Loss 2.1667 LearningRate 0.0129 Epoch: 16 Global Step: 166550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:15,430-Speed 5470.19 samples/sec Loss 2.1732 LearningRate 0.0129 Epoch: 16 Global Step: 166560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:22,914-Speed 5473.97 samples/sec Loss 2.1971 LearningRate 0.0129 Epoch: 16 Global Step: 166570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:30,356-Speed 5504.89 samples/sec Loss 2.1702 LearningRate 0.0129 Epoch: 16 Global 
Step: 166580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:37,876-Speed 5447.55 samples/sec Loss 2.2122 LearningRate 0.0129 Epoch: 16 Global Step: 166590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:45,388-Speed 5453.15 samples/sec Loss 2.1957 LearningRate 0.0129 Epoch: 16 Global Step: 166600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:41:52,994-Speed 5386.12 samples/sec Loss 2.1840 LearningRate 0.0128 Epoch: 16 Global Step: 166610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:00,641-Speed 5357.59 samples/sec Loss 2.1979 LearningRate 0.0128 Epoch: 16 Global Step: 166620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:08,235-Speed 5394.34 samples/sec Loss 2.2118 LearningRate 0.0128 Epoch: 16 Global Step: 166630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:15,920-Speed 5330.23 samples/sec Loss 2.2081 LearningRate 0.0128 Epoch: 16 Global Step: 166640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:23,423-Speed 5460.49 samples/sec Loss 2.1699 LearningRate 0.0128 Epoch: 16 Global Step: 166650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:30,825-Speed 5534.06 samples/sec Loss 2.2002 LearningRate 0.0128 Epoch: 16 Global Step: 166660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:38,404-Speed 5405.10 samples/sec Loss 2.1704 LearningRate 0.0128 Epoch: 16 Global Step: 166670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:46,060-Speed 5350.43 samples/sec Loss 2.2162 LearningRate 0.0128 Epoch: 16 Global Step: 166680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:42:53,704-Speed 5359.14 samples/sec Loss 2.1745 LearningRate 0.0128 Epoch: 16 Global Step: 166690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:43:01,125-Speed 5520.37 samples/sec Loss 2.1866 LearningRate 0.0128 Epoch: 16 Global Step: 166700 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-01-09 08:43:08,642-Speed 5449.33 samples/sec Loss 2.1979 LearningRate 0.0128 Epoch: 16 Global Step: 166710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:43:16,206-Speed 5416.36 samples/sec Loss 2.2273 LearningRate 0.0128 Epoch: 16 Global Step: 166720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:43:23,792-Speed 5399.90 samples/sec Loss 2.1552 LearningRate 0.0128 Epoch: 16 Global Step: 166730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:43:31,297-Speed 5458.71 samples/sec Loss 2.2059 LearningRate 0.0128 Epoch: 16 Global Step: 166740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:43:38,837-Speed 5432.76 samples/sec Loss 2.2059 LearningRate 0.0128 Epoch: 16 Global Step: 166750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:43:46,397-Speed 5418.72 samples/sec Loss 2.1957 LearningRate 0.0128 Epoch: 16 Global Step: 166760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:43:53,906-Speed 5455.09 samples/sec Loss 2.1941 LearningRate 0.0127 Epoch: 16 Global Step: 166770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:44:01,371-Speed 5488.19 samples/sec Loss 2.1832 LearningRate 0.0127 Epoch: 16 Global Step: 166780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:44:08,887-Speed 5450.45 samples/sec Loss 2.1508 LearningRate 0.0127 Epoch: 16 Global Step: 166790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:44:16,407-Speed 5447.42 samples/sec Loss 2.1380 LearningRate 0.0127 Epoch: 16 Global Step: 166800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:44:23,922-Speed 5451.29 samples/sec Loss 2.1674 LearningRate 0.0127 Epoch: 16 Global Step: 166810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 08:44:31,478-Speed 5422.00 samples/sec Loss 2.1770 LearningRate 0.0127 Epoch: 16 Global Step: 166820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 
08:44:38,964-Speed 5471.91 samples/sec Loss 2.1719 LearningRate 0.0127 Epoch: 16 Global Step: 166830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:44:46,724-Speed 5278.74 samples/sec Loss 2.1850 LearningRate 0.0127 Epoch: 16 Global Step: 166840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:44:54,345-Speed 5375.63 samples/sec Loss 2.1559 LearningRate 0.0127 Epoch: 16 Global Step: 166850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:01,911-Speed 5414.55 samples/sec Loss 2.1680 LearningRate 0.0127 Epoch: 16 Global Step: 166860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:09,420-Speed 5455.30 samples/sec Loss 2.1920 LearningRate 0.0127 Epoch: 16 Global Step: 166870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:17,031-Speed 5382.44 samples/sec Loss 2.2027 LearningRate 0.0127 Epoch: 16 Global Step: 166880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:24,495-Speed 5488.07 samples/sec Loss 2.1799 LearningRate 0.0127 Epoch: 16 Global Step: 166890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:32,030-Speed 5437.15 samples/sec Loss 2.1393 LearningRate 0.0127 Epoch: 16 Global Step: 166900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:39,565-Speed 5436.31 samples/sec Loss 2.1786 LearningRate 0.0127 Epoch: 16 Global Step: 166910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:45:47,151-Speed 5399.93 samples/sec Loss 2.1896 LearningRate 0.0127 Epoch: 16 Global Step: 166920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 08:45:54,645-Speed 5466.60 samples/sec Loss 2.1532 LearningRate 0.0126 Epoch: 16 Global Step: 166930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:46:02,174-Speed 5440.86 samples/sec Loss 2.1582 LearningRate 0.0126 Epoch: 16 Global Step: 166940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:46:09,778-Speed 5387.68 samples/sec 
Loss 2.1727 LearningRate 0.0126 Epoch: 16 Global Step: 166950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:46:17,496-Speed 5307.47 samples/sec Loss 2.1619 LearningRate 0.0126 Epoch: 16 Global Step: 166960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:46:25,035-Speed 5434.03 samples/sec Loss 2.1603 LearningRate 0.0126 Epoch: 16 Global Step: 166970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:46:32,571-Speed 5436.12 samples/sec Loss 2.1952 LearningRate 0.0126 Epoch: 16 Global Step: 166980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:46:40,010-Speed 5506.08 samples/sec Loss 2.1642 LearningRate 0.0126 Epoch: 16 Global Step: 166990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:46:47,592-Speed 5403.23 samples/sec Loss 2.1727 LearningRate 0.0126 Epoch: 16 Global Step: 167000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:46:55,112-Speed 5447.85 samples/sec Loss 2.1717 LearningRate 0.0126 Epoch: 16 Global Step: 167010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:02,654-Speed 5431.25 samples/sec Loss 2.1570 LearningRate 0.0126 Epoch: 16 Global Step: 167020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:10,149-Speed 5465.48 samples/sec Loss 2.1536 LearningRate 0.0126 Epoch: 16 Global Step: 167030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:17,589-Speed 5506.14 samples/sec Loss 2.1629 LearningRate 0.0126 Epoch: 16 Global Step: 167040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:25,142-Speed 5424.10 samples/sec Loss 2.1527 LearningRate 0.0126 Epoch: 16 Global Step: 167050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:32,594-Speed 5497.42 samples/sec Loss 2.1930 LearningRate 0.0126 Epoch: 16 Global Step: 167060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:40,042-Speed 5500.29 samples/sec Loss 2.1404 LearningRate 0.0126 Epoch: 16 
Global Step: 167070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:47:47,576-Speed 5437.25 samples/sec Loss 2.1443 LearningRate 0.0126 Epoch: 16 Global Step: 167080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:47:55,062-Speed 5472.69 samples/sec Loss 2.1400 LearningRate 0.0125 Epoch: 16 Global Step: 167090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:02,542-Speed 5476.77 samples/sec Loss 2.1762 LearningRate 0.0125 Epoch: 16 Global Step: 167100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:10,159-Speed 5377.63 samples/sec Loss 2.1757 LearningRate 0.0125 Epoch: 16 Global Step: 167110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:17,740-Speed 5404.24 samples/sec Loss 2.1389 LearningRate 0.0125 Epoch: 16 Global Step: 167120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:25,189-Speed 5499.44 samples/sec Loss 2.1682 LearningRate 0.0125 Epoch: 16 Global Step: 167130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:32,623-Speed 5510.84 samples/sec Loss 2.1743 LearningRate 0.0125 Epoch: 16 Global Step: 167140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:40,093-Speed 5483.52 samples/sec Loss 2.1954 LearningRate 0.0125 Epoch: 16 Global Step: 167150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:47,528-Speed 5509.63 samples/sec Loss 2.1457 LearningRate 0.0125 Epoch: 16 Global Step: 167160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:48:55,108-Speed 5404.21 samples/sec Loss 2.1503 LearningRate 0.0125 Epoch: 16 Global Step: 167170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:49:02,577-Speed 5485.55 samples/sec Loss 2.1490 LearningRate 0.0125 Epoch: 16 Global Step: 167180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:49:10,040-Speed 5489.29 samples/sec Loss 2.1644 LearningRate 0.0125 Epoch: 16 Global Step: 167190 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-01-09 08:49:17,527-Speed 5471.30 samples/sec Loss 2.1432 LearningRate 0.0125 Epoch: 16 Global Step: 167200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:49:25,013-Speed 5471.94 samples/sec Loss 2.1652 LearningRate 0.0125 Epoch: 16 Global Step: 167210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:49:32,493-Speed 5476.82 samples/sec Loss 2.1568 LearningRate 0.0125 Epoch: 16 Global Step: 167220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:49:40,010-Speed 5450.16 samples/sec Loss 2.1705 LearningRate 0.0125 Epoch: 16 Global Step: 167230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:49:47,637-Speed 5370.56 samples/sec Loss 2.1639 LearningRate 0.0125 Epoch: 16 Global Step: 167240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:49:55,197-Speed 5418.62 samples/sec Loss 2.1369 LearningRate 0.0124 Epoch: 16 Global Step: 167250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:50:02,770-Speed 5409.94 samples/sec Loss 2.1526 LearningRate 0.0124 Epoch: 16 Global Step: 167260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:50:10,313-Speed 5431.24 samples/sec Loss 2.1364 LearningRate 0.0124 Epoch: 16 Global Step: 167270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-01-09 08:50:17,858-Speed 5428.77 samples/sec Loss 2.1515 LearningRate 0.0124 Epoch: 16 Global Step: 167280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:50:25,463-Speed 5387.30 samples/sec Loss 2.1340 LearningRate 0.0124 Epoch: 16 Global Step: 167290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:50:33,002-Speed 5433.59 samples/sec Loss 2.1249 LearningRate 0.0124 Epoch: 16 Global Step: 167300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:50:40,564-Speed 5418.00 samples/sec Loss 2.1590 LearningRate 0.0124 Epoch: 16 Global Step: 167310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
08:50:48,099-Speed 5435.84 samples/sec Loss 2.1069 LearningRate 0.0124 Epoch: 16 Global Step: 167320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:50:55,695-Speed 5393.08 samples/sec Loss 2.1301 LearningRate 0.0124 Epoch: 16 Global Step: 167330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:51:03,336-Speed 5361.48 samples/sec Loss 2.1427 LearningRate 0.0124 Epoch: 16 Global Step: 167340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:51:11,026-Speed 5327.27 samples/sec Loss 2.1443 LearningRate 0.0124 Epoch: 16 Global Step: 167350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:51:18,734-Speed 5314.08 samples/sec Loss 2.1044 LearningRate 0.0124 Epoch: 16 Global Step: 167360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:51:26,276-Speed 5431.72 samples/sec Loss 2.1529 LearningRate 0.0124 Epoch: 16 Global Step: 167370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:51:33,832-Speed 5421.35 samples/sec Loss 2.1764 LearningRate 0.0124 Epoch: 16 Global Step: 167380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:51:41,379-Speed 5428.60 samples/sec Loss 2.1470 LearningRate 0.0124 Epoch: 16 Global Step: 167390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:51:49,007-Speed 5370.28 samples/sec Loss 2.1048 LearningRate 0.0124 Epoch: 16 Global Step: 167400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:51:56,538-Speed 5439.67 samples/sec Loss 2.1191 LearningRate 0.0123 Epoch: 16 Global Step: 167410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:04,110-Speed 5410.32 samples/sec Loss 2.1176 LearningRate 0.0123 Epoch: 16 Global Step: 167420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:11,784-Speed 5338.42 samples/sec Loss 2.1698 LearningRate 0.0123 Epoch: 16 Global Step: 167430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:19,285-Speed 5460.99 samples/sec Loss 
2.1178 LearningRate 0.0123 Epoch: 16 Global Step: 167440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:26,821-Speed 5435.36 samples/sec Loss 2.1076 LearningRate 0.0123 Epoch: 16 Global Step: 167450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:34,277-Speed 5494.90 samples/sec Loss 2.1446 LearningRate 0.0123 Epoch: 16 Global Step: 167460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:41,750-Speed 5481.52 samples/sec Loss 2.1380 LearningRate 0.0123 Epoch: 16 Global Step: 167470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:49,308-Speed 5420.39 samples/sec Loss 2.1689 LearningRate 0.0123 Epoch: 16 Global Step: 167480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:52:56,808-Speed 5462.07 samples/sec Loss 2.1544 LearningRate 0.0123 Epoch: 16 Global Step: 167490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:53:04,339-Speed 5439.13 samples/sec Loss 2.1096 LearningRate 0.0123 Epoch: 16 Global Step: 167500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:53:11,790-Speed 5497.93 samples/sec Loss 2.1739 LearningRate 0.0123 Epoch: 16 Global Step: 167510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:53:19,232-Speed 5504.74 samples/sec Loss 2.1415 LearningRate 0.0123 Epoch: 16 Global Step: 167520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:53:26,747-Speed 5451.26 samples/sec Loss 2.1296 LearningRate 0.0123 Epoch: 16 Global Step: 167530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:53:34,203-Speed 5494.16 samples/sec Loss 2.1598 LearningRate 0.0123 Epoch: 16 Global Step: 167540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:53:41,721-Speed 5449.18 samples/sec Loss 2.1430 LearningRate 0.0123 Epoch: 16 Global Step: 167550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:53:49,411-Speed 5326.89 samples/sec Loss 2.1760 LearningRate 0.0123 Epoch: 16 Global 
Step: 167560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:53:56,868-Speed 5493.61 samples/sec Loss 2.1304 LearningRate 0.0122 Epoch: 16 Global Step: 167570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:04,470-Speed 5388.63 samples/sec Loss 2.1295 LearningRate 0.0122 Epoch: 16 Global Step: 167580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:11,955-Speed 5473.36 samples/sec Loss 2.1257 LearningRate 0.0122 Epoch: 16 Global Step: 167590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:19,496-Speed 5432.28 samples/sec Loss 2.1241 LearningRate 0.0122 Epoch: 16 Global Step: 167600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:27,331-Speed 5228.21 samples/sec Loss 2.0967 LearningRate 0.0122 Epoch: 16 Global Step: 167610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:34,988-Speed 5350.49 samples/sec Loss 2.1481 LearningRate 0.0122 Epoch: 16 Global Step: 167620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:42,536-Speed 5426.92 samples/sec Loss 2.1614 LearningRate 0.0122 Epoch: 16 Global Step: 167630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:54:50,012-Speed 5479.25 samples/sec Loss 2.1335 LearningRate 0.0122 Epoch: 16 Global Step: 167640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:54:57,670-Speed 5349.34 samples/sec Loss 2.1406 LearningRate 0.0122 Epoch: 16 Global Step: 167650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:55:05,173-Speed 5460.18 samples/sec Loss 2.1368 LearningRate 0.0122 Epoch: 16 Global Step: 167660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:55:12,718-Speed 5429.28 samples/sec Loss 2.1504 LearningRate 0.0122 Epoch: 16 Global Step: 167670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:55:20,154-Speed 5509.32 samples/sec Loss 2.1290 LearningRate 0.0122 Epoch: 16 Global Step: 167680 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-01-09 08:55:27,657-Speed 5460.11 samples/sec Loss 2.1467 LearningRate 0.0122 Epoch: 16 Global Step: 167690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:55:35,262-Speed 5386.05 samples/sec Loss 2.1165 LearningRate 0.0122 Epoch: 16 Global Step: 167700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:55:42,876-Speed 5380.72 samples/sec Loss 2.1296 LearningRate 0.0122 Epoch: 16 Global Step: 167710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:55:50,504-Speed 5370.72 samples/sec Loss 2.1181 LearningRate 0.0122 Epoch: 16 Global Step: 167720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:55:58,087-Speed 5401.81 samples/sec Loss 2.1338 LearningRate 0.0122 Epoch: 16 Global Step: 167730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:56:05,573-Speed 5472.69 samples/sec Loss 2.1188 LearningRate 0.0121 Epoch: 16 Global Step: 167740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:56:12,972-Speed 5536.79 samples/sec Loss 2.1342 LearningRate 0.0121 Epoch: 16 Global Step: 167750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:56:20,407-Speed 5509.32 samples/sec Loss 2.1225 LearningRate 0.0121 Epoch: 16 Global Step: 167760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:56:27,952-Speed 5430.05 samples/sec Loss 2.1402 LearningRate 0.0121 Epoch: 16 Global Step: 167770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:56:35,478-Speed 5442.82 samples/sec Loss 2.1125 LearningRate 0.0121 Epoch: 16 Global Step: 167780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:56:43,060-Speed 5403.13 samples/sec Loss 2.1285 LearningRate 0.0121 Epoch: 16 Global Step: 167790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:56:50,643-Speed 5402.04 samples/sec Loss 2.1120 LearningRate 0.0121 Epoch: 16 Global Step: 167800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
08:56:58,200-Speed 5421.46 samples/sec Loss 2.0975 LearningRate 0.0121 Epoch: 16 Global Step: 167810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:05,748-Speed 5426.77 samples/sec Loss 2.1372 LearningRate 0.0121 Epoch: 16 Global Step: 167820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:13,262-Speed 5452.20 samples/sec Loss 2.1145 LearningRate 0.0121 Epoch: 16 Global Step: 167830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:20,796-Speed 5437.68 samples/sec Loss 2.1185 LearningRate 0.0121 Epoch: 16 Global Step: 167840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:28,478-Speed 5332.93 samples/sec Loss 2.1176 LearningRate 0.0121 Epoch: 16 Global Step: 167850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:36,009-Speed 5439.64 samples/sec Loss 2.0911 LearningRate 0.0121 Epoch: 16 Global Step: 167860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:43,575-Speed 5414.58 samples/sec Loss 2.1158 LearningRate 0.0121 Epoch: 16 Global Step: 167870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:51,100-Speed 5443.55 samples/sec Loss 2.1276 LearningRate 0.0121 Epoch: 16 Global Step: 167880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:57:58,566-Speed 5487.46 samples/sec Loss 2.0963 LearningRate 0.0121 Epoch: 16 Global Step: 167890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:58:06,054-Speed 5470.40 samples/sec Loss 2.1436 LearningRate 0.0120 Epoch: 16 Global Step: 167900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:58:13,642-Speed 5399.35 samples/sec Loss 2.0828 LearningRate 0.0120 Epoch: 16 Global Step: 167910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:58:21,142-Speed 5462.10 samples/sec Loss 2.1309 LearningRate 0.0120 Epoch: 16 Global Step: 167920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:58:28,732-Speed 5397.26 samples/sec Loss 
2.1153 LearningRate 0.0120 Epoch: 16 Global Step: 167930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:58:36,182-Speed 5498.23 samples/sec Loss 2.1227 LearningRate 0.0120 Epoch: 16 Global Step: 167940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 08:58:43,697-Speed 5451.65 samples/sec Loss 2.1229 LearningRate 0.0120 Epoch: 16 Global Step: 167950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:58:51,191-Speed 5466.43 samples/sec Loss 2.1044 LearningRate 0.0120 Epoch: 16 Global Step: 167960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:58:58,731-Speed 5432.75 samples/sec Loss 2.0950 LearningRate 0.0120 Epoch: 16 Global Step: 167970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:59:06,259-Speed 5441.31 samples/sec Loss 2.1492 LearningRate 0.0120 Epoch: 16 Global Step: 167980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:59:13,769-Speed 5455.55 samples/sec Loss 2.1429 LearningRate 0.0120 Epoch: 16 Global Step: 167990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 08:59:21,384-Speed 5379.26 samples/sec Loss 2.0970 LearningRate 0.0120 Epoch: 16 Global Step: 168000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:00:05,742-[lfw][168000]XNorm: 23.593189 Training: 2022-01-09 09:00:05,743-[lfw][168000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 09:00:05,743-[lfw][168000]Accuracy-Highest: 0.99833 Training: 2022-01-09 09:00:57,019-[cfp_fp][168000]XNorm: 22.541390 Training: 2022-01-09 09:00:57,020-[cfp_fp][168000]Accuracy-Flip: 0.99271+-0.00391 Training: 2022-01-09 09:00:57,020-[cfp_fp][168000]Accuracy-Highest: 0.99371 Training: 2022-01-09 09:01:40,995-[agedb_30][168000]XNorm: 23.996785 Training: 2022-01-09 09:01:40,995-[agedb_30][168000]Accuracy-Flip: 0.98217+-0.00803 Training: 2022-01-09 09:01:40,996-[agedb_30][168000]Accuracy-Highest: 0.98333 Training: 2022-01-09 09:01:48,657-Speed 278.13 samples/sec Loss 2.1274 LearningRate 
0.0120 Epoch: 16 Global Step: 168010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:01:56,236-Speed 5404.91 samples/sec Loss 2.1151 LearningRate 0.0120 Epoch: 16 Global Step: 168020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:02:03,732-Speed 5464.74 samples/sec Loss 2.0981 LearningRate 0.0120 Epoch: 16 Global Step: 168030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:02:11,201-Speed 5484.96 samples/sec Loss 2.0805 LearningRate 0.0120 Epoch: 16 Global Step: 168040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:02:18,693-Speed 5467.65 samples/sec Loss 2.1023 LearningRate 0.0120 Epoch: 16 Global Step: 168050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:02:26,293-Speed 5390.47 samples/sec Loss 2.1019 LearningRate 0.0119 Epoch: 16 Global Step: 168060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:02:33,783-Speed 5469.24 samples/sec Loss 2.1162 LearningRate 0.0119 Epoch: 16 Global Step: 168070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:02:41,258-Speed 5480.32 samples/sec Loss 2.1127 LearningRate 0.0119 Epoch: 16 Global Step: 168080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:02:48,794-Speed 5436.26 samples/sec Loss 2.0928 LearningRate 0.0119 Epoch: 16 Global Step: 168090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:02:56,301-Speed 5456.69 samples/sec Loss 2.1078 LearningRate 0.0119 Epoch: 16 Global Step: 168100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:03:03,983-Speed 5332.83 samples/sec Loss 2.1430 LearningRate 0.0119 Epoch: 16 Global Step: 168110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:03:11,473-Speed 5469.31 samples/sec Loss 2.1102 LearningRate 0.0119 Epoch: 16 Global Step: 168120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:03:19,011-Speed 5434.15 samples/sec Loss 2.1275 LearningRate 0.0119 Epoch: 16 Global Step: 168130 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:03:26,552-Speed 5432.45 samples/sec Loss 2.1056 LearningRate 0.0119 Epoch: 16 Global Step: 168140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:03:34,095-Speed 5431.13 samples/sec Loss 2.1403 LearningRate 0.0119 Epoch: 16 Global Step: 168150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:03:41,681-Speed 5400.08 samples/sec Loss 2.0743 LearningRate 0.0119 Epoch: 16 Global Step: 168160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:03:49,168-Speed 5470.98 samples/sec Loss 2.1284 LearningRate 0.0119 Epoch: 16 Global Step: 168170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:03:56,726-Speed 5420.56 samples/sec Loss 2.0917 LearningRate 0.0119 Epoch: 16 Global Step: 168180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:04:04,364-Speed 5363.19 samples/sec Loss 2.1063 LearningRate 0.0119 Epoch: 16 Global Step: 168190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:04:11,935-Speed 5411.28 samples/sec Loss 2.0988 LearningRate 0.0119 Epoch: 16 Global Step: 168200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:04:19,540-Speed 5385.95 samples/sec Loss 2.1092 LearningRate 0.0119 Epoch: 16 Global Step: 168210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:04:27,064-Speed 5444.81 samples/sec Loss 2.0712 LearningRate 0.0119 Epoch: 16 Global Step: 168220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:04:34,627-Speed 5416.34 samples/sec Loss 2.1121 LearningRate 0.0118 Epoch: 16 Global Step: 168230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:04:42,133-Speed 5458.27 samples/sec Loss 2.1064 LearningRate 0.0118 Epoch: 16 Global Step: 168240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:04:49,598-Speed 5487.23 samples/sec Loss 2.1175 LearningRate 0.0118 Epoch: 16 Global Step: 168250 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-09 09:04:57,096-Speed 5463.30 samples/sec Loss 2.0790 LearningRate 0.0118 Epoch: 16 Global Step: 168260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:05:04,599-Speed 5460.33 samples/sec Loss 2.1041 LearningRate 0.0118 Epoch: 16 Global Step: 168270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:05:11,993-Speed 5540.49 samples/sec Loss 2.0947 LearningRate 0.0118 Epoch: 16 Global Step: 168280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:05:19,452-Speed 5491.91 samples/sec Loss 2.1124 LearningRate 0.0118 Epoch: 16 Global Step: 168290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:05:27,082-Speed 5369.25 samples/sec Loss 2.1021 LearningRate 0.0118 Epoch: 16 Global Step: 168300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:05:34,524-Speed 5504.77 samples/sec Loss 2.1137 LearningRate 0.0118 Epoch: 16 Global Step: 168310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:05:42,107-Speed 5401.90 samples/sec Loss 2.1359 LearningRate 0.0118 Epoch: 16 Global Step: 168320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:05:49,599-Speed 5467.93 samples/sec Loss 2.0650 LearningRate 0.0118 Epoch: 16 Global Step: 168330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:05:57,155-Speed 5421.31 samples/sec Loss 2.0514 LearningRate 0.0118 Epoch: 16 Global Step: 168340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:04,743-Speed 5399.65 samples/sec Loss 2.1059 LearningRate 0.0118 Epoch: 16 Global Step: 168350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:12,406-Speed 5346.06 samples/sec Loss 2.0999 LearningRate 0.0118 Epoch: 16 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:19,943-Speed 5434.66 samples/sec Loss 2.1000 LearningRate 0.0118 Epoch: 16 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:27,507-Speed 
5416.36 samples/sec Loss 2.1240 LearningRate 0.0118 Epoch: 16 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:34,931-Speed 5517.82 samples/sec Loss 2.0879 LearningRate 0.0118 Epoch: 16 Global Step: 168390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:06:42,366-Speed 5509.84 samples/sec Loss 2.0952 LearningRate 0.0117 Epoch: 16 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:49,916-Speed 5426.36 samples/sec Loss 2.0971 LearningRate 0.0117 Epoch: 16 Global Step: 168410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:06:57,347-Speed 5512.55 samples/sec Loss 2.1009 LearningRate 0.0117 Epoch: 16 Global Step: 168420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:04,767-Speed 5520.64 samples/sec Loss 2.0818 LearningRate 0.0117 Epoch: 16 Global Step: 168430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:12,166-Speed 5536.16 samples/sec Loss 2.0974 LearningRate 0.0117 Epoch: 16 Global Step: 168440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:19,678-Speed 5453.78 samples/sec Loss 2.1236 LearningRate 0.0117 Epoch: 16 Global Step: 168450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:27,109-Speed 5513.03 samples/sec Loss 2.1292 LearningRate 0.0117 Epoch: 16 Global Step: 168460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:34,549-Speed 5505.68 samples/sec Loss 2.0606 LearningRate 0.0117 Epoch: 16 Global Step: 168470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:42,038-Speed 5469.99 samples/sec Loss 2.1088 LearningRate 0.0117 Epoch: 16 Global Step: 168480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:49,605-Speed 5414.23 samples/sec Loss 2.1005 LearningRate 0.0117 Epoch: 16 Global Step: 168490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:07:57,044-Speed 5506.89 samples/sec Loss 2.0795 
LearningRate 0.0117 Epoch: 16 Global Step: 168500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:04,621-Speed 5406.37 samples/sec Loss 2.0786 LearningRate 0.0117 Epoch: 16 Global Step: 168510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:12,114-Speed 5466.92 samples/sec Loss 2.0911 LearningRate 0.0117 Epoch: 16 Global Step: 168520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:19,577-Speed 5489.36 samples/sec Loss 2.0754 LearningRate 0.0117 Epoch: 16 Global Step: 168530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:27,049-Speed 5482.89 samples/sec Loss 2.0998 LearningRate 0.0117 Epoch: 16 Global Step: 168540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:34,557-Speed 5457.34 samples/sec Loss 2.0709 LearningRate 0.0117 Epoch: 16 Global Step: 168550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:42,027-Speed 5483.73 samples/sec Loss 2.0557 LearningRate 0.0116 Epoch: 16 Global Step: 168560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:49,461-Speed 5510.57 samples/sec Loss 2.0803 LearningRate 0.0116 Epoch: 16 Global Step: 168570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:08:56,903-Speed 5504.41 samples/sec Loss 2.0591 LearningRate 0.0116 Epoch: 16 Global Step: 168580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:09:04,411-Speed 5456.74 samples/sec Loss 2.0815 LearningRate 0.0116 Epoch: 16 Global Step: 168590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:09:11,969-Speed 5420.37 samples/sec Loss 2.0624 LearningRate 0.0116 Epoch: 16 Global Step: 168600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:09:19,598-Speed 5369.86 samples/sec Loss 2.0946 LearningRate 0.0116 Epoch: 16 Global Step: 168610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:09:27,131-Speed 5437.88 samples/sec Loss 2.0616 LearningRate 0.0116 Epoch: 16 Global Step: 
168620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:09:34,701-Speed 5411.76 samples/sec Loss 2.0718 LearningRate 0.0116 Epoch: 16 Global Step: 168630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:09:42,202-Speed 5461.36 samples/sec Loss 2.0682 LearningRate 0.0116 Epoch: 16 Global Step: 168640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:09:49,741-Speed 5434.08 samples/sec Loss 2.0669 LearningRate 0.0116 Epoch: 16 Global Step: 168650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:09:57,222-Speed 5475.09 samples/sec Loss 2.0598 LearningRate 0.0116 Epoch: 16 Global Step: 168660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:04,829-Speed 5385.84 samples/sec Loss 2.1066 LearningRate 0.0116 Epoch: 16 Global Step: 168670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:12,365-Speed 5435.49 samples/sec Loss 2.1029 LearningRate 0.0116 Epoch: 16 Global Step: 168680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:19,857-Speed 5467.85 samples/sec Loss 2.1125 LearningRate 0.0116 Epoch: 16 Global Step: 168690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:27,417-Speed 5419.04 samples/sec Loss 2.0947 LearningRate 0.0116 Epoch: 16 Global Step: 168700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:34,906-Speed 5469.34 samples/sec Loss 2.0877 LearningRate 0.0116 Epoch: 16 Global Step: 168710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:42,442-Speed 5436.01 samples/sec Loss 2.1104 LearningRate 0.0116 Epoch: 16 Global Step: 168720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:49,953-Speed 5454.65 samples/sec Loss 2.0695 LearningRate 0.0115 Epoch: 16 Global Step: 168730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:10:57,466-Speed 5452.18 samples/sec Loss 2.0608 LearningRate 0.0115 Epoch: 16 Global Step: 168740 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-09 09:11:05,036-Speed 5411.53 samples/sec Loss 2.0798 LearningRate 0.0115 Epoch: 16 Global Step: 168750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:11:12,558-Speed 5445.49 samples/sec Loss 2.1203 LearningRate 0.0115 Epoch: 16 Global Step: 168760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:11:20,057-Speed 5463.71 samples/sec Loss 2.0469 LearningRate 0.0115 Epoch: 16 Global Step: 168770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:11:27,566-Speed 5455.59 samples/sec Loss 2.0825 LearningRate 0.0115 Epoch: 16 Global Step: 168780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:11:41,286-Speed 2985.47 samples/sec Loss 2.1013 LearningRate 0.0115 Epoch: 16 Global Step: 168790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:11:48,870-Speed 5402.05 samples/sec Loss 2.0613 LearningRate 0.0115 Epoch: 16 Global Step: 168800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:11:56,434-Speed 5416.17 samples/sec Loss 2.0983 LearningRate 0.0115 Epoch: 16 Global Step: 168810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:03,988-Speed 5422.73 samples/sec Loss 2.0535 LearningRate 0.0115 Epoch: 16 Global Step: 168820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:11,518-Speed 5440.01 samples/sec Loss 2.0435 LearningRate 0.0115 Epoch: 16 Global Step: 168830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:19,055-Speed 5434.96 samples/sec Loss 2.0975 LearningRate 0.0115 Epoch: 16 Global Step: 168840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:26,575-Speed 5448.00 samples/sec Loss 2.0696 LearningRate 0.0115 Epoch: 16 Global Step: 168850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:34,099-Speed 5444.83 samples/sec Loss 2.0743 LearningRate 0.0115 Epoch: 16 Global Step: 168860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
09:12:41,648-Speed 5426.58 samples/sec Loss 2.0846 LearningRate 0.0115 Epoch: 16 Global Step: 168870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:49,212-Speed 5415.73 samples/sec Loss 2.0522 LearningRate 0.0115 Epoch: 16 Global Step: 168880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:12:56,703-Speed 5469.23 samples/sec Loss 2.0845 LearningRate 0.0115 Epoch: 16 Global Step: 168890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:13:04,234-Speed 5438.83 samples/sec Loss 2.0800 LearningRate 0.0114 Epoch: 16 Global Step: 168900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:11,808-Speed 5408.87 samples/sec Loss 2.1041 LearningRate 0.0114 Epoch: 16 Global Step: 168910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:19,249-Speed 5505.37 samples/sec Loss 2.0527 LearningRate 0.0114 Epoch: 16 Global Step: 168920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:26,830-Speed 5403.69 samples/sec Loss 2.0712 LearningRate 0.0114 Epoch: 16 Global Step: 168930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:34,488-Speed 5349.12 samples/sec Loss 2.0201 LearningRate 0.0114 Epoch: 16 Global Step: 168940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:42,057-Speed 5412.92 samples/sec Loss 2.0603 LearningRate 0.0114 Epoch: 16 Global Step: 168950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:49,676-Speed 5376.17 samples/sec Loss 2.0734 LearningRate 0.0114 Epoch: 16 Global Step: 168960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:13:57,391-Speed 5310.10 samples/sec Loss 2.0802 LearningRate 0.0114 Epoch: 16 Global Step: 168970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:05,131-Speed 5292.65 samples/sec Loss 2.1039 LearningRate 0.0114 Epoch: 16 Global Step: 168980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:12,893-Speed 5277.49 samples/sec Loss 
2.0814 LearningRate 0.0114 Epoch: 16 Global Step: 168990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:20,417-Speed 5444.50 samples/sec Loss 2.0562 LearningRate 0.0114 Epoch: 16 Global Step: 169000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 09:14:27,909-Speed 5467.79 samples/sec Loss 2.0768 LearningRate 0.0114 Epoch: 16 Global Step: 169010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:35,365-Speed 5494.63 samples/sec Loss 2.0718 LearningRate 0.0114 Epoch: 16 Global Step: 169020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:42,893-Speed 5441.73 samples/sec Loss 2.1098 LearningRate 0.0114 Epoch: 16 Global Step: 169030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:50,373-Speed 5476.11 samples/sec Loss 2.0412 LearningRate 0.0114 Epoch: 16 Global Step: 169040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:14:57,800-Speed 5516.65 samples/sec Loss 2.0956 LearningRate 0.0114 Epoch: 16 Global Step: 169050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:05,279-Speed 5477.16 samples/sec Loss 2.0740 LearningRate 0.0113 Epoch: 16 Global Step: 169060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:12,842-Speed 5416.50 samples/sec Loss 2.0730 LearningRate 0.0113 Epoch: 16 Global Step: 169070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:20,317-Speed 5479.91 samples/sec Loss 2.0551 LearningRate 0.0113 Epoch: 16 Global Step: 169080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:27,755-Speed 5507.90 samples/sec Loss 2.1004 LearningRate 0.0113 Epoch: 16 Global Step: 169090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:35,204-Speed 5499.23 samples/sec Loss 2.0587 LearningRate 0.0113 Epoch: 16 Global Step: 169100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:42,626-Speed 5519.25 samples/sec Loss 2.0784 LearningRate 0.0113 Epoch: 16 
Global Step: 169110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:15:50,033-Speed 5530.66 samples/sec Loss 2.0575 LearningRate 0.0113 Epoch: 16 Global Step: 169120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:15:57,678-Speed 5359.08 samples/sec Loss 2.0589 LearningRate 0.0113 Epoch: 16 Global Step: 169130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:05,294-Speed 5379.03 samples/sec Loss 2.0758 LearningRate 0.0113 Epoch: 16 Global Step: 169140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:12,822-Speed 5441.40 samples/sec Loss 2.0750 LearningRate 0.0113 Epoch: 16 Global Step: 169150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:20,328-Speed 5457.28 samples/sec Loss 2.0682 LearningRate 0.0113 Epoch: 16 Global Step: 169160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:27,791-Speed 5489.82 samples/sec Loss 2.0553 LearningRate 0.0113 Epoch: 16 Global Step: 169170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:35,290-Speed 5462.59 samples/sec Loss 2.0671 LearningRate 0.0113 Epoch: 16 Global Step: 169180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:42,793-Speed 5459.41 samples/sec Loss 2.0318 LearningRate 0.0113 Epoch: 16 Global Step: 169190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:50,382-Speed 5397.99 samples/sec Loss 2.0563 LearningRate 0.0113 Epoch: 16 Global Step: 169200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:16:57,828-Speed 5502.24 samples/sec Loss 2.0141 LearningRate 0.0113 Epoch: 16 Global Step: 169210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:17:05,419-Speed 5396.70 samples/sec Loss 2.0384 LearningRate 0.0113 Epoch: 16 Global Step: 169220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:17:12,850-Speed 5512.90 samples/sec Loss 2.0441 LearningRate 0.0112 Epoch: 16 Global Step: 169230 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-01-09 09:17:20,397-Speed 5427.29 samples/sec Loss 2.0759 LearningRate 0.0112 Epoch: 16 Global Step: 169240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:17:27,825-Speed 5515.35 samples/sec Loss 2.0586 LearningRate 0.0112 Epoch: 16 Global Step: 169250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:17:35,280-Speed 5495.46 samples/sec Loss 2.0337 LearningRate 0.0112 Epoch: 16 Global Step: 169260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:17:42,851-Speed 5410.87 samples/sec Loss 2.0712 LearningRate 0.0112 Epoch: 16 Global Step: 169270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:17:50,485-Speed 5366.06 samples/sec Loss 2.0381 LearningRate 0.0112 Epoch: 16 Global Step: 169280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:17:58,347-Speed 5210.70 samples/sec Loss 2.0413 LearningRate 0.0112 Epoch: 16 Global Step: 169290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:18:05,943-Speed 5393.00 samples/sec Loss 2.0265 LearningRate 0.0112 Epoch: 16 Global Step: 169300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:18:13,538-Speed 5393.94 samples/sec Loss 2.0329 LearningRate 0.0112 Epoch: 16 Global Step: 169310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:18:21,027-Speed 5470.18 samples/sec Loss 2.0626 LearningRate 0.0112 Epoch: 16 Global Step: 169320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:18:28,665-Speed 5363.18 samples/sec Loss 2.0117 LearningRate 0.0112 Epoch: 16 Global Step: 169330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:18:36,197-Speed 5438.40 samples/sec Loss 2.0525 LearningRate 0.0112 Epoch: 16 Global Step: 169340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:18:43,771-Speed 5408.69 samples/sec Loss 2.0268 LearningRate 0.0112 Epoch: 16 Global Step: 169350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
09:18:51,259-Speed 5470.84 samples/sec Loss 2.0279 LearningRate 0.0112 Epoch: 16 Global Step: 169360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:18:58,698-Speed 5506.83 samples/sec Loss 2.0279 LearningRate 0.0112 Epoch: 16 Global Step: 169370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:19:06,165-Speed 5486.45 samples/sec Loss 2.0162 LearningRate 0.0112 Epoch: 16 Global Step: 169380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:19:13,666-Speed 5461.06 samples/sec Loss 2.0504 LearningRate 0.0112 Epoch: 16 Global Step: 169390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:19:21,122-Speed 5494.41 samples/sec Loss 2.0830 LearningRate 0.0111 Epoch: 16 Global Step: 169400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:19:28,748-Speed 5372.08 samples/sec Loss 2.0226 LearningRate 0.0111 Epoch: 16 Global Step: 169410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:19:36,167-Speed 5521.40 samples/sec Loss 1.9912 LearningRate 0.0111 Epoch: 16 Global Step: 169420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:19:43,707-Speed 5433.06 samples/sec Loss 2.0554 LearningRate 0.0111 Epoch: 16 Global Step: 169430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:19:51,229-Speed 5446.04 samples/sec Loss 2.0747 LearningRate 0.0111 Epoch: 16 Global Step: 169440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:19:58,682-Speed 5496.16 samples/sec Loss 2.0188 LearningRate 0.0111 Epoch: 16 Global Step: 169450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:20:06,132-Speed 5498.87 samples/sec Loss 2.0491 LearningRate 0.0111 Epoch: 16 Global Step: 169460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:20:13,669-Speed 5435.44 samples/sec Loss 2.0543 LearningRate 0.0111 Epoch: 16 Global Step: 169470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:20:21,145-Speed 5479.30 samples/sec Loss 
2.0226 LearningRate 0.0111 Epoch: 16 Global Step: 169480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:20:28,608-Speed 5489.18 samples/sec Loss 2.0903 LearningRate 0.0111 Epoch: 16 Global Step: 169490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:20:36,129-Speed 5446.79 samples/sec Loss 2.0478 LearningRate 0.0111 Epoch: 16 Global Step: 169500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:20:43,726-Speed 5392.63 samples/sec Loss 2.0416 LearningRate 0.0111 Epoch: 16 Global Step: 169510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:20:51,418-Speed 5325.51 samples/sec Loss 2.0429 LearningRate 0.0111 Epoch: 16 Global Step: 169520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:20:58,883-Speed 5487.87 samples/sec Loss 2.0484 LearningRate 0.0111 Epoch: 16 Global Step: 169530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:21:06,314-Speed 5518.38 samples/sec Loss 2.0011 LearningRate 0.0111 Epoch: 16 Global Step: 169540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:21:13,770-Speed 5494.31 samples/sec Loss 2.0429 LearningRate 0.0111 Epoch: 16 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:21:21,294-Speed 5444.39 samples/sec Loss 2.0080 LearningRate 0.0111 Epoch: 16 Global Step: 169560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:21:28,786-Speed 5468.48 samples/sec Loss 2.0490 LearningRate 0.0110 Epoch: 16 Global Step: 169570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:21:36,214-Speed 5515.00 samples/sec Loss 2.0493 LearningRate 0.0110 Epoch: 16 Global Step: 169580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:21:43,677-Speed 5488.95 samples/sec Loss 2.0369 LearningRate 0.0110 Epoch: 16 Global Step: 169590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:21:51,145-Speed 5485.92 samples/sec Loss 2.0699 LearningRate 0.0110 Epoch: 16 Global 
Step: 169600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:21:58,702-Speed 5420.64 samples/sec Loss 2.0364 LearningRate 0.0110 Epoch: 16 Global Step: 169610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:06,242-Speed 5432.95 samples/sec Loss 2.0374 LearningRate 0.0110 Epoch: 16 Global Step: 169620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:13,663-Speed 5521.00 samples/sec Loss 2.0392 LearningRate 0.0110 Epoch: 16 Global Step: 169630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:21,257-Speed 5394.44 samples/sec Loss 2.0241 LearningRate 0.0110 Epoch: 16 Global Step: 169640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:28,810-Speed 5423.95 samples/sec Loss 2.0409 LearningRate 0.0110 Epoch: 16 Global Step: 169650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:36,406-Speed 5392.61 samples/sec Loss 2.0178 LearningRate 0.0110 Epoch: 16 Global Step: 169660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:43,994-Speed 5398.58 samples/sec Loss 1.9938 LearningRate 0.0110 Epoch: 16 Global Step: 169670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-09 09:22:51,518-Speed 5444.58 samples/sec Loss 2.0338 LearningRate 0.0110 Epoch: 16 Global Step: 169680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:22:59,064-Speed 5429.28 samples/sec Loss 2.0461 LearningRate 0.0110 Epoch: 16 Global Step: 169690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:06,670-Speed 5385.85 samples/sec Loss 2.0566 LearningRate 0.0110 Epoch: 16 Global Step: 169700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:14,116-Speed 5501.39 samples/sec Loss 2.0576 LearningRate 0.0110 Epoch: 16 Global Step: 169710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:21,554-Speed 5507.76 samples/sec Loss 2.0471 LearningRate 0.0110 Epoch: 16 Global Step: 169720 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-01-09 09:23:29,122-Speed 5413.32 samples/sec Loss 1.9983 LearningRate 0.0110 Epoch: 16 Global Step: 169730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:36,564-Speed 5504.59 samples/sec Loss 2.0206 LearningRate 0.0110 Epoch: 16 Global Step: 169740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:44,055-Speed 5468.84 samples/sec Loss 1.9975 LearningRate 0.0109 Epoch: 16 Global Step: 169750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:51,544-Speed 5470.05 samples/sec Loss 1.9987 LearningRate 0.0109 Epoch: 16 Global Step: 169760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:23:59,063-Speed 5448.72 samples/sec Loss 2.0237 LearningRate 0.0109 Epoch: 16 Global Step: 169770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:24:06,571-Speed 5455.82 samples/sec Loss 2.0404 LearningRate 0.0109 Epoch: 16 Global Step: 169780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:24:14,064-Speed 5467.35 samples/sec Loss 2.0490 LearningRate 0.0109 Epoch: 16 Global Step: 169790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:24:21,580-Speed 5450.32 samples/sec Loss 2.0457 LearningRate 0.0109 Epoch: 16 Global Step: 169800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:24:29,056-Speed 5480.44 samples/sec Loss 1.9761 LearningRate 0.0109 Epoch: 16 Global Step: 169810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:24:36,608-Speed 5424.19 samples/sec Loss 2.0341 LearningRate 0.0109 Epoch: 16 Global Step: 169820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:24:44,161-Speed 5423.52 samples/sec Loss 2.0307 LearningRate 0.0109 Epoch: 16 Global Step: 169830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:24:51,831-Speed 5340.51 samples/sec Loss 2.0383 LearningRate 0.0109 Epoch: 16 Global Step: 169840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 
09:24:59,329-Speed 5463.90 samples/sec Loss 2.0333 LearningRate 0.0109 Epoch: 16 Global Step: 169850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:06,831-Speed 5460.88 samples/sec Loss 1.9931 LearningRate 0.0109 Epoch: 16 Global Step: 169860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:14,303-Speed 5482.24 samples/sec Loss 2.0122 LearningRate 0.0109 Epoch: 16 Global Step: 169870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:21,849-Speed 5428.50 samples/sec Loss 2.0106 LearningRate 0.0109 Epoch: 16 Global Step: 169880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:29,352-Speed 5460.41 samples/sec Loss 2.0376 LearningRate 0.0109 Epoch: 16 Global Step: 169890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:36,955-Speed 5388.30 samples/sec Loss 2.0324 LearningRate 0.0109 Epoch: 16 Global Step: 169900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:44,495-Speed 5433.00 samples/sec Loss 2.0201 LearningRate 0.0109 Epoch: 16 Global Step: 169910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:25:51,948-Speed 5496.36 samples/sec Loss 2.0014 LearningRate 0.0108 Epoch: 16 Global Step: 169920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:25:59,498-Speed 5425.99 samples/sec Loss 2.0113 LearningRate 0.0108 Epoch: 16 Global Step: 169930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:26:06,990-Speed 5468.19 samples/sec Loss 2.0069 LearningRate 0.0108 Epoch: 16 Global Step: 169940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:26:14,490-Speed 5461.82 samples/sec Loss 2.0007 LearningRate 0.0108 Epoch: 16 Global Step: 169950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:26:21,908-Speed 5522.24 samples/sec Loss 2.0230 LearningRate 0.0108 Epoch: 16 Global Step: 169960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:26:29,470-Speed 5417.34 samples/sec Loss 
2.0136 LearningRate 0.0108 Epoch: 16 Global Step: 169970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:26:36,997-Speed 5442.77 samples/sec Loss 2.0244 LearningRate 0.0108 Epoch: 16 Global Step: 169980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:26:44,559-Speed 5416.65 samples/sec Loss 2.0390 LearningRate 0.0108 Epoch: 16 Global Step: 169990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:26:52,006-Speed 5500.92 samples/sec Loss 2.0178 LearningRate 0.0108 Epoch: 16 Global Step: 170000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:27:35,747-[lfw][170000]XNorm: 22.682219 Training: 2022-01-09 09:27:35,747-[lfw][170000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 09:27:35,748-[lfw][170000]Accuracy-Highest: 0.99833 Training: 2022-01-09 09:28:26,944-[cfp_fp][170000]XNorm: 21.675479 Training: 2022-01-09 09:28:26,945-[cfp_fp][170000]Accuracy-Flip: 0.99257+-0.00360 Training: 2022-01-09 09:28:26,945-[cfp_fp][170000]Accuracy-Highest: 0.99371 Training: 2022-01-09 09:29:10,895-[agedb_30][170000]XNorm: 23.066802 Training: 2022-01-09 09:29:10,896-[agedb_30][170000]Accuracy-Flip: 0.98117+-0.00663 Training: 2022-01-09 09:29:10,897-[agedb_30][170000]Accuracy-Highest: 0.98333 Training: 2022-01-09 09:29:18,498-Speed 279.61 samples/sec Loss 2.0017 LearningRate 0.0108 Epoch: 16 Global Step: 170010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:29:25,968-Speed 5484.20 samples/sec Loss 2.0038 LearningRate 0.0108 Epoch: 16 Global Step: 170020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:29:33,494-Speed 5443.54 samples/sec Loss 2.0244 LearningRate 0.0108 Epoch: 16 Global Step: 170030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:29:40,973-Speed 5477.50 samples/sec Loss 2.0279 LearningRate 0.0108 Epoch: 16 Global Step: 170040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:29:48,496-Speed 5445.08 samples/sec Loss 2.0393 LearningRate 
0.0108 Epoch: 16 Global Step: 170050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:29:55,996-Speed 5462.05 samples/sec Loss 1.9943 LearningRate 0.0108 Epoch: 16 Global Step: 170060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:03,530-Speed 5437.70 samples/sec Loss 2.0359 LearningRate 0.0108 Epoch: 16 Global Step: 170070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:11,013-Speed 5473.92 samples/sec Loss 1.9854 LearningRate 0.0108 Epoch: 16 Global Step: 170080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:18,481-Speed 5485.18 samples/sec Loss 2.0309 LearningRate 0.0107 Epoch: 16 Global Step: 170090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:25,920-Speed 5507.02 samples/sec Loss 2.0189 LearningRate 0.0107 Epoch: 16 Global Step: 170100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:33,426-Speed 5458.11 samples/sec Loss 2.0028 LearningRate 0.0107 Epoch: 16 Global Step: 170110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:40,976-Speed 5425.82 samples/sec Loss 1.9790 LearningRate 0.0107 Epoch: 16 Global Step: 170120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:48,528-Speed 5424.78 samples/sec Loss 2.0058 LearningRate 0.0107 Epoch: 16 Global Step: 170130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:30:56,090-Speed 5417.24 samples/sec Loss 1.9700 LearningRate 0.0107 Epoch: 16 Global Step: 170140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:31:03,643-Speed 5423.74 samples/sec Loss 1.9915 LearningRate 0.0107 Epoch: 16 Global Step: 170150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:31:11,207-Speed 5416.05 samples/sec Loss 2.0456 LearningRate 0.0107 Epoch: 16 Global Step: 170160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:31:18,734-Speed 5442.01 samples/sec Loss 1.9847 LearningRate 0.0107 Epoch: 16 Global Step: 170170 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:31:26,373-Speed 5362.56 samples/sec Loss 2.0005 LearningRate 0.0107 Epoch: 16 Global Step: 170180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:31:33,951-Speed 5406.74 samples/sec Loss 2.0075 LearningRate 0.0107 Epoch: 16 Global Step: 170190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:31:41,512-Speed 5417.54 samples/sec Loss 2.0058 LearningRate 0.0107 Epoch: 16 Global Step: 170200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:31:49,050-Speed 5434.25 samples/sec Loss 2.0077 LearningRate 0.0107 Epoch: 16 Global Step: 170210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:31:56,628-Speed 5405.75 samples/sec Loss 1.9956 LearningRate 0.0107 Epoch: 16 Global Step: 170220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:32:04,192-Speed 5416.57 samples/sec Loss 2.0417 LearningRate 0.0107 Epoch: 16 Global Step: 170230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:32:11,767-Speed 5407.56 samples/sec Loss 2.0093 LearningRate 0.0107 Epoch: 16 Global Step: 170240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-09 09:32:19,386-Speed 5376.57 samples/sec Loss 1.9600 LearningRate 0.0107 Epoch: 16 Global Step: 170250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:32:26,879-Speed 5467.16 samples/sec Loss 2.0073 LearningRate 0.0107 Epoch: 16 Global Step: 170260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:32:34,364-Speed 5473.51 samples/sec Loss 1.9850 LearningRate 0.0106 Epoch: 16 Global Step: 170270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:32:41,900-Speed 5435.54 samples/sec Loss 1.9949 LearningRate 0.0106 Epoch: 16 Global Step: 170280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-09 09:32:49,454-Speed 5422.81 samples/sec Loss 2.0084 LearningRate 0.0106 Epoch: 16 Global Step: 170290 Fp16 Grad Scale: 32768 Required: 9 hours 
Training: 2022-01-09 09:32:57,041-Speed 5399.63 samples/sec Loss 1.9952 LearningRate 0.0106 Epoch: 16 Global Step: 170300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:33:04,665-Speed 5373.53 samples/sec Loss 2.0087 LearningRate 0.0106 Epoch: 16 Global Step: 170310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:33:12,228-Speed 5416.74 samples/sec Loss 1.9723 LearningRate 0.0106 Epoch: 16 Global Step: 170320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:33:19,740-Speed 5453.19 samples/sec Loss 2.0079 LearningRate 0.0106 Epoch: 16 Global Step: 170330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:33:27,374-Speed 5366.05 samples/sec Loss 2.0209 LearningRate 0.0106 Epoch: 16 Global Step: 170340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:33:35,102-Speed 5300.99 samples/sec Loss 1.9805 LearningRate 0.0106 Epoch: 16 Global Step: 170350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:33:42,652-Speed 5426.23 samples/sec Loss 1.9731 LearningRate 0.0106 Epoch: 16 Global Step: 170360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:33:50,243-Speed 5396.83 samples/sec Loss 1.9977 LearningRate 0.0106 Epoch: 16 Global Step: 170370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:33:57,785-Speed 5431.57 samples/sec Loss 1.9954 LearningRate 0.0106 Epoch: 16 Global Step: 170380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:05,317-Speed 5438.72 samples/sec Loss 2.0418 LearningRate 0.0106 Epoch: 16 Global Step: 170390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:12,836-Speed 5449.07 samples/sec Loss 1.9684 LearningRate 0.0106 Epoch: 16 Global Step: 170400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:20,372-Speed 5435.37 samples/sec Loss 1.9983 LearningRate 0.0106 Epoch: 16 Global Step: 170410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:27,903-Speed 
5439.98 samples/sec Loss 1.9831 LearningRate 0.0106 Epoch: 16 Global Step: 170420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:35,420-Speed 5449.64 samples/sec Loss 1.9796 LearningRate 0.0106 Epoch: 16 Global Step: 170430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:42,945-Speed 5444.14 samples/sec Loss 1.9820 LearningRate 0.0105 Epoch: 16 Global Step: 170440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:50,528-Speed 5401.85 samples/sec Loss 1.9999 LearningRate 0.0105 Epoch: 16 Global Step: 170450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:34:58,045-Speed 5449.39 samples/sec Loss 1.9697 LearningRate 0.0105 Epoch: 16 Global Step: 170460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:35:05,563-Speed 5449.30 samples/sec Loss 1.9804 LearningRate 0.0105 Epoch: 16 Global Step: 170470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:35:13,093-Speed 5440.64 samples/sec Loss 1.9600 LearningRate 0.0105 Epoch: 16 Global Step: 170480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:35:20,592-Speed 5463.49 samples/sec Loss 1.9694 LearningRate 0.0105 Epoch: 16 Global Step: 170490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:35:28,071-Speed 5477.10 samples/sec Loss 1.9845 LearningRate 0.0105 Epoch: 16 Global Step: 170500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:35:35,557-Speed 5472.30 samples/sec Loss 2.0031 LearningRate 0.0105 Epoch: 16 Global Step: 170510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:35:43,069-Speed 5453.49 samples/sec Loss 1.9885 LearningRate 0.0105 Epoch: 16 Global Step: 170520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:35:50,598-Speed 5441.64 samples/sec Loss 1.9615 LearningRate 0.0105 Epoch: 16 Global Step: 170530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:35:58,212-Speed 5379.47 samples/sec Loss 1.9625 
LearningRate 0.0105 Epoch: 16 Global Step: 170540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:05,748-Speed 5436.04 samples/sec Loss 1.9839 LearningRate 0.0105 Epoch: 16 Global Step: 170550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:13,234-Speed 5472.30 samples/sec Loss 1.9884 LearningRate 0.0105 Epoch: 16 Global Step: 170560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:20,689-Speed 5495.27 samples/sec Loss 1.9737 LearningRate 0.0105 Epoch: 16 Global Step: 170570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:28,216-Speed 5441.82 samples/sec Loss 1.9570 LearningRate 0.0105 Epoch: 16 Global Step: 170580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:35,729-Speed 5452.55 samples/sec Loss 1.9824 LearningRate 0.0105 Epoch: 16 Global Step: 170590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:43,226-Speed 5464.70 samples/sec Loss 1.9771 LearningRate 0.0105 Epoch: 16 Global Step: 170600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:36:50,714-Speed 5471.09 samples/sec Loss 1.9727 LearningRate 0.0105 Epoch: 16 Global Step: 170610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:36:58,275-Speed 5417.21 samples/sec Loss 1.9932 LearningRate 0.0104 Epoch: 16 Global Step: 170620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:37:05,870-Speed 5394.02 samples/sec Loss 2.0212 LearningRate 0.0104 Epoch: 16 Global Step: 170630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:13,365-Speed 5465.39 samples/sec Loss 1.9820 LearningRate 0.0104 Epoch: 16 Global Step: 170640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:20,908-Speed 5431.08 samples/sec Loss 2.0011 LearningRate 0.0104 Epoch: 16 Global Step: 170650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:28,382-Speed 5481.53 samples/sec Loss 1.9603 LearningRate 0.0104 Epoch: 16 Global Step: 
170660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:35,927-Speed 5429.08 samples/sec Loss 1.9935 LearningRate 0.0104 Epoch: 16 Global Step: 170670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:43,584-Speed 5350.33 samples/sec Loss 1.9993 LearningRate 0.0104 Epoch: 16 Global Step: 170680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:51,269-Speed 5330.26 samples/sec Loss 2.0194 LearningRate 0.0104 Epoch: 16 Global Step: 170690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:37:58,818-Speed 5426.92 samples/sec Loss 1.9858 LearningRate 0.0104 Epoch: 16 Global Step: 170700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:38:06,397-Speed 5405.41 samples/sec Loss 1.9955 LearningRate 0.0104 Epoch: 16 Global Step: 170710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:38:13,948-Speed 5424.90 samples/sec Loss 1.9801 LearningRate 0.0104 Epoch: 16 Global Step: 170720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:38:21,542-Speed 5394.90 samples/sec Loss 1.9858 LearningRate 0.0104 Epoch: 16 Global Step: 170730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:38:29,115-Speed 5409.70 samples/sec Loss 1.9797 LearningRate 0.0104 Epoch: 16 Global Step: 170740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:38:36,611-Speed 5464.46 samples/sec Loss 1.9915 LearningRate 0.0104 Epoch: 16 Global Step: 170750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:38:44,071-Speed 5491.24 samples/sec Loss 1.9665 LearningRate 0.0104 Epoch: 16 Global Step: 170760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:38:51,677-Speed 5385.96 samples/sec Loss 1.9606 LearningRate 0.0104 Epoch: 16 Global Step: 170770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:38:59,100-Speed 5519.42 samples/sec Loss 1.9901 LearningRate 0.0104 Epoch: 16 Global Step: 170780 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-01-09 09:39:06,649-Speed 5426.23 samples/sec Loss 1.9535 LearningRate 0.0103 Epoch: 16 Global Step: 170790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:39:14,084-Speed 5509.84 samples/sec Loss 1.9681 LearningRate 0.0103 Epoch: 16 Global Step: 170800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:39:21,531-Speed 5501.35 samples/sec Loss 1.9854 LearningRate 0.0103 Epoch: 16 Global Step: 170810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:39:29,079-Speed 5426.88 samples/sec Loss 1.9919 LearningRate 0.0103 Epoch: 16 Global Step: 170820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:39:36,623-Speed 5430.54 samples/sec Loss 1.9583 LearningRate 0.0103 Epoch: 16 Global Step: 170830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-09 09:39:44,049-Speed 5516.78 samples/sec Loss 1.9640 LearningRate 0.0103 Epoch: 16 Global Step: 170840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:39:51,592-Speed 5430.70 samples/sec Loss 2.0002 LearningRate 0.0103 Epoch: 16 Global Step: 170850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:39:59,124-Speed 5439.01 samples/sec Loss 1.9388 LearningRate 0.0103 Epoch: 16 Global Step: 170860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:40:06,710-Speed 5399.41 samples/sec Loss 1.9647 LearningRate 0.0103 Epoch: 16 Global Step: 170870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:40:14,257-Speed 5428.55 samples/sec Loss 1.9879 LearningRate 0.0103 Epoch: 16 Global Step: 170880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:40:21,721-Speed 5488.58 samples/sec Loss 1.9709 LearningRate 0.0103 Epoch: 16 Global Step: 170890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:40:29,262-Speed 5432.48 samples/sec Loss 1.9884 LearningRate 0.0103 Epoch: 16 Global Step: 170900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 
09:40:36,771-Speed 5455.33 samples/sec Loss 1.9826 LearningRate 0.0103 Epoch: 16 Global Step: 170910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:40:44,200-Speed 5514.37 samples/sec Loss 1.9383 LearningRate 0.0103 Epoch: 16 Global Step: 170920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:40:51,750-Speed 5425.94 samples/sec Loss 1.9586 LearningRate 0.0103 Epoch: 16 Global Step: 170930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:40:59,255-Speed 5458.36 samples/sec Loss 1.9814 LearningRate 0.0103 Epoch: 16 Global Step: 170940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:41:06,904-Speed 5355.48 samples/sec Loss 1.9653 LearningRate 0.0103 Epoch: 16 Global Step: 170950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:41:14,468-Speed 5416.13 samples/sec Loss 1.9698 LearningRate 0.0103 Epoch: 16 Global Step: 170960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:41:21,978-Speed 5454.90 samples/sec Loss 1.9758 LearningRate 0.0102 Epoch: 16 Global Step: 170970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:41:29,486-Speed 5455.98 samples/sec Loss 1.9844 LearningRate 0.0102 Epoch: 16 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:41:37,004-Speed 5449.31 samples/sec Loss 1.9610 LearningRate 0.0102 Epoch: 16 Global Step: 170990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:41:44,585-Speed 5403.15 samples/sec Loss 1.9716 LearningRate 0.0102 Epoch: 16 Global Step: 171000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:41:52,085-Speed 5462.18 samples/sec Loss 1.9212 LearningRate 0.0102 Epoch: 16 Global Step: 171010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:41:59,678-Speed 5395.22 samples/sec Loss 1.9626 LearningRate 0.0102 Epoch: 16 Global Step: 171020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:07,239-Speed 5417.60 samples/sec Loss 
1.9625 LearningRate 0.0102 Epoch: 16 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:14,819-Speed 5404.45 samples/sec Loss 1.9410 LearningRate 0.0102 Epoch: 16 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:22,346-Speed 5442.64 samples/sec Loss 1.9873 LearningRate 0.0102 Epoch: 16 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:29,900-Speed 5423.28 samples/sec Loss 1.9581 LearningRate 0.0102 Epoch: 16 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:37,437-Speed 5434.68 samples/sec Loss 1.9465 LearningRate 0.0102 Epoch: 16 Global Step: 171070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:44,946-Speed 5455.47 samples/sec Loss 2.0050 LearningRate 0.0102 Epoch: 16 Global Step: 171080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:52,476-Speed 5440.74 samples/sec Loss 1.9275 LearningRate 0.0102 Epoch: 16 Global Step: 171090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:42:59,971-Speed 5465.69 samples/sec Loss 1.9848 LearningRate 0.0102 Epoch: 16 Global Step: 171100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:43:07,532-Speed 5417.79 samples/sec Loss 1.9607 LearningRate 0.0102 Epoch: 16 Global Step: 171110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:43:14,989-Speed 5493.74 samples/sec Loss 1.9523 LearningRate 0.0102 Epoch: 16 Global Step: 171120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:43:22,439-Speed 5498.41 samples/sec Loss 1.9218 LearningRate 0.0102 Epoch: 16 Global Step: 171130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:43:29,888-Speed 5499.54 samples/sec Loss 1.9779 LearningRate 0.0102 Epoch: 16 Global Step: 171140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:43:37,438-Speed 5426.24 samples/sec Loss 1.9627 LearningRate 0.0101 Epoch: 16 Global 
Step: 171150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:43:44,920-Speed 5475.16 samples/sec Loss 1.9629 LearningRate 0.0101 Epoch: 16 Global Step: 171160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:43:52,559-Speed 5362.01 samples/sec Loss 1.9930 LearningRate 0.0101 Epoch: 16 Global Step: 171170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:44:00,218-Speed 5348.95 samples/sec Loss 1.9359 LearningRate 0.0101 Epoch: 16 Global Step: 171180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:44:07,804-Speed 5400.18 samples/sec Loss 1.9123 LearningRate 0.0101 Epoch: 16 Global Step: 171190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:44:15,282-Speed 5477.66 samples/sec Loss 1.9301 LearningRate 0.0101 Epoch: 16 Global Step: 171200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:44:22,840-Speed 5420.72 samples/sec Loss 1.9813 LearningRate 0.0101 Epoch: 16 Global Step: 171210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:44:30,332-Speed 5468.23 samples/sec Loss 1.9495 LearningRate 0.0101 Epoch: 16 Global Step: 171220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:44:37,881-Speed 5426.16 samples/sec Loss 1.9274 LearningRate 0.0101 Epoch: 16 Global Step: 171230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:44:45,398-Speed 5449.77 samples/sec Loss 1.9545 LearningRate 0.0101 Epoch: 16 Global Step: 171240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:44:52,931-Speed 5438.15 samples/sec Loss 1.9473 LearningRate 0.0101 Epoch: 16 Global Step: 171250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:00,465-Speed 5437.86 samples/sec Loss 1.9590 LearningRate 0.0101 Epoch: 16 Global Step: 171260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:07,960-Speed 5465.26 samples/sec Loss 1.9990 LearningRate 0.0101 Epoch: 16 Global Step: 171270 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-09 09:45:15,496-Speed 5436.22 samples/sec Loss 1.9468 LearningRate 0.0101 Epoch: 16 Global Step: 171280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:23,059-Speed 5416.75 samples/sec Loss 1.9202 LearningRate 0.0101 Epoch: 16 Global Step: 171290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:30,595-Speed 5435.80 samples/sec Loss 1.9508 LearningRate 0.0101 Epoch: 16 Global Step: 171300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:38,143-Speed 5427.24 samples/sec Loss 1.9353 LearningRate 0.0101 Epoch: 16 Global Step: 171310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:45,576-Speed 5510.71 samples/sec Loss 1.9206 LearningRate 0.0101 Epoch: 16 Global Step: 171320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:45:53,054-Speed 5478.67 samples/sec Loss 1.9476 LearningRate 0.0100 Epoch: 16 Global Step: 171330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:46:00,536-Speed 5474.95 samples/sec Loss 1.9392 LearningRate 0.0100 Epoch: 16 Global Step: 171340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:46:08,208-Speed 5339.44 samples/sec Loss 1.9427 LearningRate 0.0100 Epoch: 16 Global Step: 171350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:46:15,733-Speed 5444.22 samples/sec Loss 1.8994 LearningRate 0.0100 Epoch: 16 Global Step: 171360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:46:23,178-Speed 5502.88 samples/sec Loss 1.9325 LearningRate 0.0100 Epoch: 16 Global Step: 171370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:46:30,672-Speed 5466.38 samples/sec Loss 1.9272 LearningRate 0.0100 Epoch: 16 Global Step: 171380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:46:38,126-Speed 5495.10 samples/sec Loss 1.9245 LearningRate 0.0100 Epoch: 16 Global Step: 171390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 
09:46:45,616-Speed 5469.79 samples/sec Loss 1.9600 LearningRate 0.0100 Epoch: 16 Global Step: 171400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:46:53,089-Speed 5482.14 samples/sec Loss 1.9320 LearningRate 0.0100 Epoch: 16 Global Step: 171410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:47:00,575-Speed 5471.52 samples/sec Loss 1.9184 LearningRate 0.0100 Epoch: 16 Global Step: 171420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:47:08,134-Speed 5419.83 samples/sec Loss 1.9325 LearningRate 0.0100 Epoch: 16 Global Step: 171430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:47:15,609-Speed 5480.45 samples/sec Loss 1.9104 LearningRate 0.0100 Epoch: 16 Global Step: 171440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:47:23,132-Speed 5445.64 samples/sec Loss 1.9537 LearningRate 0.0100 Epoch: 16 Global Step: 171450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:47:30,694-Speed 5417.34 samples/sec Loss 1.9247 LearningRate 0.0100 Epoch: 16 Global Step: 171460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:47:38,264-Speed 5411.23 samples/sec Loss 1.9604 LearningRate 0.0100 Epoch: 16 Global Step: 171470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:47:45,779-Speed 5451.24 samples/sec Loss 1.9120 LearningRate 0.0100 Epoch: 16 Global Step: 171480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:47:53,367-Speed 5399.06 samples/sec Loss 1.9777 LearningRate 0.0100 Epoch: 16 Global Step: 171490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:48:00,874-Speed 5456.45 samples/sec Loss 1.9454 LearningRate 0.0100 Epoch: 16 Global Step: 171500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:48:08,377-Speed 5459.56 samples/sec Loss 1.9261 LearningRate 0.0099 Epoch: 16 Global Step: 171510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:48:15,853-Speed 5480.18 samples/sec Loss 
1.9468 LearningRate 0.0099 Epoch: 16 Global Step: 171520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:48:23,426-Speed 5409.32 samples/sec Loss 1.9264 LearningRate 0.0099 Epoch: 16 Global Step: 171530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:48:30,950-Speed 5444.51 samples/sec Loss 1.9322 LearningRate 0.0099 Epoch: 16 Global Step: 171540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:48:38,468-Speed 5449.11 samples/sec Loss 1.9557 LearningRate 0.0099 Epoch: 16 Global Step: 171550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:48:46,029-Speed 5418.09 samples/sec Loss 1.9477 LearningRate 0.0099 Epoch: 16 Global Step: 171560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:48:53,579-Speed 5425.55 samples/sec Loss 1.9392 LearningRate 0.0099 Epoch: 16 Global Step: 171570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:49:01,126-Speed 5427.95 samples/sec Loss 1.8962 LearningRate 0.0099 Epoch: 16 Global Step: 171580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:49:08,672-Speed 5429.07 samples/sec Loss 1.9561 LearningRate 0.0099 Epoch: 16 Global Step: 171590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:49:16,213-Speed 5431.91 samples/sec Loss 1.9295 LearningRate 0.0099 Epoch: 16 Global Step: 171600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:49:23,827-Speed 5380.95 samples/sec Loss 1.9149 LearningRate 0.0099 Epoch: 16 Global Step: 171610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:49:31,430-Speed 5387.77 samples/sec Loss 1.9312 LearningRate 0.0099 Epoch: 16 Global Step: 171620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:49:39,022-Speed 5396.05 samples/sec Loss 1.9294 LearningRate 0.0099 Epoch: 16 Global Step: 171630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:49:46,578-Speed 5421.68 samples/sec Loss 1.9485 LearningRate 0.0099 Epoch: 16 Global 
Step: 171640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:49:54,205-Speed 5371.04 samples/sec Loss 1.9301 LearningRate 0.0099 Epoch: 16 Global Step: 171650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:50:01,828-Speed 5373.76 samples/sec Loss 1.9217 LearningRate 0.0099 Epoch: 16 Global Step: 171660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:50:09,439-Speed 5382.57 samples/sec Loss 1.9105 LearningRate 0.0099 Epoch: 16 Global Step: 171670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:50:17,046-Speed 5385.09 samples/sec Loss 1.9231 LearningRate 0.0099 Epoch: 16 Global Step: 171680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:50:24,595-Speed 5426.38 samples/sec Loss 1.9530 LearningRate 0.0098 Epoch: 16 Global Step: 171690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:50:32,209-Speed 5380.53 samples/sec Loss 1.9439 LearningRate 0.0098 Epoch: 16 Global Step: 171700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:50:39,928-Speed 5306.94 samples/sec Loss 1.9464 LearningRate 0.0098 Epoch: 16 Global Step: 171710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:50:47,541-Speed 5380.56 samples/sec Loss 1.9552 LearningRate 0.0098 Epoch: 16 Global Step: 171720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:50:55,074-Speed 5438.21 samples/sec Loss 1.9553 LearningRate 0.0098 Epoch: 16 Global Step: 171730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:51:02,643-Speed 5412.40 samples/sec Loss 1.9395 LearningRate 0.0098 Epoch: 16 Global Step: 171740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:51:10,076-Speed 5511.71 samples/sec Loss 1.8964 LearningRate 0.0098 Epoch: 16 Global Step: 171750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:51:17,587-Speed 5453.54 samples/sec Loss 1.9389 LearningRate 0.0098 Epoch: 16 Global Step: 171760 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-01-09 09:51:25,098-Speed 5454.42 samples/sec Loss 1.9413 LearningRate 0.0098 Epoch: 16 Global Step: 171770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:51:32,567-Speed 5484.66 samples/sec Loss 1.9839 LearningRate 0.0098 Epoch: 16 Global Step: 171780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 09:51:40,127-Speed 5418.77 samples/sec Loss 1.9448 LearningRate 0.0098 Epoch: 16 Global Step: 171790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:51:47,611-Speed 5473.21 samples/sec Loss 1.9151 LearningRate 0.0098 Epoch: 16 Global Step: 171800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:51:55,077-Speed 5487.13 samples/sec Loss 1.9123 LearningRate 0.0098 Epoch: 16 Global Step: 171810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:02,545-Speed 5485.40 samples/sec Loss 1.9231 LearningRate 0.0098 Epoch: 16 Global Step: 171820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:10,050-Speed 5458.38 samples/sec Loss 1.9195 LearningRate 0.0098 Epoch: 16 Global Step: 171830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:17,640-Speed 5397.46 samples/sec Loss 1.9055 LearningRate 0.0098 Epoch: 16 Global Step: 171840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:25,146-Speed 5457.54 samples/sec Loss 1.9345 LearningRate 0.0098 Epoch: 16 Global Step: 171850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:32,723-Speed 5407.15 samples/sec Loss 1.9157 LearningRate 0.0098 Epoch: 16 Global Step: 171860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:40,197-Speed 5480.84 samples/sec Loss 1.9245 LearningRate 0.0097 Epoch: 16 Global Step: 171870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:52:47,739-Speed 5431.82 samples/sec Loss 1.9287 LearningRate 0.0097 Epoch: 16 Global Step: 171880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 
09:52:55,294-Speed 5421.62 samples/sec Loss 1.9470 LearningRate 0.0097 Epoch: 16 Global Step: 171890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:53:02,910-Speed 5379.16 samples/sec Loss 1.9358 LearningRate 0.0097 Epoch: 16 Global Step: 171900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:53:10,516-Speed 5386.55 samples/sec Loss 1.9036 LearningRate 0.0097 Epoch: 16 Global Step: 171910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:53:18,036-Speed 5447.20 samples/sec Loss 1.9180 LearningRate 0.0097 Epoch: 16 Global Step: 171920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:53:25,548-Speed 5453.02 samples/sec Loss 1.8878 LearningRate 0.0097 Epoch: 16 Global Step: 171930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:53:33,116-Speed 5413.06 samples/sec Loss 1.9166 LearningRate 0.0097 Epoch: 16 Global Step: 171940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:53:40,595-Speed 5494.58 samples/sec Loss 1.9689 LearningRate 0.0097 Epoch: 16 Global Step: 171950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:53:48,118-Speed 5444.89 samples/sec Loss 1.9047 LearningRate 0.0097 Epoch: 16 Global Step: 171960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:53:55,590-Speed 5482.59 samples/sec Loss 1.9300 LearningRate 0.0097 Epoch: 16 Global Step: 171970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:54:03,107-Speed 5449.61 samples/sec Loss 1.8914 LearningRate 0.0097 Epoch: 16 Global Step: 171980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:54:10,554-Speed 5501.57 samples/sec Loss 1.9313 LearningRate 0.0097 Epoch: 16 Global Step: 171990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:54:18,087-Speed 5437.81 samples/sec Loss 1.9355 LearningRate 0.0097 Epoch: 16 Global Step: 172000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:55:01,955-[lfw][172000]XNorm: 22.950953 
Training: 2022-01-09 09:55:01,956-[lfw][172000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-01-09 09:55:01,957-[lfw][172000]Accuracy-Highest: 0.99833 Training: 2022-01-09 09:55:53,022-[cfp_fp][172000]XNorm: 22.024559 Training: 2022-01-09 09:55:53,023-[cfp_fp][172000]Accuracy-Flip: 0.99314+-0.00349 Training: 2022-01-09 09:55:53,023-[cfp_fp][172000]Accuracy-Highest: 0.99371 Training: 2022-01-09 09:56:36,909-[agedb_30][172000]XNorm: 23.269388 Training: 2022-01-09 09:56:36,910-[agedb_30][172000]Accuracy-Flip: 0.98433+-0.00588 Training: 2022-01-09 09:56:36,911-[agedb_30][172000]Accuracy-Highest: 0.98433 Training: 2022-01-09 09:56:44,492-Speed 279.77 samples/sec Loss 1.9287 LearningRate 0.0097 Epoch: 16 Global Step: 172010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:56:52,032-Speed 5433.06 samples/sec Loss 1.9253 LearningRate 0.0097 Epoch: 16 Global Step: 172020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:56:59,582-Speed 5425.40 samples/sec Loss 1.9117 LearningRate 0.0097 Epoch: 16 Global Step: 172030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:57:07,221-Speed 5362.97 samples/sec Loss 1.9259 LearningRate 0.0097 Epoch: 16 Global Step: 172040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:57:14,794-Speed 5409.33 samples/sec Loss 1.9040 LearningRate 0.0096 Epoch: 16 Global Step: 172050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:57:22,311-Speed 5449.76 samples/sec Loss 1.8938 LearningRate 0.0096 Epoch: 16 Global Step: 172060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:57:29,769-Speed 5492.59 samples/sec Loss 1.8862 LearningRate 0.0096 Epoch: 16 Global Step: 172070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:57:37,236-Speed 5486.79 samples/sec Loss 1.9024 LearningRate 0.0096 Epoch: 16 Global Step: 172080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:57:44,680-Speed 5502.37 samples/sec Loss 1.8841 LearningRate 
0.0096 Epoch: 16 Global Step: 172090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:57:52,211-Speed 5440.16 samples/sec Loss 1.9504 LearningRate 0.0096 Epoch: 16 Global Step: 172100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:57:59,719-Speed 5455.90 samples/sec Loss 1.9525 LearningRate 0.0096 Epoch: 16 Global Step: 172110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:58:07,218-Speed 5463.02 samples/sec Loss 1.8947 LearningRate 0.0096 Epoch: 16 Global Step: 172120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:58:14,756-Speed 5434.19 samples/sec Loss 1.9134 LearningRate 0.0096 Epoch: 16 Global Step: 172130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:58:22,308-Speed 5424.69 samples/sec Loss 1.9096 LearningRate 0.0096 Epoch: 16 Global Step: 172140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:58:29,827-Speed 5447.71 samples/sec Loss 1.9211 LearningRate 0.0096 Epoch: 16 Global Step: 172150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:58:37,406-Speed 5405.22 samples/sec Loss 1.9243 LearningRate 0.0096 Epoch: 16 Global Step: 172160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:58:44,888-Speed 5475.13 samples/sec Loss 1.9135 LearningRate 0.0096 Epoch: 16 Global Step: 172170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 09:58:54,107-Speed 4443.47 samples/sec Loss 1.8898 LearningRate 0.0096 Epoch: 16 Global Step: 172180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:01,657-Speed 5426.33 samples/sec Loss 1.9212 LearningRate 0.0096 Epoch: 16 Global Step: 172190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:09,168-Speed 5453.45 samples/sec Loss 1.9279 LearningRate 0.0096 Epoch: 16 Global Step: 172200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:16,648-Speed 5476.80 samples/sec Loss 1.8914 LearningRate 0.0096 Epoch: 16 Global Step: 172210 Fp16 
Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:24,219-Speed 5411.12 samples/sec Loss 1.9230 LearningRate 0.0096 Epoch: 16 Global Step: 172220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:31,737-Speed 5448.96 samples/sec Loss 1.8948 LearningRate 0.0095 Epoch: 16 Global Step: 172230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:39,408-Speed 5340.28 samples/sec Loss 1.8833 LearningRate 0.0095 Epoch: 16 Global Step: 172240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:46,852-Speed 5503.24 samples/sec Loss 1.9100 LearningRate 0.0095 Epoch: 16 Global Step: 172250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 09:59:54,373-Speed 5446.59 samples/sec Loss 1.8972 LearningRate 0.0095 Epoch: 16 Global Step: 172260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:01,896-Speed 5445.59 samples/sec Loss 1.8724 LearningRate 0.0095 Epoch: 16 Global Step: 172270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:09,329-Speed 5511.57 samples/sec Loss 1.9232 LearningRate 0.0095 Epoch: 16 Global Step: 172280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:16,975-Speed 5357.22 samples/sec Loss 1.8919 LearningRate 0.0095 Epoch: 16 Global Step: 172290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:24,465-Speed 5469.54 samples/sec Loss 1.8771 LearningRate 0.0095 Epoch: 16 Global Step: 172300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:31,975-Speed 5455.35 samples/sec Loss 1.9008 LearningRate 0.0095 Epoch: 16 Global Step: 172310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:39,491-Speed 5450.40 samples/sec Loss 1.8925 LearningRate 0.0095 Epoch: 16 Global Step: 172320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:00:47,033-Speed 5431.50 samples/sec Loss 1.9073 LearningRate 0.0095 Epoch: 16 Global Step: 172330 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-01-09 10:00:54,684-Speed 5354.45 samples/sec Loss 1.8654 LearningRate 0.0095 Epoch: 16 Global Step: 172340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:01:02,201-Speed 5449.99 samples/sec Loss 1.9014 LearningRate 0.0095 Epoch: 16 Global Step: 172350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:01:09,860-Speed 5348.69 samples/sec Loss 1.8833 LearningRate 0.0095 Epoch: 16 Global Step: 172360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:01:17,418-Speed 5419.62 samples/sec Loss 1.9120 LearningRate 0.0095 Epoch: 16 Global Step: 172370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:01:24,952-Speed 5437.97 samples/sec Loss 1.9149 LearningRate 0.0095 Epoch: 16 Global Step: 172380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:01:32,477-Speed 5444.22 samples/sec Loss 1.9061 LearningRate 0.0095 Epoch: 16 Global Step: 172390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:01:40,077-Speed 5389.60 samples/sec Loss 1.9034 LearningRate 0.0095 Epoch: 16 Global Step: 172400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:01:47,560-Speed 5475.19 samples/sec Loss 1.8914 LearningRate 0.0095 Epoch: 16 Global Step: 172410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:01:55,046-Speed 5472.16 samples/sec Loss 1.9006 LearningRate 0.0094 Epoch: 16 Global Step: 172420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:02,644-Speed 5391.35 samples/sec Loss 1.8803 LearningRate 0.0094 Epoch: 16 Global Step: 172430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:10,190-Speed 5428.90 samples/sec Loss 1.8689 LearningRate 0.0094 Epoch: 16 Global Step: 172440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:17,812-Speed 5374.43 samples/sec Loss 1.8734 LearningRate 0.0094 Epoch: 16 Global Step: 172450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:25,378-Speed 
5414.68 samples/sec Loss 1.8787 LearningRate 0.0094 Epoch: 16 Global Step: 172460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:32,937-Speed 5419.65 samples/sec Loss 1.8908 LearningRate 0.0094 Epoch: 16 Global Step: 172470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:40,463-Speed 5443.03 samples/sec Loss 1.9228 LearningRate 0.0094 Epoch: 16 Global Step: 172480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:48,069-Speed 5386.26 samples/sec Loss 1.8762 LearningRate 0.0094 Epoch: 16 Global Step: 172490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:02:55,529-Speed 5491.09 samples/sec Loss 1.8816 LearningRate 0.0094 Epoch: 16 Global Step: 172500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:03,110-Speed 5403.48 samples/sec Loss 1.8855 LearningRate 0.0094 Epoch: 16 Global Step: 172510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:10,617-Speed 5457.03 samples/sec Loss 1.8763 LearningRate 0.0094 Epoch: 16 Global Step: 172520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:18,169-Speed 5424.90 samples/sec Loss 1.8931 LearningRate 0.0094 Epoch: 16 Global Step: 172530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:25,649-Speed 5476.61 samples/sec Loss 1.8897 LearningRate 0.0094 Epoch: 16 Global Step: 172540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:33,201-Speed 5424.80 samples/sec Loss 1.8738 LearningRate 0.0094 Epoch: 16 Global Step: 172550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:40,710-Speed 5455.25 samples/sec Loss 1.8840 LearningRate 0.0094 Epoch: 16 Global Step: 172560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:48,226-Speed 5450.13 samples/sec Loss 1.8545 LearningRate 0.0094 Epoch: 16 Global Step: 172570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:03:55,741-Speed 5451.74 samples/sec Loss 1.8934 
LearningRate 0.0094 Epoch: 16 Global Step: 172580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:04:03,329-Speed 5398.24 samples/sec Loss 1.8765 LearningRate 0.0094 Epoch: 16 Global Step: 172590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:04:10,857-Speed 5442.08 samples/sec Loss 1.8650 LearningRate 0.0093 Epoch: 16 Global Step: 172600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:04:18,343-Speed 5472.21 samples/sec Loss 1.9203 LearningRate 0.0093 Epoch: 16 Global Step: 172610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:04:25,856-Speed 5452.35 samples/sec Loss 1.8738 LearningRate 0.0093 Epoch: 16 Global Step: 172620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:04:33,360-Speed 5458.97 samples/sec Loss 1.8903 LearningRate 0.0093 Epoch: 16 Global Step: 172630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:04:40,834-Speed 5481.60 samples/sec Loss 1.8747 LearningRate 0.0093 Epoch: 16 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:04:48,380-Speed 5428.36 samples/sec Loss 1.8764 LearningRate 0.0093 Epoch: 16 Global Step: 172650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:04:55,867-Speed 5471.68 samples/sec Loss 1.8394 LearningRate 0.0093 Epoch: 16 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:05:03,409-Speed 5431.52 samples/sec Loss 1.8829 LearningRate 0.0093 Epoch: 16 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:05:10,895-Speed 5472.63 samples/sec Loss 1.8351 LearningRate 0.0093 Epoch: 16 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:05:18,431-Speed 5435.92 samples/sec Loss 1.8904 LearningRate 0.0093 Epoch: 16 Global Step: 172690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:05:25,933-Speed 5460.86 samples/sec Loss 1.8662 LearningRate 0.0093 Epoch: 16 Global Step: 
172700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:05:33,417-Speed 5473.72 samples/sec Loss 1.8729 LearningRate 0.0093 Epoch: 16 Global Step: 172710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:05:40,961-Speed 5430.30 samples/sec Loss 1.8690 LearningRate 0.0093 Epoch: 16 Global Step: 172720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:05:48,490-Speed 5440.91 samples/sec Loss 1.8776 LearningRate 0.0093 Epoch: 16 Global Step: 172730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:05:56,085-Speed 5393.89 samples/sec Loss 1.8757 LearningRate 0.0093 Epoch: 16 Global Step: 172740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:03,557-Speed 5483.75 samples/sec Loss 1.9126 LearningRate 0.0093 Epoch: 16 Global Step: 172750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:11,307-Speed 5286.28 samples/sec Loss 1.8668 LearningRate 0.0093 Epoch: 16 Global Step: 172760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:18,941-Speed 5366.32 samples/sec Loss 1.8819 LearningRate 0.0093 Epoch: 16 Global Step: 172770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:26,524-Speed 5401.51 samples/sec Loss 1.8490 LearningRate 0.0093 Epoch: 16 Global Step: 172780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:34,083-Speed 5419.26 samples/sec Loss 1.9166 LearningRate 0.0092 Epoch: 16 Global Step: 172790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:41,612-Speed 5441.08 samples/sec Loss 1.8373 LearningRate 0.0092 Epoch: 16 Global Step: 172800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:49,116-Speed 5459.76 samples/sec Loss 1.8725 LearningRate 0.0092 Epoch: 16 Global Step: 172810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:06:56,665-Speed 5425.92 samples/sec Loss 1.8766 LearningRate 0.0092 Epoch: 16 Global Step: 172820 Fp16 Grad Scale: 32768 Required: 8 
hours Training: 2022-01-09 10:07:04,082-Speed 5523.38 samples/sec Loss 1.8710 LearningRate 0.0092 Epoch: 16 Global Step: 172830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:11,626-Speed 5430.68 samples/sec Loss 1.8985 LearningRate 0.0092 Epoch: 16 Global Step: 172840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:19,111-Speed 5473.22 samples/sec Loss 1.8548 LearningRate 0.0092 Epoch: 16 Global Step: 172850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:26,665-Speed 5422.61 samples/sec Loss 1.8193 LearningRate 0.0092 Epoch: 16 Global Step: 172860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:34,152-Speed 5471.85 samples/sec Loss 1.8589 LearningRate 0.0092 Epoch: 16 Global Step: 172870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:41,758-Speed 5386.15 samples/sec Loss 1.8631 LearningRate 0.0092 Epoch: 16 Global Step: 172880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:49,325-Speed 5414.07 samples/sec Loss 1.8354 LearningRate 0.0092 Epoch: 16 Global Step: 172890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:07:56,896-Speed 5410.18 samples/sec Loss 1.8745 LearningRate 0.0092 Epoch: 16 Global Step: 172900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:08:04,432-Speed 5435.92 samples/sec Loss 1.8457 LearningRate 0.0092 Epoch: 16 Global Step: 172910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:08:12,070-Speed 5363.25 samples/sec Loss 1.8965 LearningRate 0.0092 Epoch: 16 Global Step: 172920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:08:19,557-Speed 5472.18 samples/sec Loss 1.8634 LearningRate 0.0092 Epoch: 16 Global Step: 172930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:08:27,086-Speed 5440.55 samples/sec Loss 1.8677 LearningRate 0.0092 Epoch: 16 Global Step: 172940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 
10:08:34,561-Speed 5480.05 samples/sec Loss 1.8642 LearningRate 0.0092 Epoch: 16 Global Step: 172950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:08:42,096-Speed 5436.90 samples/sec Loss 1.8649 LearningRate 0.0092 Epoch: 16 Global Step: 172960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:08:49,603-Speed 5457.54 samples/sec Loss 1.8458 LearningRate 0.0092 Epoch: 16 Global Step: 172970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:08:57,139-Speed 5436.03 samples/sec Loss 1.8450 LearningRate 0.0091 Epoch: 16 Global Step: 172980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:04,674-Speed 5436.46 samples/sec Loss 1.8568 LearningRate 0.0091 Epoch: 16 Global Step: 172990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:12,258-Speed 5400.73 samples/sec Loss 1.8697 LearningRate 0.0091 Epoch: 16 Global Step: 173000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:19,811-Speed 5424.29 samples/sec Loss 1.8702 LearningRate 0.0091 Epoch: 16 Global Step: 173010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:27,374-Speed 5416.27 samples/sec Loss 1.8246 LearningRate 0.0091 Epoch: 16 Global Step: 173020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:34,903-Speed 5441.29 samples/sec Loss 1.8832 LearningRate 0.0091 Epoch: 16 Global Step: 173030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:42,418-Speed 5450.43 samples/sec Loss 1.8677 LearningRate 0.0091 Epoch: 16 Global Step: 173040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:49,965-Speed 5428.74 samples/sec Loss 1.8728 LearningRate 0.0091 Epoch: 16 Global Step: 173050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:09:57,576-Speed 5382.14 samples/sec Loss 1.8484 LearningRate 0.0091 Epoch: 16 Global Step: 173060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:05,212-Speed 5364.51 samples/sec Loss 
1.8674 LearningRate 0.0091 Epoch: 16 Global Step: 173070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:12,799-Speed 5399.56 samples/sec Loss 1.8449 LearningRate 0.0091 Epoch: 16 Global Step: 173080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:20,199-Speed 5535.93 samples/sec Loss 1.8616 LearningRate 0.0091 Epoch: 16 Global Step: 173090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:27,781-Speed 5403.50 samples/sec Loss 1.8523 LearningRate 0.0091 Epoch: 16 Global Step: 173100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:35,337-Speed 5420.95 samples/sec Loss 1.8600 LearningRate 0.0091 Epoch: 16 Global Step: 173110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:42,858-Speed 5446.76 samples/sec Loss 1.8626 LearningRate 0.0091 Epoch: 16 Global Step: 173120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:50,373-Speed 5451.63 samples/sec Loss 1.8591 LearningRate 0.0091 Epoch: 16 Global Step: 173130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:10:57,902-Speed 5441.24 samples/sec Loss 1.8710 LearningRate 0.0091 Epoch: 16 Global Step: 173140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:05,423-Speed 5446.27 samples/sec Loss 1.8641 LearningRate 0.0091 Epoch: 16 Global Step: 173150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:11:12,972-Speed 5426.57 samples/sec Loss 1.8812 LearningRate 0.0091 Epoch: 16 Global Step: 173160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:20,481-Speed 5456.20 samples/sec Loss 1.8817 LearningRate 0.0090 Epoch: 16 Global Step: 173170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:28,077-Speed 5393.02 samples/sec Loss 1.8538 LearningRate 0.0090 Epoch: 16 Global Step: 173180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:35,574-Speed 5464.63 samples/sec Loss 1.8680 LearningRate 0.0090 Epoch: 16 Global 
Step: 173190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:43,019-Speed 5501.88 samples/sec Loss 1.8231 LearningRate 0.0090 Epoch: 16 Global Step: 173200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:50,629-Speed 5382.98 samples/sec Loss 1.8513 LearningRate 0.0090 Epoch: 16 Global Step: 173210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:11:58,152-Speed 5445.96 samples/sec Loss 1.8393 LearningRate 0.0090 Epoch: 16 Global Step: 173220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:12:05,710-Speed 5419.76 samples/sec Loss 1.8497 LearningRate 0.0090 Epoch: 16 Global Step: 173230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:12:13,263-Speed 5423.37 samples/sec Loss 1.8360 LearningRate 0.0090 Epoch: 16 Global Step: 173240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:12:20,789-Speed 5443.71 samples/sec Loss 1.8256 LearningRate 0.0090 Epoch: 16 Global Step: 173250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:12:28,400-Speed 5382.72 samples/sec Loss 1.8429 LearningRate 0.0090 Epoch: 16 Global Step: 173260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:12:35,987-Speed 5399.69 samples/sec Loss 1.8684 LearningRate 0.0090 Epoch: 16 Global Step: 173270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:12:43,532-Speed 5428.84 samples/sec Loss 1.8515 LearningRate 0.0090 Epoch: 16 Global Step: 173280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:12:51,036-Speed 5459.88 samples/sec Loss 1.8582 LearningRate 0.0090 Epoch: 16 Global Step: 173290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:12:58,530-Speed 5466.50 samples/sec Loss 1.8367 LearningRate 0.0090 Epoch: 16 Global Step: 173300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:05,994-Speed 5487.85 samples/sec Loss 1.8431 LearningRate 0.0090 Epoch: 16 Global Step: 173310 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-01-09 10:13:13,430-Speed 5509.43 samples/sec Loss 1.8565 LearningRate 0.0090 Epoch: 16 Global Step: 173320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:20,896-Speed 5486.88 samples/sec Loss 1.8434 LearningRate 0.0090 Epoch: 16 Global Step: 173330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:28,438-Speed 5432.09 samples/sec Loss 1.8629 LearningRate 0.0090 Epoch: 16 Global Step: 173340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:35,984-Speed 5428.66 samples/sec Loss 1.8560 LearningRate 0.0090 Epoch: 16 Global Step: 173350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:43,418-Speed 5509.59 samples/sec Loss 1.8570 LearningRate 0.0089 Epoch: 16 Global Step: 173360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:50,912-Speed 5467.28 samples/sec Loss 1.8548 LearningRate 0.0089 Epoch: 16 Global Step: 173370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:13:58,440-Speed 5441.54 samples/sec Loss 1.8676 LearningRate 0.0089 Epoch: 16 Global Step: 173380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:14:06,046-Speed 5386.30 samples/sec Loss 1.8481 LearningRate 0.0089 Epoch: 16 Global Step: 173390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:14:13,548-Speed 5460.23 samples/sec Loss 1.8453 LearningRate 0.0089 Epoch: 16 Global Step: 173400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:14:21,014-Speed 5486.78 samples/sec Loss 1.8289 LearningRate 0.0089 Epoch: 16 Global Step: 173410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:14:28,520-Speed 5458.45 samples/sec Loss 1.8212 LearningRate 0.0089 Epoch: 16 Global Step: 173420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:14:36,076-Speed 5420.97 samples/sec Loss 1.8668 LearningRate 0.0089 Epoch: 16 Global Step: 173430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 
10:14:43,684-Speed 5384.28 samples/sec Loss 1.8619 LearningRate 0.0089 Epoch: 16 Global Step: 173440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:14:51,161-Speed 5479.31 samples/sec Loss 1.8322 LearningRate 0.0089 Epoch: 16 Global Step: 173450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:14:58,593-Speed 5512.37 samples/sec Loss 1.8296 LearningRate 0.0089 Epoch: 16 Global Step: 173460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:15:06,144-Speed 5425.21 samples/sec Loss 1.8364 LearningRate 0.0089 Epoch: 16 Global Step: 173470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:15:13,692-Speed 5426.90 samples/sec Loss 1.8217 LearningRate 0.0089 Epoch: 16 Global Step: 173480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:15:21,248-Speed 5421.94 samples/sec Loss 1.8402 LearningRate 0.0089 Epoch: 16 Global Step: 173490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:15:28,829-Speed 5403.96 samples/sec Loss 1.8210 LearningRate 0.0089 Epoch: 16 Global Step: 173500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:15:36,328-Speed 5462.39 samples/sec Loss 1.8282 LearningRate 0.0089 Epoch: 16 Global Step: 173510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:15:43,846-Speed 5449.20 samples/sec Loss 1.8075 LearningRate 0.0089 Epoch: 16 Global Step: 173520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:15:51,390-Speed 5429.66 samples/sec Loss 1.8395 LearningRate 0.0089 Epoch: 16 Global Step: 173530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:15:59,005-Speed 5379.88 samples/sec Loss 1.8402 LearningRate 0.0089 Epoch: 16 Global Step: 173540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:06,575-Speed 5411.60 samples/sec Loss 1.8251 LearningRate 0.0088 Epoch: 16 Global Step: 173550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:14,158-Speed 5401.98 samples/sec Loss 
1.8442 LearningRate 0.0088 Epoch: 16 Global Step: 173560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:21,637-Speed 5477.61 samples/sec Loss 1.8266 LearningRate 0.0088 Epoch: 16 Global Step: 173570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:29,163-Speed 5443.38 samples/sec Loss 1.8505 LearningRate 0.0088 Epoch: 16 Global Step: 173580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:36,764-Speed 5389.52 samples/sec Loss 1.8052 LearningRate 0.0088 Epoch: 16 Global Step: 173590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:44,301-Speed 5435.10 samples/sec Loss 1.8634 LearningRate 0.0088 Epoch: 16 Global Step: 173600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:16:51,911-Speed 5382.96 samples/sec Loss 1.8216 LearningRate 0.0088 Epoch: 16 Global Step: 173610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:16:59,419-Speed 5456.79 samples/sec Loss 1.8201 LearningRate 0.0088 Epoch: 16 Global Step: 173620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:17:06,984-Speed 5415.30 samples/sec Loss 1.8407 LearningRate 0.0088 Epoch: 16 Global Step: 173630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:17:14,584-Speed 5389.89 samples/sec Loss 1.8238 LearningRate 0.0088 Epoch: 16 Global Step: 173640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:17:22,165-Speed 5403.41 samples/sec Loss 1.8711 LearningRate 0.0088 Epoch: 16 Global Step: 173650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:17:29,705-Speed 5433.52 samples/sec Loss 1.8176 LearningRate 0.0088 Epoch: 16 Global Step: 173660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:17:37,195-Speed 5469.26 samples/sec Loss 1.8232 LearningRate 0.0088 Epoch: 16 Global Step: 173670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:17:44,716-Speed 5447.11 samples/sec Loss 1.8367 LearningRate 0.0088 Epoch: 16 Global 
Step: 173680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:17:52,303-Speed 5398.95 samples/sec Loss 1.8031 LearningRate 0.0088 Epoch: 16 Global Step: 173690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:17:59,887-Speed 5401.65 samples/sec Loss 1.8490 LearningRate 0.0088 Epoch: 16 Global Step: 173700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:07,386-Speed 5462.42 samples/sec Loss 1.7967 LearningRate 0.0088 Epoch: 16 Global Step: 173710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:14,915-Speed 5441.15 samples/sec Loss 1.8011 LearningRate 0.0088 Epoch: 16 Global Step: 173720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:22,446-Speed 5439.42 samples/sec Loss 1.8561 LearningRate 0.0088 Epoch: 16 Global Step: 173730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:30,019-Speed 5409.46 samples/sec Loss 1.8148 LearningRate 0.0087 Epoch: 16 Global Step: 173740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:37,558-Speed 5433.99 samples/sec Loss 1.8453 LearningRate 0.0087 Epoch: 16 Global Step: 173750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:45,011-Speed 5496.86 samples/sec Loss 1.8241 LearningRate 0.0087 Epoch: 16 Global Step: 173760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:18:52,563-Speed 5423.77 samples/sec Loss 1.8031 LearningRate 0.0087 Epoch: 16 Global Step: 173770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-01-09 10:19:00,029-Speed 5487.00 samples/sec Loss 1.8341 LearningRate 0.0087 Epoch: 16 Global Step: 173780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:19:07,546-Speed 5450.29 samples/sec Loss 1.8183 LearningRate 0.0087 Epoch: 16 Global Step: 173790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:19:15,051-Speed 5458.17 samples/sec Loss 1.8048 LearningRate 0.0087 Epoch: 16 Global Step: 173800 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-01-09 10:19:22,564-Speed 5452.33 samples/sec Loss 1.7886 LearningRate 0.0087 Epoch: 16 Global Step: 173810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:19:30,076-Speed 5452.97 samples/sec Loss 1.8031 LearningRate 0.0087 Epoch: 16 Global Step: 173820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:19:37,528-Speed 5497.64 samples/sec Loss 1.8424 LearningRate 0.0087 Epoch: 16 Global Step: 173830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:19:45,015-Speed 5471.60 samples/sec Loss 1.7829 LearningRate 0.0087 Epoch: 16 Global Step: 173840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:19:52,511-Speed 5464.58 samples/sec Loss 1.7876 LearningRate 0.0087 Epoch: 16 Global Step: 173850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:20:00,020-Speed 5455.17 samples/sec Loss 1.8489 LearningRate 0.0087 Epoch: 16 Global Step: 173860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:20:07,612-Speed 5396.34 samples/sec Loss 1.8207 LearningRate 0.0087 Epoch: 16 Global Step: 173870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:20:15,099-Speed 5471.52 samples/sec Loss 1.8430 LearningRate 0.0087 Epoch: 16 Global Step: 173880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:20:22,854-Speed 5282.55 samples/sec Loss 1.8460 LearningRate 0.0087 Epoch: 16 Global Step: 173890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:20:30,410-Speed 5421.17 samples/sec Loss 1.8295 LearningRate 0.0087 Epoch: 16 Global Step: 173900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:20:38,145-Speed 5296.32 samples/sec Loss 1.8637 LearningRate 0.0087 Epoch: 16 Global Step: 173910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:20:45,665-Speed 5447.44 samples/sec Loss 1.8523 LearningRate 0.0087 Epoch: 16 Global Step: 173920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 
10:20:53,212-Speed 5428.11 samples/sec Loss 1.8195 LearningRate 0.0086 Epoch: 16 Global Step: 173930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:21:00,755-Speed 5431.04 samples/sec Loss 1.8143 LearningRate 0.0086 Epoch: 16 Global Step: 173940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:21:08,244-Speed 5470.39 samples/sec Loss 1.8175 LearningRate 0.0086 Epoch: 16 Global Step: 173950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:21:15,742-Speed 5463.45 samples/sec Loss 1.7952 LearningRate 0.0086 Epoch: 16 Global Step: 173960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:21:23,283-Speed 5432.31 samples/sec Loss 1.8256 LearningRate 0.0086 Epoch: 16 Global Step: 173970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:21:30,815-Speed 5438.44 samples/sec Loss 1.8187 LearningRate 0.0086 Epoch: 16 Global Step: 173980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:21:38,301-Speed 5472.63 samples/sec Loss 1.8395 LearningRate 0.0086 Epoch: 16 Global Step: 173990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:21:45,818-Speed 5450.37 samples/sec Loss 1.8341 LearningRate 0.0086 Epoch: 16 Global Step: 174000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:22:30,344-[lfw][174000]XNorm: 23.097489 Training: 2022-01-09 10:22:30,345-[lfw][174000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 10:22:30,345-[lfw][174000]Accuracy-Highest: 0.99833 Training: 2022-01-09 10:23:22,116-[cfp_fp][174000]XNorm: 22.366889 Training: 2022-01-09 10:23:22,117-[cfp_fp][174000]Accuracy-Flip: 0.99300+-0.00364 Training: 2022-01-09 10:23:22,117-[cfp_fp][174000]Accuracy-Highest: 0.99371 Training: 2022-01-09 10:24:06,609-[agedb_30][174000]XNorm: 23.369141 Training: 2022-01-09 10:24:06,610-[agedb_30][174000]Accuracy-Flip: 0.98400+-0.00484 Training: 2022-01-09 10:24:06,610-[agedb_30][174000]Accuracy-Highest: 0.98433 Training: 2022-01-09 10:24:14,202-Speed 
276.04 samples/sec Loss 1.8180 LearningRate 0.0086 Epoch: 16 Global Step: 174010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:24:21,648-Speed 5501.23 samples/sec Loss 1.7896 LearningRate 0.0086 Epoch: 16 Global Step: 174020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:24:29,252-Speed 5387.35 samples/sec Loss 1.8389 LearningRate 0.0086 Epoch: 16 Global Step: 174030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:24:36,732-Speed 5476.78 samples/sec Loss 1.7926 LearningRate 0.0086 Epoch: 16 Global Step: 174040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:24:44,236-Speed 5459.27 samples/sec Loss 1.7933 LearningRate 0.0086 Epoch: 16 Global Step: 174050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:24:51,817-Speed 5403.40 samples/sec Loss 1.8024 LearningRate 0.0086 Epoch: 16 Global Step: 174060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:24:59,371-Speed 5423.41 samples/sec Loss 1.8198 LearningRate 0.0086 Epoch: 16 Global Step: 174070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:06,864-Speed 5467.31 samples/sec Loss 1.8213 LearningRate 0.0086 Epoch: 16 Global Step: 174080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:14,426-Speed 5417.77 samples/sec Loss 1.8393 LearningRate 0.0086 Epoch: 16 Global Step: 174090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:21,907-Speed 5475.21 samples/sec Loss 1.7604 LearningRate 0.0086 Epoch: 16 Global Step: 174100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:29,498-Speed 5396.78 samples/sec Loss 1.8315 LearningRate 0.0086 Epoch: 16 Global Step: 174110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:37,116-Speed 5377.78 samples/sec Loss 1.7939 LearningRate 0.0086 Epoch: 16 Global Step: 174120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:44,600-Speed 5473.56 samples/sec Loss 1.8154 LearningRate 
0.0085 Epoch: 16 Global Step: 174130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:52,134-Speed 5437.08 samples/sec Loss 1.8408 LearningRate 0.0085 Epoch: 16 Global Step: 174140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:25:59,683-Speed 5426.83 samples/sec Loss 1.7906 LearningRate 0.0085 Epoch: 16 Global Step: 174150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:26:07,189-Speed 5458.12 samples/sec Loss 1.8014 LearningRate 0.0085 Epoch: 16 Global Step: 174160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:26:14,680-Speed 5468.66 samples/sec Loss 1.8105 LearningRate 0.0085 Epoch: 16 Global Step: 174170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:26:22,189-Speed 5455.13 samples/sec Loss 1.8059 LearningRate 0.0085 Epoch: 16 Global Step: 174180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:26:29,726-Speed 5435.26 samples/sec Loss 1.8148 LearningRate 0.0085 Epoch: 16 Global Step: 174190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:26:37,255-Speed 5441.43 samples/sec Loss 1.7982 LearningRate 0.0085 Epoch: 16 Global Step: 174200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:26:44,714-Speed 5491.53 samples/sec Loss 1.8119 LearningRate 0.0085 Epoch: 16 Global Step: 174210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:26:52,288-Speed 5408.73 samples/sec Loss 1.7890 LearningRate 0.0085 Epoch: 16 Global Step: 174220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:26:59,800-Speed 5453.02 samples/sec Loss 1.8170 LearningRate 0.0085 Epoch: 16 Global Step: 174230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:27:07,356-Speed 5421.68 samples/sec Loss 1.8114 LearningRate 0.0085 Epoch: 16 Global Step: 174240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:27:14,865-Speed 5456.19 samples/sec Loss 1.7917 LearningRate 0.0085 Epoch: 16 Global Step: 174250 Fp16 
Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:27:22,344-Speed 5477.27 samples/sec Loss 1.8507 LearningRate 0.0085 Epoch: 16 Global Step: 174260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:27:29,888-Speed 5429.90 samples/sec Loss 1.7751 LearningRate 0.0085 Epoch: 16 Global Step: 174270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:27:37,528-Speed 5361.58 samples/sec Loss 1.7905 LearningRate 0.0085 Epoch: 16 Global Step: 174280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:27:45,302-Speed 5269.88 samples/sec Loss 1.8028 LearningRate 0.0085 Epoch: 16 Global Step: 174290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:27:53,062-Speed 5279.12 samples/sec Loss 1.7854 LearningRate 0.0085 Epoch: 16 Global Step: 174300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:28:00,733-Speed 5339.82 samples/sec Loss 1.7773 LearningRate 0.0085 Epoch: 16 Global Step: 174310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:28:08,328-Speed 5393.72 samples/sec Loss 1.7932 LearningRate 0.0084 Epoch: 16 Global Step: 174320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:28:16,052-Speed 5304.14 samples/sec Loss 1.7609 LearningRate 0.0084 Epoch: 16 Global Step: 174330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:28:23,714-Speed 5346.89 samples/sec Loss 1.8010 LearningRate 0.0084 Epoch: 16 Global Step: 174340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:28:31,306-Speed 5395.04 samples/sec Loss 1.7990 LearningRate 0.0084 Epoch: 16 Global Step: 174350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:28:38,851-Speed 5429.74 samples/sec Loss 1.7837 LearningRate 0.0084 Epoch: 16 Global Step: 174360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:28:46,374-Speed 5445.80 samples/sec Loss 1.8038 LearningRate 0.0084 Epoch: 16 Global Step: 174370 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-01-09 10:28:53,885-Speed 5453.85 samples/sec Loss 1.7973 LearningRate 0.0084 Epoch: 16 Global Step: 174380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:29:01,424-Speed 5433.78 samples/sec Loss 1.7999 LearningRate 0.0084 Epoch: 16 Global Step: 174390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:29:08,984-Speed 5418.61 samples/sec Loss 1.8023 LearningRate 0.0084 Epoch: 16 Global Step: 174400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:29:16,509-Speed 5444.36 samples/sec Loss 1.7888 LearningRate 0.0084 Epoch: 16 Global Step: 174410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:29:24,010-Speed 5461.16 samples/sec Loss 1.7957 LearningRate 0.0084 Epoch: 16 Global Step: 174420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:29:31,496-Speed 5471.61 samples/sec Loss 1.7551 LearningRate 0.0084 Epoch: 16 Global Step: 174430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:29:39,266-Speed 5272.20 samples/sec Loss 1.8252 LearningRate 0.0084 Epoch: 16 Global Step: 174440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:29:46,835-Speed 5412.51 samples/sec Loss 1.7962 LearningRate 0.0084 Epoch: 16 Global Step: 174450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:29:54,295-Speed 5491.07 samples/sec Loss 1.7636 LearningRate 0.0084 Epoch: 16 Global Step: 174460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:01,831-Speed 5436.13 samples/sec Loss 1.8236 LearningRate 0.0084 Epoch: 16 Global Step: 174470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:09,363-Speed 5438.60 samples/sec Loss 1.7867 LearningRate 0.0084 Epoch: 16 Global Step: 174480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:16,847-Speed 5473.94 samples/sec Loss 1.7743 LearningRate 0.0084 Epoch: 16 Global Step: 174490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:24,340-Speed 
5466.94 samples/sec Loss 1.7711 LearningRate 0.0084 Epoch: 16 Global Step: 174500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:31,836-Speed 5464.72 samples/sec Loss 1.7881 LearningRate 0.0084 Epoch: 16 Global Step: 174510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:39,453-Speed 5378.78 samples/sec Loss 1.7908 LearningRate 0.0083 Epoch: 16 Global Step: 174520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:46,890-Speed 5508.55 samples/sec Loss 1.8044 LearningRate 0.0083 Epoch: 16 Global Step: 174530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:30:54,383-Speed 5467.26 samples/sec Loss 1.7900 LearningRate 0.0083 Epoch: 16 Global Step: 174540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:02,071-Speed 5328.18 samples/sec Loss 1.7710 LearningRate 0.0083 Epoch: 16 Global Step: 174550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:09,659-Speed 5398.65 samples/sec Loss 1.7598 LearningRate 0.0083 Epoch: 16 Global Step: 174560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:17,173-Speed 5452.55 samples/sec Loss 1.7748 LearningRate 0.0083 Epoch: 16 Global Step: 174570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:24,831-Speed 5349.03 samples/sec Loss 1.8093 LearningRate 0.0083 Epoch: 16 Global Step: 174580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:32,291-Speed 5491.24 samples/sec Loss 1.7855 LearningRate 0.0083 Epoch: 16 Global Step: 174590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:39,778-Speed 5471.93 samples/sec Loss 1.7889 LearningRate 0.0083 Epoch: 16 Global Step: 174600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-09 10:31:47,278-Speed 5462.32 samples/sec Loss 1.7902 LearningRate 0.0083 Epoch: 16 Global Step: 174610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:31:54,778-Speed 5461.73 samples/sec Loss 1.7765 
LearningRate 0.0083 Epoch: 16 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:32:02,303-Speed 5443.60 samples/sec Loss 1.7530 LearningRate 0.0083 Epoch: 16 Global Step: 174630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:32:09,918-Speed 5379.77 samples/sec Loss 1.7676 LearningRate 0.0083 Epoch: 16 Global Step: 174640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-09 10:32:17,445-Speed 5442.30 samples/sec Loss 1.7818 LearningRate 0.0083 Epoch: 16 Global Step: 174650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:32:24,976-Speed 5439.27 samples/sec Loss 1.7690 LearningRate 0.0083 Epoch: 16 Global Step: 174660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:32:32,670-Speed 5324.73 samples/sec Loss 1.8112 LearningRate 0.0083 Epoch: 16 Global Step: 174670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:32:40,147-Speed 5478.81 samples/sec Loss 1.8018 LearningRate 0.0083 Epoch: 16 Global Step: 174680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:32:47,669-Speed 5445.92 samples/sec Loss 1.7931 LearningRate 0.0083 Epoch: 16 Global Step: 174690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:32:55,165-Speed 5464.80 samples/sec Loss 1.7614 LearningRate 0.0083 Epoch: 16 Global Step: 174700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:02,616-Speed 5498.44 samples/sec Loss 1.8115 LearningRate 0.0082 Epoch: 16 Global Step: 174710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:10,198-Speed 5402.87 samples/sec Loss 1.7688 LearningRate 0.0082 Epoch: 16 Global Step: 174720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:17,726-Speed 5441.74 samples/sec Loss 1.7761 LearningRate 0.0082 Epoch: 16 Global Step: 174730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:25,230-Speed 5458.95 samples/sec Loss 1.7923 LearningRate 0.0082 Epoch: 16 Global Step: 
174740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:32,779-Speed 5426.38 samples/sec Loss 1.7902 LearningRate 0.0082 Epoch: 16 Global Step: 174750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:40,312-Speed 5438.40 samples/sec Loss 1.7900 LearningRate 0.0082 Epoch: 16 Global Step: 174760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:47,824-Speed 5452.91 samples/sec Loss 1.8205 LearningRate 0.0082 Epoch: 16 Global Step: 174770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:33:55,381-Speed 5420.89 samples/sec Loss 1.7993 LearningRate 0.0082 Epoch: 16 Global Step: 174780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:34:02,900-Speed 5448.75 samples/sec Loss 1.7928 LearningRate 0.0082 Epoch: 16 Global Step: 174790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:10,361-Speed 5490.01 samples/sec Loss 1.7983 LearningRate 0.0082 Epoch: 16 Global Step: 174800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:17,892-Speed 5440.27 samples/sec Loss 1.7428 LearningRate 0.0082 Epoch: 16 Global Step: 174810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:25,424-Speed 5438.41 samples/sec Loss 1.7809 LearningRate 0.0082 Epoch: 16 Global Step: 174820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:33,032-Speed 5384.71 samples/sec Loss 1.7552 LearningRate 0.0082 Epoch: 16 Global Step: 174830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:40,557-Speed 5443.44 samples/sec Loss 1.7862 LearningRate 0.0082 Epoch: 16 Global Step: 174840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:48,505-Speed 5154.17 samples/sec Loss 1.7752 LearningRate 0.0082 Epoch: 16 Global Step: 174850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:34:56,088-Speed 5402.38 samples/sec Loss 1.7562 LearningRate 0.0082 Epoch: 16 Global Step: 174860 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-01-09 10:35:03,608-Speed 5447.68 samples/sec Loss 1.7813 LearningRate 0.0082 Epoch: 16 Global Step: 174870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:35:11,167-Speed 5419.07 samples/sec Loss 1.7638 LearningRate 0.0082 Epoch: 16 Global Step: 174880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:35:18,685-Speed 5448.71 samples/sec Loss 1.7755 LearningRate 0.0082 Epoch: 16 Global Step: 174890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:35:26,222-Speed 5436.00 samples/sec Loss 1.7707 LearningRate 0.0082 Epoch: 16 Global Step: 174900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:35:33,787-Speed 5414.51 samples/sec Loss 1.7571 LearningRate 0.0081 Epoch: 16 Global Step: 174910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:35:41,422-Speed 5365.78 samples/sec Loss 1.7439 LearningRate 0.0081 Epoch: 16 Global Step: 174920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:35:48,947-Speed 5444.48 samples/sec Loss 1.7475 LearningRate 0.0081 Epoch: 16 Global Step: 174930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:35:56,590-Speed 5359.86 samples/sec Loss 1.7863 LearningRate 0.0081 Epoch: 16 Global Step: 174940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:36:04,168-Speed 5405.95 samples/sec Loss 1.7868 LearningRate 0.0081 Epoch: 16 Global Step: 174950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:36:11,793-Speed 5372.54 samples/sec Loss 1.7416 LearningRate 0.0081 Epoch: 16 Global Step: 174960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:36:19,302-Speed 5455.57 samples/sec Loss 1.7814 LearningRate 0.0081 Epoch: 16 Global Step: 174970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:36:26,842-Speed 5432.64 samples/sec Loss 1.8023 LearningRate 0.0081 Epoch: 16 Global Step: 174980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 
10:36:34,483-Speed 5361.66 samples/sec Loss 1.7966 LearningRate 0.0081 Epoch: 16 Global Step: 174990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:36:42,045-Speed 5416.81 samples/sec Loss 1.7565 LearningRate 0.0081 Epoch: 16 Global Step: 175000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:36:49,709-Speed 5345.38 samples/sec Loss 1.7665 LearningRate 0.0081 Epoch: 16 Global Step: 175010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:36:57,302-Speed 5395.25 samples/sec Loss 1.7502 LearningRate 0.0081 Epoch: 16 Global Step: 175020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:04,899-Speed 5391.93 samples/sec Loss 1.7646 LearningRate 0.0081 Epoch: 16 Global Step: 175030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:12,493-Speed 5394.25 samples/sec Loss 1.7911 LearningRate 0.0081 Epoch: 16 Global Step: 175040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:19,977-Speed 5473.88 samples/sec Loss 1.7670 LearningRate 0.0081 Epoch: 16 Global Step: 175050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:27,515-Speed 5434.68 samples/sec Loss 1.7677 LearningRate 0.0081 Epoch: 16 Global Step: 175060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:35,137-Speed 5374.52 samples/sec Loss 1.7544 LearningRate 0.0081 Epoch: 16 Global Step: 175070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:42,753-Speed 5378.17 samples/sec Loss 1.7681 LearningRate 0.0081 Epoch: 16 Global Step: 175080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:50,359-Speed 5386.61 samples/sec Loss 1.7422 LearningRate 0.0081 Epoch: 16 Global Step: 175090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:37:57,995-Speed 5364.48 samples/sec Loss 1.7778 LearningRate 0.0081 Epoch: 16 Global Step: 175100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:05,579-Speed 5401.59 samples/sec Loss 
1.7747 LearningRate 0.0080 Epoch: 16 Global Step: 175110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:38:13,169-Speed 5396.87 samples/sec Loss 1.7409 LearningRate 0.0080 Epoch: 16 Global Step: 175120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:20,743-Speed 5408.72 samples/sec Loss 1.7836 LearningRate 0.0080 Epoch: 16 Global Step: 175130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:28,291-Speed 5427.53 samples/sec Loss 1.7648 LearningRate 0.0080 Epoch: 16 Global Step: 175140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:35,780-Speed 5469.75 samples/sec Loss 1.7626 LearningRate 0.0080 Epoch: 16 Global Step: 175150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:43,420-Speed 5362.21 samples/sec Loss 1.7577 LearningRate 0.0080 Epoch: 16 Global Step: 175160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:51,073-Speed 5353.04 samples/sec Loss 1.7710 LearningRate 0.0080 Epoch: 16 Global Step: 175170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:38:58,621-Speed 5427.26 samples/sec Loss 1.7790 LearningRate 0.0080 Epoch: 16 Global Step: 175180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:39:06,117-Speed 5464.75 samples/sec Loss 1.7853 LearningRate 0.0080 Epoch: 16 Global Step: 175190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:39:13,764-Speed 5356.94 samples/sec Loss 1.7567 LearningRate 0.0080 Epoch: 16 Global Step: 175200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:39:21,306-Speed 5432.04 samples/sec Loss 1.7517 LearningRate 0.0080 Epoch: 16 Global Step: 175210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:39:28,883-Speed 5406.38 samples/sec Loss 1.7440 LearningRate 0.0080 Epoch: 16 Global Step: 175220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:39:36,414-Speed 5439.43 samples/sec Loss 1.7744 LearningRate 0.0080 Epoch: 16 Global 
Step: 175230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:39:44,009-Speed 5394.00 samples/sec Loss 1.7538 LearningRate 0.0080 Epoch: 16 Global Step: 175240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:39:51,642-Speed 5366.48 samples/sec Loss 1.7324 LearningRate 0.0080 Epoch: 16 Global Step: 175250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:39:59,149-Speed 5457.43 samples/sec Loss 1.7445 LearningRate 0.0080 Epoch: 16 Global Step: 175260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:40:06,878-Speed 5300.01 samples/sec Loss 1.7490 LearningRate 0.0080 Epoch: 16 Global Step: 175270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:40:14,486-Speed 5384.70 samples/sec Loss 1.7765 LearningRate 0.0080 Epoch: 16 Global Step: 175280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:40:22,056-Speed 5411.67 samples/sec Loss 1.7673 LearningRate 0.0080 Epoch: 16 Global Step: 175290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:40:29,599-Speed 5430.78 samples/sec Loss 1.7582 LearningRate 0.0080 Epoch: 16 Global Step: 175300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:40:37,079-Speed 5476.78 samples/sec Loss 1.7696 LearningRate 0.0079 Epoch: 16 Global Step: 175310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:40:44,651-Speed 5410.04 samples/sec Loss 1.7747 LearningRate 0.0079 Epoch: 16 Global Step: 175320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:40:52,129-Speed 5477.80 samples/sec Loss 1.7622 LearningRate 0.0079 Epoch: 16 Global Step: 175330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:40:59,617-Speed 5470.97 samples/sec Loss 1.7237 LearningRate 0.0079 Epoch: 16 Global Step: 175340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:41:07,220-Speed 5388.22 samples/sec Loss 1.7293 LearningRate 0.0079 Epoch: 16 Global Step: 175350 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-01-09 10:41:14,843-Speed 5373.84 samples/sec Loss 1.7405 LearningRate 0.0079 Epoch: 16 Global Step: 175360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:41:22,607-Speed 5275.59 samples/sec Loss 1.7545 LearningRate 0.0079 Epoch: 16 Global Step: 175370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:41:30,131-Speed 5445.51 samples/sec Loss 1.7367 LearningRate 0.0079 Epoch: 16 Global Step: 175380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:41:37,645-Speed 5451.74 samples/sec Loss 1.7698 LearningRate 0.0079 Epoch: 16 Global Step: 175390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:41:45,193-Speed 5426.76 samples/sec Loss 1.7594 LearningRate 0.0079 Epoch: 16 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:41:52,799-Speed 5385.64 samples/sec Loss 1.7495 LearningRate 0.0079 Epoch: 16 Global Step: 175410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:00,283-Speed 5474.04 samples/sec Loss 1.7428 LearningRate 0.0079 Epoch: 16 Global Step: 175420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:07,758-Speed 5480.53 samples/sec Loss 1.7392 LearningRate 0.0079 Epoch: 16 Global Step: 175430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:15,266-Speed 5456.05 samples/sec Loss 1.7458 LearningRate 0.0079 Epoch: 16 Global Step: 175440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:22,819-Speed 5423.36 samples/sec Loss 1.7591 LearningRate 0.0079 Epoch: 16 Global Step: 175450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:30,397-Speed 5406.55 samples/sec Loss 1.7204 LearningRate 0.0079 Epoch: 16 Global Step: 175460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:37,918-Speed 5446.85 samples/sec Loss 1.7431 LearningRate 0.0079 Epoch: 16 Global Step: 175470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 
10:42:45,358-Speed 5505.28 samples/sec Loss 1.7229 LearningRate 0.0079 Epoch: 16 Global Step: 175480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:42:52,874-Speed 5450.08 samples/sec Loss 1.7426 LearningRate 0.0079 Epoch: 16 Global Step: 175490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:43:00,503-Speed 5370.18 samples/sec Loss 1.7385 LearningRate 0.0079 Epoch: 16 Global Step: 175500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:43:08,038-Speed 5436.82 samples/sec Loss 1.7575 LearningRate 0.0079 Epoch: 16 Global Step: 175510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 10:43:15,585-Speed 5428.16 samples/sec Loss 1.7260 LearningRate 0.0078 Epoch: 16 Global Step: 175520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-09 10:43:23,202-Speed 5377.46 samples/sec Loss 1.7582 LearningRate 0.0078 Epoch: 16 Global Step: 175530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:43:30,679-Speed 5479.05 samples/sec Loss 1.7314 LearningRate 0.0078 Epoch: 16 Global Step: 175540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:43:38,158-Speed 5477.70 samples/sec Loss 1.7331 LearningRate 0.0078 Epoch: 16 Global Step: 175550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:43:45,669-Speed 5453.89 samples/sec Loss 1.7525 LearningRate 0.0078 Epoch: 16 Global Step: 175560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:43:53,238-Speed 5411.60 samples/sec Loss 1.7275 LearningRate 0.0078 Epoch: 16 Global Step: 175570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:44:00,744-Speed 5458.31 samples/sec Loss 1.7451 LearningRate 0.0078 Epoch: 16 Global Step: 175580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:44:08,249-Speed 5458.73 samples/sec Loss 1.7328 LearningRate 0.0078 Epoch: 16 Global Step: 175590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:44:15,829-Speed 5403.86 samples/sec 
Loss 1.7648 LearningRate 0.0078 Epoch: 16 Global Step: 175600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:44:23,298-Speed 5484.45 samples/sec Loss 1.7473 LearningRate 0.0078 Epoch: 16 Global Step: 175610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:44:30,789-Speed 5469.10 samples/sec Loss 1.7303 LearningRate 0.0078 Epoch: 16 Global Step: 175620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:44:38,356-Speed 5413.52 samples/sec Loss 1.7624 LearningRate 0.0078 Epoch: 16 Global Step: 175630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:44:45,907-Speed 5424.70 samples/sec Loss 1.7483 LearningRate 0.0078 Epoch: 16 Global Step: 175640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:44:53,421-Speed 5452.00 samples/sec Loss 1.7422 LearningRate 0.0078 Epoch: 16 Global Step: 175650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:00,923-Speed 5460.75 samples/sec Loss 1.7495 LearningRate 0.0078 Epoch: 16 Global Step: 175660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:08,510-Speed 5399.84 samples/sec Loss 1.7492 LearningRate 0.0078 Epoch: 16 Global Step: 175670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:16,071-Speed 5417.14 samples/sec Loss 1.7595 LearningRate 0.0078 Epoch: 16 Global Step: 175680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:23,602-Speed 5439.48 samples/sec Loss 1.7466 LearningRate 0.0078 Epoch: 16 Global Step: 175690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:31,152-Speed 5426.10 samples/sec Loss 1.7349 LearningRate 0.0078 Epoch: 16 Global Step: 175700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:38,664-Speed 5453.81 samples/sec Loss 1.7097 LearningRate 0.0078 Epoch: 16 Global Step: 175710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:46,122-Speed 5492.59 samples/sec Loss 1.7477 LearningRate 0.0077 Epoch: 16 
Global Step: 175720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:45:53,734-Speed 5382.03 samples/sec Loss 1.7384 LearningRate 0.0077 Epoch: 16 Global Step: 175730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:46:01,283-Speed 5426.01 samples/sec Loss 1.7176 LearningRate 0.0077 Epoch: 16 Global Step: 175740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:46:08,969-Speed 5330.59 samples/sec Loss 1.7406 LearningRate 0.0077 Epoch: 16 Global Step: 175750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:46:16,524-Speed 5422.88 samples/sec Loss 1.7097 LearningRate 0.0077 Epoch: 16 Global Step: 175760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:46:24,100-Speed 5406.39 samples/sec Loss 1.7278 LearningRate 0.0077 Epoch: 16 Global Step: 175770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:46:31,535-Speed 5509.87 samples/sec Loss 1.7547 LearningRate 0.0077 Epoch: 16 Global Step: 175780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:46:39,086-Speed 5426.10 samples/sec Loss 1.7384 LearningRate 0.0077 Epoch: 16 Global Step: 175790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:46:46,666-Speed 5403.93 samples/sec Loss 1.7453 LearningRate 0.0077 Epoch: 16 Global Step: 175800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:46:54,154-Speed 5470.95 samples/sec Loss 1.7260 LearningRate 0.0077 Epoch: 16 Global Step: 175810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:01,671-Speed 5449.33 samples/sec Loss 1.7370 LearningRate 0.0077 Epoch: 16 Global Step: 175820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:09,233-Speed 5417.94 samples/sec Loss 1.7279 LearningRate 0.0077 Epoch: 16 Global Step: 175830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:16,668-Speed 5509.74 samples/sec Loss 1.7560 LearningRate 0.0077 Epoch: 16 Global Step: 175840 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-01-09 10:47:24,146-Speed 5477.82 samples/sec Loss 1.7088 LearningRate 0.0077 Epoch: 16 Global Step: 175850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:31,610-Speed 5488.35 samples/sec Loss 1.7078 LearningRate 0.0077 Epoch: 16 Global Step: 175860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:39,075-Speed 5488.43 samples/sec Loss 1.6979 LearningRate 0.0077 Epoch: 16 Global Step: 175870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:46,645-Speed 5411.77 samples/sec Loss 1.7620 LearningRate 0.0077 Epoch: 16 Global Step: 175880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:47:54,079-Speed 5510.14 samples/sec Loss 1.6937 LearningRate 0.0077 Epoch: 16 Global Step: 175890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:01,691-Speed 5381.33 samples/sec Loss 1.7746 LearningRate 0.0077 Epoch: 16 Global Step: 175900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:09,183-Speed 5468.38 samples/sec Loss 1.7078 LearningRate 0.0077 Epoch: 16 Global Step: 175910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:16,617-Speed 5511.05 samples/sec Loss 1.7296 LearningRate 0.0076 Epoch: 16 Global Step: 175920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:24,144-Speed 5441.99 samples/sec Loss 1.7379 LearningRate 0.0076 Epoch: 16 Global Step: 175930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:31,678-Speed 5437.12 samples/sec Loss 1.7263 LearningRate 0.0076 Epoch: 16 Global Step: 175940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:39,194-Speed 5451.13 samples/sec Loss 1.7214 LearningRate 0.0076 Epoch: 16 Global Step: 175950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:48:46,739-Speed 5429.67 samples/sec Loss 1.7189 LearningRate 0.0076 Epoch: 16 Global Step: 175960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 
10:48:54,333-Speed 5393.84 samples/sec Loss 1.7133 LearningRate 0.0076 Epoch: 16 Global Step: 175970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:49:01,795-Speed 5490.00 samples/sec Loss 1.7091 LearningRate 0.0076 Epoch: 16 Global Step: 175980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:49:09,292-Speed 5464.40 samples/sec Loss 1.7324 LearningRate 0.0076 Epoch: 16 Global Step: 175990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:49:16,805-Speed 5453.21 samples/sec Loss 1.7108 LearningRate 0.0076 Epoch: 16 Global Step: 176000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:50:00,680-[lfw][176000]XNorm: 23.440188 Training: 2022-01-09 10:50:00,681-[lfw][176000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-01-09 10:50:00,681-[lfw][176000]Accuracy-Highest: 0.99833 Training: 2022-01-09 10:50:51,874-[cfp_fp][176000]XNorm: 22.570946 Training: 2022-01-09 10:50:51,874-[cfp_fp][176000]Accuracy-Flip: 0.99300+-0.00431 Training: 2022-01-09 10:50:51,875-[cfp_fp][176000]Accuracy-Highest: 0.99371 Training: 2022-01-09 10:51:35,833-[agedb_30][176000]XNorm: 23.694088 Training: 2022-01-09 10:51:35,834-[agedb_30][176000]Accuracy-Flip: 0.98333+-0.00654 Training: 2022-01-09 10:51:35,834-[agedb_30][176000]Accuracy-Highest: 0.98433 Training: 2022-01-09 10:51:43,414-Speed 279.38 samples/sec Loss 1.7228 LearningRate 0.0076 Epoch: 16 Global Step: 176010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:51:50,879-Speed 5487.53 samples/sec Loss 1.7199 LearningRate 0.0076 Epoch: 16 Global Step: 176020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:51:58,384-Speed 5458.07 samples/sec Loss 1.7259 LearningRate 0.0076 Epoch: 16 Global Step: 176030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:05,958-Speed 5409.03 samples/sec Loss 1.7140 LearningRate 0.0076 Epoch: 16 Global Step: 176040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:13,453-Speed 
5465.85 samples/sec Loss 1.7026 LearningRate 0.0076 Epoch: 16 Global Step: 176050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:21,004-Speed 5425.14 samples/sec Loss 1.6922 LearningRate 0.0076 Epoch: 16 Global Step: 176060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:28,564-Speed 5418.18 samples/sec Loss 1.7109 LearningRate 0.0076 Epoch: 16 Global Step: 176070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:36,064-Speed 5462.65 samples/sec Loss 1.7116 LearningRate 0.0076 Epoch: 16 Global Step: 176080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:43,590-Speed 5443.09 samples/sec Loss 1.7314 LearningRate 0.0076 Epoch: 16 Global Step: 176090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:52:51,263-Speed 5339.07 samples/sec Loss 1.7416 LearningRate 0.0076 Epoch: 16 Global Step: 176100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:52:58,783-Speed 5447.28 samples/sec Loss 1.7159 LearningRate 0.0076 Epoch: 16 Global Step: 176110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:53:06,241-Speed 5492.29 samples/sec Loss 1.7595 LearningRate 0.0076 Epoch: 16 Global Step: 176120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:53:13,801-Speed 5419.08 samples/sec Loss 1.7343 LearningRate 0.0075 Epoch: 16 Global Step: 176130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:53:21,327-Speed 5443.13 samples/sec Loss 1.6993 LearningRate 0.0075 Epoch: 16 Global Step: 176140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:53:28,881-Speed 5422.83 samples/sec Loss 1.7546 LearningRate 0.0075 Epoch: 16 Global Step: 176150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:53:36,401-Speed 5447.63 samples/sec Loss 1.7170 LearningRate 0.0075 Epoch: 16 Global Step: 176160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:53:44,057-Speed 5350.60 samples/sec Loss 1.7430 
LearningRate 0.0075 Epoch: 16 Global Step: 176170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:53:51,587-Speed 5440.71 samples/sec Loss 1.7393 LearningRate 0.0075 Epoch: 16 Global Step: 176180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:53:59,157-Speed 5411.54 samples/sec Loss 1.7388 LearningRate 0.0075 Epoch: 16 Global Step: 176190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:06,669-Speed 5452.84 samples/sec Loss 1.7148 LearningRate 0.0075 Epoch: 16 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:14,209-Speed 5433.36 samples/sec Loss 1.6912 LearningRate 0.0075 Epoch: 16 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:21,747-Speed 5434.24 samples/sec Loss 1.7338 LearningRate 0.0075 Epoch: 16 Global Step: 176220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:29,261-Speed 5451.78 samples/sec Loss 1.7174 LearningRate 0.0075 Epoch: 16 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:36,872-Speed 5382.37 samples/sec Loss 1.7074 LearningRate 0.0075 Epoch: 16 Global Step: 176240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:44,428-Speed 5421.74 samples/sec Loss 1.7047 LearningRate 0.0075 Epoch: 16 Global Step: 176250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:54:52,009-Speed 5403.58 samples/sec Loss 1.7274 LearningRate 0.0075 Epoch: 16 Global Step: 176260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 10:54:59,404-Speed 5539.26 samples/sec Loss 1.7173 LearningRate 0.0075 Epoch: 16 Global Step: 176270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:55:06,873-Speed 5484.77 samples/sec Loss 1.7401 LearningRate 0.0075 Epoch: 16 Global Step: 176280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:55:29,458-Speed 1813.71 samples/sec Loss 1.6794 LearningRate 0.0075 Epoch: 17 Global Step: 
176290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:55:36,903-Speed 5502.31 samples/sec Loss 1.7063 LearningRate 0.0075 Epoch: 17 Global Step: 176300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:55:44,371-Speed 5485.58 samples/sec Loss 1.7185 LearningRate 0.0075 Epoch: 17 Global Step: 176310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:55:51,872-Speed 5461.52 samples/sec Loss 1.7203 LearningRate 0.0075 Epoch: 17 Global Step: 176320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:55:59,340-Speed 5484.91 samples/sec Loss 1.6664 LearningRate 0.0075 Epoch: 17 Global Step: 176330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:56:06,860-Speed 5447.82 samples/sec Loss 1.6837 LearningRate 0.0074 Epoch: 17 Global Step: 176340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:14,287-Speed 5515.75 samples/sec Loss 1.6731 LearningRate 0.0074 Epoch: 17 Global Step: 176350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:21,688-Speed 5535.30 samples/sec Loss 1.7235 LearningRate 0.0074 Epoch: 17 Global Step: 176360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:29,138-Speed 5497.83 samples/sec Loss 1.6918 LearningRate 0.0074 Epoch: 17 Global Step: 176370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:36,700-Speed 5418.18 samples/sec Loss 1.7223 LearningRate 0.0074 Epoch: 17 Global Step: 176380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:44,159-Speed 5492.25 samples/sec Loss 1.6944 LearningRate 0.0074 Epoch: 17 Global Step: 176390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:51,612-Speed 5496.81 samples/sec Loss 1.7073 LearningRate 0.0074 Epoch: 17 Global Step: 176400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:56:59,038-Speed 5515.85 samples/sec Loss 1.7267 LearningRate 0.0074 Epoch: 17 Global Step: 176410 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-01-09 10:57:06,523-Speed 5473.28 samples/sec Loss 1.6989 LearningRate 0.0074 Epoch: 17 Global Step: 176420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:57:14,118-Speed 5393.86 samples/sec Loss 1.6944 LearningRate 0.0074 Epoch: 17 Global Step: 176430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:57:21,834-Speed 5309.14 samples/sec Loss 1.6814 LearningRate 0.0074 Epoch: 17 Global Step: 176440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:57:29,442-Speed 5383.96 samples/sec Loss 1.6816 LearningRate 0.0074 Epoch: 17 Global Step: 176450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:57:37,075-Speed 5366.95 samples/sec Loss 1.6841 LearningRate 0.0074 Epoch: 17 Global Step: 176460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:57:44,753-Speed 5335.24 samples/sec Loss 1.6780 LearningRate 0.0074 Epoch: 17 Global Step: 176470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:57:52,390-Speed 5364.37 samples/sec Loss 1.6845 LearningRate 0.0074 Epoch: 17 Global Step: 176480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 10:58:00,057-Speed 5343.07 samples/sec Loss 1.6782 LearningRate 0.0074 Epoch: 17 Global Step: 176490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:58:07,752-Speed 5323.41 samples/sec Loss 1.7011 LearningRate 0.0074 Epoch: 17 Global Step: 176500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:58:15,491-Speed 5293.98 samples/sec Loss 1.7188 LearningRate 0.0074 Epoch: 17 Global Step: 176510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:58:23,159-Speed 5342.32 samples/sec Loss 1.6771 LearningRate 0.0074 Epoch: 17 Global Step: 176520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:58:30,791-Speed 5367.42 samples/sec Loss 1.7081 LearningRate 0.0074 Epoch: 17 Global Step: 176530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 
10:58:38,451-Speed 5347.87 samples/sec Loss 1.6765 LearningRate 0.0074 Epoch: 17 Global Step: 176540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:58:45,905-Speed 5495.68 samples/sec Loss 1.7187 LearningRate 0.0073 Epoch: 17 Global Step: 176550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:58:53,545-Speed 5361.92 samples/sec Loss 1.6947 LearningRate 0.0073 Epoch: 17 Global Step: 176560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:01,049-Speed 5459.44 samples/sec Loss 1.6800 LearningRate 0.0073 Epoch: 17 Global Step: 176570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:08,553-Speed 5459.20 samples/sec Loss 1.6850 LearningRate 0.0073 Epoch: 17 Global Step: 176580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:16,093-Speed 5433.15 samples/sec Loss 1.6811 LearningRate 0.0073 Epoch: 17 Global Step: 176590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:23,905-Speed 5243.68 samples/sec Loss 1.7103 LearningRate 0.0073 Epoch: 17 Global Step: 176600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:31,360-Speed 5495.37 samples/sec Loss 1.6766 LearningRate 0.0073 Epoch: 17 Global Step: 176610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:38,806-Speed 5501.73 samples/sec Loss 1.6957 LearningRate 0.0073 Epoch: 17 Global Step: 176620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:46,312-Speed 5458.12 samples/sec Loss 1.7022 LearningRate 0.0073 Epoch: 17 Global Step: 176630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 10:59:53,789-Speed 5478.40 samples/sec Loss 1.6931 LearningRate 0.0073 Epoch: 17 Global Step: 176640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:00:01,206-Speed 5523.20 samples/sec Loss 1.6976 LearningRate 0.0073 Epoch: 17 Global Step: 176650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:00:08,722-Speed 5451.06 samples/sec Loss 
1.6719 LearningRate 0.0073 Epoch: 17 Global Step: 176660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:00:16,291-Speed 5412.10 samples/sec Loss 1.6387 LearningRate 0.0073 Epoch: 17 Global Step: 176670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:00:23,728-Speed 5508.56 samples/sec Loss 1.6692 LearningRate 0.0073 Epoch: 17 Global Step: 176680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:00:31,168-Speed 5506.07 samples/sec Loss 1.6860 LearningRate 0.0073 Epoch: 17 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:00:38,759-Speed 5396.65 samples/sec Loss 1.6830 LearningRate 0.0073 Epoch: 17 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:00:46,179-Speed 5521.37 samples/sec Loss 1.6853 LearningRate 0.0073 Epoch: 17 Global Step: 176710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:00:53,613-Speed 5510.18 samples/sec Loss 1.6625 LearningRate 0.0073 Epoch: 17 Global Step: 176720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:01,047-Speed 5510.66 samples/sec Loss 1.6875 LearningRate 0.0073 Epoch: 17 Global Step: 176730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:08,557-Speed 5454.75 samples/sec Loss 1.6758 LearningRate 0.0073 Epoch: 17 Global Step: 176740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:16,063-Speed 5457.89 samples/sec Loss 1.6588 LearningRate 0.0073 Epoch: 17 Global Step: 176750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:23,592-Speed 5440.85 samples/sec Loss 1.6753 LearningRate 0.0072 Epoch: 17 Global Step: 176760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:31,146-Speed 5423.20 samples/sec Loss 1.6675 LearningRate 0.0072 Epoch: 17 Global Step: 176770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:38,798-Speed 5353.13 samples/sec Loss 1.6676 LearningRate 0.0072 Epoch: 17 Global 
Step: 176780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:01:46,334-Speed 5436.85 samples/sec Loss 1.6746 LearningRate 0.0072 Epoch: 17 Global Step: 176790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:01:53,796-Speed 5489.59 samples/sec Loss 1.6809 LearningRate 0.0072 Epoch: 17 Global Step: 176800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:01,290-Speed 5466.30 samples/sec Loss 1.6819 LearningRate 0.0072 Epoch: 17 Global Step: 176810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:08,869-Speed 5404.83 samples/sec Loss 1.6741 LearningRate 0.0072 Epoch: 17 Global Step: 176820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:16,393-Speed 5445.01 samples/sec Loss 1.6773 LearningRate 0.0072 Epoch: 17 Global Step: 176830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:23,923-Speed 5440.30 samples/sec Loss 1.6913 LearningRate 0.0072 Epoch: 17 Global Step: 176840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:31,424-Speed 5460.66 samples/sec Loss 1.6893 LearningRate 0.0072 Epoch: 17 Global Step: 176850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:38,955-Speed 5440.12 samples/sec Loss 1.6857 LearningRate 0.0072 Epoch: 17 Global Step: 176860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:46,483-Speed 5441.38 samples/sec Loss 1.6892 LearningRate 0.0072 Epoch: 17 Global Step: 176870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:02:54,004-Speed 5447.12 samples/sec Loss 1.6781 LearningRate 0.0072 Epoch: 17 Global Step: 176880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:03:01,557-Speed 5423.53 samples/sec Loss 1.7021 LearningRate 0.0072 Epoch: 17 Global Step: 176890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:03:09,073-Speed 5450.44 samples/sec Loss 1.6576 LearningRate 0.0072 Epoch: 17 Global Step: 176900 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-09 11:03:16,544-Speed 5483.29 samples/sec Loss 1.6515 LearningRate 0.0072 Epoch: 17 Global Step: 176910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:03:24,001-Speed 5493.53 samples/sec Loss 1.7008 LearningRate 0.0072 Epoch: 17 Global Step: 176920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:03:31,407-Speed 5531.42 samples/sec Loss 1.6975 LearningRate 0.0072 Epoch: 17 Global Step: 176930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:03:38,865-Speed 5492.90 samples/sec Loss 1.6776 LearningRate 0.0072 Epoch: 17 Global Step: 176940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:03:46,308-Speed 5504.25 samples/sec Loss 1.6767 LearningRate 0.0072 Epoch: 17 Global Step: 176950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:03:53,794-Speed 5472.06 samples/sec Loss 1.6847 LearningRate 0.0072 Epoch: 17 Global Step: 176960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:04:01,303-Speed 5455.18 samples/sec Loss 1.6773 LearningRate 0.0071 Epoch: 17 Global Step: 176970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:04:08,788-Speed 5473.22 samples/sec Loss 1.6711 LearningRate 0.0071 Epoch: 17 Global Step: 176980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:04:16,318-Speed 5440.25 samples/sec Loss 1.6396 LearningRate 0.0071 Epoch: 17 Global Step: 176990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:04:23,965-Speed 5357.31 samples/sec Loss 1.6933 LearningRate 0.0071 Epoch: 17 Global Step: 177000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:04:31,473-Speed 5456.10 samples/sec Loss 1.6905 LearningRate 0.0071 Epoch: 17 Global Step: 177010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:04:39,051-Speed 5405.67 samples/sec Loss 1.6321 LearningRate 0.0071 Epoch: 17 Global Step: 177020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 
11:04:46,546-Speed 5465.95 samples/sec Loss 1.6572 LearningRate 0.0071 Epoch: 17 Global Step: 177030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:04:54,075-Speed 5440.93 samples/sec Loss 1.6874 LearningRate 0.0071 Epoch: 17 Global Step: 177040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:05:01,544-Speed 5484.88 samples/sec Loss 1.6736 LearningRate 0.0071 Epoch: 17 Global Step: 177050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:05:09,116-Speed 5410.14 samples/sec Loss 1.6880 LearningRate 0.0071 Epoch: 17 Global Step: 177060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:05:16,550-Speed 5510.19 samples/sec Loss 1.6806 LearningRate 0.0071 Epoch: 17 Global Step: 177070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:05:24,047-Speed 5464.63 samples/sec Loss 1.6653 LearningRate 0.0071 Epoch: 17 Global Step: 177080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:05:31,519-Speed 5482.22 samples/sec Loss 1.6685 LearningRate 0.0071 Epoch: 17 Global Step: 177090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:05:39,195-Speed 5337.03 samples/sec Loss 1.6729 LearningRate 0.0071 Epoch: 17 Global Step: 177100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:05:46,761-Speed 5414.04 samples/sec Loss 1.6328 LearningRate 0.0071 Epoch: 17 Global Step: 177110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:05:54,266-Speed 5459.24 samples/sec Loss 1.6821 LearningRate 0.0071 Epoch: 17 Global Step: 177120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:06:01,808-Speed 5431.41 samples/sec Loss 1.6950 LearningRate 0.0071 Epoch: 17 Global Step: 177130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:06:09,292-Speed 5473.09 samples/sec Loss 1.6667 LearningRate 0.0071 Epoch: 17 Global Step: 177140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:06:16,764-Speed 5482.86 samples/sec Loss 
1.6841 LearningRate 0.0071 Epoch: 17 Global Step: 177150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:06:24,232-Speed 5485.51 samples/sec Loss 1.7159 LearningRate 0.0071 Epoch: 17 Global Step: 177160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:06:31,788-Speed 5421.57 samples/sec Loss 1.6452 LearningRate 0.0071 Epoch: 17 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:06:39,253-Speed 5487.65 samples/sec Loss 1.6526 LearningRate 0.0070 Epoch: 17 Global Step: 177180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:06:46,753-Speed 5461.91 samples/sec Loss 1.6796 LearningRate 0.0070 Epoch: 17 Global Step: 177190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:06:54,272-Speed 5448.78 samples/sec Loss 1.7040 LearningRate 0.0070 Epoch: 17 Global Step: 177200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:01,839-Speed 5413.32 samples/sec Loss 1.6607 LearningRate 0.0070 Epoch: 17 Global Step: 177210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:09,390-Speed 5424.43 samples/sec Loss 1.6687 LearningRate 0.0070 Epoch: 17 Global Step: 177220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:17,101-Speed 5313.05 samples/sec Loss 1.6580 LearningRate 0.0070 Epoch: 17 Global Step: 177230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:24,622-Speed 5446.62 samples/sec Loss 1.6666 LearningRate 0.0070 Epoch: 17 Global Step: 177240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:32,170-Speed 5427.26 samples/sec Loss 1.6466 LearningRate 0.0070 Epoch: 17 Global Step: 177250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:39,662-Speed 5467.83 samples/sec Loss 1.6401 LearningRate 0.0070 Epoch: 17 Global Step: 177260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:47,094-Speed 5512.23 samples/sec Loss 1.6538 LearningRate 0.0070 Epoch: 17 Global 
Step: 177270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:07:54,634-Speed 5433.14 samples/sec Loss 1.6925 LearningRate 0.0070 Epoch: 17 Global Step: 177280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:02,092-Speed 5493.26 samples/sec Loss 1.6415 LearningRate 0.0070 Epoch: 17 Global Step: 177290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:09,577-Speed 5473.27 samples/sec Loss 1.6502 LearningRate 0.0070 Epoch: 17 Global Step: 177300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:17,154-Speed 5405.99 samples/sec Loss 1.6834 LearningRate 0.0070 Epoch: 17 Global Step: 177310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:24,681-Speed 5442.43 samples/sec Loss 1.6541 LearningRate 0.0070 Epoch: 17 Global Step: 177320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:32,240-Speed 5419.30 samples/sec Loss 1.6439 LearningRate 0.0070 Epoch: 17 Global Step: 177330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:39,790-Speed 5426.30 samples/sec Loss 1.6646 LearningRate 0.0070 Epoch: 17 Global Step: 177340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:47,311-Speed 5446.53 samples/sec Loss 1.6756 LearningRate 0.0070 Epoch: 17 Global Step: 177350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:08:54,827-Speed 5450.35 samples/sec Loss 1.6671 LearningRate 0.0070 Epoch: 17 Global Step: 177360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:09:02,316-Speed 5470.46 samples/sec Loss 1.6523 LearningRate 0.0070 Epoch: 17 Global Step: 177370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:09:09,793-Speed 5478.45 samples/sec Loss 1.6384 LearningRate 0.0070 Epoch: 17 Global Step: 177380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:09:17,398-Speed 5386.66 samples/sec Loss 1.6640 LearningRate 0.0070 Epoch: 17 Global Step: 177390 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-09 11:09:24,871-Speed 5481.73 samples/sec Loss 1.6914 LearningRate 0.0069 Epoch: 17 Global Step: 177400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:09:32,351-Speed 5477.31 samples/sec Loss 1.6777 LearningRate 0.0069 Epoch: 17 Global Step: 177410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:09:39,858-Speed 5456.22 samples/sec Loss 1.6639 LearningRate 0.0069 Epoch: 17 Global Step: 177420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:09:47,358-Speed 5462.07 samples/sec Loss 1.6511 LearningRate 0.0069 Epoch: 17 Global Step: 177430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:09:54,914-Speed 5422.23 samples/sec Loss 1.6511 LearningRate 0.0069 Epoch: 17 Global Step: 177440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:10:02,450-Speed 5435.91 samples/sec Loss 1.6157 LearningRate 0.0069 Epoch: 17 Global Step: 177450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:10:09,994-Speed 5429.99 samples/sec Loss 1.6401 LearningRate 0.0069 Epoch: 17 Global Step: 177460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:10:17,551-Speed 5420.78 samples/sec Loss 1.6647 LearningRate 0.0069 Epoch: 17 Global Step: 177470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:10:25,064-Speed 5452.87 samples/sec Loss 1.6731 LearningRate 0.0069 Epoch: 17 Global Step: 177480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:10:32,654-Speed 5397.37 samples/sec Loss 1.6722 LearningRate 0.0069 Epoch: 17 Global Step: 177490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:10:40,240-Speed 5399.53 samples/sec Loss 1.6641 LearningRate 0.0069 Epoch: 17 Global Step: 177500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:10:47,748-Speed 5456.07 samples/sec Loss 1.6469 LearningRate 0.0069 Epoch: 17 Global Step: 177510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 
11:10:55,231-Speed 5474.84 samples/sec Loss 1.6571 LearningRate 0.0069 Epoch: 17 Global Step: 177520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:11:02,848-Speed 5378.28 samples/sec Loss 1.6670 LearningRate 0.0069 Epoch: 17 Global Step: 177530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:11:10,319-Speed 5483.22 samples/sec Loss 1.6778 LearningRate 0.0069 Epoch: 17 Global Step: 177540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:11:17,818-Speed 5462.64 samples/sec Loss 1.6421 LearningRate 0.0069 Epoch: 17 Global Step: 177550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:11:25,448-Speed 5369.35 samples/sec Loss 1.6200 LearningRate 0.0069 Epoch: 17 Global Step: 177560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:11:33,013-Speed 5414.89 samples/sec Loss 1.6395 LearningRate 0.0069 Epoch: 17 Global Step: 177570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:11:40,588-Speed 5407.27 samples/sec Loss 1.6032 LearningRate 0.0069 Epoch: 17 Global Step: 177580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:11:48,131-Speed 5431.37 samples/sec Loss 1.6499 LearningRate 0.0069 Epoch: 17 Global Step: 177590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:11:55,656-Speed 5443.79 samples/sec Loss 1.6280 LearningRate 0.0069 Epoch: 17 Global Step: 177600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:03,237-Speed 5403.58 samples/sec Loss 1.6454 LearningRate 0.0069 Epoch: 17 Global Step: 177610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:10,751-Speed 5451.95 samples/sec Loss 1.6326 LearningRate 0.0068 Epoch: 17 Global Step: 177620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:18,315-Speed 5416.13 samples/sec Loss 1.6377 LearningRate 0.0068 Epoch: 17 Global Step: 177630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:25,981-Speed 5343.55 samples/sec Loss 
1.6358 LearningRate 0.0068 Epoch: 17 Global Step: 177640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:33,588-Speed 5385.30 samples/sec Loss 1.6462 LearningRate 0.0068 Epoch: 17 Global Step: 177650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:41,148-Speed 5419.14 samples/sec Loss 1.6300 LearningRate 0.0068 Epoch: 17 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:48,635-Speed 5471.58 samples/sec Loss 1.6598 LearningRate 0.0068 Epoch: 17 Global Step: 177670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:12:56,162-Speed 5442.75 samples/sec Loss 1.6171 LearningRate 0.0068 Epoch: 17 Global Step: 177680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:13:03,758-Speed 5392.85 samples/sec Loss 1.6185 LearningRate 0.0068 Epoch: 17 Global Step: 177690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:13:11,283-Speed 5443.92 samples/sec Loss 1.6433 LearningRate 0.0068 Epoch: 17 Global Step: 177700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:13:18,778-Speed 5465.38 samples/sec Loss 1.6013 LearningRate 0.0068 Epoch: 17 Global Step: 177710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:13:26,238-Speed 5491.79 samples/sec Loss 1.5836 LearningRate 0.0068 Epoch: 17 Global Step: 177720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:13:33,804-Speed 5413.90 samples/sec Loss 1.6510 LearningRate 0.0068 Epoch: 17 Global Step: 177730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:13:41,367-Speed 5417.05 samples/sec Loss 1.6181 LearningRate 0.0068 Epoch: 17 Global Step: 177740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:13:48,993-Speed 5371.15 samples/sec Loss 1.6312 LearningRate 0.0068 Epoch: 17 Global Step: 177750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:13:56,587-Speed 5394.77 samples/sec Loss 1.6235 LearningRate 0.0068 Epoch: 17 Global 
Step: 177760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:04,130-Speed 5431.14 samples/sec Loss 1.6257 LearningRate 0.0068 Epoch: 17 Global Step: 177770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:11,618-Speed 5470.23 samples/sec Loss 1.6510 LearningRate 0.0068 Epoch: 17 Global Step: 177780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:19,045-Speed 5515.48 samples/sec Loss 1.6430 LearningRate 0.0068 Epoch: 17 Global Step: 177790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:26,603-Speed 5420.56 samples/sec Loss 1.6329 LearningRate 0.0068 Epoch: 17 Global Step: 177800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:34,184-Speed 5404.31 samples/sec Loss 1.6450 LearningRate 0.0068 Epoch: 17 Global Step: 177810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:41,823-Speed 5361.88 samples/sec Loss 1.6353 LearningRate 0.0068 Epoch: 17 Global Step: 177820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:14:49,351-Speed 5441.48 samples/sec Loss 1.6333 LearningRate 0.0067 Epoch: 17 Global Step: 177830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:14:57,056-Speed 5317.56 samples/sec Loss 1.6398 LearningRate 0.0067 Epoch: 17 Global Step: 177840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:04,673-Speed 5377.66 samples/sec Loss 1.6243 LearningRate 0.0067 Epoch: 17 Global Step: 177850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:12,169-Speed 5464.86 samples/sec Loss 1.6318 LearningRate 0.0067 Epoch: 17 Global Step: 177860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:19,687-Speed 5449.00 samples/sec Loss 1.6524 LearningRate 0.0067 Epoch: 17 Global Step: 177870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:27,215-Speed 5441.68 samples/sec Loss 1.6295 LearningRate 0.0067 Epoch: 17 Global Step: 177880 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-01-09 11:15:34,766-Speed 5425.17 samples/sec Loss 1.6304 LearningRate 0.0067 Epoch: 17 Global Step: 177890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:42,337-Speed 5410.97 samples/sec Loss 1.6673 LearningRate 0.0067 Epoch: 17 Global Step: 177900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:49,987-Speed 5354.77 samples/sec Loss 1.6695 LearningRate 0.0067 Epoch: 17 Global Step: 177910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:15:57,586-Speed 5390.55 samples/sec Loss 1.6637 LearningRate 0.0067 Epoch: 17 Global Step: 177920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:05,182-Speed 5393.64 samples/sec Loss 1.6211 LearningRate 0.0067 Epoch: 17 Global Step: 177930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:12,932-Speed 5285.37 samples/sec Loss 1.6243 LearningRate 0.0067 Epoch: 17 Global Step: 177940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:20,588-Speed 5350.37 samples/sec Loss 1.6237 LearningRate 0.0067 Epoch: 17 Global Step: 177950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:28,352-Speed 5277.04 samples/sec Loss 1.6374 LearningRate 0.0067 Epoch: 17 Global Step: 177960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:36,010-Speed 5349.14 samples/sec Loss 1.6524 LearningRate 0.0067 Epoch: 17 Global Step: 177970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:43,664-Speed 5351.85 samples/sec Loss 1.6377 LearningRate 0.0067 Epoch: 17 Global Step: 177980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:51,429-Speed 5275.63 samples/sec Loss 1.6006 LearningRate 0.0067 Epoch: 17 Global Step: 177990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:16:58,960-Speed 5439.94 samples/sec Loss 1.6289 LearningRate 0.0067 Epoch: 17 Global Step: 178000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 
11:17:43,353-[lfw][178000]XNorm: 22.461704 Training: 2022-01-09 11:17:43,354-[lfw][178000]Accuracy-Flip: 0.99833+-0.00197 Training: 2022-01-09 11:17:43,354-[lfw][178000]Accuracy-Highest: 0.99833 Training: 2022-01-09 11:18:35,079-[cfp_fp][178000]XNorm: 21.921971 Training: 2022-01-09 11:18:35,080-[cfp_fp][178000]Accuracy-Flip: 0.99257+-0.00343 Training: 2022-01-09 11:18:35,081-[cfp_fp][178000]Accuracy-Highest: 0.99371 Training: 2022-01-09 11:19:19,827-[agedb_30][178000]XNorm: 23.023412 Training: 2022-01-09 11:19:19,828-[agedb_30][178000]Accuracy-Flip: 0.98367+-0.00557 Training: 2022-01-09 11:19:19,828-[agedb_30][178000]Accuracy-Highest: 0.98433 Training: 2022-01-09 11:19:27,453-Speed 275.84 samples/sec Loss 1.6437 LearningRate 0.0067 Epoch: 17 Global Step: 178010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:19:34,991-Speed 5434.45 samples/sec Loss 1.5900 LearningRate 0.0067 Epoch: 17 Global Step: 178020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:19:42,549-Speed 5420.14 samples/sec Loss 1.6305 LearningRate 0.0067 Epoch: 17 Global Step: 178030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:19:49,992-Speed 5503.76 samples/sec Loss 1.6233 LearningRate 0.0067 Epoch: 17 Global Step: 178040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:01,486-Speed 3564.01 samples/sec Loss 1.5903 LearningRate 0.0066 Epoch: 17 Global Step: 178050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:08,814-Speed 5590.04 samples/sec Loss 1.6485 LearningRate 0.0066 Epoch: 17 Global Step: 178060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:16,370-Speed 5421.78 samples/sec Loss 1.6403 LearningRate 0.0066 Epoch: 17 Global Step: 178070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:23,900-Speed 5440.13 samples/sec Loss 1.6214 LearningRate 0.0066 Epoch: 17 Global Step: 178080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:31,442-Speed 
5432.03 samples/sec Loss 1.6650 LearningRate 0.0066 Epoch: 17 Global Step: 178090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:39,015-Speed 5409.07 samples/sec Loss 1.6277 LearningRate 0.0066 Epoch: 17 Global Step: 178100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:46,620-Speed 5386.61 samples/sec Loss 1.6316 LearningRate 0.0066 Epoch: 17 Global Step: 178110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:20:54,120-Speed 5462.35 samples/sec Loss 1.6306 LearningRate 0.0066 Epoch: 17 Global Step: 178120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:01,593-Speed 5482.35 samples/sec Loss 1.6107 LearningRate 0.0066 Epoch: 17 Global Step: 178130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:09,122-Speed 5440.84 samples/sec Loss 1.6251 LearningRate 0.0066 Epoch: 17 Global Step: 178140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:21:16,677-Speed 5421.64 samples/sec Loss 1.6225 LearningRate 0.0066 Epoch: 17 Global Step: 178150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:24,142-Speed 5487.88 samples/sec Loss 1.6104 LearningRate 0.0066 Epoch: 17 Global Step: 178160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:31,656-Speed 5452.23 samples/sec Loss 1.6226 LearningRate 0.0066 Epoch: 17 Global Step: 178170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:39,120-Speed 5488.25 samples/sec Loss 1.5986 LearningRate 0.0066 Epoch: 17 Global Step: 178180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:46,635-Speed 5451.31 samples/sec Loss 1.6319 LearningRate 0.0066 Epoch: 17 Global Step: 178190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:21:54,143-Speed 5456.67 samples/sec Loss 1.5839 LearningRate 0.0066 Epoch: 17 Global Step: 178200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:22:01,736-Speed 5394.91 samples/sec Loss 1.6130 
LearningRate 0.0066 Epoch: 17 Global Step: 178210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:22:09,195-Speed 5492.25 samples/sec Loss 1.6107 LearningRate 0.0066 Epoch: 17 Global Step: 178220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:22:16,687-Speed 5467.67 samples/sec Loss 1.5847 LearningRate 0.0066 Epoch: 17 Global Step: 178230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:22:24,224-Speed 5435.44 samples/sec Loss 1.6394 LearningRate 0.0066 Epoch: 17 Global Step: 178240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:22:31,711-Speed 5471.75 samples/sec Loss 1.6198 LearningRate 0.0066 Epoch: 17 Global Step: 178250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:22:39,156-Speed 5502.46 samples/sec Loss 1.6144 LearningRate 0.0066 Epoch: 17 Global Step: 178260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:22:46,624-Speed 5485.44 samples/sec Loss 1.5861 LearningRate 0.0065 Epoch: 17 Global Step: 178270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:22:54,093-Speed 5485.33 samples/sec Loss 1.6105 LearningRate 0.0065 Epoch: 17 Global Step: 178280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:23:01,559-Speed 5487.00 samples/sec Loss 1.6402 LearningRate 0.0065 Epoch: 17 Global Step: 178290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:09,051-Speed 5467.58 samples/sec Loss 1.6367 LearningRate 0.0065 Epoch: 17 Global Step: 178300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:16,576-Speed 5444.13 samples/sec Loss 1.6331 LearningRate 0.0065 Epoch: 17 Global Step: 178310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:24,093-Speed 5449.74 samples/sec Loss 1.6097 LearningRate 0.0065 Epoch: 17 Global Step: 178320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:31,613-Speed 5447.96 samples/sec Loss 1.6032 LearningRate 0.0065 Epoch: 17 Global Step: 
178330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:39,183-Speed 5411.06 samples/sec Loss 1.6107 LearningRate 0.0065 Epoch: 17 Global Step: 178340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:46,755-Speed 5410.18 samples/sec Loss 1.5644 LearningRate 0.0065 Epoch: 17 Global Step: 178350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:23:54,274-Speed 5448.38 samples/sec Loss 1.5824 LearningRate 0.0065 Epoch: 17 Global Step: 178360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:24:01,799-Speed 5444.35 samples/sec Loss 1.5958 LearningRate 0.0065 Epoch: 17 Global Step: 178370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:24:09,386-Speed 5399.47 samples/sec Loss 1.5966 LearningRate 0.0065 Epoch: 17 Global Step: 178380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:24:16,831-Speed 5501.95 samples/sec Loss 1.6122 LearningRate 0.0065 Epoch: 17 Global Step: 178390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:24:24,389-Speed 5420.20 samples/sec Loss 1.5909 LearningRate 0.0065 Epoch: 17 Global Step: 178400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:24:31,922-Speed 5438.67 samples/sec Loss 1.5904 LearningRate 0.0065 Epoch: 17 Global Step: 178410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:24:39,436-Speed 5451.56 samples/sec Loss 1.5808 LearningRate 0.0065 Epoch: 17 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:24:46,996-Speed 5418.65 samples/sec Loss 1.5960 LearningRate 0.0065 Epoch: 17 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:24:54,477-Speed 5475.96 samples/sec Loss 1.6241 LearningRate 0.0065 Epoch: 17 Global Step: 178440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:02,019-Speed 5431.69 samples/sec Loss 1.6153 LearningRate 0.0065 Epoch: 17 Global Step: 178450 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-09 11:25:09,584-Speed 5415.24 samples/sec Loss 1.5940 LearningRate 0.0065 Epoch: 17 Global Step: 178460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:17,079-Speed 5465.93 samples/sec Loss 1.5993 LearningRate 0.0065 Epoch: 17 Global Step: 178470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:24,603-Speed 5444.60 samples/sec Loss 1.6180 LearningRate 0.0065 Epoch: 17 Global Step: 178480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:32,140-Speed 5435.47 samples/sec Loss 1.6216 LearningRate 0.0065 Epoch: 17 Global Step: 178490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:39,616-Speed 5478.87 samples/sec Loss 1.6120 LearningRate 0.0064 Epoch: 17 Global Step: 178500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:47,086-Speed 5484.10 samples/sec Loss 1.6080 LearningRate 0.0064 Epoch: 17 Global Step: 178510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:25:54,667-Speed 5403.86 samples/sec Loss 1.6122 LearningRate 0.0064 Epoch: 17 Global Step: 178520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:26:02,120-Speed 5497.12 samples/sec Loss 1.6294 LearningRate 0.0064 Epoch: 17 Global Step: 178530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:26:09,562-Speed 5503.98 samples/sec Loss 1.6119 LearningRate 0.0064 Epoch: 17 Global Step: 178540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:26:17,059-Speed 5464.42 samples/sec Loss 1.6139 LearningRate 0.0064 Epoch: 17 Global Step: 178550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:26:24,538-Speed 5477.88 samples/sec Loss 1.6110 LearningRate 0.0064 Epoch: 17 Global Step: 178560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:26:32,013-Speed 5479.81 samples/sec Loss 1.5978 LearningRate 0.0064 Epoch: 17 Global Step: 178570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 
11:26:39,483-Speed 5484.18 samples/sec Loss 1.6054 LearningRate 0.0064 Epoch: 17 Global Step: 178580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:26:47,028-Speed 5429.71 samples/sec Loss 1.5872 LearningRate 0.0064 Epoch: 17 Global Step: 178590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:26:54,541-Speed 5453.10 samples/sec Loss 1.6153 LearningRate 0.0064 Epoch: 17 Global Step: 178600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:02,077-Speed 5436.63 samples/sec Loss 1.6086 LearningRate 0.0064 Epoch: 17 Global Step: 178610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:09,690-Speed 5380.74 samples/sec Loss 1.6030 LearningRate 0.0064 Epoch: 17 Global Step: 178620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:17,295-Speed 5386.38 samples/sec Loss 1.5795 LearningRate 0.0064 Epoch: 17 Global Step: 178630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:24,881-Speed 5400.27 samples/sec Loss 1.6175 LearningRate 0.0064 Epoch: 17 Global Step: 178640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:27:32,443-Speed 5417.77 samples/sec Loss 1.5989 LearningRate 0.0064 Epoch: 17 Global Step: 178650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:40,008-Speed 5414.66 samples/sec Loss 1.5895 LearningRate 0.0064 Epoch: 17 Global Step: 178660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:47,577-Speed 5412.41 samples/sec Loss 1.6029 LearningRate 0.0064 Epoch: 17 Global Step: 178670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:27:55,139-Speed 5417.34 samples/sec Loss 1.5782 LearningRate 0.0064 Epoch: 17 Global Step: 178680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:02,672-Speed 5437.85 samples/sec Loss 1.6197 LearningRate 0.0064 Epoch: 17 Global Step: 178690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:10,175-Speed 5459.90 samples/sec Loss 
1.6109 LearningRate 0.0064 Epoch: 17 Global Step: 178700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:17,780-Speed 5386.95 samples/sec Loss 1.6337 LearningRate 0.0064 Epoch: 17 Global Step: 178710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:25,205-Speed 5516.65 samples/sec Loss 1.5752 LearningRate 0.0063 Epoch: 17 Global Step: 178720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:32,689-Speed 5474.31 samples/sec Loss 1.5803 LearningRate 0.0063 Epoch: 17 Global Step: 178730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:40,163-Speed 5480.54 samples/sec Loss 1.5738 LearningRate 0.0063 Epoch: 17 Global Step: 178740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:28:47,701-Speed 5434.39 samples/sec Loss 1.5964 LearningRate 0.0063 Epoch: 17 Global Step: 178750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:28:55,189-Speed 5471.50 samples/sec Loss 1.5610 LearningRate 0.0063 Epoch: 17 Global Step: 178760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:29:02,690-Speed 5461.05 samples/sec Loss 1.5842 LearningRate 0.0063 Epoch: 17 Global Step: 178770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:29:10,240-Speed 5426.07 samples/sec Loss 1.5836 LearningRate 0.0063 Epoch: 17 Global Step: 178780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:29:17,754-Speed 5451.66 samples/sec Loss 1.6007 LearningRate 0.0063 Epoch: 17 Global Step: 178790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:29:25,233-Speed 5477.42 samples/sec Loss 1.5860 LearningRate 0.0063 Epoch: 17 Global Step: 178800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:29:32,712-Speed 5477.56 samples/sec Loss 1.5863 LearningRate 0.0063 Epoch: 17 Global Step: 178810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-09 11:29:40,234-Speed 5446.06 samples/sec Loss 1.6029 LearningRate 0.0063 Epoch: 17 Global 
Step: 178820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:29:47,760-Speed 5443.09 samples/sec Loss 1.6134 LearningRate 0.0063 Epoch: 17 Global Step: 178830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:29:55,232-Speed 5482.29 samples/sec Loss 1.5815 LearningRate 0.0063 Epoch: 17 Global Step: 178840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:02,777-Speed 5429.79 samples/sec Loss 1.5978 LearningRate 0.0063 Epoch: 17 Global Step: 178850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:10,320-Speed 5431.35 samples/sec Loss 1.5808 LearningRate 0.0063 Epoch: 17 Global Step: 178860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:17,847-Speed 5441.66 samples/sec Loss 1.5976 LearningRate 0.0063 Epoch: 17 Global Step: 178870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:25,344-Speed 5464.47 samples/sec Loss 1.5819 LearningRate 0.0063 Epoch: 17 Global Step: 178880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:32,839-Speed 5466.24 samples/sec Loss 1.5696 LearningRate 0.0063 Epoch: 17 Global Step: 178890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:40,310-Speed 5483.34 samples/sec Loss 1.5933 LearningRate 0.0063 Epoch: 17 Global Step: 178900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-09 11:30:47,763-Speed 5495.89 samples/sec Loss 1.5708 LearningRate 0.0063 Epoch: 17 Global Step: 178910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:30:55,202-Speed 5506.45 samples/sec Loss 1.5859 LearningRate 0.0063 Epoch: 17 Global Step: 178920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:02,633-Speed 5513.68 samples/sec Loss 1.6077 LearningRate 0.0063 Epoch: 17 Global Step: 178930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:10,114-Speed 5476.17 samples/sec Loss 1.5838 LearningRate 0.0063 Epoch: 17 Global Step: 178940 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-01-09 11:31:17,562-Speed 5499.96 samples/sec Loss 1.5593 LearningRate 0.0062 Epoch: 17 Global Step: 178950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:25,080-Speed 5448.22 samples/sec Loss 1.6080 LearningRate 0.0062 Epoch: 17 Global Step: 178960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:32,542-Speed 5490.19 samples/sec Loss 1.5981 LearningRate 0.0062 Epoch: 17 Global Step: 178970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:40,028-Speed 5472.22 samples/sec Loss 1.5839 LearningRate 0.0062 Epoch: 17 Global Step: 178980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:47,636-Speed 5384.49 samples/sec Loss 1.5753 LearningRate 0.0062 Epoch: 17 Global Step: 178990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:31:55,841-Speed 5509.73 samples/sec Loss 1.5624 LearningRate 0.0062 Epoch: 17 Global Step: 179000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-09 11:32:03,399-Speed 5420.76 samples/sec Loss 1.5821 LearningRate 0.0062 Epoch: 17 Global Step: 179010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:32:10,863-Speed 5488.83 samples/sec Loss 1.5446 LearningRate 0.0062 Epoch: 17 Global Step: 179020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:32:18,449-Speed 5399.67 samples/sec Loss 1.5619 LearningRate 0.0062 Epoch: 17 Global Step: 179030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:32:26,004-Speed 5422.45 samples/sec Loss 1.5809 LearningRate 0.0062 Epoch: 17 Global Step: 179040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:32:33,485-Speed 5476.37 samples/sec Loss 1.5868 LearningRate 0.0062 Epoch: 17 Global Step: 179050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:32:41,099-Speed 5380.46 samples/sec Loss 1.5694 LearningRate 0.0062 Epoch: 17 Global Step: 179060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 
11:32:48,650-Speed 5424.69 samples/sec Loss 1.5783 LearningRate 0.0062 Epoch: 17 Global Step: 179070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:32:56,168-Speed 5448.97 samples/sec Loss 1.5749 LearningRate 0.0062 Epoch: 17 Global Step: 179080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:03,703-Speed 5436.73 samples/sec Loss 1.5858 LearningRate 0.0062 Epoch: 17 Global Step: 179090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:11,233-Speed 5440.10 samples/sec Loss 1.5839 LearningRate 0.0062 Epoch: 17 Global Step: 179100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:18,793-Speed 5419.17 samples/sec Loss 1.5644 LearningRate 0.0062 Epoch: 17 Global Step: 179110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:26,364-Speed 5410.67 samples/sec Loss 1.5957 LearningRate 0.0062 Epoch: 17 Global Step: 179120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:33,890-Speed 5442.89 samples/sec Loss 1.6085 LearningRate 0.0062 Epoch: 17 Global Step: 179130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:41,388-Speed 5463.61 samples/sec Loss 1.5714 LearningRate 0.0062 Epoch: 17 Global Step: 179140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:48,954-Speed 5414.35 samples/sec Loss 1.5624 LearningRate 0.0062 Epoch: 17 Global Step: 179150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:33:56,495-Speed 5432.69 samples/sec Loss 1.5594 LearningRate 0.0062 Epoch: 17 Global Step: 179160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:04,067-Speed 5409.96 samples/sec Loss 1.5967 LearningRate 0.0062 Epoch: 17 Global Step: 179170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:11,559-Speed 5467.82 samples/sec Loss 1.5845 LearningRate 0.0061 Epoch: 17 Global Step: 179180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:19,003-Speed 5503.02 samples/sec Loss 
1.5683 LearningRate 0.0061 Epoch: 17 Global Step: 179190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:26,502-Speed 5462.36 samples/sec Loss 1.6064 LearningRate 0.0061 Epoch: 17 Global Step: 179200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:33,985-Speed 5474.97 samples/sec Loss 1.5876 LearningRate 0.0061 Epoch: 17 Global Step: 179210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:41,489-Speed 5458.87 samples/sec Loss 1.5604 LearningRate 0.0061 Epoch: 17 Global Step: 179220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:48,946-Speed 5493.50 samples/sec Loss 1.5670 LearningRate 0.0061 Epoch: 17 Global Step: 179230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:34:56,474-Speed 5441.72 samples/sec Loss 1.5507 LearningRate 0.0061 Epoch: 17 Global Step: 179240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:35:04,020-Speed 5428.95 samples/sec Loss 1.5854 LearningRate 0.0061 Epoch: 17 Global Step: 179250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:35:11,580-Speed 5418.68 samples/sec Loss 1.5724 LearningRate 0.0061 Epoch: 17 Global Step: 179260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:35:19,077-Speed 5464.53 samples/sec Loss 1.5693 LearningRate 0.0061 Epoch: 17 Global Step: 179270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:35:26,589-Speed 5452.67 samples/sec Loss 1.5699 LearningRate 0.0061 Epoch: 17 Global Step: 179280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:35:34,040-Speed 5498.50 samples/sec Loss 1.5672 LearningRate 0.0061 Epoch: 17 Global Step: 179290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:35:41,545-Speed 5457.82 samples/sec Loss 1.5783 LearningRate 0.0061 Epoch: 17 Global Step: 179300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:35:49,069-Speed 5445.12 samples/sec Loss 1.5756 LearningRate 0.0061 Epoch: 17 Global 
Step: 179310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:35:56,644-Speed 5408.05 samples/sec Loss 1.5673 LearningRate 0.0061 Epoch: 17 Global Step: 179320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:04,213-Speed 5412.15 samples/sec Loss 1.5681 LearningRate 0.0061 Epoch: 17 Global Step: 179330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:11,662-Speed 5498.81 samples/sec Loss 1.5521 LearningRate 0.0061 Epoch: 17 Global Step: 179340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:19,196-Speed 5437.97 samples/sec Loss 1.5818 LearningRate 0.0061 Epoch: 17 Global Step: 179350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:26,754-Speed 5420.31 samples/sec Loss 1.5914 LearningRate 0.0061 Epoch: 17 Global Step: 179360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:34,261-Speed 5456.98 samples/sec Loss 1.5656 LearningRate 0.0061 Epoch: 17 Global Step: 179370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:41,738-Speed 5478.62 samples/sec Loss 1.5612 LearningRate 0.0061 Epoch: 17 Global Step: 179380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:49,294-Speed 5421.92 samples/sec Loss 1.5776 LearningRate 0.0061 Epoch: 17 Global Step: 179390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:36:56,721-Speed 5515.87 samples/sec Loss 1.5674 LearningRate 0.0061 Epoch: 17 Global Step: 179400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:37:04,220-Speed 5462.30 samples/sec Loss 1.5628 LearningRate 0.0060 Epoch: 17 Global Step: 179410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:37:11,742-Speed 5446.49 samples/sec Loss 1.5736 LearningRate 0.0060 Epoch: 17 Global Step: 179420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:37:19,241-Speed 5462.46 samples/sec Loss 1.6077 LearningRate 0.0060 Epoch: 17 Global Step: 179430 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-01-09 11:37:26,750-Speed 5455.63 samples/sec Loss 1.5743 LearningRate 0.0060 Epoch: 17 Global Step: 179440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:37:34,199-Speed 5499.21 samples/sec Loss 1.5447 LearningRate 0.0060 Epoch: 17 Global Step: 179450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:37:41,718-Speed 5448.47 samples/sec Loss 1.5976 LearningRate 0.0060 Epoch: 17 Global Step: 179460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:37:49,205-Speed 5471.77 samples/sec Loss 1.5844 LearningRate 0.0060 Epoch: 17 Global Step: 179470 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:37:56,706-Speed 5461.33 samples/sec Loss 1.5389 LearningRate 0.0060 Epoch: 17 Global Step: 179480 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:04,193-Speed 5471.36 samples/sec Loss 1.5671 LearningRate 0.0060 Epoch: 17 Global Step: 179490 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:11,741-Speed 5427.87 samples/sec Loss 1.5762 LearningRate 0.0060 Epoch: 17 Global Step: 179500 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:19,310-Speed 5411.79 samples/sec Loss 1.5688 LearningRate 0.0060 Epoch: 17 Global Step: 179510 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:26,786-Speed 5479.89 samples/sec Loss 1.5555 LearningRate 0.0060 Epoch: 17 Global Step: 179520 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:34,307-Speed 5446.95 samples/sec Loss 1.5869 LearningRate 0.0060 Epoch: 17 Global Step: 179530 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:41,803-Speed 5464.67 samples/sec Loss 1.5482 LearningRate 0.0060 Epoch: 17 Global Step: 179540 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:38:49,295-Speed 5468.17 samples/sec Loss 1.5718 LearningRate 0.0060 Epoch: 17 Global Step: 179550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 
11:38:56,732-Speed 5508.83 samples/sec Loss 1.5469 LearningRate 0.0060 Epoch: 17 Global Step: 179560 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-01-09 11:39:04,296-Speed 5415.51 samples/sec Loss 1.5979 LearningRate 0.0060 Epoch: 17 Global Step: 179570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:11,785-Speed 5470.28 samples/sec Loss 1.5567 LearningRate 0.0060 Epoch: 17 Global Step: 179580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:19,271-Speed 5472.16 samples/sec Loss 1.5826 LearningRate 0.0060 Epoch: 17 Global Step: 179590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:26,827-Speed 5421.35 samples/sec Loss 1.5698 LearningRate 0.0060 Epoch: 17 Global Step: 179600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:34,402-Speed 5408.47 samples/sec Loss 1.5509 LearningRate 0.0060 Epoch: 17 Global Step: 179610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:42,037-Speed 5365.56 samples/sec Loss 1.5532 LearningRate 0.0060 Epoch: 17 Global Step: 179620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:49,694-Speed 5349.71 samples/sec Loss 1.5279 LearningRate 0.0060 Epoch: 17 Global Step: 179630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:39:57,254-Speed 5419.09 samples/sec Loss 1.5578 LearningRate 0.0059 Epoch: 17 Global Step: 179640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:04,880-Speed 5371.72 samples/sec Loss 1.5594 LearningRate 0.0059 Epoch: 17 Global Step: 179650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:12,550-Speed 5340.54 samples/sec Loss 1.5550 LearningRate 0.0059 Epoch: 17 Global Step: 179660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:20,101-Speed 5425.00 samples/sec Loss 1.5478 LearningRate 0.0059 Epoch: 17 Global Step: 179670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:27,821-Speed 5306.61 samples/sec Loss 
1.5359 LearningRate 0.0059 Epoch: 17 Global Step: 179680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:35,541-Speed 5306.22 samples/sec Loss 1.5421 LearningRate 0.0059 Epoch: 17 Global Step: 179690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:43,143-Speed 5388.84 samples/sec Loss 1.5459 LearningRate 0.0059 Epoch: 17 Global Step: 179700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:50,650-Speed 5457.12 samples/sec Loss 1.5206 LearningRate 0.0059 Epoch: 17 Global Step: 179710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:40:58,146-Speed 5465.04 samples/sec Loss 1.5426 LearningRate 0.0059 Epoch: 17 Global Step: 179720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:41:05,748-Speed 5389.24 samples/sec Loss 1.5333 LearningRate 0.0059 Epoch: 17 Global Step: 179730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:41:13,270-Speed 5446.26 samples/sec Loss 1.5528 LearningRate 0.0059 Epoch: 17 Global Step: 179740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:41:20,855-Speed 5400.41 samples/sec Loss 1.5570 LearningRate 0.0059 Epoch: 17 Global Step: 179750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:41:28,394-Speed 5433.85 samples/sec Loss 1.5286 LearningRate 0.0059 Epoch: 17 Global Step: 179760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:41:35,865-Speed 5483.18 samples/sec Loss 1.5430 LearningRate 0.0059 Epoch: 17 Global Step: 179770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:41:43,440-Speed 5408.72 samples/sec Loss 1.5436 LearningRate 0.0059 Epoch: 17 Global Step: 179780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:41:50,931-Speed 5468.44 samples/sec Loss 1.5328 LearningRate 0.0059 Epoch: 17 Global Step: 179790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:41:58,461-Speed 5440.05 samples/sec Loss 1.5142 LearningRate 0.0059 Epoch: 17 Global 
Step: 179800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:06,002-Speed 5431.88 samples/sec Loss 1.5392 LearningRate 0.0059 Epoch: 17 Global Step: 179810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:13,712-Speed 5314.10 samples/sec Loss 1.5374 LearningRate 0.0059 Epoch: 17 Global Step: 179820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:21,191-Speed 5477.24 samples/sec Loss 1.5243 LearningRate 0.0059 Epoch: 17 Global Step: 179830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:28,722-Speed 5439.21 samples/sec Loss 1.5468 LearningRate 0.0059 Epoch: 17 Global Step: 179840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:36,334-Speed 5381.79 samples/sec Loss 1.5559 LearningRate 0.0059 Epoch: 17 Global Step: 179850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:43,885-Speed 5425.03 samples/sec Loss 1.5131 LearningRate 0.0059 Epoch: 17 Global Step: 179860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:42:51,367-Speed 5476.03 samples/sec Loss 1.5636 LearningRate 0.0058 Epoch: 17 Global Step: 179870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:42:58,921-Speed 5422.13 samples/sec Loss 1.5717 LearningRate 0.0058 Epoch: 17 Global Step: 179880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:06,453-Speed 5439.08 samples/sec Loss 1.5433 LearningRate 0.0058 Epoch: 17 Global Step: 179890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:14,046-Speed 5395.42 samples/sec Loss 1.5311 LearningRate 0.0058 Epoch: 17 Global Step: 179900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:21,589-Speed 5431.13 samples/sec Loss 1.5508 LearningRate 0.0058 Epoch: 17 Global Step: 179910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:29,201-Speed 5381.46 samples/sec Loss 1.5287 LearningRate 0.0058 Epoch: 17 Global Step: 179920 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-09 11:43:36,752-Speed 5424.66 samples/sec Loss 1.5593 LearningRate 0.0058 Epoch: 17 Global Step: 179930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:44,255-Speed 5460.16 samples/sec Loss 1.5272 LearningRate 0.0058 Epoch: 17 Global Step: 179940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:51,777-Speed 5445.91 samples/sec Loss 1.5587 LearningRate 0.0058 Epoch: 17 Global Step: 179950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:43:59,334-Speed 5420.62 samples/sec Loss 1.5286 LearningRate 0.0058 Epoch: 17 Global Step: 179960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:44:06,905-Speed 5410.89 samples/sec Loss 1.5477 LearningRate 0.0058 Epoch: 17 Global Step: 179970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:44:14,393-Speed 5470.61 samples/sec Loss 1.5467 LearningRate 0.0058 Epoch: 17 Global Step: 179980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:44:21,935-Speed 5432.33 samples/sec Loss 1.5359 LearningRate 0.0058 Epoch: 17 Global Step: 179990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:44:29,503-Speed 5412.81 samples/sec Loss 1.5487 LearningRate 0.0058 Epoch: 17 Global Step: 180000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:45:13,738-[lfw][180000]XNorm: 23.220754 Training: 2022-01-09 11:45:13,739-[lfw][180000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 11:45:13,739-[lfw][180000]Accuracy-Highest: 0.99833 Training: 2022-01-09 11:46:05,256-[cfp_fp][180000]XNorm: 22.335384 Training: 2022-01-09 11:46:05,257-[cfp_fp][180000]Accuracy-Flip: 0.99314+-0.00408 Training: 2022-01-09 11:46:05,258-[cfp_fp][180000]Accuracy-Highest: 0.99371 Training: 2022-01-09 11:46:49,575-[agedb_30][180000]XNorm: 23.338926 Training: 2022-01-09 11:46:49,576-[agedb_30][180000]Accuracy-Flip: 0.98300+-0.00600 Training: 2022-01-09 11:46:49,576-[agedb_30][180000]Accuracy-Highest: 0.98433 
Training: 2022-01-09 11:46:57,151-Speed 277.42 samples/sec Loss 1.5445 LearningRate 0.0058 Epoch: 17 Global Step: 180010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:47:04,658-Speed 5457.01 samples/sec Loss 1.5341 LearningRate 0.0058 Epoch: 17 Global Step: 180020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:47:12,207-Speed 5426.20 samples/sec Loss 1.5508 LearningRate 0.0058 Epoch: 17 Global Step: 180030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:47:19,772-Speed 5415.18 samples/sec Loss 1.5559 LearningRate 0.0058 Epoch: 17 Global Step: 180040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:47:27,238-Speed 5487.41 samples/sec Loss 1.5583 LearningRate 0.0058 Epoch: 17 Global Step: 180050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:47:34,720-Speed 5475.11 samples/sec Loss 1.5257 LearningRate 0.0058 Epoch: 17 Global Step: 180060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:47:42,254-Speed 5437.40 samples/sec Loss 1.5279 LearningRate 0.0058 Epoch: 17 Global Step: 180070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:47:49,805-Speed 5424.57 samples/sec Loss 1.5514 LearningRate 0.0058 Epoch: 17 Global Step: 180080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:47:57,453-Speed 5356.30 samples/sec Loss 1.5230 LearningRate 0.0058 Epoch: 17 Global Step: 180090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:48:05,265-Speed 5243.93 samples/sec Loss 1.5343 LearningRate 0.0058 Epoch: 17 Global Step: 180100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:48:12,741-Speed 5479.49 samples/sec Loss 1.5048 LearningRate 0.0057 Epoch: 17 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:48:20,206-Speed 5487.99 samples/sec Loss 1.5288 LearningRate 0.0057 Epoch: 17 Global Step: 180120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:48:27,734-Speed 
5441.51 samples/sec Loss 1.5151 LearningRate 0.0057 Epoch: 17 Global Step: 180130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:48:35,295-Speed 5417.86 samples/sec Loss 1.5254 LearningRate 0.0057 Epoch: 17 Global Step: 180140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:48:42,828-Speed 5438.13 samples/sec Loss 1.5319 LearningRate 0.0057 Epoch: 17 Global Step: 180150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:48:50,363-Speed 5436.80 samples/sec Loss 1.5109 LearningRate 0.0057 Epoch: 17 Global Step: 180160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:48:57,888-Speed 5444.06 samples/sec Loss 1.5248 LearningRate 0.0057 Epoch: 17 Global Step: 180170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:49:05,401-Speed 5452.13 samples/sec Loss 1.5341 LearningRate 0.0057 Epoch: 17 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:49:12,919-Speed 5448.96 samples/sec Loss 1.5302 LearningRate 0.0057 Epoch: 17 Global Step: 180190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:49:20,424-Speed 5458.81 samples/sec Loss 1.5239 LearningRate 0.0057 Epoch: 17 Global Step: 180200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:49:27,911-Speed 5471.23 samples/sec Loss 1.5222 LearningRate 0.0057 Epoch: 17 Global Step: 180210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:49:35,363-Speed 5497.23 samples/sec Loss 1.5200 LearningRate 0.0057 Epoch: 17 Global Step: 180220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:49:42,922-Speed 5419.55 samples/sec Loss 1.5426 LearningRate 0.0057 Epoch: 17 Global Step: 180230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:49:50,411-Speed 5469.85 samples/sec Loss 1.5198 LearningRate 0.0057 Epoch: 17 Global Step: 180240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:49:57,851-Speed 5505.95 samples/sec Loss 1.5110 
LearningRate 0.0057 Epoch: 17 Global Step: 180250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:05,319-Speed 5485.41 samples/sec Loss 1.5142 LearningRate 0.0057 Epoch: 17 Global Step: 180260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:12,797-Speed 5478.09 samples/sec Loss 1.5536 LearningRate 0.0057 Epoch: 17 Global Step: 180270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:20,257-Speed 5491.42 samples/sec Loss 1.5113 LearningRate 0.0057 Epoch: 17 Global Step: 180280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:27,762-Speed 5458.31 samples/sec Loss 1.5239 LearningRate 0.0057 Epoch: 17 Global Step: 180290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:35,247-Speed 5472.87 samples/sec Loss 1.5016 LearningRate 0.0057 Epoch: 17 Global Step: 180300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:42,754-Speed 5457.08 samples/sec Loss 1.5279 LearningRate 0.0057 Epoch: 17 Global Step: 180310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 11:50:50,246-Speed 5468.19 samples/sec Loss 1.5544 LearningRate 0.0057 Epoch: 17 Global Step: 180320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:50:57,718-Speed 5482.05 samples/sec Loss 1.5210 LearningRate 0.0057 Epoch: 17 Global Step: 180330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:05,233-Speed 5451.74 samples/sec Loss 1.5443 LearningRate 0.0057 Epoch: 17 Global Step: 180340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:12,723-Speed 5468.73 samples/sec Loss 1.5146 LearningRate 0.0056 Epoch: 17 Global Step: 180350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:20,223-Speed 5462.71 samples/sec Loss 1.5113 LearningRate 0.0056 Epoch: 17 Global Step: 180360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:27,748-Speed 5443.85 samples/sec Loss 1.5207 LearningRate 0.0056 Epoch: 17 Global Step: 
180370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:35,211-Speed 5488.65 samples/sec Loss 1.5312 LearningRate 0.0056 Epoch: 17 Global Step: 180380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:42,740-Speed 5441.64 samples/sec Loss 1.5441 LearningRate 0.0056 Epoch: 17 Global Step: 180390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:50,349-Speed 5383.87 samples/sec Loss 1.5414 LearningRate 0.0056 Epoch: 17 Global Step: 180400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:51:57,859-Speed 5454.47 samples/sec Loss 1.5397 LearningRate 0.0056 Epoch: 17 Global Step: 180410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:05,405-Speed 5429.09 samples/sec Loss 1.5191 LearningRate 0.0056 Epoch: 17 Global Step: 180420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:52:12,918-Speed 5452.20 samples/sec Loss 1.5249 LearningRate 0.0056 Epoch: 17 Global Step: 180430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:20,467-Speed 5426.54 samples/sec Loss 1.5223 LearningRate 0.0056 Epoch: 17 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:28,100-Speed 5367.37 samples/sec Loss 1.4995 LearningRate 0.0056 Epoch: 17 Global Step: 180450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:35,617-Speed 5449.37 samples/sec Loss 1.5344 LearningRate 0.0056 Epoch: 17 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:43,175-Speed 5420.86 samples/sec Loss 1.4781 LearningRate 0.0056 Epoch: 17 Global Step: 180470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:50,718-Speed 5430.33 samples/sec Loss 1.5179 LearningRate 0.0056 Epoch: 17 Global Step: 180480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:52:58,221-Speed 5460.21 samples/sec Loss 1.5233 LearningRate 0.0056 Epoch: 17 Global Step: 180490 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-01-09 11:53:05,778-Speed 5420.98 samples/sec Loss 1.5079 LearningRate 0.0056 Epoch: 17 Global Step: 180500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:53:13,240-Speed 5490.05 samples/sec Loss 1.5151 LearningRate 0.0056 Epoch: 17 Global Step: 180510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:53:20,726-Speed 5472.38 samples/sec Loss 1.5064 LearningRate 0.0056 Epoch: 17 Global Step: 180520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:53:28,245-Speed 5448.33 samples/sec Loss 1.5129 LearningRate 0.0056 Epoch: 17 Global Step: 180530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:53:35,733-Speed 5471.01 samples/sec Loss 1.5171 LearningRate 0.0056 Epoch: 17 Global Step: 180540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:53:43,180-Speed 5500.66 samples/sec Loss 1.5077 LearningRate 0.0056 Epoch: 17 Global Step: 180550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:53:50,647-Speed 5486.12 samples/sec Loss 1.5015 LearningRate 0.0056 Epoch: 17 Global Step: 180560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:53:58,152-Speed 5458.60 samples/sec Loss 1.4983 LearningRate 0.0056 Epoch: 17 Global Step: 180570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:05,651-Speed 5463.06 samples/sec Loss 1.5211 LearningRate 0.0056 Epoch: 17 Global Step: 180580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:13,226-Speed 5407.60 samples/sec Loss 1.5133 LearningRate 0.0055 Epoch: 17 Global Step: 180590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:20,720-Speed 5466.48 samples/sec Loss 1.5183 LearningRate 0.0055 Epoch: 17 Global Step: 180600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:28,223-Speed 5459.76 samples/sec Loss 1.5110 LearningRate 0.0055 Epoch: 17 Global Step: 180610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
11:54:35,724-Speed 5461.60 samples/sec Loss 1.4808 LearningRate 0.0055 Epoch: 17 Global Step: 180620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:43,223-Speed 5463.21 samples/sec Loss 1.5304 LearningRate 0.0055 Epoch: 17 Global Step: 180630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:50,738-Speed 5451.15 samples/sec Loss 1.5132 LearningRate 0.0055 Epoch: 17 Global Step: 180640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:54:58,162-Speed 5518.20 samples/sec Loss 1.4941 LearningRate 0.0055 Epoch: 17 Global Step: 180650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:55:05,724-Speed 5416.95 samples/sec Loss 1.5266 LearningRate 0.0055 Epoch: 17 Global Step: 180660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:55:13,163-Speed 5507.06 samples/sec Loss 1.5231 LearningRate 0.0055 Epoch: 17 Global Step: 180670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:55:20,739-Speed 5407.49 samples/sec Loss 1.4984 LearningRate 0.0055 Epoch: 17 Global Step: 180680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:55:28,248-Speed 5454.68 samples/sec Loss 1.5320 LearningRate 0.0055 Epoch: 17 Global Step: 180690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:55:35,716-Speed 5485.85 samples/sec Loss 1.5007 LearningRate 0.0055 Epoch: 17 Global Step: 180700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:55:43,289-Speed 5409.67 samples/sec Loss 1.5043 LearningRate 0.0055 Epoch: 17 Global Step: 180710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:55:50,972-Speed 5332.07 samples/sec Loss 1.5024 LearningRate 0.0055 Epoch: 17 Global Step: 180720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:55:58,516-Speed 5429.68 samples/sec Loss 1.4950 LearningRate 0.0055 Epoch: 17 Global Step: 180730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:06,067-Speed 5426.09 samples/sec Loss 
1.5110 LearningRate 0.0055 Epoch: 17 Global Step: 180740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:13,585-Speed 5448.47 samples/sec Loss 1.5242 LearningRate 0.0055 Epoch: 17 Global Step: 180750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:21,062-Speed 5479.37 samples/sec Loss 1.4949 LearningRate 0.0055 Epoch: 17 Global Step: 180760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:28,649-Speed 5398.97 samples/sec Loss 1.4938 LearningRate 0.0055 Epoch: 17 Global Step: 180770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:36,180-Speed 5439.35 samples/sec Loss 1.4742 LearningRate 0.0055 Epoch: 17 Global Step: 180780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:56:43,665-Speed 5473.53 samples/sec Loss 1.5129 LearningRate 0.0055 Epoch: 17 Global Step: 180790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:51,150-Speed 5472.36 samples/sec Loss 1.5015 LearningRate 0.0055 Epoch: 17 Global Step: 180800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:56:58,663-Speed 5453.14 samples/sec Loss 1.4833 LearningRate 0.0055 Epoch: 17 Global Step: 180810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:06,258-Speed 5393.55 samples/sec Loss 1.5272 LearningRate 0.0055 Epoch: 17 Global Step: 180820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:13,725-Speed 5486.00 samples/sec Loss 1.5154 LearningRate 0.0054 Epoch: 17 Global Step: 180830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:21,266-Speed 5432.26 samples/sec Loss 1.5157 LearningRate 0.0054 Epoch: 17 Global Step: 180840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:28,789-Speed 5445.42 samples/sec Loss 1.5143 LearningRate 0.0054 Epoch: 17 Global Step: 180850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:36,313-Speed 5444.41 samples/sec Loss 1.4930 LearningRate 0.0054 Epoch: 17 Global 
Step: 180860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:43,847-Speed 5437.57 samples/sec Loss 1.4936 LearningRate 0.0054 Epoch: 17 Global Step: 180870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:51,353-Speed 5457.47 samples/sec Loss 1.5083 LearningRate 0.0054 Epoch: 17 Global Step: 180880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:57:58,893-Speed 5433.44 samples/sec Loss 1.5389 LearningRate 0.0054 Epoch: 17 Global Step: 180890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:06,435-Speed 5431.26 samples/sec Loss 1.5193 LearningRate 0.0054 Epoch: 17 Global Step: 180900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:13,971-Speed 5436.50 samples/sec Loss 1.4901 LearningRate 0.0054 Epoch: 17 Global Step: 180910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:21,459-Speed 5470.42 samples/sec Loss 1.4966 LearningRate 0.0054 Epoch: 17 Global Step: 180920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:28,985-Speed 5443.87 samples/sec Loss 1.4889 LearningRate 0.0054 Epoch: 17 Global Step: 180930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:36,573-Speed 5398.55 samples/sec Loss 1.5209 LearningRate 0.0054 Epoch: 17 Global Step: 180940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:44,098-Speed 5443.84 samples/sec Loss 1.4986 LearningRate 0.0054 Epoch: 17 Global Step: 180950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:51,576-Speed 5478.41 samples/sec Loss 1.4677 LearningRate 0.0054 Epoch: 17 Global Step: 180960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:58:59,136-Speed 5418.69 samples/sec Loss 1.5049 LearningRate 0.0054 Epoch: 17 Global Step: 180970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:59:06,659-Speed 5445.54 samples/sec Loss 1.4903 LearningRate 0.0054 Epoch: 17 Global Step: 180980 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-09 11:59:14,193-Speed 5436.97 samples/sec Loss 1.5124 LearningRate 0.0054 Epoch: 17 Global Step: 180990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:59:21,694-Speed 5461.16 samples/sec Loss 1.4982 LearningRate 0.0054 Epoch: 17 Global Step: 181000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:59:29,267-Speed 5409.23 samples/sec Loss 1.4826 LearningRate 0.0054 Epoch: 17 Global Step: 181010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 11:59:36,754-Speed 5471.91 samples/sec Loss 1.4969 LearningRate 0.0054 Epoch: 17 Global Step: 181020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:59:44,343-Speed 5397.48 samples/sec Loss 1.4976 LearningRate 0.0054 Epoch: 17 Global Step: 181030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:59:51,903-Speed 5418.96 samples/sec Loss 1.5006 LearningRate 0.0054 Epoch: 17 Global Step: 181040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 11:59:59,442-Speed 5433.77 samples/sec Loss 1.4871 LearningRate 0.0054 Epoch: 17 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:00:06,958-Speed 5450.69 samples/sec Loss 1.5009 LearningRate 0.0054 Epoch: 17 Global Step: 181060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:00:14,432-Speed 5480.32 samples/sec Loss 1.5000 LearningRate 0.0054 Epoch: 17 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:00:22,040-Speed 5384.85 samples/sec Loss 1.4773 LearningRate 0.0053 Epoch: 17 Global Step: 181080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:00:29,540-Speed 5462.05 samples/sec Loss 1.4723 LearningRate 0.0053 Epoch: 17 Global Step: 181090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:00:37,089-Speed 5426.62 samples/sec Loss 1.5081 LearningRate 0.0053 Epoch: 17 Global Step: 181100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
12:00:44,589-Speed 5462.04 samples/sec Loss 1.5025 LearningRate 0.0053 Epoch: 17 Global Step: 181110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:00:52,061-Speed 5483.06 samples/sec Loss 1.4710 LearningRate 0.0053 Epoch: 17 Global Step: 181120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:00:59,567-Speed 5457.19 samples/sec Loss 1.4856 LearningRate 0.0053 Epoch: 17 Global Step: 181130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:01:07,112-Speed 5429.77 samples/sec Loss 1.4954 LearningRate 0.0053 Epoch: 17 Global Step: 181140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:01:14,664-Speed 5424.33 samples/sec Loss 1.5160 LearningRate 0.0053 Epoch: 17 Global Step: 181150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:01:22,245-Speed 5403.95 samples/sec Loss 1.4747 LearningRate 0.0053 Epoch: 17 Global Step: 181160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:01:29,860-Speed 5379.68 samples/sec Loss 1.4813 LearningRate 0.0053 Epoch: 17 Global Step: 181170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:01:37,409-Speed 5426.22 samples/sec Loss 1.4928 LearningRate 0.0053 Epoch: 17 Global Step: 181180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:01:44,955-Speed 5428.45 samples/sec Loss 1.5119 LearningRate 0.0053 Epoch: 17 Global Step: 181190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:01:52,493-Speed 5435.22 samples/sec Loss 1.4788 LearningRate 0.0053 Epoch: 17 Global Step: 181200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:02:00,025-Speed 5439.02 samples/sec Loss 1.5025 LearningRate 0.0053 Epoch: 17 Global Step: 181210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:02:07,522-Speed 5464.01 samples/sec Loss 1.5041 LearningRate 0.0053 Epoch: 17 Global Step: 181220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:02:15,075-Speed 5423.57 samples/sec Loss 
1.4775 LearningRate 0.0053 Epoch: 17 Global Step: 181230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:02:22,554-Speed 5477.41 samples/sec Loss 1.4894 LearningRate 0.0053 Epoch: 17 Global Step: 181240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:02:30,088-Speed 5438.22 samples/sec Loss 1.4646 LearningRate 0.0053 Epoch: 17 Global Step: 181250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:02:37,598-Speed 5454.15 samples/sec Loss 1.4649 LearningRate 0.0053 Epoch: 17 Global Step: 181260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:02:45,148-Speed 5425.85 samples/sec Loss 1.4598 LearningRate 0.0053 Epoch: 17 Global Step: 181270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:02:52,636-Speed 5471.31 samples/sec Loss 1.4990 LearningRate 0.0053 Epoch: 17 Global Step: 181280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:00,130-Speed 5466.49 samples/sec Loss 1.4857 LearningRate 0.0053 Epoch: 17 Global Step: 181290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:07,693-Speed 5416.41 samples/sec Loss 1.4863 LearningRate 0.0053 Epoch: 17 Global Step: 181300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:15,199-Speed 5457.88 samples/sec Loss 1.4647 LearningRate 0.0053 Epoch: 17 Global Step: 181310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:22,646-Speed 5500.96 samples/sec Loss 1.4707 LearningRate 0.0052 Epoch: 17 Global Step: 181320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:30,166-Speed 5447.74 samples/sec Loss 1.5003 LearningRate 0.0052 Epoch: 17 Global Step: 181330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:37,793-Speed 5370.77 samples/sec Loss 1.4891 LearningRate 0.0052 Epoch: 17 Global Step: 181340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:45,331-Speed 5434.98 samples/sec Loss 1.4922 LearningRate 0.0052 Epoch: 17 Global 
Step: 181350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:03:52,886-Speed 5422.46 samples/sec Loss 1.4634 LearningRate 0.0052 Epoch: 17 Global Step: 181360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:04:00,502-Speed 5379.20 samples/sec Loss 1.4629 LearningRate 0.0052 Epoch: 17 Global Step: 181370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:04:07,938-Speed 5508.92 samples/sec Loss 1.4753 LearningRate 0.0052 Epoch: 17 Global Step: 181380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:04:15,482-Speed 5429.77 samples/sec Loss 1.4783 LearningRate 0.0052 Epoch: 17 Global Step: 181390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:04:23,163-Speed 5333.98 samples/sec Loss 1.5082 LearningRate 0.0052 Epoch: 17 Global Step: 181400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:04:30,694-Speed 5439.94 samples/sec Loss 1.4706 LearningRate 0.0052 Epoch: 17 Global Step: 181410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:04:38,223-Speed 5440.72 samples/sec Loss 1.4920 LearningRate 0.0052 Epoch: 17 Global Step: 181420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:04:45,691-Speed 5485.02 samples/sec Loss 1.4576 LearningRate 0.0052 Epoch: 17 Global Step: 181430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:04:53,204-Speed 5453.11 samples/sec Loss 1.4962 LearningRate 0.0052 Epoch: 17 Global Step: 181440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:05:00,710-Speed 5457.87 samples/sec Loss 1.4937 LearningRate 0.0052 Epoch: 17 Global Step: 181450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:05:08,260-Speed 5425.91 samples/sec Loss 1.5062 LearningRate 0.0052 Epoch: 17 Global Step: 181460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:05:15,773-Speed 5452.35 samples/sec Loss 1.4863 LearningRate 0.0052 Epoch: 17 Global Step: 181470 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-09 12:05:23,290-Speed 5449.25 samples/sec Loss 1.4824 LearningRate 0.0052 Epoch: 17 Global Step: 181480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:05:30,871-Speed 5403.96 samples/sec Loss 1.4832 LearningRate 0.0052 Epoch: 17 Global Step: 181490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:05:38,480-Speed 5384.17 samples/sec Loss 1.4557 LearningRate 0.0052 Epoch: 17 Global Step: 181500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:05:46,104-Speed 5373.19 samples/sec Loss 1.4848 LearningRate 0.0052 Epoch: 17 Global Step: 181510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:05:53,587-Speed 5474.27 samples/sec Loss 1.4870 LearningRate 0.0052 Epoch: 17 Global Step: 181520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:06:01,207-Speed 5376.28 samples/sec Loss 1.4831 LearningRate 0.0052 Epoch: 17 Global Step: 181530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:06:08,706-Speed 5462.86 samples/sec Loss 1.4815 LearningRate 0.0052 Epoch: 17 Global Step: 181540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:06:16,296-Speed 5397.89 samples/sec Loss 1.4754 LearningRate 0.0052 Epoch: 17 Global Step: 181550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:06:23,885-Speed 5397.58 samples/sec Loss 1.4686 LearningRate 0.0052 Epoch: 17 Global Step: 181560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:06:31,381-Speed 5465.21 samples/sec Loss 1.4546 LearningRate 0.0051 Epoch: 17 Global Step: 181570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:06:38,925-Speed 5430.44 samples/sec Loss 1.4688 LearningRate 0.0051 Epoch: 17 Global Step: 181580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:06:46,450-Speed 5443.77 samples/sec Loss 1.4693 LearningRate 0.0051 Epoch: 17 Global Step: 181590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
12:06:53,933-Speed 5474.06 samples/sec Loss 1.4738 LearningRate 0.0051 Epoch: 17 Global Step: 181600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:07:01,388-Speed 5494.95 samples/sec Loss 1.4521 LearningRate 0.0051 Epoch: 17 Global Step: 181610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:08,894-Speed 5457.94 samples/sec Loss 1.4630 LearningRate 0.0051 Epoch: 17 Global Step: 181620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:16,411-Speed 5449.73 samples/sec Loss 1.4778 LearningRate 0.0051 Epoch: 17 Global Step: 181630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:23,943-Speed 5438.45 samples/sec Loss 1.4398 LearningRate 0.0051 Epoch: 17 Global Step: 181640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:31,547-Speed 5387.64 samples/sec Loss 1.4753 LearningRate 0.0051 Epoch: 17 Global Step: 181650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:39,071-Speed 5444.47 samples/sec Loss 1.4788 LearningRate 0.0051 Epoch: 17 Global Step: 181660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:46,542-Speed 5483.92 samples/sec Loss 1.4575 LearningRate 0.0051 Epoch: 17 Global Step: 181670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:07:54,053-Speed 5453.75 samples/sec Loss 1.4760 LearningRate 0.0051 Epoch: 17 Global Step: 181680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:08:01,606-Speed 5423.37 samples/sec Loss 1.4714 LearningRate 0.0051 Epoch: 17 Global Step: 181690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:08:09,122-Speed 5450.99 samples/sec Loss 1.4670 LearningRate 0.0051 Epoch: 17 Global Step: 181700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:08:16,664-Speed 5431.22 samples/sec Loss 1.4475 LearningRate 0.0051 Epoch: 17 Global Step: 181710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:08:24,154-Speed 5469.30 samples/sec Loss 
1.4702 LearningRate 0.0051 Epoch: 17 Global Step: 181720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:08:31,666-Speed 5452.79 samples/sec Loss 1.4609 LearningRate 0.0051 Epoch: 17 Global Step: 181730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:08:39,159-Speed 5468.04 samples/sec Loss 1.4601 LearningRate 0.0051 Epoch: 17 Global Step: 181740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:08:46,597-Speed 5507.44 samples/sec Loss 1.4579 LearningRate 0.0051 Epoch: 17 Global Step: 181750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:08:54,069-Speed 5482.72 samples/sec Loss 1.4751 LearningRate 0.0051 Epoch: 17 Global Step: 181760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:09:01,481-Speed 5526.82 samples/sec Loss 1.4617 LearningRate 0.0051 Epoch: 17 Global Step: 181770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:08,961-Speed 5476.39 samples/sec Loss 1.4499 LearningRate 0.0051 Epoch: 17 Global Step: 181780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:16,417-Speed 5494.72 samples/sec Loss 1.4512 LearningRate 0.0051 Epoch: 17 Global Step: 181790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:23,907-Speed 5469.00 samples/sec Loss 1.4583 LearningRate 0.0051 Epoch: 17 Global Step: 181800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:31,331-Speed 5518.03 samples/sec Loss 1.4534 LearningRate 0.0051 Epoch: 17 Global Step: 181810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:38,857-Speed 5442.85 samples/sec Loss 1.4580 LearningRate 0.0050 Epoch: 17 Global Step: 181820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:46,377-Speed 5448.02 samples/sec Loss 1.4739 LearningRate 0.0050 Epoch: 17 Global Step: 181830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:09:53,832-Speed 5495.66 samples/sec Loss 1.4452 LearningRate 0.0050 Epoch: 17 Global 
Step: 181840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:10:01,334-Speed 5460.13 samples/sec Loss 1.4611 LearningRate 0.0050 Epoch: 17 Global Step: 181850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:10:08,867-Speed 5438.25 samples/sec Loss 1.4759 LearningRate 0.0050 Epoch: 17 Global Step: 181860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:10:16,421-Speed 5422.69 samples/sec Loss 1.4549 LearningRate 0.0050 Epoch: 17 Global Step: 181870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:10:23,847-Speed 5516.59 samples/sec Loss 1.4785 LearningRate 0.0050 Epoch: 17 Global Step: 181880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:10:31,295-Speed 5500.74 samples/sec Loss 1.4630 LearningRate 0.0050 Epoch: 17 Global Step: 181890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:10:38,801-Speed 5456.82 samples/sec Loss 1.4753 LearningRate 0.0050 Epoch: 17 Global Step: 181900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:10:46,298-Speed 5464.45 samples/sec Loss 1.4559 LearningRate 0.0050 Epoch: 17 Global Step: 181910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:10:53,732-Speed 5511.08 samples/sec Loss 1.4405 LearningRate 0.0050 Epoch: 17 Global Step: 181920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:11:01,228-Speed 5464.70 samples/sec Loss 1.4766 LearningRate 0.0050 Epoch: 17 Global Step: 181930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:11:08,663-Speed 5509.50 samples/sec Loss 1.4551 LearningRate 0.0050 Epoch: 17 Global Step: 181940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:11:16,171-Speed 5456.79 samples/sec Loss 1.4628 LearningRate 0.0050 Epoch: 17 Global Step: 181950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:11:23,676-Speed 5458.63 samples/sec Loss 1.4530 LearningRate 0.0050 Epoch: 17 Global Step: 181960 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-01-09 12:11:31,132-Speed 5494.23 samples/sec Loss 1.4383 LearningRate 0.0050 Epoch: 17 Global Step: 181970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:11:38,606-Speed 5481.01 samples/sec Loss 1.4407 LearningRate 0.0050 Epoch: 17 Global Step: 181980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:11:46,099-Speed 5467.20 samples/sec Loss 1.4602 LearningRate 0.0050 Epoch: 17 Global Step: 181990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:11:53,755-Speed 5350.82 samples/sec Loss 1.4646 LearningRate 0.0050 Epoch: 17 Global Step: 182000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:12:38,309-[lfw][182000]XNorm: 22.457532 Training: 2022-01-09 12:12:38,310-[lfw][182000]Accuracy-Flip: 0.99850+-0.00203 Training: 2022-01-09 12:12:38,311-[lfw][182000]Accuracy-Highest: 0.99850 Training: 2022-01-09 12:13:30,232-[cfp_fp][182000]XNorm: 21.930279 Training: 2022-01-09 12:13:30,233-[cfp_fp][182000]Accuracy-Flip: 0.99357+-0.00346 Training: 2022-01-09 12:13:30,233-[cfp_fp][182000]Accuracy-Highest: 0.99371 Training: 2022-01-09 12:14:14,765-[agedb_30][182000]XNorm: 22.879698 Training: 2022-01-09 12:14:14,766-[agedb_30][182000]Accuracy-Flip: 0.98283+-0.00568 Training: 2022-01-09 12:14:14,767-[agedb_30][182000]Accuracy-Highest: 0.98433 Training: 2022-01-09 12:14:22,289-Speed 275.76 samples/sec Loss 1.4693 LearningRate 0.0050 Epoch: 17 Global Step: 182010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:14:29,785-Speed 5464.51 samples/sec Loss 1.4609 LearningRate 0.0050 Epoch: 17 Global Step: 182020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:14:37,313-Speed 5441.54 samples/sec Loss 1.4732 LearningRate 0.0050 Epoch: 17 Global Step: 182030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:14:44,901-Speed 5398.74 samples/sec Loss 1.4625 LearningRate 0.0050 Epoch: 17 Global Step: 182040 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-01-09 12:14:52,625-Speed 5304.03 samples/sec Loss 1.4690 LearningRate 0.0050 Epoch: 17 Global Step: 182050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:00,207-Speed 5402.53 samples/sec Loss 1.4610 LearningRate 0.0050 Epoch: 17 Global Step: 182060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:07,702-Speed 5465.67 samples/sec Loss 1.4425 LearningRate 0.0050 Epoch: 17 Global Step: 182070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:15,211-Speed 5455.56 samples/sec Loss 1.4419 LearningRate 0.0049 Epoch: 17 Global Step: 182080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:22,807-Speed 5393.02 samples/sec Loss 1.4523 LearningRate 0.0049 Epoch: 17 Global Step: 182090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:30,428-Speed 5375.03 samples/sec Loss 1.4682 LearningRate 0.0049 Epoch: 17 Global Step: 182100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:37,935-Speed 5457.46 samples/sec Loss 1.4547 LearningRate 0.0049 Epoch: 17 Global Step: 182110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:45,351-Speed 5523.42 samples/sec Loss 1.4499 LearningRate 0.0049 Epoch: 17 Global Step: 182120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:15:52,846-Speed 5465.74 samples/sec Loss 1.4492 LearningRate 0.0049 Epoch: 17 Global Step: 182130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:16:00,372-Speed 5443.16 samples/sec Loss 1.4434 LearningRate 0.0049 Epoch: 17 Global Step: 182140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:16:07,822-Speed 5499.03 samples/sec Loss 1.4613 LearningRate 0.0049 Epoch: 17 Global Step: 182150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:16:15,289-Speed 5486.15 samples/sec Loss 1.4464 LearningRate 0.0049 Epoch: 17 Global Step: 182160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:16:22,790-Speed 
5461.09 samples/sec Loss 1.4571 LearningRate 0.0049 Epoch: 17 Global Step: 182170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:16:30,345-Speed 5422.27 samples/sec Loss 1.4588 LearningRate 0.0049 Epoch: 17 Global Step: 182180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:16:37,833-Speed 5471.21 samples/sec Loss 1.4454 LearningRate 0.0049 Epoch: 17 Global Step: 182190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:16:45,317-Speed 5473.38 samples/sec Loss 1.4423 LearningRate 0.0049 Epoch: 17 Global Step: 182200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:16:52,801-Speed 5473.81 samples/sec Loss 1.4431 LearningRate 0.0049 Epoch: 17 Global Step: 182210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:00,234-Speed 5511.28 samples/sec Loss 1.4525 LearningRate 0.0049 Epoch: 17 Global Step: 182220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:07,728-Speed 5466.70 samples/sec Loss 1.4412 LearningRate 0.0049 Epoch: 17 Global Step: 182230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:15,166-Speed 5507.41 samples/sec Loss 1.4671 LearningRate 0.0049 Epoch: 17 Global Step: 182240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:22,613-Speed 5501.06 samples/sec Loss 1.4540 LearningRate 0.0049 Epoch: 17 Global Step: 182250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:30,054-Speed 5504.96 samples/sec Loss 1.4312 LearningRate 0.0049 Epoch: 17 Global Step: 182260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:37,606-Speed 5423.95 samples/sec Loss 1.4667 LearningRate 0.0049 Epoch: 17 Global Step: 182270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:45,046-Speed 5506.22 samples/sec Loss 1.4470 LearningRate 0.0049 Epoch: 17 Global Step: 182280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:17:52,540-Speed 5466.31 samples/sec Loss 1.4402 
LearningRate 0.0049 Epoch: 17 Global Step: 182290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:00,068-Speed 5441.78 samples/sec Loss 1.4390 LearningRate 0.0049 Epoch: 17 Global Step: 182300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:07,520-Speed 5497.47 samples/sec Loss 1.4320 LearningRate 0.0049 Epoch: 17 Global Step: 182310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:14,988-Speed 5485.41 samples/sec Loss 1.4421 LearningRate 0.0049 Epoch: 17 Global Step: 182320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:22,476-Speed 5470.65 samples/sec Loss 1.4458 LearningRate 0.0049 Epoch: 17 Global Step: 182330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:30,029-Speed 5423.88 samples/sec Loss 1.4620 LearningRate 0.0048 Epoch: 17 Global Step: 182340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:37,635-Speed 5386.35 samples/sec Loss 1.4447 LearningRate 0.0048 Epoch: 17 Global Step: 182350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:45,211-Speed 5407.20 samples/sec Loss 1.4623 LearningRate 0.0048 Epoch: 17 Global Step: 182360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:18:52,775-Speed 5416.29 samples/sec Loss 1.4465 LearningRate 0.0048 Epoch: 17 Global Step: 182370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:19:00,272-Speed 5463.64 samples/sec Loss 1.4357 LearningRate 0.0048 Epoch: 17 Global Step: 182380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:19:07,896-Speed 5373.08 samples/sec Loss 1.4138 LearningRate 0.0048 Epoch: 17 Global Step: 182390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:19:15,400-Speed 5458.81 samples/sec Loss 1.4525 LearningRate 0.0048 Epoch: 17 Global Step: 182400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:19:23,018-Speed 5377.73 samples/sec Loss 1.4320 LearningRate 0.0048 Epoch: 17 Global Step: 
182410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:19:30,613-Speed 5393.48 samples/sec Loss 1.4408 LearningRate 0.0048 Epoch: 17 Global Step: 182420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:19:38,263-Speed 5355.11 samples/sec Loss 1.4163 LearningRate 0.0048 Epoch: 17 Global Step: 182430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:19:45,805-Speed 5431.68 samples/sec Loss 1.4329 LearningRate 0.0048 Epoch: 17 Global Step: 182440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:19:53,319-Speed 5452.27 samples/sec Loss 1.4207 LearningRate 0.0048 Epoch: 17 Global Step: 182450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:00,889-Speed 5411.08 samples/sec Loss 1.4451 LearningRate 0.0048 Epoch: 17 Global Step: 182460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:08,370-Speed 5475.33 samples/sec Loss 1.4301 LearningRate 0.0048 Epoch: 17 Global Step: 182470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:15,922-Speed 5425.01 samples/sec Loss 1.4424 LearningRate 0.0048 Epoch: 17 Global Step: 182480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:23,340-Speed 5522.73 samples/sec Loss 1.4502 LearningRate 0.0048 Epoch: 17 Global Step: 182490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:30,833-Speed 5466.95 samples/sec Loss 1.4508 LearningRate 0.0048 Epoch: 17 Global Step: 182500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:38,365-Speed 5438.13 samples/sec Loss 1.4320 LearningRate 0.0048 Epoch: 17 Global Step: 182510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:46,034-Speed 5342.19 samples/sec Loss 1.4190 LearningRate 0.0048 Epoch: 17 Global Step: 182520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:20:53,538-Speed 5459.06 samples/sec Loss 1.4417 LearningRate 0.0048 Epoch: 17 Global Step: 182530 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-01-09 12:21:00,990-Speed 5497.55 samples/sec Loss 1.4499 LearningRate 0.0048 Epoch: 17 Global Step: 182540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:08,458-Speed 5485.05 samples/sec Loss 1.4110 LearningRate 0.0048 Epoch: 17 Global Step: 182550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:15,984-Speed 5443.23 samples/sec Loss 1.4065 LearningRate 0.0048 Epoch: 17 Global Step: 182560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:23,365-Speed 5552.30 samples/sec Loss 1.4406 LearningRate 0.0048 Epoch: 17 Global Step: 182570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:30,912-Speed 5427.88 samples/sec Loss 1.4047 LearningRate 0.0048 Epoch: 17 Global Step: 182580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:38,459-Speed 5427.44 samples/sec Loss 1.4333 LearningRate 0.0047 Epoch: 17 Global Step: 182590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:46,055-Speed 5393.00 samples/sec Loss 1.4607 LearningRate 0.0047 Epoch: 17 Global Step: 182600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:21:53,523-Speed 5485.85 samples/sec Loss 1.4600 LearningRate 0.0047 Epoch: 17 Global Step: 182610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:22:00,992-Speed 5484.89 samples/sec Loss 1.4307 LearningRate 0.0047 Epoch: 17 Global Step: 182620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:22:08,472-Speed 5476.44 samples/sec Loss 1.4341 LearningRate 0.0047 Epoch: 17 Global Step: 182630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:22:15,920-Speed 5499.88 samples/sec Loss 1.4368 LearningRate 0.0047 Epoch: 17 Global Step: 182640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:22:23,418-Speed 5463.62 samples/sec Loss 1.3987 LearningRate 0.0047 Epoch: 17 Global Step: 182650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 
12:22:31,068-Speed 5355.30 samples/sec Loss 1.4358 LearningRate 0.0047 Epoch: 17 Global Step: 182660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:22:38,635-Speed 5412.85 samples/sec Loss 1.4324 LearningRate 0.0047 Epoch: 17 Global Step: 182670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:22:46,156-Speed 5447.34 samples/sec Loss 1.4159 LearningRate 0.0047 Epoch: 17 Global Step: 182680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:22:53,673-Speed 5449.42 samples/sec Loss 1.3919 LearningRate 0.0047 Epoch: 17 Global Step: 182690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:01,163-Speed 5469.18 samples/sec Loss 1.4277 LearningRate 0.0047 Epoch: 17 Global Step: 182700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:08,630-Speed 5486.26 samples/sec Loss 1.4276 LearningRate 0.0047 Epoch: 17 Global Step: 182710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:16,109-Speed 5477.14 samples/sec Loss 1.4231 LearningRate 0.0047 Epoch: 17 Global Step: 182720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:23,631-Speed 5446.52 samples/sec Loss 1.4346 LearningRate 0.0047 Epoch: 17 Global Step: 182730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:31,126-Speed 5465.72 samples/sec Loss 1.4126 LearningRate 0.0047 Epoch: 17 Global Step: 182740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:38,683-Speed 5420.35 samples/sec Loss 1.4154 LearningRate 0.0047 Epoch: 17 Global Step: 182750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:23:46,222-Speed 5433.93 samples/sec Loss 1.4210 LearningRate 0.0047 Epoch: 17 Global Step: 182760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:23:53,711-Speed 5470.13 samples/sec Loss 1.4044 LearningRate 0.0047 Epoch: 17 Global Step: 182770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:01,226-Speed 5451.15 samples/sec Loss 
1.4413 LearningRate 0.0047 Epoch: 17 Global Step: 182780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:08,738-Speed 5453.21 samples/sec Loss 1.3911 LearningRate 0.0047 Epoch: 17 Global Step: 182790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:16,240-Speed 5460.91 samples/sec Loss 1.4264 LearningRate 0.0047 Epoch: 17 Global Step: 182800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:23,775-Speed 5436.75 samples/sec Loss 1.4213 LearningRate 0.0047 Epoch: 17 Global Step: 182810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:31,280-Speed 5458.44 samples/sec Loss 1.4251 LearningRate 0.0047 Epoch: 17 Global Step: 182820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:38,883-Speed 5388.03 samples/sec Loss 1.4282 LearningRate 0.0047 Epoch: 17 Global Step: 182830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:46,355-Speed 5482.29 samples/sec Loss 1.4351 LearningRate 0.0047 Epoch: 17 Global Step: 182840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:24:53,935-Speed 5404.64 samples/sec Loss 1.3935 LearningRate 0.0047 Epoch: 17 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:01,434-Speed 5463.36 samples/sec Loss 1.4155 LearningRate 0.0046 Epoch: 17 Global Step: 182860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:08,918-Speed 5473.07 samples/sec Loss 1.4313 LearningRate 0.0046 Epoch: 17 Global Step: 182870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:25:16,454-Speed 5435.88 samples/sec Loss 1.4268 LearningRate 0.0046 Epoch: 17 Global Step: 182880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:23,962-Speed 5456.76 samples/sec Loss 1.4204 LearningRate 0.0046 Epoch: 17 Global Step: 182890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:31,536-Speed 5408.64 samples/sec Loss 1.4098 LearningRate 0.0046 Epoch: 17 Global 
Step: 182900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:39,024-Speed 5470.57 samples/sec Loss 1.4075 LearningRate 0.0046 Epoch: 17 Global Step: 182910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:46,580-Speed 5421.91 samples/sec Loss 1.3994 LearningRate 0.0046 Epoch: 17 Global Step: 182920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:25:54,016-Speed 5509.27 samples/sec Loss 1.4124 LearningRate 0.0046 Epoch: 17 Global Step: 182930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:26:01,446-Speed 5513.90 samples/sec Loss 1.3914 LearningRate 0.0046 Epoch: 17 Global Step: 182940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:26:09,081-Speed 5364.87 samples/sec Loss 1.4161 LearningRate 0.0046 Epoch: 17 Global Step: 182950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:26:16,577-Speed 5465.10 samples/sec Loss 1.4188 LearningRate 0.0046 Epoch: 17 Global Step: 182960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:26:24,029-Speed 5497.44 samples/sec Loss 1.3993 LearningRate 0.0046 Epoch: 17 Global Step: 182970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:26:31,481-Speed 5497.40 samples/sec Loss 1.4144 LearningRate 0.0046 Epoch: 17 Global Step: 182980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:26:38,984-Speed 5459.99 samples/sec Loss 1.4116 LearningRate 0.0046 Epoch: 17 Global Step: 182990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:26:46,438-Speed 5495.55 samples/sec Loss 1.4163 LearningRate 0.0046 Epoch: 17 Global Step: 183000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:26:54,006-Speed 5412.93 samples/sec Loss 1.4164 LearningRate 0.0046 Epoch: 17 Global Step: 183010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:01,550-Speed 5430.62 samples/sec Loss 1.4063 LearningRate 0.0046 Epoch: 17 Global Step: 183020 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-09 12:27:09,155-Speed 5386.61 samples/sec Loss 1.4232 LearningRate 0.0046 Epoch: 17 Global Step: 183030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:16,827-Speed 5339.49 samples/sec Loss 1.4014 LearningRate 0.0046 Epoch: 17 Global Step: 183040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:24,372-Speed 5429.62 samples/sec Loss 1.4120 LearningRate 0.0046 Epoch: 17 Global Step: 183050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:31,844-Speed 5482.83 samples/sec Loss 1.4212 LearningRate 0.0046 Epoch: 17 Global Step: 183060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:39,419-Speed 5407.78 samples/sec Loss 1.4342 LearningRate 0.0046 Epoch: 17 Global Step: 183070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:46,905-Speed 5472.45 samples/sec Loss 1.4048 LearningRate 0.0046 Epoch: 17 Global Step: 183080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:27:54,492-Speed 5399.46 samples/sec Loss 1.4115 LearningRate 0.0046 Epoch: 17 Global Step: 183090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:28:02,064-Speed 5410.06 samples/sec Loss 1.4037 LearningRate 0.0046 Epoch: 17 Global Step: 183100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-09 12:28:09,603-Speed 5434.12 samples/sec Loss 1.3857 LearningRate 0.0046 Epoch: 17 Global Step: 183110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:28:17,183-Speed 5404.53 samples/sec Loss 1.3980 LearningRate 0.0045 Epoch: 17 Global Step: 183120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:28:24,801-Speed 5376.85 samples/sec Loss 1.4186 LearningRate 0.0045 Epoch: 17 Global Step: 183130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:28:32,468-Speed 5343.73 samples/sec Loss 1.4067 LearningRate 0.0045 Epoch: 17 Global Step: 183140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 
12:28:40,017-Speed 5426.88 samples/sec Loss 1.4150 LearningRate 0.0045 Epoch: 17 Global Step: 183150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:28:47,650-Speed 5366.57 samples/sec Loss 1.4188 LearningRate 0.0045 Epoch: 17 Global Step: 183160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:28:55,317-Speed 5343.20 samples/sec Loss 1.3997 LearningRate 0.0045 Epoch: 17 Global Step: 183170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:02,867-Speed 5426.16 samples/sec Loss 1.4177 LearningRate 0.0045 Epoch: 17 Global Step: 183180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:10,455-Speed 5398.28 samples/sec Loss 1.3866 LearningRate 0.0045 Epoch: 17 Global Step: 183190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:18,050-Speed 5393.80 samples/sec Loss 1.4070 LearningRate 0.0045 Epoch: 17 Global Step: 183200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:25,628-Speed 5405.92 samples/sec Loss 1.4011 LearningRate 0.0045 Epoch: 17 Global Step: 183210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:33,199-Speed 5411.19 samples/sec Loss 1.4014 LearningRate 0.0045 Epoch: 17 Global Step: 183220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:40,717-Speed 5448.89 samples/sec Loss 1.4206 LearningRate 0.0045 Epoch: 17 Global Step: 183230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:48,371-Speed 5351.90 samples/sec Loss 1.4151 LearningRate 0.0045 Epoch: 17 Global Step: 183240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:29:55,911-Speed 5433.23 samples/sec Loss 1.3951 LearningRate 0.0045 Epoch: 17 Global Step: 183250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:30:03,494-Speed 5402.14 samples/sec Loss 1.3847 LearningRate 0.0045 Epoch: 17 Global Step: 183260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:30:11,090-Speed 5393.47 samples/sec Loss 
1.3966 LearningRate 0.0045 Epoch: 17 Global Step: 183270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:30:18,568-Speed 5478.05 samples/sec Loss 1.3934 LearningRate 0.0045 Epoch: 17 Global Step: 183280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:30:26,218-Speed 5355.12 samples/sec Loss 1.3681 LearningRate 0.0045 Epoch: 17 Global Step: 183290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:30:33,930-Speed 5311.76 samples/sec Loss 1.3785 LearningRate 0.0045 Epoch: 17 Global Step: 183300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-09 12:30:41,480-Speed 5426.01 samples/sec Loss 1.3893 LearningRate 0.0045 Epoch: 17 Global Step: 183310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:30:49,129-Speed 5355.27 samples/sec Loss 1.4116 LearningRate 0.0045 Epoch: 17 Global Step: 183320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:30:56,762-Speed 5366.63 samples/sec Loss 1.4045 LearningRate 0.0045 Epoch: 17 Global Step: 183330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:31:04,474-Speed 5312.60 samples/sec Loss 1.4019 LearningRate 0.0045 Epoch: 17 Global Step: 183340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:31:12,039-Speed 5415.27 samples/sec Loss 1.4224 LearningRate 0.0045 Epoch: 17 Global Step: 183350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:31:19,615-Speed 5407.06 samples/sec Loss 1.4079 LearningRate 0.0045 Epoch: 17 Global Step: 183360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-09 12:31:27,098-Speed 5473.82 samples/sec Loss 1.3982 LearningRate 0.0045 Epoch: 17 Global Step: 183370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:31:34,654-Speed 5421.75 samples/sec Loss 1.3862 LearningRate 0.0045 Epoch: 17 Global Step: 183380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:31:42,288-Speed 5366.72 samples/sec Loss 1.3975 LearningRate 0.0044 Epoch: 17 Global 
Step: 183390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:31:49,850-Speed 5416.95 samples/sec Loss 1.4057 LearningRate 0.0044 Epoch: 17 Global Step: 183400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:31:57,301-Speed 5497.48 samples/sec Loss 1.4173 LearningRate 0.0044 Epoch: 17 Global Step: 183410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:04,773-Speed 5482.44 samples/sec Loss 1.4017 LearningRate 0.0044 Epoch: 17 Global Step: 183420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:12,256-Speed 5474.86 samples/sec Loss 1.3675 LearningRate 0.0044 Epoch: 17 Global Step: 183430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:19,804-Speed 5427.08 samples/sec Loss 1.3973 LearningRate 0.0044 Epoch: 17 Global Step: 183440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:27,452-Speed 5356.38 samples/sec Loss 1.4055 LearningRate 0.0044 Epoch: 17 Global Step: 183450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:34,976-Speed 5444.80 samples/sec Loss 1.3877 LearningRate 0.0044 Epoch: 17 Global Step: 183460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:42,525-Speed 5426.93 samples/sec Loss 1.4116 LearningRate 0.0044 Epoch: 17 Global Step: 183470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:50,054-Speed 5440.73 samples/sec Loss 1.4269 LearningRate 0.0044 Epoch: 17 Global Step: 183480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:32:57,643-Speed 5397.99 samples/sec Loss 1.3950 LearningRate 0.0044 Epoch: 17 Global Step: 183490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:33:05,183-Speed 5433.08 samples/sec Loss 1.3969 LearningRate 0.0044 Epoch: 17 Global Step: 183500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:33:12,687-Speed 5459.19 samples/sec Loss 1.3872 LearningRate 0.0044 Epoch: 17 Global Step: 183510 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-09 12:33:20,134-Speed 5501.20 samples/sec Loss 1.3830 LearningRate 0.0044 Epoch: 17 Global Step: 183520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:33:27,607-Speed 5481.62 samples/sec Loss 1.3750 LearningRate 0.0044 Epoch: 17 Global Step: 183530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:33:35,134-Speed 5441.96 samples/sec Loss 1.3739 LearningRate 0.0044 Epoch: 17 Global Step: 183540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:33:42,796-Speed 5347.27 samples/sec Loss 1.3895 LearningRate 0.0044 Epoch: 17 Global Step: 183550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:33:50,472-Speed 5336.49 samples/sec Loss 1.3947 LearningRate 0.0044 Epoch: 17 Global Step: 183560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:33:58,016-Speed 5429.92 samples/sec Loss 1.3600 LearningRate 0.0044 Epoch: 17 Global Step: 183570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:34:05,544-Speed 5441.98 samples/sec Loss 1.3955 LearningRate 0.0044 Epoch: 17 Global Step: 183580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:34:13,142-Speed 5391.40 samples/sec Loss 1.3814 LearningRate 0.0044 Epoch: 17 Global Step: 183590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:34:20,620-Speed 5478.36 samples/sec Loss 1.4128 LearningRate 0.0044 Epoch: 17 Global Step: 183600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:34:28,181-Speed 5417.26 samples/sec Loss 1.3700 LearningRate 0.0044 Epoch: 17 Global Step: 183610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:34:35,688-Speed 5457.33 samples/sec Loss 1.4034 LearningRate 0.0044 Epoch: 17 Global Step: 183620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:34:43,210-Speed 5446.33 samples/sec Loss 1.3859 LearningRate 0.0044 Epoch: 17 Global Step: 183630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
12:34:50,829-Speed 5377.15 samples/sec Loss 1.3998 LearningRate 0.0044 Epoch: 17 Global Step: 183640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:34:58,365-Speed 5435.19 samples/sec Loss 1.4052 LearningRate 0.0044 Epoch: 17 Global Step: 183650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:05,874-Speed 5455.61 samples/sec Loss 1.3808 LearningRate 0.0043 Epoch: 17 Global Step: 183660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:13,416-Speed 5432.45 samples/sec Loss 1.3789 LearningRate 0.0043 Epoch: 17 Global Step: 183670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:20,925-Speed 5455.77 samples/sec Loss 1.3692 LearningRate 0.0043 Epoch: 17 Global Step: 183680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:28,574-Speed 5355.00 samples/sec Loss 1.3794 LearningRate 0.0043 Epoch: 17 Global Step: 183690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:36,134-Speed 5418.70 samples/sec Loss 1.3746 LearningRate 0.0043 Epoch: 17 Global Step: 183700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:43,653-Speed 5448.47 samples/sec Loss 1.3992 LearningRate 0.0043 Epoch: 17 Global Step: 183710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:51,224-Speed 5410.83 samples/sec Loss 1.3715 LearningRate 0.0043 Epoch: 17 Global Step: 183720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:35:58,794-Speed 5411.29 samples/sec Loss 1.3734 LearningRate 0.0043 Epoch: 17 Global Step: 183730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:06,441-Speed 5357.44 samples/sec Loss 1.3874 LearningRate 0.0043 Epoch: 17 Global Step: 183740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:13,971-Speed 5440.73 samples/sec Loss 1.3855 LearningRate 0.0043 Epoch: 17 Global Step: 183750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:36:21,461-Speed 5469.09 samples/sec Loss 
1.3720 LearningRate 0.0043 Epoch: 17 Global Step: 183760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:29,036-Speed 5408.04 samples/sec Loss 1.3688 LearningRate 0.0043 Epoch: 17 Global Step: 183770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:36,643-Speed 5385.20 samples/sec Loss 1.3648 LearningRate 0.0043 Epoch: 17 Global Step: 183780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:44,305-Speed 5346.87 samples/sec Loss 1.3705 LearningRate 0.0043 Epoch: 17 Global Step: 183790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:51,823-Speed 5449.29 samples/sec Loss 1.3665 LearningRate 0.0043 Epoch: 17 Global Step: 183800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:36:59,391-Speed 5412.78 samples/sec Loss 1.3668 LearningRate 0.0043 Epoch: 17 Global Step: 183810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:37:06,927-Speed 5435.72 samples/sec Loss 1.3800 LearningRate 0.0043 Epoch: 17 Global Step: 183820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:37:14,454-Speed 5442.59 samples/sec Loss 1.3902 LearningRate 0.0043 Epoch: 17 Global Step: 183830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:37:21,955-Speed 5461.94 samples/sec Loss 1.3589 LearningRate 0.0043 Epoch: 17 Global Step: 183840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:37:29,421-Speed 5486.56 samples/sec Loss 1.3634 LearningRate 0.0043 Epoch: 17 Global Step: 183850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:37:37,125-Speed 5317.30 samples/sec Loss 1.3795 LearningRate 0.0043 Epoch: 17 Global Step: 183860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:37:44,651-Speed 5442.99 samples/sec Loss 1.3561 LearningRate 0.0043 Epoch: 17 Global Step: 183870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:37:52,122-Speed 5484.06 samples/sec Loss 1.3594 LearningRate 0.0043 Epoch: 17 Global 
Step: 183880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:37:59,682-Speed 5418.02 samples/sec Loss 1.3734 LearningRate 0.0043 Epoch: 17 Global Step: 183890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:07,150-Speed 5485.68 samples/sec Loss 1.3836 LearningRate 0.0043 Epoch: 17 Global Step: 183900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:14,636-Speed 5472.39 samples/sec Loss 1.3799 LearningRate 0.0043 Epoch: 17 Global Step: 183910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:22,279-Speed 5359.78 samples/sec Loss 1.3771 LearningRate 0.0043 Epoch: 17 Global Step: 183920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:29,756-Speed 5478.81 samples/sec Loss 1.3675 LearningRate 0.0043 Epoch: 17 Global Step: 183930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:37,349-Speed 5394.49 samples/sec Loss 1.3797 LearningRate 0.0042 Epoch: 17 Global Step: 183940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:44,868-Speed 5449.07 samples/sec Loss 1.3978 LearningRate 0.0042 Epoch: 17 Global Step: 183950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:52,325-Speed 5493.15 samples/sec Loss 1.3873 LearningRate 0.0042 Epoch: 17 Global Step: 183960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:38:59,852-Speed 5442.63 samples/sec Loss 1.3522 LearningRate 0.0042 Epoch: 17 Global Step: 183970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:39:07,336-Speed 5473.11 samples/sec Loss 1.3622 LearningRate 0.0042 Epoch: 17 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:39:14,895-Speed 5420.30 samples/sec Loss 1.3788 LearningRate 0.0042 Epoch: 17 Global Step: 183990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:39:22,361-Speed 5486.72 samples/sec Loss 1.3674 LearningRate 0.0042 Epoch: 17 Global Step: 184000 Fp16 Grad Scale: 65536 
Required: 5 hours
Training: 2022-01-09 12:40:06,627-[lfw][184000]XNorm: 22.478137
Training: 2022-01-09 12:40:06,628-[lfw][184000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-01-09 12:40:06,628-[lfw][184000]Accuracy-Highest: 0.99850
Training: 2022-01-09 12:40:58,147-[cfp_fp][184000]XNorm: 22.006548
Training: 2022-01-09 12:40:58,148-[cfp_fp][184000]Accuracy-Flip: 0.99343+-0.00395
Training: 2022-01-09 12:40:58,148-[cfp_fp][184000]Accuracy-Highest: 0.99371
Training: 2022-01-09 12:41:42,445-[agedb_30][184000]XNorm: 22.937021
Training: 2022-01-09 12:41:42,446-[agedb_30][184000]Accuracy-Flip: 0.98500+-0.00587
Training: 2022-01-09 12:41:42,446-[agedb_30][184000]Accuracy-Highest: 0.98500
Training: 2022-01-09 12:41:49,952-Speed 277.53 samples/sec Loss 1.3797 LearningRate 0.0042 Epoch: 17 Global Step: 184010 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-09 12:41:57,380-Speed 5515.04 samples/sec Loss 1.3692 LearningRate 0.0042 Epoch: 17 Global Step: 184020 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-09 12:42:04,996-Speed 5378.68 samples/sec Loss 1.3430 LearningRate 0.0042 Epoch: 17 Global Step: 184030 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-09 12:42:12,500-Speed 5459.06 samples/sec Loss 1.3531 LearningRate 0.0042 Epoch: 17 Global Step: 184040 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-09 12:42:19,897-Speed 5537.91 samples/sec Loss 1.3564 LearningRate 0.0042 Epoch: 17 Global Step: 184050 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-09 12:42:27,349-Speed 5497.77 samples/sec Loss 1.3929 LearningRate 0.0042 Epoch: 17 Global Step: 184060 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-09 12:42:34,739-Speed 5543.22 samples/sec Loss 1.3655 LearningRate 0.0042 Epoch: 17 Global Step: 184070 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-09 12:42:42,198-Speed 5491.66 samples/sec Loss 1.3527 LearningRate 0.0042 Epoch: 17 Global Step: 184080 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-09 12:42:49,727-Speed 5441.60 samples/sec Loss 1.3556 LearningRate 0.0042 Epoch: 17 Global Step: 184090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:42:57,209-Speed 5475.55 samples/sec Loss 1.3741 LearningRate 0.0042 Epoch: 17 Global Step: 184100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:43:04,712-Speed 5459.59 samples/sec Loss 1.3666 LearningRate 0.0042 Epoch: 17 Global Step: 184110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:12,204-Speed 5467.41 samples/sec Loss 1.3646 LearningRate 0.0042 Epoch: 17 Global Step: 184120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:19,672-Speed 5486.17 samples/sec Loss 1.3524 LearningRate 0.0042 Epoch: 17 Global Step: 184130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:27,173-Speed 5461.58 samples/sec Loss 1.3868 LearningRate 0.0042 Epoch: 17 Global Step: 184140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:34,645-Speed 5481.94 samples/sec Loss 1.3699 LearningRate 0.0042 Epoch: 17 Global Step: 184150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:42,203-Speed 5420.30 samples/sec Loss 1.3473 LearningRate 0.0042 Epoch: 17 Global Step: 184160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:49,661-Speed 5492.85 samples/sec Loss 1.3535 LearningRate 0.0042 Epoch: 17 Global Step: 184170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:43:57,268-Speed 5385.29 samples/sec Loss 1.3306 LearningRate 0.0042 Epoch: 17 Global Step: 184180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:44:04,797-Speed 5441.22 samples/sec Loss 1.3438 LearningRate 0.0042 Epoch: 17 Global Step: 184190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:44:12,422-Speed 5372.00 samples/sec Loss 1.3803 LearningRate 0.0042 Epoch: 17 Global Step: 184200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:44:19,881-Speed 
5493.04 samples/sec Loss 1.3743 LearningRate 0.0041 Epoch: 17 Global Step: 184210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:44:27,343-Speed 5489.69 samples/sec Loss 1.3645 LearningRate 0.0041 Epoch: 17 Global Step: 184220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:44:34,820-Speed 5478.76 samples/sec Loss 1.3753 LearningRate 0.0041 Epoch: 17 Global Step: 184230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:44:42,323-Speed 5460.05 samples/sec Loss 1.3783 LearningRate 0.0041 Epoch: 17 Global Step: 184240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:44:49,873-Speed 5425.67 samples/sec Loss 1.3999 LearningRate 0.0041 Epoch: 17 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:44:57,355-Speed 5475.49 samples/sec Loss 1.3597 LearningRate 0.0041 Epoch: 17 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:45:04,781-Speed 5516.71 samples/sec Loss 1.3632 LearningRate 0.0041 Epoch: 17 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:45:12,237-Speed 5494.15 samples/sec Loss 1.3597 LearningRate 0.0041 Epoch: 17 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:45:19,661-Speed 5518.60 samples/sec Loss 1.3740 LearningRate 0.0041 Epoch: 17 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:45:27,159-Speed 5463.31 samples/sec Loss 1.3535 LearningRate 0.0041 Epoch: 17 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:45:34,632-Speed 5481.76 samples/sec Loss 1.3616 LearningRate 0.0041 Epoch: 17 Global Step: 184310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:45:42,097-Speed 5487.52 samples/sec Loss 1.3516 LearningRate 0.0041 Epoch: 17 Global Step: 184320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:45:49,513-Speed 5523.81 samples/sec Loss 1.3971 
LearningRate 0.0041 Epoch: 17 Global Step: 184330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:45:56,945-Speed 5512.06 samples/sec Loss 1.3534 LearningRate 0.0041 Epoch: 17 Global Step: 184340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:46:04,493-Speed 5427.50 samples/sec Loss 1.3674 LearningRate 0.0041 Epoch: 17 Global Step: 184350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:46:11,930-Speed 5508.52 samples/sec Loss 1.3429 LearningRate 0.0041 Epoch: 17 Global Step: 184360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:46:19,377-Speed 5500.61 samples/sec Loss 1.3376 LearningRate 0.0041 Epoch: 17 Global Step: 184370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:46:26,958-Speed 5404.17 samples/sec Loss 1.3708 LearningRate 0.0041 Epoch: 17 Global Step: 184380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:46:34,544-Speed 5400.53 samples/sec Loss 1.3731 LearningRate 0.0041 Epoch: 17 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:46:42,148-Speed 5387.02 samples/sec Loss 1.3554 LearningRate 0.0041 Epoch: 17 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:46:49,690-Speed 5431.30 samples/sec Loss 1.3519 LearningRate 0.0041 Epoch: 17 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:46:57,188-Speed 5463.90 samples/sec Loss 1.3434 LearningRate 0.0041 Epoch: 17 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:04,813-Speed 5373.00 samples/sec Loss 1.3551 LearningRate 0.0041 Epoch: 17 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:12,402-Speed 5397.88 samples/sec Loss 1.3593 LearningRate 0.0041 Epoch: 17 Global Step: 184440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:19,992-Speed 5396.52 samples/sec Loss 1.3653 LearningRate 0.0041 Epoch: 17 Global Step: 
184450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:27,483-Speed 5469.21 samples/sec Loss 1.3386 LearningRate 0.0041 Epoch: 17 Global Step: 184460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:35,031-Speed 5427.47 samples/sec Loss 1.3440 LearningRate 0.0041 Epoch: 17 Global Step: 184470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:42,532-Speed 5460.81 samples/sec Loss 1.3589 LearningRate 0.0041 Epoch: 17 Global Step: 184480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:47:50,015-Speed 5474.21 samples/sec Loss 1.3675 LearningRate 0.0040 Epoch: 17 Global Step: 184490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:47:57,576-Speed 5418.57 samples/sec Loss 1.3306 LearningRate 0.0040 Epoch: 17 Global Step: 184500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:48:05,065-Speed 5469.30 samples/sec Loss 1.3666 LearningRate 0.0040 Epoch: 17 Global Step: 184510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:48:12,668-Speed 5388.51 samples/sec Loss 1.3589 LearningRate 0.0040 Epoch: 17 Global Step: 184520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:48:20,381-Speed 5310.96 samples/sec Loss 1.3236 LearningRate 0.0040 Epoch: 17 Global Step: 184530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:48:27,846-Speed 5487.75 samples/sec Loss 1.3435 LearningRate 0.0040 Epoch: 17 Global Step: 184540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:48:35,298-Speed 5497.28 samples/sec Loss 1.3597 LearningRate 0.0040 Epoch: 17 Global Step: 184550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:48:42,850-Speed 5424.66 samples/sec Loss 1.3551 LearningRate 0.0040 Epoch: 17 Global Step: 184560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:48:50,370-Speed 5447.46 samples/sec Loss 1.3470 LearningRate 0.0040 Epoch: 17 Global Step: 184570 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-01-09 12:48:57,966-Speed 5392.82 samples/sec Loss 1.3228 LearningRate 0.0040 Epoch: 17 Global Step: 184580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:49:05,454-Speed 5471.37 samples/sec Loss 1.3516 LearningRate 0.0040 Epoch: 17 Global Step: 184590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:49:12,945-Speed 5468.03 samples/sec Loss 1.3572 LearningRate 0.0040 Epoch: 17 Global Step: 184600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:49:20,624-Speed 5335.15 samples/sec Loss 1.3421 LearningRate 0.0040 Epoch: 17 Global Step: 184610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:49:28,237-Speed 5380.69 samples/sec Loss 1.3739 LearningRate 0.0040 Epoch: 17 Global Step: 184620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:49:35,765-Speed 5441.97 samples/sec Loss 1.3592 LearningRate 0.0040 Epoch: 17 Global Step: 184630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:49:43,572-Speed 5247.10 samples/sec Loss 1.3250 LearningRate 0.0040 Epoch: 17 Global Step: 184640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:49:51,130-Speed 5419.95 samples/sec Loss 1.3341 LearningRate 0.0040 Epoch: 17 Global Step: 184650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:49:58,655-Speed 5444.37 samples/sec Loss 1.3560 LearningRate 0.0040 Epoch: 17 Global Step: 184660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:06,211-Speed 5421.49 samples/sec Loss 1.3220 LearningRate 0.0040 Epoch: 17 Global Step: 184670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:13,691-Speed 5476.20 samples/sec Loss 1.3649 LearningRate 0.0040 Epoch: 17 Global Step: 184680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:21,279-Speed 5399.16 samples/sec Loss 1.3431 LearningRate 0.0040 Epoch: 17 Global Step: 184690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
12:50:28,730-Speed 5498.27 samples/sec Loss 1.3484 LearningRate 0.0040 Epoch: 17 Global Step: 184700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:36,227-Speed 5464.07 samples/sec Loss 1.3343 LearningRate 0.0040 Epoch: 17 Global Step: 184710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:43,710-Speed 5474.41 samples/sec Loss 1.3588 LearningRate 0.0040 Epoch: 17 Global Step: 184720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:51,236-Speed 5443.15 samples/sec Loss 1.3311 LearningRate 0.0040 Epoch: 17 Global Step: 184730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:50:58,791-Speed 5422.27 samples/sec Loss 1.3439 LearningRate 0.0040 Epoch: 17 Global Step: 184740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:51:06,311-Speed 5447.46 samples/sec Loss 1.3394 LearningRate 0.0040 Epoch: 17 Global Step: 184750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:51:13,783-Speed 5482.20 samples/sec Loss 1.3393 LearningRate 0.0040 Epoch: 17 Global Step: 184760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:51:21,329-Speed 5428.94 samples/sec Loss 1.3337 LearningRate 0.0040 Epoch: 17 Global Step: 184770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:51:28,803-Speed 5481.00 samples/sec Loss 1.3337 LearningRate 0.0039 Epoch: 17 Global Step: 184780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:51:36,379-Speed 5407.10 samples/sec Loss 1.3393 LearningRate 0.0039 Epoch: 17 Global Step: 184790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:51:43,978-Speed 5390.85 samples/sec Loss 1.3440 LearningRate 0.0039 Epoch: 17 Global Step: 184800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:51:51,638-Speed 5348.26 samples/sec Loss 1.3423 LearningRate 0.0039 Epoch: 17 Global Step: 184810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:51:59,197-Speed 5419.00 samples/sec Loss 
1.3614 LearningRate 0.0039 Epoch: 17 Global Step: 184820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:06,841-Speed 5359.40 samples/sec Loss 1.3443 LearningRate 0.0039 Epoch: 17 Global Step: 184830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:14,460-Speed 5376.18 samples/sec Loss 1.3408 LearningRate 0.0039 Epoch: 17 Global Step: 184840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:22,027-Speed 5413.78 samples/sec Loss 1.3563 LearningRate 0.0039 Epoch: 17 Global Step: 184850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:29,655-Speed 5371.15 samples/sec Loss 1.3290 LearningRate 0.0039 Epoch: 17 Global Step: 184860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:52:37,169-Speed 5451.08 samples/sec Loss 1.3646 LearningRate 0.0039 Epoch: 17 Global Step: 184870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:44,648-Speed 5477.48 samples/sec Loss 1.3570 LearningRate 0.0039 Epoch: 17 Global Step: 184880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:52,156-Speed 5456.56 samples/sec Loss 1.3281 LearningRate 0.0039 Epoch: 17 Global Step: 184890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:52:59,607-Speed 5498.46 samples/sec Loss 1.3111 LearningRate 0.0039 Epoch: 17 Global Step: 184900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:07,083-Speed 5479.49 samples/sec Loss 1.3255 LearningRate 0.0039 Epoch: 17 Global Step: 184910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:14,565-Speed 5474.99 samples/sec Loss 1.3246 LearningRate 0.0039 Epoch: 17 Global Step: 184920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:22,080-Speed 5450.71 samples/sec Loss 1.3297 LearningRate 0.0039 Epoch: 17 Global Step: 184930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:29,766-Speed 5330.58 samples/sec Loss 1.3373 LearningRate 0.0039 Epoch: 17 Global 
Step: 184940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:37,276-Speed 5455.30 samples/sec Loss 1.3603 LearningRate 0.0039 Epoch: 17 Global Step: 184950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:44,865-Speed 5397.69 samples/sec Loss 1.3492 LearningRate 0.0039 Epoch: 17 Global Step: 184960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:53:52,458-Speed 5395.00 samples/sec Loss 1.3386 LearningRate 0.0039 Epoch: 17 Global Step: 184970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:00,078-Speed 5376.25 samples/sec Loss 1.3255 LearningRate 0.0039 Epoch: 17 Global Step: 184980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:07,641-Speed 5416.93 samples/sec Loss 1.3128 LearningRate 0.0039 Epoch: 17 Global Step: 184990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:15,183-Speed 5431.36 samples/sec Loss 1.3363 LearningRate 0.0039 Epoch: 17 Global Step: 185000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:22,717-Speed 5436.97 samples/sec Loss 1.3557 LearningRate 0.0039 Epoch: 17 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:30,191-Speed 5481.14 samples/sec Loss 1.3573 LearningRate 0.0039 Epoch: 17 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:37,622-Speed 5513.24 samples/sec Loss 1.3496 LearningRate 0.0039 Epoch: 17 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:45,111-Speed 5469.65 samples/sec Loss 1.3103 LearningRate 0.0039 Epoch: 17 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:54:52,599-Speed 5470.55 samples/sec Loss 1.3215 LearningRate 0.0039 Epoch: 17 Global Step: 185050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:55:00,079-Speed 5476.74 samples/sec Loss 1.3344 LearningRate 0.0039 Epoch: 17 Global Step: 185060 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 12:55:07,627-Speed 5427.63 samples/sec Loss 1.3531 LearningRate 0.0038 Epoch: 17 Global Step: 185070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:55:15,192-Speed 5414.87 samples/sec Loss 1.3514 LearningRate 0.0038 Epoch: 17 Global Step: 185080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:55:22,717-Speed 5444.39 samples/sec Loss 1.3454 LearningRate 0.0038 Epoch: 17 Global Step: 185090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:55:30,195-Speed 5478.08 samples/sec Loss 1.3214 LearningRate 0.0038 Epoch: 17 Global Step: 185100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:55:37,652-Speed 5493.99 samples/sec Loss 1.3355 LearningRate 0.0038 Epoch: 17 Global Step: 185110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:55:45,170-Speed 5448.64 samples/sec Loss 1.3168 LearningRate 0.0038 Epoch: 17 Global Step: 185120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:55:52,823-Speed 5353.50 samples/sec Loss 1.3567 LearningRate 0.0038 Epoch: 17 Global Step: 185130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:56:00,411-Speed 5398.29 samples/sec Loss 1.3366 LearningRate 0.0038 Epoch: 17 Global Step: 185140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:56:07,936-Speed 5444.67 samples/sec Loss 1.3221 LearningRate 0.0038 Epoch: 17 Global Step: 185150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:56:15,403-Speed 5485.83 samples/sec Loss 1.3639 LearningRate 0.0038 Epoch: 17 Global Step: 185160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:56:22,898-Speed 5466.28 samples/sec Loss 1.3080 LearningRate 0.0038 Epoch: 17 Global Step: 185170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:56:30,471-Speed 5409.30 samples/sec Loss 1.3419 LearningRate 0.0038 Epoch: 17 Global Step: 185180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
12:56:38,022-Speed 5425.04 samples/sec Loss 1.3438 LearningRate 0.0038 Epoch: 17 Global Step: 185190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:56:45,617-Speed 5393.90 samples/sec Loss 1.3081 LearningRate 0.0038 Epoch: 17 Global Step: 185200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:56:53,181-Speed 5415.71 samples/sec Loss 1.3292 LearningRate 0.0038 Epoch: 17 Global Step: 185210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 12:57:00,717-Speed 5435.75 samples/sec Loss 1.3082 LearningRate 0.0038 Epoch: 17 Global Step: 185220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:57:08,335-Speed 5377.85 samples/sec Loss 1.3309 LearningRate 0.0038 Epoch: 17 Global Step: 185230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:57:15,842-Speed 5456.87 samples/sec Loss 1.3095 LearningRate 0.0038 Epoch: 17 Global Step: 185240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:57:23,304-Speed 5489.70 samples/sec Loss 1.3371 LearningRate 0.0038 Epoch: 17 Global Step: 185250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:57:30,761-Speed 5493.77 samples/sec Loss 1.2951 LearningRate 0.0038 Epoch: 17 Global Step: 185260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:57:38,205-Speed 5503.63 samples/sec Loss 1.3187 LearningRate 0.0038 Epoch: 17 Global Step: 185270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:57:45,671-Speed 5487.01 samples/sec Loss 1.3432 LearningRate 0.0038 Epoch: 17 Global Step: 185280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:57:53,114-Speed 5503.50 samples/sec Loss 1.3264 LearningRate 0.0038 Epoch: 17 Global Step: 185290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:00,621-Speed 5457.07 samples/sec Loss 1.3164 LearningRate 0.0038 Epoch: 17 Global Step: 185300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:08,097-Speed 5479.75 samples/sec Loss 
1.3252 LearningRate 0.0038 Epoch: 17 Global Step: 185310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:15,605-Speed 5456.14 samples/sec Loss 1.3329 LearningRate 0.0038 Epoch: 17 Global Step: 185320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:23,104-Speed 5462.99 samples/sec Loss 1.2934 LearningRate 0.0038 Epoch: 17 Global Step: 185330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:30,626-Speed 5446.48 samples/sec Loss 1.2881 LearningRate 0.0038 Epoch: 17 Global Step: 185340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:38,089-Speed 5489.51 samples/sec Loss 1.2781 LearningRate 0.0038 Epoch: 17 Global Step: 185350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:45,586-Speed 5464.17 samples/sec Loss 1.3550 LearningRate 0.0037 Epoch: 17 Global Step: 185360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:58:53,079-Speed 5467.43 samples/sec Loss 1.3045 LearningRate 0.0037 Epoch: 17 Global Step: 185370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 12:59:00,589-Speed 5454.79 samples/sec Loss 1.3042 LearningRate 0.0037 Epoch: 17 Global Step: 185380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:08,072-Speed 5474.45 samples/sec Loss 1.3151 LearningRate 0.0037 Epoch: 17 Global Step: 185390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:15,660-Speed 5398.67 samples/sec Loss 1.3168 LearningRate 0.0037 Epoch: 17 Global Step: 185400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:23,141-Speed 5476.12 samples/sec Loss 1.3269 LearningRate 0.0037 Epoch: 17 Global Step: 185410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:30,650-Speed 5455.52 samples/sec Loss 1.3047 LearningRate 0.0037 Epoch: 17 Global Step: 185420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:38,165-Speed 5450.93 samples/sec Loss 1.3227 LearningRate 0.0037 Epoch: 17 Global 
Step: 185430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:45,592-Speed 5516.01 samples/sec Loss 1.3330 LearningRate 0.0037 Epoch: 17 Global Step: 185440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 12:59:53,172-Speed 5404.45 samples/sec Loss 1.3164 LearningRate 0.0037 Epoch: 17 Global Step: 185450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:00,738-Speed 5414.71 samples/sec Loss 1.2986 LearningRate 0.0037 Epoch: 17 Global Step: 185460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:08,236-Speed 5463.53 samples/sec Loss 1.3255 LearningRate 0.0037 Epoch: 17 Global Step: 185470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:15,765-Speed 5441.10 samples/sec Loss 1.3176 LearningRate 0.0037 Epoch: 17 Global Step: 185480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:00:23,289-Speed 5444.28 samples/sec Loss 1.3254 LearningRate 0.0037 Epoch: 17 Global Step: 185490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:30,752-Speed 5489.63 samples/sec Loss 1.3476 LearningRate 0.0037 Epoch: 17 Global Step: 185500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:38,189-Speed 5508.57 samples/sec Loss 1.2933 LearningRate 0.0037 Epoch: 17 Global Step: 185510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:45,696-Speed 5456.93 samples/sec Loss 1.2675 LearningRate 0.0037 Epoch: 17 Global Step: 185520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:00:53,358-Speed 5346.73 samples/sec Loss 1.3344 LearningRate 0.0037 Epoch: 17 Global Step: 185530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:01:00,841-Speed 5473.82 samples/sec Loss 1.2997 LearningRate 0.0037 Epoch: 17 Global Step: 185540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:01:08,397-Speed 5421.31 samples/sec Loss 1.3263 LearningRate 0.0037 Epoch: 17 Global Step: 185550 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 13:01:15,945-Speed 5428.11 samples/sec Loss 1.2920 LearningRate 0.0037 Epoch: 17 Global Step: 185560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:01:23,495-Speed 5426.04 samples/sec Loss 1.3008 LearningRate 0.0037 Epoch: 17 Global Step: 185570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:01:31,018-Speed 5444.70 samples/sec Loss 1.3279 LearningRate 0.0037 Epoch: 17 Global Step: 185580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:01:38,538-Speed 5447.73 samples/sec Loss 1.3187 LearningRate 0.0037 Epoch: 17 Global Step: 185590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:01:46,240-Speed 5318.24 samples/sec Loss 1.3200 LearningRate 0.0037 Epoch: 17 Global Step: 185600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:01:53,833-Speed 5395.38 samples/sec Loss 1.3220 LearningRate 0.0037 Epoch: 17 Global Step: 185610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:01,436-Speed 5388.33 samples/sec Loss 1.3073 LearningRate 0.0037 Epoch: 17 Global Step: 185620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:08,959-Speed 5445.14 samples/sec Loss 1.3221 LearningRate 0.0037 Epoch: 17 Global Step: 185630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:16,482-Speed 5445.00 samples/sec Loss 1.3277 LearningRate 0.0037 Epoch: 17 Global Step: 185640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:23,988-Speed 5458.03 samples/sec Loss 1.3059 LearningRate 0.0036 Epoch: 17 Global Step: 185650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:31,582-Speed 5394.18 samples/sec Loss 1.2885 LearningRate 0.0036 Epoch: 17 Global Step: 185660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:39,173-Speed 5396.76 samples/sec Loss 1.3143 LearningRate 0.0036 Epoch: 17 Global Step: 185670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
13:02:46,667-Speed 5467.40 samples/sec Loss 1.3228 LearningRate 0.0036 Epoch: 17 Global Step: 185680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:02:54,221-Speed 5422.79 samples/sec Loss 1.3098 LearningRate 0.0036 Epoch: 17 Global Step: 185690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:01,731-Speed 5455.16 samples/sec Loss 1.2859 LearningRate 0.0036 Epoch: 17 Global Step: 185700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:09,147-Speed 5524.08 samples/sec Loss 1.2796 LearningRate 0.0036 Epoch: 17 Global Step: 185710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:16,692-Speed 5429.59 samples/sec Loss 1.3175 LearningRate 0.0036 Epoch: 17 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:24,227-Speed 5436.37 samples/sec Loss 1.3018 LearningRate 0.0036 Epoch: 17 Global Step: 185730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:31,757-Speed 5440.46 samples/sec Loss 1.3157 LearningRate 0.0036 Epoch: 17 Global Step: 185740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:39,229-Speed 5482.74 samples/sec Loss 1.3347 LearningRate 0.0036 Epoch: 17 Global Step: 185750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:46,744-Speed 5451.23 samples/sec Loss 1.3277 LearningRate 0.0036 Epoch: 17 Global Step: 185760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:03:54,239-Speed 5465.42 samples/sec Loss 1.2939 LearningRate 0.0036 Epoch: 17 Global Step: 185770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:01,754-Speed 5451.46 samples/sec Loss 1.3016 LearningRate 0.0036 Epoch: 17 Global Step: 185780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:09,270-Speed 5450.52 samples/sec Loss 1.3331 LearningRate 0.0036 Epoch: 17 Global Step: 185790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:16,761-Speed 5468.88 samples/sec Loss 
1.3204 LearningRate 0.0036 Epoch: 17 Global Step: 185800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:24,298-Speed 5434.70 samples/sec Loss 1.3103 LearningRate 0.0036 Epoch: 17 Global Step: 185810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:04:31,838-Speed 5433.00 samples/sec Loss 1.3054 LearningRate 0.0036 Epoch: 17 Global Step: 185820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:39,433-Speed 5394.48 samples/sec Loss 1.3043 LearningRate 0.0036 Epoch: 17 Global Step: 185830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:46,956-Speed 5445.06 samples/sec Loss 1.2684 LearningRate 0.0036 Epoch: 17 Global Step: 185840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:04:54,416-Speed 5490.96 samples/sec Loss 1.2922 LearningRate 0.0036 Epoch: 17 Global Step: 185850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:01,934-Speed 5449.62 samples/sec Loss 1.3026 LearningRate 0.0036 Epoch: 17 Global Step: 185860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:09,402-Speed 5485.49 samples/sec Loss 1.2786 LearningRate 0.0036 Epoch: 17 Global Step: 185870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:16,903-Speed 5461.17 samples/sec Loss 1.2749 LearningRate 0.0036 Epoch: 17 Global Step: 185880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:24,348-Speed 5502.00 samples/sec Loss 1.2669 LearningRate 0.0036 Epoch: 17 Global Step: 185890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:31,819-Speed 5484.18 samples/sec Loss 1.3208 LearningRate 0.0036 Epoch: 17 Global Step: 185900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:39,327-Speed 5456.01 samples/sec Loss 1.3001 LearningRate 0.0036 Epoch: 17 Global Step: 185910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:05:46,824-Speed 5464.63 samples/sec Loss 1.3018 LearningRate 0.0036 Epoch: 17 Global 
Step: 185920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:05:54,343-Speed 5447.78 samples/sec Loss 1.2913 LearningRate 0.0036 Epoch: 17 Global Step: 185930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:06:01,879-Speed 5436.00 samples/sec Loss 1.2729 LearningRate 0.0036 Epoch: 17 Global Step: 185940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:06:09,378-Speed 5463.08 samples/sec Loss 1.3149 LearningRate 0.0035 Epoch: 17 Global Step: 185950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:06:16,855-Speed 5478.83 samples/sec Loss 1.3022 LearningRate 0.0035 Epoch: 17 Global Step: 185960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:06:24,294-Speed 5506.84 samples/sec Loss 1.3238 LearningRate 0.0035 Epoch: 17 Global Step: 185970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:06:31,967-Speed 5338.95 samples/sec Loss 1.2875 LearningRate 0.0035 Epoch: 17 Global Step: 185980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:06:39,466-Speed 5462.77 samples/sec Loss 1.2759 LearningRate 0.0035 Epoch: 17 Global Step: 185990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:06:46,943-Speed 5479.08 samples/sec Loss 1.2913 LearningRate 0.0035 Epoch: 17 Global Step: 186000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:07:31,268-[lfw][186000]XNorm: 22.746796 Training: 2022-01-09 13:07:31,269-[lfw][186000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 13:07:31,269-[lfw][186000]Accuracy-Highest: 0.99850 Training: 2022-01-09 13:08:22,818-[cfp_fp][186000]XNorm: 22.144995 Training: 2022-01-09 13:08:22,819-[cfp_fp][186000]Accuracy-Flip: 0.99371+-0.00345 Training: 2022-01-09 13:08:22,820-[cfp_fp][186000]Accuracy-Highest: 0.99371 Training: 2022-01-09 13:09:07,030-[agedb_30][186000]XNorm: 23.323391 Training: 2022-01-09 13:09:07,030-[agedb_30][186000]Accuracy-Flip: 0.98450+-0.00543 Training: 2022-01-09 
13:09:07,031-[agedb_30][186000]Accuracy-Highest: 0.98500 Training: 2022-01-09 13:09:14,073-Speed 278.40 samples/sec Loss 1.3107 LearningRate 0.0035 Epoch: 17 Global Step: 186010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:09:21,497-Speed 5518.28 samples/sec Loss 1.3049 LearningRate 0.0035 Epoch: 17 Global Step: 186020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:09:29,070-Speed 5409.61 samples/sec Loss 1.3192 LearningRate 0.0035 Epoch: 17 Global Step: 186030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:09:36,615-Speed 5429.86 samples/sec Loss 1.2869 LearningRate 0.0035 Epoch: 17 Global Step: 186040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:09:44,134-Speed 5448.23 samples/sec Loss 1.3060 LearningRate 0.0035 Epoch: 17 Global Step: 186050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:09:51,645-Speed 5453.84 samples/sec Loss 1.3040 LearningRate 0.0035 Epoch: 17 Global Step: 186060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:10:00,362-Speed 5436.61 samples/sec Loss 1.3067 LearningRate 0.0035 Epoch: 17 Global Step: 186070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:10:07,888-Speed 5443.25 samples/sec Loss 1.2899 LearningRate 0.0035 Epoch: 17 Global Step: 186080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:10:15,426-Speed 5434.09 samples/sec Loss 1.3157 LearningRate 0.0035 Epoch: 17 Global Step: 186090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:10:22,982-Speed 5421.93 samples/sec Loss 1.2918 LearningRate 0.0035 Epoch: 17 Global Step: 186100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:10:30,527-Speed 5429.62 samples/sec Loss 1.2794 LearningRate 0.0035 Epoch: 17 Global Step: 186110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:10:38,043-Speed 5449.84 samples/sec Loss 1.3039 LearningRate 0.0035 Epoch: 17 Global Step: 186120 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-01-09 13:10:45,557-Speed 5451.69 samples/sec Loss 1.2906 LearningRate 0.0035 Epoch: 17 Global Step: 186130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:10:53,093-Speed 5436.29 samples/sec Loss 1.2889 LearningRate 0.0035 Epoch: 17 Global Step: 186140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:00,576-Speed 5474.74 samples/sec Loss 1.2948 LearningRate 0.0035 Epoch: 17 Global Step: 186150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:08,118-Speed 5431.81 samples/sec Loss 1.2806 LearningRate 0.0035 Epoch: 17 Global Step: 186160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:15,681-Speed 5415.93 samples/sec Loss 1.3006 LearningRate 0.0035 Epoch: 17 Global Step: 186170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:23,232-Speed 5425.13 samples/sec Loss 1.2836 LearningRate 0.0035 Epoch: 17 Global Step: 186180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:30,734-Speed 5461.14 samples/sec Loss 1.2861 LearningRate 0.0035 Epoch: 17 Global Step: 186190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:38,225-Speed 5468.68 samples/sec Loss 1.2768 LearningRate 0.0035 Epoch: 17 Global Step: 186200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:45,726-Speed 5460.92 samples/sec Loss 1.3004 LearningRate 0.0035 Epoch: 17 Global Step: 186210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:11:53,256-Speed 5440.60 samples/sec Loss 1.3030 LearningRate 0.0035 Epoch: 17 Global Step: 186220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:12:00,793-Speed 5435.23 samples/sec Loss 1.3037 LearningRate 0.0035 Epoch: 17 Global Step: 186230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:12:08,397-Speed 5387.08 samples/sec Loss 1.2758 LearningRate 0.0035 Epoch: 17 Global Step: 186240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
13:12:15,899-Speed 5460.61 samples/sec Loss 1.2797 LearningRate 0.0035 Epoch: 17 Global Step: 186250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:12:23,406-Speed 5457.19 samples/sec Loss 1.2875 LearningRate 0.0034 Epoch: 17 Global Step: 186260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:12:30,935-Speed 5441.06 samples/sec Loss 1.2891 LearningRate 0.0034 Epoch: 17 Global Step: 186270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:12:38,422-Speed 5471.86 samples/sec Loss 1.2855 LearningRate 0.0034 Epoch: 17 Global Step: 186280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:12:45,911-Speed 5470.01 samples/sec Loss 1.2945 LearningRate 0.0034 Epoch: 17 Global Step: 186290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:12:53,567-Speed 5350.64 samples/sec Loss 1.2796 LearningRate 0.0034 Epoch: 17 Global Step: 186300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:01,079-Speed 5453.29 samples/sec Loss 1.2811 LearningRate 0.0034 Epoch: 17 Global Step: 186310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:08,694-Speed 5379.98 samples/sec Loss 1.2982 LearningRate 0.0034 Epoch: 17 Global Step: 186320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:16,319-Speed 5372.36 samples/sec Loss 1.3031 LearningRate 0.0034 Epoch: 17 Global Step: 186330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:23,850-Speed 5439.71 samples/sec Loss 1.2760 LearningRate 0.0034 Epoch: 17 Global Step: 186340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:13:31,381-Speed 5439.35 samples/sec Loss 1.2820 LearningRate 0.0034 Epoch: 17 Global Step: 186350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:39,289-Speed 5180.30 samples/sec Loss 1.3006 LearningRate 0.0034 Epoch: 17 Global Step: 186360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:46,955-Speed 5343.60 samples/sec Loss 
1.2792 LearningRate 0.0034 Epoch: 17 Global Step: 186370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:13:54,464-Speed 5455.50 samples/sec Loss 1.2902 LearningRate 0.0034 Epoch: 17 Global Step: 186380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:02,012-Speed 5427.41 samples/sec Loss 1.2970 LearningRate 0.0034 Epoch: 17 Global Step: 186390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:09,511-Speed 5463.10 samples/sec Loss 1.2779 LearningRate 0.0034 Epoch: 17 Global Step: 186400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:16,959-Speed 5500.58 samples/sec Loss 1.2712 LearningRate 0.0034 Epoch: 17 Global Step: 186410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:24,462-Speed 5459.90 samples/sec Loss 1.3120 LearningRate 0.0034 Epoch: 17 Global Step: 186420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:32,005-Speed 5430.57 samples/sec Loss 1.2707 LearningRate 0.0034 Epoch: 17 Global Step: 186430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:39,675-Speed 5340.92 samples/sec Loss 1.2993 LearningRate 0.0034 Epoch: 17 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:14:47,232-Speed 5421.36 samples/sec Loss 1.2898 LearningRate 0.0034 Epoch: 17 Global Step: 186450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:14:54,760-Speed 5441.78 samples/sec Loss 1.2681 LearningRate 0.0034 Epoch: 17 Global Step: 186460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:15:02,330-Speed 5410.88 samples/sec Loss 1.2991 LearningRate 0.0034 Epoch: 17 Global Step: 186470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:15:09,847-Speed 5449.88 samples/sec Loss 1.2900 LearningRate 0.0034 Epoch: 17 Global Step: 186480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:15:17,451-Speed 5387.70 samples/sec Loss 1.2979 LearningRate 0.0034 Epoch: 17 Global 
Step: 186490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:15:24,958-Speed 5457.17 samples/sec Loss 1.2595 LearningRate 0.0034 Epoch: 17 Global Step: 186500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:15:32,477-Speed 5448.16 samples/sec Loss 1.2820 LearningRate 0.0034 Epoch: 17 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:15:39,987-Speed 5455.08 samples/sec Loss 1.2902 LearningRate 0.0034 Epoch: 17 Global Step: 186520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:15:47,515-Speed 5441.16 samples/sec Loss 1.3065 LearningRate 0.0034 Epoch: 17 Global Step: 186530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:15:55,034-Speed 5448.51 samples/sec Loss 1.2813 LearningRate 0.0034 Epoch: 17 Global Step: 186540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:16:02,638-Speed 5387.74 samples/sec Loss 1.2693 LearningRate 0.0034 Epoch: 17 Global Step: 186550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:16:10,176-Speed 5434.45 samples/sec Loss 1.2722 LearningRate 0.0034 Epoch: 17 Global Step: 186560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:16:17,727-Speed 5425.38 samples/sec Loss 1.2775 LearningRate 0.0033 Epoch: 17 Global Step: 186570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:16:25,258-Speed 5439.11 samples/sec Loss 1.2888 LearningRate 0.0033 Epoch: 17 Global Step: 186580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:16:32,768-Speed 5454.68 samples/sec Loss 1.2547 LearningRate 0.0033 Epoch: 17 Global Step: 186590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:16:40,297-Speed 5441.52 samples/sec Loss 1.3167 LearningRate 0.0033 Epoch: 17 Global Step: 186600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:16:47,808-Speed 5453.53 samples/sec Loss 1.2833 LearningRate 0.0033 Epoch: 17 Global Step: 186610 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-09 13:16:55,329-Speed 5446.80 samples/sec Loss 1.2723 LearningRate 0.0033 Epoch: 17 Global Step: 186620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:17:02,832-Speed 5460.05 samples/sec Loss 1.2780 LearningRate 0.0033 Epoch: 17 Global Step: 186630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:17:10,345-Speed 5452.33 samples/sec Loss 1.2730 LearningRate 0.0033 Epoch: 17 Global Step: 186640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:17:17,878-Speed 5438.72 samples/sec Loss 1.3029 LearningRate 0.0033 Epoch: 17 Global Step: 186650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:17:41,772-Speed 1714.30 samples/sec Loss 1.2811 LearningRate 0.0033 Epoch: 18 Global Step: 186660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:17:49,281-Speed 5455.38 samples/sec Loss 1.2775 LearningRate 0.0033 Epoch: 18 Global Step: 186670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:17:56,752-Speed 5482.85 samples/sec Loss 1.2878 LearningRate 0.0033 Epoch: 18 Global Step: 186680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:18:04,239-Speed 5471.78 samples/sec Loss 1.2927 LearningRate 0.0033 Epoch: 18 Global Step: 186690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:18:11,709-Speed 5484.20 samples/sec Loss 1.2463 LearningRate 0.0033 Epoch: 18 Global Step: 186700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:18:19,205-Speed 5464.62 samples/sec Loss 1.2600 LearningRate 0.0033 Epoch: 18 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:18:26,692-Speed 5470.90 samples/sec Loss 1.2806 LearningRate 0.0033 Epoch: 18 Global Step: 186720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:18:34,193-Speed 5461.76 samples/sec Loss 1.2637 LearningRate 0.0033 Epoch: 18 Global Step: 186730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
13:18:41,667-Speed 5481.67 samples/sec Loss 1.2533 LearningRate 0.0033 Epoch: 18 Global Step: 186740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:18:49,163-Speed 5464.54 samples/sec Loss 1.2585 LearningRate 0.0033 Epoch: 18 Global Step: 186750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:18:56,620-Speed 5493.37 samples/sec Loss 1.2508 LearningRate 0.0033 Epoch: 18 Global Step: 186760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:04,130-Speed 5455.44 samples/sec Loss 1.2687 LearningRate 0.0033 Epoch: 18 Global Step: 186770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:11,673-Speed 5431.42 samples/sec Loss 1.2729 LearningRate 0.0033 Epoch: 18 Global Step: 186780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:19,208-Speed 5436.31 samples/sec Loss 1.2827 LearningRate 0.0033 Epoch: 18 Global Step: 186790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:26,706-Speed 5463.50 samples/sec Loss 1.2781 LearningRate 0.0033 Epoch: 18 Global Step: 186800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:34,246-Speed 5433.04 samples/sec Loss 1.2431 LearningRate 0.0033 Epoch: 18 Global Step: 186810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:41,761-Speed 5451.53 samples/sec Loss 1.2525 LearningRate 0.0033 Epoch: 18 Global Step: 186820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:49,246-Speed 5472.74 samples/sec Loss 1.2479 LearningRate 0.0033 Epoch: 18 Global Step: 186830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:19:56,821-Speed 5407.88 samples/sec Loss 1.2575 LearningRate 0.0033 Epoch: 18 Global Step: 186840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:04,260-Speed 5507.20 samples/sec Loss 1.2883 LearningRate 0.0033 Epoch: 18 Global Step: 186850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:11,759-Speed 5462.83 samples/sec Loss 
1.2801 LearningRate 0.0033 Epoch: 18 Global Step: 186860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:19,250-Speed 5468.43 samples/sec Loss 1.2599 LearningRate 0.0033 Epoch: 18 Global Step: 186870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:26,759-Speed 5455.68 samples/sec Loss 1.2474 LearningRate 0.0032 Epoch: 18 Global Step: 186880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:34,344-Speed 5400.79 samples/sec Loss 1.2482 LearningRate 0.0032 Epoch: 18 Global Step: 186890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:41,855-Speed 5454.47 samples/sec Loss 1.2809 LearningRate 0.0032 Epoch: 18 Global Step: 186900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:49,362-Speed 5456.80 samples/sec Loss 1.2480 LearningRate 0.0032 Epoch: 18 Global Step: 186910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:20:56,855-Speed 5467.29 samples/sec Loss 1.2668 LearningRate 0.0032 Epoch: 18 Global Step: 186920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:04,429-Speed 5409.07 samples/sec Loss 1.2541 LearningRate 0.0032 Epoch: 18 Global Step: 186930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:12,131-Speed 5319.20 samples/sec Loss 1.2730 LearningRate 0.0032 Epoch: 18 Global Step: 186940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:19,810-Speed 5334.44 samples/sec Loss 1.2393 LearningRate 0.0032 Epoch: 18 Global Step: 186950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:21:27,351-Speed 5432.39 samples/sec Loss 1.2583 LearningRate 0.0032 Epoch: 18 Global Step: 186960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:34,869-Speed 5449.38 samples/sec Loss 1.2824 LearningRate 0.0032 Epoch: 18 Global Step: 186970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:42,419-Speed 5426.08 samples/sec Loss 1.2403 LearningRate 0.0032 Epoch: 18 Global 
Step: 186980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:50,133-Speed 5309.75 samples/sec Loss 1.2457 LearningRate 0.0032 Epoch: 18 Global Step: 186990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:21:57,672-Speed 5434.72 samples/sec Loss 1.2418 LearningRate 0.0032 Epoch: 18 Global Step: 187000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:05,181-Speed 5455.69 samples/sec Loss 1.2544 LearningRate 0.0032 Epoch: 18 Global Step: 187010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:12,707-Speed 5443.42 samples/sec Loss 1.2578 LearningRate 0.0032 Epoch: 18 Global Step: 187020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:20,211-Speed 5459.36 samples/sec Loss 1.2749 LearningRate 0.0032 Epoch: 18 Global Step: 187030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:27,706-Speed 5465.24 samples/sec Loss 1.2566 LearningRate 0.0032 Epoch: 18 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:35,207-Speed 5461.06 samples/sec Loss 1.2397 LearningRate 0.0032 Epoch: 18 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:42,859-Speed 5354.51 samples/sec Loss 1.2535 LearningRate 0.0032 Epoch: 18 Global Step: 187060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-09 13:22:50,372-Speed 5451.76 samples/sec Loss 1.2776 LearningRate 0.0032 Epoch: 18 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:22:57,915-Speed 5431.33 samples/sec Loss 1.2593 LearningRate 0.0032 Epoch: 18 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:23:05,420-Speed 5458.70 samples/sec Loss 1.2363 LearningRate 0.0032 Epoch: 18 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:23:12,926-Speed 5457.72 samples/sec Loss 1.2849 LearningRate 0.0032 Epoch: 18 Global Step: 187100 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-09 13:23:20,448-Speed 5446.07 samples/sec Loss 1.2450 LearningRate 0.0032 Epoch: 18 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:23:27,971-Speed 5445.02 samples/sec Loss 1.2426 LearningRate 0.0032 Epoch: 18 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:23:35,508-Speed 5435.52 samples/sec Loss 1.2350 LearningRate 0.0032 Epoch: 18 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:23:42,987-Speed 5477.83 samples/sec Loss 1.2587 LearningRate 0.0032 Epoch: 18 Global Step: 187140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:23:50,508-Speed 5446.51 samples/sec Loss 1.2260 LearningRate 0.0032 Epoch: 18 Global Step: 187150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:23:58,052-Speed 5429.90 samples/sec Loss 1.2520 LearningRate 0.0032 Epoch: 18 Global Step: 187160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:05,582-Speed 5440.44 samples/sec Loss 1.2577 LearningRate 0.0032 Epoch: 18 Global Step: 187170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:13,151-Speed 5412.62 samples/sec Loss 1.2629 LearningRate 0.0032 Epoch: 18 Global Step: 187180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:20,735-Speed 5401.15 samples/sec Loss 1.2699 LearningRate 0.0032 Epoch: 18 Global Step: 187190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:28,317-Speed 5403.03 samples/sec Loss 1.2465 LearningRate 0.0031 Epoch: 18 Global Step: 187200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:35,905-Speed 5398.45 samples/sec Loss 1.2376 LearningRate 0.0031 Epoch: 18 Global Step: 187210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:43,405-Speed 5462.47 samples/sec Loss 1.2670 LearningRate 0.0031 Epoch: 18 Global Step: 187220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 
13:24:50,957-Speed 5424.95 samples/sec Loss 1.2342 LearningRate 0.0031 Epoch: 18 Global Step: 187230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:24:58,441-Speed 5472.99 samples/sec Loss 1.2685 LearningRate 0.0031 Epoch: 18 Global Step: 187240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:25:06,061-Speed 5375.93 samples/sec Loss 1.2526 LearningRate 0.0031 Epoch: 18 Global Step: 187250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:25:13,588-Speed 5442.78 samples/sec Loss 1.2444 LearningRate 0.0031 Epoch: 18 Global Step: 187260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:25:21,190-Speed 5388.99 samples/sec Loss 1.2508 LearningRate 0.0031 Epoch: 18 Global Step: 187270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:25:28,713-Speed 5445.43 samples/sec Loss 1.2465 LearningRate 0.0031 Epoch: 18 Global Step: 187280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:25:36,461-Speed 5286.47 samples/sec Loss 1.2517 LearningRate 0.0031 Epoch: 18 Global Step: 187290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:25:44,080-Speed 5377.03 samples/sec Loss 1.2345 LearningRate 0.0031 Epoch: 18 Global Step: 187300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:25:51,609-Speed 5441.37 samples/sec Loss 1.2564 LearningRate 0.0031 Epoch: 18 Global Step: 187310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:25:59,144-Speed 5436.40 samples/sec Loss 1.2871 LearningRate 0.0031 Epoch: 18 Global Step: 187320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:06,635-Speed 5468.14 samples/sec Loss 1.2712 LearningRate 0.0031 Epoch: 18 Global Step: 187330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:14,101-Speed 5487.32 samples/sec Loss 1.2418 LearningRate 0.0031 Epoch: 18 Global Step: 187340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:21,595-Speed 5466.32 samples/sec Loss 
1.2473 LearningRate 0.0031 Epoch: 18 Global Step: 187350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:29,182-Speed 5399.18 samples/sec Loss 1.2617 LearningRate 0.0031 Epoch: 18 Global Step: 187360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:36,761-Speed 5405.38 samples/sec Loss 1.2343 LearningRate 0.0031 Epoch: 18 Global Step: 187370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:44,370-Speed 5383.49 samples/sec Loss 1.2418 LearningRate 0.0031 Epoch: 18 Global Step: 187380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-09 13:26:51,943-Speed 5409.66 samples/sec Loss 1.2473 LearningRate 0.0031 Epoch: 18 Global Step: 187390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:26:59,463-Speed 5447.90 samples/sec Loss 1.2247 LearningRate 0.0031 Epoch: 18 Global Step: 187400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:06,986-Speed 5444.98 samples/sec Loss 1.2342 LearningRate 0.0031 Epoch: 18 Global Step: 187410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:14,537-Speed 5425.13 samples/sec Loss 1.2487 LearningRate 0.0031 Epoch: 18 Global Step: 187420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:22,081-Speed 5430.10 samples/sec Loss 1.1981 LearningRate 0.0031 Epoch: 18 Global Step: 187430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:29,621-Speed 5432.87 samples/sec Loss 1.2546 LearningRate 0.0031 Epoch: 18 Global Step: 187440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:37,183-Speed 5417.12 samples/sec Loss 1.2478 LearningRate 0.0031 Epoch: 18 Global Step: 187450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:44,750-Speed 5413.60 samples/sec Loss 1.2549 LearningRate 0.0031 Epoch: 18 Global Step: 187460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:52,271-Speed 5446.76 samples/sec Loss 1.2290 LearningRate 0.0031 Epoch: 18 Global 
Step: 187470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:27:59,813-Speed 5431.67 samples/sec Loss 1.2524 LearningRate 0.0031 Epoch: 18 Global Step: 187480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:07,351-Speed 5434.59 samples/sec Loss 1.2160 LearningRate 0.0031 Epoch: 18 Global Step: 187490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:14,851-Speed 5461.87 samples/sec Loss 1.2613 LearningRate 0.0031 Epoch: 18 Global Step: 187500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:22,403-Speed 5424.88 samples/sec Loss 1.2437 LearningRate 0.0031 Epoch: 18 Global Step: 187510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:29,952-Speed 5426.07 samples/sec Loss 1.2403 LearningRate 0.0030 Epoch: 18 Global Step: 187520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:37,516-Speed 5415.88 samples/sec Loss 1.2352 LearningRate 0.0030 Epoch: 18 Global Step: 187530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:45,081-Speed 5414.96 samples/sec Loss 1.2487 LearningRate 0.0030 Epoch: 18 Global Step: 187540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:28:52,807-Speed 5302.31 samples/sec Loss 1.2564 LearningRate 0.0030 Epoch: 18 Global Step: 187550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:00,376-Speed 5412.96 samples/sec Loss 1.2473 LearningRate 0.0030 Epoch: 18 Global Step: 187560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:07,883-Speed 5456.03 samples/sec Loss 1.2549 LearningRate 0.0030 Epoch: 18 Global Step: 187570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:15,447-Speed 5416.09 samples/sec Loss 1.2333 LearningRate 0.0030 Epoch: 18 Global Step: 187580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:23,134-Speed 5329.07 samples/sec Loss 1.2451 LearningRate 0.0030 Epoch: 18 Global Step: 187590 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-09 13:29:30,642-Speed 5456.78 samples/sec Loss 1.2656 LearningRate 0.0030 Epoch: 18 Global Step: 187600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:38,190-Speed 5426.57 samples/sec Loss 1.2475 LearningRate 0.0030 Epoch: 18 Global Step: 187610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:45,716-Speed 5443.62 samples/sec Loss 1.2470 LearningRate 0.0030 Epoch: 18 Global Step: 187620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:29:53,270-Speed 5423.23 samples/sec Loss 1.2622 LearningRate 0.0030 Epoch: 18 Global Step: 187630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:00,827-Speed 5420.47 samples/sec Loss 1.2540 LearningRate 0.0030 Epoch: 18 Global Step: 187640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:08,377-Speed 5425.91 samples/sec Loss 1.2486 LearningRate 0.0030 Epoch: 18 Global Step: 187650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:15,950-Speed 5409.07 samples/sec Loss 1.2335 LearningRate 0.0030 Epoch: 18 Global Step: 187660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:23,543-Speed 5395.58 samples/sec Loss 1.2203 LearningRate 0.0030 Epoch: 18 Global Step: 187670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:31,053-Speed 5454.55 samples/sec Loss 1.2339 LearningRate 0.0030 Epoch: 18 Global Step: 187680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:38,570-Speed 5449.89 samples/sec Loss 1.2666 LearningRate 0.0030 Epoch: 18 Global Step: 187690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:46,022-Speed 5497.42 samples/sec Loss 1.2349 LearningRate 0.0030 Epoch: 18 Global Step: 187700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:30:53,545-Speed 5445.59 samples/sec Loss 1.2347 LearningRate 0.0030 Epoch: 18 Global Step: 187710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 
13:31:01,147-Speed 5388.73 samples/sec Loss 1.2476 LearningRate 0.0030 Epoch: 18 Global Step: 187720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:31:08,695-Speed 5426.68 samples/sec Loss 1.2271 LearningRate 0.0030 Epoch: 18 Global Step: 187730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-09 13:31:16,172-Speed 5479.22 samples/sec Loss 1.2463 LearningRate 0.0030 Epoch: 18 Global Step: 187740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:31:23,719-Speed 5428.34 samples/sec Loss 1.2281 LearningRate 0.0030 Epoch: 18 Global Step: 187750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:31:31,226-Speed 5457.43 samples/sec Loss 1.2188 LearningRate 0.0030 Epoch: 18 Global Step: 187760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:31:38,738-Speed 5452.59 samples/sec Loss 1.2420 LearningRate 0.0030 Epoch: 18 Global Step: 187770 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:31:46,242-Speed 5458.96 samples/sec Loss 1.2636 LearningRate 0.0030 Epoch: 18 Global Step: 187780 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:31:53,852-Speed 5383.66 samples/sec Loss 1.2061 LearningRate 0.0030 Epoch: 18 Global Step: 187790 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:01,506-Speed 5352.31 samples/sec Loss 1.2212 LearningRate 0.0030 Epoch: 18 Global Step: 187800 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:09,070-Speed 5415.36 samples/sec Loss 1.2139 LearningRate 0.0030 Epoch: 18 Global Step: 187810 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:16,626-Speed 5422.16 samples/sec Loss 1.1994 LearningRate 0.0030 Epoch: 18 Global Step: 187820 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:24,247-Speed 5375.14 samples/sec Loss 1.2275 LearningRate 0.0030 Epoch: 18 Global Step: 187830 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:31,896-Speed 5356.23 samples/sec Loss 1.2231 
LearningRate 0.0030 Epoch: 18 Global Step: 187840 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:39,596-Speed 5319.92 samples/sec Loss 1.2339 LearningRate 0.0029 Epoch: 18 Global Step: 187850 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:47,131-Speed 5436.18 samples/sec Loss 1.2437 LearningRate 0.0029 Epoch: 18 Global Step: 187860 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-01-09 13:32:54,707-Speed 5407.89 samples/sec Loss 1.2077 LearningRate 0.0029 Epoch: 18 Global Step: 187870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:02,347-Speed 5361.69 samples/sec Loss 1.2411 LearningRate 0.0029 Epoch: 18 Global Step: 187880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:09,887-Speed 5433.17 samples/sec Loss 1.2302 LearningRate 0.0029 Epoch: 18 Global Step: 187890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:17,381-Speed 5465.76 samples/sec Loss 1.2209 LearningRate 0.0029 Epoch: 18 Global Step: 187900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:24,840-Speed 5492.37 samples/sec Loss 1.2395 LearningRate 0.0029 Epoch: 18 Global Step: 187910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:32,331-Speed 5468.89 samples/sec Loss 1.2342 LearningRate 0.0029 Epoch: 18 Global Step: 187920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:39,909-Speed 5405.51 samples/sec Loss 1.2279 LearningRate 0.0029 Epoch: 18 Global Step: 187930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:47,496-Speed 5400.14 samples/sec Loss 1.2090 LearningRate 0.0029 Epoch: 18 Global Step: 187940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:33:55,042-Speed 5428.84 samples/sec Loss 1.2116 LearningRate 0.0029 Epoch: 18 Global Step: 187950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:34:02,628-Speed 5400.42 samples/sec Loss 1.2289 LearningRate 0.0029 Epoch: 18 Global Step: 
187960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:34:10,303-Speed 5337.08 samples/sec Loss 1.2245 LearningRate 0.0029 Epoch: 18 Global Step: 187970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:34:17,767-Speed 5488.30 samples/sec Loss 1.2020 LearningRate 0.0029 Epoch: 18 Global Step: 187980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:34:25,319-Speed 5424.79 samples/sec Loss 1.2178 LearningRate 0.0029 Epoch: 18 Global Step: 187990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:34:32,745-Speed 5516.40 samples/sec Loss 1.2125 LearningRate 0.0029 Epoch: 18 Global Step: 188000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:35:16,728-[lfw][188000]XNorm: 21.914746 Training: 2022-01-09 13:35:16,729-[lfw][188000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 13:35:16,730-[lfw][188000]Accuracy-Highest: 0.99850 Training: 2022-01-09 13:36:07,931-[cfp_fp][188000]XNorm: 21.509004 Training: 2022-01-09 13:36:07,931-[cfp_fp][188000]Accuracy-Flip: 0.99371+-0.00357 Training: 2022-01-09 13:36:07,932-[cfp_fp][188000]Accuracy-Highest: 0.99371 Training: 2022-01-09 13:36:51,967-[agedb_30][188000]XNorm: 22.512485 Training: 2022-01-09 13:36:51,968-[agedb_30][188000]Accuracy-Flip: 0.98467+-0.00600 Training: 2022-01-09 13:36:51,968-[agedb_30][188000]Accuracy-Highest: 0.98500 Training: 2022-01-09 13:36:59,706-Speed 278.72 samples/sec Loss 1.2108 LearningRate 0.0029 Epoch: 18 Global Step: 188010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:37:07,247-Speed 5432.53 samples/sec Loss 1.2158 LearningRate 0.0029 Epoch: 18 Global Step: 188020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:37:14,678-Speed 5512.59 samples/sec Loss 1.2291 LearningRate 0.0029 Epoch: 18 Global Step: 188030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:37:22,224-Speed 5428.89 samples/sec Loss 1.2058 LearningRate 0.0029 Epoch: 18 Global Step: 188040 Fp16 Grad Scale: 
16384 Required: 4 hours Training: 2022-01-09 13:37:29,902-Speed 5335.64 samples/sec Loss 1.2241 LearningRate 0.0029 Epoch: 18 Global Step: 188050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:37:37,437-Speed 5436.32 samples/sec Loss 1.2333 LearningRate 0.0029 Epoch: 18 Global Step: 188060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:37:44,940-Speed 5460.17 samples/sec Loss 1.2066 LearningRate 0.0029 Epoch: 18 Global Step: 188070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:37:52,512-Speed 5409.86 samples/sec Loss 1.2298 LearningRate 0.0029 Epoch: 18 Global Step: 188080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:38:00,119-Speed 5385.73 samples/sec Loss 1.2320 LearningRate 0.0029 Epoch: 18 Global Step: 188090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:38:07,607-Speed 5470.29 samples/sec Loss 1.2329 LearningRate 0.0029 Epoch: 18 Global Step: 188100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:38:15,150-Speed 5430.87 samples/sec Loss 1.2272 LearningRate 0.0029 Epoch: 18 Global Step: 188110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:38:22,685-Speed 5436.75 samples/sec Loss 1.2110 LearningRate 0.0029 Epoch: 18 Global Step: 188120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:38:30,403-Speed 5308.03 samples/sec Loss 1.2306 LearningRate 0.0029 Epoch: 18 Global Step: 188130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:38:38,067-Speed 5345.40 samples/sec Loss 1.2485 LearningRate 0.0029 Epoch: 18 Global Step: 188140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:38:45,585-Speed 5448.85 samples/sec Loss 1.2001 LearningRate 0.0029 Epoch: 18 Global Step: 188150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:38:53,165-Speed 5404.40 samples/sec Loss 1.2133 LearningRate 0.0029 Epoch: 18 Global Step: 188160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-01-09 13:39:00,756-Speed 5396.46 samples/sec Loss 1.2016 LearningRate 0.0029 Epoch: 18 Global Step: 188170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:39:08,245-Speed 5470.37 samples/sec Loss 1.1981 LearningRate 0.0028 Epoch: 18 Global Step: 188180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:39:15,783-Speed 5434.13 samples/sec Loss 1.2058 LearningRate 0.0028 Epoch: 18 Global Step: 188190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:39:23,332-Speed 5426.60 samples/sec Loss 1.2264 LearningRate 0.0028 Epoch: 18 Global Step: 188200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:39:31,051-Speed 5307.64 samples/sec Loss 1.1930 LearningRate 0.0028 Epoch: 18 Global Step: 188210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:39:38,625-Speed 5408.69 samples/sec Loss 1.1937 LearningRate 0.0028 Epoch: 18 Global Step: 188220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:39:46,202-Speed 5406.38 samples/sec Loss 1.1940 LearningRate 0.0028 Epoch: 18 Global Step: 188230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:39:53,699-Speed 5464.11 samples/sec Loss 1.2178 LearningRate 0.0028 Epoch: 18 Global Step: 188240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:01,259-Speed 5418.94 samples/sec Loss 1.2359 LearningRate 0.0028 Epoch: 18 Global Step: 188250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:08,814-Speed 5421.95 samples/sec Loss 1.2120 LearningRate 0.0028 Epoch: 18 Global Step: 188260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:16,374-Speed 5418.75 samples/sec Loss 1.1983 LearningRate 0.0028 Epoch: 18 Global Step: 188270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:23,991-Speed 5378.09 samples/sec Loss 1.2231 LearningRate 0.0028 Epoch: 18 Global Step: 188280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:31,506-Speed 5451.07 
samples/sec Loss 1.2146 LearningRate 0.0028 Epoch: 18 Global Step: 188290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:39,084-Speed 5405.72 samples/sec Loss 1.2329 LearningRate 0.0028 Epoch: 18 Global Step: 188300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:46,606-Speed 5446.00 samples/sec Loss 1.1942 LearningRate 0.0028 Epoch: 18 Global Step: 188310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:40:54,148-Speed 5431.21 samples/sec Loss 1.2276 LearningRate 0.0028 Epoch: 18 Global Step: 188320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:41:01,728-Speed 5404.97 samples/sec Loss 1.2174 LearningRate 0.0028 Epoch: 18 Global Step: 188330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:41:09,343-Speed 5379.79 samples/sec Loss 1.1844 LearningRate 0.0028 Epoch: 18 Global Step: 188340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:41:16,929-Speed 5400.14 samples/sec Loss 1.2313 LearningRate 0.0028 Epoch: 18 Global Step: 188350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:41:24,416-Speed 5471.20 samples/sec Loss 1.1943 LearningRate 0.0028 Epoch: 18 Global Step: 188360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:41:31,978-Speed 5416.99 samples/sec Loss 1.2151 LearningRate 0.0028 Epoch: 18 Global Step: 188370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:41:39,413-Speed 5510.25 samples/sec Loss 1.2125 LearningRate 0.0028 Epoch: 18 Global Step: 188380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:41:46,981-Speed 5412.94 samples/sec Loss 1.2445 LearningRate 0.0028 Epoch: 18 Global Step: 188390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:41:54,538-Speed 5420.15 samples/sec Loss 1.2029 LearningRate 0.0028 Epoch: 18 Global Step: 188400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:42:02,060-Speed 5446.45 samples/sec Loss 1.2282 LearningRate 0.0028 
Epoch: 18 Global Step: 188410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:09,618-Speed 5420.32 samples/sec Loss 1.1919 LearningRate 0.0028 Epoch: 18 Global Step: 188420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:17,191-Speed 5409.09 samples/sec Loss 1.2322 LearningRate 0.0028 Epoch: 18 Global Step: 188430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:24,721-Speed 5440.17 samples/sec Loss 1.2145 LearningRate 0.0028 Epoch: 18 Global Step: 188440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:32,299-Speed 5406.30 samples/sec Loss 1.2169 LearningRate 0.0028 Epoch: 18 Global Step: 188450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:39,779-Speed 5476.72 samples/sec Loss 1.1809 LearningRate 0.0028 Epoch: 18 Global Step: 188460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:47,300-Speed 5446.74 samples/sec Loss 1.2198 LearningRate 0.0028 Epoch: 18 Global Step: 188470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:42:54,749-Speed 5499.67 samples/sec Loss 1.1979 LearningRate 0.0028 Epoch: 18 Global Step: 188480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:43:02,248-Speed 5463.18 samples/sec Loss 1.2136 LearningRate 0.0028 Epoch: 18 Global Step: 188490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:43:09,690-Speed 5504.75 samples/sec Loss 1.2035 LearningRate 0.0028 Epoch: 18 Global Step: 188500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:43:17,131-Speed 5505.09 samples/sec Loss 1.2102 LearningRate 0.0028 Epoch: 18 Global Step: 188510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:43:24,686-Speed 5422.28 samples/sec Loss 1.2206 LearningRate 0.0027 Epoch: 18 Global Step: 188520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:43:32,155-Speed 5484.79 samples/sec Loss 1.2044 LearningRate 0.0027 Epoch: 18 Global Step: 188530 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-01-09 13:43:39,818-Speed 5345.79 samples/sec Loss 1.1906 LearningRate 0.0027 Epoch: 18 Global Step: 188540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:43:47,366-Speed 5427.37 samples/sec Loss 1.1821 LearningRate 0.0027 Epoch: 18 Global Step: 188550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:43:54,828-Speed 5489.97 samples/sec Loss 1.2107 LearningRate 0.0027 Epoch: 18 Global Step: 188560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:44:02,353-Speed 5444.07 samples/sec Loss 1.1680 LearningRate 0.0027 Epoch: 18 Global Step: 188570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:44:09,858-Speed 5458.07 samples/sec Loss 1.2023 LearningRate 0.0027 Epoch: 18 Global Step: 188580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:44:17,329-Speed 5483.48 samples/sec Loss 1.2010 LearningRate 0.0027 Epoch: 18 Global Step: 188590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:44:24,792-Speed 5489.10 samples/sec Loss 1.1916 LearningRate 0.0027 Epoch: 18 Global Step: 188600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:44:32,301-Speed 5455.07 samples/sec Loss 1.1697 LearningRate 0.0027 Epoch: 18 Global Step: 188610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:44:39,763-Speed 5489.93 samples/sec Loss 1.1843 LearningRate 0.0027 Epoch: 18 Global Step: 188620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:44:47,244-Speed 5476.69 samples/sec Loss 1.2113 LearningRate 0.0027 Epoch: 18 Global Step: 188630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:44:54,818-Speed 5408.47 samples/sec Loss 1.1933 LearningRate 0.0027 Epoch: 18 Global Step: 188640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:45:02,316-Speed 5463.64 samples/sec Loss 1.2007 LearningRate 0.0027 Epoch: 18 Global Step: 188650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-01-09 13:45:09,838-Speed 5446.02 samples/sec Loss 1.1925 LearningRate 0.0027 Epoch: 18 Global Step: 188660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:45:17,336-Speed 5463.97 samples/sec Loss 1.2063 LearningRate 0.0027 Epoch: 18 Global Step: 188670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:45:24,894-Speed 5419.81 samples/sec Loss 1.1964 LearningRate 0.0027 Epoch: 18 Global Step: 188680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:45:32,313-Speed 5521.46 samples/sec Loss 1.2002 LearningRate 0.0027 Epoch: 18 Global Step: 188690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:45:39,821-Speed 5456.58 samples/sec Loss 1.2049 LearningRate 0.0027 Epoch: 18 Global Step: 188700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:45:47,387-Speed 5414.74 samples/sec Loss 1.1936 LearningRate 0.0027 Epoch: 18 Global Step: 188710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:45:54,909-Speed 5446.24 samples/sec Loss 1.1955 LearningRate 0.0027 Epoch: 18 Global Step: 188720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:46:02,365-Speed 5494.23 samples/sec Loss 1.1741 LearningRate 0.0027 Epoch: 18 Global Step: 188730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:09,878-Speed 5452.12 samples/sec Loss 1.1837 LearningRate 0.0027 Epoch: 18 Global Step: 188740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:17,377-Speed 5463.17 samples/sec Loss 1.1830 LearningRate 0.0027 Epoch: 18 Global Step: 188750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:24,914-Speed 5435.61 samples/sec Loss 1.1920 LearningRate 0.0027 Epoch: 18 Global Step: 188760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:32,447-Speed 5437.66 samples/sec Loss 1.1837 LearningRate 0.0027 Epoch: 18 Global Step: 188770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:39,953-Speed 5457.50 
samples/sec Loss 1.1847 LearningRate 0.0027 Epoch: 18 Global Step: 188780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:47,452-Speed 5462.84 samples/sec Loss 1.2165 LearningRate 0.0027 Epoch: 18 Global Step: 188790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:46:54,934-Speed 5475.62 samples/sec Loss 1.1897 LearningRate 0.0027 Epoch: 18 Global Step: 188800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:47:02,397-Speed 5488.87 samples/sec Loss 1.1796 LearningRate 0.0027 Epoch: 18 Global Step: 188810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:47:09,853-Speed 5494.34 samples/sec Loss 1.2264 LearningRate 0.0027 Epoch: 18 Global Step: 188820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:47:17,360-Speed 5457.20 samples/sec Loss 1.2067 LearningRate 0.0027 Epoch: 18 Global Step: 188830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:47:24,858-Speed 5464.06 samples/sec Loss 1.1970 LearningRate 0.0027 Epoch: 18 Global Step: 188840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:47:32,410-Speed 5424.48 samples/sec Loss 1.1924 LearningRate 0.0027 Epoch: 18 Global Step: 188850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:47:39,850-Speed 5505.42 samples/sec Loss 1.2123 LearningRate 0.0027 Epoch: 18 Global Step: 188860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:47:47,317-Speed 5486.58 samples/sec Loss 1.2017 LearningRate 0.0026 Epoch: 18 Global Step: 188870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:47:54,773-Speed 5494.56 samples/sec Loss 1.1935 LearningRate 0.0026 Epoch: 18 Global Step: 188880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:02,213-Speed 5505.63 samples/sec Loss 1.2093 LearningRate 0.0026 Epoch: 18 Global Step: 188890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:09,629-Speed 5523.92 samples/sec Loss 1.1971 LearningRate 0.0026 
Epoch: 18 Global Step: 188900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:17,118-Speed 5469.96 samples/sec Loss 1.1973 LearningRate 0.0026 Epoch: 18 Global Step: 188910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:24,541-Speed 5519.55 samples/sec Loss 1.1885 LearningRate 0.0026 Epoch: 18 Global Step: 188920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:31,997-Speed 5494.17 samples/sec Loss 1.1759 LearningRate 0.0026 Epoch: 18 Global Step: 188930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:48:39,447-Speed 5498.43 samples/sec Loss 1.1862 LearningRate 0.0026 Epoch: 18 Global Step: 188940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:46,889-Speed 5504.83 samples/sec Loss 1.1972 LearningRate 0.0026 Epoch: 18 Global Step: 188950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:48:54,335-Speed 5501.57 samples/sec Loss 1.2034 LearningRate 0.0026 Epoch: 18 Global Step: 188960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:01,801-Speed 5486.83 samples/sec Loss 1.1854 LearningRate 0.0026 Epoch: 18 Global Step: 188970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:09,372-Speed 5411.03 samples/sec Loss 1.1938 LearningRate 0.0026 Epoch: 18 Global Step: 188980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:16,942-Speed 5411.54 samples/sec Loss 1.1926 LearningRate 0.0026 Epoch: 18 Global Step: 188990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:24,393-Speed 5498.45 samples/sec Loss 1.1827 LearningRate 0.0026 Epoch: 18 Global Step: 189000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:31,914-Speed 5446.74 samples/sec Loss 1.2073 LearningRate 0.0026 Epoch: 18 Global Step: 189010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:39,394-Speed 5476.47 samples/sec Loss 1.2093 LearningRate 0.0026 Epoch: 18 Global Step: 189020 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:46,816-Speed 5519.35 samples/sec Loss 1.1873 LearningRate 0.0026 Epoch: 18 Global Step: 189030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:49:54,230-Speed 5525.50 samples/sec Loss 1.1927 LearningRate 0.0026 Epoch: 18 Global Step: 189040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:01,750-Speed 5447.22 samples/sec Loss 1.1789 LearningRate 0.0026 Epoch: 18 Global Step: 189050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:09,296-Speed 5429.13 samples/sec Loss 1.1847 LearningRate 0.0026 Epoch: 18 Global Step: 189060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:16,743-Speed 5500.48 samples/sec Loss 1.2068 LearningRate 0.0026 Epoch: 18 Global Step: 189070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:24,231-Speed 5470.99 samples/sec Loss 1.1993 LearningRate 0.0026 Epoch: 18 Global Step: 189080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:31,647-Speed 5524.40 samples/sec Loss 1.1931 LearningRate 0.0026 Epoch: 18 Global Step: 189090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:39,204-Speed 5420.91 samples/sec Loss 1.1836 LearningRate 0.0026 Epoch: 18 Global Step: 189100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:46,660-Speed 5494.11 samples/sec Loss 1.1868 LearningRate 0.0026 Epoch: 18 Global Step: 189110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:50:54,134-Speed 5481.14 samples/sec Loss 1.1646 LearningRate 0.0026 Epoch: 18 Global Step: 189120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:51:01,644-Speed 5454.31 samples/sec Loss 1.1896 LearningRate 0.0026 Epoch: 18 Global Step: 189130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:51:09,054-Speed 5528.80 samples/sec Loss 1.1735 LearningRate 0.0026 Epoch: 18 Global Step: 189140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-09 13:51:16,531-Speed 5478.98 samples/sec Loss 1.1667 LearningRate 0.0026 Epoch: 18 Global Step: 189150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:51:23,995-Speed 5488.30 samples/sec Loss 1.1276 LearningRate 0.0026 Epoch: 18 Global Step: 189160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:51:31,466-Speed 5482.95 samples/sec Loss 1.1795 LearningRate 0.0026 Epoch: 18 Global Step: 189170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:51:39,000-Speed 5437.99 samples/sec Loss 1.2113 LearningRate 0.0026 Epoch: 18 Global Step: 189180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:51:46,435-Speed 5509.77 samples/sec Loss 1.1864 LearningRate 0.0026 Epoch: 18 Global Step: 189190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:51:53,933-Speed 5463.41 samples/sec Loss 1.1931 LearningRate 0.0026 Epoch: 18 Global Step: 189200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:52:01,421-Speed 5470.85 samples/sec Loss 1.2093 LearningRate 0.0026 Epoch: 18 Global Step: 189210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:52:08,884-Speed 5489.36 samples/sec Loss 1.1929 LearningRate 0.0025 Epoch: 18 Global Step: 189220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:52:16,280-Speed 5539.02 samples/sec Loss 1.1935 LearningRate 0.0025 Epoch: 18 Global Step: 189230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:52:23,738-Speed 5492.25 samples/sec Loss 1.1898 LearningRate 0.0025 Epoch: 18 Global Step: 189240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:52:31,190-Speed 5497.77 samples/sec Loss 1.1789 LearningRate 0.0025 Epoch: 18 Global Step: 189250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:52:38,695-Speed 5458.24 samples/sec Loss 1.1837 LearningRate 0.0025 Epoch: 18 Global Step: 189260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:52:46,247-Speed 5424.64 
samples/sec Loss 1.1673 LearningRate 0.0025 Epoch: 18 Global Step: 189270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:52:53,668-Speed 5520.11 samples/sec Loss 1.1837 LearningRate 0.0025 Epoch: 18 Global Step: 189280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:53:01,089-Speed 5520.72 samples/sec Loss 1.1778 LearningRate 0.0025 Epoch: 18 Global Step: 189290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:53:08,562-Speed 5481.83 samples/sec Loss 1.1883 LearningRate 0.0025 Epoch: 18 Global Step: 189300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:53:16,015-Speed 5495.95 samples/sec Loss 1.1723 LearningRate 0.0025 Epoch: 18 Global Step: 189310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:53:23,579-Speed 5415.76 samples/sec Loss 1.1871 LearningRate 0.0025 Epoch: 18 Global Step: 189320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:53:31,035-Speed 5494.52 samples/sec Loss 1.1815 LearningRate 0.0025 Epoch: 18 Global Step: 189330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:53:38,465-Speed 5513.77 samples/sec Loss 1.1410 LearningRate 0.0025 Epoch: 18 Global Step: 189340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:53:45,897-Speed 5512.43 samples/sec Loss 1.1454 LearningRate 0.0025 Epoch: 18 Global Step: 189350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:53:53,327-Speed 5513.29 samples/sec Loss 1.1744 LearningRate 0.0025 Epoch: 18 Global Step: 189360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:54:00,897-Speed 5411.62 samples/sec Loss 1.1582 LearningRate 0.0025 Epoch: 18 Global Step: 189370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:54:08,380-Speed 5473.97 samples/sec Loss 1.1864 LearningRate 0.0025 Epoch: 18 Global Step: 189380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:54:15,869-Speed 5469.97 samples/sec Loss 1.1884 LearningRate 0.0025 
Epoch: 18 Global Step: 189390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:54:23,393-Speed 5445.05 samples/sec Loss 1.1772 LearningRate 0.0025 Epoch: 18 Global Step: 189400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:54:30,906-Speed 5452.76 samples/sec Loss 1.1678 LearningRate 0.0025 Epoch: 18 Global Step: 189410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:54:38,324-Speed 5522.23 samples/sec Loss 1.1664 LearningRate 0.0025 Epoch: 18 Global Step: 189420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:54:45,732-Speed 5529.92 samples/sec Loss 1.1859 LearningRate 0.0025 Epoch: 18 Global Step: 189430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:54:53,111-Speed 5551.66 samples/sec Loss 1.1721 LearningRate 0.0025 Epoch: 18 Global Step: 189440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:55:00,561-Speed 5499.15 samples/sec Loss 1.1759 LearningRate 0.0025 Epoch: 18 Global Step: 189450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:55:08,150-Speed 5397.86 samples/sec Loss 1.1622 LearningRate 0.0025 Epoch: 18 Global Step: 189460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:55:15,633-Speed 5474.50 samples/sec Loss 1.1709 LearningRate 0.0025 Epoch: 18 Global Step: 189470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:55:23,061-Speed 5515.14 samples/sec Loss 1.1945 LearningRate 0.0025 Epoch: 18 Global Step: 189480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:55:30,491-Speed 5513.59 samples/sec Loss 1.1953 LearningRate 0.0025 Epoch: 18 Global Step: 189490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:55:37,916-Speed 5517.20 samples/sec Loss 1.1509 LearningRate 0.0025 Epoch: 18 Global Step: 189500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:55:45,350-Speed 5510.81 samples/sec Loss 1.1878 LearningRate 0.0025 Epoch: 18 Global Step: 189510 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-01-09 13:55:52,841-Speed 5468.65 samples/sec Loss 1.1665 LearningRate 0.0025 Epoch: 18 Global Step: 189520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:00,283-Speed 5504.63 samples/sec Loss 1.1582 LearningRate 0.0025 Epoch: 18 Global Step: 189530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:07,747-Speed 5488.33 samples/sec Loss 1.1805 LearningRate 0.0025 Epoch: 18 Global Step: 189540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:15,620-Speed 5203.53 samples/sec Loss 1.1526 LearningRate 0.0025 Epoch: 18 Global Step: 189550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:23,132-Speed 5453.03 samples/sec Loss 1.1832 LearningRate 0.0025 Epoch: 18 Global Step: 189560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:30,757-Speed 5372.76 samples/sec Loss 1.1665 LearningRate 0.0025 Epoch: 18 Global Step: 189570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:38,257-Speed 5461.82 samples/sec Loss 1.1705 LearningRate 0.0024 Epoch: 18 Global Step: 189580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:45,746-Speed 5470.18 samples/sec Loss 1.1869 LearningRate 0.0024 Epoch: 18 Global Step: 189590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:56:53,243-Speed 5464.41 samples/sec Loss 1.1701 LearningRate 0.0024 Epoch: 18 Global Step: 189600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:57:00,842-Speed 5390.84 samples/sec Loss 1.1747 LearningRate 0.0024 Epoch: 18 Global Step: 189610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:57:08,333-Speed 5469.10 samples/sec Loss 1.1696 LearningRate 0.0024 Epoch: 18 Global Step: 189620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:57:15,787-Speed 5495.52 samples/sec Loss 1.1925 LearningRate 0.0024 Epoch: 18 Global Step: 189630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-09 13:57:23,229-Speed 5504.40 samples/sec Loss 1.1733 LearningRate 0.0024 Epoch: 18 Global Step: 189640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:57:30,723-Speed 5466.68 samples/sec Loss 1.1724 LearningRate 0.0024 Epoch: 18 Global Step: 189650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:57:38,215-Speed 5468.00 samples/sec Loss 1.1585 LearningRate 0.0024 Epoch: 18 Global Step: 189660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 13:57:45,719-Speed 5459.02 samples/sec Loss 1.1755 LearningRate 0.0024 Epoch: 18 Global Step: 189670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:57:53,266-Speed 5428.16 samples/sec Loss 1.1910 LearningRate 0.0024 Epoch: 18 Global Step: 189680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:00,743-Speed 5478.63 samples/sec Loss 1.1532 LearningRate 0.0024 Epoch: 18 Global Step: 189690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:08,196-Speed 5496.85 samples/sec Loss 1.1583 LearningRate 0.0024 Epoch: 18 Global Step: 189700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:15,695-Speed 5462.68 samples/sec Loss 1.1591 LearningRate 0.0024 Epoch: 18 Global Step: 189710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:23,281-Speed 5400.21 samples/sec Loss 1.1573 LearningRate 0.0024 Epoch: 18 Global Step: 189720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:30,852-Speed 5411.00 samples/sec Loss 1.1744 LearningRate 0.0024 Epoch: 18 Global Step: 189730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:38,448-Speed 5393.39 samples/sec Loss 1.1749 LearningRate 0.0024 Epoch: 18 Global Step: 189740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:45,962-Speed 5451.54 samples/sec Loss 1.1952 LearningRate 0.0024 Epoch: 18 Global Step: 189750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:58:53,373-Speed 5527.63 
samples/sec Loss 1.1739 LearningRate 0.0024 Epoch: 18 Global Step: 189760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 13:59:00,831-Speed 5493.13 samples/sec Loss 1.1677 LearningRate 0.0024 Epoch: 18 Global Step: 189770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:08,311-Speed 5476.72 samples/sec Loss 1.1841 LearningRate 0.0024 Epoch: 18 Global Step: 189780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:16,033-Speed 5304.85 samples/sec Loss 1.1554 LearningRate 0.0024 Epoch: 18 Global Step: 189790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:23,604-Speed 5410.77 samples/sec Loss 1.1527 LearningRate 0.0024 Epoch: 18 Global Step: 189800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:31,212-Speed 5384.84 samples/sec Loss 1.1660 LearningRate 0.0024 Epoch: 18 Global Step: 189810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:39,003-Speed 5257.74 samples/sec Loss 1.1685 LearningRate 0.0024 Epoch: 18 Global Step: 189820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:46,762-Speed 5279.70 samples/sec Loss 1.1752 LearningRate 0.0024 Epoch: 18 Global Step: 189830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 13:59:54,307-Speed 5429.36 samples/sec Loss 1.1638 LearningRate 0.0024 Epoch: 18 Global Step: 189840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:00:01,814-Speed 5457.21 samples/sec Loss 1.1842 LearningRate 0.0024 Epoch: 18 Global Step: 189850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:00:09,396-Speed 5403.27 samples/sec Loss 1.1748 LearningRate 0.0024 Epoch: 18 Global Step: 189860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:00:17,147-Speed 5285.04 samples/sec Loss 1.1651 LearningRate 0.0024 Epoch: 18 Global Step: 189870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:00:24,810-Speed 5346.16 samples/sec Loss 1.1851 LearningRate 0.0024 
Epoch: 18 Global Step: 189880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:00:32,370-Speed 5418.57 samples/sec Loss 1.1603 LearningRate 0.0024 Epoch: 18 Global Step: 189890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:00:39,788-Speed 5522.49 samples/sec Loss 1.1775 LearningRate 0.0024 Epoch: 18 Global Step: 189900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:00:47,203-Speed 5525.08 samples/sec Loss 1.1382 LearningRate 0.0024 Epoch: 18 Global Step: 189910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:00:54,644-Speed 5505.39 samples/sec Loss 1.1635 LearningRate 0.0024 Epoch: 18 Global Step: 189920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:01:02,138-Speed 5466.21 samples/sec Loss 1.1793 LearningRate 0.0024 Epoch: 18 Global Step: 189930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:09,784-Speed 5358.52 samples/sec Loss 1.1690 LearningRate 0.0024 Epoch: 18 Global Step: 189940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:17,273-Speed 5469.16 samples/sec Loss 1.1495 LearningRate 0.0023 Epoch: 18 Global Step: 189950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:24,885-Speed 5382.37 samples/sec Loss 1.1254 LearningRate 0.0023 Epoch: 18 Global Step: 189960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:32,406-Speed 5446.14 samples/sec Loss 1.1638 LearningRate 0.0023 Epoch: 18 Global Step: 189970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:39,894-Speed 5470.94 samples/sec Loss 1.1779 LearningRate 0.0023 Epoch: 18 Global Step: 189980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:47,367-Speed 5481.94 samples/sec Loss 1.1447 LearningRate 0.0023 Epoch: 18 Global Step: 189990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:01:54,843-Speed 5479.46 samples/sec Loss 1.1486 LearningRate 0.0023 Epoch: 18 Global Step: 190000 Fp16 Grad 
Scale: 16384 Required: 4 hours
Training: 2022-01-09 14:02:38,322-[lfw][190000]XNorm: 22.212047
Training: 2022-01-09 14:02:38,323-[lfw][190000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 14:02:38,323-[lfw][190000]Accuracy-Highest: 0.99850
Training: 2022-01-09 14:03:29,070-[cfp_fp][190000]XNorm: 21.831634
Training: 2022-01-09 14:03:29,071-[cfp_fp][190000]Accuracy-Flip: 0.99343+-0.00420
Training: 2022-01-09 14:03:29,072-[cfp_fp][190000]Accuracy-Highest: 0.99371
Training: 2022-01-09 14:04:12,595-[agedb_30][190000]XNorm: 22.882939
Training: 2022-01-09 14:04:12,596-[agedb_30][190000]Accuracy-Flip: 0.98483+-0.00626
Training: 2022-01-09 14:04:12,596-[agedb_30][190000]Accuracy-Highest: 0.98500
Training: 2022-01-09 14:04:20,135-Speed 281.92 samples/sec Loss 1.1583 LearningRate 0.0023 Epoch: 18 Global Step: 190010 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-01-09 14:04:27,623-Speed 5471.14 samples/sec Loss 1.1748 LearningRate 0.0023 Epoch: 18 Global Step: 190020 Fp16 Grad Scale: 16384 Required: 4 hours
Training: 2022-01-09 14:04:35,064-Speed 5504.66 samples/sec Loss 1.1609 LearningRate 0.0023 Epoch: 18 Global Step: 190030 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-01-09 14:04:42,656-Speed 5396.59 samples/sec Loss 1.1825 LearningRate 0.0023 Epoch: 18 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-01-09 14:04:50,309-Speed 5352.18 samples/sec Loss 1.1518 LearningRate 0.0023 Epoch: 18 Global Step: 190050 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-01-09 14:04:57,813-Speed 5459.87 samples/sec Loss 1.1557 LearningRate 0.0023 Epoch: 18 Global Step: 190060 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-01-09 14:05:05,322-Speed 5455.31 samples/sec Loss 1.1467 LearningRate 0.0023 Epoch: 18 Global Step: 190070 Fp16 Grad Scale: 32768 Required: 4 hours
Training: 2022-01-09 14:05:12,782-Speed 5491.50 samples/sec Loss 1.1464 LearningRate 0.0023 Epoch: 18 Global Step: 190080 Fp16 Grad Scale: 16384 Required: 4
hours Training: 2022-01-09 14:05:20,205-Speed 5518.14 samples/sec Loss 1.1572 LearningRate 0.0023 Epoch: 18 Global Step: 190090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:05:27,633-Speed 5515.45 samples/sec Loss 1.1499 LearningRate 0.0023 Epoch: 18 Global Step: 190100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:05:35,090-Speed 5493.40 samples/sec Loss 1.1637 LearningRate 0.0023 Epoch: 18 Global Step: 190110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:05:42,626-Speed 5435.89 samples/sec Loss 1.1666 LearningRate 0.0023 Epoch: 18 Global Step: 190120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:05:50,242-Speed 5378.75 samples/sec Loss 1.1357 LearningRate 0.0023 Epoch: 18 Global Step: 190130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:05:57,805-Speed 5416.92 samples/sec Loss 1.1449 LearningRate 0.0023 Epoch: 18 Global Step: 190140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:06:05,328-Speed 5445.11 samples/sec Loss 1.1563 LearningRate 0.0023 Epoch: 18 Global Step: 190150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:06:12,756-Speed 5514.97 samples/sec Loss 1.1337 LearningRate 0.0023 Epoch: 18 Global Step: 190160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:06:20,177-Speed 5520.23 samples/sec Loss 1.1592 LearningRate 0.0023 Epoch: 18 Global Step: 190170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:06:27,639-Speed 5490.51 samples/sec Loss 1.1311 LearningRate 0.0023 Epoch: 18 Global Step: 190180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:06:35,165-Speed 5443.00 samples/sec Loss 1.1361 LearningRate 0.0023 Epoch: 18 Global Step: 190190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:06:42,613-Speed 5500.43 samples/sec Loss 1.1636 LearningRate 0.0023 Epoch: 18 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
14:06:50,091-Speed 5477.85 samples/sec Loss 1.1446 LearningRate 0.0023 Epoch: 18 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:06:57,565-Speed 5481.35 samples/sec Loss 1.1492 LearningRate 0.0023 Epoch: 18 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:07:05,113-Speed 5427.21 samples/sec Loss 1.1308 LearningRate 0.0023 Epoch: 18 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:07:12,610-Speed 5464.09 samples/sec Loss 1.1480 LearningRate 0.0023 Epoch: 18 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:07:20,150-Speed 5433.15 samples/sec Loss 1.1569 LearningRate 0.0023 Epoch: 18 Global Step: 190250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:07:27,675-Speed 5444.20 samples/sec Loss 1.1485 LearningRate 0.0023 Epoch: 18 Global Step: 190260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:07:35,137-Speed 5489.94 samples/sec Loss 1.1682 LearningRate 0.0023 Epoch: 18 Global Step: 190270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:07:42,663-Speed 5443.37 samples/sec Loss 1.1531 LearningRate 0.0023 Epoch: 18 Global Step: 190280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:07:50,161-Speed 5463.87 samples/sec Loss 1.1474 LearningRate 0.0023 Epoch: 18 Global Step: 190290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:07:57,592-Speed 5513.06 samples/sec Loss 1.1465 LearningRate 0.0023 Epoch: 18 Global Step: 190300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:05,100-Speed 5455.89 samples/sec Loss 1.1458 LearningRate 0.0023 Epoch: 18 Global Step: 190310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:12,592-Speed 5467.74 samples/sec Loss 1.1583 LearningRate 0.0022 Epoch: 18 Global Step: 190320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:20,049-Speed 5494.24 samples/sec Loss 
1.1733 LearningRate 0.0022 Epoch: 18 Global Step: 190330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:27,628-Speed 5404.88 samples/sec Loss 1.1216 LearningRate 0.0022 Epoch: 18 Global Step: 190340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:35,204-Speed 5407.24 samples/sec Loss 1.1451 LearningRate 0.0022 Epoch: 18 Global Step: 190350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:42,699-Speed 5465.81 samples/sec Loss 1.1443 LearningRate 0.0022 Epoch: 18 Global Step: 190360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:50,356-Speed 5350.32 samples/sec Loss 1.1204 LearningRate 0.0022 Epoch: 18 Global Step: 190370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:08:57,938-Speed 5402.69 samples/sec Loss 1.1375 LearningRate 0.0022 Epoch: 18 Global Step: 190380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:09:05,371-Speed 5511.20 samples/sec Loss 1.1260 LearningRate 0.0022 Epoch: 18 Global Step: 190390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:09:12,827-Speed 5494.39 samples/sec Loss 1.1700 LearningRate 0.0022 Epoch: 18 Global Step: 190400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:09:20,266-Speed 5507.10 samples/sec Loss 1.1488 LearningRate 0.0022 Epoch: 18 Global Step: 190410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:09:27,755-Speed 5469.55 samples/sec Loss 1.1704 LearningRate 0.0022 Epoch: 18 Global Step: 190420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:09:35,296-Speed 5432.63 samples/sec Loss 1.1655 LearningRate 0.0022 Epoch: 18 Global Step: 190430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:09:42,820-Speed 5444.93 samples/sec Loss 1.1488 LearningRate 0.0022 Epoch: 18 Global Step: 190440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:09:50,349-Speed 5440.94 samples/sec Loss 1.1335 LearningRate 0.0022 Epoch: 18 Global 
Step: 190450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:09:57,894-Speed 5428.77 samples/sec Loss 1.1312 LearningRate 0.0022 Epoch: 18 Global Step: 190460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:10:05,348-Speed 5496.43 samples/sec Loss 1.1532 LearningRate 0.0022 Epoch: 18 Global Step: 190470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:10:12,876-Speed 5441.88 samples/sec Loss 1.1589 LearningRate 0.0022 Epoch: 18 Global Step: 190480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:10:20,337-Speed 5490.24 samples/sec Loss 1.1320 LearningRate 0.0022 Epoch: 18 Global Step: 190490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:10:27,816-Speed 5477.62 samples/sec Loss 1.1612 LearningRate 0.0022 Epoch: 18 Global Step: 190500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:10:35,340-Speed 5444.13 samples/sec Loss 1.1548 LearningRate 0.0022 Epoch: 18 Global Step: 190510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:10:42,908-Speed 5412.91 samples/sec Loss 1.1522 LearningRate 0.0022 Epoch: 18 Global Step: 190520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:10:50,466-Speed 5420.80 samples/sec Loss 1.1263 LearningRate 0.0022 Epoch: 18 Global Step: 190530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:10:57,937-Speed 5482.94 samples/sec Loss 1.1632 LearningRate 0.0022 Epoch: 18 Global Step: 190540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:05,406-Speed 5484.36 samples/sec Loss 1.1616 LearningRate 0.0022 Epoch: 18 Global Step: 190550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:12,951-Speed 5430.01 samples/sec Loss 1.1432 LearningRate 0.0022 Epoch: 18 Global Step: 190560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:20,485-Speed 5437.70 samples/sec Loss 1.1134 LearningRate 0.0022 Epoch: 18 Global Step: 190570 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-09 14:11:27,977-Speed 5467.75 samples/sec Loss 1.1476 LearningRate 0.0022 Epoch: 18 Global Step: 190580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:35,437-Speed 5490.69 samples/sec Loss 1.1476 LearningRate 0.0022 Epoch: 18 Global Step: 190590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:42,955-Speed 5449.62 samples/sec Loss 1.1246 LearningRate 0.0022 Epoch: 18 Global Step: 190600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:50,431-Speed 5479.13 samples/sec Loss 1.1536 LearningRate 0.0022 Epoch: 18 Global Step: 190610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:11:57,983-Speed 5425.07 samples/sec Loss 1.1222 LearningRate 0.0022 Epoch: 18 Global Step: 190620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:12:05,517-Speed 5437.19 samples/sec Loss 1.1527 LearningRate 0.0022 Epoch: 18 Global Step: 190630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:12:12,973-Speed 5493.99 samples/sec Loss 1.1421 LearningRate 0.0022 Epoch: 18 Global Step: 190640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:12:20,501-Speed 5441.56 samples/sec Loss 1.1212 LearningRate 0.0022 Epoch: 18 Global Step: 190650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:12:27,982-Speed 5476.19 samples/sec Loss 1.1330 LearningRate 0.0022 Epoch: 18 Global Step: 190660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:12:35,587-Speed 5386.82 samples/sec Loss 1.1361 LearningRate 0.0022 Epoch: 18 Global Step: 190670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:12:43,220-Speed 5367.09 samples/sec Loss 1.1305 LearningRate 0.0022 Epoch: 18 Global Step: 190680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:12:50,849-Speed 5369.08 samples/sec Loss 1.1210 LearningRate 0.0022 Epoch: 18 Global Step: 190690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 
14:12:58,292-Speed 5504.14 samples/sec Loss 1.1534 LearningRate 0.0022 Epoch: 18 Global Step: 190700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:05,862-Speed 5411.67 samples/sec Loss 1.1326 LearningRate 0.0021 Epoch: 18 Global Step: 190710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:13,240-Speed 5552.67 samples/sec Loss 1.1257 LearningRate 0.0021 Epoch: 18 Global Step: 190720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:20,772-Speed 5439.19 samples/sec Loss 1.1500 LearningRate 0.0021 Epoch: 18 Global Step: 190730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:28,313-Speed 5431.70 samples/sec Loss 1.1379 LearningRate 0.0021 Epoch: 18 Global Step: 190740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:36,128-Speed 5242.01 samples/sec Loss 1.1208 LearningRate 0.0021 Epoch: 18 Global Step: 190750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:13:43,584-Speed 5494.99 samples/sec Loss 1.1230 LearningRate 0.0021 Epoch: 18 Global Step: 190760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:51,047-Speed 5488.72 samples/sec Loss 1.1317 LearningRate 0.0021 Epoch: 18 Global Step: 190770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:13:58,603-Speed 5421.90 samples/sec Loss 1.1231 LearningRate 0.0021 Epoch: 18 Global Step: 190780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:14:06,180-Speed 5406.85 samples/sec Loss 1.1356 LearningRate 0.0021 Epoch: 18 Global Step: 190790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:14:13,742-Speed 5416.74 samples/sec Loss 1.1629 LearningRate 0.0021 Epoch: 18 Global Step: 190800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:14:21,172-Speed 5513.66 samples/sec Loss 1.1228 LearningRate 0.0021 Epoch: 18 Global Step: 190810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:14:28,648-Speed 5479.45 samples/sec Loss 
1.1555 LearningRate 0.0021 Epoch: 18 Global Step: 190820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:14:36,137-Speed 5470.33 samples/sec Loss 1.1359 LearningRate 0.0021 Epoch: 18 Global Step: 190830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:14:43,741-Speed 5387.59 samples/sec Loss 1.1221 LearningRate 0.0021 Epoch: 18 Global Step: 190840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:14:51,209-Speed 5485.49 samples/sec Loss 1.1289 LearningRate 0.0021 Epoch: 18 Global Step: 190850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:14:58,672-Speed 5488.97 samples/sec Loss 1.1280 LearningRate 0.0021 Epoch: 18 Global Step: 190860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:06,226-Speed 5423.43 samples/sec Loss 1.1428 LearningRate 0.0021 Epoch: 18 Global Step: 190870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:13,909-Speed 5331.28 samples/sec Loss 1.1178 LearningRate 0.0021 Epoch: 18 Global Step: 190880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:21,535-Speed 5371.88 samples/sec Loss 1.1398 LearningRate 0.0021 Epoch: 18 Global Step: 190890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:29,025-Speed 5469.14 samples/sec Loss 1.1182 LearningRate 0.0021 Epoch: 18 Global Step: 190900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:36,634-Speed 5384.57 samples/sec Loss 1.1454 LearningRate 0.0021 Epoch: 18 Global Step: 190910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:44,150-Speed 5450.05 samples/sec Loss 1.1163 LearningRate 0.0021 Epoch: 18 Global Step: 190920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:15:51,611-Speed 5490.69 samples/sec Loss 1.1108 LearningRate 0.0021 Epoch: 18 Global Step: 190930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:15:59,071-Speed 5491.19 samples/sec Loss 1.1152 LearningRate 0.0021 Epoch: 18 Global 
Step: 190940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:16:06,793-Speed 5305.26 samples/sec Loss 1.1276 LearningRate 0.0021 Epoch: 18 Global Step: 190950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:16:14,357-Speed 5416.22 samples/sec Loss 1.1311 LearningRate 0.0021 Epoch: 18 Global Step: 190960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:16:21,879-Speed 5446.00 samples/sec Loss 1.1168 LearningRate 0.0021 Epoch: 18 Global Step: 190970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:16:29,361-Speed 5475.05 samples/sec Loss 1.1254 LearningRate 0.0021 Epoch: 18 Global Step: 190980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:16:36,861-Speed 5461.85 samples/sec Loss 1.1192 LearningRate 0.0021 Epoch: 18 Global Step: 190990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:16:44,311-Speed 5499.17 samples/sec Loss 1.1380 LearningRate 0.0021 Epoch: 18 Global Step: 191000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:16:51,762-Speed 5497.50 samples/sec Loss 1.1279 LearningRate 0.0021 Epoch: 18 Global Step: 191010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:16:59,370-Speed 5384.65 samples/sec Loss 1.1576 LearningRate 0.0021 Epoch: 18 Global Step: 191020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:17:06,922-Speed 5424.70 samples/sec Loss 1.1240 LearningRate 0.0021 Epoch: 18 Global Step: 191030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:17:14,409-Speed 5471.29 samples/sec Loss 1.1208 LearningRate 0.0021 Epoch: 18 Global Step: 191040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:17:21,929-Speed 5447.52 samples/sec Loss 1.1246 LearningRate 0.0021 Epoch: 18 Global Step: 191050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:17:29,437-Speed 5456.54 samples/sec Loss 1.1034 LearningRate 0.0021 Epoch: 18 Global Step: 191060 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-01-09 14:17:36,964-Speed 5442.18 samples/sec Loss 1.0985 LearningRate 0.0021 Epoch: 18 Global Step: 191070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:17:44,404-Speed 5506.69 samples/sec Loss 1.1462 LearningRate 0.0021 Epoch: 18 Global Step: 191080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:17:51,892-Speed 5470.84 samples/sec Loss 1.1475 LearningRate 0.0021 Epoch: 18 Global Step: 191090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:17:59,530-Speed 5363.03 samples/sec Loss 1.1230 LearningRate 0.0020 Epoch: 18 Global Step: 191100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:07,188-Speed 5349.29 samples/sec Loss 1.1065 LearningRate 0.0020 Epoch: 18 Global Step: 191110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:14,723-Speed 5436.48 samples/sec Loss 1.1052 LearningRate 0.0020 Epoch: 18 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:22,245-Speed 5446.61 samples/sec Loss 1.1048 LearningRate 0.0020 Epoch: 18 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:29,698-Speed 5496.42 samples/sec Loss 1.1314 LearningRate 0.0020 Epoch: 18 Global Step: 191140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:37,181-Speed 5474.53 samples/sec Loss 1.1207 LearningRate 0.0020 Epoch: 18 Global Step: 191150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:44,770-Speed 5397.59 samples/sec Loss 1.1205 LearningRate 0.0020 Epoch: 18 Global Step: 191160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:52,266-Speed 5464.83 samples/sec Loss 1.1223 LearningRate 0.0020 Epoch: 18 Global Step: 191170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:18:59,746-Speed 5476.82 samples/sec Loss 1.1237 LearningRate 0.0020 Epoch: 18 Global Step: 191180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 
14:19:07,250-Speed 5459.64 samples/sec Loss 1.1053 LearningRate 0.0020 Epoch: 18 Global Step: 191190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:19:14,889-Speed 5361.99 samples/sec Loss 1.1205 LearningRate 0.0020 Epoch: 18 Global Step: 191200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:19:22,329-Speed 5506.64 samples/sec Loss 1.1432 LearningRate 0.0020 Epoch: 18 Global Step: 191210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:19:29,846-Speed 5450.66 samples/sec Loss 1.1026 LearningRate 0.0020 Epoch: 18 Global Step: 191220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:19:37,318-Speed 5482.37 samples/sec Loss 1.1378 LearningRate 0.0020 Epoch: 18 Global Step: 191230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:19:44,860-Speed 5431.77 samples/sec Loss 1.1261 LearningRate 0.0020 Epoch: 18 Global Step: 191240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:19:52,352-Speed 5467.63 samples/sec Loss 1.1103 LearningRate 0.0020 Epoch: 18 Global Step: 191250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:19:59,885-Speed 5437.77 samples/sec Loss 1.1224 LearningRate 0.0020 Epoch: 18 Global Step: 191260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:07,537-Speed 5354.18 samples/sec Loss 1.1052 LearningRate 0.0020 Epoch: 18 Global Step: 191270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:15,168-Speed 5367.97 samples/sec Loss 1.1235 LearningRate 0.0020 Epoch: 18 Global Step: 191280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:22,855-Speed 5329.05 samples/sec Loss 1.1323 LearningRate 0.0020 Epoch: 18 Global Step: 191290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:30,546-Speed 5326.77 samples/sec Loss 1.1287 LearningRate 0.0020 Epoch: 18 Global Step: 191300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:38,095-Speed 5426.28 samples/sec Loss 
1.1129 LearningRate 0.0020 Epoch: 18 Global Step: 191310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:45,689-Speed 5394.48 samples/sec Loss 1.0883 LearningRate 0.0020 Epoch: 18 Global Step: 191320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:20:53,128-Speed 5506.90 samples/sec Loss 1.1107 LearningRate 0.0020 Epoch: 18 Global Step: 191330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:21:00,530-Speed 5534.03 samples/sec Loss 1.1124 LearningRate 0.0020 Epoch: 18 Global Step: 191340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:07,980-Speed 5498.61 samples/sec Loss 1.1069 LearningRate 0.0020 Epoch: 18 Global Step: 191350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:15,717-Speed 5295.31 samples/sec Loss 1.1257 LearningRate 0.0020 Epoch: 18 Global Step: 191360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:23,259-Speed 5431.38 samples/sec Loss 1.1150 LearningRate 0.0020 Epoch: 18 Global Step: 191370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:30,859-Speed 5390.16 samples/sec Loss 1.1039 LearningRate 0.0020 Epoch: 18 Global Step: 191380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:38,456-Speed 5392.33 samples/sec Loss 1.1138 LearningRate 0.0020 Epoch: 18 Global Step: 191390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:45,964-Speed 5456.35 samples/sec Loss 1.1153 LearningRate 0.0020 Epoch: 18 Global Step: 191400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:21:53,447-Speed 5474.38 samples/sec Loss 1.1321 LearningRate 0.0020 Epoch: 18 Global Step: 191410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:01,138-Speed 5326.74 samples/sec Loss 1.1273 LearningRate 0.0020 Epoch: 18 Global Step: 191420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:08,645-Speed 5456.46 samples/sec Loss 1.1034 LearningRate 0.0020 Epoch: 18 Global 
Step: 191430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:16,201-Speed 5422.30 samples/sec Loss 1.0974 LearningRate 0.0020 Epoch: 18 Global Step: 191440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:23,707-Speed 5457.09 samples/sec Loss 1.1105 LearningRate 0.0020 Epoch: 18 Global Step: 191450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:31,283-Speed 5407.27 samples/sec Loss 1.1051 LearningRate 0.0020 Epoch: 18 Global Step: 191460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:38,763-Speed 5476.73 samples/sec Loss 1.1358 LearningRate 0.0020 Epoch: 18 Global Step: 191470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:46,199-Speed 5509.71 samples/sec Loss 1.1340 LearningRate 0.0020 Epoch: 18 Global Step: 191480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:22:53,705-Speed 5457.46 samples/sec Loss 1.1262 LearningRate 0.0020 Epoch: 18 Global Step: 191490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:23:01,251-Speed 5429.01 samples/sec Loss 1.1401 LearningRate 0.0019 Epoch: 18 Global Step: 191500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:23:08,805-Speed 5422.36 samples/sec Loss 1.1166 LearningRate 0.0019 Epoch: 18 Global Step: 191510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:23:16,416-Speed 5383.00 samples/sec Loss 1.1267 LearningRate 0.0019 Epoch: 18 Global Step: 191520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:23:23,923-Speed 5457.17 samples/sec Loss 1.1185 LearningRate 0.0019 Epoch: 18 Global Step: 191530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:23:31,569-Speed 5357.77 samples/sec Loss 1.1207 LearningRate 0.0019 Epoch: 18 Global Step: 191540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:23:39,217-Speed 5356.59 samples/sec Loss 1.0921 LearningRate 0.0019 Epoch: 18 Global Step: 191550 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-01-09 14:23:46,779-Speed 5417.17 samples/sec Loss 1.1105 LearningRate 0.0019 Epoch: 18 Global Step: 191560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:23:54,297-Speed 5449.51 samples/sec Loss 1.0997 LearningRate 0.0019 Epoch: 18 Global Step: 191570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:24:01,865-Speed 5412.94 samples/sec Loss 1.1138 LearningRate 0.0019 Epoch: 18 Global Step: 191580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:24:09,416-Speed 5424.48 samples/sec Loss 1.1102 LearningRate 0.0019 Epoch: 18 Global Step: 191590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:24:16,964-Speed 5427.95 samples/sec Loss 1.1132 LearningRate 0.0019 Epoch: 18 Global Step: 191600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:24:24,505-Speed 5431.98 samples/sec Loss 1.1008 LearningRate 0.0019 Epoch: 18 Global Step: 191610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:24:32,125-Speed 5376.48 samples/sec Loss 1.0865 LearningRate 0.0019 Epoch: 18 Global Step: 191620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:24:39,590-Speed 5486.98 samples/sec Loss 1.1082 LearningRate 0.0019 Epoch: 18 Global Step: 191630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:24:47,084-Speed 5466.82 samples/sec Loss 1.1281 LearningRate 0.0019 Epoch: 18 Global Step: 191640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:24:54,715-Speed 5368.51 samples/sec Loss 1.1111 LearningRate 0.0019 Epoch: 18 Global Step: 191650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:25:02,229-Speed 5451.39 samples/sec Loss 1.1078 LearningRate 0.0019 Epoch: 18 Global Step: 191660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:25:09,740-Speed 5454.44 samples/sec Loss 1.1176 LearningRate 0.0019 Epoch: 18 Global Step: 191670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 
14:25:17,225-Speed 5472.81 samples/sec Loss 1.1123 LearningRate 0.0019 Epoch: 18 Global Step: 191680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:25:24,687-Speed 5490.45 samples/sec Loss 1.1105 LearningRate 0.0019 Epoch: 18 Global Step: 191690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:25:32,092-Speed 5531.98 samples/sec Loss 1.1084 LearningRate 0.0019 Epoch: 18 Global Step: 191700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:25:39,601-Speed 5455.28 samples/sec Loss 1.1022 LearningRate 0.0019 Epoch: 18 Global Step: 191710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:25:47,182-Speed 5403.54 samples/sec Loss 1.0970 LearningRate 0.0019 Epoch: 18 Global Step: 191720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:25:54,650-Speed 5485.85 samples/sec Loss 1.1174 LearningRate 0.0019 Epoch: 18 Global Step: 191730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:26:02,225-Speed 5408.07 samples/sec Loss 1.1100 LearningRate 0.0019 Epoch: 18 Global Step: 191740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:26:09,999-Speed 5268.91 samples/sec Loss 1.1221 LearningRate 0.0019 Epoch: 18 Global Step: 191750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:26:17,459-Speed 5491.80 samples/sec Loss 1.0991 LearningRate 0.0019 Epoch: 18 Global Step: 191760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:26:24,994-Speed 5437.14 samples/sec Loss 1.1236 LearningRate 0.0019 Epoch: 18 Global Step: 191770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:26:32,539-Speed 5429.05 samples/sec Loss 1.0831 LearningRate 0.0019 Epoch: 18 Global Step: 191780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:26:40,074-Speed 5437.34 samples/sec Loss 1.1271 LearningRate 0.0019 Epoch: 18 Global Step: 191790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:26:47,456-Speed 5548.74 samples/sec Loss 
1.1310 LearningRate 0.0019 Epoch: 18 Global Step: 191800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:26:55,052-Speed 5393.12 samples/sec Loss 1.0900 LearningRate 0.0019 Epoch: 18 Global Step: 191810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:27:02,769-Speed 5308.54 samples/sec Loss 1.1146 LearningRate 0.0019 Epoch: 18 Global Step: 191820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:27:10,221-Speed 5497.40 samples/sec Loss 1.1011 LearningRate 0.0019 Epoch: 18 Global Step: 191830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:27:17,703-Speed 5475.07 samples/sec Loss 1.0953 LearningRate 0.0019 Epoch: 18 Global Step: 191840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:27:25,088-Speed 5547.76 samples/sec Loss 1.1217 LearningRate 0.0019 Epoch: 18 Global Step: 191850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:27:32,534-Speed 5501.36 samples/sec Loss 1.1190 LearningRate 0.0019 Epoch: 18 Global Step: 191860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:27:40,015-Speed 5476.11 samples/sec Loss 1.0998 LearningRate 0.0019 Epoch: 18 Global Step: 191870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:27:47,468-Speed 5496.18 samples/sec Loss 1.0964 LearningRate 0.0019 Epoch: 18 Global Step: 191880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:27:54,938-Speed 5484.15 samples/sec Loss 1.0967 LearningRate 0.0019 Epoch: 18 Global Step: 191890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-09 14:28:02,428-Speed 5469.55 samples/sec Loss 1.0922 LearningRate 0.0019 Epoch: 18 Global Step: 191900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:09,905-Speed 5479.03 samples/sec Loss 1.1068 LearningRate 0.0018 Epoch: 18 Global Step: 191910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:17,381-Speed 5479.39 samples/sec Loss 1.0809 LearningRate 0.0018 Epoch: 18 Global 
Step: 191920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:24,834-Speed 5496.31 samples/sec Loss 1.0961 LearningRate 0.0018 Epoch: 18 Global Step: 191930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:32,334-Speed 5462.27 samples/sec Loss 1.1162 LearningRate 0.0018 Epoch: 18 Global Step: 191940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:39,762-Speed 5515.01 samples/sec Loss 1.0912 LearningRate 0.0018 Epoch: 18 Global Step: 191950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:47,216-Speed 5495.55 samples/sec Loss 1.0973 LearningRate 0.0018 Epoch: 18 Global Step: 191960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:28:54,848-Speed 5367.79 samples/sec Loss 1.1096 LearningRate 0.0018 Epoch: 18 Global Step: 191970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:29:02,361-Speed 5452.74 samples/sec Loss 1.0868 LearningRate 0.0018 Epoch: 18 Global Step: 191980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:29:09,863-Speed 5460.83 samples/sec Loss 1.0998 LearningRate 0.0018 Epoch: 18 Global Step: 191990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:29:17,337-Speed 5481.28 samples/sec Loss 1.0940 LearningRate 0.0018 Epoch: 18 Global Step: 192000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:30:01,117-[lfw][192000]XNorm: 22.344807 Training: 2022-01-09 14:30:01,118-[lfw][192000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 14:30:01,118-[lfw][192000]Accuracy-Highest: 0.99850 Training: 2022-01-09 14:30:51,921-[cfp_fp][192000]XNorm: 22.083677 Training: 2022-01-09 14:30:51,922-[cfp_fp][192000]Accuracy-Flip: 0.99443+-0.00341 Training: 2022-01-09 14:30:51,922-[cfp_fp][192000]Accuracy-Highest: 0.99443 Training: 2022-01-09 14:31:35,781-[agedb_30][192000]XNorm: 22.929586 Training: 2022-01-09 14:31:35,781-[agedb_30][192000]Accuracy-Flip: 0.98617+-0.00619 Training: 2022-01-09 
14:31:35,782-[agedb_30][192000]Accuracy-Highest: 0.98617 Training: 2022-01-09 14:31:43,305-Speed 280.61 samples/sec Loss 1.0934 LearningRate 0.0018 Epoch: 18 Global Step: 192010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:31:50,859-Speed 5422.79 samples/sec Loss 1.0835 LearningRate 0.0018 Epoch: 18 Global Step: 192020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:31:58,388-Speed 5441.54 samples/sec Loss 1.0993 LearningRate 0.0018 Epoch: 18 Global Step: 192030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:32:05,951-Speed 5416.40 samples/sec Loss 1.0793 LearningRate 0.0018 Epoch: 18 Global Step: 192040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:32:13,553-Speed 5388.22 samples/sec Loss 1.0838 LearningRate 0.0018 Epoch: 18 Global Step: 192050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:32:21,054-Speed 5461.29 samples/sec Loss 1.0829 LearningRate 0.0018 Epoch: 18 Global Step: 192060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:32:28,686-Speed 5368.00 samples/sec Loss 1.0745 LearningRate 0.0018 Epoch: 18 Global Step: 192070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:32:36,169-Speed 5474.52 samples/sec Loss 1.0968 LearningRate 0.0018 Epoch: 18 Global Step: 192080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-09 14:32:43,774-Speed 5386.64 samples/sec Loss 1.0893 LearningRate 0.0018 Epoch: 18 Global Step: 192090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:32:51,243-Speed 5484.66 samples/sec Loss 1.1081 LearningRate 0.0018 Epoch: 18 Global Step: 192100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-09 14:32:58,718-Speed 5480.65 samples/sec Loss 1.0916 LearningRate 0.0018 Epoch: 18 Global Step: 192110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:06,375-Speed 5349.45 samples/sec Loss 1.0764 LearningRate 0.0018 Epoch: 18 Global Step: 192120 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 14:33:14,000-Speed 5372.19 samples/sec Loss 1.0854 LearningRate 0.0018 Epoch: 18 Global Step: 192130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:21,622-Speed 5374.98 samples/sec Loss 1.0836 LearningRate 0.0018 Epoch: 18 Global Step: 192140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:29,072-Speed 5498.99 samples/sec Loss 1.0895 LearningRate 0.0018 Epoch: 18 Global Step: 192150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:36,513-Speed 5504.72 samples/sec Loss 1.1269 LearningRate 0.0018 Epoch: 18 Global Step: 192160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:44,020-Speed 5457.27 samples/sec Loss 1.0903 LearningRate 0.0018 Epoch: 18 Global Step: 192170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:51,563-Speed 5430.34 samples/sec Loss 1.1141 LearningRate 0.0018 Epoch: 18 Global Step: 192180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:33:59,088-Speed 5444.62 samples/sec Loss 1.0849 LearningRate 0.0018 Epoch: 18 Global Step: 192190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:34:06,601-Speed 5452.19 samples/sec Loss 1.0990 LearningRate 0.0018 Epoch: 18 Global Step: 192200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:34:14,140-Speed 5433.61 samples/sec Loss 1.0858 LearningRate 0.0018 Epoch: 18 Global Step: 192210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:34:21,691-Speed 5424.97 samples/sec Loss 1.0946 LearningRate 0.0018 Epoch: 18 Global Step: 192220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:34:29,196-Speed 5458.85 samples/sec Loss 1.1040 LearningRate 0.0018 Epoch: 18 Global Step: 192230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:34:36,662-Speed 5487.16 samples/sec Loss 1.0866 LearningRate 0.0018 Epoch: 18 Global Step: 192240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
14:34:44,168-Speed 5457.27 samples/sec Loss 1.0879 LearningRate 0.0018 Epoch: 18 Global Step: 192250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:34:51,658-Speed 5469.22 samples/sec Loss 1.1032 LearningRate 0.0018 Epoch: 18 Global Step: 192260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:34:59,108-Speed 5499.35 samples/sec Loss 1.0756 LearningRate 0.0018 Epoch: 18 Global Step: 192270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:35:06,630-Speed 5445.91 samples/sec Loss 1.0972 LearningRate 0.0018 Epoch: 18 Global Step: 192280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:35:14,200-Speed 5411.39 samples/sec Loss 1.0825 LearningRate 0.0018 Epoch: 18 Global Step: 192290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:35:21,674-Speed 5481.04 samples/sec Loss 1.1086 LearningRate 0.0018 Epoch: 18 Global Step: 192300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:35:29,162-Speed 5470.81 samples/sec Loss 1.0938 LearningRate 0.0018 Epoch: 18 Global Step: 192310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:35:36,699-Speed 5434.82 samples/sec Loss 1.0949 LearningRate 0.0018 Epoch: 18 Global Step: 192320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:35:44,180-Speed 5476.10 samples/sec Loss 1.0854 LearningRate 0.0018 Epoch: 18 Global Step: 192330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:35:51,786-Speed 5385.59 samples/sec Loss 1.0835 LearningRate 0.0017 Epoch: 18 Global Step: 192340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:35:59,513-Speed 5301.94 samples/sec Loss 1.0833 LearningRate 0.0017 Epoch: 18 Global Step: 192350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:36:07,049-Speed 5435.63 samples/sec Loss 1.0999 LearningRate 0.0017 Epoch: 18 Global Step: 192360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:36:14,636-Speed 5399.25 samples/sec Loss 
1.1032 LearningRate 0.0017 Epoch: 18 Global Step: 192370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:36:22,106-Speed 5484.47 samples/sec Loss 1.1002 LearningRate 0.0017 Epoch: 18 Global Step: 192380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:36:29,585-Speed 5477.27 samples/sec Loss 1.0821 LearningRate 0.0017 Epoch: 18 Global Step: 192390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:36:37,079-Speed 5466.76 samples/sec Loss 1.0838 LearningRate 0.0017 Epoch: 18 Global Step: 192400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:36:44,692-Speed 5380.59 samples/sec Loss 1.1032 LearningRate 0.0017 Epoch: 18 Global Step: 192410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:36:52,177-Speed 5472.72 samples/sec Loss 1.1005 LearningRate 0.0017 Epoch: 18 Global Step: 192420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:36:59,666-Speed 5470.20 samples/sec Loss 1.0899 LearningRate 0.0017 Epoch: 18 Global Step: 192430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:07,287-Speed 5375.26 samples/sec Loss 1.0816 LearningRate 0.0017 Epoch: 18 Global Step: 192440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:14,848-Speed 5418.22 samples/sec Loss 1.0950 LearningRate 0.0017 Epoch: 18 Global Step: 192450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:22,370-Speed 5445.97 samples/sec Loss 1.0899 LearningRate 0.0017 Epoch: 18 Global Step: 192460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:29,864-Speed 5467.12 samples/sec Loss 1.1230 LearningRate 0.0017 Epoch: 18 Global Step: 192470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:37,387-Speed 5445.30 samples/sec Loss 1.0760 LearningRate 0.0017 Epoch: 18 Global Step: 192480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:44,962-Speed 5407.71 samples/sec Loss 1.0897 LearningRate 0.0017 Epoch: 18 Global 
Step: 192490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:37:52,507-Speed 5429.70 samples/sec Loss 1.0877 LearningRate 0.0017 Epoch: 18 Global Step: 192500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:00,130-Speed 5373.71 samples/sec Loss 1.0754 LearningRate 0.0017 Epoch: 18 Global Step: 192510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:07,730-Speed 5390.34 samples/sec Loss 1.0932 LearningRate 0.0017 Epoch: 18 Global Step: 192520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:15,250-Speed 5448.03 samples/sec Loss 1.0915 LearningRate 0.0017 Epoch: 18 Global Step: 192530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:22,691-Speed 5505.10 samples/sec Loss 1.0800 LearningRate 0.0017 Epoch: 18 Global Step: 192540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:30,200-Speed 5455.77 samples/sec Loss 1.0936 LearningRate 0.0017 Epoch: 18 Global Step: 192550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:37,682-Speed 5475.02 samples/sec Loss 1.0810 LearningRate 0.0017 Epoch: 18 Global Step: 192560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:45,143-Speed 5490.66 samples/sec Loss 1.0929 LearningRate 0.0017 Epoch: 18 Global Step: 192570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:38:52,691-Speed 5427.36 samples/sec Loss 1.0826 LearningRate 0.0017 Epoch: 18 Global Step: 192580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:39:00,141-Speed 5498.52 samples/sec Loss 1.0894 LearningRate 0.0017 Epoch: 18 Global Step: 192590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:39:07,563-Speed 5520.03 samples/sec Loss 1.0842 LearningRate 0.0017 Epoch: 18 Global Step: 192600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:39:15,039-Speed 5479.84 samples/sec Loss 1.0621 LearningRate 0.0017 Epoch: 18 Global Step: 192610 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 14:39:22,482-Speed 5503.38 samples/sec Loss 1.0730 LearningRate 0.0017 Epoch: 18 Global Step: 192620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:39:30,017-Speed 5437.09 samples/sec Loss 1.0992 LearningRate 0.0017 Epoch: 18 Global Step: 192630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:39:37,558-Speed 5432.38 samples/sec Loss 1.0693 LearningRate 0.0017 Epoch: 18 Global Step: 192640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:39:45,047-Speed 5470.08 samples/sec Loss 1.0808 LearningRate 0.0017 Epoch: 18 Global Step: 192650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:39:52,590-Speed 5431.02 samples/sec Loss 1.0989 LearningRate 0.0017 Epoch: 18 Global Step: 192660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:40:00,247-Speed 5349.88 samples/sec Loss 1.0792 LearningRate 0.0017 Epoch: 18 Global Step: 192670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:40:07,716-Speed 5485.18 samples/sec Loss 1.0971 LearningRate 0.0017 Epoch: 18 Global Step: 192680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:40:15,456-Speed 5292.88 samples/sec Loss 1.0851 LearningRate 0.0017 Epoch: 18 Global Step: 192690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:40:22,987-Speed 5439.73 samples/sec Loss 1.0953 LearningRate 0.0017 Epoch: 18 Global Step: 192700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:40:30,436-Speed 5499.52 samples/sec Loss 1.0876 LearningRate 0.0017 Epoch: 18 Global Step: 192710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:40:37,942-Speed 5457.39 samples/sec Loss 1.0842 LearningRate 0.0017 Epoch: 18 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:40:45,417-Speed 5480.15 samples/sec Loss 1.0726 LearningRate 0.0017 Epoch: 18 Global Step: 192730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
14:40:52,903-Speed 5472.66 samples/sec Loss 1.0729 LearningRate 0.0017 Epoch: 18 Global Step: 192740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:00,374-Speed 5483.46 samples/sec Loss 1.0886 LearningRate 0.0017 Epoch: 18 Global Step: 192750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:07,885-Speed 5453.80 samples/sec Loss 1.0821 LearningRate 0.0017 Epoch: 18 Global Step: 192760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:15,347-Speed 5490.03 samples/sec Loss 1.0742 LearningRate 0.0016 Epoch: 18 Global Step: 192770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:22,805-Speed 5493.07 samples/sec Loss 1.0791 LearningRate 0.0016 Epoch: 18 Global Step: 192780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:30,325-Speed 5447.49 samples/sec Loss 1.0577 LearningRate 0.0016 Epoch: 18 Global Step: 192790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:37,781-Speed 5494.29 samples/sec Loss 1.0686 LearningRate 0.0016 Epoch: 18 Global Step: 192800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:41:45,263-Speed 5475.33 samples/sec Loss 1.0702 LearningRate 0.0016 Epoch: 18 Global Step: 192810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:41:52,996-Speed 5297.83 samples/sec Loss 1.0717 LearningRate 0.0016 Epoch: 18 Global Step: 192820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:00,632-Speed 5364.56 samples/sec Loss 1.0750 LearningRate 0.0016 Epoch: 18 Global Step: 192830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:08,186-Speed 5422.85 samples/sec Loss 1.0806 LearningRate 0.0016 Epoch: 18 Global Step: 192840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:15,688-Speed 5460.54 samples/sec Loss 1.0831 LearningRate 0.0016 Epoch: 18 Global Step: 192850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:23,173-Speed 5473.17 samples/sec Loss 
1.0908 LearningRate 0.0016 Epoch: 18 Global Step: 192860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:30,882-Speed 5313.94 samples/sec Loss 1.0684 LearningRate 0.0016 Epoch: 18 Global Step: 192870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:38,418-Speed 5436.24 samples/sec Loss 1.0711 LearningRate 0.0016 Epoch: 18 Global Step: 192880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:46,115-Speed 5322.12 samples/sec Loss 1.0805 LearningRate 0.0016 Epoch: 18 Global Step: 192890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:42:53,568-Speed 5496.79 samples/sec Loss 1.0928 LearningRate 0.0016 Epoch: 18 Global Step: 192900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:43:01,054-Speed 5472.31 samples/sec Loss 1.0736 LearningRate 0.0016 Epoch: 18 Global Step: 192910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:43:08,567-Speed 5452.36 samples/sec Loss 1.0857 LearningRate 0.0016 Epoch: 18 Global Step: 192920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:43:16,154-Speed 5399.86 samples/sec Loss 1.0589 LearningRate 0.0016 Epoch: 18 Global Step: 192930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:43:23,660-Speed 5457.69 samples/sec Loss 1.0711 LearningRate 0.0016 Epoch: 18 Global Step: 192940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:43:31,047-Speed 5545.34 samples/sec Loss 1.0741 LearningRate 0.0016 Epoch: 18 Global Step: 192950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:43:38,670-Speed 5373.87 samples/sec Loss 1.0746 LearningRate 0.0016 Epoch: 18 Global Step: 192960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:43:46,243-Speed 5409.66 samples/sec Loss 1.0839 LearningRate 0.0016 Epoch: 18 Global Step: 192970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:43:53,675-Speed 5512.32 samples/sec Loss 1.0584 LearningRate 0.0016 Epoch: 18 Global 
Step: 192980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:44:01,142-Speed 5485.99 samples/sec Loss 1.0657 LearningRate 0.0016 Epoch: 18 Global Step: 192990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:44:08,645-Speed 5460.13 samples/sec Loss 1.0919 LearningRate 0.0016 Epoch: 18 Global Step: 193000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:44:16,227-Speed 5403.08 samples/sec Loss 1.0821 LearningRate 0.0016 Epoch: 18 Global Step: 193010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:44:23,651-Speed 5517.36 samples/sec Loss 1.0756 LearningRate 0.0016 Epoch: 18 Global Step: 193020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:44:31,116-Speed 5487.97 samples/sec Loss 1.0691 LearningRate 0.0016 Epoch: 18 Global Step: 193030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:44:38,568-Speed 5497.08 samples/sec Loss 1.0592 LearningRate 0.0016 Epoch: 18 Global Step: 193040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:44:46,040-Speed 5483.08 samples/sec Loss 1.0573 LearningRate 0.0016 Epoch: 18 Global Step: 193050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:44:53,590-Speed 5425.67 samples/sec Loss 1.0566 LearningRate 0.0016 Epoch: 18 Global Step: 193060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:45:01,127-Speed 5435.60 samples/sec Loss 1.0651 LearningRate 0.0016 Epoch: 18 Global Step: 193070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:45:08,703-Speed 5406.97 samples/sec Loss 1.0480 LearningRate 0.0016 Epoch: 18 Global Step: 193080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:45:16,175-Speed 5482.90 samples/sec Loss 1.0716 LearningRate 0.0016 Epoch: 18 Global Step: 193090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:45:23,711-Speed 5435.97 samples/sec Loss 1.0645 LearningRate 0.0016 Epoch: 18 Global Step: 193100 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 14:45:31,207-Speed 5465.08 samples/sec Loss 1.0495 LearningRate 0.0016 Epoch: 18 Global Step: 193110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:45:38,658-Speed 5497.58 samples/sec Loss 1.0641 LearningRate 0.0016 Epoch: 18 Global Step: 193120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:45:46,190-Speed 5439.20 samples/sec Loss 1.0726 LearningRate 0.0016 Epoch: 18 Global Step: 193130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:45:53,693-Speed 5459.87 samples/sec Loss 1.0507 LearningRate 0.0016 Epoch: 18 Global Step: 193140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:01,254-Speed 5417.94 samples/sec Loss 1.0653 LearningRate 0.0016 Epoch: 18 Global Step: 193150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:08,728-Speed 5481.48 samples/sec Loss 1.0785 LearningRate 0.0016 Epoch: 18 Global Step: 193160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:16,199-Speed 5482.97 samples/sec Loss 1.0692 LearningRate 0.0016 Epoch: 18 Global Step: 193170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:23,770-Speed 5410.93 samples/sec Loss 1.0610 LearningRate 0.0016 Epoch: 18 Global Step: 193180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:31,418-Speed 5356.52 samples/sec Loss 1.0753 LearningRate 0.0016 Epoch: 18 Global Step: 193190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:38,995-Speed 5406.13 samples/sec Loss 1.0917 LearningRate 0.0016 Epoch: 18 Global Step: 193200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:46,511-Speed 5450.80 samples/sec Loss 1.0463 LearningRate 0.0016 Epoch: 18 Global Step: 193210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:46:54,094-Speed 5402.23 samples/sec Loss 1.0391 LearningRate 0.0015 Epoch: 18 Global Step: 193220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
14:47:01,561-Speed 5486.42 samples/sec Loss 1.0547 LearningRate 0.0015 Epoch: 18 Global Step: 193230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:47:09,036-Speed 5479.99 samples/sec Loss 1.0599 LearningRate 0.0015 Epoch: 18 Global Step: 193240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:47:16,424-Speed 5545.04 samples/sec Loss 1.0713 LearningRate 0.0015 Epoch: 18 Global Step: 193250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:47:23,951-Speed 5442.60 samples/sec Loss 1.0765 LearningRate 0.0015 Epoch: 18 Global Step: 193260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:47:31,493-Speed 5431.35 samples/sec Loss 1.0783 LearningRate 0.0015 Epoch: 18 Global Step: 193270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:47:38,961-Speed 5485.37 samples/sec Loss 1.0450 LearningRate 0.0015 Epoch: 18 Global Step: 193280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:47:46,450-Speed 5470.47 samples/sec Loss 1.0485 LearningRate 0.0015 Epoch: 18 Global Step: 193290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:47:53,894-Speed 5503.08 samples/sec Loss 1.0632 LearningRate 0.0015 Epoch: 18 Global Step: 193300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:48:01,472-Speed 5405.91 samples/sec Loss 1.0550 LearningRate 0.0015 Epoch: 18 Global Step: 193310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:48:09,068-Speed 5393.19 samples/sec Loss 1.0533 LearningRate 0.0015 Epoch: 18 Global Step: 193320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:48:16,622-Speed 5422.80 samples/sec Loss 1.0682 LearningRate 0.0015 Epoch: 18 Global Step: 193330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:48:24,093-Speed 5483.87 samples/sec Loss 1.0379 LearningRate 0.0015 Epoch: 18 Global Step: 193340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:48:31,654-Speed 5417.65 samples/sec Loss 
1.0604 LearningRate 0.0015 Epoch: 18 Global Step: 193350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:48:39,166-Speed 5453.45 samples/sec Loss 1.0650 LearningRate 0.0015 Epoch: 18 Global Step: 193360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:48:46,837-Speed 5340.15 samples/sec Loss 1.0580 LearningRate 0.0015 Epoch: 18 Global Step: 193370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:48:54,365-Speed 5442.01 samples/sec Loss 1.0473 LearningRate 0.0015 Epoch: 18 Global Step: 193380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:01,896-Speed 5439.08 samples/sec Loss 1.0611 LearningRate 0.0015 Epoch: 18 Global Step: 193390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:09,396-Speed 5462.03 samples/sec Loss 1.0788 LearningRate 0.0015 Epoch: 18 Global Step: 193400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:16,865-Speed 5486.84 samples/sec Loss 1.0533 LearningRate 0.0015 Epoch: 18 Global Step: 193410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:24,406-Speed 5432.29 samples/sec Loss 1.0659 LearningRate 0.0015 Epoch: 18 Global Step: 193420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:31,772-Speed 5561.77 samples/sec Loss 1.0488 LearningRate 0.0015 Epoch: 18 Global Step: 193430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:39,309-Speed 5434.76 samples/sec Loss 1.0879 LearningRate 0.0015 Epoch: 18 Global Step: 193440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:46,790-Speed 5475.98 samples/sec Loss 1.0596 LearningRate 0.0015 Epoch: 18 Global Step: 193450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:49:54,210-Speed 5520.89 samples/sec Loss 1.0538 LearningRate 0.0015 Epoch: 18 Global Step: 193460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:50:01,757-Speed 5427.75 samples/sec Loss 1.0685 LearningRate 0.0015 Epoch: 18 Global 
Step: 193470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:09,261-Speed 5458.97 samples/sec Loss 1.0671 LearningRate 0.0015 Epoch: 18 Global Step: 193480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:16,678-Speed 5523.60 samples/sec Loss 1.0640 LearningRate 0.0015 Epoch: 18 Global Step: 193490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:24,157-Speed 5477.28 samples/sec Loss 1.0644 LearningRate 0.0015 Epoch: 18 Global Step: 193500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:31,567-Speed 5528.38 samples/sec Loss 1.0423 LearningRate 0.0015 Epoch: 18 Global Step: 193510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:38,984-Speed 5523.38 samples/sec Loss 1.0631 LearningRate 0.0015 Epoch: 18 Global Step: 193520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:46,415-Speed 5512.07 samples/sec Loss 1.0511 LearningRate 0.0015 Epoch: 18 Global Step: 193530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:50:53,934-Speed 5448.27 samples/sec Loss 1.0588 LearningRate 0.0015 Epoch: 18 Global Step: 193540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:01,433-Speed 5463.21 samples/sec Loss 1.0415 LearningRate 0.0015 Epoch: 18 Global Step: 193550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:09,037-Speed 5387.20 samples/sec Loss 1.0638 LearningRate 0.0015 Epoch: 18 Global Step: 193560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:16,520-Speed 5474.31 samples/sec Loss 1.0373 LearningRate 0.0015 Epoch: 18 Global Step: 193570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:24,057-Speed 5435.50 samples/sec Loss 1.0334 LearningRate 0.0015 Epoch: 18 Global Step: 193580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:31,675-Speed 5377.54 samples/sec Loss 1.0745 LearningRate 0.0015 Epoch: 18 Global Step: 193590 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-09 14:51:39,309-Speed 5366.80 samples/sec Loss 1.0525 LearningRate 0.0015 Epoch: 18 Global Step: 193600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:46,788-Speed 5476.80 samples/sec Loss 1.0546 LearningRate 0.0015 Epoch: 18 Global Step: 193610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:51:54,342-Speed 5423.55 samples/sec Loss 1.0749 LearningRate 0.0015 Epoch: 18 Global Step: 193620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:52:01,842-Speed 5462.24 samples/sec Loss 1.0430 LearningRate 0.0015 Epoch: 18 Global Step: 193630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:52:09,373-Speed 5439.90 samples/sec Loss 1.0336 LearningRate 0.0015 Epoch: 18 Global Step: 193640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:52:16,856-Speed 5473.76 samples/sec Loss 1.0841 LearningRate 0.0015 Epoch: 18 Global Step: 193650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:52:24,315-Speed 5491.91 samples/sec Loss 1.0411 LearningRate 0.0015 Epoch: 18 Global Step: 193660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:52:31,755-Speed 5506.81 samples/sec Loss 1.0780 LearningRate 0.0015 Epoch: 18 Global Step: 193670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:52:39,184-Speed 5513.94 samples/sec Loss 1.0497 LearningRate 0.0015 Epoch: 18 Global Step: 193680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:52:46,577-Speed 5541.33 samples/sec Loss 1.0633 LearningRate 0.0014 Epoch: 18 Global Step: 193690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:52:54,057-Speed 5476.27 samples/sec Loss 1.0546 LearningRate 0.0014 Epoch: 18 Global Step: 193700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 14:53:01,471-Speed 5526.22 samples/sec Loss 1.0483 LearningRate 0.0014 Epoch: 18 Global Step: 193710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 
14:53:08,883-Speed 5526.23 samples/sec Loss 1.0193 LearningRate 0.0014 Epoch: 18 Global Step: 193720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:53:16,428-Speed 5429.67 samples/sec Loss 1.0722 LearningRate 0.0014 Epoch: 18 Global Step: 193730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:53:23,938-Speed 5454.85 samples/sec Loss 1.0352 LearningRate 0.0014 Epoch: 18 Global Step: 193740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:53:31,562-Speed 5372.99 samples/sec Loss 1.0504 LearningRate 0.0014 Epoch: 18 Global Step: 193750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:53:39,080-Speed 5449.49 samples/sec Loss 1.0542 LearningRate 0.0014 Epoch: 18 Global Step: 193760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:53:46,668-Speed 5398.30 samples/sec Loss 1.0773 LearningRate 0.0014 Epoch: 18 Global Step: 193770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:53:54,233-Speed 5415.12 samples/sec Loss 1.0692 LearningRate 0.0014 Epoch: 18 Global Step: 193780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:54:01,825-Speed 5396.18 samples/sec Loss 1.0556 LearningRate 0.0014 Epoch: 18 Global Step: 193790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:54:09,412-Speed 5399.06 samples/sec Loss 1.0561 LearningRate 0.0014 Epoch: 18 Global Step: 193800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:54:16,869-Speed 5493.55 samples/sec Loss 1.0626 LearningRate 0.0014 Epoch: 18 Global Step: 193810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:54:24,334-Speed 5487.87 samples/sec Loss 1.0507 LearningRate 0.0014 Epoch: 18 Global Step: 193820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:54:31,837-Speed 5459.87 samples/sec Loss 1.0511 LearningRate 0.0014 Epoch: 18 Global Step: 193830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:54:39,297-Speed 5491.71 samples/sec Loss 
1.0422 LearningRate 0.0014 Epoch: 18 Global Step: 193840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:54:46,766-Speed 5484.58 samples/sec Loss 1.0556 LearningRate 0.0014 Epoch: 18 Global Step: 193850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:54:54,254-Speed 5470.95 samples/sec Loss 1.0737 LearningRate 0.0014 Epoch: 18 Global Step: 193860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:55:01,696-Speed 5504.71 samples/sec Loss 1.0661 LearningRate 0.0014 Epoch: 18 Global Step: 193870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:55:09,142-Speed 5501.77 samples/sec Loss 1.0512 LearningRate 0.0014 Epoch: 18 Global Step: 193880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:55:16,601-Speed 5491.74 samples/sec Loss 1.0688 LearningRate 0.0014 Epoch: 18 Global Step: 193890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:55:24,067-Speed 5487.44 samples/sec Loss 1.0516 LearningRate 0.0014 Epoch: 18 Global Step: 193900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:55:31,507-Speed 5506.14 samples/sec Loss 1.0617 LearningRate 0.0014 Epoch: 18 Global Step: 193910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:55:38,995-Speed 5470.88 samples/sec Loss 1.0402 LearningRate 0.0014 Epoch: 18 Global Step: 193920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:55:46,643-Speed 5356.27 samples/sec Loss 1.0496 LearningRate 0.0014 Epoch: 18 Global Step: 193930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 14:55:54,092-Speed 5499.72 samples/sec Loss 1.0300 LearningRate 0.0014 Epoch: 18 Global Step: 193940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:56:01,575-Speed 5474.25 samples/sec Loss 1.0365 LearningRate 0.0014 Epoch: 18 Global Step: 193950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:56:09,057-Speed 5475.04 samples/sec Loss 1.0324 LearningRate 0.0014 Epoch: 18 Global 
Step: 193960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:56:16,559-Speed 5460.71 samples/sec Loss 1.0552 LearningRate 0.0014 Epoch: 18 Global Step: 193970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:56:24,066-Speed 5456.87 samples/sec Loss 1.0204 LearningRate 0.0014 Epoch: 18 Global Step: 193980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:56:31,584-Speed 5449.20 samples/sec Loss 1.0411 LearningRate 0.0014 Epoch: 18 Global Step: 193990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:56:39,101-Speed 5449.61 samples/sec Loss 1.0506 LearningRate 0.0014 Epoch: 18 Global Step: 194000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:57:23,135-[lfw][194000]XNorm: 22.458612 Training: 2022-01-09 14:57:23,136-[lfw][194000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 14:57:23,136-[lfw][194000]Accuracy-Highest: 0.99850 Training: 2022-01-09 14:58:15,006-[cfp_fp][194000]XNorm: 22.183354 Training: 2022-01-09 14:58:15,006-[cfp_fp][194000]Accuracy-Flip: 0.99386+-0.00332 Training: 2022-01-09 14:58:15,007-[cfp_fp][194000]Accuracy-Highest: 0.99443 Training: 2022-01-09 14:58:59,428-[agedb_30][194000]XNorm: 23.049855 Training: 2022-01-09 14:58:59,429-[agedb_30][194000]Accuracy-Flip: 0.98583+-0.00518 Training: 2022-01-09 14:58:59,429-[agedb_30][194000]Accuracy-Highest: 0.98617 Training: 2022-01-09 14:59:07,005-Speed 276.94 samples/sec Loss 1.0463 LearningRate 0.0014 Epoch: 18 Global Step: 194010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:59:14,395-Speed 5543.08 samples/sec Loss 1.0297 LearningRate 0.0014 Epoch: 18 Global Step: 194020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:59:21,803-Speed 5529.75 samples/sec Loss 1.0459 LearningRate 0.0014 Epoch: 18 Global Step: 194030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:59:29,274-Speed 5483.83 samples/sec Loss 1.0609 LearningRate 0.0014 Epoch: 18 Global Step: 194040 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-09 14:59:36,679-Speed 5531.84 samples/sec Loss 1.0424 LearningRate 0.0014 Epoch: 18 Global Step: 194050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:59:44,236-Speed 5421.01 samples/sec Loss 1.0310 LearningRate 0.0014 Epoch: 18 Global Step: 194060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:59:51,663-Speed 5515.88 samples/sec Loss 1.0331 LearningRate 0.0014 Epoch: 18 Global Step: 194070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 14:59:59,084-Speed 5520.25 samples/sec Loss 1.0424 LearningRate 0.0014 Epoch: 18 Global Step: 194080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:06,487-Speed 5533.20 samples/sec Loss 1.0442 LearningRate 0.0014 Epoch: 18 Global Step: 194090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:13,864-Speed 5553.01 samples/sec Loss 1.0339 LearningRate 0.0014 Epoch: 18 Global Step: 194100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:21,285-Speed 5520.22 samples/sec Loss 1.0371 LearningRate 0.0014 Epoch: 18 Global Step: 194110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:28,677-Speed 5541.97 samples/sec Loss 1.0587 LearningRate 0.0014 Epoch: 18 Global Step: 194120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:36,061-Speed 5548.11 samples/sec Loss 1.0343 LearningRate 0.0014 Epoch: 18 Global Step: 194130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:43,475-Speed 5524.92 samples/sec Loss 1.0461 LearningRate 0.0014 Epoch: 18 Global Step: 194140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:00:50,926-Speed 5498.39 samples/sec Loss 1.0432 LearningRate 0.0014 Epoch: 18 Global Step: 194150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:00:58,380-Speed 5495.50 samples/sec Loss 1.0525 LearningRate 0.0014 Epoch: 18 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-09 15:01:05,856-Speed 5479.48 samples/sec Loss 1.0330 LearningRate 0.0013 Epoch: 18 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:13,256-Speed 5536.07 samples/sec Loss 1.0458 LearningRate 0.0013 Epoch: 18 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:20,696-Speed 5506.33 samples/sec Loss 1.0302 LearningRate 0.0013 Epoch: 18 Global Step: 194190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:28,089-Speed 5540.65 samples/sec Loss 1.0238 LearningRate 0.0013 Epoch: 18 Global Step: 194200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:35,534-Speed 5502.31 samples/sec Loss 1.0188 LearningRate 0.0013 Epoch: 18 Global Step: 194210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:43,013-Speed 5477.28 samples/sec Loss 1.0303 LearningRate 0.0013 Epoch: 18 Global Step: 194220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:50,400-Speed 5546.09 samples/sec Loss 1.0204 LearningRate 0.0013 Epoch: 18 Global Step: 194230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:01:57,832-Speed 5511.98 samples/sec Loss 1.0224 LearningRate 0.0013 Epoch: 18 Global Step: 194240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:02:05,241-Speed 5528.61 samples/sec Loss 1.0335 LearningRate 0.0013 Epoch: 18 Global Step: 194250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:02:12,629-Speed 5545.07 samples/sec Loss 1.0133 LearningRate 0.0013 Epoch: 18 Global Step: 194260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:02:20,085-Speed 5494.84 samples/sec Loss 1.0210 LearningRate 0.0013 Epoch: 18 Global Step: 194270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:02:27,545-Speed 5490.87 samples/sec Loss 1.0435 LearningRate 0.0013 Epoch: 18 Global Step: 194280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:02:34,979-Speed 5510.65 
samples/sec Loss 1.0427 LearningRate 0.0013 Epoch: 18 Global Step: 194290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:02:42,421-Speed 5505.05 samples/sec Loss 1.0243 LearningRate 0.0013 Epoch: 18 Global Step: 194300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:02:49,826-Speed 5531.87 samples/sec Loss 1.0452 LearningRate 0.0013 Epoch: 18 Global Step: 194310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:02:57,285-Speed 5492.52 samples/sec Loss 1.0207 LearningRate 0.0013 Epoch: 18 Global Step: 194320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:04,691-Speed 5531.45 samples/sec Loss 1.0352 LearningRate 0.0013 Epoch: 18 Global Step: 194330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:12,058-Speed 5560.07 samples/sec Loss 1.0448 LearningRate 0.0013 Epoch: 18 Global Step: 194340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:19,511-Speed 5497.24 samples/sec Loss 1.0228 LearningRate 0.0013 Epoch: 18 Global Step: 194350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:26,961-Speed 5497.99 samples/sec Loss 1.0579 LearningRate 0.0013 Epoch: 18 Global Step: 194360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:34,435-Speed 5481.59 samples/sec Loss 1.0311 LearningRate 0.0013 Epoch: 18 Global Step: 194370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:41,887-Speed 5497.03 samples/sec Loss 1.0462 LearningRate 0.0013 Epoch: 18 Global Step: 194380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:03:49,303-Speed 5523.80 samples/sec Loss 1.0286 LearningRate 0.0013 Epoch: 18 Global Step: 194390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:03:56,767-Speed 5488.37 samples/sec Loss 1.0469 LearningRate 0.0013 Epoch: 18 Global Step: 194400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:04,215-Speed 5500.21 samples/sec Loss 1.0449 LearningRate 0.0013 
Epoch: 18 Global Step: 194410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:11,655-Speed 5506.05 samples/sec Loss 1.0227 LearningRate 0.0013 Epoch: 18 Global Step: 194420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:19,083-Speed 5515.73 samples/sec Loss 1.0116 LearningRate 0.0013 Epoch: 18 Global Step: 194430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:26,525-Speed 5504.44 samples/sec Loss 1.0431 LearningRate 0.0013 Epoch: 18 Global Step: 194440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:33,984-Speed 5492.37 samples/sec Loss 1.0546 LearningRate 0.0013 Epoch: 18 Global Step: 194450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:41,543-Speed 5419.45 samples/sec Loss 1.0374 LearningRate 0.0013 Epoch: 18 Global Step: 194460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:48,992-Speed 5499.27 samples/sec Loss 1.0294 LearningRate 0.0013 Epoch: 18 Global Step: 194470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:04:56,422-Speed 5513.78 samples/sec Loss 1.0494 LearningRate 0.0013 Epoch: 18 Global Step: 194480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:05:03,814-Speed 5541.76 samples/sec Loss 1.0435 LearningRate 0.0013 Epoch: 18 Global Step: 194490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:05:11,338-Speed 5444.43 samples/sec Loss 1.0203 LearningRate 0.0013 Epoch: 18 Global Step: 194500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:05:18,775-Speed 5508.22 samples/sec Loss 1.0109 LearningRate 0.0013 Epoch: 18 Global Step: 194510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:05:26,281-Speed 5458.15 samples/sec Loss 1.0314 LearningRate 0.0013 Epoch: 18 Global Step: 194520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:05:33,776-Speed 5465.41 samples/sec Loss 1.0153 LearningRate 0.0013 Epoch: 18 Global Step: 194530 Fp16 Grad 
Scale: 16384 Required: 3 hours Training: 2022-01-09 15:05:41,351-Speed 5407.98 samples/sec Loss 1.0273 LearningRate 0.0013 Epoch: 18 Global Step: 194540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:05:48,791-Speed 5506.39 samples/sec Loss 1.0194 LearningRate 0.0013 Epoch: 18 Global Step: 194550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:05:56,235-Speed 5503.40 samples/sec Loss 1.0349 LearningRate 0.0013 Epoch: 18 Global Step: 194560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:06:03,743-Speed 5455.67 samples/sec Loss 1.0307 LearningRate 0.0013 Epoch: 18 Global Step: 194570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:06:11,364-Speed 5375.57 samples/sec Loss 1.0204 LearningRate 0.0013 Epoch: 18 Global Step: 194580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:06:18,788-Speed 5518.18 samples/sec Loss 1.0298 LearningRate 0.0013 Epoch: 18 Global Step: 194590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:06:26,339-Speed 5425.25 samples/sec Loss 1.0304 LearningRate 0.0013 Epoch: 18 Global Step: 194600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:06:33,779-Speed 5506.14 samples/sec Loss 1.0265 LearningRate 0.0013 Epoch: 18 Global Step: 194610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:06:41,400-Speed 5374.81 samples/sec Loss 1.0308 LearningRate 0.0013 Epoch: 18 Global Step: 194620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:06:48,958-Speed 5420.63 samples/sec Loss 1.0375 LearningRate 0.0013 Epoch: 18 Global Step: 194630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:06:56,392-Speed 5510.82 samples/sec Loss 1.0376 LearningRate 0.0013 Epoch: 18 Global Step: 194640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:07:03,874-Speed 5474.74 samples/sec Loss 1.0265 LearningRate 0.0013 Epoch: 18 Global Step: 194650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-09 15:07:11,387-Speed 5452.40 samples/sec Loss 1.0195 LearningRate 0.0013 Epoch: 18 Global Step: 194660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:07:18,951-Speed 5416.37 samples/sec Loss 1.0219 LearningRate 0.0012 Epoch: 18 Global Step: 194670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:07:26,399-Speed 5499.78 samples/sec Loss 1.0358 LearningRate 0.0012 Epoch: 18 Global Step: 194680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:07:33,775-Speed 5553.53 samples/sec Loss 1.0360 LearningRate 0.0012 Epoch: 18 Global Step: 194690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:07:41,185-Speed 5528.85 samples/sec Loss 1.0402 LearningRate 0.0012 Epoch: 18 Global Step: 194700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:07:48,588-Speed 5533.65 samples/sec Loss 1.0492 LearningRate 0.0012 Epoch: 18 Global Step: 194710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:07:56,086-Speed 5463.09 samples/sec Loss 1.0337 LearningRate 0.0012 Epoch: 18 Global Step: 194720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:03,668-Speed 5403.57 samples/sec Loss 1.0135 LearningRate 0.0012 Epoch: 18 Global Step: 194730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:11,143-Speed 5480.27 samples/sec Loss 1.0170 LearningRate 0.0012 Epoch: 18 Global Step: 194740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:18,573-Speed 5513.45 samples/sec Loss 1.0009 LearningRate 0.0012 Epoch: 18 Global Step: 194750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:26,062-Speed 5469.94 samples/sec Loss 1.0232 LearningRate 0.0012 Epoch: 18 Global Step: 194760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:08:33,506-Speed 5503.21 samples/sec Loss 1.0225 LearningRate 0.0012 Epoch: 18 Global Step: 194770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:40,957-Speed 5499.06 
samples/sec Loss 1.0146 LearningRate 0.0012 Epoch: 18 Global Step: 194780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:48,507-Speed 5425.38 samples/sec Loss 1.0285 LearningRate 0.0012 Epoch: 18 Global Step: 194790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:08:55,963-Speed 5494.25 samples/sec Loss 1.0346 LearningRate 0.0012 Epoch: 18 Global Step: 194800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:03,410-Speed 5501.63 samples/sec Loss 1.0190 LearningRate 0.0012 Epoch: 18 Global Step: 194810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:10,798-Speed 5545.09 samples/sec Loss 1.0324 LearningRate 0.0012 Epoch: 18 Global Step: 194820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:18,340-Speed 5431.14 samples/sec Loss 1.0290 LearningRate 0.0012 Epoch: 18 Global Step: 194830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:25,742-Speed 5534.30 samples/sec Loss 1.0288 LearningRate 0.0012 Epoch: 18 Global Step: 194840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:33,228-Speed 5472.46 samples/sec Loss 1.0268 LearningRate 0.0012 Epoch: 18 Global Step: 194850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:40,662-Speed 5510.72 samples/sec Loss 0.9927 LearningRate 0.0012 Epoch: 18 Global Step: 194860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:09:48,161-Speed 5463.08 samples/sec Loss 1.0092 LearningRate 0.0012 Epoch: 18 Global Step: 194870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:09:55,686-Speed 5443.78 samples/sec Loss 1.0469 LearningRate 0.0012 Epoch: 18 Global Step: 194880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:10:03,135-Speed 5499.27 samples/sec Loss 1.0184 LearningRate 0.0012 Epoch: 18 Global Step: 194890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:10:10,550-Speed 5524.65 samples/sec Loss 1.0305 LearningRate 0.0012 
Epoch: 18 Global Step: 194900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:10:18,062-Speed 5453.51 samples/sec Loss 1.0317 LearningRate 0.0012 Epoch: 18 Global Step: 194910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:10:25,596-Speed 5437.21 samples/sec Loss 1.0211 LearningRate 0.0012 Epoch: 18 Global Step: 194920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:10:33,047-Speed 5498.20 samples/sec Loss 1.0356 LearningRate 0.0012 Epoch: 18 Global Step: 194930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:10:40,443-Speed 5538.32 samples/sec Loss 1.0178 LearningRate 0.0012 Epoch: 18 Global Step: 194940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:10:48,060-Speed 5378.60 samples/sec Loss 1.0397 LearningRate 0.0012 Epoch: 18 Global Step: 194950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:10:55,563-Speed 5459.40 samples/sec Loss 1.0151 LearningRate 0.0012 Epoch: 18 Global Step: 194960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:11:03,095-Speed 5439.23 samples/sec Loss 1.0326 LearningRate 0.0012 Epoch: 18 Global Step: 194970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:11:10,528-Speed 5511.57 samples/sec Loss 1.0210 LearningRate 0.0012 Epoch: 18 Global Step: 194980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:11:17,977-Speed 5499.55 samples/sec Loss 0.9873 LearningRate 0.0012 Epoch: 18 Global Step: 194990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:11:25,410-Speed 5511.12 samples/sec Loss 1.0145 LearningRate 0.0012 Epoch: 18 Global Step: 195000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:11:32,859-Speed 5499.55 samples/sec Loss 1.0140 LearningRate 0.0012 Epoch: 18 Global Step: 195010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:11:40,345-Speed 5472.67 samples/sec Loss 1.0185 LearningRate 0.0012 Epoch: 18 Global Step: 195020 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-09 15:11:47,920-Speed 5407.54 samples/sec Loss 1.0167 LearningRate 0.0012 Epoch: 18 Global Step: 195030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:11:55,349-Speed 5514.60 samples/sec Loss 1.0132 LearningRate 0.0012 Epoch: 18 Global Step: 195040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:12:02,802-Speed 5496.42 samples/sec Loss 1.0107 LearningRate 0.0012 Epoch: 18 Global Step: 195050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:10,293-Speed 5468.39 samples/sec Loss 1.0203 LearningRate 0.0012 Epoch: 18 Global Step: 195060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:17,808-Speed 5451.37 samples/sec Loss 1.0026 LearningRate 0.0012 Epoch: 18 Global Step: 195070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:25,413-Speed 5386.13 samples/sec Loss 1.0124 LearningRate 0.0012 Epoch: 18 Global Step: 195080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:32,871-Speed 5493.57 samples/sec Loss 1.0198 LearningRate 0.0012 Epoch: 18 Global Step: 195090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:40,327-Speed 5494.31 samples/sec Loss 1.0075 LearningRate 0.0012 Epoch: 18 Global Step: 195100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:47,784-Speed 5493.07 samples/sec Loss 1.0433 LearningRate 0.0012 Epoch: 18 Global Step: 195110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:12:55,247-Speed 5489.72 samples/sec Loss 1.0187 LearningRate 0.0012 Epoch: 18 Global Step: 195120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:13:02,684-Speed 5507.88 samples/sec Loss 1.0131 LearningRate 0.0012 Epoch: 18 Global Step: 195130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:13:10,115-Speed 5513.15 samples/sec Loss 1.0327 LearningRate 0.0012 Epoch: 18 Global Step: 195140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2022-01-09 15:13:17,585-Speed 5483.52 samples/sec Loss 1.0322 LearningRate 0.0012 Epoch: 18 Global Step: 195150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:13:24,977-Speed 5542.07 samples/sec Loss 0.9920 LearningRate 0.0012 Epoch: 18 Global Step: 195160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:13:32,460-Speed 5475.07 samples/sec Loss 1.0194 LearningRate 0.0012 Epoch: 18 Global Step: 195170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:13:39,855-Speed 5539.15 samples/sec Loss 1.0211 LearningRate 0.0012 Epoch: 18 Global Step: 195180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:13:47,304-Speed 5499.47 samples/sec Loss 1.0046 LearningRate 0.0011 Epoch: 18 Global Step: 195190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:13:54,753-Speed 5499.51 samples/sec Loss 0.9980 LearningRate 0.0011 Epoch: 18 Global Step: 195200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:14:02,257-Speed 5459.76 samples/sec Loss 1.0141 LearningRate 0.0011 Epoch: 18 Global Step: 195210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:14:09,697-Speed 5505.65 samples/sec Loss 1.0187 LearningRate 0.0011 Epoch: 18 Global Step: 195220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:14:17,234-Speed 5435.05 samples/sec Loss 0.9992 LearningRate 0.0011 Epoch: 18 Global Step: 195230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:14:24,647-Speed 5526.17 samples/sec Loss 1.0368 LearningRate 0.0011 Epoch: 18 Global Step: 195240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:14:32,167-Speed 5448.20 samples/sec Loss 1.0324 LearningRate 0.0011 Epoch: 18 Global Step: 195250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:14:39,590-Speed 5518.72 samples/sec Loss 1.0077 LearningRate 0.0011 Epoch: 18 Global Step: 195260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:14:47,019-Speed 5513.88 
samples/sec Loss 1.0052 LearningRate 0.0011 Epoch: 18 Global Step: 195270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:14:54,472-Speed 5496.21 samples/sec Loss 1.0141 LearningRate 0.0011 Epoch: 18 Global Step: 195280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:15:01,943-Speed 5483.76 samples/sec Loss 1.0156 LearningRate 0.0011 Epoch: 18 Global Step: 195290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:15:09,396-Speed 5496.43 samples/sec Loss 1.0238 LearningRate 0.0011 Epoch: 18 Global Step: 195300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:15:16,811-Speed 5524.54 samples/sec Loss 1.0435 LearningRate 0.0011 Epoch: 18 Global Step: 195310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:15:24,215-Speed 5532.95 samples/sec Loss 1.0093 LearningRate 0.0011 Epoch: 18 Global Step: 195320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:15:31,768-Speed 5423.75 samples/sec Loss 1.0153 LearningRate 0.0011 Epoch: 18 Global Step: 195330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:15:39,170-Speed 5534.33 samples/sec Loss 1.0176 LearningRate 0.0011 Epoch: 18 Global Step: 195340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:15:46,657-Speed 5471.71 samples/sec Loss 1.0109 LearningRate 0.0011 Epoch: 18 Global Step: 195350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:15:54,099-Speed 5505.17 samples/sec Loss 1.0164 LearningRate 0.0011 Epoch: 18 Global Step: 195360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:16:01,528-Speed 5514.07 samples/sec Loss 1.0202 LearningRate 0.0011 Epoch: 18 Global Step: 195370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:16:08,993-Speed 5487.65 samples/sec Loss 1.0049 LearningRate 0.0011 Epoch: 18 Global Step: 195380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:16:16,407-Speed 5525.10 samples/sec Loss 1.0044 LearningRate 0.0011 
Epoch: 18 Global Step: 195390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:16:23,930-Speed 5445.27 samples/sec Loss 0.9997 LearningRate 0.0011 Epoch: 18 Global Step: 195400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:16:31,377-Speed 5501.00 samples/sec Loss 1.0204 LearningRate 0.0011 Epoch: 18 Global Step: 195410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:16:38,810-Speed 5511.39 samples/sec Loss 1.0228 LearningRate 0.0011 Epoch: 18 Global Step: 195420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:16:46,238-Speed 5514.84 samples/sec Loss 1.0051 LearningRate 0.0011 Epoch: 18 Global Step: 195430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:16:53,706-Speed 5485.59 samples/sec Loss 1.0164 LearningRate 0.0011 Epoch: 18 Global Step: 195440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:17:01,182-Speed 5479.79 samples/sec Loss 1.0044 LearningRate 0.0011 Epoch: 18 Global Step: 195450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:17:08,749-Speed 5414.16 samples/sec Loss 1.0079 LearningRate 0.0011 Epoch: 18 Global Step: 195460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:17:16,219-Speed 5483.22 samples/sec Loss 1.0323 LearningRate 0.0011 Epoch: 18 Global Step: 195470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:17:23,684-Speed 5488.18 samples/sec Loss 1.0132 LearningRate 0.0011 Epoch: 18 Global Step: 195480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:17:31,198-Speed 5451.44 samples/sec Loss 1.0166 LearningRate 0.0011 Epoch: 18 Global Step: 195490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:17:38,683-Speed 5473.34 samples/sec Loss 1.0128 LearningRate 0.0011 Epoch: 18 Global Step: 195500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:17:46,117-Speed 5510.83 samples/sec Loss 1.0001 LearningRate 0.0011 Epoch: 18 Global Step: 195510 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-09 15:17:53,592-Speed 5479.83 samples/sec Loss 0.9994 LearningRate 0.0011 Epoch: 18 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:18:01,053-Speed 5490.69 samples/sec Loss 1.0207 LearningRate 0.0011 Epoch: 18 Global Step: 195530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:18:08,491-Speed 5508.26 samples/sec Loss 1.0076 LearningRate 0.0011 Epoch: 18 Global Step: 195540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:18:15,877-Speed 5545.82 samples/sec Loss 1.0094 LearningRate 0.0011 Epoch: 18 Global Step: 195550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:18:23,277-Speed 5536.12 samples/sec Loss 1.0076 LearningRate 0.0011 Epoch: 18 Global Step: 195560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:18:30,691-Speed 5525.25 samples/sec Loss 0.9901 LearningRate 0.0011 Epoch: 18 Global Step: 195570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:18:38,061-Speed 5558.66 samples/sec Loss 1.0171 LearningRate 0.0011 Epoch: 18 Global Step: 195580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:18:45,567-Speed 5457.53 samples/sec Loss 0.9987 LearningRate 0.0011 Epoch: 18 Global Step: 195590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:18:52,943-Speed 5554.05 samples/sec Loss 1.0171 LearningRate 0.0011 Epoch: 18 Global Step: 195600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:00,395-Speed 5497.67 samples/sec Loss 1.0141 LearningRate 0.0011 Epoch: 18 Global Step: 195610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:07,802-Speed 5530.29 samples/sec Loss 1.0252 LearningRate 0.0011 Epoch: 18 Global Step: 195620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:15,210-Speed 5530.50 samples/sec Loss 1.0154 LearningRate 0.0011 Epoch: 18 Global Step: 195630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2022-01-09 15:19:22,602-Speed 5541.59 samples/sec Loss 1.0007 LearningRate 0.0011 Epoch: 18 Global Step: 195640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:30,070-Speed 5485.28 samples/sec Loss 0.9871 LearningRate 0.0011 Epoch: 18 Global Step: 195650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:37,568-Speed 5463.99 samples/sec Loss 1.0169 LearningRate 0.0011 Epoch: 18 Global Step: 195660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:45,144-Speed 5407.00 samples/sec Loss 1.0111 LearningRate 0.0011 Epoch: 18 Global Step: 195670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:19:52,594-Speed 5498.75 samples/sec Loss 0.9928 LearningRate 0.0011 Epoch: 18 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:00,049-Speed 5494.85 samples/sec Loss 1.0017 LearningRate 0.0011 Epoch: 18 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:07,507-Speed 5492.75 samples/sec Loss 1.0014 LearningRate 0.0011 Epoch: 18 Global Step: 195700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:14,982-Speed 5481.30 samples/sec Loss 1.0001 LearningRate 0.0011 Epoch: 18 Global Step: 195710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:22,588-Speed 5385.21 samples/sec Loss 0.9994 LearningRate 0.0011 Epoch: 18 Global Step: 195720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:30,087-Speed 5462.94 samples/sec Loss 0.9678 LearningRate 0.0010 Epoch: 18 Global Step: 195730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:37,647-Speed 5418.85 samples/sec Loss 1.0104 LearningRate 0.0010 Epoch: 18 Global Step: 195740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:45,135-Speed 5471.43 samples/sec Loss 1.0130 LearningRate 0.0010 Epoch: 18 Global Step: 195750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:20:52,611-Speed 5479.07 
samples/sec Loss 0.9912 LearningRate 0.0010 Epoch: 18 Global Step: 195760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:21:00,032-Speed 5520.10 samples/sec Loss 0.9873 LearningRate 0.0010 Epoch: 18 Global Step: 195770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:07,445-Speed 5526.51 samples/sec Loss 0.9988 LearningRate 0.0010 Epoch: 18 Global Step: 195780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:14,926-Speed 5476.27 samples/sec Loss 0.9860 LearningRate 0.0010 Epoch: 18 Global Step: 195790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:22,513-Speed 5399.08 samples/sec Loss 1.0110 LearningRate 0.0010 Epoch: 18 Global Step: 195800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:30,067-Speed 5423.32 samples/sec Loss 1.0100 LearningRate 0.0010 Epoch: 18 Global Step: 195810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:37,534-Speed 5485.93 samples/sec Loss 1.0342 LearningRate 0.0010 Epoch: 18 Global Step: 195820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:45,029-Speed 5465.73 samples/sec Loss 0.9837 LearningRate 0.0010 Epoch: 18 Global Step: 195830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:52,456-Speed 5516.26 samples/sec Loss 0.9903 LearningRate 0.0010 Epoch: 18 Global Step: 195840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:21:59,963-Speed 5457.01 samples/sec Loss 1.0018 LearningRate 0.0010 Epoch: 18 Global Step: 195850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:22:07,384-Speed 5520.09 samples/sec Loss 1.0075 LearningRate 0.0010 Epoch: 18 Global Step: 195860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:22:14,873-Speed 5469.99 samples/sec Loss 0.9968 LearningRate 0.0010 Epoch: 18 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:22:22,261-Speed 5545.04 samples/sec Loss 1.0137 LearningRate 0.0010 
Epoch: 18 Global Step: 195880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:22:29,714-Speed 5496.49 samples/sec Loss 0.9952 LearningRate 0.0010 Epoch: 18 Global Step: 195890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:22:37,138-Speed 5517.77 samples/sec Loss 0.9868 LearningRate 0.0010 Epoch: 18 Global Step: 195900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:22:44,575-Speed 5508.26 samples/sec Loss 0.9951 LearningRate 0.0010 Epoch: 18 Global Step: 195910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:22:52,022-Speed 5501.51 samples/sec Loss 0.9917 LearningRate 0.0010 Epoch: 18 Global Step: 195920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:22:59,468-Speed 5501.77 samples/sec Loss 0.9897 LearningRate 0.0010 Epoch: 18 Global Step: 195930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:23:08,055-Speed 4770.07 samples/sec Loss 0.9902 LearningRate 0.0010 Epoch: 18 Global Step: 195940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:23:15,524-Speed 5484.81 samples/sec Loss 1.0115 LearningRate 0.0010 Epoch: 18 Global Step: 195950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:23:22,996-Speed 5483.40 samples/sec Loss 0.9955 LearningRate 0.0010 Epoch: 18 Global Step: 195960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:23:30,670-Speed 5337.76 samples/sec Loss 0.9860 LearningRate 0.0010 Epoch: 18 Global Step: 195970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:23:38,128-Speed 5492.63 samples/sec Loss 1.0126 LearningRate 0.0010 Epoch: 18 Global Step: 195980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:23:45,555-Speed 5515.27 samples/sec Loss 0.9937 LearningRate 0.0010 Epoch: 18 Global Step: 195990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:23:52,955-Speed 5536.60 samples/sec Loss 1.0107 LearningRate 0.0010 Epoch: 18 Global Step: 196000 Fp16 Grad 
Scale: 16384 Required: 3 hours Training: 2022-01-09 15:24:37,124-[lfw][196000]XNorm: 22.506727 Training: 2022-01-09 15:24:37,125-[lfw][196000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 15:24:37,126-[lfw][196000]Accuracy-Highest: 0.99850 Training: 2022-01-09 15:25:28,575-[cfp_fp][196000]XNorm: 22.295594 Training: 2022-01-09 15:25:28,576-[cfp_fp][196000]Accuracy-Flip: 0.99386+-0.00367 Training: 2022-01-09 15:25:28,577-[cfp_fp][196000]Accuracy-Highest: 0.99443 Training: 2022-01-09 15:26:12,807-[agedb_30][196000]XNorm: 23.196709 Training: 2022-01-09 15:26:12,808-[agedb_30][196000]Accuracy-Flip: 0.98583+-0.00512 Training: 2022-01-09 15:26:12,809-[agedb_30][196000]Accuracy-Highest: 0.98617 Training: 2022-01-09 15:26:19,930-Speed 278.69 samples/sec Loss 1.0175 LearningRate 0.0010 Epoch: 18 Global Step: 196010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:26:26,913-Speed 5866.27 samples/sec Loss 0.9964 LearningRate 0.0010 Epoch: 18 Global Step: 196020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:26:34,220-Speed 5605.98 samples/sec Loss 1.0135 LearningRate 0.0010 Epoch: 18 Global Step: 196030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-09 15:26:41,773-Speed 5423.61 samples/sec Loss 0.9841 LearningRate 0.0010 Epoch: 18 Global Step: 196040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:26:49,381-Speed 5384.79 samples/sec Loss 1.0097 LearningRate 0.0010 Epoch: 18 Global Step: 196050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:26:56,857-Speed 5479.84 samples/sec Loss 0.9840 LearningRate 0.0010 Epoch: 18 Global Step: 196060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:04,265-Speed 5529.70 samples/sec Loss 0.9810 LearningRate 0.0010 Epoch: 18 Global Step: 196070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:11,751-Speed 5471.91 samples/sec Loss 1.0062 LearningRate 0.0010 Epoch: 18 Global Step: 196080 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-09 15:27:19,162-Speed 5528.17 samples/sec Loss 0.9806 LearningRate 0.0010 Epoch: 18 Global Step: 196090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:26,615-Speed 5496.83 samples/sec Loss 0.9813 LearningRate 0.0010 Epoch: 18 Global Step: 196100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:34,132-Speed 5449.78 samples/sec Loss 1.0189 LearningRate 0.0010 Epoch: 18 Global Step: 196110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:41,743-Speed 5382.26 samples/sec Loss 0.9957 LearningRate 0.0010 Epoch: 18 Global Step: 196120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:49,289-Speed 5428.47 samples/sec Loss 0.9853 LearningRate 0.0010 Epoch: 18 Global Step: 196130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:27:56,732-Speed 5504.21 samples/sec Loss 1.0055 LearningRate 0.0010 Epoch: 18 Global Step: 196140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:28:04,196-Speed 5488.85 samples/sec Loss 0.9870 LearningRate 0.0010 Epoch: 18 Global Step: 196150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:28:11,723-Speed 5442.22 samples/sec Loss 0.9943 LearningRate 0.0010 Epoch: 18 Global Step: 196160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:28:19,106-Speed 5548.55 samples/sec Loss 1.0017 LearningRate 0.0010 Epoch: 18 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:28:26,460-Speed 5570.60 samples/sec Loss 1.0034 LearningRate 0.0010 Epoch: 18 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:28:33,870-Speed 5528.91 samples/sec Loss 0.9970 LearningRate 0.0010 Epoch: 18 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:28:41,387-Speed 5448.72 samples/sec Loss 0.9894 LearningRate 0.0010 Epoch: 18 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 
15:28:48,802-Speed 5524.92 samples/sec Loss 1.0037 LearningRate 0.0010 Epoch: 18 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:28:56,233-Speed 5513.25 samples/sec Loss 0.9923 LearningRate 0.0010 Epoch: 18 Global Step: 196220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:29:03,768-Speed 5436.72 samples/sec Loss 0.9998 LearningRate 0.0010 Epoch: 18 Global Step: 196230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:29:11,206-Speed 5507.40 samples/sec Loss 1.0223 LearningRate 0.0010 Epoch: 18 Global Step: 196240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:29:18,831-Speed 5372.98 samples/sec Loss 0.9813 LearningRate 0.0010 Epoch: 18 Global Step: 196250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:29:26,266-Speed 5509.78 samples/sec Loss 1.0024 LearningRate 0.0010 Epoch: 18 Global Step: 196260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:29:33,786-Speed 5447.91 samples/sec Loss 0.9932 LearningRate 0.0010 Epoch: 18 Global Step: 196270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:29:41,375-Speed 5397.76 samples/sec Loss 0.9904 LearningRate 0.0010 Epoch: 18 Global Step: 196280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:29:48,873-Speed 5463.33 samples/sec Loss 0.9900 LearningRate 0.0010 Epoch: 18 Global Step: 196290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:29:56,494-Speed 5375.72 samples/sec Loss 0.9719 LearningRate 0.0009 Epoch: 18 Global Step: 196300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:03,987-Speed 5466.91 samples/sec Loss 0.9918 LearningRate 0.0009 Epoch: 18 Global Step: 196310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:11,391-Speed 5533.14 samples/sec Loss 1.0146 LearningRate 0.0009 Epoch: 18 Global Step: 196320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:18,843-Speed 5496.90 samples/sec Loss 
0.9962 LearningRate 0.0009 Epoch: 18 Global Step: 196330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:26,353-Speed 5455.07 samples/sec Loss 1.0042 LearningRate 0.0009 Epoch: 18 Global Step: 196340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:33,822-Speed 5484.55 samples/sec Loss 0.9949 LearningRate 0.0009 Epoch: 18 Global Step: 196350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:41,236-Speed 5525.11 samples/sec Loss 0.9834 LearningRate 0.0009 Epoch: 18 Global Step: 196360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:48,692-Speed 5494.37 samples/sec Loss 0.9729 LearningRate 0.0009 Epoch: 18 Global Step: 196370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:30:56,245-Speed 5424.19 samples/sec Loss 0.9977 LearningRate 0.0009 Epoch: 18 Global Step: 196380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:03,782-Speed 5435.39 samples/sec Loss 0.9907 LearningRate 0.0009 Epoch: 18 Global Step: 196390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-09 15:31:11,304-Speed 5446.17 samples/sec Loss 0.9884 LearningRate 0.0009 Epoch: 18 Global Step: 196400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:18,928-Speed 5372.59 samples/sec Loss 0.9760 LearningRate 0.0009 Epoch: 18 Global Step: 196410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:26,431-Speed 5460.44 samples/sec Loss 0.9890 LearningRate 0.0009 Epoch: 18 Global Step: 196420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:34,016-Speed 5401.22 samples/sec Loss 0.9934 LearningRate 0.0009 Epoch: 18 Global Step: 196430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:41,542-Speed 5442.98 samples/sec Loss 0.9743 LearningRate 0.0009 Epoch: 18 Global Step: 196440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:49,073-Speed 5439.02 samples/sec Loss 0.9999 LearningRate 0.0009 Epoch: 18 Global 
Step: 196450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:31:56,545-Speed 5482.80 samples/sec Loss 1.0019 LearningRate 0.0009 Epoch: 18 Global Step: 196460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-09 15:32:03,985-Speed 5511.15 samples/sec Loss 1.0004 LearningRate 0.0009 Epoch: 18 Global Step: 196470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:32:11,499-Speed 5451.83 samples/sec Loss 0.9831 LearningRate 0.0009 Epoch: 18 Global Step: 196480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:32:19,025-Speed 5443.04 samples/sec Loss 0.9863 LearningRate 0.0009 Epoch: 18 Global Step: 196490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:32:26,527-Speed 5460.51 samples/sec Loss 0.9987 LearningRate 0.0009 Epoch: 18 Global Step: 196500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:32:33,956-Speed 5514.99 samples/sec Loss 0.9995 LearningRate 0.0009 Epoch: 18 Global Step: 196510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:32:41,404-Speed 5500.11 samples/sec Loss 0.9687 LearningRate 0.0009 Epoch: 18 Global Step: 196520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:32:48,843-Speed 5506.31 samples/sec Loss 0.9836 LearningRate 0.0009 Epoch: 18 Global Step: 196530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:32:56,260-Speed 5523.24 samples/sec Loss 0.9865 LearningRate 0.0009 Epoch: 18 Global Step: 196540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:03,767-Speed 5457.90 samples/sec Loss 1.0062 LearningRate 0.0009 Epoch: 18 Global Step: 196550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:11,266-Speed 5462.72 samples/sec Loss 0.9623 LearningRate 0.0009 Epoch: 18 Global Step: 196560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:18,681-Speed 5524.53 samples/sec Loss 1.0081 LearningRate 0.0009 Epoch: 18 Global Step: 196570 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-09 15:33:26,049-Speed 5560.07 samples/sec Loss 0.9853 LearningRate 0.0009 Epoch: 18 Global Step: 196580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:33,442-Speed 5541.07 samples/sec Loss 0.9874 LearningRate 0.0009 Epoch: 18 Global Step: 196590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:40,830-Speed 5544.98 samples/sec Loss 0.9958 LearningRate 0.0009 Epoch: 18 Global Step: 196600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:48,254-Speed 5518.24 samples/sec Loss 0.9752 LearningRate 0.0009 Epoch: 18 Global Step: 196610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:33:55,666-Speed 5526.23 samples/sec Loss 0.9950 LearningRate 0.0009 Epoch: 18 Global Step: 196620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:03,118-Speed 5497.72 samples/sec Loss 0.9834 LearningRate 0.0009 Epoch: 18 Global Step: 196630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:10,668-Speed 5425.83 samples/sec Loss 0.9561 LearningRate 0.0009 Epoch: 18 Global Step: 196640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:18,082-Speed 5525.21 samples/sec Loss 1.0114 LearningRate 0.0009 Epoch: 18 Global Step: 196650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:25,514-Speed 5512.08 samples/sec Loss 0.9883 LearningRate 0.0009 Epoch: 18 Global Step: 196660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:32,955-Speed 5505.62 samples/sec Loss 1.0085 LearningRate 0.0009 Epoch: 18 Global Step: 196670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:40,362-Speed 5530.77 samples/sec Loss 0.9939 LearningRate 0.0009 Epoch: 18 Global Step: 196680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:34:47,838-Speed 5479.59 samples/sec Loss 0.9620 LearningRate 0.0009 Epoch: 18 Global Step: 196690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
15:34:55,318-Speed 5476.55 samples/sec Loss 0.9985 LearningRate 0.0009 Epoch: 18 Global Step: 196700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:35:02,874-Speed 5421.82 samples/sec Loss 0.9723 LearningRate 0.0009 Epoch: 18 Global Step: 196710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:35:10,385-Speed 5454.26 samples/sec Loss 0.9967 LearningRate 0.0009 Epoch: 18 Global Step: 196720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:35:17,832-Speed 5501.28 samples/sec Loss 0.9785 LearningRate 0.0009 Epoch: 18 Global Step: 196730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:35:25,253-Speed 5519.27 samples/sec Loss 0.9760 LearningRate 0.0009 Epoch: 18 Global Step: 196740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:35:32,736-Speed 5474.73 samples/sec Loss 0.9916 LearningRate 0.0009 Epoch: 18 Global Step: 196750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:35:40,246-Speed 5455.47 samples/sec Loss 1.0037 LearningRate 0.0009 Epoch: 18 Global Step: 196760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:35:47,707-Speed 5490.54 samples/sec Loss 0.9782 LearningRate 0.0009 Epoch: 18 Global Step: 196770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:35:55,166-Speed 5491.89 samples/sec Loss 0.9714 LearningRate 0.0009 Epoch: 18 Global Step: 196780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:02,716-Speed 5425.95 samples/sec Loss 0.9806 LearningRate 0.0009 Epoch: 18 Global Step: 196790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:10,164-Speed 5500.15 samples/sec Loss 0.9943 LearningRate 0.0009 Epoch: 18 Global Step: 196800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:17,693-Speed 5440.92 samples/sec Loss 0.9898 LearningRate 0.0009 Epoch: 18 Global Step: 196810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:25,088-Speed 5539.78 samples/sec Loss 
0.9749 LearningRate 0.0009 Epoch: 18 Global Step: 196820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:32,538-Speed 5498.89 samples/sec Loss 0.9780 LearningRate 0.0009 Epoch: 18 Global Step: 196830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:40,014-Speed 5479.62 samples/sec Loss 0.9859 LearningRate 0.0009 Epoch: 18 Global Step: 196840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:47,406-Speed 5541.87 samples/sec Loss 0.9910 LearningRate 0.0009 Epoch: 18 Global Step: 196850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:36:54,920-Speed 5451.61 samples/sec Loss 0.9848 LearningRate 0.0009 Epoch: 18 Global Step: 196860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:02,315-Speed 5539.12 samples/sec Loss 0.9806 LearningRate 0.0009 Epoch: 18 Global Step: 196870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:09,718-Speed 5534.37 samples/sec Loss 0.9830 LearningRate 0.0009 Epoch: 18 Global Step: 196880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:17,184-Speed 5486.54 samples/sec Loss 0.9764 LearningRate 0.0009 Epoch: 18 Global Step: 196890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:24,708-Speed 5445.15 samples/sec Loss 0.9880 LearningRate 0.0008 Epoch: 18 Global Step: 196900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:32,154-Speed 5501.01 samples/sec Loss 0.9842 LearningRate 0.0008 Epoch: 18 Global Step: 196910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:39,621-Speed 5486.53 samples/sec Loss 1.0009 LearningRate 0.0008 Epoch: 18 Global Step: 196920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:47,106-Speed 5473.17 samples/sec Loss 0.9975 LearningRate 0.0008 Epoch: 18 Global Step: 196930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:37:54,554-Speed 5500.63 samples/sec Loss 0.9600 LearningRate 0.0008 Epoch: 18 Global 
Step: 196940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:02,009-Speed 5494.47 samples/sec Loss 0.9813 LearningRate 0.0008 Epoch: 18 Global Step: 196950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:09,488-Speed 5477.33 samples/sec Loss 0.9817 LearningRate 0.0008 Epoch: 18 Global Step: 196960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:16,931-Speed 5504.06 samples/sec Loss 0.9773 LearningRate 0.0008 Epoch: 18 Global Step: 196970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:24,359-Speed 5514.98 samples/sec Loss 0.9963 LearningRate 0.0008 Epoch: 18 Global Step: 196980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:31,872-Speed 5452.40 samples/sec Loss 0.9738 LearningRate 0.0008 Epoch: 18 Global Step: 196990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:39,312-Speed 5506.68 samples/sec Loss 0.9898 LearningRate 0.0008 Epoch: 18 Global Step: 197000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:46,759-Speed 5500.73 samples/sec Loss 0.9537 LearningRate 0.0008 Epoch: 18 Global Step: 197010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:38:54,238-Speed 5477.43 samples/sec Loss 0.9557 LearningRate 0.0008 Epoch: 18 Global Step: 197020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:39:16,683-Speed 1825.00 samples/sec Loss 0.9695 LearningRate 0.0008 Epoch: 19 Global Step: 197030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:39:24,182-Speed 5463.13 samples/sec Loss 0.9923 LearningRate 0.0008 Epoch: 19 Global Step: 197040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:39:31,618-Speed 5508.88 samples/sec Loss 0.9889 LearningRate 0.0008 Epoch: 19 Global Step: 197050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:39:39,076-Speed 5492.82 samples/sec Loss 0.9573 LearningRate 0.0008 Epoch: 19 Global Step: 197060 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-09 15:39:46,568-Speed 5467.78 samples/sec Loss 0.9978 LearningRate 0.0008 Epoch: 19 Global Step: 197070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:39:53,972-Speed 5533.52 samples/sec Loss 0.9845 LearningRate 0.0008 Epoch: 19 Global Step: 197080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:01,472-Speed 5461.81 samples/sec Loss 0.9754 LearningRate 0.0008 Epoch: 19 Global Step: 197090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:08,927-Speed 5495.03 samples/sec Loss 0.9807 LearningRate 0.0008 Epoch: 19 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:16,398-Speed 5483.29 samples/sec Loss 1.0016 LearningRate 0.0008 Epoch: 19 Global Step: 197110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:23,882-Speed 5474.42 samples/sec Loss 0.9751 LearningRate 0.0008 Epoch: 19 Global Step: 197120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:31,360-Speed 5477.83 samples/sec Loss 0.9707 LearningRate 0.0008 Epoch: 19 Global Step: 197130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:38,904-Speed 5430.35 samples/sec Loss 0.9713 LearningRate 0.0008 Epoch: 19 Global Step: 197140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:40:46,322-Speed 5522.72 samples/sec Loss 0.9580 LearningRate 0.0008 Epoch: 19 Global Step: 197150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:40:53,964-Speed 5359.99 samples/sec Loss 0.9936 LearningRate 0.0008 Epoch: 19 Global Step: 197160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:01,420-Speed 5494.68 samples/sec Loss 0.9643 LearningRate 0.0008 Epoch: 19 Global Step: 197170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:08,983-Speed 5416.25 samples/sec Loss 0.9996 LearningRate 0.0008 Epoch: 19 Global Step: 197180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 
15:41:16,396-Speed 5526.33 samples/sec Loss 0.9657 LearningRate 0.0008 Epoch: 19 Global Step: 197190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:23,848-Speed 5497.43 samples/sec Loss 0.9761 LearningRate 0.0008 Epoch: 19 Global Step: 197200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:31,251-Speed 5533.44 samples/sec Loss 0.9394 LearningRate 0.0008 Epoch: 19 Global Step: 197210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:38,668-Speed 5523.55 samples/sec Loss 1.0043 LearningRate 0.0008 Epoch: 19 Global Step: 197220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:46,042-Speed 5555.16 samples/sec Loss 0.9745 LearningRate 0.0008 Epoch: 19 Global Step: 197230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:41:53,677-Speed 5365.91 samples/sec Loss 0.9705 LearningRate 0.0008 Epoch: 19 Global Step: 197240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:42:01,310-Speed 5366.91 samples/sec Loss 0.9705 LearningRate 0.0008 Epoch: 19 Global Step: 197250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:08,983-Speed 5339.28 samples/sec Loss 0.9638 LearningRate 0.0008 Epoch: 19 Global Step: 197260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:16,676-Speed 5324.83 samples/sec Loss 0.9747 LearningRate 0.0008 Epoch: 19 Global Step: 197270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:24,222-Speed 5428.63 samples/sec Loss 0.9654 LearningRate 0.0008 Epoch: 19 Global Step: 197280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:31,632-Speed 5528.74 samples/sec Loss 0.9667 LearningRate 0.0008 Epoch: 19 Global Step: 197290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:39,107-Speed 5480.14 samples/sec Loss 0.9580 LearningRate 0.0008 Epoch: 19 Global Step: 197300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:46,567-Speed 5491.47 samples/sec Loss 
0.9764 LearningRate 0.0008 Epoch: 19 Global Step: 197310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:42:54,029-Speed 5489.95 samples/sec Loss 0.9554 LearningRate 0.0008 Epoch: 19 Global Step: 197320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:01,643-Speed 5380.39 samples/sec Loss 0.9673 LearningRate 0.0008 Epoch: 19 Global Step: 197330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:09,065-Speed 5519.20 samples/sec Loss 0.9633 LearningRate 0.0008 Epoch: 19 Global Step: 197340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:16,545-Speed 5477.38 samples/sec Loss 0.9919 LearningRate 0.0008 Epoch: 19 Global Step: 197350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:43:23,958-Speed 5526.15 samples/sec Loss 0.9810 LearningRate 0.0008 Epoch: 19 Global Step: 197360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:31,407-Speed 5498.96 samples/sec Loss 0.9352 LearningRate 0.0008 Epoch: 19 Global Step: 197370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:38,792-Speed 5547.54 samples/sec Loss 0.9895 LearningRate 0.0008 Epoch: 19 Global Step: 197380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:46,275-Speed 5474.63 samples/sec Loss 0.9652 LearningRate 0.0008 Epoch: 19 Global Step: 197390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:43:53,708-Speed 5510.77 samples/sec Loss 0.9716 LearningRate 0.0008 Epoch: 19 Global Step: 197400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:44:01,197-Speed 5470.99 samples/sec Loss 0.9671 LearningRate 0.0008 Epoch: 19 Global Step: 197410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:44:08,664-Speed 5486.47 samples/sec Loss 0.9487 LearningRate 0.0008 Epoch: 19 Global Step: 197420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:44:16,132-Speed 5484.99 samples/sec Loss 0.9569 LearningRate 0.0008 Epoch: 19 Global 
Step: 197430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:44:23,606-Speed 5480.79 samples/sec Loss 0.9698 LearningRate 0.0008 Epoch: 19 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:44:31,067-Speed 5491.40 samples/sec Loss 0.9555 LearningRate 0.0008 Epoch: 19 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:44:38,486-Speed 5520.91 samples/sec Loss 0.9653 LearningRate 0.0008 Epoch: 19 Global Step: 197460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:44:46,054-Speed 5413.42 samples/sec Loss 0.9886 LearningRate 0.0008 Epoch: 19 Global Step: 197470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:44:53,492-Speed 5507.41 samples/sec Loss 0.9800 LearningRate 0.0008 Epoch: 19 Global Step: 197480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:45:00,912-Speed 5520.83 samples/sec Loss 0.9683 LearningRate 0.0008 Epoch: 19 Global Step: 197490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:45:08,295-Speed 5548.98 samples/sec Loss 0.9709 LearningRate 0.0008 Epoch: 19 Global Step: 197500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:45:15,780-Speed 5473.49 samples/sec Loss 0.9684 LearningRate 0.0008 Epoch: 19 Global Step: 197510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:45:23,148-Speed 5559.16 samples/sec Loss 0.9778 LearningRate 0.0008 Epoch: 19 Global Step: 197520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:45:30,716-Speed 5413.52 samples/sec Loss 0.9791 LearningRate 0.0007 Epoch: 19 Global Step: 197530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:45:38,219-Speed 5459.95 samples/sec Loss 0.9647 LearningRate 0.0007 Epoch: 19 Global Step: 197540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:45:45,693-Speed 5480.55 samples/sec Loss 0.9676 LearningRate 0.0007 Epoch: 19 Global Step: 197550 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 15:45:53,118-Speed 5517.46 samples/sec Loss 0.9739 LearningRate 0.0007 Epoch: 19 Global Step: 197560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:46:00,528-Speed 5528.82 samples/sec Loss 0.9742 LearningRate 0.0007 Epoch: 19 Global Step: 197570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:46:08,074-Speed 5428.54 samples/sec Loss 0.9674 LearningRate 0.0007 Epoch: 19 Global Step: 197580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:46:15,560-Speed 5472.23 samples/sec Loss 0.9472 LearningRate 0.0007 Epoch: 19 Global Step: 197590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:46:23,042-Speed 5475.14 samples/sec Loss 0.9482 LearningRate 0.0007 Epoch: 19 Global Step: 197600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:46:30,549-Speed 5456.64 samples/sec Loss 0.9743 LearningRate 0.0007 Epoch: 19 Global Step: 197610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:46:38,035-Speed 5472.48 samples/sec Loss 0.9309 LearningRate 0.0007 Epoch: 19 Global Step: 197620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:46:45,557-Speed 5445.91 samples/sec Loss 0.9539 LearningRate 0.0007 Epoch: 19 Global Step: 197630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:46:53,035-Speed 5478.70 samples/sec Loss 0.9541 LearningRate 0.0007 Epoch: 19 Global Step: 197640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:47:00,510-Speed 5479.91 samples/sec Loss 0.9577 LearningRate 0.0007 Epoch: 19 Global Step: 197650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:47:08,040-Speed 5440.39 samples/sec Loss 0.9623 LearningRate 0.0007 Epoch: 19 Global Step: 197660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:47:15,578-Speed 5434.28 samples/sec Loss 0.9598 LearningRate 0.0007 Epoch: 19 Global Step: 197670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 
15:47:23,070-Speed 5468.15 samples/sec Loss 0.9726 LearningRate 0.0007 Epoch: 19 Global Step: 197680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:47:30,542-Speed 5482.28 samples/sec Loss 0.9794 LearningRate 0.0007 Epoch: 19 Global Step: 197690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:47:37,980-Speed 5507.47 samples/sec Loss 0.9656 LearningRate 0.0007 Epoch: 19 Global Step: 197700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:47:45,500-Speed 5448.15 samples/sec Loss 0.9677 LearningRate 0.0007 Epoch: 19 Global Step: 197710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:47:53,004-Speed 5459.01 samples/sec Loss 0.9684 LearningRate 0.0007 Epoch: 19 Global Step: 197720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:48:00,655-Speed 5353.69 samples/sec Loss 0.9722 LearningRate 0.0007 Epoch: 19 Global Step: 197730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:48:08,246-Speed 5396.35 samples/sec Loss 0.9567 LearningRate 0.0007 Epoch: 19 Global Step: 197740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:48:15,755-Speed 5456.57 samples/sec Loss 0.9619 LearningRate 0.0007 Epoch: 19 Global Step: 197750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:48:23,210-Speed 5494.52 samples/sec Loss 0.9811 LearningRate 0.0007 Epoch: 19 Global Step: 197760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:48:30,685-Speed 5479.97 samples/sec Loss 0.9996 LearningRate 0.0007 Epoch: 19 Global Step: 197770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:48:38,118-Speed 5511.09 samples/sec Loss 0.9668 LearningRate 0.0007 Epoch: 19 Global Step: 197780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:48:45,507-Speed 5544.90 samples/sec Loss 0.9416 LearningRate 0.0007 Epoch: 19 Global Step: 197790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:48:52,922-Speed 5524.59 samples/sec Loss 
0.9631 LearningRate 0.0007 Epoch: 19 Global Step: 197800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:00,431-Speed 5455.21 samples/sec Loss 0.9752 LearningRate 0.0007 Epoch: 19 Global Step: 197810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:07,906-Speed 5480.39 samples/sec Loss 0.9392 LearningRate 0.0007 Epoch: 19 Global Step: 197820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:15,518-Speed 5381.89 samples/sec Loss 0.9555 LearningRate 0.0007 Epoch: 19 Global Step: 197830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:23,092-Speed 5408.88 samples/sec Loss 0.9661 LearningRate 0.0007 Epoch: 19 Global Step: 197840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:30,520-Speed 5514.66 samples/sec Loss 0.9665 LearningRate 0.0007 Epoch: 19 Global Step: 197850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:38,009-Speed 5469.54 samples/sec Loss 0.9709 LearningRate 0.0007 Epoch: 19 Global Step: 197860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:49:45,427-Speed 5522.94 samples/sec Loss 0.9561 LearningRate 0.0007 Epoch: 19 Global Step: 197870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:49:52,969-Speed 5432.11 samples/sec Loss 0.9724 LearningRate 0.0007 Epoch: 19 Global Step: 197880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:50:00,435-Speed 5486.46 samples/sec Loss 0.9732 LearningRate 0.0007 Epoch: 19 Global Step: 197890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:50:07,944-Speed 5455.28 samples/sec Loss 0.9645 LearningRate 0.0007 Epoch: 19 Global Step: 197900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:50:15,435-Speed 5468.90 samples/sec Loss 0.9658 LearningRate 0.0007 Epoch: 19 Global Step: 197910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:50:22,874-Speed 5507.33 samples/sec Loss 0.9542 LearningRate 0.0007 Epoch: 19 Global 
Step: 197920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:50:30,318-Speed 5502.74 samples/sec Loss 0.9597 LearningRate 0.0007 Epoch: 19 Global Step: 197930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:50:37,788-Speed 5483.80 samples/sec Loss 0.9711 LearningRate 0.0007 Epoch: 19 Global Step: 197940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:50:45,250-Speed 5489.96 samples/sec Loss 0.9682 LearningRate 0.0007 Epoch: 19 Global Step: 197950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:50:52,671-Speed 5520.10 samples/sec Loss 0.9617 LearningRate 0.0007 Epoch: 19 Global Step: 197960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:51:00,148-Speed 5479.17 samples/sec Loss 0.9614 LearningRate 0.0007 Epoch: 19 Global Step: 197970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:51:07,683-Speed 5436.10 samples/sec Loss 0.9641 LearningRate 0.0007 Epoch: 19 Global Step: 197980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:51:15,174-Speed 5469.22 samples/sec Loss 0.9746 LearningRate 0.0007 Epoch: 19 Global Step: 197990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:51:22,593-Speed 5521.80 samples/sec Loss 0.9513 LearningRate 0.0007 Epoch: 19 Global Step: 198000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:52:20,582-[lfw][198000]XNorm: 22.175054 Training: 2022-01-09 15:52:20,583-[lfw][198000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 15:52:20,583-[lfw][198000]Accuracy-Highest: 0.99850 Training: 2022-01-09 15:53:28,293-[cfp_fp][198000]XNorm: 22.006353 Training: 2022-01-09 15:53:28,293-[cfp_fp][198000]Accuracy-Flip: 0.99386+-0.00367 Training: 2022-01-09 15:53:28,294-[cfp_fp][198000]Accuracy-Highest: 0.99443 Training: 2022-01-09 15:54:26,509-[agedb_30][198000]XNorm: 22.937108 Training: 2022-01-09 15:54:26,510-[agedb_30][198000]Accuracy-Flip: 0.98650+-0.00555 Training: 2022-01-09 
15:54:26,511-[agedb_30][198000]Accuracy-Highest: 0.98650 Training: 2022-01-09 15:54:34,193-Speed 213.78 samples/sec Loss 0.9591 LearningRate 0.0007 Epoch: 19 Global Step: 198010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:54:41,800-Speed 5384.62 samples/sec Loss 0.9621 LearningRate 0.0007 Epoch: 19 Global Step: 198020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:54:49,188-Speed 5545.01 samples/sec Loss 0.9599 LearningRate 0.0007 Epoch: 19 Global Step: 198030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:54:56,593-Speed 5532.02 samples/sec Loss 0.9497 LearningRate 0.0007 Epoch: 19 Global Step: 198040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:04,144-Speed 5425.68 samples/sec Loss 0.9593 LearningRate 0.0007 Epoch: 19 Global Step: 198050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:11,696-Speed 5424.32 samples/sec Loss 0.9570 LearningRate 0.0007 Epoch: 19 Global Step: 198060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:19,268-Speed 5410.56 samples/sec Loss 0.9467 LearningRate 0.0007 Epoch: 19 Global Step: 198070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:26,757-Speed 5469.32 samples/sec Loss 0.9465 LearningRate 0.0007 Epoch: 19 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:34,255-Speed 5464.33 samples/sec Loss 0.9447 LearningRate 0.0007 Epoch: 19 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:41,872-Speed 5378.18 samples/sec Loss 0.9395 LearningRate 0.0007 Epoch: 19 Global Step: 198100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:49,357-Speed 5473.02 samples/sec Loss 0.9725 LearningRate 0.0007 Epoch: 19 Global Step: 198110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:55:56,786-Speed 5514.02 samples/sec Loss 0.9695 LearningRate 0.0007 Epoch: 19 Global Step: 198120 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-09 15:56:04,304-Speed 5448.88 samples/sec Loss 0.9644 LearningRate 0.0007 Epoch: 19 Global Step: 198130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:56:11,864-Speed 5418.77 samples/sec Loss 0.9496 LearningRate 0.0007 Epoch: 19 Global Step: 198140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:56:19,330-Speed 5487.21 samples/sec Loss 0.9550 LearningRate 0.0007 Epoch: 19 Global Step: 198150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 15:56:26,826-Speed 5464.47 samples/sec Loss 0.9714 LearningRate 0.0007 Epoch: 19 Global Step: 198160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:56:34,439-Speed 5380.96 samples/sec Loss 0.9451 LearningRate 0.0007 Epoch: 19 Global Step: 198170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:56:41,959-Speed 5448.25 samples/sec Loss 0.9522 LearningRate 0.0007 Epoch: 19 Global Step: 198180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:56:49,438-Speed 5477.62 samples/sec Loss 0.9663 LearningRate 0.0007 Epoch: 19 Global Step: 198190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:56:56,900-Speed 5488.93 samples/sec Loss 0.9458 LearningRate 0.0007 Epoch: 19 Global Step: 198200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:57:04,432-Speed 5439.75 samples/sec Loss 0.9670 LearningRate 0.0006 Epoch: 19 Global Step: 198210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:57:11,969-Speed 5434.80 samples/sec Loss 0.9571 LearningRate 0.0006 Epoch: 19 Global Step: 198220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:57:19,533-Speed 5416.42 samples/sec Loss 0.9462 LearningRate 0.0006 Epoch: 19 Global Step: 198230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:57:27,094-Speed 5417.55 samples/sec Loss 0.9332 LearningRate 0.0006 Epoch: 19 Global Step: 198240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 
15:57:34,689-Speed 5394.11 samples/sec Loss 0.9673 LearningRate 0.0006 Epoch: 19 Global Step: 198250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:57:42,143-Speed 5495.80 samples/sec Loss 0.9581 LearningRate 0.0006 Epoch: 19 Global Step: 198260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:57:49,613-Speed 5483.54 samples/sec Loss 0.9491 LearningRate 0.0006 Epoch: 19 Global Step: 198270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:57:57,140-Speed 5442.61 samples/sec Loss 0.9524 LearningRate 0.0006 Epoch: 19 Global Step: 198280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:58:04,573-Speed 5511.07 samples/sec Loss 0.9625 LearningRate 0.0006 Epoch: 19 Global Step: 198290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:58:12,006-Speed 5511.21 samples/sec Loss 0.9505 LearningRate 0.0006 Epoch: 19 Global Step: 198300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:58:19,534-Speed 5442.11 samples/sec Loss 0.9303 LearningRate 0.0006 Epoch: 19 Global Step: 198310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:58:26,981-Speed 5500.81 samples/sec Loss 0.9641 LearningRate 0.0006 Epoch: 19 Global Step: 198320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:58:34,489-Speed 5456.27 samples/sec Loss 0.9526 LearningRate 0.0006 Epoch: 19 Global Step: 198330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:58:41,975-Speed 5472.78 samples/sec Loss 0.9580 LearningRate 0.0006 Epoch: 19 Global Step: 198340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:58:49,478-Speed 5459.47 samples/sec Loss 0.9472 LearningRate 0.0006 Epoch: 19 Global Step: 198350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 15:58:56,968-Speed 5469.31 samples/sec Loss 0.9533 LearningRate 0.0006 Epoch: 19 Global Step: 198360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:04,553-Speed 5401.32 samples/sec Loss 
0.9707 LearningRate 0.0006 Epoch: 19 Global Step: 198370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:12,181-Speed 5370.07 samples/sec Loss 0.9370 LearningRate 0.0006 Epoch: 19 Global Step: 198380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:19,692-Speed 5454.19 samples/sec Loss 0.9468 LearningRate 0.0006 Epoch: 19 Global Step: 198390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:27,129-Speed 5508.68 samples/sec Loss 0.9596 LearningRate 0.0006 Epoch: 19 Global Step: 198400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:34,564-Speed 5509.42 samples/sec Loss 0.9663 LearningRate 0.0006 Epoch: 19 Global Step: 198410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:42,003-Speed 5507.14 samples/sec Loss 0.9431 LearningRate 0.0006 Epoch: 19 Global Step: 198420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:49,474-Speed 5483.56 samples/sec Loss 0.9543 LearningRate 0.0006 Epoch: 19 Global Step: 198430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 15:59:56,861-Speed 5545.97 samples/sec Loss 0.9705 LearningRate 0.0006 Epoch: 19 Global Step: 198440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:00:04,414-Speed 5423.21 samples/sec Loss 0.9503 LearningRate 0.0006 Epoch: 19 Global Step: 198450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:00:11,923-Speed 5455.67 samples/sec Loss 0.9298 LearningRate 0.0006 Epoch: 19 Global Step: 198460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:00:19,400-Speed 5478.72 samples/sec Loss 0.9294 LearningRate 0.0006 Epoch: 19 Global Step: 198470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:00:26,828-Speed 5515.22 samples/sec Loss 0.9295 LearningRate 0.0006 Epoch: 19 Global Step: 198480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:00:34,295-Speed 5485.95 samples/sec Loss 0.9561 LearningRate 0.0006 Epoch: 19 Global 
Step: 198490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:00:41,815-Speed 5447.77 samples/sec Loss 0.9472 LearningRate 0.0006 Epoch: 19 Global Step: 198500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:00:49,339-Speed 5445.01 samples/sec Loss 0.9518 LearningRate 0.0006 Epoch: 19 Global Step: 198510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:00:56,929-Speed 5397.20 samples/sec Loss 0.9624 LearningRate 0.0006 Epoch: 19 Global Step: 198520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:01:04,413-Speed 5473.30 samples/sec Loss 0.9273 LearningRate 0.0006 Epoch: 19 Global Step: 198530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:01:11,840-Speed 5516.04 samples/sec Loss 0.9545 LearningRate 0.0006 Epoch: 19 Global Step: 198540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:01:19,359-Speed 5448.54 samples/sec Loss 0.9483 LearningRate 0.0006 Epoch: 19 Global Step: 198550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:01:26,828-Speed 5484.64 samples/sec Loss 0.9341 LearningRate 0.0006 Epoch: 19 Global Step: 198560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:01:34,262-Speed 5510.71 samples/sec Loss 0.9279 LearningRate 0.0006 Epoch: 19 Global Step: 198570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:01:41,787-Speed 5443.48 samples/sec Loss 0.9505 LearningRate 0.0006 Epoch: 19 Global Step: 198580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:01:49,286-Speed 5463.19 samples/sec Loss 0.9478 LearningRate 0.0006 Epoch: 19 Global Step: 198590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:01:56,728-Speed 5504.38 samples/sec Loss 0.9417 LearningRate 0.0006 Epoch: 19 Global Step: 198600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:04,210-Speed 5475.60 samples/sec Loss 0.9522 LearningRate 0.0006 Epoch: 19 Global Step: 198610 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 16:02:11,634-Speed 5517.44 samples/sec Loss 0.9378 LearningRate 0.0006 Epoch: 19 Global Step: 198620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:19,088-Speed 5496.24 samples/sec Loss 0.9630 LearningRate 0.0006 Epoch: 19 Global Step: 198630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:26,578-Speed 5469.70 samples/sec Loss 0.9569 LearningRate 0.0006 Epoch: 19 Global Step: 198640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:34,040-Speed 5489.65 samples/sec Loss 0.9400 LearningRate 0.0006 Epoch: 19 Global Step: 198650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:41,451-Speed 5527.58 samples/sec Loss 0.9418 LearningRate 0.0006 Epoch: 19 Global Step: 198660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:48,868-Speed 5523.36 samples/sec Loss 0.9488 LearningRate 0.0006 Epoch: 19 Global Step: 198670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:02:56,358-Speed 5468.88 samples/sec Loss 0.9624 LearningRate 0.0006 Epoch: 19 Global Step: 198680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:03,766-Speed 5529.87 samples/sec Loss 0.9569 LearningRate 0.0006 Epoch: 19 Global Step: 198690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:11,176-Speed 5529.09 samples/sec Loss 0.9229 LearningRate 0.0006 Epoch: 19 Global Step: 198700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:18,721-Speed 5429.33 samples/sec Loss 0.9531 LearningRate 0.0006 Epoch: 19 Global Step: 198710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:26,149-Speed 5515.16 samples/sec Loss 0.9219 LearningRate 0.0006 Epoch: 19 Global Step: 198720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:33,808-Speed 5348.92 samples/sec Loss 0.9306 LearningRate 0.0006 Epoch: 19 Global Step: 198730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 
16:03:41,301-Speed 5466.61 samples/sec Loss 0.9457 LearningRate 0.0006 Epoch: 19 Global Step: 198740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:48,780-Speed 5477.90 samples/sec Loss 0.9380 LearningRate 0.0006 Epoch: 19 Global Step: 198750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:03:56,215-Speed 5509.57 samples/sec Loss 0.9552 LearningRate 0.0006 Epoch: 19 Global Step: 198760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:03,733-Speed 5449.09 samples/sec Loss 0.9481 LearningRate 0.0006 Epoch: 19 Global Step: 198770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:11,266-Speed 5438.15 samples/sec Loss 0.9592 LearningRate 0.0006 Epoch: 19 Global Step: 198780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:04:18,727-Speed 5491.01 samples/sec Loss 0.9401 LearningRate 0.0006 Epoch: 19 Global Step: 198790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:26,268-Speed 5431.82 samples/sec Loss 0.9558 LearningRate 0.0006 Epoch: 19 Global Step: 198800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:33,650-Speed 5549.57 samples/sec Loss 0.9365 LearningRate 0.0006 Epoch: 19 Global Step: 198810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:41,048-Speed 5537.28 samples/sec Loss 0.9359 LearningRate 0.0006 Epoch: 19 Global Step: 198820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:48,641-Speed 5395.63 samples/sec Loss 0.9441 LearningRate 0.0006 Epoch: 19 Global Step: 198830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:04:56,221-Speed 5404.17 samples/sec Loss 0.9602 LearningRate 0.0006 Epoch: 19 Global Step: 198840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:05:03,657-Speed 5509.38 samples/sec Loss 0.9382 LearningRate 0.0006 Epoch: 19 Global Step: 198850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:05:11,236-Speed 5405.27 samples/sec Loss 
0.9555 LearningRate 0.0006 Epoch: 19 Global Step: 198860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:05:18,677-Speed 5505.31 samples/sec Loss 0.9491 LearningRate 0.0006 Epoch: 19 Global Step: 198870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:05:26,175-Speed 5463.53 samples/sec Loss 0.9375 LearningRate 0.0006 Epoch: 19 Global Step: 198880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:05:33,620-Speed 5502.29 samples/sec Loss 0.9287 LearningRate 0.0006 Epoch: 19 Global Step: 198890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:05:41,165-Speed 5429.59 samples/sec Loss 0.9461 LearningRate 0.0006 Epoch: 19 Global Step: 198900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:05:48,662-Speed 5464.41 samples/sec Loss 0.9519 LearningRate 0.0006 Epoch: 19 Global Step: 198910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:05:56,138-Speed 5480.63 samples/sec Loss 0.9043 LearningRate 0.0006 Epoch: 19 Global Step: 198920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:06:03,562-Speed 5518.39 samples/sec Loss 0.9274 LearningRate 0.0006 Epoch: 19 Global Step: 198930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:11,092-Speed 5440.52 samples/sec Loss 0.9159 LearningRate 0.0006 Epoch: 19 Global Step: 198940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:18,538-Speed 5501.19 samples/sec Loss 0.9531 LearningRate 0.0005 Epoch: 19 Global Step: 198950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:25,954-Speed 5524.22 samples/sec Loss 0.9267 LearningRate 0.0005 Epoch: 19 Global Step: 198960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:33,378-Speed 5518.05 samples/sec Loss 0.9430 LearningRate 0.0005 Epoch: 19 Global Step: 198970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:40,762-Speed 5548.07 samples/sec Loss 0.9251 LearningRate 0.0005 Epoch: 19 Global 
Step: 198980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:48,257-Speed 5465.79 samples/sec Loss 0.9365 LearningRate 0.0005 Epoch: 19 Global Step: 198990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:06:55,669-Speed 5527.16 samples/sec Loss 0.9466 LearningRate 0.0005 Epoch: 19 Global Step: 199000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:07:03,229-Speed 5418.32 samples/sec Loss 0.9242 LearningRate 0.0005 Epoch: 19 Global Step: 199010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:07:10,676-Speed 5501.08 samples/sec Loss 0.9269 LearningRate 0.0005 Epoch: 19 Global Step: 199020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:07:18,089-Speed 5526.40 samples/sec Loss 0.9395 LearningRate 0.0005 Epoch: 19 Global Step: 199030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:07:25,539-Speed 5498.42 samples/sec Loss 0.9345 LearningRate 0.0005 Epoch: 19 Global Step: 199040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:07:33,048-Speed 5455.29 samples/sec Loss 0.9438 LearningRate 0.0005 Epoch: 19 Global Step: 199050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:07:40,529-Speed 5475.96 samples/sec Loss 0.9515 LearningRate 0.0005 Epoch: 19 Global Step: 199060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:07:47,984-Speed 5494.97 samples/sec Loss 0.9214 LearningRate 0.0005 Epoch: 19 Global Step: 199070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:07:55,537-Speed 5423.90 samples/sec Loss 0.9300 LearningRate 0.0005 Epoch: 19 Global Step: 199080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:08:03,001-Speed 5488.75 samples/sec Loss 0.9371 LearningRate 0.0005 Epoch: 19 Global Step: 199090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:08:10,450-Speed 5499.09 samples/sec Loss 0.9768 LearningRate 0.0005 Epoch: 19 Global Step: 199100 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 16:08:17,899-Speed 5500.15 samples/sec Loss 0.9460 LearningRate 0.0005 Epoch: 19 Global Step: 199110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:08:25,302-Speed 5533.74 samples/sec Loss 0.9155 LearningRate 0.0005 Epoch: 19 Global Step: 199120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:08:32,802-Speed 5461.48 samples/sec Loss 0.9490 LearningRate 0.0005 Epoch: 19 Global Step: 199130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:08:40,301-Speed 5463.23 samples/sec Loss 0.9375 LearningRate 0.0005 Epoch: 19 Global Step: 199140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:08:47,747-Speed 5501.52 samples/sec Loss 0.9401 LearningRate 0.0005 Epoch: 19 Global Step: 199150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:08:55,117-Speed 5558.26 samples/sec Loss 0.9275 LearningRate 0.0005 Epoch: 19 Global Step: 199160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:02,600-Speed 5474.86 samples/sec Loss 0.9442 LearningRate 0.0005 Epoch: 19 Global Step: 199170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:10,022-Speed 5519.19 samples/sec Loss 0.9574 LearningRate 0.0005 Epoch: 19 Global Step: 199180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:17,617-Speed 5394.15 samples/sec Loss 0.9323 LearningRate 0.0005 Epoch: 19 Global Step: 199190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:25,041-Speed 5518.12 samples/sec Loss 0.9298 LearningRate 0.0005 Epoch: 19 Global Step: 199200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:32,520-Speed 5477.71 samples/sec Loss 0.9447 LearningRate 0.0005 Epoch: 19 Global Step: 199210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:40,010-Speed 5469.29 samples/sec Loss 0.9474 LearningRate 0.0005 Epoch: 19 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
16:09:47,489-Speed 5477.20 samples/sec Loss 0.9421 LearningRate 0.0005 Epoch: 19 Global Step: 199230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:09:54,921-Speed 5511.93 samples/sec Loss 0.9357 LearningRate 0.0005 Epoch: 19 Global Step: 199240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:10:02,383-Speed 5490.28 samples/sec Loss 0.9344 LearningRate 0.0005 Epoch: 19 Global Step: 199250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:10:09,804-Speed 5520.51 samples/sec Loss 0.9428 LearningRate 0.0005 Epoch: 19 Global Step: 199260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:10:17,278-Speed 5481.17 samples/sec Loss 0.9364 LearningRate 0.0005 Epoch: 19 Global Step: 199270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:10:24,706-Speed 5514.69 samples/sec Loss 0.9370 LearningRate 0.0005 Epoch: 19 Global Step: 199280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:10:32,135-Speed 5514.80 samples/sec Loss 0.9535 LearningRate 0.0005 Epoch: 19 Global Step: 199290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:10:39,614-Speed 5477.04 samples/sec Loss 0.9591 LearningRate 0.0005 Epoch: 19 Global Step: 199300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:10:47,132-Speed 5449.24 samples/sec Loss 0.9560 LearningRate 0.0005 Epoch: 19 Global Step: 199310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:10:54,591-Speed 5492.11 samples/sec Loss 0.9077 LearningRate 0.0005 Epoch: 19 Global Step: 199320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:02,095-Speed 5459.12 samples/sec Loss 0.9405 LearningRate 0.0005 Epoch: 19 Global Step: 199330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:09,725-Speed 5369.20 samples/sec Loss 0.9420 LearningRate 0.0005 Epoch: 19 Global Step: 199340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:17,192-Speed 5485.96 samples/sec Loss 
0.9288 LearningRate 0.0005 Epoch: 19 Global Step: 199350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:24,652-Speed 5491.65 samples/sec Loss 0.9311 LearningRate 0.0005 Epoch: 19 Global Step: 199360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:32,015-Speed 5563.63 samples/sec Loss 0.9398 LearningRate 0.0005 Epoch: 19 Global Step: 199370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:39,491-Speed 5480.02 samples/sec Loss 0.9268 LearningRate 0.0005 Epoch: 19 Global Step: 199380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:47,110-Speed 5376.25 samples/sec Loss 0.9491 LearningRate 0.0005 Epoch: 19 Global Step: 199390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:11:54,574-Speed 5488.62 samples/sec Loss 0.9297 LearningRate 0.0005 Epoch: 19 Global Step: 199400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:01,972-Speed 5537.74 samples/sec Loss 0.9374 LearningRate 0.0005 Epoch: 19 Global Step: 199410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:09,409-Speed 5508.31 samples/sec Loss 0.9239 LearningRate 0.0005 Epoch: 19 Global Step: 199420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:16,825-Speed 5523.93 samples/sec Loss 0.9366 LearningRate 0.0005 Epoch: 19 Global Step: 199430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:24,425-Speed 5390.06 samples/sec Loss 0.9204 LearningRate 0.0005 Epoch: 19 Global Step: 199440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:31,820-Speed 5539.47 samples/sec Loss 0.9524 LearningRate 0.0005 Epoch: 19 Global Step: 199450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:39,243-Speed 5519.54 samples/sec Loss 0.9404 LearningRate 0.0005 Epoch: 19 Global Step: 199460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:46,716-Speed 5481.80 samples/sec Loss 0.9428 LearningRate 0.0005 Epoch: 19 Global 
Step: 199470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:12:54,252-Speed 5435.94 samples/sec Loss 0.9379 LearningRate 0.0005 Epoch: 19 Global Step: 199480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:13:01,640-Speed 5544.81 samples/sec Loss 0.9373 LearningRate 0.0005 Epoch: 19 Global Step: 199490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:13:09,086-Speed 5501.65 samples/sec Loss 0.9126 LearningRate 0.0005 Epoch: 19 Global Step: 199500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:13:16,537-Speed 5497.74 samples/sec Loss 0.9044 LearningRate 0.0005 Epoch: 19 Global Step: 199510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:13:23,920-Speed 5548.72 samples/sec Loss 0.9460 LearningRate 0.0005 Epoch: 19 Global Step: 199520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:13:31,396-Speed 5480.08 samples/sec Loss 0.9313 LearningRate 0.0005 Epoch: 19 Global Step: 199530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:13:38,805-Speed 5529.17 samples/sec Loss 0.9352 LearningRate 0.0005 Epoch: 19 Global Step: 199540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:13:46,274-Speed 5484.29 samples/sec Loss 0.9330 LearningRate 0.0005 Epoch: 19 Global Step: 199550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:13:53,742-Speed 5485.31 samples/sec Loss 0.9328 LearningRate 0.0005 Epoch: 19 Global Step: 199560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:14:01,184-Speed 5504.97 samples/sec Loss 0.9180 LearningRate 0.0005 Epoch: 19 Global Step: 199570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:14:08,543-Speed 5566.75 samples/sec Loss 0.9138 LearningRate 0.0005 Epoch: 19 Global Step: 199580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:14:16,061-Speed 5448.70 samples/sec Loss 0.9245 LearningRate 0.0005 Epoch: 19 Global Step: 199590 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-09 16:14:23,827-Speed 5275.22 samples/sec Loss 0.9385 LearningRate 0.0005 Epoch: 19 Global Step: 199600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:14:31,255-Speed 5515.56 samples/sec Loss 0.9413 LearningRate 0.0005 Epoch: 19 Global Step: 199610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:14:38,679-Speed 5517.57 samples/sec Loss 0.9301 LearningRate 0.0005 Epoch: 19 Global Step: 199620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:14:46,152-Speed 5481.43 samples/sec Loss 0.9436 LearningRate 0.0005 Epoch: 19 Global Step: 199630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:14:53,623-Speed 5483.67 samples/sec Loss 0.9281 LearningRate 0.0005 Epoch: 19 Global Step: 199640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:01,076-Speed 5496.35 samples/sec Loss 0.9406 LearningRate 0.0005 Epoch: 19 Global Step: 199650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:08,479-Speed 5533.70 samples/sec Loss 0.9325 LearningRate 0.0005 Epoch: 19 Global Step: 199660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:15,931-Speed 5497.39 samples/sec Loss 0.9365 LearningRate 0.0005 Epoch: 19 Global Step: 199670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:23,387-Speed 5494.54 samples/sec Loss 0.9218 LearningRate 0.0005 Epoch: 19 Global Step: 199680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:30,857-Speed 5483.49 samples/sec Loss 0.9396 LearningRate 0.0005 Epoch: 19 Global Step: 199690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:38,320-Speed 5489.68 samples/sec Loss 0.9309 LearningRate 0.0005 Epoch: 19 Global Step: 199700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:15:45,827-Speed 5456.57 samples/sec Loss 0.9250 LearningRate 0.0005 Epoch: 19 Global Step: 199710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 
16:15:53,366-Speed 5434.01 samples/sec Loss 0.9122 LearningRate 0.0005 Epoch: 19 Global Step: 199720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:16:00,870-Speed 5459.69 samples/sec Loss 0.9468 LearningRate 0.0005 Epoch: 19 Global Step: 199730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:16:08,437-Speed 5413.18 samples/sec Loss 0.9433 LearningRate 0.0005 Epoch: 19 Global Step: 199740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:16:15,931-Speed 5467.08 samples/sec Loss 0.9044 LearningRate 0.0004 Epoch: 19 Global Step: 199750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:16:23,407-Speed 5479.35 samples/sec Loss 0.9236 LearningRate 0.0004 Epoch: 19 Global Step: 199760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:16:30,818-Speed 5527.39 samples/sec Loss 0.9155 LearningRate 0.0004 Epoch: 19 Global Step: 199770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:16:38,242-Speed 5518.12 samples/sec Loss 0.9232 LearningRate 0.0004 Epoch: 19 Global Step: 199780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:16:45,699-Speed 5493.90 samples/sec Loss 0.9180 LearningRate 0.0004 Epoch: 19 Global Step: 199790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:16:53,143-Speed 5502.65 samples/sec Loss 0.9203 LearningRate 0.0004 Epoch: 19 Global Step: 199800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:17:00,631-Speed 5471.07 samples/sec Loss 0.9229 LearningRate 0.0004 Epoch: 19 Global Step: 199810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:17:08,071-Speed 5506.74 samples/sec Loss 0.9097 LearningRate 0.0004 Epoch: 19 Global Step: 199820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:17:15,493-Speed 5519.02 samples/sec Loss 0.9360 LearningRate 0.0004 Epoch: 19 Global Step: 199830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:17:23,004-Speed 5454.40 samples/sec Loss 
0.9278 LearningRate 0.0004 Epoch: 19 Global Step: 199840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:17:30,472-Speed 5485.53 samples/sec Loss 0.9279 LearningRate 0.0004 Epoch: 19 Global Step: 199850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:17:38,052-Speed 5404.32 samples/sec Loss 0.9265 LearningRate 0.0004 Epoch: 19 Global Step: 199860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:17:45,482-Speed 5513.74 samples/sec Loss 0.9211 LearningRate 0.0004 Epoch: 19 Global Step: 199870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:17:53,080-Speed 5391.21 samples/sec Loss 0.9478 LearningRate 0.0004 Epoch: 19 Global Step: 199880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:00,620-Speed 5433.43 samples/sec Loss 0.9450 LearningRate 0.0004 Epoch: 19 Global Step: 199890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:08,017-Speed 5538.42 samples/sec Loss 0.9497 LearningRate 0.0004 Epoch: 19 Global Step: 199900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:15,501-Speed 5473.53 samples/sec Loss 0.9248 LearningRate 0.0004 Epoch: 19 Global Step: 199910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:22,943-Speed 5504.26 samples/sec Loss 0.9319 LearningRate 0.0004 Epoch: 19 Global Step: 199920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:30,419-Speed 5480.18 samples/sec Loss 0.9325 LearningRate 0.0004 Epoch: 19 Global Step: 199930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:37,867-Speed 5500.59 samples/sec Loss 0.9268 LearningRate 0.0004 Epoch: 19 Global Step: 199940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:18:45,338-Speed 5482.41 samples/sec Loss 0.9379 LearningRate 0.0004 Epoch: 19 Global Step: 199950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:18:52,825-Speed 5471.87 samples/sec Loss 0.9373 LearningRate 0.0004 Epoch: 19 Global 
Step: 199960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:19:00,276-Speed 5497.86 samples/sec Loss 0.9092 LearningRate 0.0004 Epoch: 19 Global Step: 199970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:19:07,701-Speed 5517.38 samples/sec Loss 0.9265 LearningRate 0.0004 Epoch: 19 Global Step: 199980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:19:15,148-Speed 5500.87 samples/sec Loss 0.9145 LearningRate 0.0004 Epoch: 19 Global Step: 199990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:19:22,613-Speed 5487.77 samples/sec Loss 0.9165 LearningRate 0.0004 Epoch: 19 Global Step: 200000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:20:07,022-[lfw][200000]XNorm: 22.299258 Training: 2022-01-09 16:20:07,023-[lfw][200000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 16:20:07,023-[lfw][200000]Accuracy-Highest: 0.99850 Training: 2022-01-09 16:20:58,656-[cfp_fp][200000]XNorm: 22.083487 Training: 2022-01-09 16:20:58,657-[cfp_fp][200000]Accuracy-Flip: 0.99371+-0.00368 Training: 2022-01-09 16:20:58,657-[cfp_fp][200000]Accuracy-Highest: 0.99443 Training: 2022-01-09 16:21:43,027-[agedb_30][200000]XNorm: 23.061013 Training: 2022-01-09 16:21:43,028-[agedb_30][200000]Accuracy-Flip: 0.98667+-0.00532 Training: 2022-01-09 16:21:43,028-[agedb_30][200000]Accuracy-Highest: 0.98667 Training: 2022-01-09 16:21:50,606-Speed 276.77 samples/sec Loss 0.9496 LearningRate 0.0004 Epoch: 19 Global Step: 200010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:21:58,044-Speed 5507.20 samples/sec Loss 0.9432 LearningRate 0.0004 Epoch: 19 Global Step: 200020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:05,594-Speed 5425.94 samples/sec Loss 0.9422 LearningRate 0.0004 Epoch: 19 Global Step: 200030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:13,214-Speed 5376.19 samples/sec Loss 0.9192 LearningRate 0.0004 Epoch: 19 Global Step: 200040 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:20,671-Speed 5493.93 samples/sec Loss 0.9508 LearningRate 0.0004 Epoch: 19 Global Step: 200050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:28,103-Speed 5511.44 samples/sec Loss 0.9342 LearningRate 0.0004 Epoch: 19 Global Step: 200060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:35,602-Speed 5463.11 samples/sec Loss 0.9253 LearningRate 0.0004 Epoch: 19 Global Step: 200070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:43,064-Speed 5489.77 samples/sec Loss 0.9119 LearningRate 0.0004 Epoch: 19 Global Step: 200080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:22:50,493-Speed 5514.83 samples/sec Loss 0.9327 LearningRate 0.0004 Epoch: 19 Global Step: 200090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:22:57,933-Speed 5505.65 samples/sec Loss 0.9334 LearningRate 0.0004 Epoch: 19 Global Step: 200100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:05,417-Speed 5473.82 samples/sec Loss 0.9407 LearningRate 0.0004 Epoch: 19 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:12,877-Speed 5491.31 samples/sec Loss 0.9252 LearningRate 0.0004 Epoch: 19 Global Step: 200120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:20,317-Speed 5506.20 samples/sec Loss 0.9240 LearningRate 0.0004 Epoch: 19 Global Step: 200130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:27,734-Speed 5523.48 samples/sec Loss 0.9288 LearningRate 0.0004 Epoch: 19 Global Step: 200140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:35,181-Speed 5500.63 samples/sec Loss 0.9217 LearningRate 0.0004 Epoch: 19 Global Step: 200150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:42,587-Speed 5531.49 samples/sec Loss 0.9379 LearningRate 0.0004 Epoch: 19 Global Step: 200160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-09 16:23:50,084-Speed 5464.59 samples/sec Loss 0.9276 LearningRate 0.0004 Epoch: 19 Global Step: 200170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:23:57,525-Speed 5504.97 samples/sec Loss 0.9294 LearningRate 0.0004 Epoch: 19 Global Step: 200180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:24:04,979-Speed 5496.06 samples/sec Loss 0.9141 LearningRate 0.0004 Epoch: 19 Global Step: 200190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:24:12,386-Speed 5530.47 samples/sec Loss 0.9211 LearningRate 0.0004 Epoch: 19 Global Step: 200200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:24:19,825-Speed 5506.75 samples/sec Loss 0.9143 LearningRate 0.0004 Epoch: 19 Global Step: 200210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:24:27,263-Speed 5507.63 samples/sec Loss 0.9133 LearningRate 0.0004 Epoch: 19 Global Step: 200220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:24:34,675-Speed 5527.07 samples/sec Loss 0.9434 LearningRate 0.0004 Epoch: 19 Global Step: 200230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:24:42,087-Speed 5527.14 samples/sec Loss 0.9051 LearningRate 0.0004 Epoch: 19 Global Step: 200240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:24:49,523-Speed 5509.23 samples/sec Loss 0.9193 LearningRate 0.0004 Epoch: 19 Global Step: 200250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:24:56,946-Speed 5518.15 samples/sec Loss 0.9349 LearningRate 0.0004 Epoch: 19 Global Step: 200260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-09 16:25:04,396-Speed 5498.32 samples/sec Loss 0.9243 LearningRate 0.0004 Epoch: 19 Global Step: 200270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:25:11,845-Speed 5500.43 samples/sec Loss 0.9162 LearningRate 0.0004 Epoch: 19 Global Step: 200280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:25:19,268-Speed 5518.53 
samples/sec Loss 0.9437 LearningRate 0.0004 Epoch: 19 Global Step: 200290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:25:26,709-Speed 5505.22 samples/sec Loss 0.9110 LearningRate 0.0004 Epoch: 19 Global Step: 200300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:25:34,135-Speed 5516.02 samples/sec Loss 0.9172 LearningRate 0.0004 Epoch: 19 Global Step: 200310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:25:41,636-Speed 5461.55 samples/sec Loss 0.9282 LearningRate 0.0004 Epoch: 19 Global Step: 200320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:25:49,208-Speed 5410.53 samples/sec Loss 0.8999 LearningRate 0.0004 Epoch: 19 Global Step: 200330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:25:56,922-Speed 5310.70 samples/sec Loss 0.9370 LearningRate 0.0004 Epoch: 19 Global Step: 200340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:26:04,533-Speed 5382.26 samples/sec Loss 0.9271 LearningRate 0.0004 Epoch: 19 Global Step: 200350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:26:12,004-Speed 5483.80 samples/sec Loss 0.9074 LearningRate 0.0004 Epoch: 19 Global Step: 200360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:26:19,377-Speed 5555.87 samples/sec Loss 0.9257 LearningRate 0.0004 Epoch: 19 Global Step: 200370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:26:26,811-Speed 5510.31 samples/sec Loss 0.9395 LearningRate 0.0004 Epoch: 19 Global Step: 200380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:26:34,311-Speed 5461.74 samples/sec Loss 0.9182 LearningRate 0.0004 Epoch: 19 Global Step: 200390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:26:41,744-Speed 5512.33 samples/sec Loss 0.9161 LearningRate 0.0004 Epoch: 19 Global Step: 200400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:26:49,211-Speed 5486.34 samples/sec Loss 0.9243 LearningRate 0.0004 
Epoch: 19 Global Step: 200410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:26:56,682-Speed 5483.44 samples/sec Loss 0.9287 LearningRate 0.0004 Epoch: 19 Global Step: 200420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:04,069-Speed 5545.61 samples/sec Loss 0.9244 LearningRate 0.0004 Epoch: 19 Global Step: 200430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:11,421-Speed 5571.54 samples/sec Loss 0.9243 LearningRate 0.0004 Epoch: 19 Global Step: 200440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:18,756-Speed 5585.29 samples/sec Loss 0.9055 LearningRate 0.0004 Epoch: 19 Global Step: 200450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:26,134-Speed 5558.28 samples/sec Loss 0.9101 LearningRate 0.0004 Epoch: 19 Global Step: 200460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:34,150-Speed 5539.81 samples/sec Loss 0.9176 LearningRate 0.0004 Epoch: 19 Global Step: 200470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:41,618-Speed 5484.90 samples/sec Loss 0.9035 LearningRate 0.0004 Epoch: 19 Global Step: 200480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:48,997-Speed 5552.55 samples/sec Loss 0.9286 LearningRate 0.0004 Epoch: 19 Global Step: 200490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:27:56,426-Speed 5514.20 samples/sec Loss 0.9095 LearningRate 0.0004 Epoch: 19 Global Step: 200500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:28:03,978-Speed 5424.54 samples/sec Loss 0.9075 LearningRate 0.0004 Epoch: 19 Global Step: 200510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:28:11,426-Speed 5499.78 samples/sec Loss 0.8897 LearningRate 0.0004 Epoch: 19 Global Step: 200520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:28:18,769-Speed 5579.24 samples/sec Loss 0.9174 LearningRate 0.0004 Epoch: 19 Global Step: 200530 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-01-09 16:28:26,170-Speed 5535.02 samples/sec Loss 0.9129 LearningRate 0.0004 Epoch: 19 Global Step: 200540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:28:33,620-Speed 5499.16 samples/sec Loss 0.8802 LearningRate 0.0004 Epoch: 19 Global Step: 200550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:28:41,059-Speed 5506.31 samples/sec Loss 0.9201 LearningRate 0.0004 Epoch: 19 Global Step: 200560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:28:48,482-Speed 5518.69 samples/sec Loss 0.9185 LearningRate 0.0004 Epoch: 19 Global Step: 200570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:28:55,912-Speed 5513.78 samples/sec Loss 0.9333 LearningRate 0.0004 Epoch: 19 Global Step: 200580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:29:03,401-Speed 5470.11 samples/sec Loss 0.9259 LearningRate 0.0004 Epoch: 19 Global Step: 200590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:29:10,879-Speed 5478.10 samples/sec Loss 0.9220 LearningRate 0.0004 Epoch: 19 Global Step: 200600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:29:18,384-Speed 5458.30 samples/sec Loss 0.8901 LearningRate 0.0004 Epoch: 19 Global Step: 200610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:29:25,776-Speed 5542.49 samples/sec Loss 0.9234 LearningRate 0.0004 Epoch: 19 Global Step: 200620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:29:33,318-Speed 5431.67 samples/sec Loss 0.9392 LearningRate 0.0004 Epoch: 19 Global Step: 200630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:29:40,768-Speed 5498.82 samples/sec Loss 0.9357 LearningRate 0.0004 Epoch: 19 Global Step: 200640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:29:48,171-Speed 5533.49 samples/sec Loss 0.9358 LearningRate 0.0004 Epoch: 19 Global Step: 200650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-09 16:29:55,591-Speed 5521.05 samples/sec Loss 0.9315 LearningRate 0.0003 Epoch: 19 Global Step: 200660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:30:03,028-Speed 5508.02 samples/sec Loss 0.9136 LearningRate 0.0003 Epoch: 19 Global Step: 200670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:30:10,452-Speed 5517.70 samples/sec Loss 0.9223 LearningRate 0.0003 Epoch: 19 Global Step: 200680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:30:17,949-Speed 5464.27 samples/sec Loss 0.9200 LearningRate 0.0003 Epoch: 19 Global Step: 200690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:30:25,371-Speed 5520.19 samples/sec Loss 0.9299 LearningRate 0.0003 Epoch: 19 Global Step: 200700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:30:32,810-Speed 5506.34 samples/sec Loss 0.8991 LearningRate 0.0003 Epoch: 19 Global Step: 200710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:30:40,254-Speed 5503.38 samples/sec Loss 0.9170 LearningRate 0.0003 Epoch: 19 Global Step: 200720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:30:47,665-Speed 5527.71 samples/sec Loss 0.9260 LearningRate 0.0003 Epoch: 19 Global Step: 200730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:30:55,210-Speed 5429.38 samples/sec Loss 0.9055 LearningRate 0.0003 Epoch: 19 Global Step: 200740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:02,597-Speed 5546.00 samples/sec Loss 0.8996 LearningRate 0.0003 Epoch: 19 Global Step: 200750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:10,138-Speed 5432.15 samples/sec Loss 0.9200 LearningRate 0.0003 Epoch: 19 Global Step: 200760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:17,639-Speed 5461.65 samples/sec Loss 0.8991 LearningRate 0.0003 Epoch: 19 Global Step: 200770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:25,135-Speed 5465.34 
samples/sec Loss 0.9140 LearningRate 0.0003 Epoch: 19 Global Step: 200780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:32,660-Speed 5443.88 samples/sec Loss 0.9340 LearningRate 0.0003 Epoch: 19 Global Step: 200790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:40,119-Speed 5491.95 samples/sec Loss 0.9262 LearningRate 0.0003 Epoch: 19 Global Step: 200800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:47,561-Speed 5504.22 samples/sec Loss 0.9006 LearningRate 0.0003 Epoch: 19 Global Step: 200810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-09 16:31:54,964-Speed 5534.29 samples/sec Loss 0.9104 LearningRate 0.0003 Epoch: 19 Global Step: 200820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:32:02,498-Speed 5436.90 samples/sec Loss 0.9030 LearningRate 0.0003 Epoch: 19 Global Step: 200830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-09 16:32:09,898-Speed 5536.36 samples/sec Loss 0.9057 LearningRate 0.0003 Epoch: 19 Global Step: 200840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:32:17,440-Speed 5431.55 samples/sec Loss 0.9248 LearningRate 0.0003 Epoch: 19 Global Step: 200850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:32:24,848-Speed 5529.90 samples/sec Loss 0.9236 LearningRate 0.0003 Epoch: 19 Global Step: 200860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:32:32,269-Speed 5520.44 samples/sec Loss 0.9158 LearningRate 0.0003 Epoch: 19 Global Step: 200870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:32:39,740-Speed 5485.06 samples/sec Loss 0.9073 LearningRate 0.0003 Epoch: 19 Global Step: 200880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:32:47,182-Speed 5503.73 samples/sec Loss 0.9217 LearningRate 0.0003 Epoch: 19 Global Step: 200890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:32:54,589-Speed 5530.91 samples/sec Loss 0.9066 LearningRate 0.0003 
Epoch: 19 Global Step: 200900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:33:02,011-Speed 5520.22 samples/sec Loss 0.9234 LearningRate 0.0003 Epoch: 19 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:33:09,409-Speed 5537.53 samples/sec Loss 0.9254 LearningRate 0.0003 Epoch: 19 Global Step: 200920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:33:16,841-Speed 5511.33 samples/sec Loss 0.9094 LearningRate 0.0003 Epoch: 19 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:33:24,255-Speed 5525.36 samples/sec Loss 0.9303 LearningRate 0.0003 Epoch: 19 Global Step: 200940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:33:31,608-Speed 5571.47 samples/sec Loss 0.9155 LearningRate 0.0003 Epoch: 19 Global Step: 200950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:33:39,061-Speed 5496.82 samples/sec Loss 0.9139 LearningRate 0.0003 Epoch: 19 Global Step: 200960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:33:46,599-Speed 5434.44 samples/sec Loss 0.9262 LearningRate 0.0003 Epoch: 19 Global Step: 200970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:33:54,111-Speed 5452.98 samples/sec Loss 0.9081 LearningRate 0.0003 Epoch: 19 Global Step: 200980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:34:01,702-Speed 5396.75 samples/sec Loss 0.9147 LearningRate 0.0003 Epoch: 19 Global Step: 200990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:34:09,237-Speed 5437.18 samples/sec Loss 0.9238 LearningRate 0.0003 Epoch: 19 Global Step: 201000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:34:16,668-Speed 5512.45 samples/sec Loss 0.9097 LearningRate 0.0003 Epoch: 19 Global Step: 201010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:34:24,113-Speed 5502.18 samples/sec Loss 0.9043 LearningRate 0.0003 Epoch: 19 Global Step: 201020 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-01-09 16:34:31,805-Speed 5326.20 samples/sec Loss 0.9287 LearningRate 0.0003 Epoch: 19 Global Step: 201030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:34:39,213-Speed 5530.10 samples/sec Loss 0.9047 LearningRate 0.0003 Epoch: 19 Global Step: 201040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:34:46,699-Speed 5472.29 samples/sec Loss 0.9148 LearningRate 0.0003 Epoch: 19 Global Step: 201050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:34:54,138-Speed 5506.59 samples/sec Loss 0.9301 LearningRate 0.0003 Epoch: 19 Global Step: 201060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:01,574-Speed 5508.97 samples/sec Loss 0.9134 LearningRate 0.0003 Epoch: 19 Global Step: 201070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:08,987-Speed 5526.34 samples/sec Loss 0.9088 LearningRate 0.0003 Epoch: 19 Global Step: 201080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:16,407-Speed 5520.87 samples/sec Loss 0.8951 LearningRate 0.0003 Epoch: 19 Global Step: 201090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:23,854-Speed 5501.33 samples/sec Loss 0.8981 LearningRate 0.0003 Epoch: 19 Global Step: 201100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:31,269-Speed 5524.68 samples/sec Loss 0.9124 LearningRate 0.0003 Epoch: 19 Global Step: 201110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:38,890-Speed 5375.09 samples/sec Loss 0.8982 LearningRate 0.0003 Epoch: 19 Global Step: 201120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:35:46,352-Speed 5490.34 samples/sec Loss 0.8800 LearningRate 0.0003 Epoch: 19 Global Step: 201130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:35:53,826-Speed 5480.84 samples/sec Loss 0.9298 LearningRate 0.0003 Epoch: 19 Global Step: 201140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-09 16:36:01,245-Speed 5522.05 samples/sec Loss 0.8974 LearningRate 0.0003 Epoch: 19 Global Step: 201150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:08,679-Speed 5510.38 samples/sec Loss 0.8980 LearningRate 0.0003 Epoch: 19 Global Step: 201160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:16,168-Speed 5469.70 samples/sec Loss 0.9221 LearningRate 0.0003 Epoch: 19 Global Step: 201170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:23,595-Speed 5516.00 samples/sec Loss 0.9273 LearningRate 0.0003 Epoch: 19 Global Step: 201180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:31,011-Speed 5524.17 samples/sec Loss 0.9106 LearningRate 0.0003 Epoch: 19 Global Step: 201190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:38,453-Speed 5504.29 samples/sec Loss 0.8898 LearningRate 0.0003 Epoch: 19 Global Step: 201200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:45,897-Speed 5502.98 samples/sec Loss 0.8934 LearningRate 0.0003 Epoch: 19 Global Step: 201210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:36:53,325-Speed 5515.11 samples/sec Loss 0.9267 LearningRate 0.0003 Epoch: 19 Global Step: 201220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:37:00,715-Speed 5543.19 samples/sec Loss 0.9210 LearningRate 0.0003 Epoch: 19 Global Step: 201230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:08,140-Speed 5517.28 samples/sec Loss 0.9161 LearningRate 0.0003 Epoch: 19 Global Step: 201240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:15,567-Speed 5515.68 samples/sec Loss 0.9105 LearningRate 0.0003 Epoch: 19 Global Step: 201250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:22,957-Speed 5544.01 samples/sec Loss 0.8953 LearningRate 0.0003 Epoch: 19 Global Step: 201260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:30,356-Speed 5536.32 
samples/sec Loss 0.9135 LearningRate 0.0003 Epoch: 19 Global Step: 201270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:37,751-Speed 5539.37 samples/sec Loss 0.9057 LearningRate 0.0003 Epoch: 19 Global Step: 201280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:45,218-Speed 5486.75 samples/sec Loss 0.9126 LearningRate 0.0003 Epoch: 19 Global Step: 201290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:37:52,706-Speed 5470.80 samples/sec Loss 0.9053 LearningRate 0.0003 Epoch: 19 Global Step: 201300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:00,115-Speed 5528.78 samples/sec Loss 0.8854 LearningRate 0.0003 Epoch: 19 Global Step: 201310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:07,669-Speed 5423.21 samples/sec Loss 0.8976 LearningRate 0.0003 Epoch: 19 Global Step: 201320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:15,059-Speed 5543.81 samples/sec Loss 0.9017 LearningRate 0.0003 Epoch: 19 Global Step: 201330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:38:22,475-Speed 5523.99 samples/sec Loss 0.9135 LearningRate 0.0003 Epoch: 19 Global Step: 201340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:29,953-Speed 5477.72 samples/sec Loss 0.9099 LearningRate 0.0003 Epoch: 19 Global Step: 201350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:37,337-Speed 5547.93 samples/sec Loss 0.9211 LearningRate 0.0003 Epoch: 19 Global Step: 201360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:44,746-Speed 5529.36 samples/sec Loss 0.9261 LearningRate 0.0003 Epoch: 19 Global Step: 201370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:52,295-Speed 5426.53 samples/sec Loss 0.9121 LearningRate 0.0003 Epoch: 19 Global Step: 201380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:38:59,856-Speed 5417.64 samples/sec Loss 0.9007 LearningRate 0.0003 
Epoch: 19 Global Step: 201390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:39:07,476-Speed 5376.70 samples/sec Loss 0.9187 LearningRate 0.0003 Epoch: 19 Global Step: 201400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:39:14,859-Speed 5548.89 samples/sec Loss 0.9021 LearningRate 0.0003 Epoch: 19 Global Step: 201410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:39:22,311-Speed 5497.38 samples/sec Loss 0.9063 LearningRate 0.0003 Epoch: 19 Global Step: 201420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:39:29,695-Speed 5547.01 samples/sec Loss 0.8990 LearningRate 0.0003 Epoch: 19 Global Step: 201430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:39:37,149-Speed 5496.18 samples/sec Loss 0.9209 LearningRate 0.0003 Epoch: 19 Global Step: 201440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:39:44,667-Speed 5448.70 samples/sec Loss 0.9054 LearningRate 0.0003 Epoch: 19 Global Step: 201450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:39:52,269-Speed 5389.18 samples/sec Loss 0.8958 LearningRate 0.0003 Epoch: 19 Global Step: 201460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:39:59,685-Speed 5524.12 samples/sec Loss 0.8994 LearningRate 0.0003 Epoch: 19 Global Step: 201470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:07,127-Speed 5504.93 samples/sec Loss 0.9141 LearningRate 0.0003 Epoch: 19 Global Step: 201480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:14,522-Speed 5539.39 samples/sec Loss 0.9104 LearningRate 0.0003 Epoch: 19 Global Step: 201490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:21,930-Speed 5529.66 samples/sec Loss 0.9220 LearningRate 0.0003 Epoch: 19 Global Step: 201500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:29,442-Speed 5453.69 samples/sec Loss 0.8927 LearningRate 0.0003 Epoch: 19 Global Step: 201510 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:37,044-Speed 5388.58 samples/sec Loss 0.9163 LearningRate 0.0003 Epoch: 19 Global Step: 201520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:44,529-Speed 5473.36 samples/sec Loss 0.9127 LearningRate 0.0003 Epoch: 19 Global Step: 201530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:52,005-Speed 5479.56 samples/sec Loss 0.8859 LearningRate 0.0003 Epoch: 19 Global Step: 201540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:40:59,461-Speed 5494.03 samples/sec Loss 0.9109 LearningRate 0.0003 Epoch: 19 Global Step: 201550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:06,875-Speed 5525.08 samples/sec Loss 0.9207 LearningRate 0.0003 Epoch: 19 Global Step: 201560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:14,463-Speed 5399.35 samples/sec Loss 0.9054 LearningRate 0.0003 Epoch: 19 Global Step: 201570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:21,903-Speed 5506.18 samples/sec Loss 0.9212 LearningRate 0.0003 Epoch: 19 Global Step: 201580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:29,327-Speed 5517.82 samples/sec Loss 0.9098 LearningRate 0.0003 Epoch: 19 Global Step: 201590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:36,747-Speed 5520.43 samples/sec Loss 0.9396 LearningRate 0.0003 Epoch: 19 Global Step: 201600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:44,156-Speed 5530.08 samples/sec Loss 0.9127 LearningRate 0.0003 Epoch: 19 Global Step: 201610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:51,548-Speed 5542.13 samples/sec Loss 0.8931 LearningRate 0.0003 Epoch: 19 Global Step: 201620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:41:59,015-Speed 5485.67 samples/sec Loss 0.9013 LearningRate 0.0003 Epoch: 19 Global Step: 201630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-09 16:42:06,449-Speed 5510.24 samples/sec Loss 0.9051 LearningRate 0.0003 Epoch: 19 Global Step: 201640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:42:13,861-Speed 5527.76 samples/sec Loss 0.8747 LearningRate 0.0003 Epoch: 19 Global Step: 201650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:42:21,213-Speed 5572.18 samples/sec Loss 0.9182 LearningRate 0.0003 Epoch: 19 Global Step: 201660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:42:28,635-Speed 5519.23 samples/sec Loss 0.8719 LearningRate 0.0003 Epoch: 19 Global Step: 201670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:42:36,141-Speed 5457.14 samples/sec Loss 0.9129 LearningRate 0.0003 Epoch: 19 Global Step: 201680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:42:43,554-Speed 5526.82 samples/sec Loss 0.8923 LearningRate 0.0003 Epoch: 19 Global Step: 201690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:42:51,081-Speed 5441.98 samples/sec Loss 0.8864 LearningRate 0.0002 Epoch: 19 Global Step: 201700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:42:58,499-Speed 5522.88 samples/sec Loss 0.9199 LearningRate 0.0002 Epoch: 19 Global Step: 201710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:05,967-Speed 5485.07 samples/sec Loss 0.9034 LearningRate 0.0002 Epoch: 19 Global Step: 201720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:13,369-Speed 5534.21 samples/sec Loss 0.8845 LearningRate 0.0002 Epoch: 19 Global Step: 201730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:20,778-Speed 5529.14 samples/sec Loss 0.8990 LearningRate 0.0002 Epoch: 19 Global Step: 201740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:28,212-Speed 5511.03 samples/sec Loss 0.8993 LearningRate 0.0002 Epoch: 19 Global Step: 201750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:35,640-Speed 5515.11 
samples/sec Loss 0.8875 LearningRate 0.0002 Epoch: 19 Global Step: 201760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:43,094-Speed 5495.01 samples/sec Loss 0.9146 LearningRate 0.0002 Epoch: 19 Global Step: 201770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:50,530-Speed 5509.92 samples/sec Loss 0.8958 LearningRate 0.0002 Epoch: 19 Global Step: 201780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:43:57,965-Speed 5509.87 samples/sec Loss 0.9046 LearningRate 0.0002 Epoch: 19 Global Step: 201790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:44:05,401-Speed 5508.82 samples/sec Loss 0.8971 LearningRate 0.0002 Epoch: 19 Global Step: 201800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:44:12,813-Speed 5526.67 samples/sec Loss 0.8863 LearningRate 0.0002 Epoch: 19 Global Step: 201810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:44:20,228-Speed 5525.09 samples/sec Loss 0.9103 LearningRate 0.0002 Epoch: 19 Global Step: 201820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:44:27,782-Speed 5422.96 samples/sec Loss 0.8901 LearningRate 0.0002 Epoch: 19 Global Step: 201830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:44:35,228-Speed 5501.39 samples/sec Loss 0.9152 LearningRate 0.0002 Epoch: 19 Global Step: 201840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:44:42,756-Speed 5441.87 samples/sec Loss 0.9163 LearningRate 0.0002 Epoch: 19 Global Step: 201850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:44:50,167-Speed 5527.79 samples/sec Loss 0.9153 LearningRate 0.0002 Epoch: 19 Global Step: 201860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:44:57,704-Speed 5434.99 samples/sec Loss 0.9120 LearningRate 0.0002 Epoch: 19 Global Step: 201870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:45:05,183-Speed 5477.53 samples/sec Loss 0.9054 LearningRate 0.0002 
Epoch: 19 Global Step: 201880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:45:12,667-Speed 5473.95 samples/sec Loss 0.9076 LearningRate 0.0002 Epoch: 19 Global Step: 201890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:45:20,212-Speed 5429.66 samples/sec Loss 0.9031 LearningRate 0.0002 Epoch: 19 Global Step: 201900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:45:27,632-Speed 5520.82 samples/sec Loss 0.8888 LearningRate 0.0002 Epoch: 19 Global Step: 201910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:45:35,094-Speed 5490.05 samples/sec Loss 0.9078 LearningRate 0.0002 Epoch: 19 Global Step: 201920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:45:42,635-Speed 5431.72 samples/sec Loss 0.8986 LearningRate 0.0002 Epoch: 19 Global Step: 201930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:45:50,082-Speed 5501.06 samples/sec Loss 0.9211 LearningRate 0.0002 Epoch: 19 Global Step: 201940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:45:57,536-Speed 5496.12 samples/sec Loss 0.8990 LearningRate 0.0002 Epoch: 19 Global Step: 201950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:46:05,021-Speed 5472.70 samples/sec Loss 0.8986 LearningRate 0.0002 Epoch: 19 Global Step: 201960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:46:12,433-Speed 5526.94 samples/sec Loss 0.9017 LearningRate 0.0002 Epoch: 19 Global Step: 201970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:46:19,855-Speed 5519.96 samples/sec Loss 0.9062 LearningRate 0.0002 Epoch: 19 Global Step: 201980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:46:27,331-Speed 5479.09 samples/sec Loss 0.9079 LearningRate 0.0002 Epoch: 19 Global Step: 201990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:46:34,873-Speed 5431.67 samples/sec Loss 0.9042 LearningRate 0.0002 Epoch: 19 Global Step: 202000 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-01-09 16:47:18,840-[lfw][202000]XNorm: 22.206904 Training: 2022-01-09 16:47:18,841-[lfw][202000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 16:47:18,841-[lfw][202000]Accuracy-Highest: 0.99850 Training: 2022-01-09 16:48:10,304-[cfp_fp][202000]XNorm: 22.083832 Training: 2022-01-09 16:48:10,305-[cfp_fp][202000]Accuracy-Flip: 0.99414+-0.00316 Training: 2022-01-09 16:48:10,305-[cfp_fp][202000]Accuracy-Highest: 0.99443 Training: 2022-01-09 16:48:54,221-[agedb_30][202000]XNorm: 23.016906 Training: 2022-01-09 16:48:54,222-[agedb_30][202000]Accuracy-Flip: 0.98650+-0.00545 Training: 2022-01-09 16:48:54,222-[agedb_30][202000]Accuracy-Highest: 0.98667 Training: 2022-01-09 16:49:01,733-Speed 278.91 samples/sec Loss 0.9049 LearningRate 0.0002 Epoch: 19 Global Step: 202010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:49:09,159-Speed 5516.67 samples/sec Loss 0.9177 LearningRate 0.0002 Epoch: 19 Global Step: 202020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:49:16,505-Speed 5576.36 samples/sec Loss 0.8849 LearningRate 0.0002 Epoch: 19 Global Step: 202030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:49:23,930-Speed 5517.79 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 202040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:49:31,364-Speed 5510.51 samples/sec Loss 0.8963 LearningRate 0.0002 Epoch: 19 Global Step: 202050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:49:38,811-Speed 5500.64 samples/sec Loss 0.9061 LearningRate 0.0002 Epoch: 19 Global Step: 202060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:49:46,230-Speed 5521.17 samples/sec Loss 0.8965 LearningRate 0.0002 Epoch: 19 Global Step: 202070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:49:53,648-Speed 5522.96 samples/sec Loss 0.9039 LearningRate 0.0002 Epoch: 19 Global Step: 202080 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-01-09 16:50:01,101-Speed 5496.71 samples/sec Loss 0.9098 LearningRate 0.0002 Epoch: 19 Global Step: 202090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:50:08,599-Speed 5463.10 samples/sec Loss 0.9102 LearningRate 0.0002 Epoch: 19 Global Step: 202100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:50:15,993-Speed 5540.32 samples/sec Loss 0.9030 LearningRate 0.0002 Epoch: 19 Global Step: 202110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:50:23,410-Speed 5523.42 samples/sec Loss 0.9090 LearningRate 0.0002 Epoch: 19 Global Step: 202120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:50:30,843-Speed 5511.60 samples/sec Loss 0.8934 LearningRate 0.0002 Epoch: 19 Global Step: 202130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:50:38,236-Speed 5540.83 samples/sec Loss 0.8964 LearningRate 0.0002 Epoch: 19 Global Step: 202140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:50:45,635-Speed 5536.84 samples/sec Loss 0.9146 LearningRate 0.0002 Epoch: 19 Global Step: 202150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:50:53,076-Speed 5504.95 samples/sec Loss 0.9142 LearningRate 0.0002 Epoch: 19 Global Step: 202160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:00,473-Speed 5538.61 samples/sec Loss 0.8926 LearningRate 0.0002 Epoch: 19 Global Step: 202170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:07,970-Speed 5464.10 samples/sec Loss 0.9141 LearningRate 0.0002 Epoch: 19 Global Step: 202180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:15,526-Speed 5422.04 samples/sec Loss 0.9199 LearningRate 0.0002 Epoch: 19 Global Step: 202190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:22,925-Speed 5536.30 samples/sec Loss 0.8716 LearningRate 0.0002 Epoch: 19 Global Step: 202200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 
16:51:30,367-Speed 5504.59 samples/sec Loss 0.9106 LearningRate 0.0002 Epoch: 19 Global Step: 202210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:37,802-Speed 5510.32 samples/sec Loss 0.9126 LearningRate 0.0002 Epoch: 19 Global Step: 202220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:45,176-Speed 5555.01 samples/sec Loss 0.8959 LearningRate 0.0002 Epoch: 19 Global Step: 202230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:51:52,589-Speed 5526.21 samples/sec Loss 0.9003 LearningRate 0.0002 Epoch: 19 Global Step: 202240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:00,093-Speed 5459.44 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 202250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:07,678-Speed 5401.08 samples/sec Loss 0.8955 LearningRate 0.0002 Epoch: 19 Global Step: 202260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:15,259-Speed 5403.29 samples/sec Loss 0.8878 LearningRate 0.0002 Epoch: 19 Global Step: 202270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:22,834-Speed 5407.75 samples/sec Loss 0.8871 LearningRate 0.0002 Epoch: 19 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:30,476-Speed 5360.83 samples/sec Loss 0.8940 LearningRate 0.0002 Epoch: 19 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:38,017-Speed 5432.50 samples/sec Loss 0.8934 LearningRate 0.0002 Epoch: 19 Global Step: 202300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:45,552-Speed 5436.79 samples/sec Loss 0.8864 LearningRate 0.0002 Epoch: 19 Global Step: 202310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:52:53,027-Speed 5480.15 samples/sec Loss 0.9013 LearningRate 0.0002 Epoch: 19 Global Step: 202320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:00,540-Speed 5453.30 samples/sec Loss 
0.8878 LearningRate 0.0002 Epoch: 19 Global Step: 202330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:07,951-Speed 5527.30 samples/sec Loss 0.9065 LearningRate 0.0002 Epoch: 19 Global Step: 202340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:15,440-Speed 5470.36 samples/sec Loss 0.9071 LearningRate 0.0002 Epoch: 19 Global Step: 202350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:23,005-Speed 5414.99 samples/sec Loss 0.9082 LearningRate 0.0002 Epoch: 19 Global Step: 202360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:30,453-Speed 5499.97 samples/sec Loss 0.8912 LearningRate 0.0002 Epoch: 19 Global Step: 202370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:37,891-Speed 5507.60 samples/sec Loss 0.9023 LearningRate 0.0002 Epoch: 19 Global Step: 202380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:45,450-Speed 5419.45 samples/sec Loss 0.8863 LearningRate 0.0002 Epoch: 19 Global Step: 202390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:53:52,904-Speed 5496.13 samples/sec Loss 0.8953 LearningRate 0.0002 Epoch: 19 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:54:00,387-Speed 5474.15 samples/sec Loss 0.8933 LearningRate 0.0002 Epoch: 19 Global Step: 202410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:54:07,940-Speed 5424.27 samples/sec Loss 0.8926 LearningRate 0.0002 Epoch: 19 Global Step: 202420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:54:15,434-Speed 5465.88 samples/sec Loss 0.8914 LearningRate 0.0002 Epoch: 19 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:54:22,908-Speed 5481.38 samples/sec Loss 0.8977 LearningRate 0.0002 Epoch: 19 Global Step: 202440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:54:30,498-Speed 5396.99 samples/sec Loss 0.8745 LearningRate 0.0002 Epoch: 19 Global 
Step: 202450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:54:37,942-Speed 5503.57 samples/sec Loss 0.9100 LearningRate 0.0002 Epoch: 19 Global Step: 202460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:54:45,321-Speed 5551.34 samples/sec Loss 0.8889 LearningRate 0.0002 Epoch: 19 Global Step: 202470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:54:52,754-Speed 5511.06 samples/sec Loss 0.8929 LearningRate 0.0002 Epoch: 19 Global Step: 202480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:55:00,321-Speed 5413.97 samples/sec Loss 0.9052 LearningRate 0.0002 Epoch: 19 Global Step: 202490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:55:07,775-Speed 5496.32 samples/sec Loss 0.9008 LearningRate 0.0002 Epoch: 19 Global Step: 202500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:55:15,231-Speed 5493.98 samples/sec Loss 0.9021 LearningRate 0.0002 Epoch: 19 Global Step: 202510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:55:22,717-Speed 5472.41 samples/sec Loss 0.8921 LearningRate 0.0002 Epoch: 19 Global Step: 202520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:55:30,112-Speed 5540.00 samples/sec Loss 0.8821 LearningRate 0.0002 Epoch: 19 Global Step: 202530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:55:37,615-Speed 5460.11 samples/sec Loss 0.8993 LearningRate 0.0002 Epoch: 19 Global Step: 202540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:55:45,230-Speed 5379.48 samples/sec Loss 0.9106 LearningRate 0.0002 Epoch: 19 Global Step: 202550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:55:52,701-Speed 5482.75 samples/sec Loss 0.8749 LearningRate 0.0002 Epoch: 19 Global Step: 202560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:56:00,152-Speed 5497.97 samples/sec Loss 0.9219 LearningRate 0.0002 Epoch: 19 Global Step: 202570 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-09 16:56:07,526-Speed 5555.69 samples/sec Loss 0.8875 LearningRate 0.0002 Epoch: 19 Global Step: 202580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:56:14,967-Speed 5505.86 samples/sec Loss 0.8842 LearningRate 0.0002 Epoch: 19 Global Step: 202590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:56:22,396-Speed 5514.06 samples/sec Loss 0.9100 LearningRate 0.0002 Epoch: 19 Global Step: 202600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:56:29,848-Speed 5497.00 samples/sec Loss 0.8823 LearningRate 0.0002 Epoch: 19 Global Step: 202610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:56:37,278-Speed 5513.92 samples/sec Loss 0.8958 LearningRate 0.0002 Epoch: 19 Global Step: 202620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 16:56:44,686-Speed 5529.70 samples/sec Loss 0.8882 LearningRate 0.0002 Epoch: 19 Global Step: 202630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:56:52,185-Speed 5463.05 samples/sec Loss 0.9005 LearningRate 0.0002 Epoch: 19 Global Step: 202640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:56:59,641-Speed 5494.36 samples/sec Loss 0.9154 LearningRate 0.0002 Epoch: 19 Global Step: 202650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:07,091-Speed 5498.38 samples/sec Loss 0.9056 LearningRate 0.0002 Epoch: 19 Global Step: 202660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:14,524-Speed 5511.82 samples/sec Loss 0.8791 LearningRate 0.0002 Epoch: 19 Global Step: 202670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:22,004-Speed 5476.28 samples/sec Loss 0.8767 LearningRate 0.0002 Epoch: 19 Global Step: 202680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:29,455-Speed 5498.17 samples/sec Loss 0.8890 LearningRate 0.0002 Epoch: 19 Global Step: 202690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
16:57:36,904-Speed 5498.95 samples/sec Loss 0.9097 LearningRate 0.0002 Epoch: 19 Global Step: 202700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:44,335-Speed 5512.80 samples/sec Loss 0.8983 LearningRate 0.0002 Epoch: 19 Global Step: 202710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:51,779-Speed 5503.46 samples/sec Loss 0.8848 LearningRate 0.0002 Epoch: 19 Global Step: 202720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:57:59,211-Speed 5511.77 samples/sec Loss 0.8799 LearningRate 0.0002 Epoch: 19 Global Step: 202730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:58:06,583-Speed 5557.15 samples/sec Loss 0.9005 LearningRate 0.0002 Epoch: 19 Global Step: 202740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:14,059-Speed 5480.16 samples/sec Loss 0.8926 LearningRate 0.0002 Epoch: 19 Global Step: 202750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:21,671-Speed 5381.37 samples/sec Loss 0.8974 LearningRate 0.0002 Epoch: 19 Global Step: 202760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:29,201-Speed 5440.38 samples/sec Loss 0.8852 LearningRate 0.0002 Epoch: 19 Global Step: 202770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:36,662-Speed 5490.65 samples/sec Loss 0.9158 LearningRate 0.0002 Epoch: 19 Global Step: 202780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:44,050-Speed 5545.14 samples/sec Loss 0.8871 LearningRate 0.0002 Epoch: 19 Global Step: 202790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:51,471-Speed 5520.09 samples/sec Loss 0.8974 LearningRate 0.0002 Epoch: 19 Global Step: 202800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:58:58,869-Speed 5536.81 samples/sec Loss 0.8922 LearningRate 0.0002 Epoch: 19 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:59:06,283-Speed 5525.52 samples/sec Loss 
0.8884 LearningRate 0.0002 Epoch: 19 Global Step: 202820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:59:13,666-Speed 5549.15 samples/sec Loss 0.8842 LearningRate 0.0002 Epoch: 19 Global Step: 202830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:59:21,096-Speed 5513.83 samples/sec Loss 0.9004 LearningRate 0.0002 Epoch: 19 Global Step: 202840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:59:28,505-Speed 5528.74 samples/sec Loss 0.8940 LearningRate 0.0002 Epoch: 19 Global Step: 202850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:59:35,915-Speed 5528.66 samples/sec Loss 0.8778 LearningRate 0.0002 Epoch: 19 Global Step: 202860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 16:59:43,323-Speed 5530.03 samples/sec Loss 0.8954 LearningRate 0.0002 Epoch: 19 Global Step: 202870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:59:50,728-Speed 5531.80 samples/sec Loss 0.8971 LearningRate 0.0002 Epoch: 19 Global Step: 202880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 16:59:58,145-Speed 5523.46 samples/sec Loss 0.8997 LearningRate 0.0002 Epoch: 19 Global Step: 202890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:05,568-Speed 5518.35 samples/sec Loss 0.8947 LearningRate 0.0002 Epoch: 19 Global Step: 202900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:13,042-Speed 5481.67 samples/sec Loss 0.9007 LearningRate 0.0002 Epoch: 19 Global Step: 202910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:20,448-Speed 5531.26 samples/sec Loss 0.9022 LearningRate 0.0002 Epoch: 19 Global Step: 202920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:27,951-Speed 5459.28 samples/sec Loss 0.8779 LearningRate 0.0002 Epoch: 19 Global Step: 202930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:35,432-Speed 5476.43 samples/sec Loss 0.9000 LearningRate 0.0002 Epoch: 19 Global 
Step: 202940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:42,818-Speed 5545.71 samples/sec Loss 0.8900 LearningRate 0.0002 Epoch: 19 Global Step: 202950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:50,223-Speed 5532.20 samples/sec Loss 0.8772 LearningRate 0.0002 Epoch: 19 Global Step: 202960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:00:57,696-Speed 5482.45 samples/sec Loss 0.8923 LearningRate 0.0002 Epoch: 19 Global Step: 202970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:01:05,144-Speed 5499.64 samples/sec Loss 0.8794 LearningRate 0.0001 Epoch: 19 Global Step: 202980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:12,539-Speed 5540.09 samples/sec Loss 0.9116 LearningRate 0.0001 Epoch: 19 Global Step: 202990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:19,973-Speed 5510.85 samples/sec Loss 0.9037 LearningRate 0.0001 Epoch: 19 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:27,482-Speed 5455.02 samples/sec Loss 0.9107 LearningRate 0.0001 Epoch: 19 Global Step: 203010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:34,886-Speed 5532.57 samples/sec Loss 0.8879 LearningRate 0.0001 Epoch: 19 Global Step: 203020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:42,371-Speed 5473.52 samples/sec Loss 0.8924 LearningRate 0.0001 Epoch: 19 Global Step: 203030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:49,836-Speed 5487.97 samples/sec Loss 0.9057 LearningRate 0.0001 Epoch: 19 Global Step: 203040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:01:57,337-Speed 5461.34 samples/sec Loss 0.8850 LearningRate 0.0001 Epoch: 19 Global Step: 203050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:02:04,888-Speed 5424.62 samples/sec Loss 0.8992 LearningRate 0.0001 Epoch: 19 Global Step: 203060 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 17:02:12,361-Speed 5482.16 samples/sec Loss 0.8959 LearningRate 0.0001 Epoch: 19 Global Step: 203070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:02:19,785-Speed 5518.04 samples/sec Loss 0.8807 LearningRate 0.0001 Epoch: 19 Global Step: 203080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:02:27,349-Speed 5416.08 samples/sec Loss 0.9052 LearningRate 0.0001 Epoch: 19 Global Step: 203090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:02:34,788-Speed 5506.53 samples/sec Loss 0.8939 LearningRate 0.0001 Epoch: 19 Global Step: 203100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:02:42,261-Speed 5481.84 samples/sec Loss 0.8883 LearningRate 0.0001 Epoch: 19 Global Step: 203110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:02:49,772-Speed 5454.11 samples/sec Loss 0.9016 LearningRate 0.0001 Epoch: 19 Global Step: 203120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:02:57,288-Speed 5450.44 samples/sec Loss 0.8749 LearningRate 0.0001 Epoch: 19 Global Step: 203130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:03:04,742-Speed 5495.58 samples/sec Loss 0.8985 LearningRate 0.0001 Epoch: 19 Global Step: 203140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:03:12,207-Speed 5488.29 samples/sec Loss 0.8773 LearningRate 0.0001 Epoch: 19 Global Step: 203150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:03:19,706-Speed 5462.99 samples/sec Loss 0.8911 LearningRate 0.0001 Epoch: 19 Global Step: 203160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:03:27,099-Speed 5540.91 samples/sec Loss 0.9058 LearningRate 0.0001 Epoch: 19 Global Step: 203170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:03:34,639-Speed 5433.15 samples/sec Loss 0.8898 LearningRate 0.0001 Epoch: 19 Global Step: 203180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 
17:03:42,118-Speed 5477.31 samples/sec Loss 0.8804 LearningRate 0.0001 Epoch: 19 Global Step: 203190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:03:49,525-Speed 5530.86 samples/sec Loss 0.8778 LearningRate 0.0001 Epoch: 19 Global Step: 203200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:03:57,061-Speed 5436.18 samples/sec Loss 0.8922 LearningRate 0.0001 Epoch: 19 Global Step: 203210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:04:04,538-Speed 5478.48 samples/sec Loss 0.8911 LearningRate 0.0001 Epoch: 19 Global Step: 203220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:04:11,982-Speed 5503.18 samples/sec Loss 0.8962 LearningRate 0.0001 Epoch: 19 Global Step: 203230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:04:19,468-Speed 5472.44 samples/sec Loss 0.8961 LearningRate 0.0001 Epoch: 19 Global Step: 203240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:04:26,917-Speed 5499.57 samples/sec Loss 0.8909 LearningRate 0.0001 Epoch: 19 Global Step: 203250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:04:34,390-Speed 5481.46 samples/sec Loss 0.8945 LearningRate 0.0001 Epoch: 19 Global Step: 203260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:04:41,824-Speed 5510.32 samples/sec Loss 0.9078 LearningRate 0.0001 Epoch: 19 Global Step: 203270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:04:49,313-Speed 5470.38 samples/sec Loss 0.9100 LearningRate 0.0001 Epoch: 19 Global Step: 203280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:04:56,766-Speed 5497.12 samples/sec Loss 0.8909 LearningRate 0.0001 Epoch: 19 Global Step: 203290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:05:04,198-Speed 5511.75 samples/sec Loss 0.8863 LearningRate 0.0001 Epoch: 19 Global Step: 203300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:05:11,760-Speed 5416.87 samples/sec Loss 
0.8948 LearningRate 0.0001 Epoch: 19 Global Step: 203310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:05:19,264-Speed 5459.36 samples/sec Loss 0.8942 LearningRate 0.0001 Epoch: 19 Global Step: 203320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:05:26,701-Speed 5508.44 samples/sec Loss 0.8836 LearningRate 0.0001 Epoch: 19 Global Step: 203330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:05:34,140-Speed 5506.79 samples/sec Loss 0.8892 LearningRate 0.0001 Epoch: 19 Global Step: 203340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:05:41,546-Speed 5531.75 samples/sec Loss 0.9046 LearningRate 0.0001 Epoch: 19 Global Step: 203350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:05:48,937-Speed 5542.18 samples/sec Loss 0.8933 LearningRate 0.0001 Epoch: 19 Global Step: 203360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:05:56,383-Speed 5501.97 samples/sec Loss 0.8817 LearningRate 0.0001 Epoch: 19 Global Step: 203370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:03,834-Speed 5497.99 samples/sec Loss 0.8851 LearningRate 0.0001 Epoch: 19 Global Step: 203380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:11,251-Speed 5522.77 samples/sec Loss 0.9270 LearningRate 0.0001 Epoch: 19 Global Step: 203390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:18,672-Speed 5520.22 samples/sec Loss 0.8762 LearningRate 0.0001 Epoch: 19 Global Step: 203400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:26,175-Speed 5460.34 samples/sec Loss 0.8844 LearningRate 0.0001 Epoch: 19 Global Step: 203410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:33,699-Speed 5444.74 samples/sec Loss 0.8965 LearningRate 0.0001 Epoch: 19 Global Step: 203420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:41,113-Speed 5525.12 samples/sec Loss 0.8711 LearningRate 0.0001 Epoch: 19 Global 
Step: 203430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:48,647-Speed 5437.84 samples/sec Loss 0.9013 LearningRate 0.0001 Epoch: 19 Global Step: 203440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:06:56,036-Speed 5543.56 samples/sec Loss 0.8942 LearningRate 0.0001 Epoch: 19 Global Step: 203450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:03,418-Speed 5550.16 samples/sec Loss 0.8803 LearningRate 0.0001 Epoch: 19 Global Step: 203460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:10,845-Speed 5515.49 samples/sec Loss 0.8791 LearningRate 0.0001 Epoch: 19 Global Step: 203470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:07:18,269-Speed 5517.34 samples/sec Loss 0.8866 LearningRate 0.0001 Epoch: 19 Global Step: 203480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:25,724-Speed 5495.27 samples/sec Loss 0.8895 LearningRate 0.0001 Epoch: 19 Global Step: 203490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:33,158-Speed 5510.63 samples/sec Loss 0.8914 LearningRate 0.0001 Epoch: 19 Global Step: 203500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:40,515-Speed 5568.67 samples/sec Loss 0.9066 LearningRate 0.0001 Epoch: 19 Global Step: 203510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:47,953-Speed 5507.36 samples/sec Loss 0.8851 LearningRate 0.0001 Epoch: 19 Global Step: 203520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:07:55,485-Speed 5438.55 samples/sec Loss 0.8701 LearningRate 0.0001 Epoch: 19 Global Step: 203530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:08:03,111-Speed 5372.06 samples/sec Loss 0.9103 LearningRate 0.0001 Epoch: 19 Global Step: 203540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:08:10,616-Speed 5458.86 samples/sec Loss 0.8814 LearningRate 0.0001 Epoch: 19 Global Step: 203550 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 17:08:17,989-Speed 5555.45 samples/sec Loss 0.8642 LearningRate 0.0001 Epoch: 19 Global Step: 203560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:08:25,349-Speed 5566.52 samples/sec Loss 0.8812 LearningRate 0.0001 Epoch: 19 Global Step: 203570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:08:32,790-Speed 5505.55 samples/sec Loss 0.8753 LearningRate 0.0001 Epoch: 19 Global Step: 203580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:08:40,210-Speed 5520.68 samples/sec Loss 0.8789 LearningRate 0.0001 Epoch: 19 Global Step: 203590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:08:47,693-Speed 5474.02 samples/sec Loss 0.8849 LearningRate 0.0001 Epoch: 19 Global Step: 203600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:08:55,079-Speed 5547.03 samples/sec Loss 0.8757 LearningRate 0.0001 Epoch: 19 Global Step: 203610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:02,502-Speed 5518.22 samples/sec Loss 0.8763 LearningRate 0.0001 Epoch: 19 Global Step: 203620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:09,962-Speed 5491.37 samples/sec Loss 0.8928 LearningRate 0.0001 Epoch: 19 Global Step: 203630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:17,389-Speed 5516.00 samples/sec Loss 0.9077 LearningRate 0.0001 Epoch: 19 Global Step: 203640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:24,828-Speed 5506.93 samples/sec Loss 0.8903 LearningRate 0.0001 Epoch: 19 Global Step: 203650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:32,189-Speed 5564.99 samples/sec Loss 0.8745 LearningRate 0.0001 Epoch: 19 Global Step: 203660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:39,555-Speed 5561.66 samples/sec Loss 0.8954 LearningRate 0.0001 Epoch: 19 Global Step: 203670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
17:09:46,941-Speed 5546.46 samples/sec Loss 0.8806 LearningRate 0.0001 Epoch: 19 Global Step: 203680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:09:54,366-Speed 5516.94 samples/sec Loss 0.8965 LearningRate 0.0001 Epoch: 19 Global Step: 203690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:01,753-Speed 5545.72 samples/sec Loss 0.8806 LearningRate 0.0001 Epoch: 19 Global Step: 203700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:09,278-Speed 5443.83 samples/sec Loss 0.8887 LearningRate 0.0001 Epoch: 19 Global Step: 203710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:16,733-Speed 5495.50 samples/sec Loss 0.8941 LearningRate 0.0001 Epoch: 19 Global Step: 203720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:24,126-Speed 5541.34 samples/sec Loss 0.8974 LearningRate 0.0001 Epoch: 19 Global Step: 203730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:31,568-Speed 5504.59 samples/sec Loss 0.8954 LearningRate 0.0001 Epoch: 19 Global Step: 203740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:38,967-Speed 5536.84 samples/sec Loss 0.8879 LearningRate 0.0001 Epoch: 19 Global Step: 203750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:46,379-Speed 5527.22 samples/sec Loss 0.8890 LearningRate 0.0001 Epoch: 19 Global Step: 203760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:10:53,784-Speed 5532.26 samples/sec Loss 0.8938 LearningRate 0.0001 Epoch: 19 Global Step: 203770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:11:01,211-Speed 5515.16 samples/sec Loss 0.8905 LearningRate 0.0001 Epoch: 19 Global Step: 203780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:11:08,615-Speed 5532.86 samples/sec Loss 0.8970 LearningRate 0.0001 Epoch: 19 Global Step: 203790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:11:15,991-Speed 5554.91 samples/sec Loss 
0.8819 LearningRate 0.0001 Epoch: 19 Global Step: 203800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:11:23,454-Speed 5488.83 samples/sec Loss 0.8842 LearningRate 0.0001 Epoch: 19 Global Step: 203810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:11:31,162-Speed 5314.59 samples/sec Loss 0.8841 LearningRate 0.0001 Epoch: 19 Global Step: 203820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:11:38,620-Speed 5492.41 samples/sec Loss 0.8829 LearningRate 0.0001 Epoch: 19 Global Step: 203830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:11:46,030-Speed 5529.32 samples/sec Loss 0.8627 LearningRate 0.0001 Epoch: 19 Global Step: 203840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:11:53,484-Speed 5495.19 samples/sec Loss 0.9094 LearningRate 0.0001 Epoch: 19 Global Step: 203850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:00,892-Speed 5530.57 samples/sec Loss 0.8778 LearningRate 0.0001 Epoch: 19 Global Step: 203860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:08,283-Speed 5542.19 samples/sec Loss 0.8858 LearningRate 0.0001 Epoch: 19 Global Step: 203870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:15,709-Speed 5516.82 samples/sec Loss 0.8874 LearningRate 0.0001 Epoch: 19 Global Step: 203880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:23,178-Speed 5484.57 samples/sec Loss 0.8923 LearningRate 0.0001 Epoch: 19 Global Step: 203890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:30,734-Speed 5421.41 samples/sec Loss 0.8811 LearningRate 0.0001 Epoch: 19 Global Step: 203900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:38,251-Speed 5450.29 samples/sec Loss 0.8777 LearningRate 0.0001 Epoch: 19 Global Step: 203910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:12:45,674-Speed 5518.62 samples/sec Loss 0.8867 LearningRate 0.0001 Epoch: 19 Global 
Step: 203920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:12:53,083-Speed 5528.76 samples/sec Loss 0.8760 LearningRate 0.0001 Epoch: 19 Global Step: 203930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:13:00,583-Speed 5462.72 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global Step: 203940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:13:08,030-Speed 5500.54 samples/sec Loss 0.8942 LearningRate 0.0001 Epoch: 19 Global Step: 203950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:13:15,472-Speed 5504.22 samples/sec Loss 0.8676 LearningRate 0.0001 Epoch: 19 Global Step: 203960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:13:22,901-Speed 5514.81 samples/sec Loss 0.8745 LearningRate 0.0001 Epoch: 19 Global Step: 203970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:13:30,473-Speed 5410.26 samples/sec Loss 0.8820 LearningRate 0.0001 Epoch: 19 Global Step: 203980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:13:37,922-Speed 5499.30 samples/sec Loss 0.9134 LearningRate 0.0001 Epoch: 19 Global Step: 203990 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:13:45,333-Speed 5527.43 samples/sec Loss 0.8809 LearningRate 0.0001 Epoch: 19 Global Step: 204000 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:14:29,924-[lfw][204000]XNorm: 22.147161 Training: 2022-01-09 17:14:29,925-[lfw][204000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 17:14:29,926-[lfw][204000]Accuracy-Highest: 0.99850 Training: 2022-01-09 17:15:21,191-[cfp_fp][204000]XNorm: 22.013811 Training: 2022-01-09 17:15:21,192-[cfp_fp][204000]Accuracy-Flip: 0.99400+-0.00343 Training: 2022-01-09 17:15:21,193-[cfp_fp][204000]Accuracy-Highest: 0.99443 Training: 2022-01-09 17:16:05,085-[agedb_30][204000]XNorm: 22.905120 Training: 2022-01-09 17:16:05,085-[agedb_30][204000]Accuracy-Flip: 0.98683+-0.00529 Training: 2022-01-09 
17:16:05,086-[agedb_30][204000]Accuracy-Highest: 0.98683 Training: 2022-01-09 17:16:12,689-Speed 277.97 samples/sec Loss 0.8988 LearningRate 0.0001 Epoch: 19 Global Step: 204010 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:16:20,190-Speed 5461.43 samples/sec Loss 0.8793 LearningRate 0.0001 Epoch: 19 Global Step: 204020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:16:27,631-Speed 5504.90 samples/sec Loss 0.8764 LearningRate 0.0001 Epoch: 19 Global Step: 204030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:16:35,131-Speed 5462.16 samples/sec Loss 0.8731 LearningRate 0.0001 Epoch: 19 Global Step: 204040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:16:42,621-Speed 5469.62 samples/sec Loss 0.8792 LearningRate 0.0001 Epoch: 19 Global Step: 204050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:16:50,097-Speed 5479.02 samples/sec Loss 0.8592 LearningRate 0.0001 Epoch: 19 Global Step: 204060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:16:57,511-Speed 5526.02 samples/sec Loss 0.8968 LearningRate 0.0001 Epoch: 19 Global Step: 204070 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:17:05,034-Speed 5445.38 samples/sec Loss 0.8976 LearningRate 0.0001 Epoch: 19 Global Step: 204080 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-01-09 17:17:12,574-Speed 5433.26 samples/sec Loss 0.8697 LearningRate 0.0001 Epoch: 19 Global Step: 204090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:17:19,980-Speed 5530.92 samples/sec Loss 0.8838 LearningRate 0.0001 Epoch: 19 Global Step: 204100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:17:27,419-Speed 5507.39 samples/sec Loss 0.8800 LearningRate 0.0001 Epoch: 19 Global Step: 204110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:17:34,813-Speed 5539.96 samples/sec Loss 0.8955 LearningRate 0.0001 Epoch: 19 Global Step: 204120 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-01-09 17:17:42,345-Speed 5439.20 samples/sec Loss 0.8895 LearningRate 0.0001 Epoch: 19 Global Step: 204130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:17:49,936-Speed 5396.21 samples/sec Loss 0.9062 LearningRate 0.0001 Epoch: 19 Global Step: 204140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:17:57,398-Speed 5490.52 samples/sec Loss 0.8984 LearningRate 0.0001 Epoch: 19 Global Step: 204150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:18:05,004-Speed 5385.54 samples/sec Loss 0.8869 LearningRate 0.0001 Epoch: 19 Global Step: 204160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:18:12,431-Speed 5515.58 samples/sec Loss 0.9113 LearningRate 0.0001 Epoch: 19 Global Step: 204170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:18:19,868-Speed 5508.43 samples/sec Loss 0.8836 LearningRate 0.0001 Epoch: 19 Global Step: 204180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:18:27,282-Speed 5525.29 samples/sec Loss 0.8772 LearningRate 0.0001 Epoch: 19 Global Step: 204190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:18:34,763-Speed 5476.13 samples/sec Loss 0.8826 LearningRate 0.0001 Epoch: 19 Global Step: 204200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:18:42,262-Speed 5462.84 samples/sec Loss 0.8780 LearningRate 0.0001 Epoch: 19 Global Step: 204210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:18:49,690-Speed 5515.07 samples/sec Loss 0.8842 LearningRate 0.0001 Epoch: 19 Global Step: 204220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:18:57,334-Speed 5359.11 samples/sec Loss 0.8800 LearningRate 0.0001 Epoch: 19 Global Step: 204230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:19:05,003-Speed 5342.19 samples/sec Loss 0.8859 LearningRate 0.0001 Epoch: 19 Global Step: 204240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
17:19:12,630-Speed 5370.93 samples/sec Loss 0.8778 LearningRate 0.0001 Epoch: 19 Global Step: 204250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:19:20,166-Speed 5435.70 samples/sec Loss 0.9080 LearningRate 0.0001 Epoch: 19 Global Step: 204260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:19:27,689-Speed 5445.25 samples/sec Loss 0.8810 LearningRate 0.0001 Epoch: 19 Global Step: 204270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:19:35,098-Speed 5529.77 samples/sec Loss 0.8844 LearningRate 0.0001 Epoch: 19 Global Step: 204280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:19:42,645-Speed 5428.00 samples/sec Loss 0.8825 LearningRate 0.0001 Epoch: 19 Global Step: 204290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:19:50,187-Speed 5431.17 samples/sec Loss 0.8736 LearningRate 0.0001 Epoch: 19 Global Step: 204300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:19:57,766-Speed 5405.24 samples/sec Loss 0.8839 LearningRate 0.0001 Epoch: 19 Global Step: 204310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:05,244-Speed 5478.82 samples/sec Loss 0.8948 LearningRate 0.0001 Epoch: 19 Global Step: 204320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:12,768-Speed 5444.15 samples/sec Loss 0.8945 LearningRate 0.0001 Epoch: 19 Global Step: 204330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:20,334-Speed 5414.13 samples/sec Loss 0.8594 LearningRate 0.0001 Epoch: 19 Global Step: 204340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:27,791-Speed 5494.04 samples/sec Loss 0.8775 LearningRate 0.0001 Epoch: 19 Global Step: 204350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:35,280-Speed 5470.33 samples/sec Loss 0.8870 LearningRate 0.0001 Epoch: 19 Global Step: 204360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:42,746-Speed 5487.16 samples/sec Loss 
0.8786 LearningRate 0.0001 Epoch: 19 Global Step: 204370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:20:50,306-Speed 5418.15 samples/sec Loss 0.8779 LearningRate 0.0001 Epoch: 19 Global Step: 204380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:20:57,946-Speed 5362.47 samples/sec Loss 0.8733 LearningRate 0.0001 Epoch: 19 Global Step: 204390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:05,485-Speed 5433.35 samples/sec Loss 0.8783 LearningRate 0.0001 Epoch: 19 Global Step: 204400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:13,037-Speed 5424.74 samples/sec Loss 0.8712 LearningRate 0.0001 Epoch: 19 Global Step: 204410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:20,424-Speed 5545.55 samples/sec Loss 0.8867 LearningRate 0.0001 Epoch: 19 Global Step: 204420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:27,845-Speed 5520.35 samples/sec Loss 0.8768 LearningRate 0.0001 Epoch: 19 Global Step: 204430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:35,322-Speed 5479.11 samples/sec Loss 0.8988 LearningRate 0.0001 Epoch: 19 Global Step: 204440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:42,742-Speed 5521.02 samples/sec Loss 0.8720 LearningRate 0.0001 Epoch: 19 Global Step: 204450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:50,142-Speed 5536.05 samples/sec Loss 0.8650 LearningRate 0.0001 Epoch: 19 Global Step: 204460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:21:57,550-Speed 5529.58 samples/sec Loss 0.8710 LearningRate 0.0001 Epoch: 19 Global Step: 204470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:22:04,997-Speed 5500.64 samples/sec Loss 0.8772 LearningRate 0.0001 Epoch: 19 Global Step: 204480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:22:12,370-Speed 5556.69 samples/sec Loss 0.8852 LearningRate 0.0001 Epoch: 19 Global 
Step: 204490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:22:19,793-Speed 5518.43 samples/sec Loss 0.8725 LearningRate 0.0001 Epoch: 19 Global Step: 204500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:22:27,174-Speed 5550.41 samples/sec Loss 0.8834 LearningRate 0.0001 Epoch: 19 Global Step: 204510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:22:34,566-Speed 5542.05 samples/sec Loss 0.8712 LearningRate 0.0001 Epoch: 19 Global Step: 204520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:22:41,953-Speed 5545.61 samples/sec Loss 0.8730 LearningRate 0.0001 Epoch: 19 Global Step: 204530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:22:49,382-Speed 5514.56 samples/sec Loss 0.8645 LearningRate 0.0001 Epoch: 19 Global Step: 204540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:22:56,756-Speed 5554.69 samples/sec Loss 0.8924 LearningRate 0.0001 Epoch: 19 Global Step: 204550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:04,211-Speed 5495.53 samples/sec Loss 0.8920 LearningRate 0.0001 Epoch: 19 Global Step: 204560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:11,660-Speed 5499.73 samples/sec Loss 0.8782 LearningRate 0.0001 Epoch: 19 Global Step: 204570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:19,049-Speed 5544.42 samples/sec Loss 0.8830 LearningRate 0.0001 Epoch: 19 Global Step: 204580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:26,428-Speed 5551.09 samples/sec Loss 0.8993 LearningRate 0.0001 Epoch: 19 Global Step: 204590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:33,843-Speed 5525.11 samples/sec Loss 0.8736 LearningRate 0.0001 Epoch: 19 Global Step: 204600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:41,226-Speed 5548.61 samples/sec Loss 0.8848 LearningRate 0.0001 Epoch: 19 Global Step: 204610 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 17:23:48,596-Speed 5558.70 samples/sec Loss 0.8912 LearningRate 0.0001 Epoch: 19 Global Step: 204620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:23:55,999-Speed 5533.52 samples/sec Loss 0.8749 LearningRate 0.0001 Epoch: 19 Global Step: 204630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:03,415-Speed 5523.58 samples/sec Loss 0.8859 LearningRate 0.0001 Epoch: 19 Global Step: 204640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:10,861-Speed 5502.49 samples/sec Loss 0.8761 LearningRate 0.0001 Epoch: 19 Global Step: 204650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:18,428-Speed 5413.68 samples/sec Loss 0.8675 LearningRate 0.0001 Epoch: 19 Global Step: 204660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:25,912-Speed 5473.36 samples/sec Loss 0.8581 LearningRate 0.0001 Epoch: 19 Global Step: 204670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:33,398-Speed 5472.27 samples/sec Loss 0.8770 LearningRate 0.0001 Epoch: 19 Global Step: 204680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:40,967-Speed 5412.56 samples/sec Loss 0.8915 LearningRate 0.0001 Epoch: 19 Global Step: 204690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:48,380-Speed 5526.11 samples/sec Loss 0.8658 LearningRate 0.0001 Epoch: 19 Global Step: 204700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:24:55,782-Speed 5534.39 samples/sec Loss 0.8697 LearningRate 0.0001 Epoch: 19 Global Step: 204710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:25:03,195-Speed 5525.55 samples/sec Loss 0.8842 LearningRate 0.0001 Epoch: 19 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:25:10,705-Speed 5455.64 samples/sec Loss 0.8995 LearningRate 0.0001 Epoch: 19 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 
17:25:18,155-Speed 5498.62 samples/sec Loss 0.8620 LearningRate 0.0001 Epoch: 19 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:25:25,580-Speed 5517.34 samples/sec Loss 0.8601 LearningRate 0.0001 Epoch: 19 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:25:33,000-Speed 5520.65 samples/sec Loss 0.8710 LearningRate 0.0001 Epoch: 19 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:25:40,419-Speed 5522.11 samples/sec Loss 0.8693 LearningRate 0.0001 Epoch: 19 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:25:48,004-Speed 5401.12 samples/sec Loss 0.8733 LearningRate 0.0001 Epoch: 19 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:25:55,438-Speed 5510.68 samples/sec Loss 0.8732 LearningRate 0.0001 Epoch: 19 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:26:02,968-Speed 5439.80 samples/sec Loss 0.8783 LearningRate 0.0001 Epoch: 19 Global Step: 204800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:26:10,579-Speed 5382.92 samples/sec Loss 0.8740 LearningRate 0.0001 Epoch: 19 Global Step: 204810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:26:18,050-Speed 5483.25 samples/sec Loss 0.8823 LearningRate 0.0001 Epoch: 19 Global Step: 204820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:26:25,432-Speed 5549.22 samples/sec Loss 0.8743 LearningRate 0.0001 Epoch: 19 Global Step: 204830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:26:32,848-Speed 5524.31 samples/sec Loss 0.8624 LearningRate 0.0000 Epoch: 19 Global Step: 204840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:26:40,252-Speed 5533.07 samples/sec Loss 0.8630 LearningRate 0.0000 Epoch: 19 Global Step: 204850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:26:47,668-Speed 5524.07 samples/sec Loss 
0.8747 LearningRate 0.0000 Epoch: 19 Global Step: 204860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:26:55,103-Speed 5509.69 samples/sec Loss 0.8823 LearningRate 0.0000 Epoch: 19 Global Step: 204870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:27:02,503-Speed 5535.85 samples/sec Loss 0.8765 LearningRate 0.0000 Epoch: 19 Global Step: 204880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:27:09,949-Speed 5501.81 samples/sec Loss 0.8726 LearningRate 0.0000 Epoch: 19 Global Step: 204890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:27:17,350-Speed 5535.13 samples/sec Loss 0.8779 LearningRate 0.0000 Epoch: 19 Global Step: 204900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-09 17:27:24,821-Speed 5483.62 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 204910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:27:32,264-Speed 5504.05 samples/sec Loss 0.8711 LearningRate 0.0000 Epoch: 19 Global Step: 204920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:27:39,685-Speed 5520.14 samples/sec Loss 0.8924 LearningRate 0.0000 Epoch: 19 Global Step: 204930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:27:47,098-Speed 5526.54 samples/sec Loss 0.8768 LearningRate 0.0000 Epoch: 19 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:27:54,473-Speed 5554.46 samples/sec Loss 0.8773 LearningRate 0.0000 Epoch: 19 Global Step: 204950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:28:01,845-Speed 5556.77 samples/sec Loss 0.8897 LearningRate 0.0000 Epoch: 19 Global Step: 204960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:28:09,328-Speed 5474.30 samples/sec Loss 0.8648 LearningRate 0.0000 Epoch: 19 Global Step: 204970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:28:16,775-Speed 5501.19 samples/sec Loss 0.8846 LearningRate 0.0000 Epoch: 19 Global 
Step: 204980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:28:24,147-Speed 5557.20 samples/sec Loss 0.8826 LearningRate 0.0000 Epoch: 19 Global Step: 204990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:28:31,567-Speed 5520.89 samples/sec Loss 0.8522 LearningRate 0.0000 Epoch: 19 Global Step: 205000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:28:38,931-Speed 5562.68 samples/sec Loss 0.8786 LearningRate 0.0000 Epoch: 19 Global Step: 205010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:28:46,361-Speed 5513.70 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 19 Global Step: 205020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:28:53,823-Speed 5490.44 samples/sec Loss 0.8737 LearningRate 0.0000 Epoch: 19 Global Step: 205030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:29:01,383-Speed 5418.03 samples/sec Loss 0.8956 LearningRate 0.0000 Epoch: 19 Global Step: 205040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:29:08,796-Speed 5526.05 samples/sec Loss 0.8988 LearningRate 0.0000 Epoch: 19 Global Step: 205050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:29:16,195-Speed 5537.05 samples/sec Loss 0.8716 LearningRate 0.0000 Epoch: 19 Global Step: 205060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:29:23,732-Speed 5435.41 samples/sec Loss 0.8817 LearningRate 0.0000 Epoch: 19 Global Step: 205070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:29:31,138-Speed 5531.57 samples/sec Loss 0.8785 LearningRate 0.0000 Epoch: 19 Global Step: 205080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:29:38,549-Speed 5526.95 samples/sec Loss 0.8729 LearningRate 0.0000 Epoch: 19 Global Step: 205090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:29:45,989-Speed 5506.68 samples/sec Loss 0.8887 LearningRate 0.0000 Epoch: 19 Global Step: 205100 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-09 17:29:53,407-Speed 5523.08 samples/sec Loss 0.8892 LearningRate 0.0000 Epoch: 19 Global Step: 205110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:30:00,841-Speed 5510.19 samples/sec Loss 0.8931 LearningRate 0.0000 Epoch: 19 Global Step: 205120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:30:08,308-Speed 5486.29 samples/sec Loss 0.8691 LearningRate 0.0000 Epoch: 19 Global Step: 205130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:30:15,711-Speed 5533.42 samples/sec Loss 0.8733 LearningRate 0.0000 Epoch: 19 Global Step: 205140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:30:23,131-Speed 5521.14 samples/sec Loss 0.8981 LearningRate 0.0000 Epoch: 19 Global Step: 205150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-09 17:30:30,630-Speed 5463.23 samples/sec Loss 0.8789 LearningRate 0.0000 Epoch: 19 Global Step: 205160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:30:38,179-Speed 5426.62 samples/sec Loss 0.8819 LearningRate 0.0000 Epoch: 19 Global Step: 205170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:30:45,793-Speed 5380.53 samples/sec Loss 0.8946 LearningRate 0.0000 Epoch: 19 Global Step: 205180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:30:53,178-Speed 5547.06 samples/sec Loss 0.8788 LearningRate 0.0000 Epoch: 19 Global Step: 205190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-09 17:31:00,682-Speed 5459.04 samples/sec Loss 0.8711 LearningRate 0.0000 Epoch: 19 Global Step: 205200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:31:08,120-Speed 5507.54 samples/sec Loss 0.8586 LearningRate 0.0000 Epoch: 19 Global Step: 205210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:31:15,533-Speed 5525.82 samples/sec Loss 0.8618 LearningRate 0.0000 Epoch: 19 Global Step: 205220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 
17:31:22,963-Speed 5514.39 samples/sec Loss 0.8601 LearningRate 0.0000 Epoch: 19 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:31:30,518-Speed 5422.27 samples/sec Loss 0.8401 LearningRate 0.0000 Epoch: 19 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:31:37,966-Speed 5499.75 samples/sec Loss 0.8931 LearningRate 0.0000 Epoch: 19 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:31:45,436-Speed 5484.19 samples/sec Loss 0.8657 LearningRate 0.0000 Epoch: 19 Global Step: 205260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:31:52,889-Speed 5496.69 samples/sec Loss 0.8786 LearningRate 0.0000 Epoch: 19 Global Step: 205270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:32:00,334-Speed 5502.07 samples/sec Loss 0.8892 LearningRate 0.0000 Epoch: 19 Global Step: 205280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:32:07,895-Speed 5417.69 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 19 Global Step: 205290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:32:15,340-Speed 5502.97 samples/sec Loss 0.8787 LearningRate 0.0000 Epoch: 19 Global Step: 205300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:32:22,761-Speed 5520.16 samples/sec Loss 0.8745 LearningRate 0.0000 Epoch: 19 Global Step: 205310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:32:30,175-Speed 5525.71 samples/sec Loss 0.8938 LearningRate 0.0000 Epoch: 19 Global Step: 205320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:32:37,604-Speed 5514.03 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 205330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:32:45,003-Speed 5536.48 samples/sec Loss 0.9006 LearningRate 0.0000 Epoch: 19 Global Step: 205340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:32:52,532-Speed 5441.63 samples/sec Loss 
0.8715 LearningRate 0.0000 Epoch: 19 Global Step: 205350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:32:59,978-Speed 5502.00 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 205360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:33:07,409-Speed 5512.37 samples/sec Loss 0.8865 LearningRate 0.0000 Epoch: 19 Global Step: 205370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:14,858-Speed 5499.75 samples/sec Loss 0.8733 LearningRate 0.0000 Epoch: 19 Global Step: 205380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:22,268-Speed 5528.69 samples/sec Loss 0.8545 LearningRate 0.0000 Epoch: 19 Global Step: 205390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:29,752-Speed 5473.38 samples/sec Loss 0.8610 LearningRate 0.0000 Epoch: 19 Global Step: 205400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:37,195-Speed 5504.02 samples/sec Loss 0.8782 LearningRate 0.0000 Epoch: 19 Global Step: 205410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:44,573-Speed 5552.03 samples/sec Loss 0.8825 LearningRate 0.0000 Epoch: 19 Global Step: 205420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:52,044-Speed 5483.45 samples/sec Loss 0.8914 LearningRate 0.0000 Epoch: 19 Global Step: 205430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:33:59,539-Speed 5466.10 samples/sec Loss 0.8824 LearningRate 0.0000 Epoch: 19 Global Step: 205440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:34:07,145-Speed 5386.18 samples/sec Loss 0.8584 LearningRate 0.0000 Epoch: 19 Global Step: 205450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:14,586-Speed 5505.13 samples/sec Loss 0.8783 LearningRate 0.0000 Epoch: 19 Global Step: 205460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:22,390-Speed 5249.73 samples/sec Loss 0.8613 LearningRate 0.0000 Epoch: 19 Global 
Step: 205470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:29,938-Speed 5427.06 samples/sec Loss 0.8855 LearningRate 0.0000 Epoch: 19 Global Step: 205480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:37,375-Speed 5508.57 samples/sec Loss 0.8875 LearningRate 0.0000 Epoch: 19 Global Step: 205490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:44,809-Speed 5510.07 samples/sec Loss 0.8877 LearningRate 0.0000 Epoch: 19 Global Step: 205500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:52,331-Speed 5446.32 samples/sec Loss 0.8615 LearningRate 0.0000 Epoch: 19 Global Step: 205510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:34:59,770-Speed 5507.61 samples/sec Loss 0.8945 LearningRate 0.0000 Epoch: 19 Global Step: 205520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:35:07,228-Speed 5492.46 samples/sec Loss 0.8778 LearningRate 0.0000 Epoch: 19 Global Step: 205530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:35:14,667-Speed 5507.13 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 19 Global Step: 205540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:35:22,073-Speed 5531.49 samples/sec Loss 0.8718 LearningRate 0.0000 Epoch: 19 Global Step: 205550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:35:29,584-Speed 5454.22 samples/sec Loss 0.8913 LearningRate 0.0000 Epoch: 19 Global Step: 205560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:35:37,060-Speed 5479.43 samples/sec Loss 0.8784 LearningRate 0.0000 Epoch: 19 Global Step: 205570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:35:44,559-Speed 5462.85 samples/sec Loss 0.8658 LearningRate 0.0000 Epoch: 19 Global Step: 205580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:35:52,169-Speed 5382.83 samples/sec Loss 0.8751 LearningRate 0.0000 Epoch: 19 Global Step: 205590 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-01-09 17:35:59,634-Speed 5488.01 samples/sec Loss 0.8708 LearningRate 0.0000 Epoch: 19 Global Step: 205600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:07,241-Speed 5385.00 samples/sec Loss 0.8809 LearningRate 0.0000 Epoch: 19 Global Step: 205610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:14,792-Speed 5424.82 samples/sec Loss 0.8649 LearningRate 0.0000 Epoch: 19 Global Step: 205620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:22,277-Speed 5472.72 samples/sec Loss 0.8877 LearningRate 0.0000 Epoch: 19 Global Step: 205630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:29,798-Speed 5447.36 samples/sec Loss 0.8633 LearningRate 0.0000 Epoch: 19 Global Step: 205640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:37,219-Speed 5520.06 samples/sec Loss 0.8652 LearningRate 0.0000 Epoch: 19 Global Step: 205650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:44,727-Speed 5456.41 samples/sec Loss 0.8800 LearningRate 0.0000 Epoch: 19 Global Step: 205660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:36:52,234-Speed 5456.56 samples/sec Loss 0.8672 LearningRate 0.0000 Epoch: 19 Global Step: 205670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:36:59,669-Speed 5510.51 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 19 Global Step: 205680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:07,104-Speed 5509.33 samples/sec Loss 0.8851 LearningRate 0.0000 Epoch: 19 Global Step: 205690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:14,626-Speed 5446.05 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 205700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:22,106-Speed 5476.97 samples/sec Loss 0.8624 LearningRate 0.0000 Epoch: 19 Global Step: 205710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 
17:37:29,578-Speed 5482.44 samples/sec Loss 0.8755 LearningRate 0.0000 Epoch: 19 Global Step: 205720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:37,037-Speed 5492.29 samples/sec Loss 0.8636 LearningRate 0.0000 Epoch: 19 Global Step: 205730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:44,534-Speed 5463.77 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 205740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:51,931-Speed 5538.74 samples/sec Loss 0.8740 LearningRate 0.0000 Epoch: 19 Global Step: 205750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:37:59,452-Speed 5446.82 samples/sec Loss 0.8859 LearningRate 0.0000 Epoch: 19 Global Step: 205760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:38:06,939-Speed 5471.94 samples/sec Loss 0.8932 LearningRate 0.0000 Epoch: 19 Global Step: 205770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:38:14,504-Speed 5414.36 samples/sec Loss 0.8899 LearningRate 0.0000 Epoch: 19 Global Step: 205780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:38:21,944-Speed 5506.77 samples/sec Loss 0.8808 LearningRate 0.0000 Epoch: 19 Global Step: 205790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:38:29,416-Speed 5482.65 samples/sec Loss 0.8740 LearningRate 0.0000 Epoch: 19 Global Step: 205800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:38:36,832-Speed 5524.18 samples/sec Loss 0.8758 LearningRate 0.0000 Epoch: 19 Global Step: 205810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:38:44,232-Speed 5535.29 samples/sec Loss 0.8955 LearningRate 0.0000 Epoch: 19 Global Step: 205820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:38:51,743-Speed 5454.61 samples/sec Loss 0.8714 LearningRate 0.0000 Epoch: 19 Global Step: 205830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:38:59,131-Speed 5544.62 samples/sec Loss 
0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 205840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:39:06,636-Speed 5457.96 samples/sec Loss 0.8893 LearningRate 0.0000 Epoch: 19 Global Step: 205850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:39:14,095-Speed 5492.18 samples/sec Loss 0.8805 LearningRate 0.0000 Epoch: 19 Global Step: 205860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:39:21,557-Speed 5489.96 samples/sec Loss 0.8588 LearningRate 0.0000 Epoch: 19 Global Step: 205870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:39:29,072-Speed 5451.00 samples/sec Loss 0.8759 LearningRate 0.0000 Epoch: 19 Global Step: 205880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:39:36,495-Speed 5518.79 samples/sec Loss 0.8634 LearningRate 0.0000 Epoch: 19 Global Step: 205890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:39:43,929-Speed 5511.08 samples/sec Loss 0.8911 LearningRate 0.0000 Epoch: 19 Global Step: 205900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:39:51,381-Speed 5496.85 samples/sec Loss 0.8699 LearningRate 0.0000 Epoch: 19 Global Step: 205910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:39:58,832-Speed 5497.86 samples/sec Loss 0.8842 LearningRate 0.0000 Epoch: 19 Global Step: 205920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:40:06,246-Speed 5526.11 samples/sec Loss 0.8965 LearningRate 0.0000 Epoch: 19 Global Step: 205930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:40:13,782-Speed 5435.51 samples/sec Loss 0.8775 LearningRate 0.0000 Epoch: 19 Global Step: 205940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:40:21,343-Speed 5418.15 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 205950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:40:28,788-Speed 5502.56 samples/sec Loss 0.8657 LearningRate 0.0000 Epoch: 19 Global 
Step: 205960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:40:36,238-Speed 5499.01 samples/sec Loss 0.8779 LearningRate 0.0000 Epoch: 19 Global Step: 205970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:40:43,660-Speed 5519.44 samples/sec Loss 0.8805 LearningRate 0.0000 Epoch: 19 Global Step: 205980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:40:51,190-Speed 5440.30 samples/sec Loss 0.8866 LearningRate 0.0000 Epoch: 19 Global Step: 205990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:40:58,673-Speed 5474.79 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 206000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:41:42,849-[lfw][206000]XNorm: 22.171104 Training: 2022-01-09 17:41:42,850-[lfw][206000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-09 17:41:42,851-[lfw][206000]Accuracy-Highest: 0.99850 Training: 2022-01-09 17:42:34,401-[cfp_fp][206000]XNorm: 22.023721 Training: 2022-01-09 17:42:34,402-[cfp_fp][206000]Accuracy-Flip: 0.99400+-0.00343 Training: 2022-01-09 17:42:34,402-[cfp_fp][206000]Accuracy-Highest: 0.99443 Training: 2022-01-09 17:43:19,063-[agedb_30][206000]XNorm: 22.929383 Training: 2022-01-09 17:43:19,064-[agedb_30][206000]Accuracy-Flip: 0.98650+-0.00529 Training: 2022-01-09 17:43:19,064-[agedb_30][206000]Accuracy-Highest: 0.98683 Training: 2022-01-09 17:43:26,662-Speed 276.78 samples/sec Loss 0.9022 LearningRate 0.0000 Epoch: 19 Global Step: 206010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:43:34,161-Speed 5463.02 samples/sec Loss 0.8746 LearningRate 0.0000 Epoch: 19 Global Step: 206020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:43:41,629-Speed 5485.63 samples/sec Loss 0.8807 LearningRate 0.0000 Epoch: 19 Global Step: 206030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:43:49,165-Speed 5436.19 samples/sec Loss 0.8638 LearningRate 0.0000 Epoch: 19 Global Step: 206040 Fp16 Grad 
Scale: 16384 Required: 0 hours Training: 2022-01-09 17:43:56,605-Speed 5505.47 samples/sec Loss 0.8566 LearningRate 0.0000 Epoch: 19 Global Step: 206050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:04,138-Speed 5437.97 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 19 Global Step: 206060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:11,607-Speed 5485.03 samples/sec Loss 0.8533 LearningRate 0.0000 Epoch: 19 Global Step: 206070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:19,045-Speed 5507.48 samples/sec Loss 0.8671 LearningRate 0.0000 Epoch: 19 Global Step: 206080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:26,479-Speed 5510.44 samples/sec Loss 0.8579 LearningRate 0.0000 Epoch: 19 Global Step: 206090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:34,041-Speed 5417.64 samples/sec Loss 0.8639 LearningRate 0.0000 Epoch: 19 Global Step: 206100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:41,472-Speed 5512.66 samples/sec Loss 0.8660 LearningRate 0.0000 Epoch: 19 Global Step: 206110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:48,890-Speed 5522.66 samples/sec Loss 0.8790 LearningRate 0.0000 Epoch: 19 Global Step: 206120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:44:56,338-Speed 5499.77 samples/sec Loss 0.8771 LearningRate 0.0000 Epoch: 19 Global Step: 206130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:45:03,773-Speed 5510.35 samples/sec Loss 0.8663 LearningRate 0.0000 Epoch: 19 Global Step: 206140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:45:11,189-Speed 5524.23 samples/sec Loss 0.8621 LearningRate 0.0000 Epoch: 19 Global Step: 206150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:45:18,664-Speed 5480.08 samples/sec Loss 0.8806 LearningRate 0.0000 Epoch: 19 Global Step: 206160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-09 17:45:26,082-Speed 5522.47 samples/sec Loss 0.8641 LearningRate 0.0000 Epoch: 19 Global Step: 206170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:45:33,491-Speed 5529.47 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 206180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:45:40,919-Speed 5514.48 samples/sec Loss 0.8977 LearningRate 0.0000 Epoch: 19 Global Step: 206190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:45:48,371-Speed 5497.70 samples/sec Loss 0.8714 LearningRate 0.0000 Epoch: 19 Global Step: 206200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:45:55,831-Speed 5491.08 samples/sec Loss 0.8623 LearningRate 0.0000 Epoch: 19 Global Step: 206210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:46:03,235-Speed 5533.51 samples/sec Loss 0.8760 LearningRate 0.0000 Epoch: 19 Global Step: 206220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:46:10,668-Speed 5510.85 samples/sec Loss 0.8739 LearningRate 0.0000 Epoch: 19 Global Step: 206230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:46:18,173-Speed 5458.56 samples/sec Loss 0.8710 LearningRate 0.0000 Epoch: 19 Global Step: 206240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:46:25,705-Speed 5439.44 samples/sec Loss 0.8603 LearningRate 0.0000 Epoch: 19 Global Step: 206250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:46:33,082-Speed 5552.56 samples/sec Loss 0.8677 LearningRate 0.0000 Epoch: 19 Global Step: 206260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:46:40,520-Speed 5507.63 samples/sec Loss 0.8848 LearningRate 0.0000 Epoch: 19 Global Step: 206270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:46:48,014-Speed 5466.69 samples/sec Loss 0.8811 LearningRate 0.0000 Epoch: 19 Global Step: 206280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:46:55,518-Speed 5459.68 
samples/sec Loss 0.8952 LearningRate 0.0000 Epoch: 19 Global Step: 206290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:03,051-Speed 5438.15 samples/sec Loss 0.8632 LearningRate 0.0000 Epoch: 19 Global Step: 206300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:10,494-Speed 5503.82 samples/sec Loss 0.8690 LearningRate 0.0000 Epoch: 19 Global Step: 206310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:17,921-Speed 5516.10 samples/sec Loss 0.8671 LearningRate 0.0000 Epoch: 19 Global Step: 206320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:25,334-Speed 5526.28 samples/sec Loss 0.8761 LearningRate 0.0000 Epoch: 19 Global Step: 206330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:32,806-Speed 5482.37 samples/sec Loss 0.8615 LearningRate 0.0000 Epoch: 19 Global Step: 206340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:40,237-Speed 5512.71 samples/sec Loss 0.8708 LearningRate 0.0000 Epoch: 19 Global Step: 206350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:47,655-Speed 5522.59 samples/sec Loss 0.8831 LearningRate 0.0000 Epoch: 19 Global Step: 206360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:47:55,119-Speed 5488.23 samples/sec Loss 0.8792 LearningRate 0.0000 Epoch: 19 Global Step: 206370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:48:02,592-Speed 5482.03 samples/sec Loss 0.8962 LearningRate 0.0000 Epoch: 19 Global Step: 206380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:10,056-Speed 5487.97 samples/sec Loss 0.8785 LearningRate 0.0000 Epoch: 19 Global Step: 206390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:17,498-Speed 5505.07 samples/sec Loss 0.8466 LearningRate 0.0000 Epoch: 19 Global Step: 206400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:24,898-Speed 5535.84 samples/sec Loss 0.8867 LearningRate 0.0000 
Epoch: 19 Global Step: 206410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:32,347-Speed 5499.29 samples/sec Loss 0.8894 LearningRate 0.0000 Epoch: 19 Global Step: 206420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:39,742-Speed 5539.52 samples/sec Loss 0.8959 LearningRate 0.0000 Epoch: 19 Global Step: 206430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:47,188-Speed 5502.02 samples/sec Loss 0.8568 LearningRate 0.0000 Epoch: 19 Global Step: 206440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:48:54,706-Speed 5449.04 samples/sec Loss 0.8727 LearningRate 0.0000 Epoch: 19 Global Step: 206450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:49:02,156-Speed 5498.50 samples/sec Loss 0.8730 LearningRate 0.0000 Epoch: 19 Global Step: 206460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:49:09,733-Speed 5406.82 samples/sec Loss 0.8620 LearningRate 0.0000 Epoch: 19 Global Step: 206470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:49:17,275-Speed 5431.81 samples/sec Loss 0.8590 LearningRate 0.0000 Epoch: 19 Global Step: 206480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:49:24,697-Speed 5519.81 samples/sec Loss 0.8634 LearningRate 0.0000 Epoch: 19 Global Step: 206490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:49:32,129-Speed 5512.09 samples/sec Loss 0.8723 LearningRate 0.0000 Epoch: 19 Global Step: 206500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:49:39,594-Speed 5487.22 samples/sec Loss 0.8579 LearningRate 0.0000 Epoch: 19 Global Step: 206510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:49:47,094-Speed 5461.94 samples/sec Loss 0.8852 LearningRate 0.0000 Epoch: 19 Global Step: 206520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:49:54,501-Speed 5531.20 samples/sec Loss 0.8758 LearningRate 0.0000 Epoch: 19 Global Step: 206530 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:01,962-Speed 5490.61 samples/sec Loss 0.8829 LearningRate 0.0000 Epoch: 19 Global Step: 206540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:09,390-Speed 5514.91 samples/sec Loss 0.8691 LearningRate 0.0000 Epoch: 19 Global Step: 206550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:16,878-Speed 5470.82 samples/sec Loss 0.8830 LearningRate 0.0000 Epoch: 19 Global Step: 206560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:24,333-Speed 5494.84 samples/sec Loss 0.9007 LearningRate 0.0000 Epoch: 19 Global Step: 206570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:31,734-Speed 5535.60 samples/sec Loss 0.8756 LearningRate 0.0000 Epoch: 19 Global Step: 206580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:50:39,125-Speed 5541.95 samples/sec Loss 0.8670 LearningRate 0.0000 Epoch: 19 Global Step: 206590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:46,527-Speed 5534.72 samples/sec Loss 0.8687 LearningRate 0.0000 Epoch: 19 Global Step: 206600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:50:53,953-Speed 5517.06 samples/sec Loss 0.8605 LearningRate 0.0000 Epoch: 19 Global Step: 206610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:01,383-Speed 5513.43 samples/sec Loss 0.8609 LearningRate 0.0000 Epoch: 19 Global Step: 206620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:08,857-Speed 5480.46 samples/sec Loss 0.8574 LearningRate 0.0000 Epoch: 19 Global Step: 206630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:16,374-Speed 5449.75 samples/sec Loss 0.8663 LearningRate 0.0000 Epoch: 19 Global Step: 206640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:23,860-Speed 5472.76 samples/sec Loss 0.8748 LearningRate 0.0000 Epoch: 19 Global Step: 206650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-09 17:51:31,258-Speed 5537.26 samples/sec Loss 0.8514 LearningRate 0.0000 Epoch: 19 Global Step: 206660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:38,761-Speed 5459.43 samples/sec Loss 0.8700 LearningRate 0.0000 Epoch: 19 Global Step: 206670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:46,190-Speed 5515.17 samples/sec Loss 0.8748 LearningRate 0.0000 Epoch: 19 Global Step: 206680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:51:53,638-Speed 5500.49 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 206690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:52:01,079-Speed 5504.87 samples/sec Loss 0.8913 LearningRate 0.0000 Epoch: 19 Global Step: 206700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:52:08,584-Speed 5458.81 samples/sec Loss 0.8471 LearningRate 0.0000 Epoch: 19 Global Step: 206710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:52:16,076-Speed 5467.72 samples/sec Loss 0.8801 LearningRate 0.0000 Epoch: 19 Global Step: 206720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:52:23,481-Speed 5532.71 samples/sec Loss 0.8497 LearningRate 0.0000 Epoch: 19 Global Step: 206730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:52:30,962-Speed 5475.68 samples/sec Loss 0.8596 LearningRate 0.0000 Epoch: 19 Global Step: 206740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:52:38,391-Speed 5514.04 samples/sec Loss 0.8937 LearningRate 0.0000 Epoch: 19 Global Step: 206750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:52:45,802-Speed 5528.18 samples/sec Loss 0.8811 LearningRate 0.0000 Epoch: 19 Global Step: 206760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:52:53,210-Speed 5529.84 samples/sec Loss 0.8892 LearningRate 0.0000 Epoch: 19 Global Step: 206770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:53:00,635-Speed 5516.91 
samples/sec Loss 0.8896 LearningRate 0.0000 Epoch: 19 Global Step: 206780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:53:08,056-Speed 5520.73 samples/sec Loss 0.8675 LearningRate 0.0000 Epoch: 19 Global Step: 206790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:53:15,510-Speed 5495.44 samples/sec Loss 0.8689 LearningRate 0.0000 Epoch: 19 Global Step: 206800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:53:22,943-Speed 5511.58 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 19 Global Step: 206810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:53:30,525-Speed 5402.93 samples/sec Loss 0.8968 LearningRate 0.0000 Epoch: 19 Global Step: 206820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:53:37,905-Speed 5551.20 samples/sec Loss 0.8567 LearningRate 0.0000 Epoch: 19 Global Step: 206830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:53:45,305-Speed 5535.44 samples/sec Loss 0.8700 LearningRate 0.0000 Epoch: 19 Global Step: 206840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:53:52,708-Speed 5533.59 samples/sec Loss 0.8684 LearningRate 0.0000 Epoch: 19 Global Step: 206850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:54:00,232-Speed 5445.35 samples/sec Loss 0.8575 LearningRate 0.0000 Epoch: 19 Global Step: 206860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:54:07,693-Speed 5490.04 samples/sec Loss 0.8813 LearningRate 0.0000 Epoch: 19 Global Step: 206870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:54:15,100-Speed 5531.15 samples/sec Loss 0.8631 LearningRate 0.0000 Epoch: 19 Global Step: 206880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:54:22,508-Speed 5529.21 samples/sec Loss 0.8635 LearningRate 0.0000 Epoch: 19 Global Step: 206890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:54:29,953-Speed 5503.24 samples/sec Loss 0.8742 LearningRate 0.0000 
Epoch: 19 Global Step: 206900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:54:37,362-Speed 5528.65 samples/sec Loss 0.8798 LearningRate 0.0000 Epoch: 19 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:54:44,936-Speed 5409.10 samples/sec Loss 0.8810 LearningRate 0.0000 Epoch: 19 Global Step: 206920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:54:52,359-Speed 5518.95 samples/sec Loss 0.8727 LearningRate 0.0000 Epoch: 19 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:54:59,831-Speed 5482.24 samples/sec Loss 0.8706 LearningRate 0.0000 Epoch: 19 Global Step: 206940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:55:07,220-Speed 5544.00 samples/sec Loss 0.8785 LearningRate 0.0000 Epoch: 19 Global Step: 206950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:55:14,622-Speed 5534.97 samples/sec Loss 0.8737 LearningRate 0.0000 Epoch: 19 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:55:22,336-Speed 5310.06 samples/sec Loss 0.8756 LearningRate 0.0000 Epoch: 19 Global Step: 206970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:55:30,136-Speed 5252.32 samples/sec Loss 0.8611 LearningRate 0.0000 Epoch: 19 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:55:37,743-Speed 5385.11 samples/sec Loss 0.8601 LearningRate 0.0000 Epoch: 19 Global Step: 206990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:55:45,156-Speed 5526.41 samples/sec Loss 0.8796 LearningRate 0.0000 Epoch: 19 Global Step: 207000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-09 17:55:52,643-Speed 5471.80 samples/sec Loss 0.8752 LearningRate 0.0000 Epoch: 19 Global Step: 207010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:56:00,063-Speed 5520.29 samples/sec Loss 0.8794 LearningRate 0.0000 Epoch: 19 Global Step: 207020 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-09 17:56:07,517-Speed 5495.97 samples/sec Loss 0.8744 LearningRate 0.0000 Epoch: 19 Global Step: 207030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:56:15,078-Speed 5418.29 samples/sec Loss 0.8972 LearningRate 0.0000 Epoch: 19 Global Step: 207040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:56:22,740-Speed 5346.71 samples/sec Loss 0.8884 LearningRate 0.0000 Epoch: 19 Global Step: 207050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:56:30,446-Speed 5316.02 samples/sec Loss 0.8682 LearningRate 0.0000 Epoch: 19 Global Step: 207060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:56:38,039-Speed 5395.09 samples/sec Loss 0.8746 LearningRate 0.0000 Epoch: 19 Global Step: 207070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:56:45,645-Speed 5385.79 samples/sec Loss 0.8822 LearningRate 0.0000 Epoch: 19 Global Step: 207080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:56:53,243-Speed 5391.52 samples/sec Loss 0.8694 LearningRate 0.0000 Epoch: 19 Global Step: 207090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:57:00,866-Speed 5373.94 samples/sec Loss 0.8862 LearningRate 0.0000 Epoch: 19 Global Step: 207100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:57:08,406-Speed 5433.19 samples/sec Loss 0.8929 LearningRate 0.0000 Epoch: 19 Global Step: 207110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:57:15,984-Speed 5405.92 samples/sec Loss 0.8719 LearningRate 0.0000 Epoch: 19 Global Step: 207120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:57:23,543-Speed 5418.60 samples/sec Loss 0.8692 LearningRate 0.0000 Epoch: 19 Global Step: 207130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:57:30,966-Speed 5519.46 samples/sec Loss 0.8681 LearningRate 0.0000 Epoch: 19 Global Step: 207140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-09 17:57:38,499-Speed 5438.15 samples/sec Loss 0.8705 LearningRate 0.0000 Epoch: 19 Global Step: 207150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:57:45,947-Speed 5499.96 samples/sec Loss 0.8641 LearningRate 0.0000 Epoch: 19 Global Step: 207160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:57:53,381-Speed 5510.62 samples/sec Loss 0.8891 LearningRate 0.0000 Epoch: 19 Global Step: 207170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:00,801-Speed 5520.45 samples/sec Loss 0.8711 LearningRate 0.0000 Epoch: 19 Global Step: 207180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:08,226-Speed 5517.71 samples/sec Loss 0.8790 LearningRate 0.0000 Epoch: 19 Global Step: 207190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:15,650-Speed 5517.70 samples/sec Loss 0.8652 LearningRate 0.0000 Epoch: 19 Global Step: 207200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:23,056-Speed 5531.15 samples/sec Loss 0.8701 LearningRate 0.0000 Epoch: 19 Global Step: 207210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:30,606-Speed 5425.81 samples/sec Loss 0.8779 LearningRate 0.0000 Epoch: 19 Global Step: 207220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:38,087-Speed 5475.79 samples/sec Loss 0.8842 LearningRate 0.0000 Epoch: 19 Global Step: 207230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:45,483-Speed 5539.10 samples/sec Loss 0.8616 LearningRate 0.0000 Epoch: 19 Global Step: 207240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:58:52,921-Speed 5507.58 samples/sec Loss 0.8714 LearningRate 0.0000 Epoch: 19 Global Step: 207250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:59:00,367-Speed 5501.64 samples/sec Loss 0.8804 LearningRate 0.0000 Epoch: 19 Global Step: 207260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 17:59:07,827-Speed 5491.28 
samples/sec Loss 0.8747 LearningRate 0.0000 Epoch: 19 Global Step: 207270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:59:15,268-Speed 5505.76 samples/sec Loss 0.8707 LearningRate 0.0000 Epoch: 19 Global Step: 207280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:59:22,772-Speed 5459.30 samples/sec Loss 0.8572 LearningRate 0.0000 Epoch: 19 Global Step: 207290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:59:30,228-Speed 5494.36 samples/sec Loss 0.8690 LearningRate 0.0000 Epoch: 19 Global Step: 207300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:59:37,791-Speed 5416.53 samples/sec Loss 0.8708 LearningRate 0.0000 Epoch: 19 Global Step: 207310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:59:45,245-Speed 5495.67 samples/sec Loss 0.8849 LearningRate 0.0000 Epoch: 19 Global Step: 207320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 17:59:52,718-Speed 5481.61 samples/sec Loss 0.8631 LearningRate 0.0000 Epoch: 19 Global Step: 207330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 18:00:00,263-Speed 5429.54 samples/sec Loss 0.8702 LearningRate 0.0000 Epoch: 19 Global Step: 207340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 18:00:07,896-Speed 5367.24 samples/sec Loss 0.8911 LearningRate 0.0000 Epoch: 19 Global Step: 207350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 18:00:15,321-Speed 5517.24 samples/sec Loss 0.8496 LearningRate 0.0000 Epoch: 19 Global Step: 207360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-09 18:00:22,685-Speed 5562.67 samples/sec Loss 0.8369 LearningRate 0.0000 Epoch: 19 Global Step: 207370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-09 18:00:30,122-Speed 5508.61 samples/sec Loss 0.8911 LearningRate 0.0000 Epoch: 19 Global Step: 207380 Fp16 Grad Scale: 16384 Required: -0 hours Training: 2022-01-09 18:00:37,532-Speed 5527.72 samples/sec Loss 0.8652 LearningRate 
0.0000 Epoch: 19 Global Step: 207390 Fp16 Grad Scale: 16384 Required: -0 hours